Ticket #296 (new defect)

Opened 6 months ago

Unicode C1 (U+0080 - U+009F) set should roundtrip correctly

Reported by: anonymous
Owned by: xi
Priority: normal
Component: libyaml
Severity: normal
Keywords: unicode, utf-8


Codepoints from U+0080 to U+009F are dumped as {'\', 'x', '8', '0'}, when they should be dumped as {'\', 'u', '0', '0', '8', '0'}. Every value in 0x80..0x9F is greater than 0x7F (i.e. the high bit is set), so a byte in this range has a special meaning in UTF-8 (it is a continuation byte), and these codepoints cannot be dumped as single-byte characters.
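To illustrate the UTF-8 point, here is a small Python check (not part of the patch; the variable names are only for illustration). It shows that the endpoints of the C1 range each take two bytes in UTF-8, and that a lone 0x80 byte does not decode on its own.

```python
# The C1 codepoints need two bytes each when encoded as UTF-8.
c1_start = "\u0080"
c1_end = "\u009f"

encoded_start = c1_start.encode("utf-8")
encoded_end = c1_end.encode("utf-8")
print(encoded_start)  # b'\xc2\x80'
print(encoded_end)    # b'\xc2\x9f'

# A lone 0x80 byte is a continuation byte, so it is not valid UTF-8 by itself.
try:
    b"\x80".decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False
print(lone_byte_is_valid)  # False
```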

My proposed fix changes the behaviour for the C1 set (0x80-0x9F) so that it is dumped as a Unicode codepoint literal (\u), while preserving the behaviour for the C0 set (0x00-0x1F), which is dumped as a single-byte literal (\x).
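The intended behaviour can be sketched roughly as follows. This is a Python illustration only, not the attached patch and not libyaml's C emitter. The helper name `escape_control` is hypothetical, the C0 set is taken to be U+0000..U+001F (its standard definition), and the sketch ignores the named short escapes such as \n and \t that a real emitter may prefer.

```python
def escape_control(ch: str) -> str:
    """Hypothetical helper: escape a single character as the ticket proposes."""
    cp = ord(ch)
    if cp <= 0x1F:
        # C0 control: keep the two-digit \x form.
        return "\\x%02X" % cp
    if 0x80 <= cp <= 0x9F:
        # C1 control: use the four-digit \u form instead of \x.
        return "\\u%04X" % cp
    # Anything else is passed through unchanged in this sketch.
    return ch


def escape_text(text: str) -> str:
    """Apply escape_control to every character of a string."""
    return "".join(escape_control(ch) for ch in text)


print(escape_text("a\u0080b"))  # a\u0080b
print(escape_text("a\x01b"))    # a\x01b
```

The printed results are the escaped text, so the first line prints a literal backslash followed by u0080.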

Fix here:


patch.diff Download (679 bytes) - added by burke.libbey@… 6 months ago.

Change History

Changed 6 months ago by burke.libbey@…

