Ticket #296 (new defect)

Opened 6 months ago

Unicode C1 (U+0080 - U+009F) set should roundtrip correctly

Reported by: anonymous
Owned by: xi
Priority: normal
Component: libyaml
Severity: normal
Keywords: unicode, utf-8
Cc:

Description

Codepoints from U+0080 to U+009F are dumped with a \x escape (for example, U+0080 becomes {'\', 'x', '8', '0'}), when they should be dumped with a \u escape ({'\', 'u', '0', '0', '8', '0'}). Every value in 0x80..0x9F is greater than 0x7F (i.e. the high bit is set), so in UTF-8 these byte values are continuation bytes rather than standalone characters, and the codepoints cannot be dumped as single-byte characters.
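
To make the UTF-8 point concrete, here is a minimal sketch of a standalone encoder. The helper name utf8_encode is hypothetical and is not part of libyaml; it only illustrates that any codepoint above U+007F, including the whole C1 set, needs at least two bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libyaml code): encode one Unicode codepoint as
 * UTF-8. Writes up to 4 bytes into out and returns the byte count. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp <= 0x7F) {                 /* ASCII: one byte, high bit clear */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                /* includes the C1 set U+0080..U+009F */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, utf8_encode(0x80, buf) returns 2 and fills buf with the bytes 0xC2 0x80, so the byte 0x80 on its own never represents U+0080.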

My proposed fix dumps the C1 set (0x80-0x9F) as a Unicode codepoint escape (\u), while preserving the existing behaviour for the C0 set (0x00-0x1F), which is dumped as a single-byte escape (\x).
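
The rule described above could be sketched roughly as follows. This is not the code from patch.diff or from the libyaml emitter: the function name format_escape, the 0x7F threshold, and the handling of codepoints outside the C0 and C1 sets are all assumptions made for illustration, since the ticket only states the behaviour for those two sets.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch (not the attached patch): pick an escape form for a
 * codepoint that must be escaped. Assumes \x for values up to 0x7F and
 * \u or \U for everything above; the ticket only specifies C0 and C1.
 * Returns the number of characters written, excluding the terminator. */
static int format_escape(uint32_t cp, char *buf, size_t size)
{
    assert(buf != NULL && size >= 11);   /* "\UXXXXXXXX" plus NUL */
    if (cp <= 0x7F)                      /* C0 set keeps the \x form */
        return snprintf(buf, size, "\\x%02X", (unsigned)cp);
    if (cp <= 0xFFFF)                    /* C1 set now takes the \u form */
        return snprintf(buf, size, "\\u%04X", (unsigned)cp);
    return snprintf(buf, size, "\\U%08X", (unsigned)cp);
}
```

Under these assumptions, U+0080 is written as \u0080 (six characters) and U+0007 is still written as \x07 (four characters).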

Fix here: https://gist.github.com/burke/7220361

Attachments

patch.diff (679 bytes) - added by burke.libbey@… 6 months ago.

Change History

Changed 6 months ago by burke.libbey@…
