Ticket #11 (closed defect: fixed)
Reported by: edemaine@… | Owned by: xi
I would like to bring up two issues with Unicode support in PyYAML's emitter. First, it emits a type annotation of !!python/unicode whenever emitting a unicode string that can be encoded in ASCII:
>>> print yaml.dump(u'Fran\xe7ais')
"Fran\xE7ais"
>>> print yaml.dump(u'hello')
!!python/unicode 'hello'
I assume this is to force the value to be a unicode string when read back in. However, it makes for rather ugly files. In my case, and I imagine many others, I really don't care whether a string is stored as a 'str' or as a 'unicode' object in Python. And in YAML, the native string type is Unicode anyway. So it seems strange to have this distinction at the level of the YAML file. On the other hand, I understand the desire to have yaml.load(yaml.dump(x)) == x. Perhaps this should be another configuration option? (Of course, I could just convert my ASCII-encodable unicode objects to str objects...)
The second issue is that the emitter escapes non-ASCII characters even when every character is printable (according to 'c-printable' in the YAML spec) and the output encoding (UTF-8) can represent them. I don't find this as elegant as it could be. Instead of the "Fran\xE7ais" output above, I would have hoped for the UTF-8-encoded byte string Fran\xc3\xa7ais\n.
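To make the rule I have in mind concrete, here is a rough sketch of the check, not PyYAML's actual emitter code. The names is_c_printable and needs_escaping are hypothetical, and the ranges are my reading of 'c-printable' in the YAML 1.1 spec. It only answers whether a character could be written verbatim; it ignores any extra escaping a particular scalar style might require.

```python
def is_c_printable(ch):
    """Return True if ch lies in the c-printable ranges (as read from YAML 1.1)."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D, 0x85)
        or 0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def needs_escaping(text, encoding="utf-8"):
    """Hypothetical helper: True if some character cannot be written verbatim.

    A character must be escaped if it is not c-printable, or if the
    chosen output encoding cannot represent it.
    """
    for ch in text:
        if not is_c_printable(ch):
            return True
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            return True
    return False
```

Under this rule, u'Fran\xe7ais' would be written verbatim when the encoding is UTF-8 and escaped only when the encoding is something like ASCII.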
I guess this is as stylistic an issue as the previous one. It makes me wonder again whether there should be a Style object that can specify various emitting options, instead of many keyword arguments...
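As a rough illustration of what I mean by a Style object, something like the following could bundle the options. Everything here is hypothetical: EmitStyle, its field names, and dump_kwargs are invented for this sketch and are not part of PyYAML.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EmitStyle:
    """Hypothetical bundle of emitter options (not a PyYAML class)."""
    # Emit !!python/unicode for unicode strings that are pure ASCII.
    tag_ascii_unicode: bool = True
    # Write printable non-ASCII characters as escapes such as \xE7.
    escape_non_ascii: bool = True
    # Encoding of the emitted byte stream.
    encoding: str = "utf-8"


def dump_kwargs(style):
    """Flatten a style into keyword arguments for a dump-like function."""
    return asdict(style)


# A style for human-readable files: no tags on ASCII strings, no escapes.
READABLE = EmitStyle(tag_ascii_unicode=False, escape_non_ascii=False)
```

The point of the design is that a caller passes one named, reusable object (for example READABLE) instead of repeating a growing list of keyword arguments at every call site.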
- Status changed from closed to reopened
- Resolution fixed deleted