Ticket #137 (closed defect: fixed)
PROPOSED FIX:memory corruption and bad aliases
| Reported by: | cegner@… | Owned by: | xi |
|---|---|---|---|
| Priority: | high | Component: | pyyaml |
| Severity: | blocker | Keywords: | long, bad reference |
| Cc: | | | |
Description
Libyaml 0.1.2 (used through CDumper) fails to serialize Python longs correctly, while the pure-Python Dumper produces correct output. This is a major issue for us: we make heavy use of YAML, and the pure-Python implementation is too slow for our needs (not a criticism, just a statement of fact).
I've given this 'blocker' severity and high priority since long is a basic Python type. If this is inappropriate, please let me know. When is the next scheduled release of libyaml?
Minimal test case:
>>> import yaml
>>> from yaml import Dumper
>>> from yaml import CDumper
>>> yaml.__version__
'3.08'
>>> # libyaml doesn't have __version__ support but is 0.1.2
>>> d = { 'hourEastern': 20L, 'hour_eastern': 20L }
>>> yaml.dump( d, Dumper = CDumper )
'{hourEastern: &20 !!python/long 20, hour_eastern: *id001}\n'
>>> yaml.dump( d, Dumper = Dumper )
"{hourEastern: &id001 !!python/long '20', hour_eastern: *id001}\n"
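One way to tell a correct dump from a corrupted one, as in the two outputs above, is to check that every alias (`*name`) refers to an anchor (`&name`) defined earlier in the text. The sketch below assumes a hypothetical helper, `undefined_aliases`, which is not part of PyYAML. It uses a regular expression instead of a real YAML parser, so it is only a rough diagnostic: an `&` or `*` inside a scalar value could confuse it.

```python
import re

# Hypothetical helper (not a PyYAML API): a rough textual check that every
# alias (*name) in dumped YAML refers to an anchor (&name) defined earlier.
# It does not parse YAML, so '&' or '*' inside scalars can mislead it.
_ANCHOR_OR_ALIAS = re.compile(r'([&*])([0-9A-Za-z_-]+)')


def undefined_aliases(text):
    """Return alias names that appear before (or without) a matching anchor."""
    defined = set()
    missing = []
    for sigil, name in _ANCHOR_OR_ALIAS.findall(text):
        if sigil == '&':
            defined.add(name)
        elif name not in defined:
            missing.append(name)
    return missing


# Outputs copied from the test case in this ticket.
cdumper_out = '{hourEastern: &20 !!python/long 20, hour_eastern: *id001}\n'
dumper_out = "{hourEastern: &id001 !!python/long '20', hour_eastern: *id001}\n"

print(undefined_aliases(cdumper_out))  # ['id001']
print(undefined_aliases(dumper_out))   # []
```

The CDumper output defines an anchor named `20` but refers to `id001`, which is exactly the dangling alias that the loader later rejects.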
Change History
comment:2 Changed 4 years ago by cegner@…
As far as I can tell, there's some sort of memory clobbering going on in _yaml.pyx. It looks like either the Cython-generated __pyx_t_5 variable or a call to PyObject_Repr is responsible: code that should not be able to change the anchor variable in _serialize_node is changing it. I don't know Cython well at all, and I'm not sure how to hook a debugger up to this.
comment:3 Changed 4 years ago by cegner@…
Fix underlining: reposted the comment above so that __pyx_t_5 renders with its leading underscores.
comment:4 Changed 4 years ago by cegner@…
Okie, I'm not an expert in embedded C or Cython, but I think the problem is a dangling pointer in these lines of _yaml.pyx:
if anchor_object is not None:
    anchor = PyString_AS_STRING(PyUnicode_AsUTF8String(anchor_object))
Since nothing keeps a reference to the temporary string object created by PyUnicode_AsUTF8String (though I don't have a good handle on garbage collection...), that object goes away, and the pointer that PyString_AS_STRING returned into its internal buffer is left dangling once the object is freed. The rest of the code seems to use this idiom instead:
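anchor_object = PyUnicode_AsUTF8String(anchor_object)
anchor = PyString_AS_STRING(anchor_object)
which seems to fix the problem: because the encoded string stays bound to anchor_object, it is not freed while anchor still points into its buffer.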
comment:5 Changed 4 years ago by cegner@…
- Summary changed from "libyaml serializes longs incorrectly" to "PROPOSED FIX:memory corruption and bad aliases"

Test case:
>>> yaml.load( yaml.dump( (20L, 20L), Dumper = yaml.CDumper ) )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/__init__.py", line 58, in load
    return loader.get_single_data()
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/constructor.py", line 42, in get_single_data
    node = self.get_single_node()
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 111, in compose_sequence_node
    node.value.append(self.compose_node(node, index))
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 69, in compose_node
    % anchor.encode('utf-8'), event.start_mark)
yaml.composer.ComposerError: found undefined alias 'id001'
  in "<string>", line 1, column 39:
     ... on/tuple [&20 !!python/long 20, *id001]
                                         ^