Ticket #137 (closed defect: fixed)

Opened 5 years ago

Last modified 40 hours ago

PROPOSED FIX:memory corruption and bad aliases

Reported by: cegner@…
Owned by: xi
Priority: high
Component: pyyaml
Severity: blocker
Keywords: long, bad reference
Cc:

Description

Libyaml 0.1.2 fails to serialize python longs correctly. The pure python implementation produces correct output. This is a major issue for us since we make heavy use of yaml and the pure python implementation is too slow for our needs (not a criticism, just a statement of fact).

I've given this 'blocker' severity and high priority since long is a basic python type. If this is inappropriate, please let me know. When is the next scheduled release of libyaml?

Minimal test case:

>>> import yaml
>>> from yaml import Dumper
>>> from yaml import CDumper
>>> yaml.__version__
'3.08'
>>> # libyaml doesn't have __version__ support but is 0.1.2

>>> d = { 'hourEastern': 20L, 'hour_eastern': 20L }
>>> yaml.dump( d, Dumper = CDumper )
'{hourEastern: &20 !!python/long 20, hour_eastern: *id001}\n'
>>> yaml.dump( d, Dumper = Dumper )
"{hourEastern: &id001 !!python/long '20', hour_eastern: *id001}\n"

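The two outputs above differ in one important way: the CDumper output defines the anchor `&20` but refers to the alias `*id001`, while the pure Python output uses `id001` for both. A rough way to spot that kind of mismatch in dumped text is sketched below. The helper name `anchors_match_aliases` is made up for this illustration and is not part of PyYAML; it uses a naive regular expression rather than a YAML parser, so it would be fooled by `&` or `*` inside quoted scalars and it does not check that an anchor appears before its alias.

```python
import re


def anchors_match_aliases(text):
    """Return True if every *alias in text has a matching &anchor in text.

    Hypothetical helper for illustration only: a regex scan, not a YAML parse.
    """
    anchors = set(re.findall(r"&(\w+)", text))
    aliases = set(re.findall(r"\*(\w+)", text))
    return aliases <= anchors


# The two strings reported in the minimal test case.
cdumper_output = "{hourEastern: &20 !!python/long 20, hour_eastern: *id001}\n"
dumper_output = "{hourEastern: &id001 !!python/long '20', hour_eastern: *id001}\n"

print(anchors_match_aliases(cdumper_output))  # alias id001 has no anchor
print(anchors_match_aliases(dumper_output))   # anchor and alias agree
```

On the strings from this report, the check fails for the CDumper output and passes for the pure Python output.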
Change History

comment:1 Changed 5 years ago by anonymous

Test case:

>>> yaml.load( yaml.dump( (20L, 20L), Dumper = yaml.CDumper ) )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/__init__.py", line 58, in load
    return loader.get_single_data()
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/constructor.py", line 42, in get_single_data
    node = self.get_single_node()
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 111, in compose_sequence_node
    node.value.append(self.compose_node(node, index))
  File "/home/y/share/alexandria/lib/python2.5/site-packages/yaml/composer.py", line 69, in compose_node
    % anchor.encode('utf-8'), event.start_mark)
yaml.composer.ComposerError: found undefined alias 'id001'
  in "<string>", line 1, column 39:
     ... on/tuple [&20 !!python/long 20, *id001]
                                         ^

comment:2 Changed 5 years ago by cegner@…

As far as I can tell, there's some sort of memory clobbering going on in _yaml.pyx. It looks like either using the Cython-generated pyx_t_5 variable or a call to PyObject_repr triggers it. Code that should not be able to change the anchor variable in _serialize_node is changing it. I don't know Cython well at all and I'm not sure how to hook a debugger up to this.

comment:3 Changed 5 years ago by cegner@…

Fix underlining:

As far as I can tell, there's some sort of memory clobbering going on in _yaml.pyx. It looks like either using the Cython-generated __pyx_t_5 variable or a call to PyObject_repr triggers it. Code that should not be able to change the anchor variable in _serialize_node is changing it. I don't know Cython well at all and I'm not sure how to hook a debugger up to this.

comment:4 Changed 5 years ago by cegner@…

Okie, I'm not an expert in embedded C or Cython, but I think the problem is a dangling pointer in this line of _yaml.pyx:

    if anchor_object is not None:
        anchor = PyString_AS_STRING(PyUnicode_AsUTF8String(anchor_object))

Since the object created by PyUnicode_AsUTF8String goes away (no reference is maintained, though I don't have a good handle on garbage collecting...), the pointer returned by PyString_AS_STRING to the ephemeral object's internal buffer is dangling upon garbage collection. It seems that the rest of the code uses the idiom:

    anchor_object = PyUnicode_AsUTF8String(anchor_object)
    anchor = PyString_AS_STRING( anchor_object )

which seems to fix the problem.
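The lifetime argument above can be illustrated from plain Python, without touching the C API. The sketch below is only an analogy, written for Python 3 and assuming CPython's reference-counting behaviour: `EncodedAnchor` is a hypothetical stand-in for the temporary string object returned by PyUnicode_AsUTF8String (a small class is used because `bytes` objects cannot be weakly referenced), and a weak reference plays the role of the raw pointer returned by PyString_AS_STRING.

```python
import weakref


class EncodedAnchor:
    """Hypothetical stand-in for the temporary encoded-string object."""

    def __init__(self, text):
        self.data = text.encode("utf-8")


def temporary_survives():
    # No name holds the new object, so on CPython it is deallocated as soon
    # as the call that created the weak reference returns.
    ref = weakref.ref(EncodedAnchor("id001"))
    return ref() is not None


def named_reference_survives():
    # Binding the object to a name keeps it alive for the rest of the
    # function, mirroring the "anchor_object = ..." idiom quoted above.
    anchor_object = EncodedAnchor("id001")
    ref = weakref.ref(anchor_object)
    return ref() is not None


print(temporary_survives())
print(named_reference_survives())
```

In C there is no weak reference to report that the object is gone; the pointer simply keeps pointing at memory that may be reused, which would be consistent with the anchor text changing unexpectedly.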

comment:5 Changed 5 years ago by cegner@…

  • Summary changed from libyaml serializes longs incorrectly to PROPOSED FIX:memory corruption and bad aliases

comment:6 Changed 5 years ago by xi

  • Status changed from new to closed
  • Resolution set to fixed

Thank you for the report and the analysis. The bug is fixed in [350].

comment:7 Changed 5 years ago by xi

  • Component changed from libyaml to pyyaml
