Modify

Ticket #5 (closed enhancement: fixed)

Opened 8 years ago

Last modified 8 years ago

PYYAML3000 does not load recursive structures.

Reported by: pkmurphy@… Owned by: xi
Priority: normal Component: pyyaml
Severity: normal Keywords:
Cc:

Description

PYYAML does not load any structure with an anchor that contains its own alias. For example:

ourconst2 = "---\n!!seq &base [ *base ] \n...\n"; print ourconst2; for event in yaml.parse(ourconst2): print event ourload = yaml.safe_load(ourconst2) print ourload;

Attachments

composer.py Download (7.9 KB) - added by anonymous 8 years ago.
Patch on file for problem
constructor.py Download (16.3 KB) - added by pkmurphy@… 8 years ago.
Patch on file for problem
representer.py Download (19.0 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
Possible patch
testrecu.py Download (1.2 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
Small test file
constructor.2.py Download (23.9 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
This is my modified version of constructor.py - second version.
constructor.3.py Download (28.5 KB) - added by Peter Murphy (pkmurphy at postmaster.co.uk) 8 years ago.
Allowing recursive tuples
representer.2.py Download (19.4 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
Allowing class instances referncing itself to be dumped.
testrecutuple.py Download (4.2 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
A small test file for recursive tuples
testrecuset.py Download (551 bytes) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
A test file for recursive sets
constructor.4.py Download (29.8 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
The final version of constructor.py? (Solves sets.)
representer.3.py Download (19.4 KB) - added by Peter Murphy (pkmurphy at postmaster dot co dot uk) 8 years ago.
The final version of representer.py? (Solves sets.)

Change History

Changed 8 years ago by anonymous

Patch on file for problem

Changed 8 years ago by pkmurphy@…

Patch on file for problem

comment:1 Changed 8 years ago by xi

  • Status changed from new to assigned

Thanks for the patch.

It looks good, but it has a few drawbacks. For instance, it will not work for sets and tuples. Try to load the following documents:

--- &A !!set { foo: *A }
--- &A !!python/tuple [ [*A] ]

Support for recursive objects is also needed in representer.py and serializer.py.

I will apply the composer.py part of the patch. constructor.py needs to be reworked.

comment:2 Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

  • Type set to defect

I've changed representer.py to allow recursive links. I've attached that code. I've also added a little test file to see what the output YAML looks. Is there any need to modify serializer at all? It seems to work fine.

Is is actually possible to have recursive links in Python for sets and tuples?

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

Possible patch

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

Small test file

comment:3 Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

I spend a little bit of time investigating this proble, so I can answer my own question. It is quite easy to have recursive links in tuples. On the other hand, it is extremely difficult (if not impossible) to have recursive links in sets. Python sets do not like unhashable members. For example:

---

RecSet? = set(); OurTuple? = tuple([RecSet?]); RecSet?.add(OurTuple?);

Output:

Traceback (most recent call last):

File "testrecu.py", line 109, in ?

RecSet?.add(OurTuple?);

TypeError?: set objects are unhashable

---

This is a feature of the Python language, and it appears that we can't do anything about it. Therefore, no bug fix is possible. I'm being pessimistic here, and I could be wrong.

Tuples can be made recursive if they contain another element (not a tuple!) that contains itself. Tuples are immutable, so it's not possible to modify them in place (I think!) You use the assignment operator to create new tuples... but you lose recursive inclusion when you do it. For example:

---

print "Example 1" T1 = tuple(); print T1, id(T1), len(T1); T1 = tuple([T1]); print T1, id(T1), len(T1);

Output:

Example 1 () 10948656 0 ((),) 13072112 1

---

And this code seems to not work either, when tuple T1 contains tuple T2 contains tuple T1.

--- print "Example 2" T1 = tuple(); print T1, id(T1), len(T1); T2 = tuple([T1]); print T2, id(T2), len(T2); T1 = tuple([T2]); print T1, id(T1), len(T1);

Output:

Example 2 () 10948656 0 ((),) 13075920 1 (((),),) 13076656 1

---

On the other hand, a trivial case of making a recursive tuple is:

--- OurList? = []; print OurList?; OurTuple? = tuple([OurList?]); print OurTuple?; OurList?.append(OurTuple?); print OurList?; print OurTuple?; print id(OurList?); print id(OurTuple?[0]); print id(OurTuple?); print id(OurList?[0]); ---

Output:

[] ([],) [([...],)] ([([...],)],) 13010384 13010384 13000656 13000656


That's Python for you. I rewrote constructor.py on my machine, in the same way as my original patch and tried this following bit of YAML:

--- Q ="--- &A !!python/tuple [ [*A] ]" R = yaml.load(Q); print R; print type(R); print type(R[0]); print type(R[0][0]);

Output:

([...?],) <type 'tuple'> <type 'list'> <type 'list'>

---

That's not what we want. What happened is that the following bit of code:

def construct_python_tuple(self, node):

return tuple(self.construct_yaml_seq(node))

Created a new tuple, rather than assigning the old tuple to the structure. As for this:

--- Q = "--- &A !!set { *A }" R = yaml.load(Q); print R;

Output:

File "testrecu.py", line 127, in ?

R = yaml.load(Q);

File "D:\Python24\Lib\site-packages\yaml\init.py", line 61, in load

return loader.get_data()

File "D:\Python24\lib\site-packages\yaml\constructor.py", line 41, in get_data

return self.construct_document(self.get_node())

File "D:\Python24\lib\site-packages\yaml\constructor.py", line 49, in construct_document

data = self.construct_object(node)

File "D:\Python24\lib\site-packages\yaml\constructor.py", line 86, in construct_object

data = constructor(node) #Modify BBB1

File "D:\Python24\lib\site-packages\yaml\constructor.py", line 63, in <lambda>

constructor = lambda node: self.yaml_constructors[node.tag](self, node) #Modify BBB2

File "D:\Python24\lib\site-packages\yaml\constructor.py", line 356, in construct_yaml_set

value = self.construct_mapping(node)

File "D:\Python24\lib\site-packages\yaml\constructor.py", line 160, in construct_mapping

"found unacceptable key (%s)" % exc, key_node.start_mark)

yaml.constructor.ConstructorError?: while constructing a mapping found unacceptable key (dict objects are unhashable)

in "<string>", line 1, column 5:

--- &A !!set { *A }

---

No unhashable (i.e, mutable) contents provided. Finally, for this:

--- Q = "--- &A !!set { foo: *A }" R = yaml.load(Q); print R;

Output: set(foo?)

---

Well, that degraded nicely.

What are my recommendations? You can choose to do these or not, but this is what I think should happen.

(a) Accept that recursive sets are pretty unlikely, if not impossible, in Python. So this should not stand in the way of resolving Bug Fix 5. It's a little bit like having sequences or maps as keys in other YAML maps. YAML will allow you to do this, but Python won't. So document this.

(b) Accept constructor.py as a patch. I'll send it again. I modified it in the same way as I did 2 months ago, but you may have made other modifications. My modifications are on the latest CVS source. Don't use the old version of constructor.py - use the new version.

(c) Close this ticket. Allowing PyYAML to load recursive dictionaries and sequences would be a useful feature anyway.

(d) Open a new ticket: "PYYAML3000 does not load recursive tuples." We still admit there's a problem, but now we clearly delimit what it is.

The problem of having recursive tuples seems a few degrees harder that recursive maps or dictionaries. You can create them from sequences, but only after all of its contents have been created. If any tuples contain THEM, then you must do the sequence->tuple conversion on their parents afterwards. I've been thinking about the algorithm to do it.

(a) Create all tuples as sequences first. Hash each of these by their node id. (b) Once all of these sequences are created, convert them to tuples. Make sure all references are updated. But you must start from the leaves first, not the root.

It might be an ugly algorithm. Oh well...

Having recursive tuples is a pretty unPython thing to do, I think. They are generally used for small structures such as (say) timestamps, and the advantage of them is that they can easily be hashed. That's another reason why this should have a ticket of its own.

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

This is my modified version of constructor.py - second version.

comment:4 Changed 8 years ago by xi

  • Type changed from defect to enhancement

Thanks for the patch.

  1. Recursive sets are certainly possible, try this:
    >>> class C: pass
    ...
    >>> c = C()
    >>> s = set([c])
    >>> c.s = s
    >>> s
    set([<__main__.C instance at 0xb7d4f2ac>])
    
  1. Pickle supports recursive tuples, so PyYAML should support them as well in order to claim the complete pickle compatibility. You may check the pickle code to determine how they construct recursice tuples -- it's not extremely hard.
  1. Although recursive tuples are uncommon, consider recursive objects that redefine the __reduce__ or __getinitargs__ methods. They must be constructed in the same way as recursive tuples. The idea is to make an algorithm that support constructing any recursive objects that must be created in a single step if it's ever possible.
  1. Supporting recursive structures may require API changes, that's why I want to introduce the complete support at once -- in order to minimize API breaks.

comment:5 Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

Supporting recursive structures may require API changes, that's why I want to introduce the complete support at once -- in order to minimize API breaks.

That is the most important thing, I suppose.

I'll try to get cracking with tuples in the next day or so.

comment:6 Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

Kirill,

I think I have solved the problem with loading recursive tuples. A modification of constructor.py is attached. So is a test file: testrecutuple.py . Please see if it works on Linux. Also, if you feel like it, run a few more tests.

I have not yet modified constructor.py to allow recursive sets, I'm afraid. The algorithm will be a little bit more complex than the tuple case. There are a few questions I'd like to ask:

(A) How should sets handle merge keys? Should I mark it as an error? (B) How should sets handle "value" keys? Should I mark it as an error likewise?

I've had to do some further modications to representer.py . In the existing system, the following bit of code causes an error while dumping:

===============

class TestRecurseClass?:

def init(self):

self.a = "a"; self.b = self; self.c = "b";

D = TestRecurseClass?(); print D; E = yaml.dump(D); print E; F = yaml.load(E); print F;

============

(The load works fine.)

What happens is that represent_data calls (via yaml_multi_representers) represent_instance, where "data" is the class instance. Then represent_instance gives the .dict attribute to represent_mapping. The alias id is taken from the id of the dictionary, not the id of the class instance. So when represent_data is called again with the same class instance (as happens in a recursive situation), the alias is not found in represented_objects. So we raise RepresenterError?("recursive objects are not allowed...").

In other words, the alias for the object does not correspond to the object itself, but some derived structure.

The solution I gave was to add an extra parameter (extdata) to both represent_mapping and represent_sequence. This tells the function to take the alias id from extdata - not the respective mapping or sequence parameter present. This seems to have permitted recursive class instances.

You may want to look at the code quite closely. I've made modifications to represent_yaml_object, represent_object, and represent_instance (with #PKM added), and added the extra parameter there. It hasn't broken existing code on Windows - I ran the yaml module test suite with no problems. But I don't fully understand the code.

Best regards.

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster.co.uk)

Allowing recursive tuples

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

Allowing class instances referncing itself to be dumped.

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

A small test file for recursive tuples

comment:7 Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

This may have solved the problem for sets as well. I've rewritten representer.py (AGAIN), and added some modifications to constructor.py. Both these files are included. I've also added a test file for recursive sets.

See what you think - and please test the hell out of these modifications.

If you accept these patches, can we close this ticket?

Best regards. Peter

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

A test file for recursive sets

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

The final version of constructor.py? (Solves sets.)

Changed 8 years ago by Peter Murphy (pkmurphy at postmaster dot co dot uk)

The final version of representer.py? (Solves sets.)

comment:8 Changed 8 years ago by xi

  • Status changed from assigned to closed
  • Resolution set to fixed

Thanks for the patches, Peter. Fixed in [222].

View

Add a comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
The resolution will be deleted. Next status will be 'reopened'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.