Ticket #29 (reopened enhancement)

Opened 8 years ago

Last modified 2 years ago

Keeping mapping keys ordered

Reported by: edemaine@…
Owned by: xi
Priority: normal
Component: pyyaml
Severity: normal
Keywords:
Cc: xi@…, edemaine@…, strombrg@…

Description

Would you be interested in adding the following kind of functionality to the public distribution of PyYAML?

>>> import yaml
>>> d = yaml.load('z: 1\ny: 2\nx: 3\n', Loader=yaml.order.OrderedLoader)
>>> d
yaml.order.odict([('z', 1), ('y', 2), ('x', 3)])
>>> for key, value in d.iteritems(): print key, value
z 1
y 2
x 3
>>> print yaml.dump(d, Dumper=yaml.order.OrderedDumper, default_flow_style=False),
z: 1
y: 2
x: 3
>>> s = yaml.dump(d, default_flow_style=False)
>>> print s,
!!omap
z: 1
y: 2
x: 3
>>> yaml.load(s)
yaml.order.odict([('z', 1), ('y', 2), ('x', 3)])

There are two things going on here:

  1. Add real !!omap functionality. When loading an !!omap object, create an odict object (defined by a new class that maintains a dictionary along with a key order), instead of the current behavior of creating a regular Python dictionary. Conversely, when dumping such an object, preserve the key order (don't sort), and output an !!omap directive. Both of these features seem quite desirable from a YAML standard point of view.

Perhaps, more generally, dumping could check for a special 'keys_in_order' attribute, in which case it follows the order of keys(), instead of sorting the keys as in the recent patch.

  2. Add a special yaml.order.OrderedLoader, which loads regular !!map values as if they were !!omap values, and a yaml.order.OrderedDumper, which dumps odict types as regular !!map values (to avoid the ugly !!omap specifier).

Personally I would find this functionality very useful in many projects. It would enable a computer program to edit a human-written YAML file without messing up all the key orders, so that the computer output looks pretty similar to what the human had just before. I understand that YAML does not guarantee preservation of key order in a map type, or more precisely, it does not give any significance to the order in the absence of an !!omap or !!pairs type specification. But this is a practically useful feature in some cases, so it seems natural to provide it as optional functionality in yaml.order.
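The loader and dumper pair described above could be approximated with PyYAML's existing extension hooks (add_constructor and add_representer) and collections.OrderedDict from the standard library. This is only a sketch under assumptions: the class names echo the proposal but are defined locally (PyYAML has no yaml.order module), and deriving them from SafeLoader and SafeDumper is a choice made here, not something the ticket specifies.

```python
from collections import OrderedDict

import yaml

MAP_TAG = 'tag:yaml.org,2002:map'  # tag of a regular YAML mapping


class OrderedLoader(yaml.SafeLoader):
    """Hypothetical loader that builds an OrderedDict for every plain mapping."""


def _construct_ordered_map(loader, node):
    loader.flatten_mapping(node)  # resolve '<<' merge keys first
    # construct_pairs returns (key, value) tuples in document order.
    return OrderedDict(loader.construct_pairs(node))


OrderedLoader.add_constructor(MAP_TAG, _construct_ordered_map)


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper that writes an OrderedDict as a plain mapping."""


def _represent_ordered_map(dumper, data):
    # An items view is emitted in iteration order, without an !!omap tag.
    return dumper.represent_mapping(MAP_TAG, data.items())


OrderedDumper.add_representer(OrderedDict, _represent_ordered_map)

doc = yaml.load('z: 1\ny: 2\nx: 3\n', Loader=OrderedLoader)
text = yaml.dump(doc, Dumper=OrderedDumper, default_flow_style=False)
```

Registering the hooks on subclasses keeps the stock loaders and dumpers unchanged, so the ordering behavior stays opt-in, as the proposal asks.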

I'd be happy to write the code for all of this, because I need it myself. My question is whether you'd consider including it in the PyYAML distribution.

Change History

comment:1 Changed 7 years ago by m.mueller at ibgw-leipzig dot de

  • Cc xi@… added

I find YAML a very interesting format for input files of simulation models that typically need lots of data. I am just writing an input program in Python that reads YAML files. It was easy to implement some simple !File directive that reads data from other files either in YAML or in other formats. There are two reasons for this:

  1. Since the files can be rather large, it is very nice to be able to distribute input over as many files as desired. This can also be nested, i.e. a YAML file that is included that way can include another file etc. Only YAML files can include other files. Files in other formats cannot; they are dead ends.
  2. For data that fit very nicely in tabular representation, other formats such as delimited text files (either space or semicolons), spreadsheet formats (EXCEL etc.), dBase files or other databases are often a good choice. They will be imported automatically into the Python data structures just like YAML itself.

I want the user to edit files as well as to read and write those files with my programs in between edits. Therefore, I would like mappings to be ordered. Otherwise the file would possibly be totally rearranged after processing, which is clearly something the user who edits the file would really dislike.

Furthermore, I plan to write a data-driven GUI for input files. It should find (nearly) all information for its appearance in the YAML file. One natural representation would be a tree view. Since the tree has an order I would need an order also for dictionaries.

To cut a long story short, I am really interested in an ordered mapping for YAML. I think both ways, (1) doing it explicitly in YAML with !!odict and (2) doing it in source code with yaml.order.OrderedLoader, should be supported.

I’m willing to contribute to the implementation but would certainly need some help to look at the right places from the beginning and to do it properly, i.e. in agreement with the overall design of PyYAML.

comment:2 Changed 7 years ago by cems at lanl dot gov

An ordered dict may be the wrong tool for the problem. The basic problem is not just a dict issue but keeping all the YAML entries in the same order they were in the original file at dump time. I can see how this could be addressed by an ordered dict; however, this may be overkill.

An ordered dict is often implemented in a slower and more costly manner than a conventional dict. I believe the desired behavior in most cases will simply be to have the YAML object remember the order of the items in the original file.

The distinction being made here, if it is not clear, is that one can use a conventional dict for storing and accessing the data quickly in Python usage. Most of the time we will not care about the order in which keys are iterated, how they are stored, or the volatility of the key order if the hash is resized. The only time we actually care, I believe, is when one wants to rewrite the YAML file. Then one either wants to recreate the original order or specify a particular order.

Thus one can simply use a conventional dict, but also have some auxiliary storage to specify the ordering of the keys. This can be consulted when needed and ignored for speed when not needed. Moreover, the user can even edit the order, rather than have it solely determined by the input order.
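The auxiliary-storage idea can be sketched in plain Python with no YAML library involved. The helper name items_in_order and the rule that keys without a recorded position go last are assumptions made for this illustration; the comment does not say how such keys should be handled.

```python
def items_in_order(data, key_order):
    """Yield (key, value) pairs of `data`, following `key_order` first."""
    seen = set()
    for key in key_order:
        if key in data and key not in seen:
            seen.add(key)
            yield key, data[key]
    # Keys with no recorded position (e.g. added after loading) go last.
    for key in data:
        if key not in seen:
            yield key, data[key]


# The dict is used normally; the list only matters when writing back out.
data = {'x': 3, 'y': 2, 'z': 1}
key_order = ['z', 'y', 'x']   # order remembered from the original file
data['w'] = 0                 # new key, no recorded position

original = list(items_in_order(data, key_order))

key_order.sort()              # the user may also edit the order
edited = list(items_in_order(data, key_order))
```

Lookups go through the ordinary dict, and the order list is consulted only at dump time, which is the trade-off the comment argues for.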

comment:3 Changed 6 years ago by mundt@…

Just wanted to show interest in this topic, especially in reading the YAML files without messing with the keys. I use YAML to configure a validation process and it would be really nice to be able to have a fixed order. At the moment I have an extra array just for the order. And that's just double code... who wants that!? :)

However... since nothing has changed for almost 5 months, I just want to check if this topic is still in progress and if there is any news.

thanks for the module anyways.

comment:4 Changed 6 years ago by xi

  • Status changed from new to closed
  • Resolution set to wontfix

I won't add this kind of functionality to the PyYAML core for two reasons:

  • It breaks the YAML specs. The spec clearly indicates that the key order is a representation detail and should not be used for constructing native objects.
  • I don't like the idea of PyYAML defining custom types for generated objects. I'd prefer PyYAML to generate only objects of the types defined in the standard library.

The implementation is nearly trivial though, so anyone who wants this feature regardless of what the spec says could implement it themselves:

>>> import yaml
>>> def omap_constructor(loader, node):
...     return loader.construct_pairs(node)
... 
>>> yaml.add_constructor(u'!omap', omap_constructor)
>>> yaml.load('!omap { C: 1, B: 2, A: 1 }')
[('C', 1), ('B', 2), ('A', 1)]
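A variant of the recipe above that returns a mapping object rather than a list of pairs, for readers who want key lookup as well as order. It is a sketch, not part of PyYAML: OmapLoader is a hypothetical subclass introduced here so the constructor is not registered globally, and the Loader argument is passed explicitly because recent PyYAML releases require it.

```python
from collections import OrderedDict

import yaml


class OmapLoader(yaml.SafeLoader):
    """Hypothetical loader subclass, so the stock loaders stay untouched."""


def omap_constructor(loader, node):
    # construct_pairs keeps the (key, value) pairs in document order.
    return OrderedDict(loader.construct_pairs(node))


OmapLoader.add_constructor('!omap', omap_constructor)

result = yaml.load('!omap { C: 1, B: 2, A: 1 }', Loader=OmapLoader)
```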

comment:5 follow-up: ↓ 7 Changed 4 years ago by strombrg@…

  • Cc xi@…, edemaine@…, strombrg@… added; xi@… removed
  • Status changed from closed to reopened
  • Resolution wontfix deleted

I think allowing order to be kept would be very useful - I have a project that needs it, in fact.

I don't agree that it's only when you're writing things back out that order matters - what if you need to store a chronological sequence of events? Or what if you need things to be sorted after each small modification to the resulting dictionary? A repeated sort of keys can be pretty expensive, despite CPython's rather stellar sort algorithm.

I believe it would make the most sense to pass an optional dictionary-like-object constructor to yaml.load - it would default to a traditional dictionary, but could also be an OrderedDict (http://docs.python.org/dev/library/collections.html#collections.OrderedDict) or treap (http://stromberg.dnsalias.org/~strombrg/treap/) or whatever.

As long as the documentation explains clearly that this is a violation of the spec, then people know what to expect when switching implementations. Consider the DB-API - there are plenty of extensions to it added by the various implementations of the API, and there's actually a list of common extensions in the documentation in an effort to unify them a bit.
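The pluggable-constructor suggestion can be emulated today with a small wrapper. This is a sketch under assumptions: load_with_mapping and its mapping_factory parameter are hypothetical names (yaml.load accepts no such argument), and the factory is assumed to accept an iterable of (key, value) pairs, as dict, OrderedDict and list all do.

```python
from collections import OrderedDict

import yaml

MAP_TAG = 'tag:yaml.org,2002:map'  # tag of a regular YAML mapping


def load_with_mapping(stream, mapping_factory=dict):
    """Hypothetical helper: build every mapping with `mapping_factory`."""

    class _Loader(yaml.SafeLoader):
        pass

    def construct_map(loader, node):
        loader.flatten_mapping(node)  # resolve '<<' merge keys first
        return mapping_factory(loader.construct_pairs(node))

    # Registered on the throwaway subclass only, not on SafeLoader itself.
    _Loader.add_constructor(MAP_TAG, construct_map)
    return yaml.load(stream, Loader=_Loader)


events = 'start: 1\nstop: 2\nabort: 3\n'
as_dict = load_with_mapping(events)
as_ordered = load_with_mapping(events, OrderedDict)
as_pairs = load_with_mapping(events, list)  # keeps duplicate keys too
```

Any other container with a compatible constructor, such as the treap mentioned above, could be passed the same way.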

comment:6 Changed 4 years ago by anonymous

On the growing popularity of ordered dictionaries across many languages: http://www.mail-archive.com/python-list@python.org/msg58631.html

There is one included with CPython 2.7 and up, and there is a pretty compatible one available for 2.4 through 2.6: http://pypi.python.org/pypi/ordereddict/1.1.

And I can't help but feel that someone might want to pass in a treap someday.

comment:7 in reply to: ↑ 5 Changed 4 years ago by anonymous

Replying to strombrg@gmail.com:

I believe it would make the most sense to pass an optional dictionary-like-object constructor to yaml.load - it would default to a traditional dictionary, but could also be an OrderedDict (http://docs.python.org/dev/library/collections.html#collections.OrderedDict) or treap (http://stromberg.dnsalias.org/~strombrg/treap/) or whatever.

+1 for this. I think that adding the possibility to change the default behavior for dictionaries would be a very welcome feature.

comment:8 Changed 3 years ago by Aphex

I would greatly appreciate a way to do this. I love the simplicity of the YAML syntax and would like to integrate it into my control scheme program; however, the control sequences must be ordered by definition. Does anyone have a modified version of pyyaml that reads and dumps mappings as OrderedDicts?

comment:9 Changed 3 years ago by anonymous

I, too, want to express strong interest in this feature. In contrast to others' experiences, in nearly all cases where I used YAML it was crucial to preserve the ordering as specified in the YAML text. I always had to resort to !!omap, but an !!odict would definitely be a more elegant alternative.

It would also ensure behavior equivalent to PHP's, where dictionaries are always ordered, and provide for language-independent processing. (With the implementation as it is, processing of the same YAML file yields different ordering and hence different results in different languages, which kind of sucks.)

Regarding speed considerations: If implemented as suggested in this ticket, speed won't be an issue as using odict and/or the special OrderedLoader would be entirely optional.

Sincerely,
Zoran of hackadelic.com

comment:10 Changed 3 years ago by anonymous

Kirill, would you be willing to add an OrderedDictionary wiki page that collects recipes for how to do this? There are several alternatives of varying complexity that might be useful to the people (myself included) who have a real use case for ordered dictionaries in YAML.

Here are the links I've found for this. Please amend as you see fit:

(Of all of these the last comes closest to what I need, but there being different use cases, I think listing multiple working alternatives would be most helpful.)

comment:11 Changed 2 years ago by az@…

Ordering is very important when a language/text should be human-used. What comes in should come out, or else people think it's wrong. But it took 10 years for Python to eventually get there (and you still cannot keep the order of { 1:a, 2:b }). Btw, just an ordered dict is not enough, because duplicate keys would be lost in it (last wins). So one should be able to decide what a map should be.

So here's my implementation, both in and out, separate from pyyaml. It could be much easier if the data={} in the 2 related funcs were using self.dict() instead of {}...

#yaml_anydict.py
import yaml
from yaml.representer import Representer
from yaml.constructor import Constructor, MappingNode, ConstructorError

def dump_anydict_as_map( anydict):
    yaml.add_representer( anydict, _represent_dictorder)
def _represent_dictorder( self, data):
    return self.represent_mapping('tag:yaml.org,2002:map', data.items() )

class Loader_map_as_anydict( object):
    'inherit + Loader'
    anydict = None      #override
    @classmethod        #and call this
    def load_map_as_anydict( klas):
        yaml.add_constructor( 'tag:yaml.org,2002:map', klas.construct_yaml_map)

    'copied from constructor.BaseConstructor, replacing {} with self.anydict()'
    def construct_mapping(self, node, deep=False):
        if not isinstance(node, MappingNode):
            raise ConstructorError(None, None,
                    "expected a mapping node, but found %s" % node.id,
                    node.start_mark)
        mapping = self.anydict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError as exc:
                raise ConstructorError("while constructing a mapping", node.start_mark,
                        "found unacceptable key (%s)" % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping

    def construct_yaml_map( self, node):
        data = self.anydict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

'''usage: 
   ...dictOrder = whatever-dict-thing

   class Loader( yaml_anydict.Loader_map_as_anydict, yaml.Loader):
       anydict = dictOrder
   Loader.load_map_as_anydict()
   yaml_anydict.dump_anydict_as_map( dictOrder)
   ...
   p = yaml.load( a, Loader= Loader)
'''
