Ticket #29 (reopened enhancement)

Opened 8 years ago

Last modified 5 months ago

Keeping mapping keys ordered

Reported by: edemaine@… Owned by: xi
Priority: normal Component: pyyaml
Severity: normal Keywords:
Cc: xi@…, edemaine@…, strombrg@…

Description

Would you be interested in adding the following kind of functionality to the public distribution of PyYAML?

>>> import yaml
>>> d = yaml.load('z: 1\ny: 2\nx: 3\n', Loader=yaml.order.OrderedLoader)
>>> d
yaml.order.odict([('z', 1), ('y', 2), ('x', 3)])
>>> for key, value in d.iteritems(): print key, value
z 1
y 2
x 3
>>> print yaml.dump(d, Dumper=yaml.order.OrderedDumper, default_flow_style=False),
z: 1
y: 2
x: 3
>>> s = yaml.dump(d, default_flow_style=False)
>>> print s,
!!omap
z: 1
y: 2
x: 3
>>> yaml.load(s)
yaml.order.odict([('z', 1), ('y', 2), ('x', 3)])

There are two things going on here:

  1. Add real !!omap functionality. When loading an !!omap object, create an odict object (defined by a new class that maintains a dictionary along with a key order), instead of the current behavior of creating a regular Python dictionary. Conversely, when dumping such an object, preserve the key order (don't sort), and output an !!omap directive. Both of these features seem quite desirable from a YAML standard point of view.

Perhaps, more generally, dumping could check for a special 'keys_in_order' attribute, in which case it follows the order of keys(), instead of sorting the keys as in the recent patch.

  2. Add a special yaml.order.OrderedLoader, which loads regular !!map values as if they were !!omap values, and yaml.order.OrderedDumper, which dumps odict types as regular !!map values (to avoid the ugly !!omap specifier).
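
For illustration, the second item could look roughly like the sketch below. It is only a sketch under stated assumptions: the class names OrderedLoader and OrderedDumper are hypothetical (PyYAML ships no yaml.order module), collections.OrderedDict stands in for the proposed odict, and merge keys (<<) are not handled.

```python
from collections import OrderedDict

import yaml


class OrderedLoader(yaml.SafeLoader):
    """Hypothetical loader that builds an OrderedDict for every plain mapping."""


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper that writes an OrderedDict as a plain mapping."""


def construct_ordered_map(loader, node):
    # construct_pairs returns (key, value) tuples in document order.
    return OrderedDict(loader.construct_pairs(node))


def represent_ordered_map(dumper, data):
    # An items view has no .items() of its own, so represent_mapping
    # iterates it as given instead of sorting the keys.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())


OrderedLoader.add_constructor('tag:yaml.org,2002:map', construct_ordered_map)
OrderedDumper.add_representer(OrderedDict, represent_ordered_map)

d = yaml.load('z: 1\ny: 2\nx: 3\n', Loader=OrderedLoader)
text = yaml.dump(d, Dumper=OrderedDumper, default_flow_style=False)
```

Registering on subclasses rather than on the stock loader and dumper keeps the behavior opt-in, which matches the "optional functionality" idea.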

Personally I would find this functionality very useful in many projects. It would enable a computer program to edit a human-written YAML file without messing up all the key orders, so that the computer output looks pretty similar to what the human had just before. I understand that YAML does not guarantee preservation of key order in a map type, or more precisely, it does not give any significance to the order in the absence of an !!omap or !!pairs type specification. But this is a practically useful feature in some cases, so it seems natural to provide it as optional functionality in yaml.order.

I'd be happy to write the code for all of this, because I need it myself. My question is whether you'd consider including it in the PyYAML distribution.

Change History

comment:1 Changed 8 years ago by m.mueller at ibgw-leipzig dot de

  • Cc xi@… added

I find YAML a very interesting format for input files of simulation models that typically need lots of data. I am just writing an input program in Python that reads YAML files. It was easy to implement some simple !File directive that reads data from other files either in YAML or in other formats. There are two reasons for this:

  1. Since the files can be rather large, it is very nice to be able to distribute input over as many files as desired. This can also be nested, i.e. a YAML file that is included that way can include another file etc. Only YAML files can include other files. Files in other formats cannot; they are dead ends.
  2. For data that fit very nicely in a tabular representation, other formats such as delimited text files (delimited by either spaces or semicolons), spreadsheet formats (EXCEL etc.), dBase files or other databases are often a good choice. They will be imported automatically into the Python data structures just like YAML itself.

I want the user to edit files as well as to read and write those files with my programs in between edits. Therefore, I would like mappings to be ordered. Otherwise the file would possibly be totally rearranged after processing, which is clearly something the user who edits the file would really dislike.

Furthermore, I plan to write a data-driven GUI for input files. It should find (nearly) all information for its appearance in the YAML file. One natural representation would be a tree view. Since the tree has an order I would need an order also for dictionaries.

To cut a long story short, I am really interested in an ordered mapping for YAML. I think both ways, (1) doing it explicitly in YAML with !!odict and (2) doing it in source code with yaml.order.OrderedLoader, should be supported.

I’m willing to contribute to the implementation but would certainly need some help to look at the right places from the beginning and to do it properly, i.e. in agreement with the overall design of PyYAML.

comment:2 Changed 8 years ago by cems at lanl dot gov

An ordered dict may be the wrong tool for the problem. The basic problem is not just a dict issue but keeping all the YAML entries in the same order they were in the original file at dump time. I can see how this could be addressed by an ordered dict; however, this may be overkill.

An ordered dict is often implemented in a slower and more costly manner than a conventional dict. In most cases, I believe, the desired behavior is simply to have the YAML object remember the order of the items in the original file.

The distinction being made here, if it is not clear, is that one can use a conventional dict for storing and accessing the data quickly in Python usage. Most of the time we will not care about the order in which keys are iterated, how they are stored, or the volatility of the key order if the hash is resized. The only time we actually care, I believe, is when one wants to rewrite the YAML file. Then one either wants to recreate the original order or specify a particular order.

Thus one can simply use a conventional dict, but also have some auxiliary storage to specify the ordering of the keys. This can be consulted when needed and ignored for speed when not needed. Moreover, the user can even edit the order, rather than have it solely determined by the input order.
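
A minimal sketch of that idea, assuming PyYAML 5.1 or later (for the sort_keys argument), Python 3.7 or later (for insertion-ordered dicts), and a top-level mapping with plain string keys. The helper names key_order and dump_in_order are made up for this example:

```python
import yaml


def key_order(text):
    """Return the top-level mapping keys of a YAML document in file order."""
    node = yaml.compose(text)  # node tree, before native objects are built
    return [key_node.value for key_node, _ in node.value]


def dump_in_order(data, order):
    """Dump a plain dict, writing the keys listed in `order` first."""
    ordered = {key: data[key] for key in order if key in data}
    # Keys added since loading go last, in the dict's own order.
    ordered.update((k, v) for k, v in data.items() if k not in ordered)
    return yaml.safe_dump(ordered, default_flow_style=False, sort_keys=False)


source = 'z: 1\ny: 2\nx: 3\n'
data = yaml.safe_load(source)   # ordinary dict for normal use
order = key_order(source)       # auxiliary storage, consulted only when dumping
data['y'] = 20
output = dump_in_order(data, order)
```

The order list is ordinary data, so a caller can rearrange it before dumping instead of being tied to the input order.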

comment:3 Changed 7 years ago by mundt@…

Just wanted to show interest in this topic, especially in reading YAML files without messing with the keys. I use YAML to configure a validation process and it would be really nice to be able to have a fixed order. At the moment I have an extra array just for the order. And that's just double code... who wants that!? :)

However, since nothing has changed for almost 5 months, I just want to check whether this topic is still in progress and whether there is any news.

thanks for the module anyways.

comment:4 Changed 7 years ago by xi

  • Status changed from new to closed
  • Resolution set to wontfix

I won't add this kind of functionality to the PyYAML core for two reasons:

  • It breaks the YAML specs. The spec clearly indicates that the key order is a representation detail and should not be used for constructing native objects.
  • I don't like the idea of PyYAML defining custom types for generated objects. I'd prefer PyYAML to generate only objects of the types defined in the standard library.

The implementation is nearly trivial though, so anyone who wants this feature regardless of what the spec says could implement it themselves:

>>> import yaml
>>> def omap_constructor(loader, node):
...     return loader.construct_pairs(node)
... 
>>> yaml.add_constructor(u'!omap', omap_constructor)
>>> yaml.load('!omap { C: 1, B: 2, A: 1 }')
[('C', 1), ('B', 2), ('A', 1)]

comment:5 follow-up: ↓ 7 Changed 4 years ago by strombrg@…

  • Cc xi@…, edemaine@…, strombrg@… added; xi@… removed
  • Status changed from closed to reopened
  • Resolution wontfix deleted

I think allowing order to be kept would be very useful - I have a project that needs it, in fact.

I don't agree that it's only when you're writing things back out that order matters - what if you need to store a chronological sequence of events? Or what if you need things to be sorted after each small modification to the resulting dictionary? A repeated sort of keys can be pretty expensive, despite CPython's rather stellar sort algorithm.

I believe it would make the most sense to pass an optional dictionary-like-object constructor to yaml.load - it would default to a traditional dictionary, but could also be an OrderedDict (http://docs.python.org/dev/library/collections.html#collections.OrderedDict) or treap (http://stromberg.dnsalias.org/~strombrg/treap/) or whatever.
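
To make the proposal concrete, here is a rough sketch of such an entry point. The function load_with_mapping is hypothetical and not part of PyYAML; it assumes only that the supplied type can be called with no arguments and supports item assignment, and it ignores merge keys (<<).

```python
from collections import OrderedDict

import yaml


def load_with_mapping(stream, mapping_type=dict):
    """Hypothetical helper: load YAML, building every mapping with mapping_type."""

    class _Loader(yaml.SafeLoader):
        pass

    def construct_map(loader, node):
        mapping = mapping_type()
        # construct_pairs yields (key, value) tuples in document order.
        for key, value in loader.construct_pairs(node):
            mapping[key] = value
        return mapping

    # Register on the private subclass so the stock loaders are untouched.
    _Loader.add_constructor('tag:yaml.org,2002:map', construct_map)
    return yaml.load(stream, Loader=_Loader)


result = load_with_mapping('b: 1\na: {d: 2, c: 3}\n', OrderedDict)
```

Anything dictionary-like (an OrderedDict, a treap, a sorted mapping) could be passed the same way.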

As long as the documentation explains clearly that this is a violation of the spec, then people know what to expect when switching implementations. Consider the DB-API - there are plenty of extensions to it added by the various implementations of the API, and there's actually a list of common extensions in the documentation in an effort to unify them a bit.

comment:6 Changed 4 years ago by anonymous

On the growing popularity of ordered dictionaries across many languages:  http://www.mail-archive.com/python-list@python.org/msg58631.html

There is one included with CPython 2.7 and up, and there exists one that's pretty compatible available for 2.4 through 2.6:  http://pypi.python.org/pypi/ordereddict/1.1.

And I can't help but feel that someone might want to pass in a treap someday.

comment:7 in reply to: ↑ 5 Changed 4 years ago by anonymous

Replying to strombrg@gmail.com:

I believe it would make the most sense to pass an optional dictionary-like-object constructor to yaml.load - it would default to a traditional dictionary, but could also be an OrderedDict (http://docs.python.org/dev/library/collections.html#collections.OrderedDict) or treap (http://stromberg.dnsalias.org/~strombrg/treap/) or whatever.

+1 for this. I think that adding the possibility to change the default behavior for dictionaries would be a very welcome feature.

comment:8 Changed 4 years ago by Aphex

I would greatly appreciate a way to do this. I love the simplicity of the YAML syntax and would like to integrate it into my control scheme program; however, the control sequences must be ordered by definition. Does anyone have a modified version of PyYAML that reads and dumps mappings as OrderedDicts?

comment:9 Changed 4 years ago by anonymous

I, too, want to express strong interest in this feature. In contrast to others' experiences, in nearly all cases where I used YAML it was crucial to preserve the ordering as specified in the YAML text. I always had to resort to !!omap, but an !!odict would definitely be a more elegant alternative.

It would also ensure behavior equivalent to PHP, where dictionaries are always ordered, and provide for language-independent processing. (With the implementation as it is, processing of the same YAML file yields different ordering and hence different results in different languages, which kind of sucks.)

Regarding speed considerations: if implemented as suggested in this ticket, speed won't be an issue, as using odict and/or the special OrderedLoader would be entirely optional.

Sincerely,
Zoran of  hackadelic.com

comment:10 Changed 4 years ago by anonymous

Kirill, would you be willing to add an OrderedDictionary wiki page that collects recipes for how to do this? There are several alternatives of varying complexity that might be useful to the people (myself included) who have a real use case for ordered dictionaries in YAML.

Here are the links I've found for this. Please amend as you see fit:

(Of all of these the last comes closest to what I need, but there being different use cases, I think listing multiple working alternatives would be most helpful.)

comment:11 Changed 3 years ago by az@…

ordering is very important when a language/text should be human-used. What comes in should come out, or else people think it's wrong. But it took 10 years for python to eventually get there (and you still cannot keep the order of { 1:a, 2:b }). btw just an ordered dict is not enough, because duplicate keys would be lost in it (last wins). So one should be able to decide what a map should be.

so here's my implementation, both in and out, separate from pyyaml. It could be much easier if the data={} in the 2 related funcs were using self.dict() instead of {}...
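
a quick sketch of the duplicate-key point, assuming a current PyYAML (which accepts repeated keys without complaint). PairsLoader is a made-up name for this example:

```python
import yaml

text = 'a: 1\nb: 2\na: 3\n'

# A dict-like result keeps only the last value for the repeated key.
as_dict = yaml.safe_load(text)


class PairsLoader(yaml.SafeLoader):
    """Hypothetical loader that returns mappings as lists of (key, value) pairs."""


PairsLoader.add_constructor(
    'tag:yaml.org,2002:map',
    lambda loader, node: loader.construct_pairs(node),
)

# A list of pairs keeps both the order and the repeated key.
as_pairs = yaml.load(text, Loader=PairsLoader)
```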

#yaml_anydict.py
import yaml
from yaml.representer import Representer
from yaml.constructor import Constructor, MappingNode, ConstructorError

def dump_anydict_as_map( anydict):
    yaml.add_representer( anydict, _represent_dictorder)
def _represent_dictorder( self, data):
    return self.represent_mapping('tag:yaml.org,2002:map', data.items() )

class Loader_map_as_anydict( object):
    'inherit + Loader'
    anydict = None      #override
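    @classmethod        #and call this
    def load_map_as_anydict( klas):
        #register on klas only, so the stock yaml loaders (which have no anydict) keep building plain dicts
        klas.add_constructor( 'tag:yaml.org,2002:map', klas.construct_yaml_map)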

    'copied from constructor.BaseConstructor, replacing {} with self.anydict()'
    def construct_mapping(self, node, deep=False):
        if not isinstance(node, MappingNode):
            raise ConstructorError(None, None,
                    "expected a mapping node, but found %s" % node.id,
                    node.start_mark)
        mapping = self.anydict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError as exc:
                raise ConstructorError("while constructing a mapping", node.start_mark,
                        "found unacceptable key (%s)" % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping

    def construct_yaml_map( self, node):
        data = self.anydict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

'''usage: 
   ...dictOrder = whatever-dict-thing

   class Loader( yaml_anydict.Loader_map_as_anydict, yaml.Loader):
       anydict = dictOrder
   Loader.load_map_as_anydict()
   yaml_anydict.dump_anydict_as_map( dictOrder)
   ...
   p = yaml.load( a, Loader= Loader)
'''

comment:12 Changed 8 months ago by maskodok <galihadiputro87@…>

The only thing more I could hope for is documentation of all these features (other than reading through the code). Is this in process? Can I help?

comment:13 Changed 5 months ago by liwa <dirosie46@…>

The second issue is that the emitter escapes non-ASCII characters even when all characters are printable (according to 'c-printable' in the YAML spec) when using an encoding (UTF8) that supports such characters. I don't find this as elegant as could be. Instead of the "Fran\xE7ais" output above, I would have hoped for the UTF8-encoded byte string Fran\xc3\xa7ais\n.
