Ticket #128 (new defect)

Opened 5 years ago

Last modified 2 months ago

libyaml should detect duplicate keys and report an error.

Reported by: anonymous Owned by: xi
Priority: high Component: libyaml
Severity: major Keywords:


In the 1.1 spec I believe this is considered an error; currently libyaml silently accepts duplicate keys.


Change History

comment:1 Changed 5 years ago by anonymous

It is an error for two equal keys to appear in the same mapping node. In such a case the YAML processor may continue, ignoring the second key: value pair and issuing an appropriate warning. This strategy preserves a consistent information model for one-pass and random access applications.

I do not think the library should raise an exception in this case.

comment:2 Changed 5 years ago by rblazecka@…

I was just bit by this bug; I strongly believe that it should throw an exception when this scenario is encountered.

The YAML spec as written allows the parser to continue parsing the rest of the file, as highlighted in the comment above. However, this appears to be as a concession, not a rule (it "may" continue, not "must"). In the previous sentence, it unequivocally states that two equal keys is an error. In the case where it does pass over the error, it _must_ issue a warning. There is no case of silently ignoring the duplicate keys.

In all of the use cases that I have in my application, any invalid YAML should be an error. Thus why I'd like a straightforward exception that the data couldn't be parsed.

In this simple test case:

import yaml
print yaml.load('''
one: 1
one: 4

pyyaml loads the data without any warning at all. The output of this program is:

{'one': 4}

No warnings or errors that there were any problems with the data stream at all. Ideally perhaps the library should be configurable between the two modes, but if it clutters the interface too much, I'd like to have it just raise an exception.

(Note also that this output is incorrect when compared to what the spec says it should be doing, in that it should be ignoring the second value, not keeping it)

comment:3 Changed 5 years ago by anonymous

a) this really sucks, loading something that's considered "an error" and not exposing that to the user is broken. b) libyaml is returning the last instance of the key, not the first.

comment:4 Changed 4 years ago by sven@…

I do not know why this issue has not been addressed some months, but I have found a solution which fits to my needs. See the uDiff below. Was hard to find, that it was the constructor.yml file ... but...

---	Do Feb 18 18:10:45 2010
+++	Do Feb 18 18:12:29 2010
@@ -1,4 +1,3 @@
 __all__ = ['BaseConstructor', 'SafeConstructor', 'Constructor',
@@ -137,6 +136,12 @@
                 raise ConstructorError("while constructing a mapping", node.start_mark,
                         "found unacceptable key (%s)" % exc, key_node.start_mark)
             value = self.construct_object(value_node, deep=deep)
+            #print key
+            #print mapping
+            #print "-----------"
+            if key in mapping:
+                raise ConstructorError("while constructing a mapping", node.start_mark,
+                        "found already in-use key (%s)" % key, key_node.start_mark)
             mapping[key] = value
         return mapping

comment:5 Changed 4 years ago by sven.witterstein@…

  • Priority changed from normal to high
  • Severity changed from normal to major

A month later and no comment on my suggested patch. Sad. I think the importance and priority of this bug should be raised, which I do now (if I may). All that needs be done is to apply my patch and maybe add some config param such as --strict-key-check or the like...

comment:6 Changed 4 years ago by strombrg@…

I need this too. It could conceivably prevent me from being able to use PyYAML in my current project.

comment:7 Changed 4 years ago by rdesgroppes

+1. JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique, while YAML insists they “MUST” be. Technically, YAML therefore complies with the JSON spec, choosing to treat duplicates as an error. In practice, since JSON is silent on the semantics of such duplicates, the only portable JSON files are those with unique keys, which are therefore valid YAML files. (from

comment:8 Changed 3 years ago by amannijhawan@…

I think raising an error would be harsh and might break existing code for users, however raising a warning would be useful.

comment:9 Changed 3 years ago by 129a45fds36@…

comment:10 Changed 2 years ago by e.a.b.piel@…

Any advances on this matter? In my project I'd like to detect such erroneous construct but without pyyaml telling me anything I've no idea how to do that :-( I also understand from the spec that it should generate an error, but just a warning would be already very nice :-D

comment:11 Changed 9 months ago by ulu5

There may be use cases where a value would be overridden later in the stream. I think the best solution would be to have an optional argument to allow/disallow duplicates.

Signature would be:

yaml.load(stream, allow_duplicates=True)

comment:12 Changed 8 months ago by aelman

This would be quite useful for our use case as well. We'd like to create a git commit hook that prevents yaml files with duplicate keys from being committed. If an error/warning thrown or if we had an option for allowing duplicates, we could easily make our test.

comment:13 Changed 2 months ago by pinard@…

Hi, people. I was just hit by this problem and found this ticket by Googling around. I'm surprised to see that this problem has been pending, unresolved, for five years by now. I'm not familiar at all with libyaml maintenance. Likely that releases do not occur often?



Add a comment

Modify Ticket

Change Properties
<Author field>
as new
as The resolution will be set. Next status will be 'closed'
to The owner will be changed from xi. Next status will be 'new'
The owner will be changed from xi to anonymous. Next status will be 'assigned'

E-mail address and user name can be saved in the Preferences.

Note: See TracTickets for help on using tickets.