Version 5 (modified by py4fun@…, 5 years ago)

!!bool examples contain single characters

Bugs in YAML specification

Bugs in the Type Library

!!float: The regular expression matches a single dot ".".
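The match can be reproduced with a short check. The sketch assumes the base-10 branch of the !!float pattern as published in the YAML 1.1 type repository (quoted from memory, so verify it against the repository). STRICTER_FLOAT is a hypothetical tightening that only illustrates one possible fix.

```python
import re

# Base-10 branch of the !!float regular expression, as assumed from the
# YAML 1.1 type repository (verify against the published pattern).
FLOAT_BASE10 = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?")

# Hypothetical tightening: require at least one digit next to the dot.
STRICTER_FLOAT = re.compile(
    r"[-+]?(?:[0-9][0-9_]*\.[0-9_]*|\.[0-9][0-9_]*)(?:[eE][-+][0-9]+)?"
)

def is_float(pattern, text):
    """Return True when the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

for candidate in [".", "1.5", ".5", "1."]:
    print(repr(candidate),
          is_float(FLOAT_BASE10, candidate),
          is_float(STRICTER_FLOAT, candidate))
```

Only the lone dot is treated differently by the two patterns; the other three candidates are accepted by both.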

!!bool: Single characters y, Y, n, N are still present as values for bool

Bugs in Examples

Example 2.17. Hex: decimal instead of hex format

hexesc:  "\x13\x10 is \r\n"

Correct:

hexesc:  "\x0D\x0A is \r\n"
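The arithmetic behind the correction: carriage return and line feed are code points 13 and 10 in decimal, which are 0D and 0A in hexadecimal. A quick check in Python, whose \x escapes also take two hexadecimal digits:

```python
# Carriage return and line feed, by decimal code point.
cr, lf = ord("\r"), ord("\n")
print(cr, lf)

# The same two values written as hexadecimal escapes.
hex_form = f"\\x{cr:02X}\\x{lf:02X}"
print(hex_form)

# The corrected escape sequence denotes CR LF ...
assert "\x0D\x0A" == "\r\n"
# ... while the sequence printed in the example denotes two other control characters.
assert "\x13\x10" != "\r\n"
```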

Example 2.19. Integers: invalid comma in

decimal: +12,345

Correct:

decimal: +12_345

Example 2.20. Floating Point: invalid comma in

fixed: 1,230.15

Correct:

fixed: 1_230.15
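Both corrections follow from the type patterns, which accept _ but not , as a digit separator. A minimal check, assuming the base-10 branches of the !!int and !!float patterns from the YAML 1.1 type repository (quoted from memory, so verify them before relying on this):

```python
import re

# Base-10 branches only; assumed from the YAML 1.1 type repository.
INT_BASE10 = re.compile(r"[-+]?(0|[1-9][0-9_]*)")
FLOAT_BASE10 = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?")

def matches(pattern, text):
    """Return True when the whole scalar matches the pattern."""
    return pattern.fullmatch(text) is not None

for pattern, scalar in [
    (INT_BASE10, "+12,345"),
    (INT_BASE10, "+12_345"),
    (FLOAT_BASE10, "1,230.15"),
    (FLOAT_BASE10, "1_230.15"),
]:
    print(scalar, matches(pattern, scalar))
```

Under these patterns the comma forms fall through to plain strings, while the underscore forms resolve to numbers.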

Example 10.7. Flow Mapping Keys: missing comma in

{
?° : value # Empty key

and

simple key : value

Example 9.33. Final Empty Lines: extra !!seq [ in

---
!!seq [
  !!str "folded line\n\

The same bug in

  • Example 9.25. Literal Scalar,
  • Example 9.29. Folded Scalar,
  • Example 9.30. Folded Lines,
  • Example 9.31. Spaced Lines,
  • Example 9.32. Empty Separation Lines.

Example 8.1. Node Properties: a simple key should be limited to a single line:

!!str
 &a1
  "foo" : !!str bar

Example 8.15. Completely Empty Block Nodes: Either remove seq: from the left side, or add it to the right side.

The same example: extra comma in

  : bar,

Example 5.3. Block Structure Indicators:

  ? sky
  : blue
  ? sea : green

is interpreted as

  !!map {
    ? !!str "sky" : !!str "blue",
    ? !!str "sea" : !!str "green",
  }

Correct:

  !!map {
    ? !!str "sky" : !!str "blue",
    ? !!map { ? !!str "sea" : !!str "green" } : !!null "",
  }

Example 6.7. Empty Lines: invalid indicators in

!!seq {

Example 7.8. Tag Handles: missing comma in

  !<tag:yaml.org,2002:str> "string"

Example 8.1. Node Properties: different case of anchor and alias:

  ? &A1 !!str "foo"

and

  : *a1

Anchors are case sensitive, right?

Example 8.9. Flow Content: the sequence under the key "sequence" is invalid:

    ? !!str "sequence" : !!seq [
      ? !!str "entry",
      : !!map {
        ? !!str "key" : !!str "value"
    } ],

Correct:

    ? !!str "sequence" : !!seq [
      !!str "entry",
      !!map {
        ? !!str "key" : !!str "value"
    } ],

Example 8.9. Flow Content: extra : in

  ? !!str "collections": : !!map {

and in

    ? !!str "mapping": : !!map {

Example 8.13. Completely Empty Flow Nodes: extra comma in

  ? !!str "",
  : !!str "bar",

Example 8.15. Completely Empty Block Nodes: extra comma in

    ? !!str "",
    : !!str "bar",

Example 9.15. Document Marker Scalar Content: extra comma in

  ? !!str "...",
  : !!str "bar"

Example 9.23. Block Scalar Chomping: should be !!map { instead of !!seq [ in

!!seq [
  ? !!str "strip"
  : !!str "# text",

Example 9.24. Empty Scalar Chomping: should be !!map { instead of !!seq [ in

!!seq [
  ? !!str "strip"
  : !!str "",

Example 10.5. Block Sequence Entry Types: missing comma after ] in

    !!str "two",
  ]
  !!map {

Example 10.12. Block Mappings: extra comma in

    ? !!str "key",

Example 10.15. In-Line Block Mappings: Right side - { instead of [:

%YAML 1.1
---
!!seq {

Example 5.14. Escaped Characters: Right side - should be \x0C in the line

 \x22 \x07 \x08 \x1B \0C

Example 10.10. Flow Mapping Key: Value Pairs: The tag for empty scalars should be !!null, not !!str. Compare:

? explicit key3,     # Empty value

with

  ? !!str "explicit key3"
  : !!str "",

Correct:

  ? !!str "explicit key3"
  : !!null "",

The same problem in

  • Example 7.10. Documents,
  • Example 8.13. Completely Empty Flow Nodes,
  • Example 8.15. Completely Empty Block Nodes,
  • Example 10.5. Block Sequence Entry Types,
  • Example 10.7. Flow Mapping Keys,
  • Example 10.9. Flow Mapping Values,
  • Example 10.11. Single Pair Mappings,
  • Example 10.13. Explicit Block Mapping Entries,
  • Example 10.14. Simple Block Mapping Entries.

Example 9.12. Plain Characters: The left side has no comma before "and". Compare

- Up, up and away!

with

!!str "Up, up, and away!",

Correct:

- Up, up, and away!

Not bugs, but might be invalid in the future

Example 9.12. Plain Characters: the : character might be prohibited in plain scalars in the flow context:

# Outside flow collection:
- ::std::vector
- Up, up and away!
- -123
# Inside flow collection:
- [ ::std::vector,
  "Up, up and away!",
  -123 ]

Empty content might be prohibited in the flow context.

Example 8.12. Flow Nodes in Flow Context:

[
  Without properties,
  &anchor "Anchored",
  !!str 'Tagged',
  *anchor, # Alias node
  !!str,   # Empty plain scalar
]

Example 10.7. Flow Mapping Keys

{
?° : value # Empty key
? explicit
 key: value,
simple key : value
[ collection, simple, key ]: value
}

There is a problem when you allow constructions like

{ !!str, }

because URIs may include the , and : characters. Suppose that we use a Perl-specific tag like !!perl/ref/YAML::Parser in

{ !!perl/ref/YAML::Parser, }

How should it be interpreted? Is :: a delimiter or a part of the URI? The above examples can easily be rewritten as:

{
? ~ : value # Empty key
}
[
  *anchor, # Alias node
  '',   # Empty plain scalar
]

Anchors and Tags eat too much

It's easier to see this problem for Anchors. Problematic rules are:

c-ns-alias-node       ::= "*" ns-anchor-name
c-ns-anchor-property  ::= "&" ns-anchor-name
ns-anchor-name        ::= ns-char+

ns-char matches too much. This is especially bad in flow collections, where it can consume the delimiter comma.

The following example clearly shows the ambiguity:

[ &alias, value ]

It can be parsed either as ["", "value"] (the anchor alias on an empty scalar, followed by value) or as ["value"] (the anchor alias, on value).
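The two readings can be reproduced with a small sketch. It assumes ns-char can be approximated by "any non-space character" (regex \S); the restricted character class in the second pattern is a hypothetical stand-in for a narrower ns-anchor-name, not a rule taken from the spec.

```python
import re

# ns-anchor-name ::= ns-char+, approximating ns-char as any non-space character.
GREEDY_ANCHOR = re.compile(r"&(\S+)")

# Hypothetical restriction: the name may not contain flow indicators.
RESTRICTED_ANCHOR = re.compile(r"&([^\s,\[\]{}]+)")

def anchor_name(pattern, text):
    """Return the anchor name the pattern extracts from text, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

source = "[ &alias, value ]"
print(anchor_name(GREEDY_ANCHOR, source))      # name swallows the comma
print(anchor_name(RESTRICTED_ANCHOR, source))  # name stops before the comma
```

With the greedy pattern the comma becomes part of the name, so the anchor attaches to value. With the restricted pattern the comma stays a delimiter and the anchor attaches to an empty node.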

Compare it with an example from the spec:

Example 8.12. Flow Nodes in Flow Context:

[
  Without properties,
  &anchor "Anchored",
  !!str 'Tagged',
  *anchor, # Alias node
  !!str,   # Empty plain scalar
]

Solution: restrict ns-anchor-name.

The ns-uri-char definition allows commas, so Tags have the same problem. Unfortunately, you cannot simply forbid commas, because of tags like <tag:yaml.org,2002:str>.

A probable solution is to allow commas only between < and >. Another solution is to forbid empty plain scalars in flow context.
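The first of these options can be sketched as a tag tokenizer. The pattern below is hypothetical and uses simplified character classes (the real ns-uri-char set is larger): commas are accepted inside a verbatim tag !<...>, while in a shorthand tag a comma ends the token.

```python
import re

# Hypothetical tag token: a verbatim tag "!<...>", where commas are allowed,
# or a shorthand tag, where "," and other flow indicators end the tag.
TAG = re.compile(r"!<[^>\s]+>|![^\s,\[\]{}]*")

def tag_token(text):
    """Return the tag token at the start of text, or None."""
    match = TAG.match(text)
    return match.group(0) if match else None

print(tag_token("!<tag:yaml.org,2002:str> 'Tagged'"))
print(tag_token("!!str, next"))
print(tag_token("!!perl/ref/YAML::Parser, }"))
```

Under this rule the comma after !!str is a delimiter again, and the :: inside the Perl tag remains part of the tag.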

Another solution: require s-separate after Tags, Anchors, and Aliases. It's not really intuitive since it will force you to write [ *alias_, foo] instead of [ *alias, foo].

Now I think the best solution is to restrict ns-anchor-name to nb-plain-char-in+ and to forbid empty scalar content in flow collections. So you should write

[
  Without properties,
  &anchor "Anchored",
  !!str 'Tagged',
  *anchor, # Alias node
  !!str '',   # Empty plain scalar
]

Line break is required before block collections

According to the spec, a document must have at least one leading line break before the real content. This makes simple documents such as Example 2.1 invalid:

- Mark McGwire
- Sammy Sosa
- Ken Griffey

The relevant production rules:

l-implicit-document       ::= s-ignored-space*
                              ns-l+block-node(-1,block-in)
                              l-document-suffix?

ns-l+block-node(n,c)      ::= ns-l+block-in-block(n,c)
                            | ns-l+flow-in-block(n,c)

ns-l+block-in-block(n,c)  ::= ( c-ns-properties(n+1,c) s-separate(n+1,c) )?
                              c-l+block-content(n,c)

c-l+block-content(n,c)    ::= c-l+block-scalar(n)
                            | c-l-block-collection(>n,c)

c-l-block-collection(n,c) ::= c-l-block-sequence(n,c) | c-l-block-mapping(n)

c-l-block-sequence(n,c)   ::= c-l-comments l-block-seq-entry(n,c)+

The last rule is the one that requires a line break, since c-l-comments always ends with a line break.
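The effect of that rule can be imitated with a deliberately simplified model. In the sketch below, C_L_COMMENTS stands for "optional comment text followed by a mandatory line break", and a sequence entry is just "- " plus text. Both are assumptions that ignore indentation and most of the real grammar, and RELAXED is a hypothetical variant, not a proposed production.

```python
import re

# Simplified stand-ins for the productions (assumptions, not the real grammar).
C_L_COMMENTS = r"(?:[ \t]*(?:#[^\n]*)?\n)+"   # always ends with a line break
SEQ_ENTRY = r"- [^\n]*\n"

# c-l-block-sequence ::= c-l-comments l-block-seq-entry+
AS_SPECIFIED = re.compile(C_L_COMMENTS + f"(?:{SEQ_ENTRY})+")
# Hypothetical relaxation: make the leading comments optional.
RELAXED = re.compile(f"(?:{C_L_COMMENTS})?(?:{SEQ_ENTRY})+")

document = "- Mark McGwire\n- Sammy Sosa\n- Ken Griffey\n"

def accepts(pattern, text):
    """Return True when the pattern matches the whole text."""
    return pattern.fullmatch(text) is not None

print(accepts(AS_SPECIFIED, document))         # no leading line break
print(accepts(AS_SPECIFIED, "\n" + document))  # leading line break added
print(accepts(RELAXED, document))
```

In this model the document from Example 2.1 is rejected as written and accepted only once a leading line break is added, which mirrors the problem described above.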