Ticket #156 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

libyaml fails to identify simple keys in very long files on 32-bit platforms

Reported by: ppelletier@…
Owned by: xi
Priority: normal
Component: libyaml
Severity: normal
Keywords:


I have a file that contains thousands of relatively short YAML documents, so the file is large (nearly a gigabyte) but the individual documents are not.

I can provide this file if necessary, but I'm not attaching it because it's so large (982M uncompressed, and still 45M when bzip2'ed).

I was getting this error:

Parser error: while parsing a block mapping at line 9259457, column 5
did not find expected key at line 9260367, column 5

This error occurs in both libyaml-0.1.2 and libyaml-0.1.3, but only on 32-bit machines (I tried Ubuntu 8.04 for x86-32, and Intel Mac OS X 10.5 with the compiler in 32-bit mode). If I parse the same file with libyaml on a 64-bit machine (e.g. Ubuntu 8.04 for x86-64), it parses successfully with no error.

I eventually tracked this problem down to an overflow in pointer arithmetic in yaml_parser_save_simple_key(), in yaml-0.1.3/src/scanner.c on line 1125. I changed this:

        simple_key.token_number = 
            parser->tokens_parsed + parser->tokens.tail - parser->tokens.head;

to this:

        simple_key.token_number = 
            parser->tokens_parsed + (parser->tokens.tail - parser->tokens.head);

which caused my file to be parsed successfully, even on 32-bit platforms. So, I would recommend adding this fix to libyaml-0.1.4. Thanks!
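
A note on why the parentheses matter: C groups `a + b - c` as `(a + b) - c`, so the original expression first computes `parser->tokens_parsed + parser->tokens.tail`. That advances a pointer by `tokens_parsed` elements, which can run past the end of a 32-bit address space before `parser->tokens.head` is subtracted. The sketch below models this with plain 32-bit integers rather than real pointers, since out-of-range pointer arithmetic is undefined behavior in C. The 40-byte element size, the addresses, and both function names are assumptions made for illustration; none of them are taken from libyaml.

```c
#include <stdint.h>

/* Assumed element size, standing in for sizeof(yaml_token_t) on a 32-bit build. */
#define ELEM_SIZE 40u

/* Models `(tokens_parsed + tail) - head` with 32-bit addresses.
 * The intermediate "pointer" wraps modulo 2^32 once
 * tokens_parsed * ELEM_SIZE no longer fits in 32 bits. */
static int32_t buggy_token_number(uint32_t tokens_parsed,
                                  uint32_t head_addr, uint32_t tail_addr)
{
    uint32_t offset = (uint32_t)(tokens_parsed * ELEM_SIZE); /* may wrap */
    uint32_t moved  = (uint32_t)(tail_addr + offset);        /* may wrap */
    /* Pointer subtraction yields a signed byte distance divided by the
     * element size; the cast assumes a two's-complement platform. */
    int32_t byte_diff = (int32_t)(uint32_t)(moved - head_addr);
    return byte_diff / (int32_t)ELEM_SIZE;
}

/* Models `tokens_parsed + (tail - head)`: the small queue length is
 * computed first, then added to the running count as an integer. */
static uint32_t fixed_token_number(uint32_t tokens_parsed,
                                   uint32_t head_addr, uint32_t tail_addr)
{
    uint32_t queued = (tail_addr - head_addr) / ELEM_SIZE;
    return tokens_parsed + queued;
}
```

With three tokens queued and a count of 1000, both functions return 1003. With 200,000,000 parsed tokens the byte offset exceeds 2^32, and only `fixed_token_number` returns the expected 200,000,003.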


ytest.c (1.3 KB) - added by ppelletier@… 4 years ago.
a program which demonstrates the bug

Change History

Changed 4 years ago by ppelletier@…

a program which demonstrates the bug

comment:1 Changed 4 years ago by ppelletier@…

I've attached a short program which demonstrates the bug. Sorry for not doing this earlier. The program fails on a 32-bit machine and succeeds on a 64-bit machine. I used gcc 4.2.4 on Ubuntu 8.04, but this problem shows up on any 32-bit machine: I've seen it on Linux and OS X using gcc, and on Windows using Visual Studio.

ppelletier@patrickpc:~/oblong$ gcc -Wall -I/opt/yobuild/include -L/opt/yobuild/lib -lyaml ytest.c
ppelletier@patrickpc:~/oblong$ ./a.out 
libYaml version 0.1.2 on a 32-bit machine
parser error 4
context: while parsing a block mapping at line 17895698
problem: did not find expected key at line 17895699

ppelletier@patrick64:~/misc$ gcc -Wall -I/opt/yobuild/include -L/opt/yobuild/lib64 -lyaml ytest.c
ppelletier@patrick64:~/misc$ ./a.out 
libYaml version 0.1.2 on a 64-bit machine

comment:2 Changed 4 years ago by xi

  • Status changed from new to closed
  • Resolution set to fixed

Thank you for the patch, applied in [371].

I'm afraid it could still overflow token_number on a very big file (say, >4G) on a 32-bit machine. I guess it should be considered a limitation, not a bug, but, perhaps, libyaml should be able to detect it and report a proper error?
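
One possible shape for such a check, as a rough sketch only: test whether the addition would exceed `SIZE_MAX` before performing it, and let the caller report an error if so. The helper name `checked_token_number` and its signature are hypothetical and are not part of libyaml; the counter that tracks parsed tokens would presumably need a similar guard where it is incremented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libyaml function.
 * Stores tokens_parsed + queued in *out and returns 1,
 * or returns 0 without touching *out if the sum would not fit in size_t. */
static int checked_token_number(size_t tokens_parsed, size_t queued,
                                size_t *out)
{
    if (queued > SIZE_MAX - tokens_parsed)
        return 0;               /* would overflow: caller reports an error */
    *out = tokens_parsed + queued;
    return 1;
}
```

A caller could treat a return value of 0 as a scanner error instead of continuing with a wrapped token number.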

