Skip to content

bpo-25643: Fix tokenizer error when raw decoding null bytes#25080

Merged
pablogsal merged 1 commit intopython:masterfrom
pablogsal:null_bytes
Mar 29, 2021
Merged

bpo-25643: Fix tokenizer error when raw decoding null bytes#25080
pablogsal merged 1 commit intopython:masterfrom
pablogsal:null_bytes

Conversation

@pablogsal
Copy link
Copy Markdown
Member

@pablogsal pablogsal commented Mar 29, 2021

@pablogsal
Copy link
Copy Markdown
Member Author

@pablogsal
Copy link
Copy Markdown
Member Author

❯ cat bad.py
0000
00000000000


Before:

cpython on  null_bytes [!?] via 🐍 v3.9.1 took 2s
❯ ./python bad.py
=================================================================
==29963==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6250000168ff at pc 0x55eb401ba788 bp 0x7fff8d1d60d0 sp 0x7fff8d1d60c0
READ of size 1 at 0x6250000168ff thread T0
    #0 0x55eb401ba787 in tok_readline_raw Parser/tokenizer.c:817
    #1 0x55eb401bb4b8 in tok_underflow_file Parser/tokenizer.c:947
    #2 0x55eb401bbc97 in tok_nextc Parser/tokenizer.c:1031
    #3 0x55eb401bd1c7 in tok_get Parser/tokenizer.c:1213
    #4 0x55eb401c1286 in PyTokenizer_Get Parser/tokenizer.c:1872
    #5 0x55eb402b09b2 in _PyPegen_fill_token Parser/pegen.c:633
    #6 0x55eb402b2578 in _PyPegen_expect_token Parser/pegen.c:832
    #7 0x55eb4030032d in _tmp_15_rule Parser/parser.c:19552
    #8 0x55eb402b2485 in _PyPegen_lookahead Parser/pegen.c:823
    #9 0x55eb402bf986 in compound_stmt_rule Parser/parser.c:2008
    #10 0x55eb402bd4fd in statement_rule Parser/parser.c:1365
    #11 0x55eb402ff7d2 in _loop1_11_rule Parser/parser.c:19322
    #12 0x55eb402bd342 in statements_rule Parser/parser.c:1324
    #13 0x55eb402bbd2f in file_rule Parser/parser.c:853
    #14 0x55eb40320c20 in _PyPegen_parse Parser/parser.c:29381
    #15 0x55eb402b4e71 in _PyPegen_run_parser Parser/pegen.c:1232
    #16 0x55eb402b5587 in _PyPegen_run_parser_from_file_pointer Parser/pegen.c:1316
    #17 0x55eb401b61d3 in _PyParser_ASTFromFile Parser/peg_api.c:26
    #18 0x55eb40051d55 in pyrun_file Python/pythonrun.c:1125
    #19 0x55eb4004f35a in _PyRun_SimpleFileObject Python/pythonrun.c:457
    #20 0x55eb4004e2bb in _PyRun_AnyFileObject Python/pythonrun.c:91
    #21 0x55eb3fdf17a2 in pymain_run_file_obj Modules/main.c:353
    #22 0x55eb3fdf1961 in pymain_run_file Modules/main.c:372
    #23 0x55eb3fdf2bc6 in pymain_run_python Modules/main.c:587
    #24 0x55eb3fdf2f07 in Py_RunMain Modules/main.c:666
    #25 0x55eb3fdf3282 in pymain_main Modules/main.c:696
    #26 0x55eb3fdf361c in Py_BytesMain Modules/main.c:720
    #27 0x55eb3fdef709 in main Programs/python.c:15
    #28 0x7efd0aa47bf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #29 0x55eb3fdef609 in _start (/home/pablogsal/github/cpython/python+0x14f609)

With this PR:

cpython on  null_bytes [!?] via 🐍 v3.9.1 took 2s
❯ ./python bad.py

@vstinner
Copy link
Copy Markdown
Member

This following patch fix Valgrind warnings of https://bugs.python.org/issue43662 for the command:

echo|PYTHONMALLOC=malloc valgrind ./python Tools/scripts/reindent.py 

Patch:

diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index d18fffa..9ea6af9 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -815,6 +815,7 @@ struct tok_state *
         }
         tok->inp = strchr(tok->inp, '\0');
     } while (tok->inp[-1] != '\n');
+    tok->end = tok->buf + strlen(tok->buf);
     return 1;
 }
 

@vstinner
Copy link
Copy Markdown
Member

Less intrusive patch fixing the Valgrind warnings:

diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index d18fffa..d8ccb44 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -963,10 +963,14 @@ struct tok_state *
         if (tok->lineno > 2) {
             tok->decoding_state = STATE_NORMAL;
         }
-        else if (!check_coding_spec(tok->cur, tok->end - tok->cur,
-                                    tok, fp_setreadl))
-        {
-            return 0;
+        else {
+            Py_ssize_t size = tok->end - tok->cur;
+            Py_ssize_t len = strnlen(tok->cur, size);
+            size = Py_MIN(size, len);
+            if (!check_coding_spec(tok->cur, size, tok, fp_setreadl))
+            {
+                return 0;
+            }
         }
     }
     /* The default encoding is UTF-8, so make sure we don't have any

@pablogsal
Copy link
Copy Markdown
Member Author

@vstinner I have updated the PR with your patch. This should fix both problems

@pablogsal pablogsal force-pushed the null_bytes branch 2 times, most recently from ad24cc9 to dc175c3 Compare March 29, 2021 22:55
@pablogsal
Copy link
Copy Markdown
Member Author

I have found the actual problem, this used to check the size using strlen:

if (!check_coding_spec(line, strlen(line), tok, fp_setreadl)) {

I have updated the PR with the fix.

@pablogsal pablogsal merged commit 92a02c1 into python:master Mar 29, 2021
@pablogsal pablogsal deleted the null_bytes branch March 29, 2021 23:24
@vstinner
Copy link
Copy Markdown
Member

I confirm that this change fixed https://bugs.python.org/issue43662 thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants