Fix stack overflow on deeply-nested JSON in json.loads() #7632
json.loads() on a deeply-nested array or object payload (e.g.
'[' * 50000 + ']' * 50000) overflowed the native Rust stack and
crashed the interpreter process with SIGSEGV. CPython raises
RecursionError on the same input via _Py_EnterRecursiveCall in
Modules/_json.c.
The recursion lives in the mutual call chain:
JsonScanner::parse_object / parse_array
-> JsonScanner::call_scan_once
-> JsonScanner::parse_object / parse_array
Every descent funnels through call_scan_once, so wrapping its body
with vm.with_recursion covers both '{' and '[' paths (and their
mixed nesting) with a single guard.
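A minimal Python model of this guard placement, offered as a sketch and not as the actual Rust change. DepthGuard is a hypothetical stand-in for vm.with_recursion, ToyScanner and its limit of 50 are illustrative, and the toy grammar assumes no whitespace and only integer leaves:

```python
class DepthGuard:
    """Hypothetical stand-in for vm.with_recursion: counts nested entries."""

    def __init__(self, limit):
        self.limit = limit
        self.depth = 0

    def with_recursion(self, where, fn):
        if self.depth >= self.limit:
            raise RecursionError(f"maximum recursion depth exceeded {where}")
        self.depth += 1
        try:
            return fn()
        finally:
            self.depth -= 1  # always unwind, even when fn() raises


class ToyScanner:
    """Toy parser: nested [] and {"k": v} with integer leaves, no whitespace."""

    def __init__(self, guard):
        self.guard = guard

    def call_scan_once(self, s, i):
        # Single choke point: every descent into '[' or '{' passes through here.
        return self.guard.with_recursion(
            "while decoding a JSON object from a string",
            lambda: self._scan(s, i),
        )

    def _scan(self, s, i):
        if s[i] == "[":
            return self.parse_array(s, i + 1)
        if s[i] == "{":
            return self.parse_object(s, i + 1)
        j = i
        while j < len(s) and s[j].isdigit():
            j += 1
        return int(s[i:j]), j

    def parse_array(self, s, i):
        items = []
        if s[i] == "]":
            return items, i + 1
        while True:
            value, i = self.call_scan_once(s, i)
            items.append(value)
            if s[i] == "]":
                return items, i + 1
            i += 1  # skip ','

    def parse_object(self, s, i):
        obj = {}
        if s[i] == "}":
            return obj, i + 1
        while True:
            end = s.index('"', i + 1)  # closing quote of the key
            value, i = self.call_scan_once(s, end + 2)  # skip '":'
            obj[s[i_key_start(s, end):end]] = value
            if s[i] == "}":
                return obj, i + 1
            i += 1  # skip ','


def i_key_start(s, end):
    """Return the index just after the opening quote of the key ending at `end`."""
    return s.rindex('"', 0, end) + 1


scanner = ToyScanner(DepthGuard(limit=50))
print(scanner.call_scan_once('[1,[2,{"a":3}]]', 0)[0])
try:
    scanner.call_scan_once("[" * 50000 + "]" * 50000, 0)
except RecursionError as exc:
    print(exc)
```

Because neither parse_array nor parse_object carries its own check, the model shows why one guard at the choke point is enough for array, object, and mixed nesting.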
Before:
./rustpython -c "import json; json.loads('[' * 50000 + ']' * 50000)"
-> SIGSEGV (exit 139)
After:
-> RecursionError: maximum recursion depth exceeded while
decoding a JSON object from a string
Verified:
- extra_tests/snippets/stdlib_json.py: all assertions pass
(includes 3 new regression cases: array, object, alternating
nesting at depth 100000)
- cargo run -- -m test test_json: 214 passed, 0 regressed
(9 skipped, 13 expected failures, all pre-existing)
- depth 500000 no longer crashes (RecursionError)
- shallow parsing unchanged
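For illustration, the three regression cases could look roughly like the following. This is a sketch built from the payload shapes named in this PR, not the actual contents of extra_tests/snippets/stdlib_json.py; the helper name outcome and the dictionary layout are assumptions:

```python
import json

DEPTH = 100_000

# Payload shapes taken from the PR text: array, object, alternating nesting.
deep_payloads = {
    "array": "[" * DEPTH + "]" * DEPTH,
    "object": '{"a":' * DEPTH + "1" + "}" * DEPTH,
    "alternating": '[{"x":' * DEPTH + "1" + "}]" * DEPTH,
}


def outcome(payload):
    """Return 'RecursionError' if decoding is refused, else 'parsed'."""
    try:
        json.loads(payload)
    except RecursionError:
        return "RecursionError"
    return "parsed"


results = {name: outcome(p) for name, p in deep_payloads.items()}

# Shallow input must keep parsing as before.
shallow = json.loads("[1, 2, [3, [4, [5]]]]")

print(results)
print(shallow)
```

The point of each case is that the interpreter survives and reports a catchable exception; a SIGSEGV would end the process before any result is recorded.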
ShaharNaveh
left a comment
tysm!
Can you please check if
now passes?
Per @ShaharNaveh's review on RustPython#7632: this test was previously marked `@unittest.skip("TODO: RUSTPYTHON; crashes")` because json.loads would SIGSEGV on the 500_000-deep input. The recursion guard added in this PR makes it raise RecursionError like CPython, so the skip decorator can be removed.
$ cargo run -- -m unittest \
    test.test_json.test_recursion.TestCRecursion.test_highly_nested_objects_decoding \
    test.test_json.test_recursion.TestPyRecursion.test_highly_nested_objects_decoding
...
Ran 2 tests in 0.825s
OK
$ cargo run -- -m test test_json
Ran 214 tests (7 skipped, 13 expected failures); all pass.
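The two test variants exercise the C-accelerated and the pure-Python scanner. A rough standalone approximation of that check, which is not the real test from test_recursion, might look like this. It assumes that assigning a scanner to JSONDecoder.scan_once is an acceptable way to select the variant; decoder_with and raises_recursion_error are hypothetical helper names:

```python
import json
import json.scanner

VERY_DEEP = 500_000
payload = '{"a":' * VERY_DEEP + "1" + "}" * VERY_DEEP


def decoder_with(make_scanner):
    """Build a JSONDecoder that uses the given scanner factory."""
    dec = json.JSONDecoder()
    dec.scan_once = make_scanner(dec)
    return dec


def raises_recursion_error(dec):
    try:
        dec.decode(payload)
    except RecursionError:
        return True
    return False


variants = {"py": json.scanner.py_make_scanner}
# c_make_scanner is None when the accelerator module is unavailable.
if json.scanner.c_make_scanner is not None:
    variants["c"] = json.scanner.c_make_scanner

results = {
    name: raises_recursion_error(decoder_with(make))
    for name, make in variants.items()
}
print(results)
```

Both entries being True corresponds to the behaviour the unskipped test expects: a RecursionError from either scanner, with no crash.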
Thanks! Yes, it passes now. Dropped the skip in 566a197. Both variants are green and the full test_json run is clean. Thanks for catching that 😊
Summary
json.loads() on a deeply-nested array or object overflows the native Rust stack and crashes the interpreter process with SIGSEGV. CPython raises RecursionError on the same input.
Root cause
The scanner has a mutual-recursion chain in crates/stdlib/src/json.rs:
JsonScanner::parse_object / parse_array
-> JsonScanner::call_scan_once
-> JsonScanner::parse_object / parse_array
call_scan_once is the single choke point every descent passes through, but it wasn't wrapped in a recursion guard. Each nesting level consumes a pair of Rust stack frames (parse_X + call_scan_once), so input depth ~45k exhausts the 8 MB main thread stack on macOS and the OS kills the process with SIGSEGV.
CPython's Modules/_json.c handles this with _Py_EnterRecursiveCall(" while decoding a JSON object from a string"), which translates native recursion into RecursionError at sys.getrecursionlimit().
Fix
Wrap the body of call_scan_once with vm.with_recursion("while decoding a JSON object from a string", || { ... }). This is the same machinery RustPython already uses for comparison, __repr__, __subclasscheck__, and AST traversal (see protocol/object.rs and stdlib/_ast/node.rs).
The guard is placed at call_scan_once rather than on parse_object / parse_array individually because every recursive descent funnels through call_scan_once: one wrap covers array, object, and alternating nesting with a single point of maintenance.
Verification
Tested surfaces
- '[' * 50000 + ']' * 50000
- '[' * 500000 + ']' * 500000
- '{"a":' * 100000 + '1' + '}' * 100000
- ('[{"x":' * 100000) + '1' + ('}]' * 100000)
- '[1, 2, [3, [4, [5]]]]'
- '{"a": {"b": {"c": 1}}}'
CPython parity note
After the fix, RustPython raises RecursionError at JSON depth >= ~1000 (VM default recursion_limit). CPython raises it at depth >= ~10000 due to a different stack-headroom heuristic in _Py_EnterRecursiveCall. Both refuse to crash; the exact threshold is tunable via sys.setrecursionlimit(). Real-world JSON is rarely deeper than 50 levels, so the difference is not user-visible in practice.
Scope
- call_scan_once: covers all three nesting patterns above
- scan_once Python-callback fallback path (already counted by VM frame machinery)
Related
- vm.with_recursion pattern: Fix segfault on cyclic or deeply-nested AST in compile() #7630 (cyclic AST in compile())
- Modules/_json.c scanner_call / _scan_once uses _Py_EnterRecursiveCall
Summary by CodeRabbit
Bug Fixes
- Raises RecursionError when decoding extremely deeply nested JSON structures (arrays, objects exceeding ~100,000 nesting levels) instead of causing potential stack overflow crashes.
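To compare the nesting thresholds discussed in the CPython parity note across interpreters, a small probe like the following could be run under each one. It is a sketch that assumes acceptance is monotonic in depth and that depth 500_000 is always refused; parses and max_nesting are hypothetical helper names, and the measured number depends on the interpreter, its version, and the call depth at which the probe runs:

```python
import json


def parses(depth):
    """True if a depth-level nested array decodes without RecursionError."""
    try:
        json.loads("[" * depth + "]" * depth)
    except RecursionError:
        return False
    return True


def max_nesting(lo=1, hi=500_000):
    """Binary-search the largest accepted depth in [lo, hi).

    Assumes parses(lo) is True, parses(hi) is False, and that acceptance
    is monotonic in between.
    """
    if not parses(lo) or parses(hi):
        raise ValueError("search bounds do not bracket the threshold")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parses(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print("deepest array accepted:", max_nesting())
```

Running the same file with ./rustpython and with python3 gives a direct, if approximate, view of how far apart the two limits are on a given machine.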