Fix stack overflow on deeply-nested JSON in json.loads() #7632

Merged: youknowone merged 2 commits into RustPython:main from changjoon-park:fix-json-stack-overflow on Apr 20, 2026

@changjoon-park (Contributor) commented Apr 19, 2026

Summary

json.loads() on a deeply-nested array or object overflows the native Rust stack and crashes the interpreter process with SIGSEGV. CPython raises RecursionError on the same input.

import json
json.loads('[' * 50000 + ']' * 50000)
# RustPython (before): SIGSEGV (exit 139)
# CPython:             RecursionError: maximum recursion depth exceeded
# RustPython (after):  RecursionError: maximum recursion depth exceeded while
#                         decoding a JSON object from a string

Root cause

The scanner has a mutual-recursion chain in crates/stdlib/src/json.rs:

JsonScanner::parse_object / parse_array
  -> JsonScanner::call_scan_once
    -> JsonScanner::parse_object / parse_array  (recurse)

call_scan_once is the single choke point every descent passes through, but
it wasn't wrapped in a recursion guard. Each nesting level consumes a pair
of Rust stack frames (parse_X + call_scan_once), so input depth ~45k
exhausts the 8 MB main thread stack on macOS and the OS kills the process
with SIGSEGV.

CPython's Modules/_json.c handles this with _Py_EnterRecursiveCall(" while decoding a JSON object from a string"), which translates native recursion into RecursionError at sys.getrecursionlimit().

Fix

Wrap the body of call_scan_once with vm.with_recursion("while decoding a JSON object from a string", || { ... }). This is the same machinery RustPython already uses for comparison, __repr__, __subclasscheck__, and AST traversal (see protocol/object.rs and stdlib/_ast/node.rs).

The guard is placed at call_scan_once rather than on parse_object / parse_array individually because every recursive descent funnels through call_scan_once — one wrap covers array, object, and alternating nesting with a single point of maintenance.
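
The choke-point idea can be sketched in plain Python. This is a toy model, not the RustPython code: the class `NestedScanner`, its `parse_container` method, its `with_recursion` helper, and the limit of 100 are all hypothetical, and the scanner only understands bare brackets such as `[[{}]]`. It only illustrates why one guard on the dispatcher covers array, object, and mixed nesting.

```python
from contextlib import contextmanager


class NestedScanner:
    """Toy scanner for bracket-only input such as '[[{}]]' (not real JSON)."""

    def __init__(self, text, limit=100):
        self.text = text
        self.limit = limit  # hypothetical stand-in for the VM recursion limit
        self.depth = 0

    @contextmanager
    def with_recursion(self, where):
        # Illustrative guard: refuse to descend past the limit, and always
        # restore the depth counter on the way out, even on error.
        if self.depth >= self.limit:
            raise RecursionError(f"maximum recursion depth exceeded {where}")
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1

    def call_scan_once(self, idx):
        # Single choke point: every nested value is dispatched from here,
        # so one guard covers arrays, objects, and alternating nesting.
        with self.with_recursion("while decoding a JSON object from a string"):
            ch = self.text[idx]
            if ch == "[":
                return self.parse_container(idx + 1, "]")
            if ch == "{":
                return self.parse_container(idx + 1, "}")
            raise ValueError(f"unexpected character {ch!r} at index {idx}")

    def parse_container(self, idx, closer):
        # Returns (nesting depth of this container, index just past its closer).
        deepest = 0
        while self.text[idx] != closer:
            child_depth, idx = self.call_scan_once(idx)
            deepest = max(deepest, child_depth)
        return deepest + 1, idx + 1
```

With the limit at 100, `NestedScanner("[" * 100 + "]" * 100).call_scan_once(0)` succeeds, while one more level raises RecursionError from the guard. The toy limit is kept small so that the guard trips before the host interpreter's own recursion limit does.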

Verification

$ cargo +1.94.0 build --release
   Finished `release` profile [optimized] target(s) in 41s

$ ./target/release/rustpython -c "import json; json.loads('[' * 50000 + ']' * 50000)"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded while decoding a JSON object from a string

$ ./target/release/rustpython extra_tests/snippets/stdlib_json.py
(passes silently — includes 3 new regression cases)

$ ./target/release/rustpython -m test test_json
Ran 214 tests in 6.700s
OK (skipped=9, expected failures=13)
Result: SUCCESS

Tested surfaces

| Input | Before | After |
| --- | --- | --- |
| `'[' * 50000 + ']' * 50000` | SIGSEGV | RecursionError |
| `'[' * 500000 + ']' * 500000` | SIGSEGV | RecursionError |
| `'{"a":' * 100000 + '1' + '}' * 100000` | SIGSEGV | RecursionError |
| `('[{"x":' * 100000) + '1' + ('}]' * 100000)` | SIGSEGV | RecursionError |
| `'[1, 2, [3, [4, [5]]]]'` | OK | OK (no regression) |
| `'{"a": {"b": {"c": 1}}}'` | OK | OK (no regression) |
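
The three deep-nesting inputs above can be generated from a single depth parameter. The helpers below are a minimal sketch with hypothetical names (`nested_payloads`, `raises_recursion_error`). They are not the assertions added to extra_tests/snippets/stdlib_json.py.

```python
import json


def nested_payloads(depth):
    """Build the array, object, and alternating nesting patterns at `depth`."""
    return {
        "array": "[" * depth + "]" * depth,
        "object": '{"a":' * depth + "1" + "}" * depth,
        "alternating": '[{"x":' * depth + "1" + "}]" * depth,
    }


def raises_recursion_error(payload):
    """Return True if json.loads() rejects the payload with RecursionError."""
    try:
        json.loads(payload)
    except RecursionError:
        return True
    return False
```

On a build that includes the fix, `all(raises_recursion_error(p) for p in nested_payloads(100_000).values())` should hold according to the table. On a build without the guard, the same call is expected to crash the process rather than return, so it is best run in a subprocess.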

CPython parity note

After the fix, RustPython raises RecursionError at a JSON depth of roughly 1000 (the VM's default recursion_limit). CPython raises it at a depth of roughly 10000 because _Py_EnterRecursiveCall uses a different stack-headroom heuristic. Both refuse to crash, and the exact threshold is tunable via sys.setrecursionlimit(). Real-world JSON is rarely deeper than 50 levels, so the difference is not user-visible in practice.
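
To see where a given interpreter draws the line, the threshold can be measured rather than assumed. The probe below is a sketch under two assumptions: acceptance is monotonic in depth, and the interpreter raises RecursionError rather than crashing. The function name `deepest_accepted` and its default bounds are hypothetical.

```python
import json


def deepest_accepted(lo=1, hi=200_000):
    """Largest array nesting depth in [lo, hi] that json.loads() accepts.

    Assumes depth `lo` parses and that acceptance is monotonic in depth.
    """

    def accepts(depth):
        try:
            json.loads("[" * depth + "]" * depth)
        except RecursionError:
            return False
        return True

    # Binary search for the last depth that still parses.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if accepts(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Calling it before and after `sys.setrecursionlimit()` shows whether the JSON threshold follows the configured limit on that interpreter. It should not be run on a build without the guard, since the probe itself would trigger the crash.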

Scope

  • In: recursion guard on call_scan_once — covers all three nesting patterns above
  • Out: encoder side (iterative, already safe), string/number parsing (non-recursive), scan_once Python-callback fallback path (already counted by VM frame machinery)

Related

Summary by CodeRabbit

  • Bug Fixes

    • JSON parser now raises RecursionError when decoding extremely deeply nested JSON structures (arrays and objects; regression-tested at 100,000 nesting levels) instead of crashing with a native stack overflow.
  • Tests

    • Added regression tests for JSON parsing with extremely deeply nested arrays, objects, and alternating nesting patterns.

json.loads() on a deeply-nested array or object payload (e.g.
'[' * 50000 + ']' * 50000) overflowed the native Rust stack and
crashed the interpreter process with SIGSEGV. CPython raises
RecursionError on the same input via _Py_EnterRecursiveCall in
Modules/_json.c.

The recursion lives in the mutual call chain:
  JsonScanner::parse_object / parse_array
    -> JsonScanner::call_scan_once
      -> JsonScanner::parse_object / parse_array

Every descent funnels through call_scan_once, so wrapping its body
with vm.with_recursion covers both '{' and '[' paths (and their
mixed nesting) with a single guard.

Before:
  ./rustpython -c "import json; json.loads('[' * 50000 + ']' * 50000)"
    -> SIGSEGV (exit 139)

After:
  -> RecursionError: maximum recursion depth exceeded while
     decoding a JSON object from a string

Verified:
  - extra_tests/snippets/stdlib_json.py: all assertions pass
    (includes 3 new regression cases: array, object, alternating
    nesting at depth 100000)
  - cargo run -- -m test test_json: 214 passed, 0 regressed
    (9 skipped, 13 expected failures, all pre-existing)
  - depth 500000 no longer crashes (RecursionError)
  - shallow parsing unchanged

@coderabbitai (Bot, Contributor) commented Apr 19, 2026

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • Lib/test/test_json/test_recursion.py is excluded by !Lib/**


Walkthrough

The changes introduce recursion depth guards to the JSON parser by wrapping the dispatch and parsing logic in vm.with_recursion(). This converts native-stack overflows on deeply nested JSON inputs into Python-level RecursionError exceptions, while retaining existing parsing behavior and match arms. Regression tests validate the new error-handling behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| JSON Parser Recursion Guard (crates/stdlib/src/json.rs) | Wraps entire JSON decoding dispatch and parse logic in vm.with_recursion() context, converting unbounded stack recursion to VM-bounded recursion with explicit error handling for deeply nested structures. |
| JSON Recursion Regression Tests (extra_tests/snippets/stdlib_json.py) | Adds test assertions verifying that json.loads() raises RecursionError on extremely deeply nested JSON inputs (100,000 levels), replacing expectation of native stack overflow behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • RustPython/RustPython#7630: Introduces vm.with_recursion guards around recursive descent in AST deserialization to convert native-stack overflows into Python-level RecursionErrors, mirroring the same pattern applied here for JSON parsing.

Suggested reviewers

  • youknowone


Pre-merge checks: 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately summarizes the main change: wrapping JsonScanner::call_scan_once with a recursion guard to prevent stack overflow on deeply-nested JSON inputs in json.loads(). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@ShaharNaveh (Contributor) left a comment

tysm!

Can you please check if

def test_highly_nested_objects_decoding(self):

now passes?

Per @ShaharNaveh's review on RustPython#7632: this test was previously marked
`@unittest.skip("TODO: RUSTPYTHON; crashes")` because json.loads
would SIGSEGV on the 500_000-deep input. The recursion-guard added
in this PR makes it raise RecursionError like CPython, so the skip
decorator can be removed.

  $ cargo run -- -m unittest \
        test.test_json.test_recursion.TestCRecursion.test_highly_nested_objects_decoding \
        test.test_json.test_recursion.TestPyRecursion.test_highly_nested_objects_decoding
  ...
  Ran 2 tests in 0.825s
  OK

  $ cargo run -- -m test test_json
  Ran 214 tests (7 skipped, 13 expected failures) — all pass.

@github-actions (Contributor)

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] lib: cpython/Lib/json
[ ] test: cpython/Lib/test/test_json (TODO: 13)

dependencies:

  • json (native: _json, decoder, encoder, json.tool, sys)
    • _colorize, argparse, codecs, re

dependent tests: (10 tests)

  • json: test_logging test_plistlib test_subprocess test_sysconfig test_tomllib test_tools test_traceback test_zoneinfo
    • importlib.metadata: test_importlib
    • multiprocessing.resource_tracker: test_concurrent_futures

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@changjoon-park (Contributor, Author) commented Apr 20, 2026

> tysm!
>
> Can you please check if
>
> def test_highly_nested_objects_decoding(self):
>
> now passes?

Thanks! Yes, it passes now. I dropped the skip in 566a197. Both variants are green and the full test_json run is clean. Thanks for catching that 😊

@youknowone (Member) left a comment

👍

@youknowone merged commit 175f12b into RustPython:main on Apr 20, 2026 (21 checks passed)
@changjoon-park deleted the fix-json-stack-overflow branch on April 27, 2026 13:23