Match CPython parity for compile() str source, mode wording, and func_type#7767
Match CPython parity for compile() str source, mode wording, and func_type#7767changjoon-park wants to merge 2 commits into
Conversation
…on mode wording
Three CPython parity gaps in `compile()` text-source handling, all in
`crates/vm/src/stdlib/builtins.rs`:
1. **str source path applies bytes-only encoding declaration validation.**
Previously, `compile()` collapsed both `str` and bytes-like inputs to
bytes via `ArgStrOrBytesLike::borrow_bytes()` and ran the same
`decode_source_bytes()` over them, including the "unknown encoding"
error path. CPython only applies `# -*- coding: ... -*-` validation
when source is bytes — for `str` the source is already decoded and
the declaration must be ignored.
```python
>>> compile('# -*- coding: badencoding -*-\n"x"\n', 'tmp', 'exec')
# CPython 3.14.4: <code object>
# RustPython main: SyntaxError: unknown encoding for 'tmp': badencoding ❌
```
Branch on `args.source` type early: pure `str` (no surrogates) skips
the encoding-declaration check; surrogate-bearing `str` falls back to
the bytes path; bytes-like input is unchanged.
2. **`compile()` mode error wording.** The parser's `ModeParseError`
Display produces `mode must be "exec", "eval", or "single"` (no
`compile()` prefix, double quotes, Oxford comma). CPython:
`compile() mode must be 'exec', 'eval' or 'single'`. Replace
`err.to_string()` wrapping at the four `mode_str.parse::<Mode>()`
call sites with the exact CPython wording.
3. **Missing `func_type` validation on text-source path.** Previously,
`compile(text, ..., 'func_type')` without `PyCF_ONLY_AST` fell
through the `mode_str.parse::<Mode>()` failure path, which now
reports `'exec', 'eval' or 'single'` and hides that the real issue
is the missing flag. The AST-input path already had the correct
check; mirror it for text input. CPython:
`compile() mode 'func_type' requires flag PyCF_ONLY_AST`.
Verified byte-identical with CPython 3.14.4 across:
- 9 source-input combinations ({str, bytes, bytearray, memoryview} × {no decl, valid utf-8, bad decl})
- 7 mode strings (exec, eval, single, func_type, bogus, '', BAD)
`cargo run --release -- -m test test_compile test_builtin test_compileall
test_codeop test_ast` — 703 tests pass, 0 regressions. All 188
`extra_tests/snippets/*.py` pass under the CI feature set.
`test_compile.test_encoding`'s expectedFailure marker is left in place:
the test exercises non-UTF-8 encoding decoding (`# -*- coding: latin1 -*-`,
`iso8859-15`) which is a separate pre-existing limitation in
`decode_source_bytes` unaffected by this PR.
📝 WalkthroughWalkthroughThe ChangesSource Decoding and Mode Parsing Improvements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: Turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. 👉 Get your free trial and get 200 agent minutes per Slack user (a $50 value). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/vm/src/stdlib/builtins.rs (1)
318-442: 💤 Low valueConsider extracting repeated error message strings to constants.
The error messages are duplicated across multiple locations:
"compile() mode must be 'exec', 'eval' or 'single'"appears 4 times (lines 321, 401, 416, 441)"compile() mode 'func_type' requires flag PyCF_ONLY_AST"appears 2 times (lines 289, 394)Extracting these to constants would reduce duplication and ensure consistency.
♻️ Proposed refactor
+ const COMPILE_MODE_ERROR: &str = "compile() mode must be 'exec', 'eval' or 'single'"; + const FUNC_TYPE_REQUIRES_AST_FLAG: &str = "compile() mode 'func_type' requires flag PyCF_ONLY_AST"; + #[cfg(any(feature = "parser", feature = "compiler"))] #[pyfunction] fn compile(args: CompileArgs, vm: &VirtualMachine) -> PyResult {Then replace the inline string literals with these constants.
As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/builtins.rs` around lines 318 - 442, Create shared constant error strings (e.g. const COMPILE_MODE_ERR: &str = "compile() mode must be 'exec', 'eval' or 'single'"; const FUNC_TYPE_FLAG_ERR: &str = "compile() mode 'func_type' requires flag PyCF_ONLY_AST";) and replace all inline uses in the compile path: the mode_str.parse::<Mode> error maps (currently returning vm.new_value_error(...)) in the branches that call mode_str.parse (both parser::Mode and crate::compiler::Mode), and the checks that return the func_type flag error (the if mode_str == "func_type" branches used before codegen and in the PyCF_ONLY_AST handling), so all four occurrences of the compile-mode message and both occurrences of the func_type message use the new constants to avoid duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/vm/src/stdlib/builtins.rs`:
- Around line 318-442: Create shared constant error strings (e.g. const
COMPILE_MODE_ERR: &str = "compile() mode must be 'exec', 'eval' or 'single'";
const FUNC_TYPE_FLAG_ERR: &str = "compile() mode 'func_type' requires flag
PyCF_ONLY_AST";) and replace all inline uses in the compile path: the
mode_str.parse::<Mode> error maps (currently returning vm.new_value_error(...))
in the branches that call mode_str.parse (both parser::Mode and
crate::compiler::Mode), and the checks that return the func_type flag error (the
if mode_str == "func_type" branches used before codegen and in the PyCF_ONLY_AST
handling), so all four occurrences of the compile-mode message and both
occurrences of the func_type message use the new constants to avoid duplication.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 837dc311-f5ba-4724-8dc6-43d035393069
📒 Files selected for processing (1)
crates/vm/src/stdlib/builtins.rs
ShaharNaveh
left a comment
There was a problem hiding this comment.
Thanks for the patch!
Does this fix something in the CPython test suite? if not, can you add a custom test to prevent regressions?
Per ShaharNaveh's review on PR RustPython#7767: the fix doesn't unmask any existing CPython test, so add a custom regression snippet covering all three changes: - str source ignores encoding declarations (the "unknown encoding" error path remains for bytes input) - compile() mode error wording matches CPython's exact text - func_type without PyCF_ONLY_AST raises the specific "requires flag" error, not the generic "invalid mode"
There was a problem hiding this comment.
🧹 Nitpick comments (1)
extra_tests/snippets/builtin_compile.py (1)
7-15: ⚡ Quick winSurrogate-bearing
strfallback path is not tested.The Rust implementation explicitly documents that a surrogate-bearing
str(wherepystr.to_str()returnsNone) falls back to the bytes path and raisesSyntaxErroron the WTF-8 encoding. This is new branching logic introduced by this PR, but no assertion covers it. A minimal addition would be:🧪 Suggested coverage addition
compile("# -*- coding: utf-8 -*-\nx = 1\n", "tmp", "exec") + +# Surrogate-bearing str falls back to the bytes path → SyntaxError. +assert_raises(SyntaxError, compile, chr(0xD800) + "\nx = 1\n", "tmp", "exec")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@extra_tests/snippets/builtin_compile.py` around lines 7 - 15, Add a test that exercises the surrogate-bearing str fallback by calling compile with a str that contains a lone surrogate (so pystr.to_str() returns None) and asserting it behaves like the bytes path (i.e., raises SyntaxError for the bogus WTF-8 declaration); use the existing compile call pattern and assert_raises to verify the SyntaxError from the surrogate-bearing string input, referencing the compile and assert_raises usages already in the snippet.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@extra_tests/snippets/builtin_compile.py`:
- Around line 7-15: Add a test that exercises the surrogate-bearing str fallback
by calling compile with a str that contains a lone surrogate (so pystr.to_str()
returns None) and asserting it behaves like the bytes path (i.e., raises
SyntaxError for the bogus WTF-8 declaration); use the existing compile call
pattern and assert_raises to verify the SyntaxError from the surrogate-bearing
string input, referencing the compile and assert_raises usages already in the
snippet.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 7c6a6d5c-add4-415e-b244-99a5e2699d21
📒 Files selected for processing (1)
extra_tests/snippets/builtin_compile.py
thanks for catching ! it doesnt unmask an existing test, i added a regression snippet. please check it and thanks for the review :) |
Summary
Three CPython parity gaps in
compile()text-source handling, all incrates/vm/src/stdlib/builtins.rs.1. str source path applies bytes-only encoding declaration validation
Previously,
compile()collapsed bothstrand bytes-like inputs to bytes viaArgStrOrBytesLike::borrow_bytes()and ran the samedecode_source_bytes()over them, including the "unknown encoding" error path. CPython only applies# -*- coding: ... -*-validation when source is bytes — forstrthe source is already decoded and the declaration must be ignored.Branch on
args.sourcetype early:str(no surrogates) → skip encoding-declaration validationstr→ fall back to bytes path (preserves existing SyntaxError)2.
compile()mode error wordingThe parser's
ModeParseErrorDisplay produces a generic message used outsidecompile()too. Replace theerr.to_string()wrapping at the fourmode_str.parse::<Mode>()call sites inbuiltins::compilewith the exact CPython wording.3. Missing
func_typevalidation on text-source pathThe AST-input path already had this check (lines 287-291); mirror it for text-source input so the user sees the actual cause (missing flag) rather than "invalid mode."
Verification (CPython 3.14.4)
cargo run --release -- -m test test_compile test_builtin test_compileall test_codeop test_ast— 703 tests pass, 0 regressions. All 188extra_tests/snippets/*.pypass under the CI feature set.Out of scope
test_compile.test_encoding'sexpectedFailuremarker is left in place: the test exercises non-UTF-8 encoding decoding (# -*- coding: latin1 -*-,iso8859-15) which is a separate pre-existing limitation indecode_source_bytesunaffected by this PR.Summary by CodeRabbit