Update tarfile from v3.14.3 and fix bugs #7223
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Replaced xz/lzma platform crates with liblzma crates; added a public accessor on `DecompressState`; and refactored the LZMA module to use a new `LzmaStream` wrapper, centralized error mapping, expanded filter/format handling, and new init paths (XZ/ALONE/RAW) with updated constructor fields/types and exposed constants.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant PyClient as Python caller
    participant LZMAType as LZMACompressor / LZMADecompressor
    participant LzmaStream as LzmaStream (wrapper)
    participant LibLzma as liblzma
    PyClient->>LZMAType: construct(args: memlimit, filters, preset, format)
    LZMAType->>LzmaStream: init_xz/init_alone/init_raw(parse filters)
    LzmaStream->>LibLzma: call liblzma init APIs
    LibLzma-->>LzmaStream: stream handle + header/check info
    LzmaStream-->>LZMAType: expose stream (check/eof/unused_data)
    PyClient->>LZMAType: compress()/decompress(data)
    LZMAType->>LzmaStream: process(data)
    LzmaStream->>LibLzma: stream process calls
    LibLzma-->>LzmaStream: processed bytes / status
    LzmaStream-->>LZMAType: result / header/check info
    LZMAType-->>PyClient: return bytes / raise mapped errors
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 4

❌ Failed checks (4 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Code has been automatically formatted. The code in this PR has been formatted using:
git pull origin tarfile
📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

- [x] lib: cpython/Lib/tarfile.py dependencies:
  - dependent tests: (2 tests)
- [x] lib: cpython/Lib/lzma.py dependencies:
  - dependent tests: (1 test)
f90575b to c4b01ad (force-push)
Actionable comments posted: 1
🧹 Nitpick comments (4)
crates/stdlib/src/lzma.rs (4)
654-659: Prefer `expect()` over `unwrap()` for safety-critical unwraps.

While the comment explains the safety, using `expect()` with a descriptive message provides better documentation and panic messages if the invariant is ever violated.

Proposed fix

```diff
 FORMAT_RAW => {
-    let filter_specs = args.filters.unwrap(); // safe: checked above
+    let filter_specs = args.filters.expect("filters required for FORMAT_RAW");
     let filters = parse_filter_chain_spec(filter_specs, vm)?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 654 - 659, Replace the direct unwrap() on args.filters in the FORMAT_RAW branch with expect() and a clear message describing the invariant; locate the code in the FORMAT_RAW arm where args.filters.unwrap() is called (near parse_filter_chain_spec and Stream::new_raw_decoder) and change it to args.filters.expect("expected filter specs for FORMAT_RAW") so the panic message documents the assumption if it ever fails, leaving the remainder (parse_filter_chain_spec, Stream::new_raw_decoder, LzmaStream::new) unchanged.
119-131: Consider providing a meaningful message for memory errors.

The empty string passed to `vm.new_memory_error("")` differs from CPython's behavior, which typically provides a descriptive message like "Memory allocation failed".

Proposed fix

```diff
-        Error::Mem => vm.new_memory_error(""),
+        Error::Mem => vm.new_memory_error("Memory allocation failed"),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 119 - 131, In catch_lzma_error replace the empty message passed to vm.new_memory_error("") with a meaningful message (e.g. "Memory allocation failed") so Lzma Error::Mem produces the same descriptive text as CPython; locate the vm.new_memory_error call in the catch_lzma_error function and update the argument string accordingly to provide clear diagnostics.
440-449: Simplify redundant condition checks.

The checks for `prop > 40` and `prop == 40` can be combined since both return `u32::MAX`.

Proposed fix

```diff
 fn lzma2_dict_size_from_prop(prop: u8) -> u32 {
-    if prop > 40 {
-        return u32::MAX;
-    }
-    if prop == 40 {
+    if prop >= 40 {
         return u32::MAX;
     }
     let prop = prop as u32;
     (2 | (prop & 1)) << (prop / 2 + 11)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 440 - 449, In lzma2_dict_size_from_prop, remove the redundant separate checks for prop > 40 and prop == 40 and replace them with a single check that returns u32::MAX when prop >= 40; keep the subsequent cast (let prop = prop as u32) and the existing return expression ((2 | (prop & 1)) << (prop / 2 + 11)) unchanged so behavior and types remain identical.
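For context on the formula this refactor preserves, the single LZMA2 dictionary-size property byte encodes sizes that alternate between powers of two and the 1.5x steps in between, saturating at `u32::MAX`. A standalone sketch of the simplified helper, with the combined `>= 40` check from the suggestion above applied:

```rust
// Standalone copy of the simplified helper discussed above: a prop byte of
// 40 or more saturates to u32::MAX; otherwise even props yield powers of
// two and odd props the 1.5x steps between them.
fn lzma2_dict_size_from_prop(prop: u8) -> u32 {
    if prop >= 40 {
        return u32::MAX;
    }
    let prop = prop as u32;
    (2 | (prop & 1)) << (prop / 2 + 11)
}

fn main() {
    assert_eq!(lzma2_dict_size_from_prop(0), 4096);      // 4 KiB minimum
    assert_eq!(lzma2_dict_size_from_prop(1), 6144);      // 1.5x step
    assert_eq!(lzma2_dict_size_from_prop(2), 8192);      // next power of two
    assert_eq!(lzma2_dict_size_from_prop(40), u32::MAX); // saturated
    println!("ok");
}
```

Note the largest non-saturated input, 39, shifts by 30 bits, so the expression never overflows `u32`.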
799-816: Duplicated branches in `init_alone`.

Both the `if let Some(_filter_specs)` and `else` branches execute identical code. The TODO comment indicates filter options should be validated, but currently the filter specs are ignored. Consider either implementing the TODO or removing the dead branch.

Proposed simplification (if keeping current behavior)

```diff
 fn init_alone(
     preset: u32,
-    filter_specs: Option<Vec<PyObjectRef>>,
+    _filter_specs: Option<Vec<PyObjectRef>>,
     vm: &VirtualMachine,
 ) -> PyResult<Stream> {
-    if let Some(_filter_specs) = filter_specs {
-        // TODO: validate single LZMA1 filter and use its options
-        let options = LzmaOptions::new_preset(preset).map_err(|_| {
-            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-        })?;
-        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-    } else {
-        let options = LzmaOptions::new_preset(preset).map_err(|_| {
-            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-        })?;
-        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-    }
+    // TODO: validate single LZMA1 filter and use its options from filter_specs
+    let options = LzmaOptions::new_preset(preset).map_err(|_| {
+        new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
+    })?;
+    Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 799 - 816, The init_alone function currently has duplicated if/else branches that both ignore filter_specs; either implement the TODO to validate a single LZMA1 filter and use its options or remove the conditional and simplify to a single path. Specifically, update init_alone (the filter_specs parameter) to parse and validate the provided filter_specs into LZMA options (or document/raise a clear error if unsupported), then call LzmaOptions::new_preset or construct options from parsed filter data and pass them to Stream::new_lzma_encoder (using catch_lzma_error/new_lzma_error as needed); if you choose not to implement filter handling, collapse the if/else into one branch that ignores filter_specs and returns Stream::new_lzma_encoder(&options).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 568-570: The FILTER_DELTA arm currently computes (dist - 1)
without validating dist, allowing dist=0 to underflow; update this arm to mirror
parse_filter_spec_delta's validation by retrieving dist via get_dict_opt_u32 (or
using the same default), then check that 1 <= dist && dist <= 256 and return an
appropriate error if not; only after validation compute (dist - 1) as u8 and
return Ok(vec![(dist - 1) as u8]).
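The validation described in this inline comment can be sketched as a small standalone function (the name `encode_delta_props` and the `Result<_, String>` signature are hypothetical; the real code returns a `PyResult` through the VM):

```rust
// Hypothetical standalone version of the FILTER_DELTA property encoding:
// dist must be in 1..=256, and the single property byte stores dist - 1.
// Validating before subtracting prevents the dist=0 underflow noted above.
fn encode_delta_props(dist: u32) -> Result<Vec<u8>, String> {
    if !(1..=256).contains(&dist) {
        return Err(format!("Invalid delta distance: {dist}"));
    }
    Ok(vec![(dist - 1) as u8])
}

fn main() {
    assert_eq!(encode_delta_props(1), Ok(vec![0]));     // minimum distance
    assert_eq!(encode_delta_props(256), Ok(vec![255])); // maximum distance
    assert!(encode_delta_props(0).is_err());            // would underflow dist - 1
    assert!(encode_delta_props(257).is_err());          // out of range
    println!("ok");
}
```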
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `Cargo.lock` is excluded by `!**/*.lock`
- `Lib/tarfile.py` is excluded by `!Lib/**`
- `Lib/test/test_lzma.py` is excluded by `!Lib/**`
- `Lib/test/test_tarfile.py` is excluded by `!Lib/**`
📒 Files selected for processing (3)
- `crates/stdlib/Cargo.toml`
- `crates/stdlib/src/compression.rs`
- `crates/stdlib/src/lzma.rs`
07632c7 to 3347f29 (force-push)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 197-199: The error message is misleading: the code uses
spec.downcast_ref::<PyDict>() and therefore requires an actual PyDict, so update
the TypeError text to say "Filter specifier must be a dict" (or similarly
explicit) in the closure passed to downcast_ref::<PyDict>() where `spec` is
downcast and assigned to `dict`; make the same change in the other occurrence(s)
that use downcast_ref::<PyDict>() (the block around where `dict` is created and
the later similar helper) so both helpers consistently require a dict.
- Around line 741-752: The code path for FORMAT_ALONE currently ignores
user-provided filter_specs: when filter_specs is Some it still uses
LzmaOptions::new_preset(preset) and calls Stream::new_lzma_encoder(&options),
silently dropping filters; change this to fail-fast by detecting
Some(filter_specs) and returning a descriptive LZMA error (use
new_lzma_error(..., vm)) indicating FORMAT_ALONE does not support custom filters
yet, instead of proceeding to build options from the preset; update the branch
that currently matches if let Some(_filter_specs) = filter_specs to return that
error and leave the existing preset-based logic only in the else branch.
- Around line 446-465: The _decode_filter_properties implementation accepts
malformed property blobs; enforce strict length checks: for FILTER_LZMA2 and
FILTER_DELTA ensure props.len() == 1 (return Err(new_lzma_error(...)) if not)
before accessing props[0], and for BCJ filters (FILTER_X86, FILTER_POWERPC,
FILTER_IA64, FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) only accept props.len()
== 0 or props.len() == 4 and otherwise return Err(new_lzma_error(...)); update
the branches that currently only check props.is_empty() or len()==4 so they
validate lengths and use new_lzma_error to signal invalid/unsupported options
instead of silently decoding invalid encoded_props.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `Cargo.lock` is excluded by `!**/*.lock`
- `Lib/tarfile.py` is excluded by `!Lib/**`
- `Lib/test/test_lzma.py` is excluded by `!Lib/**`
- `Lib/test/test_tarfile.py` is excluded by `!Lib/**`
📒 Files selected for processing (3)
- `crates/stdlib/Cargo.toml`
- `crates/stdlib/src/compression.rs`
- `crates/stdlib/src/lzma.rs`
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/stdlib/src/compression.rs
```rust
FILTER_LZMA2 => {
    if props.is_empty() {
        return Err(new_lzma_error("Invalid or unsupported options", vm));
    }
    let dict_size = lzma2_dict_size_from_prop(props[0]);
    dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
}
FILTER_DELTA => {
    if props.is_empty() {
        return Err(new_lzma_error("Invalid or unsupported options", vm));
    }
    let dist = props[0] as u32 + 1;
    dict.set_item("dist", vm.new_pyobj(dist), vm)?;
}
FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
| FILTER_SPARC => {
    if props.len() == 4 {
        let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
        dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
    }
}
```
_decode_filter_properties accepts malformed property blobs.
FILTER_LZMA2 and FILTER_DELTA only reject empty input, and BCJ filters accept non-0/4 lengths silently. This can decode invalid encoded_props instead of raising LZMAError.
Proposed fix

```diff
 match filter_id {
     FILTER_LZMA1 => {
         let (lc, lp, pb, dict_size) = decode_lzma1_properties(props)
             .ok_or_else(|| new_lzma_error("Invalid or unsupported options", vm))?;
         dict.set_item("lc", vm.new_pyobj(lc), vm)?;
         dict.set_item("lp", vm.new_pyobj(lp), vm)?;
         dict.set_item("pb", vm.new_pyobj(pb), vm)?;
         dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
     }
     FILTER_LZMA2 => {
-        if props.is_empty() {
+        if props.len() != 1 {
             return Err(new_lzma_error("Invalid or unsupported options", vm));
         }
         let dict_size = lzma2_dict_size_from_prop(props[0]);
         dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
     }
     FILTER_DELTA => {
-        if props.is_empty() {
+        if props.len() != 1 {
             return Err(new_lzma_error("Invalid or unsupported options", vm));
         }
         let dist = props[0] as u32 + 1;
         dict.set_item("dist", vm.new_pyobj(dist), vm)?;
     }
     FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
     | FILTER_SPARC => {
-        if props.len() == 4 {
+        if props.is_empty() {
+            // default start_offset = 0
+        } else if props.len() == 4 {
             let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
             dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
+        } else {
+            return Err(new_lzma_error("Invalid or unsupported options", vm));
+        }
     }
     _ => {
         return Err(vm.new_value_error(format!("Invalid filter ID: {filter_id}")));
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/stdlib/src/lzma.rs` around lines 446 - 465, The
_decode_filter_properties implementation accepts malformed property blobs;
enforce strict length checks: for FILTER_LZMA2 and FILTER_DELTA ensure
props.len() == 1 (return Err(new_lzma_error(...)) if not) before accessing
props[0], and for BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64,
FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) only accept props.len() == 0 or
props.len() == 4 and otherwise return Err(new_lzma_error(...)); update the
branches that currently only check props.is_empty() or len()==4 so they validate
lengths and use new_lzma_error to signal invalid/unsupported options instead of
silently decoding invalid encoded_props.
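The length rules this comment asks for can be illustrated with a self-contained decoder sketch (the enum, function name, and string filter IDs are hypothetical simplifications; the real implementation matches on integer filter IDs and builds a Python dict via the VM):

```rust
// Hypothetical standalone sketch of the strict property-length checks:
// LZMA2 and Delta take exactly 1 byte; BCJ filters take 0 or 4 bytes
// (a little-endian start_offset). Anything else is an error.
#[derive(Debug, PartialEq)]
enum DecodedFilter {
    Lzma2 { dict_size: u32 },
    Delta { dist: u32 },
    Bcj { start_offset: u32 },
}

fn decode_props(filter: &str, props: &[u8]) -> Result<DecodedFilter, String> {
    let invalid = || "Invalid or unsupported options".to_string();
    match filter {
        "lzma2" => match props {
            [p] if *p >= 40 => Ok(DecodedFilter::Lzma2 { dict_size: u32::MAX }),
            [p] => {
                let p = *p as u32;
                Ok(DecodedFilter::Lzma2 { dict_size: (2 | (p & 1)) << (p / 2 + 11) })
            }
            _ => Err(invalid()), // empty or trailing bytes rejected
        },
        "delta" => match props {
            [p] => Ok(DecodedFilter::Delta { dist: *p as u32 + 1 }),
            _ => Err(invalid()),
        },
        "x86" | "arm" | "sparc" => match props {
            [] => Ok(DecodedFilter::Bcj { start_offset: 0 }), // default offset
            [a, b, c, d] => Ok(DecodedFilter::Bcj {
                start_offset: u32::from_le_bytes([*a, *b, *c, *d]),
            }),
            _ => Err(invalid()), // non-0/4 lengths rejected
        },
        _ => Err(format!("Invalid filter ID: {filter}")),
    }
}

fn main() {
    assert_eq!(decode_props("delta", &[3]), Ok(DecodedFilter::Delta { dist: 4 }));
    assert!(decode_props("delta", &[1, 2]).is_err()); // extra byte now rejected
    assert_eq!(
        decode_props("x86", &[0, 0, 1, 0]),
        Ok(DecodedFilter::Bcj { start_offset: 65536 })
    );
    assert!(decode_props("x86", &[1, 2, 3]).is_err());
    println!("ok");
}
```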
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/stdlib/src/lzma.rs (1)
386-395: Redundant conditional branches.

Lines 387-392 have two separate checks that both return `u32::MAX`. These can be combined.

Proposed simplification

```diff
 fn lzma2_dict_size_from_prop(prop: u8) -> u32 {
-    if prop > 40 {
-        return u32::MAX;
-    }
-    if prop == 40 {
+    if prop >= 40 {
         return u32::MAX;
     }
     let prop = prop as u32;
     (2 | (prop & 1)) << (prop / 2 + 11)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 386 - 395, The function lzma2_dict_size_from_prop has redundant checks for prop > 40 and prop == 40 that both return u32::MAX; replace them with a single combined check (e.g., if prop >= 40 { return u32::MAX; }) in lzma2_dict_size_from_prop and remove the duplicate branch so the remaining logic (casting prop to u32 and computing (2 | (prop & 1)) << (prop / 2 + 11)) is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 83-94: The MemoryError created in catch_lzma_error uses an empty
message which is unhelpful; update the branch handling Error::Mem to call
vm.new_memory_error with a descriptive string (e.g., "Memory allocation failed")
so that catch_lzma_error returns a meaningful message for the
vm.new_memory_error(...) call rather than an empty string.
---
Nitpick comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 386-395: The function lzma2_dict_size_from_prop has redundant
checks for prop > 40 and prop == 40 that both return u32::MAX; replace them with
a single combined check (e.g., if prop >= 40 { return u32::MAX; }) in
lzma2_dict_size_from_prop and remove the duplicate branch so the remaining logic
(casting prop to u32 and computing (2 | (prop & 1)) << (prop / 2 + 11)) is
unchanged.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `Lib/test/test_lzma.py` is excluded by `!Lib/**`
📒 Files selected for processing (3)
- `crates/stdlib/Cargo.toml`
- `crates/stdlib/src/compression.rs`
- `crates/stdlib/src/lzma.rs`
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/stdlib/src/compression.rs
```rust
fn catch_lzma_error(err: Error, vm: &VirtualMachine) -> PyBaseExceptionRef {
    match err {
        Error::UnsupportedCheck => new_lzma_error("Unsupported integrity check", vm),
        Error::Mem => vm.new_memory_error(""),
        Error::MemLimit => new_lzma_error("Memory usage limit exceeded", vm),
        Error::Format => new_lzma_error("Input format not supported by decoder", vm),
        Error::Options => new_lzma_error("Invalid or unsupported options", vm),
        Error::Data => new_lzma_error("Corrupt input data", vm),
        Error::Program => new_lzma_error("Internal error", vm),
        Error::NoCheck => new_lzma_error("Corrupt input data", vm),
    }
}
```
Empty message in memory error may be unhelpful.
Line 86 creates a memory error with an empty string. Consider providing a meaningful message like "Memory allocation failed" to help users diagnose issues.
Proposed fix
```diff
-        Error::Mem => vm.new_memory_error(""),
+        Error::Mem => vm.new_memory_error("Memory allocation failed"),
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
fn catch_lzma_error(err: Error, vm: &VirtualMachine) -> PyBaseExceptionRef {
    match err {
        Error::UnsupportedCheck => new_lzma_error("Unsupported integrity check", vm),
        Error::Mem => vm.new_memory_error("Memory allocation failed"),
        Error::MemLimit => new_lzma_error("Memory usage limit exceeded", vm),
        Error::Format => new_lzma_error("Input format not supported by decoder", vm),
        Error::Options => new_lzma_error("Invalid or unsupported options", vm),
        Error::Data => new_lzma_error("Corrupt input data", vm),
        Error::Program => new_lzma_error("Internal error", vm),
        Error::NoCheck => new_lzma_error("Corrupt input data", vm),
    }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/stdlib/src/lzma.rs` around lines 83 - 94, The MemoryError created in
catch_lzma_error uses an empty message which is unhelpful; update the branch
handling Error::Mem to call vm.new_memory_error with a descriptive string (e.g.,
"Memory allocation failed") so that catch_lzma_error returns a meaningful
message for the vm.new_memory_error(...) call rather than an empty string.
♻️ Duplicate comments (4)
crates/stdlib/src/lzma.rs (4)
211-213: ⚠️ Potential issue | 🟡 Minor

Error text says "dict or dict-like object" but only `PyDict` is accepted.

The code uses `downcast_ref::<PyDict>()`, which only accepts actual `dict` instances, not mapping-like objects. Either support the mapping protocol or tighten the message to "must be a dict".

Option 1: Tighten the message

```diff
 let dict = spec.downcast_ref::<PyDict>().ok_or_else(|| {
-    vm.new_type_error("Filter specifier must be a dict or dict-like object")
+    vm.new_type_error("Filter specifier must be a dict")
 })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 211 - 213, The error message claims "dict or dict-like object" but the code only accepts actual dicts via spec.downcast_ref::<PyDict>(), so change the behavior or the message; simplest fix: tighten the TypeError text used in vm.new_type_error to "Filter specifier must be a dict" (replace the current message) so it accurately reflects that only PyDict instances are accepted by the downcast_ref::<PyDict>() check (update the string passed to vm.new_type_error where spec.downcast_ref::<PyDict>() is called).
100-100:⚠️ Potential issue | 🟡 MinorEmpty message in memory error is unhelpful.
The
Error::Memcase creates a memory error with an empty string. Consider providing a meaningful message like"Memory allocation failed"to help users diagnose issues.Proposed fix
- Error::Mem => vm.new_memory_error(""), + Error::Mem => vm.new_memory_error("Memory allocation failed"),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` at line 100, The Error::Mem branch returns a memory error with an empty message; update the Error::Mem handling in lzma.rs to call vm.new_memory_error with a descriptive message (e.g., "Memory allocation failed") so the returned error includes useful context; locate the Error::Mem match arm and replace the empty string argument to vm.new_memory_error accordingly.
754-765: ⚠️ Potential issue | 🟠 Major

`FORMAT_ALONE` silently ignores user-provided `filters`.

Both branches execute identical code: the `filter_specs` parameter is captured but never used. This silently drops caller configuration. The `TODO` comment acknowledges this, but silent data loss is problematic.

Proposed fix: fail fast until full support is implemented

```diff
 fn init_alone(
     preset: u32,
     filter_specs: Option<Vec<PyObjectRef>>,
     vm: &VirtualMachine,
 ) -> PyResult<Stream> {
-    if let Some(_filter_specs) = filter_specs {
-        // TODO: validate single LZMA1 filter and use its options
-        let options = LzmaOptions::new_preset(preset).map_err(|_| {
-            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-        })?;
-        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-    } else {
-        let options = LzmaOptions::new_preset(preset).map_err(|_| {
-            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-        })?;
-        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-    }
+    if filter_specs.is_some() {
+        return Err(new_lzma_error(
+            "FORMAT_ALONE with custom filters is not yet supported",
+            vm,
+        ));
+    }
+    let options = LzmaOptions::new_preset(preset).map_err(|_| {
+        new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
+    })?;
+    Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 754 - 765, The code currently ignores provided filter_specs in the FORMAT_ALONE branch (the if let Some(_filter_specs) = filter_specs block) and proceeds to call LzmaOptions::new_preset(...) and Stream::new_lzma_encoder(...), causing silent loss of caller configuration; change this to fail fast by returning a new LZMA error when filter_specs is Some(...) (e.g., construct and return new_lzma_error("Filters not supported for FORMAT_ALONE", vm) via map_err or early return) instead of proceeding to use LzmaOptions::new_preset and Stream::new_lzma_encoder; keep the existing behavior for the None case unchanged and reference the functions LzmaOptions::new_preset and Stream::new_lzma_encoder and the variable filter_specs to locate the change.
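The fail-fast pattern recommended here can be shown in isolation (all names in this sketch are hypothetical mocks; the real code works with `LzmaOptions`, `Stream`, and the VM error types):

```rust
// Minimal mock of the fail-fast check suggested above: reject unsupported
// configuration up front instead of silently ignoring it.
fn init_alone_sketch(preset: u32, filter_specs: Option<Vec<String>>) -> Result<String, String> {
    if filter_specs.is_some() {
        // Fail fast: caller-provided filters would otherwise be dropped.
        return Err("FORMAT_ALONE with custom filters is not yet supported".to_string());
    }
    if preset > 9 {
        return Err(format!("Invalid compression preset: {preset}"));
    }
    Ok(format!("lzma encoder with preset {preset}"))
}

fn main() {
    assert!(init_alone_sketch(6, None).is_ok());
    assert!(init_alone_sketch(6, Some(vec!["delta".to_string()])).is_err());
    assert!(init_alone_sketch(42, None).is_err());
    println!("ok");
}
```

The design point is that an explicit error is recoverable by the caller, whereas silently ignored configuration is a bug that only surfaces as wrong output.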
460-479: ⚠️ Potential issue | 🟠 Major

`build_filter_spec` accepts malformed property blobs.

`FILTER_LZMA2` and `FILTER_DELTA` only reject empty input, allowing extra bytes to be silently ignored. BCJ filters accept non-0/4 lengths silently. This can decode invalid `encoded_props` instead of raising `LZMAError`.

Proposed fix with strict length validation

```diff
 FILTER_LZMA2 => {
-    if props.is_empty() {
+    if props.len() != 1 {
         return Err(new_lzma_error("Invalid or unsupported options", vm));
     }
     let dict_size = lzma2_dict_size_from_prop(props[0]);
     dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
 }
 FILTER_DELTA => {
-    if props.is_empty() {
+    if props.len() != 1 {
         return Err(new_lzma_error("Invalid or unsupported options", vm));
     }
     let dist = props[0] as u32 + 1;
     dict.set_item("dist", vm.new_pyobj(dist), vm)?;
 }
 FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
 | FILTER_SPARC => {
-    if props.len() == 4 {
+    if props.is_empty() {
+        // default start_offset = 0, no entry added
+    } else if props.len() == 4 {
         let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
         dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
+    } else {
+        return Err(new_lzma_error("Invalid or unsupported options", vm));
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 460 - 479, build_filter_spec currently accepts malformed property blobs: enforce strict length checks for FILTER_LZMA2 and FILTER_DELTA to require exactly one byte (props.len() == 1) and return Err(new_lzma_error("Invalid or unsupported options", vm)) otherwise; for BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64, FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) require props.len() == 0 || props.len() == 4 and return the same LZMAError when the length is anything else; update the corresponding match arms (the FILTER_LZMA2, FILTER_DELTA arms and the BCJ arm) to perform these checks before decoding values and calling dict.set_item.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `Lib/test/test_lzma.py` is excluded by `!Lib/**`
📒 Files selected for processing (3)
- `crates/stdlib/Cargo.toml`
- `crates/stdlib/src/compression.rs`
- `crates/stdlib/src/lzma.rs`
♻️ Duplicate comments (4)
crates/stdlib/src/lzma.rs (4)
98-102: ⚠️ Potential issue | 🟡 Minor

Use a non-empty message for `MemoryError`.

`Error::Mem` currently produces `MemoryError("")`, which is hard to diagnose in user reports.

Suggested fix

```diff
-        Error::Mem => vm.new_memory_error(""),
+        Error::Mem => vm.new_memory_error("Memory allocation failed"),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 98 - 102, In catch_lzma_error, Error::Mem returns an empty MemoryError message which is unhelpful; update the Error::Mem arm in the catch_lzma_error function to call vm.new_memory_error with a descriptive non-empty string (e.g., "LZMA decompression memory exhausted" or similar) so MemoryError carries useful context when raised.
461-481: ⚠️ Potential issue | 🟠 Major

`build_filter_spec` accepts malformed property lengths.

`FILTER_LZMA2`/`FILTER_DELTA` should require exactly 1 byte, and BCJ filters should accept only 0 or 4 bytes; current logic silently accepts invalid blobs.

Suggested fix:

```diff
             FILTER_LZMA2 => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dict_size = lzma2_dict_size_from_prop(props[0]);
                 dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
             }
             FILTER_DELTA => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dist = props[0] as u32 + 1;
                 dict.set_item("dist", vm.new_pyobj(dist), vm)?;
             }
             FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
             | FILTER_SPARC => {
-                if props.len() == 4 {
+                if props.is_empty() {
+                    // default start_offset = 0
+                } else if props.len() == 4 {
                     let start_offset =
                         u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
                     dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
+                } else {
+                    return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
             }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 461 - 481, The build_filter_spec branch handling FILTER_LZMA2, FILTER_DELTA, and BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64, FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) currently accepts malformed props lengths; update it so FILTER_LZMA2 and FILTER_DELTA require props.len() == 1 (otherwise return Err(new_lzma_error("Invalid or unsupported options", vm))) and the BCJ branches only accept props.len() == 0 or props.len() == 4 (if 4 parse start_offset, else if 0 do nothing; any other length return the same LZMA error). Keep the existing dict.set_item calls but gate them behind these strict length checks in build_filter_spec.
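The strict length rules this comment asks for can be exercised in isolation, without the module's `PyDict`/`VirtualMachine` machinery. A minimal sketch, with some assumptions: the filter-ID constants use liblzma's `lzma.h` values, and `lzma2_dict_size_from_prop` follows the LZMA2 dictionary-size property encoding from the xz file-format spec (`dict_size = (2 | (prop & 1)) << (prop / 2 + 11)`, prop 40 meaning 4 GiB − 1, prop > 40 invalid):

```rust
// Filter IDs per liblzma's lzma.h (assumed values, for illustration only).
const FILTER_LZMA2: u64 = 0x21;
const FILTER_DELTA: u64 = 0x03;
const FILTER_X86: u64 = 0x04;
const FILTER_POWERPC: u64 = 0x05;
const FILTER_IA64: u64 = 0x06;
const FILTER_ARM: u64 = 0x07;
const FILTER_ARMTHUMB: u64 = 0x08;
const FILTER_SPARC: u64 = 0x09;

/// LZMA2 dictionary-size property byte, per the xz file-format spec.
fn lzma2_dict_size_from_prop(prop: u8) -> Option<u32> {
    if prop > 40 {
        None // invalid property byte
    } else if prop == 40 {
        Some(u32::MAX) // 4 GiB - 1
    } else {
        Some((2 | (prop as u32 & 1)) << (prop / 2 + 11))
    }
}

/// Validates a property blob and returns the decoded value, if any.
fn check_props(filter_id: u64, props: &[u8]) -> Result<Option<u32>, &'static str> {
    const ERR: &str = "Invalid or unsupported options";
    match filter_id {
        // LZMA2 and Delta each encode exactly one property byte.
        FILTER_LZMA2 => {
            if props.len() != 1 {
                return Err(ERR);
            }
            lzma2_dict_size_from_prop(props[0]).map(Some).ok_or(ERR)
        }
        FILTER_DELTA => {
            if props.len() != 1 {
                return Err(ERR);
            }
            Ok(Some(props[0] as u32 + 1)) // distance is stored as (dist - 1)
        }
        // BCJ filters: no props (default start_offset) or a 4-byte LE offset.
        FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
        | FILTER_SPARC => match props.len() {
            0 => Ok(None),
            4 => Ok(Some(u32::from_le_bytes([
                props[0], props[1], props[2], props[3],
            ]))),
            _ => Err(ERR),
        },
        _ => Err(ERR),
    }
}

fn main() {
    // LZMA2: prop 0x00 -> 2 << 11 = 4096-byte dictionary; empty props rejected.
    assert_eq!(check_props(FILTER_LZMA2, &[0x00]), Ok(Some(4096)));
    assert!(check_props(FILTER_LZMA2, &[]).is_err());
    // Delta: byte 0 means dist = 1.
    assert_eq!(check_props(FILTER_DELTA, &[0]), Ok(Some(1)));
    // BCJ: empty ok, 4-byte LE ok, anything else rejected.
    assert_eq!(check_props(FILTER_X86, &[]), Ok(None));
    assert_eq!(check_props(FILTER_X86, &[1, 0, 0, 0]), Ok(Some(1)));
    assert!(check_props(FILTER_X86, &[1, 2]).is_err());
    println!("ok");
}
```

The key difference from the flagged code is that every arm rejects unexpected lengths up front instead of silently decoding or ignoring them.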
755-766: ⚠️ Potential issue | 🟠 Major

`FORMAT_ALONE` currently ignores caller-provided `filters`.

When `filter_specs` is `Some`, this path still builds encoder options only from `preset`, silently dropping user configuration.

As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".

Suggested fix:

```diff
 fn init_alone(
     preset: u32,
     filter_specs: Option<Vec<PyObjectRef>>,
     vm: &VirtualMachine,
 ) -> PyResult<Stream> {
-    if let Some(_filter_specs) = filter_specs {
-        // TODO: validate single LZMA1 filter and use its options
-        let options = LzmaOptions::new_preset(preset).map_err(|_| {
-            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-        })?;
-        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-    } else {
-        let options = LzmaOptions::new_preset(preset).map_err(|_| {
-            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-        })?;
-        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-    }
+    if filter_specs.is_some() {
+        return Err(new_lzma_error(
+            "FORMAT_ALONE with custom filters is not implemented",
+            vm,
+        ));
+    }
+    let options = LzmaOptions::new_preset(preset)
+        .map_err(|_| new_lzma_error(format!("Invalid compression preset: {preset}"), vm))?;
+    Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 755 - 766, The branch currently ignores caller-provided filters: when filter_specs is Some you still call LzmaOptions::new_preset(preset) and drop the filters; instead, compute the LZMA encoder options once by first deriving options depending on filter_specs (validate that filter_specs contains a single LZMA1 filter and build LzmaOptions from its options when Some, falling back to LzmaOptions::new_preset(preset) when None), then call Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm)) a single time; update the code around filter_specs, LzmaOptions::new_preset and Stream::new_lzma_encoder to remove the duplicated logic and correctly honor the caller-provided filters.
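The coding guideline quoted in this comment — extract the differing value per branch, then run the shared logic exactly once — can be shown with plain types, independent of liblzma. All names below (`encoder_label`, the string "options") are illustrative stand-ins, not the module's API:

```rust
// Sketch of the cited guideline: the branch decides only what differs
// (reject unsupported input, or pick the option source); the shared
// construction step appears exactly once afterwards.
fn encoder_label(preset: u32, custom_filters: Option<&[&str]>) -> Result<String, String> {
    // Differing part: silently dropping caller configuration would be a bug,
    // so fail loudly until filters are actually honored.
    if custom_filters.is_some() {
        return Err("custom filters not implemented".to_string());
    }
    let options = format!("preset-{preset}");
    // Shared part, written once instead of duplicated in both branches.
    Ok(format!("encoder({options})"))
}

fn main() {
    assert_eq!(encoder_label(6, None), Ok("encoder(preset-6)".to_string()));
    assert!(encoder_label(6, Some(&["lzma1"])).is_err());
    println!("ok");
}
```

Compared with the flagged `init_alone`, the early return makes the unsupported case explicit rather than hiding it inside a branch that ignores its input.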
207-233: ⚠️ Potential issue | 🟡 Minor

TypeError text over-promises dict-like support.

Both helpers require `PyDict` via `downcast_ref::<PyDict>()`, but the message says "dict or dict-like object".

Suggested fix:

```diff
-    let dict = spec.downcast_ref::<PyDict>().ok_or_else(|| {
-        vm.new_type_error("Filter specifier must be a dict or dict-like object")
-    })?;
+    let dict = spec
+        .downcast_ref::<PyDict>()
+        .ok_or_else(|| vm.new_type_error("Filter specifier must be a dict"))?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/lzma.rs` around lines 207 - 233, The TypeError message in get_dict_opt_u32 and get_dict_opt_u64 incorrectly promises "dict-like" support while downcasting only to PyDict; update the vm.new_type_error call in both functions to state that the filter specifier "must be a dict" (or alternatively, implement true mapping support by accepting mapping protocol), so the error text matches the actual check performed by the downcast_ref::<PyDict>() used in get_dict_opt_u32 and get_dict_opt_u64.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 98-102: In catch_lzma_error, Error::Mem returns an empty
MemoryError message which is unhelpful; update the Error::Mem arm in the
catch_lzma_error function to call vm.new_memory_error with a descriptive
non-empty string (e.g., "LZMA decompression memory exhausted" or similar) so
MemoryError carries useful context when raised.
- Around line 461-481: The build_filter_spec branch handling FILTER_LZMA2,
FILTER_DELTA, and BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64,
FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) currently accepts malformed props
lengths; update it so FILTER_LZMA2 and FILTER_DELTA require props.len() == 1
(otherwise return Err(new_lzma_error("Invalid or unsupported options", vm))) and
the BCJ branches only accept props.len() == 0 or props.len() == 4 (if 4 parse
start_offset, else if 0 do nothing; any other length return the same LZMA
error). Keep the existing dict.set_item calls but gate them behind these strict
length checks in build_filter_spec.
- Around line 755-766: The branch currently ignores caller-provided filters:
when filter_specs is Some you still call LzmaOptions::new_preset(preset) and
drop the filters; instead, compute the LZMA encoder options once by first
deriving options depending on filter_specs (validate that filter_specs contains
a single LZMA1 filter and build LzmaOptions from its options when Some, falling
back to LzmaOptions::new_preset(preset) when None), then call
Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm)) a single
time; update the code around filter_specs, LzmaOptions::new_preset and
Stream::new_lzma_encoder to remove the duplicated logic and correctly honor the
caller-provided filters.
- Around line 207-233: The TypeError message in get_dict_opt_u32 and
get_dict_opt_u64 incorrectly promises "dict-like" support while downcasting only
to PyDict; update the vm.new_type_error call in both functions to state that the
filter specifier "must be a dict" (or alternatively, implement true mapping
support by accepting mapping protocol), so the error text matches the actual
check performed by the downcast_ref::<PyDict>() used in get_dict_opt_u32 and
get_dict_opt_u64.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- `Cargo.lock` is excluded by `!**/*.lock`
- `Lib/tarfile.py` is excluded by `!Lib/**`
- `Lib/test/test_lzma.py` is excluded by `!Lib/**`
- `Lib/test/test_tarfile.py` is excluded by `!Lib/**`
📒 Files selected for processing (3)
- `crates/stdlib/Cargo.toml`
- `crates/stdlib/src/compression.rs`
- `crates/stdlib/src/lzma.rs`
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/stdlib/Cargo.toml
Closes #7213
Summary by CodeRabbit
New Features
Improvements
Breaking Changes