Skip to content

Update tarfile from v3.14.3 and fix bugs#7223

Merged
youknowone merged 4 commits intoRustPython:mainfrom
youknowone:tarfile
Feb 27, 2026
Merged

Update tarfile from v3.14.3 and fix bugs#7223
youknowone merged 4 commits intoRustPython:mainfrom
youknowone:tarfile

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Feb 25, 2026

Closes #7213

Summary by CodeRabbit

  • New Features

    • Exposes additional LZMA constants and adds filter-spec encode/decode helpers.
    • Adds explicit RAW-mode initialization with full filter-chain support.
  • Improvements

    • Centralized, clearer error mapping for compression/decompression.
    • Public decompressor accessor added for inspection and diagnostics.
  • Breaking Changes

    • Compressor/decompressor constructor arguments and initialization behavior changed; updates may be required.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaced xz/lzma platform crates with liblzma crates; added a public accessor on DecompressState; and refactored the LZMA module to use a new LzmaStream wrapper, centralized error mapping, expanded filter/format handling, and new init paths (XZ/ALONE/RAW) with updated constructor fields/types and exposed constants.

Changes

Cohort / File(s) Summary
Cargo Dependencies
crates/stdlib/Cargo.toml
In target cfg(not(any(target_os = "android", target_arch = "wasm32"))): removed lzma-sys = "0.1" and xz2 = "0.1"; added liblzma = "0.4" and liblzma-sys = "0.4".
Compression API
crates/stdlib/src/compression.rs
Added pub fn decompressor(&self) -> &D on DecompressState<D> to expose the internal decompressor reference.
LZMA Module Refactor
crates/stdlib/src/lzma.rs
Switched from raw C-FFI constants to liblzma/liblzma_sys types; introduced LzmaStream wrapper with header/check tracking; centralized lzma error translation; added filter parse/serialize helpers (_encode_filter_properties, _decode_filter_properties); added RAW/XZ/ALONE init paths; changed constructor fields/types (mem_limitmemlimit, filters/preset types); exposed many pyattr constants (CHECK_, FILTER_, MF_, MODE_, PRESET_, FORMAT_).

Sequence Diagram(s)

sequenceDiagram
    participant PyClient as Python caller
    participant LZMAType as LZMACompressor / LZMADecompressor
    participant LzmaStream as LzmaStream (wrapper)
    participant LibLzma as liblzma

    PyClient->>LZMAType: construct(args: memlimit, filters, preset, format)
    LZMAType->>LzmaStream: init_xz/init_alone/init_raw(parse filters)
    LzmaStream->>LibLzma: call liblzma init APIs
    LibLzma-->>LzmaStream: stream handle + header/check info
    LzmaStream-->>LZMAType: expose stream (check/eof/unused_data)
    PyClient->>LZMAType: compress()/decompress(data)
    LZMAType->>LzmaStream: process(data)
    LzmaStream->>LibLzma: stream process calls
    LibLzma-->>LzmaStream: processed bytes / status
    LzmaStream-->>LZMAType: result / header/check info
    LZMAType-->>PyClient: return bytes / raise mapped errors
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • gc is vm/stdlib #6929: Modifies the stdlib compression module and touches crates/stdlib/src/compression.rs, likely related to LZMA/compression changes.

Suggested reviewers

  • ShaharNaveh

Poem

🐇

I nibbled headers, wrapped the stream,
chased filters down a bytes-made seam,
memlimit set and checks in sight,
I hopped through init_xz and raw by night,
now compressed dreams sleep soft and light.

🚥 Pre-merge checks | ✅ 1 | ❌ 4

❌ Failed checks (4 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title mentions updating tarfile from v3.14.3, but the actual changes are in LZMA compression modules (lzma.rs and compression.rs), not tarfile. Update the PR title to accurately reflect the actual changes, such as 'Update LZMA compression implementation with liblzma dependencies and expose decompressor interface'.
Linked Issues check ⚠️ Warning The linked issue #7213 is about updating tarfile from v3.14.3, but the actual code changes are in LZMA compression modules unrelated to tarfile. Verify that the correct linked issues are associated with this PR. The changes appear unrelated to tarfile updates and should reference appropriate LZMA-related issues.
Out of Scope Changes check ⚠️ Warning The PR contains extensive changes to LZMA compression (dependency updates, new wrapper types, filter parsing) that are completely unrelated to the stated objective of updating tarfile from v3.14.3. Either update the PR scope documentation and linked issues to match the LZMA changes, or separate the LZMA work into a different PR focused on compression module refactoring.
Docstring Coverage ⚠️ Warning Docstring coverage is 20.93% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin tarfile

@github-actions
Copy link
Contributor

github-actions bot commented Feb 25, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/tarfile.py
[ ] test: cpython/Lib/test/test_tarfile.py

dependencies:

  • tarfile

dependent tests: (2 tests)

  • tarfile: test_shutil test_tarfile

[x] lib: cpython/Lib/lzma.py
[x] test: cpython/Lib/test/test_lzma.py (TODO: 13)

dependencies:

  • lzma

dependent tests: (1 tests)

  • lzma: test_lzma

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone force-pushed the tarfile branch 6 times, most recently from f90575b to c4b01ad Compare February 26, 2026 17:02
@youknowone youknowone marked this pull request as ready for review February 26, 2026 17:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
crates/stdlib/src/lzma.rs (4)

654-659: Prefer expect() over unwrap() for safety-critical unwraps.

While the comment explains the safety, using expect() with a descriptive message provides better documentation and panic messages if the invariant is ever violated.

Proposed fix
                 FORMAT_RAW => {
-                    let filter_specs = args.filters.unwrap(); // safe: checked above
+                    let filter_specs = args.filters.expect("filters required for FORMAT_RAW");
                     let filters = parse_filter_chain_spec(filter_specs, vm)?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 654 - 659, Replace the direct
unwrap() on args.filters in the FORMAT_RAW branch with expect() and a clear
message describing the invariant; locate the code in the FORMAT_RAW arm where
args.filters.unwrap() is called (near parse_filter_chain_spec and
Stream::new_raw_decoder) and change it to args.filters.expect("expected filter
specs for FORMAT_RAW") so the panic message documents the assumption if it ever
fails, leaving the remainder (parse_filter_chain_spec, Stream::new_raw_decoder,
LzmaStream::new) unchanged.

119-131: Consider providing a meaningful message for memory errors.

The empty string passed to vm.new_memory_error("") differs from CPython's behavior which typically provides a descriptive message like "Memory allocation failed".

Proposed fix
-            Error::Mem => vm.new_memory_error(""),
+            Error::Mem => vm.new_memory_error("Memory allocation failed"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 119 - 131, In catch_lzma_error
replace the empty message passed to vm.new_memory_error("") with a meaningful
message (e.g. "Memory allocation failed") so Lzma Error::Mem produces the same
descriptive text as CPython; locate the vm.new_memory_error call in the
catch_lzma_error function and update the argument string accordingly to provide
clear diagnostics.

440-449: Simplify redundant condition checks.

The checks for prop > 40 and prop == 40 can be combined since both return u32::MAX.

Proposed fix
 fn lzma2_dict_size_from_prop(prop: u8) -> u32 {
-    if prop > 40 {
-        return u32::MAX;
-    }
-    if prop == 40 {
+    if prop >= 40 {
         return u32::MAX;
     }
     let prop = prop as u32;
     (2 | (prop & 1)) << (prop / 2 + 11)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 440 - 449, In
lzma2_dict_size_from_prop, remove the redundant separate checks for prop > 40
and prop == 40 and replace them with a single check that returns u32::MAX when
prop >= 40; keep the subsequent cast (let prop = prop as u32) and the existing
return expression ((2 | (prop & 1)) << (prop / 2 + 11)) unchanged so behavior
and types remain identical.

799-816: Duplicated branches in init_alone.

Both the if let Some(_filter_specs) and else branches execute identical code. The TODO comment indicates filter options should be validated, but currently the filter specs are ignored. Consider either implementing the TODO or removing the dead branch.

Proposed simplification (if keeping current behavior)
     fn init_alone(
         preset: u32,
-        filter_specs: Option<Vec<PyObjectRef>>,
+        _filter_specs: Option<Vec<PyObjectRef>>,
         vm: &VirtualMachine,
     ) -> PyResult<Stream> {
-        if let Some(_filter_specs) = filter_specs {
-            // TODO: validate single LZMA1 filter and use its options
-            let options = LzmaOptions::new_preset(preset).map_err(|_| {
-                new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-            })?;
-            Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-        } else {
-            let options = LzmaOptions::new_preset(preset).map_err(|_| {
-                new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-            })?;
-            Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-        }
+        // TODO: validate single LZMA1 filter and use its options from filter_specs
+        let options = LzmaOptions::new_preset(preset).map_err(|_| {
+            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
+        })?;
+        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 799 - 816, The init_alone function
currently has duplicated if/else branches that both ignore filter_specs; either
implement the TODO to validate a single LZMA1 filter and use its options or
remove the conditional and simplify to a single path. Specifically, update
init_alone (the filter_specs parameter) to parse and validate the provided
filter_specs into LZMA options (or document/raise a clear error if unsupported),
then call LzmaOptions::new_preset or construct options from parsed filter data
and pass them to Stream::new_lzma_encoder (using catch_lzma_error/new_lzma_error
as needed); if you choose not to implement filter handling, collapse the if/else
into one branch that ignores filter_specs and returns
Stream::new_lzma_encoder(&options).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 568-570: The FILTER_DELTA arm currently computes (dist - 1)
without validating dist, allowing dist=0 to underflow; update this arm to mirror
parse_filter_spec_delta's validation by retrieving dist via get_dict_opt_u32 (or
using the same default), then check that 1 <= dist && dist <= 256 and return an
appropriate error if not; only after validation compute (dist - 1) as u8 and
return Ok(vec![(dist - 1) as u8]).

---

Nitpick comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 654-659: Replace the direct unwrap() on args.filters in the
FORMAT_RAW branch with expect() and a clear message describing the invariant;
locate the code in the FORMAT_RAW arm where args.filters.unwrap() is called
(near parse_filter_chain_spec and Stream::new_raw_decoder) and change it to
args.filters.expect("expected filter specs for FORMAT_RAW") so the panic message
documents the assumption if it ever fails, leaving the remainder
(parse_filter_chain_spec, Stream::new_raw_decoder, LzmaStream::new) unchanged.
- Around line 119-131: In catch_lzma_error replace the empty message passed to
vm.new_memory_error("") with a meaningful message (e.g. "Memory allocation
failed") so Lzma Error::Mem produces the same descriptive text as CPython;
locate the vm.new_memory_error call in the catch_lzma_error function and update
the argument string accordingly to provide clear diagnostics.
- Around line 440-449: In lzma2_dict_size_from_prop, remove the redundant
separate checks for prop > 40 and prop == 40 and replace them with a single
check that returns u32::MAX when prop >= 40; keep the subsequent cast (let prop
= prop as u32) and the existing return expression ((2 | (prop & 1)) << (prop / 2
+ 11)) unchanged so behavior and types remain identical.
- Around line 799-816: The init_alone function currently has duplicated if/else
branches that both ignore filter_specs; either implement the TODO to validate a
single LZMA1 filter and use its options or remove the conditional and simplify
to a single path. Specifically, update init_alone (the filter_specs parameter)
to parse and validate the provided filter_specs into LZMA options (or
document/raise a clear error if unsupported), then call LzmaOptions::new_preset
or construct options from parsed filter data and pass them to
Stream::new_lzma_encoder (using catch_lzma_error/new_lzma_error as needed); if
you choose not to implement filter handling, collapse the if/else into one
branch that ignores filter_specs and returns Stream::new_lzma_encoder(&options).

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c98215a and 31d7d58.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/tarfile.py is excluded by !Lib/**
  • Lib/test/test_lzma.py is excluded by !Lib/**
  • Lib/test/test_tarfile.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/compression.rs
  • crates/stdlib/src/lzma.rs

@youknowone youknowone force-pushed the tarfile branch 3 times, most recently from 07632c7 to 3347f29 Compare February 26, 2026 18:28
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 197-199: The error message is misleading: the code uses
spec.downcast_ref::<PyDict>() and therefore requires an actual PyDict, so update
the TypeError text to say "Filter specifier must be a dict" (or similarly
explicit) in the closure passed to downcast_ref::<PyDict>() where `spec` is
downcast and assigned to `dict`; make the same change in the other occurrence(s)
that use downcast_ref::<PyDict>() (the block around where `dict` is created and
the later similar helper) so both helpers consistently require a dict.
- Around line 741-752: The code path for FORMAT_ALONE currently ignores
user-provided filter_specs: when filter_specs is Some it still uses
LzmaOptions::new_preset(preset) and calls Stream::new_lzma_encoder(&options),
silently dropping filters; change this to fail-fast by detecting
Some(filter_specs) and returning a descriptive LZMA error (use
new_lzma_error(..., vm)) indicating FORMAT_ALONE does not support custom filters
yet, instead of proceeding to build options from the preset; update the branch
that currently matches if let Some(_filter_specs) = filter_specs to return that
error and leave the existing preset-based logic only in the else branch.
- Around line 446-465: The _decode_filter_properties implementation accepts
malformed property blobs; enforce strict length checks: for FILTER_LZMA2 and
FILTER_DELTA ensure props.len() == 1 (return Err(new_lzma_error(...)) if not)
before accessing props[0], and for BCJ filters (FILTER_X86, FILTER_POWERPC,
FILTER_IA64, FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) only accept props.len()
== 0 or props.len() == 4 and otherwise return Err(new_lzma_error(...)); update
the branches that currently only check props.is_empty() or len()==4 so they
validate lengths and use new_lzma_error to signal invalid/unsupported options
instead of silently decoding invalid encoded_props.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31d7d58 and 3347f29.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/tarfile.py is excluded by !Lib/**
  • Lib/test/test_lzma.py is excluded by !Lib/**
  • Lib/test/test_tarfile.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/compression.rs
  • crates/stdlib/src/lzma.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/src/compression.rs

Comment on lines +446 to +465
FILTER_LZMA2 => {
if props.is_empty() {
return Err(new_lzma_error("Invalid or unsupported options", vm));
}
let dict_size = lzma2_dict_size_from_prop(props[0]);
dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
}
FILTER_DELTA => {
if props.is_empty() {
return Err(new_lzma_error("Invalid or unsupported options", vm));
}
let dist = props[0] as u32 + 1;
dict.set_item("dist", vm.new_pyobj(dist), vm)?;
}
FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
| FILTER_SPARC => {
if props.len() == 4 {
let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

_decode_filter_properties accepts malformed property blobs.

FILTER_LZMA2 and FILTER_DELTA only reject empty input, and BCJ filters accept non-0/4 lengths silently. This can decode invalid encoded_props instead of raising LZMAError.

Proposed fix
         match filter_id {
             FILTER_LZMA1 => {
                 let (lc, lp, pb, dict_size) = decode_lzma1_properties(props)
                     .ok_or_else(|| new_lzma_error("Invalid or unsupported options", vm))?;
                 dict.set_item("lc", vm.new_pyobj(lc), vm)?;
                 dict.set_item("lp", vm.new_pyobj(lp), vm)?;
                 dict.set_item("pb", vm.new_pyobj(pb), vm)?;
                 dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
             }
             FILTER_LZMA2 => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dict_size = lzma2_dict_size_from_prop(props[0]);
                 dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
             }
             FILTER_DELTA => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dist = props[0] as u32 + 1;
                 dict.set_item("dist", vm.new_pyobj(dist), vm)?;
             }
             FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
             | FILTER_SPARC => {
-                if props.len() == 4 {
+                if props.is_empty() {
+                    // default start_offset = 0
+                } else if props.len() == 4 {
                     let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
                     dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
+                } else {
+                    return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
             }
             _ => {
                 return Err(vm.new_value_error(format!("Invalid filter ID: {filter_id}")));
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 446 - 465, The
_decode_filter_properties implementation accepts malformed property blobs;
enforce strict length checks: for FILTER_LZMA2 and FILTER_DELTA ensure
props.len() == 1 (return Err(new_lzma_error(...)) if not) before accessing
props[0], and for BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64,
FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) only accept props.len() == 0 or
props.len() == 4 and otherwise return Err(new_lzma_error(...)); update the
branches that currently only check props.is_empty() or len()==4 so they validate
lengths and use new_lzma_error to signal invalid/unsupported options instead of
silently decoding invalid encoded_props.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/stdlib/src/lzma.rs (1)

386-395: Redundant conditional branches.

Lines 387-392 have two separate checks that both return u32::MAX. These can be combined.

Proposed simplification
 fn lzma2_dict_size_from_prop(prop: u8) -> u32 {
-    if prop > 40 {
-        return u32::MAX;
-    }
-    if prop == 40 {
+    if prop >= 40 {
         return u32::MAX;
     }
     let prop = prop as u32;
     (2 | (prop & 1)) << (prop / 2 + 11)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 386 - 395, The function
lzma2_dict_size_from_prop has redundant checks for prop > 40 and prop == 40 that
both return u32::MAX; replace them with a single combined check (e.g., if prop
>= 40 { return u32::MAX; }) in lzma2_dict_size_from_prop and remove the
duplicate branch so the remaining logic (casting prop to u32 and computing (2 |
(prop & 1)) << (prop / 2 + 11)) is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 83-94: The MemoryError created in catch_lzma_error uses an empty
message which is unhelpful; update the branch handling Error::Mem to call
vm.new_memory_error with a descriptive string (e.g., "Memory allocation failed")
so that catch_lzma_error returns a meaningful message for the
vm.new_memory_error(...) call rather than an empty string.

---

Nitpick comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 386-395: The function lzma2_dict_size_from_prop has redundant
checks for prop > 40 and prop == 40 that both return u32::MAX; replace them with
a single combined check (e.g., if prop >= 40 { return u32::MAX; }) in
lzma2_dict_size_from_prop and remove the duplicate branch so the remaining logic
(casting prop to u32 and computing (2 | (prop & 1)) << (prop / 2 + 11)) is
unchanged.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3347f29 and d4b67a8.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/test/test_lzma.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/compression.rs
  • crates/stdlib/src/lzma.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/src/compression.rs

Comment on lines +83 to 94
fn catch_lzma_error(err: Error, vm: &VirtualMachine) -> PyBaseExceptionRef {
match err {
Error::UnsupportedCheck => new_lzma_error("Unsupported integrity check", vm),
Error::Mem => vm.new_memory_error(""),
Error::MemLimit => new_lzma_error("Memory usage limit exceeded", vm),
Error::Format => new_lzma_error("Input format not supported by decoder", vm),
Error::Options => new_lzma_error("Invalid or unsupported options", vm),
Error::Data => new_lzma_error("Corrupt input data", vm),
Error::Program => new_lzma_error("Internal error", vm),
Error::NoCheck => new_lzma_error("Corrupt input data", vm),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Empty message in memory error may be unhelpful.

Line 86 creates a memory error with an empty string. Consider providing a meaningful message like "Memory allocation failed" to help users diagnose issues.

Proposed fix
-            Error::Mem => vm.new_memory_error(""),
+            Error::Mem => vm.new_memory_error("Memory allocation failed"),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn catch_lzma_error(err: Error, vm: &VirtualMachine) -> PyBaseExceptionRef {
match err {
Error::UnsupportedCheck => new_lzma_error("Unsupported integrity check", vm),
Error::Mem => vm.new_memory_error(""),
Error::MemLimit => new_lzma_error("Memory usage limit exceeded", vm),
Error::Format => new_lzma_error("Input format not supported by decoder", vm),
Error::Options => new_lzma_error("Invalid or unsupported options", vm),
Error::Data => new_lzma_error("Corrupt input data", vm),
Error::Program => new_lzma_error("Internal error", vm),
Error::NoCheck => new_lzma_error("Corrupt input data", vm),
}
}
fn catch_lzma_error(err: Error, vm: &VirtualMachine) -> PyBaseExceptionRef {
match err {
Error::UnsupportedCheck => new_lzma_error("Unsupported integrity check", vm),
Error::Mem => vm.new_memory_error("Memory allocation failed"),
Error::MemLimit => new_lzma_error("Memory usage limit exceeded", vm),
Error::Format => new_lzma_error("Input format not supported by decoder", vm),
Error::Options => new_lzma_error("Invalid or unsupported options", vm),
Error::Data => new_lzma_error("Corrupt input data", vm),
Error::Program => new_lzma_error("Internal error", vm),
Error::NoCheck => new_lzma_error("Corrupt input data", vm),
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 83 - 94, The MemoryError created in
catch_lzma_error uses an empty message which is unhelpful; update the branch
handling Error::Mem to call vm.new_memory_error with a descriptive string (e.g.,
"Memory allocation failed") so that catch_lzma_error returns a meaningful
message for the vm.new_memory_error(...) call rather than an empty string.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
crates/stdlib/src/lzma.rs (4)

211-213: ⚠️ Potential issue | 🟡 Minor

Error text says "dict or dict-like object" but only PyDict is accepted.

The code uses downcast_ref::<PyDict>() which only accepts actual dict instances, not mapping-like objects. Either support mapping protocol or tighten the message to "must be a dict".

Option 1: Tighten the message
         let dict = spec.downcast_ref::<PyDict>().ok_or_else(|| {
-            vm.new_type_error("Filter specifier must be a dict or dict-like object")
+            vm.new_type_error("Filter specifier must be a dict")
         })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 211 - 213, The error message claims
"dict or dict-like object" but the code only accepts actual dicts via
spec.downcast_ref::<PyDict>(), so change the behavior or the message; simplest
fix: tighten the TypeError text used in vm.new_type_error to "Filter specifier
must be a dict" (replace the current message) so it accurately reflects that
only PyDict instances are accepted by the downcast_ref::<PyDict>() check (update
the string passed to vm.new_type_error where spec.downcast_ref::<PyDict>() is
called).

100-100: ⚠️ Potential issue | 🟡 Minor

Empty message in memory error is unhelpful.

The Error::Mem case creates a memory error with an empty string. Consider providing a meaningful message like "Memory allocation failed" to help users diagnose issues.

Proposed fix
-            Error::Mem => vm.new_memory_error(""),
+            Error::Mem => vm.new_memory_error("Memory allocation failed"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` at line 100, The Error::Mem branch returns a
memory error with an empty message; update the Error::Mem handling in lzma.rs to
call vm.new_memory_error with a descriptive message (e.g., "Memory allocation
failed") so the returned error includes useful context; locate the Error::Mem
match arm and replace the empty string argument to vm.new_memory_error
accordingly.

754-765: ⚠️ Potential issue | 🟠 Major

FORMAT_ALONE silently ignores user-provided filters.

Both branches execute identical code—the filter_specs parameter is captured but never used. This silently drops caller configuration. The TODO comment acknowledges this, but silent data loss is problematic.

Proposed fix: fail fast until full support is implemented
     fn init_alone(
         preset: u32,
         filter_specs: Option<Vec<PyObjectRef>>,
         vm: &VirtualMachine,
     ) -> PyResult<Stream> {
-        if let Some(_filter_specs) = filter_specs {
-            // TODO: validate single LZMA1 filter and use its options
-            let options = LzmaOptions::new_preset(preset).map_err(|_| {
-                new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-            })?;
-            Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-        } else {
-            let options = LzmaOptions::new_preset(preset).map_err(|_| {
-                new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-            })?;
-            Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-        }
+        if filter_specs.is_some() {
+            return Err(new_lzma_error(
+                "FORMAT_ALONE with custom filters is not yet supported",
+                vm,
+            ));
+        }
+        let options = LzmaOptions::new_preset(preset).map_err(|_| {
+            new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
+        })?;
+        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 754 - 765, The code currently ignores
provided filter_specs in the FORMAT_ALONE branch (the if let Some(_filter_specs)
= filter_specs block) and proceeds to call LzmaOptions::new_preset(...) and
Stream::new_lzma_encoder(...), causing silent loss of caller configuration;
change this to fail fast by returning a new LZMA error when filter_specs is
Some(...) (e.g., construct and return new_lzma_error("Filters not supported for
FORMAT_ALONE", vm) via map_err or early return) instead of proceeding to use
LzmaOptions::new_preset and Stream::new_lzma_encoder; keep the existing behavior
for the None case unchanged and reference the functions LzmaOptions::new_preset
and Stream::new_lzma_encoder and the variable filter_specs to locate the change.

460-479: ⚠️ Potential issue | 🟠 Major

build_filter_spec accepts malformed property blobs.

FILTER_LZMA2 and FILTER_DELTA only reject empty input, allowing extra bytes to be silently ignored. BCJ filters accept non-0/4 lengths silently. This can decode invalid encoded_props instead of raising LZMAError.

Proposed fix with strict length validation
             FILTER_LZMA2 => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dict_size = lzma2_dict_size_from_prop(props[0]);
                 dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
             }
             FILTER_DELTA => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dist = props[0] as u32 + 1;
                 dict.set_item("dist", vm.new_pyobj(dist), vm)?;
             }
             FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
             | FILTER_SPARC => {
-                if props.len() == 4 {
+                if props.is_empty() {
+                    // default start_offset = 0, no entry added
+                } else if props.len() == 4 {
                     let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
                     dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
+                } else {
+                    return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 460 - 479, build_filter_spec
currently accepts malformed property blobs: enforce strict length checks for
FILTER_LZMA2 and FILTER_DELTA to require exactly one byte (props.len() == 1) and
return Err(new_lzma_error("Invalid or unsupported options", vm)) otherwise; for
BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64, FILTER_ARM,
FILTER_ARMTHUMB, FILTER_SPARC) require props.len() == 0 || props.len() == 4 and
return the same LZMAError when the length is anything else; update the
corresponding match arms (the FILTER_LZMA2, FILTER_DELTA arms and the BCJ arm)
to perform these checks before decoding values and calling dict.set_item.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 211-213: The error message claims "dict or dict-like object" but
the code only accepts actual dicts via spec.downcast_ref::<PyDict>(), so change
the behavior or the message; simplest fix: tighten the TypeError text used in
vm.new_type_error to "Filter specifier must be a dict" (replace the current
message) so it accurately reflects that only PyDict instances are accepted by
the downcast_ref::<PyDict>() check (update the string passed to
vm.new_type_error where spec.downcast_ref::<PyDict>() is called).
- Line 100: The Error::Mem branch returns a memory error with an empty message;
update the Error::Mem handling in lzma.rs to call vm.new_memory_error with a
descriptive message (e.g., "Memory allocation failed") so the returned error
includes useful context; locate the Error::Mem match arm and replace the empty
string argument to vm.new_memory_error accordingly.
- Around line 754-765: The code currently ignores provided filter_specs in the
FORMAT_ALONE branch (the if let Some(_filter_specs) = filter_specs block) and
proceeds to call LzmaOptions::new_preset(...) and Stream::new_lzma_encoder(...),
causing silent loss of caller configuration; change this to fail fast by
returning a new LZMA error when filter_specs is Some(...) (e.g., construct and
return new_lzma_error("Filters not supported for FORMAT_ALONE", vm) via map_err
or early return) instead of proceeding to use LzmaOptions::new_preset and
Stream::new_lzma_encoder; keep the existing behavior for the None case unchanged
and reference the functions LzmaOptions::new_preset and Stream::new_lzma_encoder
and the variable filter_specs to locate the change.
- Around line 460-479: build_filter_spec currently accepts malformed property
blobs: enforce strict length checks for FILTER_LZMA2 and FILTER_DELTA to require
exactly one byte (props.len() == 1) and return Err(new_lzma_error("Invalid or
unsupported options", vm)) otherwise; for BCJ filters (FILTER_X86,
FILTER_POWERPC, FILTER_IA64, FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) require
props.len() == 0 || props.len() == 4 and return the same LZMAError when the
length is anything else; update the corresponding match arms (the FILTER_LZMA2,
FILTER_DELTA arms and the BCJ arm) to perform these checks before decoding
values and calling dict.set_item.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4b67a8 and 047f5ec.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/test/test_lzma.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/compression.rs
  • crates/stdlib/src/lzma.rs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
crates/stdlib/src/lzma.rs (4)

98-102: ⚠️ Potential issue | 🟡 Minor

Use a non-empty message for MemoryError.

Error::Mem currently produces MemoryError(""), which is hard to diagnose in user reports.

Suggested fix
-            Error::Mem => vm.new_memory_error(""),
+            Error::Mem => vm.new_memory_error("Memory allocation failed"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 98 - 102, In catch_lzma_error,
Error::Mem returns an empty MemoryError message which is unhelpful; update the
Error::Mem arm in the catch_lzma_error function to call vm.new_memory_error with
a descriptive non-empty string (e.g., "LZMA decompression memory exhausted" or
similar) so MemoryError carries useful context when raised.

461-481: ⚠️ Potential issue | 🟠 Major

build_filter_spec accepts malformed property lengths.

FILTER_LZMA2/FILTER_DELTA should require exactly 1 byte, and BCJ filters should accept only 0 or 4 bytes; current logic silently accepts invalid blobs.

Suggested fix
             FILTER_LZMA2 => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dict_size = lzma2_dict_size_from_prop(props[0]);
                 dict.set_item("dict_size", vm.new_pyobj(dict_size), vm)?;
             }
             FILTER_DELTA => {
-                if props.is_empty() {
+                if props.len() != 1 {
                     return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
                 let dist = props[0] as u32 + 1;
                 dict.set_item("dist", vm.new_pyobj(dist), vm)?;
             }
             FILTER_X86 | FILTER_POWERPC | FILTER_IA64 | FILTER_ARM | FILTER_ARMTHUMB
             | FILTER_SPARC => {
-                if props.len() == 4 {
+                if props.is_empty() {
+                    // default start_offset = 0
+                } else if props.len() == 4 {
                     let start_offset = u32::from_le_bytes([props[0], props[1], props[2], props[3]]);
                     dict.set_item("start_offset", vm.new_pyobj(start_offset), vm)?;
+                } else {
+                    return Err(new_lzma_error("Invalid or unsupported options", vm));
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 461 - 481, The build_filter_spec
branch handling FILTER_LZMA2, FILTER_DELTA, and BCJ filters (FILTER_X86,
FILTER_POWERPC, FILTER_IA64, FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC)
currently accepts malformed props lengths; update it so FILTER_LZMA2 and
FILTER_DELTA require props.len() == 1 (otherwise return
Err(new_lzma_error("Invalid or unsupported options", vm))) and the BCJ branches
only accept props.len() == 0 or props.len() == 4 (if 4 parse start_offset, else
if 0 do nothing; any other length return the same LZMA error). Keep the existing
dict.set_item calls but gate them behind these strict length checks in
build_filter_spec.

755-766: ⚠️ Potential issue | 🟠 Major

FORMAT_ALONE currently ignores caller-provided filters.

When filter_specs is Some, this path still builds encoder options only from preset, silently dropping user configuration.

Suggested fix
     fn init_alone(
         preset: u32,
         filter_specs: Option<Vec<PyObjectRef>>,
         vm: &VirtualMachine,
     ) -> PyResult<Stream> {
-        if let Some(_filter_specs) = filter_specs {
-            // TODO: validate single LZMA1 filter and use its options
-            let options = LzmaOptions::new_preset(preset).map_err(|_| {
-                new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-            })?;
-            Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-        } else {
-            let options = LzmaOptions::new_preset(preset).map_err(|_| {
-                new_lzma_error(format!("Invalid compression preset: {preset}"), vm)
-            })?;
-            Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
-        }
+        if filter_specs.is_some() {
+            return Err(new_lzma_error(
+                "FORMAT_ALONE with custom filters is not implemented",
+                vm,
+            ));
+        }
+        let options = LzmaOptions::new_preset(preset)
+            .map_err(|_| new_lzma_error(format!("Invalid compression preset: {preset}"), vm))?;
+        Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm))
     }
As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 755 - 766, The branch currently
ignores caller-provided filters: when filter_specs is Some you still call
LzmaOptions::new_preset(preset) and drop the filters; instead, compute the LZMA
encoder options once by first deriving options depending on filter_specs
(validate that filter_specs contains a single LZMA1 filter and build LzmaOptions
from its options when Some, falling back to LzmaOptions::new_preset(preset) when
None), then call Stream::new_lzma_encoder(&options).map_err(|e|
catch_lzma_error(e, vm)) a single time; update the code around filter_specs,
LzmaOptions::new_preset and Stream::new_lzma_encoder to remove the duplicated
logic and correctly honor the caller-provided filters.

207-233: ⚠️ Potential issue | 🟡 Minor

TypeError text over-promises dict-like support.

Both helpers require PyDict via downcast_ref::<PyDict>(), but the message says “dict or dict-like object”.

Suggested fix
-        let dict = spec.downcast_ref::<PyDict>().ok_or_else(|| {
-            vm.new_type_error("Filter specifier must be a dict or dict-like object")
-        })?;
+        let dict = spec
+            .downcast_ref::<PyDict>()
+            .ok_or_else(|| vm.new_type_error("Filter specifier must be a dict"))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/stdlib/src/lzma.rs` around lines 207 - 233, The TypeError message in
get_dict_opt_u32 and get_dict_opt_u64 incorrectly promises "dict-like" support
while downcasting only to PyDict; update the vm.new_type_error call in both
functions to state that the filter specifier "must be a dict" (or alternatively,
implement true mapping support by accepting mapping protocol), so the error text
matches the actual check performed by the downcast_ref::<PyDict>() used in
get_dict_opt_u32 and get_dict_opt_u64.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/stdlib/src/lzma.rs`:
- Around line 98-102: In catch_lzma_error, Error::Mem returns an empty
MemoryError message which is unhelpful; update the Error::Mem arm in the
catch_lzma_error function to call vm.new_memory_error with a descriptive
non-empty string (e.g., "LZMA decompression memory exhausted" or similar) so
MemoryError carries useful context when raised.
- Around line 461-481: The build_filter_spec branch handling FILTER_LZMA2,
FILTER_DELTA, and BCJ filters (FILTER_X86, FILTER_POWERPC, FILTER_IA64,
FILTER_ARM, FILTER_ARMTHUMB, FILTER_SPARC) currently accepts malformed props
lengths; update it so FILTER_LZMA2 and FILTER_DELTA require props.len() == 1
(otherwise return Err(new_lzma_error("Invalid or unsupported options", vm))) and
the BCJ branches only accept props.len() == 0 or props.len() == 4 (if 4 parse
start_offset, else if 0 do nothing; any other length return the same LZMA
error). Keep the existing dict.set_item calls but gate them behind these strict
length checks in build_filter_spec.
- Around line 755-766: The branch currently ignores caller-provided filters:
when filter_specs is Some you still call LzmaOptions::new_preset(preset) and
drop the filters; instead, compute the LZMA encoder options once by first
deriving options depending on filter_specs (validate that filter_specs contains
a single LZMA1 filter and build LzmaOptions from its options when Some, falling
back to LzmaOptions::new_preset(preset) when None), then call
Stream::new_lzma_encoder(&options).map_err(|e| catch_lzma_error(e, vm)) a single
time; update the code around filter_specs, LzmaOptions::new_preset and
Stream::new_lzma_encoder to remove the duplicated logic and correctly honor the
caller-provided filters.
- Around line 207-233: The TypeError message in get_dict_opt_u32 and
get_dict_opt_u64 incorrectly promises "dict-like" support while downcasting only
to PyDict; update the vm.new_type_error call in both functions to state that the
filter specifier "must be a dict" (or alternatively, implement true mapping
support by accepting mapping protocol), so the error text matches the actual
check performed by the downcast_ref::<PyDict>() used in get_dict_opt_u32 and
get_dict_opt_u64.

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 047f5ec and bcdb334.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/tarfile.py is excluded by !Lib/**
  • Lib/test/test_lzma.py is excluded by !Lib/**
  • Lib/test/test_tarfile.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/compression.rs
  • crates/stdlib/src/lzma.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/Cargo.toml

@youknowone youknowone merged commit 89fb349 into RustPython:main Feb 27, 2026
24 of 25 checks passed
@youknowone youknowone deleted the tarfile branch February 27, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant