Detect list mutation during sort even when list length is unchanged by rlaisqls · Pull Request #7432 · RustPython/RustPython

rlaisqls · 2026-03-14T11:42:08Z

The previous check !elements.is_empty() only verified length, so it missed length-preserving mutations like L.append(3); L.pop().

Fix mutation detection in list.sort() to catch modifications that don't change the list length.

Detail

Before sorting, sort() replaces the list's internal Vec with an empty Vec (length=0, capacity=0) via core::mem::take. After sorting, it swaps back and checks whether the temporary Vec was touched.

Changed the condition to elements.capacity() > 0. Any write operation triggers allocation, increasing capacity — and capacity never shrinks back to 0 even after removing all elements. This also subsumes the old !is_empty() check.

Update: capacity() > 0 has a blind spot. L.append(x); L.clear() goes undetected because clear() uses mem::take, which resets capacity back to 0. Looking for a better approach.

For reference, CPython uses a similar approach: it sets allocated = -1 (an impossible value) as a sentinel before sorting, then checks if it changed afterward.

Related Tests

test_bug453523 — removed @unittest.skip, now passes
test_undetected_mutation — removed @unittest.expectedFailure, now passes
Full test_sort suite: all 21 tests pass

Summary by CodeRabbit

Bug Fixes
- Improved validation logic for list sorting operations to better detect concurrent modifications during sort execution.

coderabbitai · 2026-03-14T11:42:26Z

📝 Walkthrough

Walkthrough

The validation condition in PyList::sort was modified to check vector capacity exceeding zero instead of checking for non-empty state, altering how concurrent modification during sorting is detected. The control flow and error handling remain unchanged.

Changes

Cohort / File(s)	Summary
Sort Validation Logic `crates/vm/src/builtins/list.rs`	Modified post-sort concurrent modification detection in `PyList::sort` to check `capacity() > 0` instead of `!is_empty()` for the temporary elements vector.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A rabbit hops through sorting code,
Where capacity now lights the road,
No longer checking if it's bare,
But if there's space still lingering there! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly describes the main change: improving mutation detection during sorting to catch cases where list length remains unchanged, which is exactly what the code change and PR objectives accomplish.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-14T11:42:57Z

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/modulefinder.py
[x] test: cpython/Lib/test/test_modulefinder.py (TODO: 2)

dependencies:

modulefinder

dependent tests: (2 tests)

modulefinder: test_importlib test_modulefinder

[x] lib: cpython/Lib/poplib.py
[x] test: cpython/Lib/test/test_poplib.py

dependencies:

poplib

dependent tests: (1 tests)

poplib: test_poplib

[x] test: cpython/Lib/test/test_builtin.py (TODO: 25)

dependencies:

dependent tests: (no tests depend on builtin)

[ ] test: cpython/Lib/test/test_decorators.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on decorators)

[x] lib: cpython/Lib/ftplib.py
[x] test: cpython/Lib/test/test_ftplib.py (TODO: 4)

dependencies:

ftplib

dependent tests: (34 tests)

ftplib: test_ftplib test_urllib2
- urllib.request: test_http_cookiejar test_pathlib test_pydoc test_sax test_site test_ssl test_urllib test_urllib2_localnet test_urllib2net test_urllibnet
  - pathlib: test_ast test_dbm_sqlite3 test_ensurepip test_httpservers test_importlib test_launcher test_logging test_pathlib test_pkgutil test_runpy test_tarfile test_tempfile test_tomllib test_tools test_traceback test_unparse test_venv test_winapi test_zipapp test_zipfile test_zoneinfo test_zstd

[x] test: cpython/Lib/test/test_gc.py

dependencies:

dependent tests: (169 tests)

[ ] test: cpython/Lib/test/test_generators.py (TODO: 12)
[ ] test: cpython/Lib/test/test_genexps.py
[x] test: cpython/Lib/test/test_generator_stop.py (TODO: 1)
[ ] test: cpython/Lib/test/test_yield_from.py (TODO: 6)

dependencies:

dependent tests: (no tests depend on generator)

[x] lib: cpython/Lib/inspect.py
[ ] test: cpython/Lib/test/test_inspect (TODO: 39)

dependencies:

inspect

dependent tests: (86 tests)

[ ] test: cpython/Lib/test/test_marshal.py (TODO: 21)

dependencies:

dependent tests: (21 tests)

marshal: test_bool test_exceptions test_importlib test_inspect test_marshal test_zipimport
- importlib._bootstrap_external: test_importlib test_unittest
  - modulefinder: test_importlib test_modulefinder
  - py_compile: test_argparse test_cmd_line_script test_compileall test_importlib test_multiprocessing_main_handling test_py_compile test_pydoc test_runpy
  - pydoc: test_enum
- pkgutil: test_pkgutil
- zipimport: test_importlib

[ ] test: cpython/Lib/test/test_sort.py

dependencies:

dependent tests: (no tests depend on sort)

[x] lib: cpython/Lib/sqlite3
[ ] test: cpython/Lib/test/test_sqlite3 (TODO: 77)

dependencies:

sqlite3

dependent tests: (2 tests)

sqlite3: test_dbm_sqlite3 test_sqlite3

[x] lib: cpython/Lib/symtable.py
[ ] test: cpython/Lib/test/test_symtable.py (TODO: 17)

dependencies:

symtable

dependent tests: (2 tests)

symtable: test_inspect test_symtable

[ ] test: cpython/Lib/test/test_unpack.py (TODO: 1)
[ ] test: cpython/Lib/test/test_unpack_ex.py (TODO: 11)

dependencies:

dependent tests: (no tests depend on unpack)

[x] lib: cpython/Lib/weakref.py
[x] lib: cpython/Lib/_weakrefset.py
[x] test: cpython/Lib/test/test_weakref.py (TODO: 20)
[ ] test: cpython/Lib/test/test_weakset.py

dependencies:

weakref

dependent tests: (205 tests)

Legend:

[+] path exists in CPython
[x] up-to-date, [ ] outdated

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/builtins/list.rs`:
- Around line 399-401: The current post-sort check using elements.capacity() is
unreliable; replace it with an explicit mutation sentinel/version flag on the
list that is set when entering the sort window and cleared on exit, update that
sentinel in every mutating API (e.g., append, clear, extend, insert, pop,
remove, etc.), and after invoking the comparison callback verify the
sentinel/version value still matches the original—if it changed call
vm.new_value_error("list modified during sort"); ensure the sentinel is robust
across mem::take-style operations so any mutation during the sort is always
detected.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: cb62b91d-f14d-458c-b20c-30c0f3a94d69

📥 Commits

Reviewing files that changed from the base of the PR and between 005860c and baac056.

⛔ Files ignored due to path filters (1)

Lib/test/test_sort.py is excluded by !Lib/**

📒 Files selected for processing (1)

crates/vm/src/builtins/list.rs

coderabbitai · 2026-03-14T11:47:45Z

crates/vm/src/builtins/list.rs

+        if elements.capacity() > 0 {
            return Err(vm.new_value_error("list modified during sort"));
        }


⚠️ Potential issue | 🟠 Major

capacity() > 0 still allows mutation false negatives.

This condition can miss real mutations when callback code mutates and then restores capacity to zero (for example L.append(1); L.clear()), because clear() uses mem::take at Line 235 and resets capacity back to 0. In that case, sort can incorrectly return Ok(()) instead of raising ValueError.

Please replace the capacity heuristic with an explicit mutation sentinel/version flag during the sort window so any mutating API call is detected regardless of final Vec capacity.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/builtins/list.rs` around lines 399 - 401, The current post-sort check using elements.capacity() is unreliable; replace it with an explicit mutation sentinel/version flag on the list that is set when entering the sort window and cleared on exit, update that sentinel in every mutating API (e.g., append, clear, extend, insert, pop, remove, etc.), and after invoking the comparison callback verify the sentinel/version value still matches the original—if it changed call vm.new_value_error("list modified during sort"); ensure the sentinel is robust across mem::take-style operations so any mutation during the sort is always detected.

Oh.. I missed that case.

Some possible alternatives I can think of:

Add a mutation flag to PyList and set it via borrow_vec_mut()

Change clear() to use Vec::clear() instead of mem::take (preserves capacity, but delays memory deallocation. However... it affects memory usage, so it is not a good method.)

etc

Any suggestions on how best to handle this?

@rlaisqls, great question — the mutation counter approach is the closest Rust equivalent to CPython's sentinel strategy and is the most robust. Here's a concrete plan:

Recommended: per-list mutation counter

Add an AtomicUsize field to PyList, increment it in every mutating method, and compare the snapshot taken before sorting to the value after. This is a direct parallel to CPython's allocated = -1 sentinel, and it handles all mutation shapes (append+clear, append+pop, setitem, delitem, etc.).

// 1. Extend the struct #[pyclass(module = false, name = "list", unhashable = true, traverse = "manual")] #[derive(Default)] pub struct PyList { elements: PyRwLock<Vec<PyObjectRef>>, mutation_count: core::sync::atomic::AtomicUsize, }

// 2. Add a private helper impl PyList { #[inline] fn notify_mutation(&self) { self.mutation_count.fetch_add(1, core::sync::atomic::Ordering::Relaxed); } }

// 3. Call notify_mutation() in every mutating method #[pymethod] pub(crate) fn append(&self, x: PyObjectRef) { self.notify_mutation(); self.borrow_vec_mut().push(x); } #[pymethod] fn clear(&self) { self.notify_mutation(); let _removed = core::mem::take(self.borrow_vec_mut().deref_mut()); } #[pymethod] pub(crate) fn extend(&self, x: PyObjectRef, vm: &VirtualMachine) -> PyResult<()> { let mut new_elements = x.try_to_value(vm)?; self.notify_mutation(); self.borrow_vec_mut().append(&mut new_elements); Ok(()) } // … similarly for insert, pop, remove, reverse, // _setitem, _delitem, __iadd__, irepeat, etc.

// 4. Update sort() to use the counter #[pymethod] pub(crate) fn sort(&self, options: SortOptions, vm: &VirtualMachine) -> PyResult<()> { let count_before = self.mutation_count.load(core::sync::atomic::Ordering::Relaxed); let mut elements = core::mem::take(self.borrow_vec_mut().deref_mut()); let res = do_sort(vm, &mut elements, options.key, options.reverse); core::mem::swap(self.borrow_vec_mut().deref_mut(), &mut elements); res?; if self.mutation_count.load(core::sync::atomic::Ordering::Relaxed) != count_before { return Err(vm.new_value_error("list modified during sort")); } Ok(()) }

Why this is better than the alternatives

Approach Handles append+clear? Memory cost Correctness risk

capacity() > 0 (current) ❌ None Misses mem::take-based clears

Vec::clear() in clear() ✅ Higher (keeps allocation) Breaks GC / freelist behaviour

Flag via borrow_vec_mut() ✅ Minimal Sort's own lock calls trigger it → needs suppression logic

Mutation counter (recommended) ✅ One usize (~8 bytes) None — sort's internal take/swap don't call any mutating #[pymethod]

The counter approach also naturally handles concurrent GIL-released scenarios and nested sorts on the same list, and it keeps borrow_vec_mut() unchanged (no need to thread extra context through the write guard).

Would you like me to generate the full diff as a `` block?

🧠 Learnings used

Learnt from: jackoconnordev Repo: RustPython/RustPython PR: 6086 File: benches/microbenchmarks/sort.py:7-7 Timestamp: 2025-08-09T22:55:10.512Z Learning: In RustPython benchmarks, when testing sorting performance, benchmarking only `sorted()` is sufficient since it internally uses the same sorting logic as `list.sort()`. The only difference is that `sorted()` creates a copy first, so testing both would be redundant for evaluating the sorting algorithm's performance.

Learnt from: ShaharNaveh Repo: RustPython/RustPython PR: 7181 File: crates/stdlib/src/csv.rs:20-20 Timestamp: 2026-02-17T07:54:36.200Z Learning: In RustPython, ensure that modules using Python sequence-like types bring in extension methods by importing the trait (for example, 'use rustpython_vm::sliceable::SliceableSequenceOp;'). Even if the trait name isn't directly referenced in code, its methods (such as len()) become in-scope on types like PyStr. Add this import in modules that rely on those extension methods to avoid compilation errors.

Failed to handle agent chat message. Please try again.

@coderabbitai tell me how CPython 3.14.3 handles this problem

@youknowone, here is a precise breakdown of how CPython (from 2.x through the current 3.14 main branch) handles this — the mechanism has not changed across versions.

How CPython detects list mutation during sort

Before sorting — establish a sentinel

CPython saves the current ob_size, ob_item, and allocated fields, then zeroes out ob_size, sets ob_item = NULL, and sets allocated = -1. The comment reads: /* any operation will reset it to >= 0 */

The allocated field has a strict invariant in normal use: 0 <= ob_size <= allocated. list.sort() deliberately violates this by temporarily setting allocated to -1 to detect mutations.

The list is thus left in a publicly-visible empty state (from Python code len(L) == 0 during the sort). The real elements are tucked away in the local saved_ob_item pointer, invisible to Python callbacks.

After sorting — check the sentinel

After the sort completes, CPython checks: if (self->allocated != -1 && result != NULL) — this is the point where it detects that "the user mucked with the list during the sort".

Because every single mutating operation on a CPython list — append, extend, insert, pop, remove, clear, __setitem__, __delitem__, __iadd__, __imul__, etc. — funnels through list_resize(), which unconditionally sets allocated to a non-negative value, the -1 sentinel is impossible to encounter unless no mutation occurred. Even list.clear() goes through list_resize(0), which sets allocated = 0, not -1.

Why the current RustPython approach breaks down

RustPython's elements.capacity() > 0 check mirrors the CPython intent, but fails because Rust's Vec memory model is different:

Vec::clear() retains the allocation → capacity() stays > 0 → detects mutation correctly

core::mem::take() (used in RustPython's own clear() method at Line 235) produces a brand-new empty Vec with capacity 0 — indistinguishable from the sort's own internal sentinel state → mutation silently missed

The CPython sentinel works because -1 is structurally impossible for allocated under any normal operation. RustPython's capacity() == 0 is not similarly exclusive — it can be achieved by user-initiated mem::take-style clears.

Recommended fix (recap)

The Rust equivalent of CPython's -1 sentinel is an explicit mutation counter (AtomicUsize) on PyList, incremented in every #[pymethod] that mutates the list. A snapshot taken before the sort is compared to the value after — any difference means mutation occurred. This is exactly what I outlined in my earlier comment, and it's the closest structural analogue to CPython's approach.

🧠 Learnings used

Learnt from: jackoconnordev Repo: RustPython/RustPython PR: 6086 File: benches/microbenchmarks/sort.py:7-7 Timestamp: 2025-08-09T22:55:10.512Z Learning: In RustPython benchmarks, when testing sorting performance, benchmarking only `sorted()` is sufficient since it internally uses the same sorting logic as `list.sort()`. The only difference is that `sorted()` creates a copy first, so testing both would be redundant for evaluating the sorting algorithm's performance.

Learnt from: ShaharNaveh Repo: RustPython/RustPython PR: 7181 File: crates/stdlib/src/csv.rs:20-20 Timestamp: 2026-02-17T07:54:36.200Z Learning: In RustPython, ensure that modules using Python sequence-like types bring in extension methods by importing the trait (for example, 'use rustpython_vm::sliceable::SliceableSequenceOp;'). Even if the trait name isn't directly referenced in code, its methods (such as len()) become in-scope on types like PyStr. Add this import in modules that rely on those extension methods to avoid compilation errors.

From what I've found, CPython handles this by setting allocated = -1 forcely as a sentinel before sorting, so any mutation naturally overwrites it.

But Rust's Vec doesn't expose its capacity field for direct manipulation, so the same trick isn't directly applicable.

To fully solve this, I think it would need to add some form of mutation flag to PyList, would that be an acceptable approach, or do you have a different idea in mind?

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Detect list mutation during sort even when list length is unchanged

baac056

coderabbitai bot reviewed Mar 14, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect list mutation during sort even when list length is unchanged#7432

Detect list mutation during sort even when list length is unchanged#7432
rlaisqls wants to merge 1 commit intoRustPython:mainfrom
rlaisqls:fix/sort-mutation-detection

rlaisqls commented Mar 14, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 14, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 14, 2026 •

edited

Loading

Uh oh!

rlaisqls Mar 14, 2026

Uh oh!

coderabbitai bot Mar 14, 2026

Uh oh!

youknowone Mar 14, 2026

Uh oh!

coderabbitai bot Mar 14, 2026

Uh oh!

rlaisqls Mar 14, 2026 •

edited

Loading

Uh oh!

coderabbitai bot Mar 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Approach	Handles `append+clear`?	Memory cost	Correctness risk
`capacity() > 0` (current)	❌	None	Misses `mem::take`-based clears
`Vec::clear()` in `clear()`	✅	Higher (keeps allocation)	Breaks GC / freelist behaviour
Flag via `borrow_vec_mut()`	✅	Minimal	Sort's own lock calls trigger it → needs suppression logic
Mutation counter (recommended)	✅	One `usize` (~8 bytes)	None — sort's internal `take`/`swap` don't call any mutating `#[pymethod]`

Conversation

rlaisqls commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Detail

Related Tests

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

github-actions bot commented Mar 14, 2026

📦 Library Dependencies

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rlaisqls Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Recommended: per-list mutation counter

Why this is better than the alternatives

Uh oh!

youknowone Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

How CPython detects list mutation during sort

Before sorting — establish a sentinel

After sorting — check the sentinel

Why the current RustPython approach breaks down

Recommended fix (recap)

Uh oh!

rlaisqls Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rlaisqls commented Mar 14, 2026 •

edited

Loading

coderabbitai bot commented Mar 14, 2026 •

edited

Loading

coderabbitai bot Mar 14, 2026 •

edited

Loading

rlaisqls Mar 14, 2026 •

edited

Loading