
Suspend Python threads before fork() #7364

Merged
youknowone merged 2 commits into RustPython:main from youknowone:implock
Mar 7, 2026
Conversation

@youknowone
Member

@youknowone youknowone commented Mar 5, 2026

Replace transmute+AtomicCell::swap with ptr::write_bytes(0) for resetting parking_lot mutexes in _at_fork_reinit methods. This is simpler and avoids relying on AtomicCell's internal layout matching the mutex type.
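The idea relies on parking_lot's unlocked lock state being the all-zero byte pattern. A minimal, self-contained sketch of the technique — the `RawLock` type and helper below are illustrative stand-ins, not RustPython's actual `zero_reinit_after_fork`:

```rust
use std::mem;
use std::ptr;
use std::sync::atomic::{AtomicU8, Ordering};

// Toy lock whose unlocked state is all-zero bytes, mirroring the
// assumption this PR makes about parking_lot's raw lock types.
struct RawLock {
    state: AtomicU8, // 0 = unlocked, 1 = locked
}

// Hypothetical reinit helper in the spirit of zero_reinit_after_fork<T>():
// after fork(), only the calling thread exists in the child, so a lock left
// "held" by a vanished thread can be reset by zeroing its in-memory state.
unsafe fn zero_reinit_after_fork<T>(lock: &T) {
    // SAFETY: sound only if the all-zero byte pattern is T's unlocked state
    // and no other thread can observe the lock (true right after fork()).
    unsafe {
        ptr::write_bytes(lock as *const T as *mut T as *mut u8, 0, mem::size_of::<T>());
    }
}

fn main() {
    let lock = RawLock { state: AtomicU8::new(1) }; // "held" by a pre-fork thread
    unsafe { zero_reinit_after_fork(&lock) };
    assert_eq!(lock.state.load(Ordering::Relaxed), 0); // unlocked again
    println!("reinit ok");
}
```

The write goes through a raw pointer precisely to avoid depending on any wrapper type's internal layout, which is what the transmute+AtomicCell approach required.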

Summary by CodeRabbit

  • Bug Fixes

    • Improved thread safety for fork operations with coordinated suspension of active threads before forking.
    • Fixed I/O operations (read, write) to properly handle threading context.
    • Enhanced sleep functionality to work safely within the threading system.
  • New Features

    • Added stop-the-world mechanism for safer process forking in multi-threaded environments.
    • Implemented atomic descriptor caching for improved performance of method lookups.

@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

The PR implements a multi-threaded stop-the-world mechanism for safe fork operations in RustPython. It centralizes lock reinitialization, introduces atomic descriptor sequencing for caching, adds per-thread state tracking, and integrates a thread suspension mechanism to pause non-requester threads before fork while coordinating their resumption afterward.
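The suspension handshake described above can be modeled with plain atomics and park/unpark. Everything below is an illustrative sketch of the protocol, not the actual StopTheWorldState implementation; only the state names are taken from the PR:

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical per-thread states named after the ones this PR adds.
const THREAD_ATTACHED: u8 = 1;
const THREAD_SUSPENDED: u8 = 2;

// One requester/worker round trip: request a stop, wait for the worker to
// self-suspend at a safe point, "fork", then resume it. Returns the
// worker's final state.
fn stw_round_trip() -> u8 {
    let stop_requested = Arc::new(AtomicBool::new(false));
    let state = Arc::new(AtomicU8::new(THREAD_ATTACHED));

    let (sr, st) = (stop_requested.clone(), state.clone());
    let worker = thread::spawn(move || loop {
        // Safe-point poll: on a stop request, mark SUSPENDED and park.
        if sr.load(Ordering::Acquire) {
            st.store(THREAD_SUSPENDED, Ordering::Release);
            while sr.load(Ordering::Acquire) {
                thread::park_timeout(Duration::from_millis(10));
            }
            st.store(THREAD_ATTACHED, Ordering::Release);
            break;
        }
        std::hint::spin_loop();
    });

    // Requester side: ask for suspension, wait until the worker reports it.
    stop_requested.store(true, Ordering::Release);
    while state.load(Ordering::Acquire) != THREAD_SUSPENDED {
        thread::yield_now();
    }
    // ... fork() would be safe here: the worker is parked ...
    stop_requested.store(false, Ordering::Release);
    worker.thread().unpark();
    worker.join().unwrap();
    state.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(stw_round_trip(), THREAD_ATTACHED);
    println!("stop-the-world round trip ok");
}
```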

Changes

  • Spell Checker Configuration — .cspell.dict/cpython.txt, .cspell.json: Added/removed CPython-related terms (PYTHONUTF, setprofileallthreads, settraceallthreads, sysdict) in the spell checker dictionary.
  • Generic Lock Zeroing Utility — crates/common/src/lock.rs: Introduced a generic zero_reinit_after_fork<T>() function to centralize zeroing of unlocked parking_lot lock state; updated reinit_mutex_after_fork and reinit_rwlock_after_fork to delegate to it.
  • Descriptor Sequencing Infrastructure — crates/compiler-core/src/bytecode.rs: Added a descriptor_sequences: Box<[AtomicU32]> field to CodeUnits with an atomic sequence counter for per-instruction descriptor updates; exposed the begin_descriptor_write() and end_descriptor_write() public API.
  • Atomic Descriptor Caching — crates/vm/src/frame.rs: Added a lock-free read path try_read_cached_descriptor() with double-check; introduced write_cached_descriptor() and write_cached_descriptor_with_metaclass() with atomic publish boundaries using the descriptor sequence API.
  • Stop-the-World Core Infrastructure — crates/vm/src/vm/mod.rs: Introduced StopTheWorldState and StopTheWorldStats structs with methods for thread suspension coordination, fork-safe state reset, and statistics tracking; added an allow_threads() API to VirtualMachine for temporary thread detachment; integrated suspension checks into signal handling.
  • Per-Thread State Management — crates/vm/src/vm/thread.rs: Extended ThreadSlot with atomic per-thread state (THREAD_DETACHED, THREAD_ATTACHED, THREAD_SUSPENDED) and a thread handle; added suspend_if_needed(), allow_threads(), and internal state-transition helpers; updated initialization/fork-reinit paths.
  • Fork Synchronization & Threading — crates/vm/src/stdlib/posix.rs: Changed the fork() signature to return PyResult<i32>; integrated thread coordination (acquire import lock, stop-the-world before fork, reset in child, resume in parent); added error handling for fork failures and post-fork hook invocation.
  • Import Lock Fork Safety — crates/vm/src/stdlib/imp.rs: Replaced direct lock manipulation with fork-aware helpers; added acquire_lock_for_fork(), release_lock_after_fork_parent(), and after_fork_child_reinit_and_release(); delegated reinit logic to zero_reinit_after_fork(); exposed re-export wrappers for threading-aware fork hooks.
  • Thread Module Stop-the-World Integration — crates/vm/src/stdlib/thread.rs: Updated Lock and RLock fork reinitialization to use zero_reinit_after_fork() instead of manual memory writes; added public _stop_the_world_stats() and _stop_the_world_reset_stats() functions for statistics exposure; unified fork-reinit paths.
  • Allow-Threads Wrapper Integration — crates/vm/src/stdlib/os.rs, crates/vm/src/stdlib/time.rs: Wrapped blocking I/O (read, readinto, write) and sleep calls inside vm.allow_threads() to enable stop-the-world coordination during blocking operations.
  • Object Deallocation Refactoring — crates/vm/src/object/core.rs: Removed PyDict from the public re-export; added an into_inner() method to InstanceDict; changed dict deallocation in subtype_clear and gc_clear_raw to detach the dict pointer instead of clearing contents.
  • Interpreter Initialization — crates/vm/src/vm/interpreter.rs: Added a conditional stop_the_world field to PyGlobalState under Unix+threading; extended frozen module aliasing to include test module aliases (__hello__, __phello__ families).

Sequence Diagram(s)

sequenceDiagram
    participant Main as Main Thread<br/>(Forking)
    participant Worker as Worker Threads
    participant VM as VirtualMachine<br/>(stop_the_world)
    participant Fork as fork() syscall
    participant Child as Child Process
    participant Parent as Parent Process
    
    Main->>VM: stop_the_world()
    VM->>VM: set requested state
    VM->>Worker: signal suspension needed
    loop Poll until suspended
        VM->>Worker: check state
        Worker->>VM: ATTACHED→SUSPENDED
        Worker->>VM: notify_suspended()
        VM->>VM: accumulate stats
    end
    VM->>Main: all suspended ✓
    Main->>Fork: fork syscall
    Fork->>Child: child process
    Fork->>Parent: parent process
    
    Child->>VM: reset_after_fork()
    Child->>VM: reinit locks (zero_reinit)
    Child->>VM: release import lock
    
    Parent->>VM: start_the_world()
    VM->>Worker: resume (SUSPENDED→ATTACHED)
    Parent->>VM: resume threads
    Parent->>Parent: release import lock
    
    Worker->>Worker: continue execution
sequenceDiagram
    participant Reader as Reader Thread
    participant Cache as Descriptor Cache
    participant Writer as Writer Thread
    participant Seq as Sequence Counter
    
    Writer->>Seq: begin_descriptor_write(idx)
    Writer->>Cache: write pointer, version
    Writer->>Seq: end_descriptor_write(idx)
    
    Reader->>Seq: read sequence[idx]
    Reader->>Cache: read pointer, type_version
    Reader->>Cache: double-check version
    alt Versions match
        Reader->>Reader: use cached descriptor
    else Stale cache
        Reader->>Cache: miss (stale)
    end
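The diagram above is a classic seqlock: the writer bumps a sequence counter before mutating and again after, and readers retry or discard any snapshot taken while the counter was odd or changed underneath them. A runnable model of the pattern — method names here paraphrase the begin/end_descriptor_write API, they are not the real RustPython signatures:

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};

// Minimal seqlock over a two-slot "cache entry".
struct Cache {
    seq: AtomicU32,
    type_version: AtomicU32,
    descr: AtomicU32, // stands in for the descriptor pointer
}

impl Cache {
    fn write(&self, version: u32, descr: u32) {
        // Odd sequence = write in progress; readers must retry.
        let s = self.seq.fetch_add(1, Ordering::AcqRel);
        debug_assert!(s % 2 == 0);
        self.type_version.store(version, Ordering::Relaxed);
        self.descr.store(descr, Ordering::Relaxed);
        self.seq.fetch_add(1, Ordering::Release); // publish
    }

    fn read(&self) -> Option<(u32, u32)> {
        let before = self.seq.load(Ordering::Acquire);
        if before % 2 != 0 {
            return None; // writer active
        }
        let v = self.type_version.load(Ordering::Relaxed);
        let d = self.descr.load(Ordering::Relaxed);
        fence(Ordering::Acquire);
        // Snapshot is stable only if no writer ran in between.
        (self.seq.load(Ordering::Relaxed) == before).then_some((v, d))
    }
}

fn main() {
    let c = Cache {
        seq: AtomicU32::new(0),
        type_version: AtomicU32::new(0),
        descr: AtomicU32::new(0),
    };
    c.write(7, 42);
    assert_eq!(c.read(), Some((7, 42)));
    println!("seqlock read ok");
}
```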

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • ShaharNaveh
  • fanninpm

Poem

🐰 Threads pause mid-stride at fork's command,
A stop-the-world so calm and grand,
Descriptors dance in sync with grace,
While parent waits at rendezvous place,
At last they wake and resume their race! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'Suspend Python threads before fork()' accurately describes the main objective of the PR, which implements stop-the-world suspension before fork and adds comprehensive threading coordination around fork operations.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@youknowone youknowone changed the title from "Use write_bytes for lock reinit after fork" to "Suspend Python threads before fork()" Mar 6, 2026
@youknowone youknowone force-pushed the implock branch 3 times, most recently from c7de8f0 to e2c097e on March 6, 2026 04:36
@youknowone youknowone marked this pull request as ready for review March 6, 2026 05:40
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (3)
crates/vm/src/vm/mod.rs (2)

186-191: Minor: Redundant control flow.

The final check can be simplified since both branches return all_suspended. The comment mentions "verify once more after dropping the lock" but verification happens before the lock is dropped.

♻️ Simplify return
-        if all_suspended {
-            // Verify once more after dropping the lock
-            return true;
-        }
-        all_suspended
+        all_suspended
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 186 - 191, The branch returning true is
redundant: where you currently have `if all_suspended { /* Verify once more
after dropping the lock */ return true; } all_suspended`, replace that whole
conditional with a single `return all_suspended;` and remove or update the
misleading comment about verifying after dropping the lock so it accurately
reflects where verification occurs; target the block that references the
`all_suspended` boolean in this function.

199-218: Consider logging or metrics for stop_the_world timeout.

When the 500ms deadline is reached without all threads suspended, stop_the_world silently proceeds. While this matches CPython's best-effort behavior, it could lead to subtle fork-safety issues if threads remain attached. Consider adding a debug log or metric for observability in production debugging.
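A sketch of the suggested shape — a bounded wait that reports success and logs on expiry. The 500ms deadline matches the review; all other names are illustrative stand-ins for the real registry check:

```rust
use std::time::{Duration, Instant};

// Bounded wait for thread suspension that surfaces timeouts instead of
// proceeding silently. `all_threads_suspended` stands in for the real
// registry scan in StopTheWorldState.
fn wait_for_suspension(all_threads_suspended: impl Fn() -> bool) -> bool {
    let deadline = Instant::now() + Duration::from_millis(500);
    while Instant::now() < deadline {
        if all_threads_suspended() {
            return true;
        }
        std::thread::yield_now();
    }
    // Best-effort fallback, but now observable when debugging fork issues.
    eprintln!("stop_the_world: deadline expired with threads still attached");
    false
}

fn main() {
    assert!(wait_for_suspension(|| true));
    assert!(!wait_for_suspension(|| false)); // exercises the timeout path
    println!("timeout path exercised");
}
```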

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 199 - 218, stop_the_world currently
proceeds silently on timeout; after the waiting loop returns, call
park_detached_threads(vm) once more (or check the loop exit cause) and if it
still fails to suspend all threads, emit a debug log and/or increment an
observability metric indicating a stop_the_world timeout, including the
requester_ident (from crate::stdlib::thread::get_ident()) for context; use
existing symbols stop_the_world, park_detached_threads,
requester/requester_ident, and notify_mutex/notify_cv to locate the correct spot
to add the log/metric.
crates/compiler-core/src/bytecode.rs (1)

658-704: SeqLock implementation looks correct; minor redundancy in begin_descriptor_write.

The SeqLock pattern is correctly implemented. One observation: the fence(Ordering::Release) at line 690 is redundant because the successful CAS with Ordering::AcqRel already provides Release semantics. The fence is harmless but adds unnecessary overhead.

♻️ Optional: Remove redundant fence
             match sequence.compare_exchange_weak(
                 seq,
                 seq.wrapping_add(1),
                 Ordering::AcqRel,
                 Ordering::Acquire,
             ) {
                 Ok(_) => {
-                    core::sync::atomic::fence(Ordering::Release);
                     break;
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 658 - 704, The
fence(Ordering::Release) call inside begin_descriptor_write is redundant because
the successful compare_exchange_weak already uses Ordering::AcqRel and provides
the required release semantics; remove the
core::sync::atomic::fence(Ordering::Release) statement from the
begin_descriptor_write implementation so the CAS alone handles ordering, leaving
the rest of the loop and compare_exchange_weak call intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.cspell.dict/cpython.txt:
- Line 163: The dictionary entry "PYTHONUTF" is misspelled; update the token to
"PYTHONUTF8" so the CPython UTF-8 environment variable is correct; locate the
entry "PYTHONUTF" in .cspell.dict/cpython.txt and replace it with "PYTHONUTF8"
(preserve surrounding formatting).

In `@crates/vm/src/stdlib/posix.rs`:
- Around line 913-918: The stop-the-world call may time out and currently is
ignored before calling libc::fork(), which can fork while another thread holds
interpreter locks; change the call site that invokes
vm.state.stop_the_world.stop_the_world(vm) to check a success/failure return (or
propagate a Result) and if the stop-the-world timed out abort the fork path by
returning an error or raising an exception instead of proceeding to unsafe {
libc::fork() }; update the stop_the_world API (and any callers like the code
invoking fork) to return a boolean/Result so fork only runs when stop-the-world
succeeded.

In `@crates/vm/src/vm/thread.rs`:
- Around line 175-202: do_suspend currently does a direct
s.state.store(THREAD_ATTACHED, Ordering::Release) after the suspension loop,
which can overwrite a concurrently set THREAD_SUSPENDED; change this to the same
CAS-with-retry pattern used by attach_thread: perform a compare_exchange loop on
s.state that only sets THREAD_ATTACHED when the current value is
THREAD_SUSPENDED (use the same Ordering::Acquire/Release semantics as
surrounding loads/stores), retrying or re-reading the state until the
compare_exchange succeeds or the state is no longer THREAD_SUSPENDED so you
avoid clobbering a new suspend request; reference function do_suspend, type
StopTheWorldState, and field s.state and constants THREAD_SUSPENDED and
THREAD_ATTACHED.

---

Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 658-704: The fence(Ordering::Release) call inside
begin_descriptor_write is redundant because the successful compare_exchange_weak
already uses Ordering::AcqRel and provides the required release semantics;
remove the core::sync::atomic::fence(Ordering::Release) statement from the
begin_descriptor_write implementation so the CAS alone handles ordering, leaving
the rest of the loop and compare_exchange_weak call intact.

In `@crates/vm/src/vm/mod.rs`:
- Around line 186-191: The branch returning true is redundant: where you
currently have `if all_suspended { /* Verify once more after dropping the lock
*/ return true; } all_suspended`, replace that whole conditional with a single
`return all_suspended;` and remove or update the misleading comment about
verifying after dropping the lock so it accurately reflects where verification
occurs; target the block that references the `all_suspended` boolean in this
function.
- Around line 199-218: stop_the_world currently proceeds silently on timeout;
after the waiting loop returns, call park_detached_threads(vm) once more (or
check the loop exit cause) and if it still fails to suspend all threads, emit a
debug log and/or increment an observability metric indicating a stop_the_world
timeout, including the requester_ident (from crate::stdlib::thread::get_ident())
for context; use existing symbols stop_the_world, park_detached_threads,
requester/requester_ident, and notify_mutex/notify_cv to locate the correct spot
to add the log/metric.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 242defd5-92d2-405d-b0ec-2b02b3d677ad

📥 Commits

Reviewing files that changed from the base of the PR and between a27d812 and e2c097e.

📒 Files selected for processing (11)
  • .cspell.dict/cpython.txt
  • .cspell.json
  • crates/common/src/lock.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/stdlib/imp.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs
💤 Files with no reviewable changes (1)
  • .cspell.json

Comment on lines +7033 to +7076
loop {
    let sequence = self.code.instructions.begin_descriptor_read(cache_base);
    let version_before = self.code.instructions.read_cache_u32(cache_base + 1);
    if version_before != expected_type_version || version_before == 0 {
        if self
            .code
            .instructions
            .end_descriptor_read(cache_base, sequence)
        {
            return None;
        }
        continue;
    }

    let descr_ptr = self.code.instructions.read_cache_ptr(cache_base + 5);
    if descr_ptr == 0 {
        if self
            .code
            .instructions
            .end_descriptor_read(cache_base, sequence)
        {
            return None;
        }
        continue;
    }

    let cloned = unsafe { PyObject::try_to_owned_from_ptr(descr_ptr as *mut PyObject) };
    let version_after = self.code.instructions.read_cache_u32(cache_base + 1);
    let ptr_after = self.code.instructions.read_cache_ptr(cache_base + 5);
    let stable = self
        .code
        .instructions
        .end_descriptor_read(cache_base, sequence);

    if let Some(cloned) = cloned
        && stable
        && version_after == expected_type_version
        && ptr_after == descr_ptr
    {
        return Some(cloned);
    }
    if stable {
        return None;
    }

⚠️ Potential issue | 🟠 Major

Include cache_base + 3 in the guarded descriptor snapshot.

try_read_cached_descriptor() only revalidates the type-version slot and descriptor pointer, but LoadAttrClassWithMetaclassCheck still consumes metaclass_version outside that protocol while write_cached_descriptor_with_metaclass() now publishes it inside the write epoch. That leaves the metaclass-aware fast path built from a mixed snapshot: best case it spuriously deopts, worst case it accepts a cache hit against a stale metaclass guard under concurrent mutation. Please fold the metaclass slot into the same read helper, or add a metaclass-aware variant.

Also applies to: 7111-7124
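One way to address this, sketched as a standalone model rather than the real frame code: fold the metaclass slot into the same seqlock-guarded snapshot, so the type version, metaclass version, and descriptor pointer are validated together. Names like `try_read_with_metaclass` are hypothetical:

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};

// Model of the cache slots; `metaclass_version` plays the role of the
// cache_base + 3 slot the review says is read outside the guarded snapshot.
struct Slots {
    seq: AtomicU32,
    type_version: AtomicU32,
    metaclass_version: AtomicU32,
    descr: AtomicU32,
}

// Metaclass-aware variant: all three values are read inside one seqlock
// epoch and rejected together if any guard mismatches.
fn try_read_with_metaclass(s: &Slots, want_type: u32, want_meta: u32) -> Option<u32> {
    let before = s.seq.load(Ordering::Acquire);
    if before % 2 != 0 {
        return None; // writer active
    }
    let tv = s.type_version.load(Ordering::Relaxed);
    let mv = s.metaclass_version.load(Ordering::Relaxed);
    let d = s.descr.load(Ordering::Relaxed);
    fence(Ordering::Acquire);
    let stable = s.seq.load(Ordering::Relaxed) == before;
    (stable && tv == want_type && mv == want_meta && d != 0).then_some(d)
}

fn main() {
    let s = Slots {
        seq: AtomicU32::new(0),
        type_version: AtomicU32::new(3),
        metaclass_version: AtomicU32::new(9),
        descr: AtomicU32::new(42),
    };
    assert_eq!(try_read_with_metaclass(&s, 3, 9), Some(42));
    assert_eq!(try_read_with_metaclass(&s, 3, 8), None); // stale metaclass guard
    println!("metaclass-guarded read ok");
}
```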

Comment on lines +913 to +918
// Like CPython's PyOS_BeforeFork: stop all other Python threads
// so they are at safe points, not holding internal Rust/C locks.
#[cfg(feature = "threading")]
vm.state.stop_the_world.stop_the_world(vm);

let pid = unsafe { libc::fork() };

⚠️ Potential issue | 🔴 Critical

Don't call fork() after a timed-out stop-the-world.

stop_the_world() currently gives up after 500ms in crates/vm/src/vm/mod.rs:199-218, while attached threads only self-suspend from the bytecode loop in crates/vm/src/vm/thread.rs:168-197. A thread blocked in a VM/internal lock never reaches that path, so this code can still execute libc::fork() while another thread owns interpreter state. Please make the stop-the-world phase report success/failure and abort or raise instead of forking on timeout.
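The requested shape, reduced to a standalone sketch: the real fix would thread a bool or Result out of StopTheWorldState::stop_the_world and check it at the fork call site. The closure below stands in for that call:

```rust
// Guarded fork: refuse to call fork() unless every other thread actually
// reached a safe point. The fork itself is stubbed for this sketch.
fn fork_with_stw(stop_the_world: impl Fn() -> bool) -> Result<i32, String> {
    if !stop_the_world() {
        // A thread blocked on an internal lock never reached a safe point;
        // forking now could clone a held lock into the child.
        return Err("fork: could not suspend all threads".to_string());
    }
    // unsafe { libc::fork() } would go here in the real code.
    Ok(12345)
}

fn main() {
    assert!(fork_with_stw(|| true).is_ok());
    assert!(fork_with_stw(|| false).is_err()); // timeout aborts the fork path
    println!("fork guarded by stop-the-world result");
}
```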

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 913 - 918, The stop-the-world
call may time out and currently is ignored before calling libc::fork(), which
can fork while another thread holds interpreter locks; change the call site that
invokes vm.state.stop_the_world.stop_the_world(vm) to check a success/failure
return (or propagate a Result) and if the stop-the-world timed out abort the
fork path by returning an error or raising an exception instead of proceeding to
unsafe { libc::fork() }; update the stop_the_world API (and any callers like the
code invoking fork) to return a boolean/Result so fork only runs when
stop-the-world succeeded.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/thread.rs (1)

79-96: ⚠️ Potential issue | 🔴 Critical

The detach-on-wait rollout is still incomplete.

This fixes Lock.acquire(), but RLock::_acquire_restore() still does a raw self.mu.lock(), and _ThreadHandle.join() still waits on done_event/JoinHandle::join() while the caller stays THREAD_ATTACHED. If another thread forks during either path, stop_the_world() can spin forever waiting for them to suspend. Please route those waits through vm.allow_threads() too.
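For reference, the allow_threads() pattern the review asks to apply amounts to flipping the per-thread state around the blocking call, so stop_the_world() never has to wait on a thread that is parked in a lock or join. A minimal model — illustrative, not the real VM API:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const THREAD_DETACHED: u8 = 0;
const THREAD_ATTACHED: u8 = 1;

// Shape of vm.allow_threads(): mark the thread detached for the duration of
// a blocking wait, then re-attach (the real version also honors any pending
// suspend request on re-attach).
fn allow_threads<R>(state: &AtomicU8, blocking: impl FnOnce() -> R) -> R {
    state.store(THREAD_DETACHED, Ordering::Release);
    let r = blocking(); // e.g. self.mu.lock() or JoinHandle::join()
    state.store(THREAD_ATTACHED, Ordering::Release);
    r
}

fn main() {
    let state = AtomicU8::new(THREAD_ATTACHED);
    let got = allow_threads(&state, || 41 + 1);
    assert_eq!(got, 42);
    assert_eq!(state.load(Ordering::Relaxed), THREAD_ATTACHED);
    println!("blocking wait ran detached");
}
```

Wrapping RLock::_acquire_restore and _ThreadHandle.join this way would make their waits invisible to the stop-the-world requester, which is exactly the property fork() needs.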

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/thread.rs` around lines 79 - 96, The rollout missed
converting some blocking waits to detach from the GC: wrap the raw mutex/wait
calls in RLock::_acquire_restore and in _ThreadHandle.join so they use
vm.allow_threads (same pattern as Lock.acquire used for mu.try_lock_for and
mu.lock()); specifically replace direct self.mu.lock() in
RLock::_acquire_restore and the waits on done_event / JoinHandle::join inside
_ThreadHandle.join with vm.allow_threads(|| ... ) so those waits occur while the
thread is THREAD_DETACHED and cannot block stop_the_world.
♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)

7027-7058: ⚠️ Potential issue | 🟠 Major

Guard cache_base + 3 in the same snapshot as the descriptor pointer.

LoadAttrClassWithMetaclassCheck still consumes metaclass_version outside try_read_cached_descriptor(), while this writer now publishes that slot inside the descriptor write epoch. That means the fast path can still combine a stale metaclass guard with a freshly published {type_version, descr_ptr} pair and accept a mixed snapshot under concurrent mutation. Please fold cache_base + 3 into the read helper, or add a metaclass-aware variant for the class-attribute path.

Also applies to: 7091-7105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7027 - 7058, The helper
try_read_cached_descriptor can return a false-positive by not checking the
metaclass guard slot (cache_base + 3) in the same snapshot as the descr_ptr; add
the metaclass check into the atomic double-check or provide a new variant (e.g.,
try_read_cached_descriptor_with_metaclass) that accepts an
expected_metaclass_version and verifies cache_base+3 equals that value in the
same re-read that validates the type version and descr_ptr. Update the
LoadAttrClassWithMetaclassCheck call site to call the new variant (or fold its
metaclass_version consumption into try_read_cached_descriptor) so the
metaclass_version, type_version, and descr_ptr are validated together before
returning the cloned object.
🧹 Nitpick comments (1)
scripts/fork_stw_compare.sh (1)

17-18: Use python for the CPython side of this harness.

Hardcoding python3 can select a different interpreter than the repo's active comparison environment.

♻️ Suggested change
-python3 "$SCRIPT_PATH" \
+python "$SCRIPT_PATH" \

Based on learnings: When comparing behavior with CPython or running test suites, explicitly use the python command for CPython and cargo run -- for RustPython.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fork_stw_compare.sh` around lines 17 - 18, The script uses the
hardcoded interpreter "python3" when invoking the CPython side; update the
invocation in the fork_stw_compare.sh script (the line calling python3
"$SCRIPT_PATH") to call "python" instead so the harness uses the repo's
configured CPython executable; keep the echo "[CPython]" and rest of the
invocation unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/time.rs`:
- Around line 120-122: The EINTR check is unreliable because errno is read after
vm.allow_threads returns; change the closure passed to vm.allow_threads so it
calls libc::nanosleep and immediately captures errno inside the closure (e.g.
call nix::Error::last_raw() or read libc::errno) and return or otherwise expose
both the nanosleep result and the captured errno to the outer scope; then
compute interrupted with the captured errno (e.g. interrupted = res == -1 &&
captured_errno == libc::EINTR) and call vm.check_signals() when interrupted.
Ensure you update the use of libc::nanosleep, vm.allow_threads,
nix::Error::last_raw(), and the interrupted logic to use the captured errno from
inside the closure.

In `@crates/vm/src/vm/mod.rs`:
- Around line 314-320: The stop-request flag self.requested must be cleared
before waking suspended threads to avoid a race where a thread rechecks
attach_thread() and re-suspends; move the self.requested.store(false,
Ordering::Release) above the loop that iterates registry and before calling
slot.state.store(... THREAD_DETACHED ...) and slot.thread.unpark() in
start_the_world(), so that threads returning from allow_threads() see
requested==false; update any comments around start_the_world(), attach_thread(),
THREAD_SUSPENDED, and THREAD_DETACHED to reflect the new ordering and keep
drop(registry) after the wake loop as needed.

In `@scripts/fork_stw_matrix.sh`:
- Around line 21-22: The script currently calls a hardcoded prebuilt RustPython
binary (target/release/rustpython) which breaks if CARGO_TARGET_DIR or Cargo
config moves build outputs; replace those direct binary invocations with cargo
run --release -- (invoked from the same directory where you run cargo build,
e.g. (cd "$ROOT_DIR" && cargo run --release -- -- <args>)) or alternatively
resolve the actual target dir via CARGO_TARGET_DIR (or cargo metadata) and use
that path; update every occurrence of target/release/rustpython (including the
block around the build step and the invocations in the 35-41 range) to use cargo
run --release -- or the dynamically-resolved path so RustPython is executed
reliably.

In `@scripts/fork_stw_stress.py`:
- Around line 92-105: The benchmark resets STW stats too early and labels
latency incorrectly: move the maybe_reset_stw_stats() call out of the top of the
loop and instead invoke it once immediately after the warmup iterations complete
so rustpython_stw_stats excludes warmup and aligns with lat_ns; also adjust how
fork_latency_us is measured—if you intend to measure pure fork latency (not
child exit+wait), capture t0 = time.perf_counter_ns() before os.fork() and
compute dt immediately after the fork returns (before os.waitpid()), otherwise
update the metric name to indicate it includes the waitpid round-trip; update
references in this block (maybe_reset_stw_stats(), warmup, lat_ns,
fork_latency_us) accordingly.

---

Outside diff comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 79-96: The rollout missed converting some blocking waits to detach
from the GC: wrap the raw mutex/wait calls in RLock::_acquire_restore and in
_ThreadHandle.join so they use vm.allow_threads (same pattern as Lock.acquire
used for mu.try_lock_for and mu.lock()); specifically replace direct
self.mu.lock() in RLock::_acquire_restore and the waits on done_event /
JoinHandle::join inside _ThreadHandle.join with vm.allow_threads(|| ... ) so
those waits occur while the thread is THREAD_DETACHED and cannot block
stop_the_world.

---

Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 7027-7058: The helper try_read_cached_descriptor can return a
false-positive by not checking the metaclass guard slot (cache_base + 3) in the
same snapshot as the descr_ptr; add the metaclass check into the atomic
double-check or provide a new variant (e.g.,
try_read_cached_descriptor_with_metaclass) that accepts an
expected_metaclass_version and verifies cache_base+3 equals that value in the
same re-read that validates the type version and descr_ptr. Update the
LoadAttrClassWithMetaclassCheck call site to call the new variant (or fold its
metaclass_version consumption into try_read_cached_descriptor) so the
metaclass_version, type_version, and descr_ptr are validated together before
returning the cloned object.

---

Nitpick comments:
In `@scripts/fork_stw_compare.sh`:
- Around line 17-18: The script uses the hardcoded interpreter "python3" when
invoking the CPython side; update the invocation in the fork_stw_compare.sh
script (the line calling python3 "$SCRIPT_PATH") to call "python" instead so the
harness uses the repo's configured CPython executable; keep the echo "[CPython]"
and rest of the invocation unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7b551c33-3f49-4c98-92c9-d0ef02d01082

📥 Commits

Reviewing files that changed from the base of the PR and between 3c9598a and fc1ec3a.

📒 Files selected for processing (14)
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/stdlib/imp.rs
  • crates/vm/src/stdlib/os.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/stdlib/time.rs
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs
  • fork-align.md
  • scripts/fork_stw_compare.sh
  • scripts/fork_stw_matrix.sh
  • scripts/fork_stw_stress.py

Comment on lines +21 to +22
echo "building RustPython release binary..."
(cd "$ROOT_DIR" && cargo build --release >/dev/null)

⚠️ Potential issue | 🟡 Minor

Resolve the RustPython executable through Cargo.

target/release/rustpython breaks as soon as CARGO_TARGET_DIR or Cargo config moves the build output. Either invoke RustPython with cargo run --release -- here, or resolve Cargo's target directory dynamically before running a prebuilt binary.

♻️ Suggested change
-echo "building RustPython release binary..."
-(cd "$ROOT_DIR" && cargo build --release >/dev/null)
-
 for threads in $THREADS_SET; do
   echo "[threads=$threads] CPython"
   python3 "$SCRIPT_PATH" \
@@
   echo "[threads=$threads] RustPython"
-  (cd "$ROOT_DIR" && target/release/rustpython "$SCRIPT_PATH" \
+  (cd "$ROOT_DIR" && cargo run --release -- "$SCRIPT_PATH" \
     --loops "$LOOPS" \
     --threads "$threads" \
     --spin-chunk "$SPIN_CHUNK" \

Based on learnings: When comparing behavior with CPython or running test suites, explicitly use the python command for CPython and cargo run -- for RustPython.

Also applies to: 35-41

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fork_stw_matrix.sh` around lines 21 - 22, The script currently calls
a hardcoded prebuilt RustPython binary (target/release/rustpython) which breaks
if CARGO_TARGET_DIR or Cargo config moves build outputs; replace those direct
binary invocations with cargo run --release -- (invoked from the same directory
where you run cargo build, e.g. (cd "$ROOT_DIR" && cargo run --release -- --
<args>)) or alternatively resolve the actual target dir via CARGO_TARGET_DIR (or
cargo metadata) and use that path; update every occurrence of
target/release/rustpython (including the block around the build step and the
invocations in the 35-41 range) to use cargo run --release -- or the
dynamically-resolved path so RustPython is executed reliably.

Comment on lines +92 to +105
    maybe_reset_stw_stats()
    lat_ns: list[int] = []

    total = max(0, args.loops)
    warmup = min(max(0, args.warmup), total)
    for i in range(total):
        t0 = time.perf_counter_ns()
        pid = os.fork()
        if pid == 0:
            os._exit(0)
        os.waitpid(pid, 0)
        dt = time.perf_counter_ns() - t0
        if i >= warmup:
            lat_ns.append(dt)

⚠️ Potential issue | 🟠 Major

Align the benchmark window with what the script reports.

fork_latency_us currently includes the child exit + waitpid() round trip, and rustpython_stw_stats includes warmup iterations because the reset happens before the warmup loop. That makes the latency label misleading and the counters non-comparable to the sampled latencies.

📏 Suggested refactor
-    maybe_reset_stw_stats()
-    lat_ns: list[int] = []
-
-    total = max(0, args.loops)
-    warmup = min(max(0, args.warmup), total)
-    for i in range(total):
-        t0 = time.perf_counter_ns()
-        pid = os.fork()
-        if pid == 0:
-            os._exit(0)
-        os.waitpid(pid, 0)
-        dt = time.perf_counter_ns() - t0
-        if i >= warmup:
-            lat_ns.append(dt)
+    def run_once() -> int:
+        t0 = time.perf_counter_ns()
+        pid = os.fork()
+        if pid == 0:
+            os._exit(0)
+        dt = time.perf_counter_ns() - t0
+        os.waitpid(pid, 0)
+        return dt
+
+    total = max(0, args.loops)
+    warmup = min(max(0, args.warmup), total)
+    for _ in range(warmup):
+        run_once()
+
+    maybe_reset_stw_stats()
+    lat_ns = [run_once() for _ in range(total - warmup)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fork_stw_stress.py` around lines 92 - 105, The benchmark resets STW
stats too early and labels latency incorrectly: move the maybe_reset_stw_stats()
call out of the top of the loop and instead invoke it once immediately after the
warmup iterations complete so rustpython_stw_stats excludes warmup and aligns
with lat_ns; also adjust how fork_latency_us is measured—if you intend to
measure pure fork latency (not child exit+wait), capture t0 =
time.perf_counter_ns() before os.fork() and compute dt immediately after the
fork returns (before os.waitpid()), otherwise update the metric name to indicate
it includes the waitpid round-trip; update references in this block
(maybe_reset_stw_stats(), warmup, lat_ns, fork_latency_us) accordingly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/posix.rs (1)

1879-1885: ⚠️ Potential issue | 🟠 Major

Capture errno before allow_threads() reattaches the VM thread.

waitpid() now reads errno after vm.allow_threads(...) returns, so an EINTR can be misclassified and skip vm.check_signals().

Suggested fix
-            let res = vm.allow_threads(|| unsafe { libc::waitpid(pid, &mut status, opt) });
+            let (res, err) = vm.allow_threads(|| {
+                let res = unsafe { libc::waitpid(pid, &mut status, opt) };
+                (res, nix::Error::last_raw())
+            });
             if res == -1 {
-                if nix::Error::last_raw() == libc::EINTR {
+                if err == libc::EINTR {
                     vm.check_signals()?;
                     continue;
                 }
-                return Err(nix::Error::last().into_pyexception(vm));
+                return Err(io::Error::from_raw_os_error(err).into_pyexception(vm));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 1879 - 1885, The code currently
calls vm.allow_threads(|| unsafe { libc::waitpid(pid, &mut status, opt) }) and
then checks nix::Error::last_raw(), but errno can be overwritten when
allow_threads reattaches the VM thread; fix by capturing the raw errno inside
the closure that calls libc::waitpid and return it alongside the syscall result
(e.g., return (res, err_no) from the vm.allow_threads closure), then use that
captured errno instead of calling nix::Error::last_raw() after vm.allow_threads
returns so EINTR is detected and vm.check_signals() is called correctly; update
the branch that compares to libc::EINTR and the error return to use the captured
errno and map it via nix::Error::from_errno or into_pyexception as before.
♻️ Duplicate comments (4)
crates/vm/src/stdlib/time.rs (1)

120-122: ⚠️ Potential issue | 🟠 Major

Capture errno inside the allow_threads closure.

allow_threads() reattaches the VM thread before this code reads errno, so the EINTR check can observe a later syscall and miss vm.check_signals().

Suggested fix
-            let res =
-                vm.allow_threads(|| unsafe { libc::nanosleep(ts.as_ref(), core::ptr::null_mut()) });
-            let interrupted = res == -1 && nix::Error::last_raw() == libc::EINTR;
+            let (res, err) = vm.allow_threads(|| {
+                let res = unsafe { libc::nanosleep(ts.as_ref(), core::ptr::null_mut()) };
+                (res, nix::Error::last_raw())
+            });
+            let interrupted = res == -1 && err == libc::EINTR;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/time.rs` around lines 120 - 122, The EINTR check reads
errno after vm.allow_threads reattaches the VM thread, allowing a different
syscall to clobber errno; fix by capturing the errno inside the closure that
calls libc::nanosleep: call vm.allow_threads(|| { let res = unsafe {
libc::nanosleep(...)}; let err = nix::Error::last_raw() (or read errno) inside
the closure; return both res and err (or set interrupted) so the outer code uses
the captured err to compute interrupted and then call vm.check_signals()
accordingly, updating the handling around res and interrupted variables in
time.rs.
crates/vm/src/stdlib/posix.rs (1)

948-950: ⚠️ Potential issue | 🔴 Critical

Don't proceed to fork() if stop-the-world can fail or time out.

This path still calls libc::fork() unconditionally after py_os_before_fork(vm). If StopTheWorldState::stop_the_world() still has a timeout/failure mode, the fork can run while another thread still owns interpreter state.

Run this to confirm whether stop_the_world() has a timeout/failure path that is currently ignored here:

#!/bin/bash
set -euo pipefail
rg -n -C4 'struct StopTheWorldState|fn stop_the_world|wait_timeout|timeout|500' crates/vm/src/vm
sed -n '764,776p' crates/vm/src/stdlib/posix.rs
sed -n '929,952p' crates/vm/src/stdlib/posix.rs

If the search shows a boolean/Result return or an internal timeout path, py_os_before_fork() should propagate that outcome and fork() should abort instead of continuing to libc::fork().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/posix.rs` around lines 948 - 950, py_os_before_fork(vm)
may hide a failure/timeout from StopTheWorldState::stop_the_world(), so do not
call libc::fork() unconditionally; change the call site in the posix fork path
to inspect and propagate the outcome of py_os_before_fork(vm) (or directly check
StopTheWorldState::stop_the_world()) and abort/return an error instead of
proceeding to unsafe { libc::fork() } when stop-the-world failed or timed out;
update py_os_before_fork(vm) to return a Result or bool indicating success if
needed, log the failure and return a Python error from the surrounding function
rather than performing fork when the stop-the-world step did not succeed.
crates/vm/src/vm/mod.rs (1)

305-321: ⚠️ Potential issue | 🔴 Critical

Clear requested before resuming suspended threads.

Waking slots while the global stop flag is still true lets a thread returning through attach_thread() see requested == true, move itself back to THREAD_SUSPENDED, and miss this resume pass. self.requested.store(false, ...) needs to happen before the wake loop.

Suggested fix
 pub fn start_the_world(&self, vm: &VirtualMachine) {
     use thread::{THREAD_DETACHED, THREAD_SUSPENDED};
     let requester = self.requester.load(Ordering::Relaxed);
     stw_trace(format_args!("start begin requester={requester}"));
+    self.requested.store(false, Ordering::Release);
     let registry = vm.state.thread_frames.lock();
     for (&id, slot) in registry.iter() {
         if id == requester {
             continue;
         }
         if slot.state.load(Ordering::Relaxed) == THREAD_SUSPENDED {
             slot.state.store(THREAD_DETACHED, Ordering::Release);
             slot.thread.unpark();
         }
     }
     drop(registry);
-    self.requested.store(false, Ordering::Release);
     self.requester.store(0, Ordering::Relaxed);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/mod.rs` around lines 305 - 321, The global stop flag
self.requested is cleared too late in start_the_world, allowing threads
returning via attach_thread to see requested == true and re-suspend; move the
self.requested.store(false, Ordering::Release) (and keep self.requester.store(0,
Ordering::Relaxed) after) to occur before iterating the registry so the wake
loop sees requested==false; ensure the loop still checks slot.state for
THREAD_SUSPENDED and sets THREAD_DETACHED and unpark() as before (symbols:
start_the_world, self.requested, self.requester, THREAD_SUSPENDED,
THREAD_DETACHED, attach_thread).
crates/vm/src/vm/thread.rs (1)

299-304: ⚠️ Potential issue | 🔴 Critical

Don’t recover with a blind store/swap back to ATTACHED.

These writes can still overwrite a fresh THREAD_SUSPENDED. A new stop-the-world request can race in after the flag/wait check and before Line 302 or Line 316, and the slot gets forced back to running code even though a requester already treated it as parked. This still needs the guarded CAS-and-retry reattach flow from the earlier review.

Also applies to: 315-320
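
The guarded reattach this comment asks for can be sketched as a CAS loop over the slot state. A minimal, single-file illustration — the constant values and the `try_reattach` name are hypothetical stand-ins, not the actual thread.rs API:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Illustrative state constants; the real ones live in crates/vm/src/vm/thread.rs.
const THREAD_DETACHED: u8 = 0;
const THREAD_ATTACHED: u8 = 1;
const THREAD_SUSPENDED: u8 = 2;

/// Reattach DETACHED -> ATTACHED without clobbering a fresh SUSPENDED
/// written by a concurrent stop-the-world requester.
fn try_reattach(state: &AtomicU8) -> bool {
    loop {
        match state.compare_exchange_weak(
            THREAD_DETACHED,
            THREAD_ATTACHED,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            // Spurious weak-CAS failure: the state is still DETACHED, retry.
            Err(cur) if cur == THREAD_DETACHED => continue,
            // SUSPENDED (or anything else): a requester owns the slot; back
            // off instead of blindly storing ATTACHED over it.
            Err(_) => return false,
        }
    }
}

fn main() {
    let s = AtomicU8::new(THREAD_DETACHED);
    assert!(try_reattach(&s));
    assert_eq!(s.load(Ordering::Relaxed), THREAD_ATTACHED);

    let s = AtomicU8::new(THREAD_SUSPENDED);
    assert!(!try_reattach(&s));
    // The freshly suspended slot is left untouched.
    assert_eq!(s.load(Ordering::Relaxed), THREAD_SUSPENDED);
}
```

The key property is that a freshly written THREAD_SUSPENDED is never overwritten; the thread backs off and lets the requester keep treating it as parked.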

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/vm/thread.rs` around lines 299 - 304, The blind
s.state.store(some) in the suspend-abort path must be replaced with a guarded
compare-and-swap loop: use s.state.compare_exchange to change THREAD_DETACHED →
THREAD_ATTACHED (with the same Ordering semantics used elsewhere) and retry the
CAS; if the CAS observes THREAD_SUSPENDED (or any non-DETACHED that indicates a
new STW request) then abort/return without forcing ATTACHED. Apply the same
CAS-and-retry reattach flow to both the early-recovery branch around
s.state.store(THREAD_ATTACHED) and the later branch (lines ~315-320) so you
don't overwrite a freshly suspended slot; reference s.state, THREAD_DETACHED,
THREAD_ATTACHED, THREAD_SUSPENDED, and stw.requested and keep ordering
Acquire/Release consistent.
🧹 Nitpick comments (2)
crates/compiler-core/src/bytecode.rs (1)

658-688: Use an RAII guard for descriptor write sections.

If a caller panics or returns early after begin_descriptor_write() and before end_descriptor_write(), the sequence stays odd forever and later readers/writers will spin on that slot. Returning a small guard that performs the closing increment in Drop makes this much harder to misuse.
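
A minimal sketch of such a guard, assuming one atomic sequence counter per slot; the `DescriptorWriteGuard` name is illustrative, and the spin-until-even locking loop of the real begin_descriptor_write is elided:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Seqlock-style write guard: odd sequence = write in progress.
struct DescriptorWriteGuard<'a> {
    seq: &'a AtomicU32,
}

fn begin_descriptor_write(seq: &AtomicU32) -> DescriptorWriteGuard<'_> {
    // Entering the write section makes the sequence odd; readers treat an
    // odd value as "write in progress" and retry.
    seq.fetch_add(1, Ordering::Release);
    DescriptorWriteGuard { seq }
}

impl Drop for DescriptorWriteGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal exit, early return, and unwind alike, so a
        // panicking writer can no longer strand the sequence at an odd value.
        self.seq.fetch_add(1, Ordering::Release);
    }
}

fn main() {
    let seq = AtomicU32::new(0);
    {
        let _guard = begin_descriptor_write(&seq);
        assert_eq!(seq.load(Ordering::Relaxed) % 2, 1); // odd while writing
    }
    assert_eq!(seq.load(Ordering::Relaxed), 2); // even again once the guard drops
}
```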

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/compiler-core/src/bytecode.rs` around lines 658 - 688, Replace the
manual begin/end pair with an RAII guard: modify begin_descriptor_write to
return a small DescriptorWriteGuard (holding &self and the index) instead of
unit, implement Drop for DescriptorWriteGuard to perform the increment on
self.descriptor_sequences[index] (using fetch_add with Ordering::Release) so the
sequence is always closed even on panic/early return, and remove or make
end_descriptor_write private/unused; keep the current locking loop logic inside
the new begin_descriptor_write before creating/returning the guard so callers
simply hold the guard for the write scope.
crates/vm/src/frame.rs (1)

7053-7098: Factor the descriptor publish sequence into one helper.

write_cached_descriptor and write_cached_descriptor_with_metaclass now duplicate the same begin/invalidate/publish/end flow, which makes it easy for the cache publication protocol to drift when this layout changes again. A single helper with an optional metaclass write keeps the ordering rules in one place.

As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
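
A sketch of what that shared helper could look like. The slot layout, field names, and `publish_descriptor_cache` itself are placeholders rather than the real frame.rs cache, and the begin/end seqlock bracketing is elided:

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

// Placeholder cache slot standing in for the real frame.rs layout.
struct DescriptorCacheSlot {
    type_version: AtomicU32,
    metaclass_version: AtomicU32,
    descr_ptr: AtomicUsize,
}

fn publish_descriptor_cache(
    slot: &DescriptorCacheSlot,
    metaclass_version: Option<u32>,
    descr_ptr: usize,
    type_version: u32,
) {
    // 1. Invalidate: readers that double-check type_version see 0 and take
    //    the slow path while the payload is in flux.
    slot.type_version.store(0, Ordering::Release);
    // 2. Payload writes; the metaclass slot is written only when present.
    if let Some(v) = metaclass_version {
        slot.metaclass_version.store(v, Ordering::Relaxed);
    }
    slot.descr_ptr.store(descr_ptr, Ordering::Relaxed);
    // 3. Publish: the final version store makes the entry visible again.
    slot.type_version.store(type_version, Ordering::Release);
}

fn main() {
    let slot = DescriptorCacheSlot {
        type_version: AtomicU32::new(0),
        metaclass_version: AtomicU32::new(0),
        descr_ptr: AtomicUsize::new(0),
    };
    publish_descriptor_cache(&slot, Some(7), 0xbeef, 42);
    assert_eq!(slot.type_version.load(Ordering::Acquire), 42);
    assert_eq!(slot.metaclass_version.load(Ordering::Relaxed), 7);
    assert_eq!(slot.descr_ptr.load(Ordering::Relaxed), 0xbeef);
}
```

Both writers then only compute their `Option<u32>` metaclass version and pointer, and the invalidate/payload/publish ordering lives in one place.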

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` around lines 7053 - 7098, Extract the common publish
protocol used by write_cached_descriptor and
write_cached_descriptor_with_metaclass into a single helper (e.g.,
publish_descriptor_cache) that encapsulates begin_descriptor_write(cache_base),
the invalidate write_cache_u32(cache_base + 1, 0), the payload writes
(write_cache_u32 for metaclass when present, write_cache_ptr for descr_ptr), the
final write_cache_u32(cache_base + 1, type_version), and
end_descriptor_write(cache_base); modify write_cached_descriptor and
write_cached_descriptor_with_metaclass to compute their metaclass_version
(Option<u32>) and descr_ptr values and then call this helper so the
begin/invalidate/payload/publish/end sequence is implemented once and both
functions delegate to it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/imp.rs`:
- Around line 17-20: acquire_lock currently calls acquire_lock_for_fork and may
block on IMP_LOCK while the Python thread remains ATTACHED, preventing
stop_the_world from observing signals; wrap the blocking lock acquisition in the
Python-facing path with vm.allow_threads(...) so the thread detaches while
waiting — modify the acquire_lock function to call vm.allow_threads(||
acquire_lock_for_fork()) (or the equivalent closure/callback pattern used by
VirtualMachine) so the IMP_LOCK.lock() happens while threads are allowed,
ensuring check_signals() can run on the requester.

In `@crates/vm/src/stdlib/os.rs`:
- Around line 325-330: The write implementation in fn write (using
ArgBytesLike::with_ref and vm.allow_threads(|| crt_fd::write(fd, b))) should
loop and retry when the syscall returns an EINTR (io::ErrorKind::Interrupted)
similar to read/readinto: call crt_fd::write inside vm.allow_threads, if it
returns Err(e) with kind Interrupted then continue the loop and retry, otherwise
return the Ok or Err result; ensure you preserve the Borrowed fd and the bytes
slice across retries and break only on non-Interrupted errors or a successful
write.

In `@crates/vm/src/vm/thread.rs`:
- Around line 146-158: After the successful DETACHED->SUSPENDED CAS
(s.state.compare_exchange) you must re-check vm.state.stop_the_world.requested
to avoid parking on a cleared request: after Ok(_), reload requested with
Ordering::Acquire and if it's false, undo the suspend by CAS-ing s.state from
THREAD_SUSPENDED back to THREAD_DETACHED (or retry that revert loop) and skip
notify_suspended()/wait_while_suspended()/add_attach_wait_yields so the thread
self-recovers; apply the same pattern to the other identical suspend path (the
block around lines 205-220) so you never park when the STW request was cleared
between the initial load and the CAS.

---

Outside diff comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 1879-1885: The code currently calls vm.allow_threads(|| unsafe {
libc::waitpid(pid, &mut status, opt) }) and then checks nix::Error::last_raw(),
but errno can be overwritten when allow_threads reattaches the VM thread; fix by
capturing the raw errno inside the closure that calls libc::waitpid and return
it alongside the syscall result (e.g., return (res, err_no) from the
vm.allow_threads closure), then use that captured errno instead of calling
nix::Error::last_raw() after vm.allow_threads returns so EINTR is detected and
vm.check_signals() is called correctly; update the branch that compares to
libc::EINTR and the error return to use the captured errno and map it via
nix::Error::from_errno or into_pyexception as before.

---

Duplicate comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 948-950: py_os_before_fork(vm) may hide a failure/timeout from
StopTheWorldState::stop_the_world(), so do not call libc::fork()
unconditionally; change the call site in the posix fork path to inspect and
propagate the outcome of py_os_before_fork(vm) (or directly check
StopTheWorldState::stop_the_world()) and abort/return an error instead of
proceeding to unsafe { libc::fork() } when stop-the-world failed or timed out;
update py_os_before_fork(vm) to return a Result or bool indicating success if
needed, log the failure and return a Python error from the surrounding function
rather than performing fork when the stop-the-world step did not succeed.

In `@crates/vm/src/stdlib/time.rs`:
- Around line 120-122: The EINTR check reads errno after vm.allow_threads
reattaches the VM thread, allowing a different syscall to clobber errno; fix by
capturing the errno inside the closure that calls libc::nanosleep: call
vm.allow_threads(|| { let res = unsafe { libc::nanosleep(...)}; let err =
nix::Error::last_raw() (or read errno) inside the closure; return both res and
err (or set interrupted) so the outer code uses the captured err to compute
interrupted and then call vm.check_signals() accordingly, updating the handling
around res and interrupted variables in time.rs.

In `@crates/vm/src/vm/mod.rs`:
- Around line 305-321: The global stop flag self.requested is cleared too late
in start_the_world, allowing threads returning via attach_thread to see
requested == true and re-suspend; move the self.requested.store(false,
Ordering::Release) (and keep self.requester.store(0, Ordering::Relaxed) after)
to occur before iterating the registry so the wake loop sees requested==false;
ensure the loop still checks slot.state for THREAD_SUSPENDED and sets
THREAD_DETACHED and unpark() as before (symbols: start_the_world,
self.requested, self.requester, THREAD_SUSPENDED, THREAD_DETACHED,
attach_thread).

In `@crates/vm/src/vm/thread.rs`:
- Around line 299-304: The blind s.state.store(some) in the suspend-abort path
must be replaced with a guarded compare-and-swap loop: use
s.state.compare_exchange to change THREAD_DETACHED → THREAD_ATTACHED (with the
same Ordering semantics used elsewhere) and retry the CAS; if the CAS observes
THREAD_SUSPENDED (or any non-DETACHED that indicates a new STW request) then
abort/return without forcing ATTACHED. Apply the same CAS-and-retry reattach
flow to both the early-recovery branch around s.state.store(THREAD_ATTACHED) and
the later branch (lines ~315-320) so you don't overwrite a freshly suspended
slot; reference s.state, THREAD_DETACHED, THREAD_ATTACHED, THREAD_SUSPENDED, and
stw.requested and keep ordering Acquire/Release consistent.

---

Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 658-688: Replace the manual begin/end pair with an RAII guard:
modify begin_descriptor_write to return a small DescriptorWriteGuard (holding
&self and the index) instead of unit, implement Drop for DescriptorWriteGuard to
perform the increment on self.descriptor_sequences[index] (using fetch_add with
Ordering::Release) so the sequence is always closed even on panic/early return,
and remove or make end_descriptor_write private/unused; keep the current locking
loop logic inside the new begin_descriptor_write before creating/returning the
guard so callers simply hold the guard for the write scope.

In `@crates/vm/src/frame.rs`:
- Around line 7053-7098: Extract the common publish protocol used by
write_cached_descriptor and write_cached_descriptor_with_metaclass into a single
helper (e.g., publish_descriptor_cache) that encapsulates
begin_descriptor_write(cache_base), the invalidate write_cache_u32(cache_base +
1, 0), the payload writes (write_cache_u32 for metaclass when present,
write_cache_ptr for descr_ptr), the final write_cache_u32(cache_base + 1,
type_version), and end_descriptor_write(cache_base); modify
write_cached_descriptor and write_cached_descriptor_with_metaclass to compute
their metaclass_version (Option<u32>) and descr_ptr values and then call this
helper so the begin/invalidate/payload/publish/end sequence is implemented once
and both functions delegate to it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: ad49f99a-3dcf-45ac-822f-069c28ac27fb

📥 Commits

Reviewing files that changed from the base of the PR and between fc1ec3a and 1460037.

📒 Files selected for processing (14)
  • .cspell.dict/cpython.txt
  • .cspell.json
  • crates/common/src/lock.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/object/core.rs
  • crates/vm/src/stdlib/imp.rs
  • crates/vm/src/stdlib/os.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/stdlib/time.rs
  • crates/vm/src/vm/interpreter.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/thread.rs
💤 Files with no reviewable changes (1)
  • .cspell.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • .cspell.dict/cpython.txt

Comment on lines 17 to 20
 #[pyfunction]
 fn acquire_lock(_vm: &VirtualMachine) {
-    IMP_LOCK.lock()
+    acquire_lock_for_fork()
 }

⚠️ Potential issue | 🔴 Critical

Detach while waiting on IMP_LOCK.

acquire_lock() can now block in IMP_LOCK.lock() while the thread stays ATTACHED. That gives stop_the_world() the same failure mode this PR is fixing elsewhere: the requester can wait forever on a thread parked inside the import lock instead of reaching check_signals(). The Python-facing path should use vm.allow_threads(...) around the blocking acquire.

Suggested fix
 #[pyfunction]
-fn acquire_lock(_vm: &VirtualMachine) {
-    acquire_lock_for_fork()
+fn acquire_lock(vm: &VirtualMachine) {
+    vm.allow_threads(acquire_lock_for_fork)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-#[pyfunction]
-fn acquire_lock(_vm: &VirtualMachine) {
-    IMP_LOCK.lock()
-    acquire_lock_for_fork()
-}
+#[pyfunction]
+fn acquire_lock(vm: &VirtualMachine) {
+    vm.allow_threads(acquire_lock_for_fork)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/imp.rs` around lines 17 - 20, acquire_lock currently
calls acquire_lock_for_fork and may block on IMP_LOCK while the Python thread
remains ATTACHED, preventing stop_the_world from observing signals; wrap the
blocking lock acquisition in the Python-facing path with vm.allow_threads(...)
so the thread detaches while waiting — modify the acquire_lock function to call
vm.allow_threads(|| acquire_lock_for_fork()) (or the equivalent closure/callback
pattern used by VirtualMachine) so the IMP_LOCK.lock() happens while threads are
allowed, ensuring check_signals() can run on the requester.

Comment on lines +325 to +330
fn write(
    fd: crt_fd::Borrowed<'_>,
    data: ArgBytesLike,
    vm: &VirtualMachine,
) -> io::Result<usize> {
    data.with_ref(|b| vm.allow_threads(|| crt_fd::write(fd, b)))

⚠️ Potential issue | 🟠 Major

Retry os.write() on EINTR after releasing the VM thread.

read() and readinto() already loop on EINTR, but write() now returns the interrupted syscall directly. That regresses signal behavior for a changed blocking path.

Suggested fix
-    fn write(
-        fd: crt_fd::Borrowed<'_>,
-        data: ArgBytesLike,
-        vm: &VirtualMachine,
-    ) -> io::Result<usize> {
-        data.with_ref(|b| vm.allow_threads(|| crt_fd::write(fd, b)))
+    fn write(
+        fd: crt_fd::Borrowed<'_>,
+        data: ArgBytesLike,
+        vm: &VirtualMachine,
+    ) -> PyResult<usize> {
+        data.with_ref(|b| loop {
+            match vm.allow_threads(|| crt_fd::write(fd, b)) {
+                Ok(n) => return Ok(n),
+                Err(e) if e.raw_os_error() == Some(libc::EINTR) => {
+                    vm.check_signals()?;
+                    continue;
+                }
+                Err(e) => return Err(e.into_pyexception(vm)),
+            }
+        })
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    fn write(
-        fd: crt_fd::Borrowed<'_>,
-        data: ArgBytesLike,
-        vm: &VirtualMachine,
-    ) -> io::Result<usize> {
-        data.with_ref(|b| vm.allow_threads(|| crt_fd::write(fd, b)))
+    fn write(
+        fd: crt_fd::Borrowed<'_>,
+        data: ArgBytesLike,
+        vm: &VirtualMachine,
+    ) -> PyResult<usize> {
+        data.with_ref(|b| loop {
+            match vm.allow_threads(|| crt_fd::write(fd, b)) {
+                Ok(n) => return Ok(n),
+                Err(e) if e.raw_os_error() == Some(libc::EINTR) => {
+                    vm.check_signals()?;
+                    continue;
+                }
+                Err(e) => return Err(e.into_pyexception(vm)),
+            }
+        })
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/os.rs` around lines 325 - 330, The write implementation
in fn write (using ArgBytesLike::with_ref and vm.allow_threads(||
crt_fd::write(fd, b))) should loop and retry when the syscall returns an EINTR
(io::ErrorKind::Interrupted) similar to read/readinto: call crt_fd::write inside
vm.allow_threads, if it returns Err(e) with kind Interrupted then continue the
loop and retry, otherwise return the Ok or Err result; ensure you preserve the
Borrowed fd and the bytes slice across retries and break only on non-Interrupted
errors or a successful write.

@github-actions

github-actions bot commented Mar 7, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_fork1.py

dependencies:

dependent tests: (no tests depend on fork1)

[x] lib: cpython/Lib/os.py
[ ] test: cpython/Lib/test/test_os.py (TODO: 1)
[x] test: cpython/Lib/test/test_popen.py

dependencies:

  • os

dependent tests: (169 tests)

  • os: test___all__ test__osx_support test_argparse test_ast test_asyncio test_atexit test_base64 test_baseexception test_bdb test_bool test_buffer test_builtin test_bytes test_bz2 test_c_locale_coercion test_calendar test_cmd_line test_cmd_line_script test_codecs test_compile test_compileall test_concurrent_futures test_configparser test_contextlib test_ctypes test_dbm test_dbm_dumb test_dbm_sqlite3 test_decimal test_devpoll test_doctest test_dtrace test_eintr test_email test_ensurepip test_enum test_epoll test_exception_hierarchy test_exceptions test_faulthandler test_fcntl test_file test_file_eintr test_filecmp test_fileinput test_fileio test_float test_fnmatch test_fork1 test_fractions test_fstring test_ftplib test_future_stmt test_genericalias test_genericpath test_getpass test_gettext test_glob test_graphlib test_gzip test_hash test_hashlib test_http_cookiejar test_httplib test_httpservers test_imaplib test_importlib test_inspect test_io test_ioctl test_json test_kqueue test_largefile test_launcher test_linecache test_locale test_logging test_lzma test_mailbox test_marshal test_math test_mimetypes test_mmap test_msvcrt test_multiprocessing_fork test_multiprocessing_forkserver test_multiprocessing_main_handling test_multiprocessing_spawn test_netrc test_ntpath test_openpty test_optparse test_os test_pathlib test_pkg test_pkgutil test_platform test_plistlib test_poll test_popen test_posix test_posixpath test_pty test_py_compile test_pydoc test_pyexpat test_random test_regrtest test_repl test_reprlib test_robotparser test_runpy test_sax test_script_helper test_selectors test_shelve test_shutil test_signal test_site test_smtpnet test_socket test_socketserver test_sqlite3 test_ssl test_stat test_string_literals test_structseq test_subprocess test_support test_sys test_sysconfig test_tabnanny test_tarfile test_tempfile test_termios test_thread test_threading test_threadsignals test_time test_tokenize test_tools test_trace test_tty test_typing test_unicode_file 
test_unicode_file_functions test_unittest test_univnewlines test_urllib test_urllib2 test_urllib2_localnet test_urllib2net test_urllibnet test_uuid test_venv test_wait3 test_wait4 test_wave test_webbrowser test_winapi test_winconsoleio test_winreg test_winsound test_wsgiref test_xml_etree test_zipfile test_zipimport test_zoneinfo test_zstd

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Add stop-the-world thread suspension around fork() to prevent
deadlocks from locks held by dead parent threads in the child.

- Thread states: DETACHED / ATTACHED / SUSPENDED with atomic CAS
  transitions matching _PyThreadState_{Attach,Detach,Suspend}
- stop_the_world / start_the_world: park all non-requester threads
  before fork, resume after (parent) or reset (child)
- allow_threads (Py_BEGIN/END_ALLOW_THREADS): detach around blocking
  syscalls (os.read/write, waitpid, Lock.acquire, time.sleep) so
  stop_the_world can force-park via CAS
- Acquire/release import lock around fork lifecycle
- zero_reinit_after_fork: generic lock reset for parking_lot types
- gc_clear_raw: detach dict instead of clearing entries
- Lock-free double-check for descriptor cache reads (no read-side
  seqlock); write-side seqlock retained for writer serialization
- fork() returns PyResult, checks PythonFinalizationError, calls
  sys.audit
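
The attach/detach protocol described in the commit message can be shown in miniature. This is a hedged, single-threaded sketch — the constants are illustrative and the real transitions live in crates/vm/src/vm/thread.rs:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Illustrative constants mirroring the DETACHED / ATTACHED / SUSPENDED states.
const DETACHED: u8 = 0;
const ATTACHED: u8 = 1;
const SUSPENDED: u8 = 2;

/// allow_threads in miniature: detach, run the blocking closure, reattach.
/// While DETACHED, a stop-the-world requester may CAS the slot to SUSPENDED;
/// reattach then has to wait instead of barging back into interpreter state.
fn allow_threads<R>(state: &AtomicU8, f: impl FnOnce() -> R) -> R {
    state.store(DETACHED, Ordering::Release);
    let r = f();
    // Reattach only from DETACHED; if a requester suspended us, spin until
    // start_the_world moves the slot back to DETACHED.
    while state
        .compare_exchange(DETACHED, ATTACHED, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        std::hint::spin_loop();
    }
    r
}

fn main() {
    // Normal round trip through a stand-in for a blocking syscall.
    let state = AtomicU8::new(ATTACHED);
    let n = allow_threads(&state, || 2 + 2);
    assert_eq!(n, 4);
    assert_eq!(state.load(Ordering::Relaxed), ATTACHED);

    // A requester can force-park a DETACHED thread without its cooperation.
    let state = AtomicU8::new(DETACHED);
    assert!(state
        .compare_exchange(DETACHED, SUSPENDED, Ordering::AcqRel, Ordering::Acquire)
        .is_ok());
    assert_eq!(state.load(Ordering::Relaxed), SUSPENDED);
}
```

This is why the commit wraps blocking syscalls in allow_threads: a thread parked in a syscall sits in DETACHED, where stop_the_world can claim it with a single CAS instead of waiting for it to reach a check point.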
@youknowone youknowone merged commit 2bb9173 into RustPython:main Mar 7, 2026
13 of 14 checks passed
@youknowone youknowone deleted the implock branch March 7, 2026 11:20
@coderabbitai coderabbitai bot mentioned this pull request Mar 7, 2026
youknowone added a commit to youknowone/RustPython that referenced this pull request Mar 8, 2026
* Suspend Python threads before fork()
