Suspend Python threads before fork() #7364
Conversation
Note: Reviews paused. This branch appears to be under active development, so to avoid overwhelming you with review comments from an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings, and use the commands and checkboxes below to manage reviews.
📝 Walkthrough
The PR implements a multi-threaded stop-the-world mechanism for safe fork operations in RustPython. It centralizes lock reinitialization, introduces atomic descriptor sequencing for caching, adds per-thread state tracking, and integrates a thread suspension mechanism to pause non-requester threads before fork while coordinating their resumption afterward.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Main as Main Thread<br/>(Forking)
    participant Worker as Worker Threads
    participant VM as VirtualMachine<br/>(stop_the_world)
    participant Fork as fork() syscall
    participant Child as Child Process
    participant Parent as Parent Process
    Main->>VM: stop_the_world()
    VM->>VM: set requested state
    VM->>Worker: signal suspension needed
    loop Poll until suspended
        VM->>Worker: check state
        Worker->>VM: ATTACHED→SUSPENDED
        Worker->>VM: notify_suspended()
        VM->>VM: accumulate stats
    end
    VM->>Main: all suspended ✓
    Main->>Fork: fork syscall
    Fork->>Child: child process
    Fork->>Parent: parent process
    Child->>VM: reset_after_fork()
    Child->>VM: reinit locks (zero_reinit)
    Child->>VM: release import lock
    Parent->>VM: start_the_world()
    VM->>Worker: resume (SUSPENDED→ATTACHED)
    Parent->>VM: resume threads
    Parent->>Parent: release import lock
    Worker->>Worker: continue execution
```
```mermaid
sequenceDiagram
    participant Reader as Reader Thread
    participant Cache as Descriptor Cache
    participant Writer as Writer Thread
    participant Seq as Sequence Counter
    Writer->>Seq: begin_descriptor_write(idx)
    Writer->>Cache: write pointer, version
    Writer->>Seq: end_descriptor_write(idx)
    Reader->>Seq: read sequence[idx]
    Reader->>Cache: read pointer, type_version
    Reader->>Cache: double-check version
    alt Versions match
        Reader->>Reader: use cached descriptor
    else Stale cache
        Reader->>Cache: miss (stale)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 3 checks passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed c7de8f0 to e2c097e
Actionable comments posted: 4
🧹 Nitpick comments (3)
crates/vm/src/vm/mod.rs (2)
186-191: Minor: Redundant control flow. The final check can be simplified since both branches return `all_suspended`. The comment mentions "verify once more after dropping the lock" but verification happens before the lock is dropped.

♻️ Simplify return

```diff
-    if all_suspended {
-        // Verify once more after dropping the lock
-        return true;
-    }
-    all_suspended
+    all_suspended
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 186 - 191, The branch returning true is redundant: where you currently have `if all_suspended { /* Verify once more after dropping the lock */ return true; } all_suspended`, replace that whole conditional with a single `return all_suspended;` and remove or update the misleading comment about verifying after dropping the lock so it accurately reflects where verification occurs; target the block that references the `all_suspended` boolean in this function.
199-218: Consider logging or metrics for `stop_the_world` timeout. When the 500ms deadline is reached without all threads suspended, `stop_the_world` silently proceeds. While this matches CPython's best-effort behavior, it could lead to subtle fork-safety issues if threads remain attached. Consider adding a debug log or metric for observability in production debugging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 199 - 218, stop_the_world currently proceeds silently on timeout; after the waiting loop returns, call park_detached_threads(vm) once more (or check the loop exit cause) and if it still fails to suspend all threads, emit a debug log and/or increment an observability metric indicating a stop_the_world timeout, including the requester_ident (from crate::stdlib::thread::get_ident()) for context; use existing symbols stop_the_world, park_detached_threads, requester/requester_ident, and notify_mutex/notify_cv to locate the correct spot to add the log/metric.

crates/compiler-core/src/bytecode.rs (1)
658-704: SeqLock implementation looks correct; minor redundancy in `begin_descriptor_write`. The SeqLock pattern is correctly implemented. One observation: the `fence(Ordering::Release)` at line 690 is redundant because the successful CAS with `Ordering::AcqRel` already provides Release semantics. The fence is harmless but adds unnecessary overhead.

♻️ Optional: Remove redundant fence

```diff
 match sequence.compare_exchange_weak(
     seq,
     seq.wrapping_add(1),
     Ordering::AcqRel,
     Ordering::Acquire,
 ) {
     Ok(_) => {
-        core::sync::atomic::fence(Ordering::Release);
         break;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 658 - 704, The fence(Ordering::Release) call inside begin_descriptor_write is redundant because the successful compare_exchange_weak already uses Ordering::AcqRel and provides the required release semantics; remove the core::sync::atomic::fence(Ordering::Release) statement from the begin_descriptor_write implementation so the CAS alone handles ordering, leaving the rest of the loop and compare_exchange_weak call intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.cspell.dict/cpython.txt:
- Line 163: The dictionary entry "PYTHONUTF" is misspelled; update the token to
"PYTHONUTF8" so the CPython UTF-8 environment variable is correct; locate the
entry "PYTHONUTF" in .cspell.dict/cpython.txt and replace it with "PYTHONUTF8"
(preserve surrounding formatting).
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 913-918: The stop-the-world call may time out and currently is
ignored before calling libc::fork(), which can fork while another thread holds
interpreter locks; change the call site that invokes
vm.state.stop_the_world.stop_the_world(vm) to check a success/failure return (or
propagate a Result) and if the stop-the-world timed out abort the fork path by
returning an error or raising an exception instead of proceeding to unsafe {
libc::fork() }; update the stop_the_world API (and any callers like the code
invoking fork) to return a boolean/Result so fork only runs when stop-the-world
succeeded.
In `@crates/vm/src/vm/thread.rs`:
- Around line 175-202: do_suspend currently does a direct
s.state.store(THREAD_ATTACHED, Ordering::Release) after the suspension loop,
which can overwrite a concurrently set THREAD_SUSPENDED; change this to the same
CAS-with-retry pattern used by attach_thread: perform a compare_exchange loop on
s.state that only sets THREAD_ATTACHED when the current value is
THREAD_SUSPENDED (use the same Ordering::Acquire/Release semantics as
surrounding loads/stores), retrying or re-reading the state until the
compare_exchange succeeds or the state is no longer THREAD_SUSPENDED so you
avoid clobbering a new suspend request; reference function do_suspend, type
StopTheWorldState, and field s.state and constants THREAD_SUSPENDED and
THREAD_ATTACHED.
---
Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 658-704: The fence(Ordering::Release) call inside
begin_descriptor_write is redundant because the successful compare_exchange_weak
already uses Ordering::AcqRel and provides the required release semantics;
remove the core::sync::atomic::fence(Ordering::Release) statement from the
begin_descriptor_write implementation so the CAS alone handles ordering, leaving
the rest of the loop and compare_exchange_weak call intact.
In `@crates/vm/src/vm/mod.rs`:
- Around line 186-191: The branch returning true is redundant: where you
currently have `if all_suspended { /* Verify once more after dropping the lock
*/ return true; } all_suspended`, replace that whole conditional with a single
`return all_suspended;` and remove or update the misleading comment about
verifying after dropping the lock so it accurately reflects where verification
occurs; target the block that references the `all_suspended` boolean in this
function.
- Around line 199-218: stop_the_world currently proceeds silently on timeout;
after the waiting loop returns, call park_detached_threads(vm) once more (or
check the loop exit cause) and if it still fails to suspend all threads, emit a
debug log and/or increment an observability metric indicating a stop_the_world
timeout, including the requester_ident (from crate::stdlib::thread::get_ident())
for context; use existing symbols stop_the_world, park_detached_threads,
requester/requester_ident, and notify_mutex/notify_cv to locate the correct spot
to add the log/metric.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 242defd5-92d2-405d-b0ec-2b02b3d677ad
📒 Files selected for processing (11)
- .cspell.dict/cpython.txt
- .cspell.json
- crates/common/src/lock.rs
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/frame.rs
- crates/vm/src/stdlib/imp.rs
- crates/vm/src/stdlib/posix.rs
- crates/vm/src/stdlib/thread.rs
- crates/vm/src/vm/interpreter.rs
- crates/vm/src/vm/mod.rs
- crates/vm/src/vm/thread.rs
💤 Files with no reviewable changes (1)
- .cspell.json
crates/vm/src/frame.rs (Outdated)
```rust
loop {
    let sequence = self.code.instructions.begin_descriptor_read(cache_base);
    let version_before = self.code.instructions.read_cache_u32(cache_base + 1);
    if version_before != expected_type_version || version_before == 0 {
        if self
            .code
            .instructions
            .end_descriptor_read(cache_base, sequence)
        {
            return None;
        }
        continue;
    }

    let descr_ptr = self.code.instructions.read_cache_ptr(cache_base + 5);
    if descr_ptr == 0 {
        if self
            .code
            .instructions
            .end_descriptor_read(cache_base, sequence)
        {
            return None;
        }
        continue;
    }

    let cloned = unsafe { PyObject::try_to_owned_from_ptr(descr_ptr as *mut PyObject) };
    let version_after = self.code.instructions.read_cache_u32(cache_base + 1);
    let ptr_after = self.code.instructions.read_cache_ptr(cache_base + 5);
    let stable = self
        .code
        .instructions
        .end_descriptor_read(cache_base, sequence);

    if let Some(cloned) = cloned
        && stable
        && version_after == expected_type_version
        && ptr_after == descr_ptr
    {
        return Some(cloned);
    }
    if stable {
        return None;
    }
}
```
Include cache_base + 3 in the guarded descriptor snapshot.
`try_read_cached_descriptor()` only revalidates the type-version slot and descriptor pointer, but `LoadAttrClassWithMetaclassCheck` still consumes `metaclass_version` outside that protocol, while `write_cached_descriptor_with_metaclass()` now publishes it inside the write epoch. That leaves the metaclass-aware fast path built from a mixed snapshot: best case it spuriously deopts; worst case it accepts a cache hit against a stale metaclass guard under concurrent mutation. Please fold the metaclass slot into the same read helper, or add a metaclass-aware variant.
Also applies to: 7111-7124
crates/vm/src/stdlib/posix.rs (Outdated)
```rust
// Like CPython's PyOS_BeforeFork: stop all other Python threads
// so they are at safe points, not holding internal Rust/C locks.
#[cfg(feature = "threading")]
vm.state.stop_the_world.stop_the_world(vm);

let pid = unsafe { libc::fork() };
```
Don't call fork() after a timed-out stop-the-world.
stop_the_world() currently gives up after 500ms in crates/vm/src/vm/mod.rs:199-218, while attached threads only self-suspend from the bytecode loop in crates/vm/src/vm/thread.rs:168-197. A thread blocked in a VM/internal lock never reaches that path, so this code can still execute libc::fork() while another thread owns interpreter state. Please make the stop-the-world phase report success/failure and abort or raise instead of forking on timeout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/stdlib/posix.rs` around lines 913 - 918, The stop-the-world
call may time out and currently is ignored before calling libc::fork(), which
can fork while another thread holds interpreter locks; change the call site that
invokes vm.state.stop_the_world.stop_the_world(vm) to check a success/failure
return (or propagate a Result) and if the stop-the-world timed out abort the
fork path by returning an error or raising an exception instead of proceeding to
unsafe { libc::fork() }; update the stop_the_world API (and any callers like the
code invoking fork) to return a boolean/Result so fork only runs when
stop-the-world succeeded.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/thread.rs (1)
79-96: ⚠️ Potential issue | 🔴 Critical — The detach-on-wait rollout is still incomplete.
This fixes `Lock.acquire()`, but `RLock::_acquire_restore()` still does a raw `self.mu.lock()`, and `_ThreadHandle.join()` still waits on `done_event`/`JoinHandle::join()` while the caller stays `THREAD_ATTACHED`. If another thread forks during either path, `stop_the_world()` can spin forever waiting for them to suspend. Please route those waits through `vm.allow_threads()` too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/thread.rs` around lines 79 - 96, The rollout missed converting some blocking waits to detach from the GC: wrap the raw mutex/wait calls in RLock::_acquire_restore and in _ThreadHandle.join so they use vm.allow_threads (same pattern as Lock.acquire used for mu.try_lock_for and mu.lock()); specifically replace direct self.mu.lock() in RLock::_acquire_restore and the waits on done_event / JoinHandle::join inside _ThreadHandle.join with vm.allow_threads(|| ... ) so those waits occur while the thread is THREAD_DETACHED and cannot block stop_the_world.
♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)
7027-7058: ⚠️ Potential issue | 🟠 Major — Guard `cache_base + 3` in the same snapshot as the descriptor pointer.
`LoadAttrClassWithMetaclassCheck` still consumes `metaclass_version` outside `try_read_cached_descriptor()`, while this writer now publishes that slot inside the descriptor write epoch. That means the fast path can still combine a stale metaclass guard with a freshly published `{type_version, descr_ptr}` pair and accept a mixed snapshot under concurrent mutation. Please fold `cache_base + 3` into the read helper, or add a metaclass-aware variant for the class-attribute path.

Also applies to: 7091-7105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7027 - 7058, The helper try_read_cached_descriptor can return a false-positive by not checking the metaclass guard slot (cache_base + 3) in the same snapshot as the descr_ptr; add the metaclass check into the atomic double-check or provide a new variant (e.g., try_read_cached_descriptor_with_metaclass) that accepts an expected_metaclass_version and verifies cache_base+3 equals that value in the same re-read that validates the type version and descr_ptr. Update the LoadAttrClassWithMetaclassCheck call site to call the new variant (or fold its metaclass_version consumption into try_read_cached_descriptor) so the metaclass_version, type_version, and descr_ptr are validated together before returning the cloned object.
🧹 Nitpick comments (1)
scripts/fork_stw_compare.sh (1)
17-18: Use `python` for the CPython side of this harness. Hardcoding `python3` can select a different interpreter than the repo's active comparison environment.

♻️ Suggested change

```diff
-python3 "$SCRIPT_PATH" \
+python "$SCRIPT_PATH" \
```

Based on learnings: When comparing behavior with CPython or running test suites, explicitly use the `python` command for CPython and `cargo run --` for RustPython.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/fork_stw_compare.sh` around lines 17 - 18, The script uses the hardcoded interpreter "python3" when invoking the CPython side; update the invocation in the fork_stw_compare.sh script (the line calling python3 "$SCRIPT_PATH") to call "python" instead so the harness uses the repo's configured CPython executable; keep the echo "[CPython]" and rest of the invocation unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/time.rs`:
- Around line 120-122: The EINTR check is unreliable because errno is read after
vm.allow_threads returns; change the closure passed to vm.allow_threads so it
calls libc::nanosleep and immediately captures errno inside the closure (e.g.
call nix::Error::last_raw() or read libc::errno) and return or otherwise expose
both the nanosleep result and the captured errno to the outer scope; then
compute interrupted with the captured errno (e.g. interrupted = res == -1 &&
captured_errno == libc::EINTR) and call vm.check_signals() when interrupted.
Ensure you update the use of libc::nanosleep, vm.allow_threads,
nix::Error::last_raw(), and the interrupted logic to use the captured errno from
inside the closure.
In `@crates/vm/src/vm/mod.rs`:
- Around line 314-320: The stop-request flag self.requested must be cleared
before waking suspended threads to avoid a race where a thread rechecks
attach_thread() and re-suspends; move the self.requested.store(false,
Ordering::Release) above the loop that iterates registry and before calling
slot.state.store(... THREAD_DETACHED ...) and slot.thread.unpark() in
start_the_world(), so that threads returning from allow_threads() see
requested==false; update any comments around start_the_world(), attach_thread(),
THREAD_SUSPENDED, and THREAD_DETACHED to reflect the new ordering and keep
drop(registry) after the wake loop as needed.
In `@scripts/fork_stw_matrix.sh`:
- Around line 21-22: The script currently calls a hardcoded prebuilt RustPython
binary (target/release/rustpython) which breaks if CARGO_TARGET_DIR or Cargo
config moves build outputs; replace those direct binary invocations with cargo
run --release -- (invoked from the same directory where you run cargo build,
e.g. (cd "$ROOT_DIR" && cargo run --release -- -- <args>)) or alternatively
resolve the actual target dir via CARGO_TARGET_DIR (or cargo metadata) and use
that path; update every occurrence of target/release/rustpython (including the
block around the build step and the invocations in the 35-41 range) to use cargo
run --release -- or the dynamically-resolved path so RustPython is executed
reliably.
In `@scripts/fork_stw_stress.py`:
- Around line 92-105: The benchmark resets STW stats too early and labels
latency incorrectly: move the maybe_reset_stw_stats() call out of the top of the
loop and instead invoke it once immediately after the warmup iterations complete
so rustpython_stw_stats excludes warmup and aligns with lat_ns; also adjust how
fork_latency_us is measured—if you intend to measure pure fork latency (not
child exit+wait), capture t0 = time.perf_counter_ns() before os.fork() and
compute dt immediately after the fork returns (before os.waitpid()), otherwise
update the metric name to indicate it includes the waitpid round-trip; update
references in this block (maybe_reset_stw_stats(), warmup, lat_ns,
fork_latency_us) accordingly.
---
Outside diff comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 79-96: The rollout missed converting some blocking waits to detach
from the GC: wrap the raw mutex/wait calls in RLock::_acquire_restore and in
_ThreadHandle.join so they use vm.allow_threads (same pattern as Lock.acquire
used for mu.try_lock_for and mu.lock()); specifically replace direct
self.mu.lock() in RLock::_acquire_restore and the waits on done_event /
JoinHandle::join inside _ThreadHandle.join with vm.allow_threads(|| ... ) so
those waits occur while the thread is THREAD_DETACHED and cannot block
stop_the_world.
---
Duplicate comments:
In `@crates/vm/src/frame.rs`:
- Around line 7027-7058: The helper try_read_cached_descriptor can return a
false-positive by not checking the metaclass guard slot (cache_base + 3) in the
same snapshot as the descr_ptr; add the metaclass check into the atomic
double-check or provide a new variant (e.g.,
try_read_cached_descriptor_with_metaclass) that accepts an
expected_metaclass_version and verifies cache_base+3 equals that value in the
same re-read that validates the type version and descr_ptr. Update the
LoadAttrClassWithMetaclassCheck call site to call the new variant (or fold its
metaclass_version consumption into try_read_cached_descriptor) so the
metaclass_version, type_version, and descr_ptr are validated together before
returning the cloned object.
---
Nitpick comments:
In `@scripts/fork_stw_compare.sh`:
- Around line 17-18: The script uses the hardcoded interpreter "python3" when
invoking the CPython side; update the invocation in the fork_stw_compare.sh
script (the line calling python3 "$SCRIPT_PATH") to call "python" instead so the
harness uses the repo's configured CPython executable; keep the echo "[CPython]"
and rest of the invocation unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 7b551c33-3f49-4c98-92c9-d0ef02d01082
📒 Files selected for processing (14)
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/frame.rs
- crates/vm/src/stdlib/imp.rs
- crates/vm/src/stdlib/os.rs
- crates/vm/src/stdlib/posix.rs
- crates/vm/src/stdlib/thread.rs
- crates/vm/src/stdlib/time.rs
- crates/vm/src/vm/interpreter.rs
- crates/vm/src/vm/mod.rs
- crates/vm/src/vm/thread.rs
- fork-align.md
- scripts/fork_stw_compare.sh
- scripts/fork_stw_matrix.sh
- scripts/fork_stw_stress.py
scripts/fork_stw_matrix.sh (Outdated)

```bash
echo "building RustPython release binary..."
(cd "$ROOT_DIR" && cargo build --release >/dev/null)
```
Resolve the RustPython executable through Cargo.
`target/release/rustpython` breaks as soon as `CARGO_TARGET_DIR` or Cargo config moves the build output. Either invoke RustPython with `cargo run --release --` here, or resolve Cargo's target directory dynamically before running a prebuilt binary.
♻️ Suggested change

```diff
-echo "building RustPython release binary..."
-(cd "$ROOT_DIR" && cargo build --release >/dev/null)
-
 for threads in $THREADS_SET; do
   echo "[threads=$threads] CPython"
   python3 "$SCRIPT_PATH" \
@@
   echo "[threads=$threads] RustPython"
-  (cd "$ROOT_DIR" && target/release/rustpython "$SCRIPT_PATH" \
+  (cd "$ROOT_DIR" && cargo run --release -- "$SCRIPT_PATH" \
     --loops "$LOOPS" \
     --threads "$threads" \
     --spin-chunk "$SPIN_CHUNK" \
```

Based on learnings: When comparing behavior with CPython or running test suites, explicitly use the `python` command for CPython and `cargo run --` for RustPython.
Also applies to: 35-41
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/fork_stw_matrix.sh` around lines 21 - 22, The script currently calls
a hardcoded prebuilt RustPython binary (target/release/rustpython) which breaks
if CARGO_TARGET_DIR or Cargo config moves build outputs; replace those direct
binary invocations with cargo run --release -- (invoked from the same directory
where you run cargo build, e.g. (cd "$ROOT_DIR" && cargo run --release -- --
<args>)) or alternatively resolve the actual target dir via CARGO_TARGET_DIR (or
cargo metadata) and use that path; update every occurrence of
target/release/rustpython (including the block around the build step and the
invocations in the 35-41 range) to use cargo run --release -- or the
dynamically-resolved path so RustPython is executed reliably.
scripts/fork_stw_stress.py (Outdated)

```python
maybe_reset_stw_stats()
lat_ns: list[int] = []

total = max(0, args.loops)
warmup = min(max(0, args.warmup), total)
for i in range(total):
    t0 = time.perf_counter_ns()
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
    dt = time.perf_counter_ns() - t0
    if i >= warmup:
        lat_ns.append(dt)
```
Align the benchmark window with what the script reports.
`fork_latency_us` currently includes the child exit + `waitpid()` round trip, and `rustpython_stw_stats` includes warmup iterations because the reset happens before the warmup loop. That makes the latency label misleading and the counters non-comparable to the sampled latencies.
📏 Suggested refactor

```diff
-    maybe_reset_stw_stats()
-    lat_ns: list[int] = []
-
-    total = max(0, args.loops)
-    warmup = min(max(0, args.warmup), total)
-    for i in range(total):
-        t0 = time.perf_counter_ns()
-        pid = os.fork()
-        if pid == 0:
-            os._exit(0)
-        os.waitpid(pid, 0)
-        dt = time.perf_counter_ns() - t0
-        if i >= warmup:
-            lat_ns.append(dt)
+    def run_once() -> int:
+        t0 = time.perf_counter_ns()
+        pid = os.fork()
+        if pid == 0:
+            os._exit(0)
+        dt = time.perf_counter_ns() - t0
+        os.waitpid(pid, 0)
+        return dt
+
+    total = max(0, args.loops)
+    warmup = min(max(0, args.warmup), total)
+    for _ in range(warmup):
+        run_once()
+
+    maybe_reset_stw_stats()
+    lat_ns = [run_once() for _ in range(total - warmup)]
```
+ lat_ns = [run_once() for _ in range(total - warmup)]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/fork_stw_stress.py` around lines 92 - 105, The benchmark resets STW
stats too early and labels latency incorrectly: move the maybe_reset_stw_stats()
call out of the top of the loop and instead invoke it once immediately after the
warmup iterations complete so rustpython_stw_stats excludes warmup and aligns
with lat_ns; also adjust how fork_latency_us is measured—if you intend to
measure pure fork latency (not child exit+wait), capture t0 =
time.perf_counter_ns() before os.fork() and compute dt immediately after the
fork returns (before os.waitpid()), otherwise update the metric name to indicate
it includes the waitpid round-trip; update references in this block
(maybe_reset_stw_stats(), warmup, lat_ns, fork_latency_us) accordingly.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/posix.rs (1)
1879-1885: ⚠️ Potential issue | 🟠 Major — Capture `errno` before `allow_threads()` reattaches the VM thread.
`waitpid()` now reads `errno` after `vm.allow_threads(...)` returns, so an EINTR can be misclassified and skip `vm.check_signals()`.

Suggested fix

```diff
-        let res = vm.allow_threads(|| unsafe { libc::waitpid(pid, &mut status, opt) });
+        let (res, err) = vm.allow_threads(|| {
+            let res = unsafe { libc::waitpid(pid, &mut status, opt) };
+            (res, nix::Error::last_raw())
+        });
         if res == -1 {
-            if nix::Error::last_raw() == libc::EINTR {
+            if err == libc::EINTR {
                 vm.check_signals()?;
                 continue;
             }
-            return Err(nix::Error::last().into_pyexception(vm));
+            return Err(io::Error::from_raw_os_error(err).into_pyexception(vm));
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/posix.rs` around lines 1879 - 1885, The code currently calls vm.allow_threads(|| unsafe { libc::waitpid(pid, &mut status, opt) }) and then checks nix::Error::last_raw(), but errno can be overwritten when allow_threads reattaches the VM thread; fix by capturing the raw errno inside the closure that calls libc::waitpid and return it alongside the syscall result (e.g., return (res, err_no) from the vm.allow_threads closure), then use that captured errno instead of calling nix::Error::last_raw() after vm.allow_threads returns so EINTR is detected and vm.check_signals() is called correctly; update the branch that compares to libc::EINTR and the error return to use the captured errno and map it via nix::Error::from_errno or into_pyexception as before.
♻️ Duplicate comments (4)
crates/vm/src/stdlib/time.rs (1)
120-122: ⚠️ Potential issue | 🟠 Major — Capture `errno` inside the `allow_threads` closure.
`allow_threads()` reattaches the VM thread before this code reads `errno`, so the EINTR check can observe a later syscall and miss `vm.check_signals()`.

Suggested fix

```diff
-        let res =
-            vm.allow_threads(|| unsafe { libc::nanosleep(ts.as_ref(), core::ptr::null_mut()) });
-        let interrupted = res == -1 && nix::Error::last_raw() == libc::EINTR;
+        let (res, err) = vm.allow_threads(|| {
+            let res = unsafe { libc::nanosleep(ts.as_ref(), core::ptr::null_mut()) };
+            (res, nix::Error::last_raw())
+        });
+        let interrupted = res == -1 && err == libc::EINTR;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/time.rs` around lines 120 - 122, The EINTR check reads errno after vm.allow_threads reattaches the VM thread, allowing a different syscall to clobber errno; fix by capturing the errno inside the closure that calls libc::nanosleep: call vm.allow_threads(|| { let res = unsafe { libc::nanosleep(...)}; let err = nix::Error::last_raw() (or read errno) inside the closure; return both res and err (or set interrupted) so the outer code uses the captured err to compute interrupted and then call vm.check_signals() accordingly, updating the handling around res and interrupted variables in time.rs.

crates/vm/src/stdlib/posix.rs (1)
948-950: ⚠️ Potential issue | 🔴 Critical — Don't proceed to `fork()` if stop-the-world can fail or time out.
This path still calls `libc::fork()` unconditionally after `py_os_before_fork(vm)`. If `StopTheWorldState::stop_the_world()` still has a timeout/failure mode, the fork can run while another thread still owns interpreter state.

Run this to confirm whether `stop_the_world()` has a timeout/failure path that is currently ignored here:

```bash
#!/bin/bash
set -euo pipefail
rg -n -C4 'struct StopTheWorldState|fn stop_the_world|wait_timeout|timeout|500' crates/vm/src/vm
sed -n '764,776p' crates/vm/src/stdlib/posix.rs
sed -n '929,952p' crates/vm/src/stdlib/posix.rs
```

If the search shows a boolean/`Result` return or an internal timeout path, `py_os_before_fork()` should propagate that outcome and `fork()` should abort instead of continuing to `libc::fork()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/stdlib/posix.rs` around lines 948 - 950, py_os_before_fork(vm) may hide a failure/timeout from StopTheWorldState::stop_the_world(), so do not call libc::fork() unconditionally; change the call site in the posix fork path to inspect and propagate the outcome of py_os_before_fork(vm) (or directly check StopTheWorldState::stop_the_world()) and abort/return an error instead of proceeding to unsafe { libc::fork() } when stop-the-world failed or timed out; update py_os_before_fork(vm) to return a Result or bool indicating success if needed, log the failure and return a Python error from the surrounding function rather than performing fork when the stop-the-world step did not succeed.

crates/vm/src/vm/mod.rs (1)
305-321: ⚠️ Potential issue | 🔴 Critical

Clear `requested` before resuming suspended threads.

Waking slots while the global stop flag is still true lets a thread returning through `attach_thread()` see `requested == true`, move itself back to `THREAD_SUSPENDED`, and miss this resume pass. `self.requested.store(false, ...)` needs to happen before the wake loop.

Suggested fix

```diff
 pub fn start_the_world(&self, vm: &VirtualMachine) {
     use thread::{THREAD_DETACHED, THREAD_SUSPENDED};
     let requester = self.requester.load(Ordering::Relaxed);
     stw_trace(format_args!("start begin requester={requester}"));
+    self.requested.store(false, Ordering::Release);
     let registry = vm.state.thread_frames.lock();
     for (&id, slot) in registry.iter() {
         if id == requester {
             continue;
         }
         if slot.state.load(Ordering::Relaxed) == THREAD_SUSPENDED {
             slot.state.store(THREAD_DETACHED, Ordering::Release);
             slot.thread.unpark();
         }
     }
     drop(registry);
-    self.requested.store(false, Ordering::Release);
     self.requester.store(0, Ordering::Relaxed);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/mod.rs` around lines 305 - 321, the global stop flag self.requested is cleared too late in start_the_world, allowing threads returning via attach_thread to see requested == true and re-suspend; move the self.requested.store(false, Ordering::Release) (and keep self.requester.store(0, Ordering::Relaxed) after) to occur before iterating the registry so the wake loop sees requested==false; ensure the loop still checks slot.state for THREAD_SUSPENDED and sets THREAD_DETACHED and unpark() as before (symbols: start_the_world, self.requested, self.requester, THREAD_SUSPENDED, THREAD_DETACHED, attach_thread).

crates/vm/src/vm/thread.rs (1)
299-304: ⚠️ Potential issue | 🔴 Critical

Don't recover with a blind `store`/`swap` back to `ATTACHED`.

These writes can still overwrite a fresh `THREAD_SUSPENDED`. A new stop-the-world request can race in after the flag/wait check and before Line 302 or Line 316, and the slot gets forced back to running code even though a requester already treated it as parked. This still needs the guarded CAS-and-retry reattach flow from the earlier review.

Also applies to: 315-320
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/vm/thread.rs` around lines 299 - 304, The blind s.state.store(some) in the suspend-abort path must be replaced with a guarded compare-and-swap loop: use s.state.compare_exchange to change THREAD_DETACHED → THREAD_ATTACHED (with the same Ordering semantics used elsewhere) and retry the CAS; if the CAS observes THREAD_SUSPENDED (or any non-DETACHED that indicates a new STW request) then abort/return without forcing ATTACHED. Apply the same CAS-and-retry reattach flow to both the early-recovery branch around s.state.store(THREAD_ATTACHED) and the later branch (lines ~315-320) so you don't overwrite a freshly suspended slot; reference s.state, THREAD_DETACHED, THREAD_ATTACHED, THREAD_SUSPENDED, and stw.requested and keep ordering Acquire/Release consistent.
🧹 Nitpick comments (2)
crates/compiler-core/src/bytecode.rs (1)
658-688: Use an RAII guard for descriptor write sections.

If a caller panics or returns early after `begin_descriptor_write()` and before `end_descriptor_write()`, the sequence stays odd forever and later readers/writers will spin on that slot. Returning a small guard that performs the closing increment in `Drop` makes this much harder to misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/compiler-core/src/bytecode.rs` around lines 658 - 688, Replace the manual begin/end pair with an RAII guard: modify begin_descriptor_write to return a small DescriptorWriteGuard (holding &self and the index) instead of unit, implement Drop for DescriptorWriteGuard to perform the increment on self.descriptor_sequences[index] (using fetch_add with Ordering::Release) so the sequence is always closed even on panic/early return, and remove or make end_descriptor_write private/unused; keep the current locking loop logic inside the new begin_descriptor_write before creating/returning the guard so callers simply hold the guard for the write scope.crates/vm/src/frame.rs (1)
7053-7098: Factor the descriptor publish sequence into one helper.
`write_cached_descriptor` and `write_cached_descriptor_with_metaclass` now duplicate the same begin/invalidate/publish/end flow, which makes it easy for the cache publication protocol to drift when this layout changes again. A single helper with an optional metaclass write keeps the ordering rules in one place. As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/vm/src/frame.rs` around lines 7053 - 7098, Extract the common publish protocol used by write_cached_descriptor and write_cached_descriptor_with_metaclass into a single helper (e.g., publish_descriptor_cache) that encapsulates begin_descriptor_write(cache_base), the invalidate write_cache_u32(cache_base + 1, 0), the payload writes (write_cache_u32 for metaclass when present, write_cache_ptr for descr_ptr), the final write_cache_u32(cache_base + 1, type_version), and end_descriptor_write(cache_base); modify write_cached_descriptor and write_cached_descriptor_with_metaclass to compute their metaclass_version (Option<u32>) and descr_ptr values and then call this helper so the begin/invalidate/payload/publish/end sequence is implemented once and both functions delegate to it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/imp.rs`:
- Around line 17-20: acquire_lock currently calls acquire_lock_for_fork and may
block on IMP_LOCK while the Python thread remains ATTACHED, preventing
stop_the_world from observing signals; wrap the blocking lock acquisition in the
Python-facing path with vm.allow_threads(...) so the thread detaches while
waiting — modify the acquire_lock function to call vm.allow_threads(||
acquire_lock_for_fork()) (or the equivalent closure/callback pattern used by
VirtualMachine) so the IMP_LOCK.lock() happens while threads are allowed,
ensuring check_signals() can run on the requester.
In `@crates/vm/src/stdlib/os.rs`:
- Around line 325-330: The write implementation in fn write (using
ArgBytesLike::with_ref and vm.allow_threads(|| crt_fd::write(fd, b))) should
loop and retry when the syscall returns an EINTR (io::ErrorKind::Interrupted)
similar to read/readinto: call crt_fd::write inside vm.allow_threads, if it
returns Err(e) with kind Interrupted then continue the loop and retry, otherwise
return the Ok or Err result; ensure you preserve the Borrowed fd and the bytes
slice across retries and break only on non-Interrupted errors or a successful
write.
In `@crates/vm/src/vm/thread.rs`:
- Around line 146-158: After the successful DETACHED->SUSPENDED CAS
(s.state.compare_exchange) you must re-check vm.state.stop_the_world.requested
to avoid parking on a cleared request: after Ok(_), reload requested with
Ordering::Acquire and if it's false, undo the suspend by CAS-ing s.state from
THREAD_SUSPENDED back to THREAD_DETACHED (or retry that revert loop) and skip
notify_suspended()/wait_while_suspended()/add_attach_wait_yields so the thread
self-recovers; apply the same pattern to the other identical suspend path (the
block around lines 205-220) so you never park when the STW request was cleared
between the initial load and the CAS.
---
Outside diff comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 1879-1885: The code currently calls vm.allow_threads(|| unsafe {
libc::waitpid(pid, &mut status, opt) }) and then checks nix::Error::last_raw(),
but errno can be overwritten when allow_threads reattaches the VM thread; fix by
capturing the raw errno inside the closure that calls libc::waitpid and return
it alongside the syscall result (e.g., return (res, err_no) from the
vm.allow_threads closure), then use that captured errno instead of calling
nix::Error::last_raw() after vm.allow_threads returns so EINTR is detected and
vm.check_signals() is called correctly; update the branch that compares to
libc::EINTR and the error return to use the captured errno and map it via
nix::Error::from_errno or into_pyexception as before.
---
Duplicate comments:
In `@crates/vm/src/stdlib/posix.rs`:
- Around line 948-950: py_os_before_fork(vm) may hide a failure/timeout from
StopTheWorldState::stop_the_world(), so do not call libc::fork()
unconditionally; change the call site in the posix fork path to inspect and
propagate the outcome of py_os_before_fork(vm) (or directly check
StopTheWorldState::stop_the_world()) and abort/return an error instead of
proceeding to unsafe { libc::fork() } when stop-the-world failed or timed out;
update py_os_before_fork(vm) to return a Result or bool indicating success if
needed, log the failure and return a Python error from the surrounding function
rather than performing fork when the stop-the-world step did not succeed.
In `@crates/vm/src/stdlib/time.rs`:
- Around line 120-122: The EINTR check reads errno after vm.allow_threads
reattaches the VM thread, allowing a different syscall to clobber errno; fix by
capturing the errno inside the closure that calls libc::nanosleep: call
vm.allow_threads(|| { let res = unsafe { libc::nanosleep(...)}; let err =
nix::Error::last_raw() (or read errno) inside the closure; return both res and
err (or set interrupted) so the outer code uses the captured err to compute
interrupted and then call vm.check_signals() accordingly, updating the handling
around res and interrupted variables in time.rs.
In `@crates/vm/src/vm/mod.rs`:
- Around line 305-321: The global stop flag self.requested is cleared too late
in start_the_world, allowing threads returning via attach_thread to see
requested == true and re-suspend; move the self.requested.store(false,
Ordering::Release) (and keep self.requester.store(0, Ordering::Relaxed) after)
to occur before iterating the registry so the wake loop sees requested==false;
ensure the loop still checks slot.state for THREAD_SUSPENDED and sets
THREAD_DETACHED and unpark() as before (symbols: start_the_world,
self.requested, self.requester, THREAD_SUSPENDED, THREAD_DETACHED,
attach_thread).
In `@crates/vm/src/vm/thread.rs`:
- Around line 299-304: The blind s.state.store(some) in the suspend-abort path
must be replaced with a guarded compare-and-swap loop: use
s.state.compare_exchange to change THREAD_DETACHED → THREAD_ATTACHED (with the
same Ordering semantics used elsewhere) and retry the CAS; if the CAS observes
THREAD_SUSPENDED (or any non-DETACHED that indicates a new STW request) then
abort/return without forcing ATTACHED. Apply the same CAS-and-retry reattach
flow to both the early-recovery branch around s.state.store(THREAD_ATTACHED) and
the later branch (lines ~315-320) so you don't overwrite a freshly suspended
slot; reference s.state, THREAD_DETACHED, THREAD_ATTACHED, THREAD_SUSPENDED, and
stw.requested and keep ordering Acquire/Release consistent.
---
Nitpick comments:
In `@crates/compiler-core/src/bytecode.rs`:
- Around line 658-688: Replace the manual begin/end pair with an RAII guard:
modify begin_descriptor_write to return a small DescriptorWriteGuard (holding
&self and the index) instead of unit, implement Drop for DescriptorWriteGuard to
perform the increment on self.descriptor_sequences[index] (using fetch_add with
Ordering::Release) so the sequence is always closed even on panic/early return,
and remove or make end_descriptor_write private/unused; keep the current locking
loop logic inside the new begin_descriptor_write before creating/returning the
guard so callers simply hold the guard for the write scope.
In `@crates/vm/src/frame.rs`:
- Around line 7053-7098: Extract the common publish protocol used by
write_cached_descriptor and write_cached_descriptor_with_metaclass into a single
helper (e.g., publish_descriptor_cache) that encapsulates
begin_descriptor_write(cache_base), the invalidate write_cache_u32(cache_base +
1, 0), the payload writes (write_cache_u32 for metaclass when present,
write_cache_ptr for descr_ptr), the final write_cache_u32(cache_base + 1,
type_version), and end_descriptor_write(cache_base); modify
write_cached_descriptor and write_cached_descriptor_with_metaclass to compute
their metaclass_version (Option<u32>) and descr_ptr values and then call this
helper so the begin/invalidate/payload/publish/end sequence is implemented once
and both functions delegate to it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: ad49f99a-3dcf-45ac-822f-069c28ac27fb
📒 Files selected for processing (14)
- .cspell.dict/cpython.txt
- .cspell.json
- crates/common/src/lock.rs
- crates/compiler-core/src/bytecode.rs
- crates/vm/src/frame.rs
- crates/vm/src/object/core.rs
- crates/vm/src/stdlib/imp.rs
- crates/vm/src/stdlib/os.rs
- crates/vm/src/stdlib/posix.rs
- crates/vm/src/stdlib/thread.rs
- crates/vm/src/stdlib/time.rs
- crates/vm/src/vm/interpreter.rs
- crates/vm/src/vm/mod.rs
- crates/vm/src/vm/thread.rs
💤 Files with no reviewable changes (1)
- .cspell.json
🚧 Files skipped from review as they are similar to previous changes (1)
- .cspell.dict/cpython.txt
```diff
 #[pyfunction]
 fn acquire_lock(_vm: &VirtualMachine) {
-    IMP_LOCK.lock()
+    acquire_lock_for_fork()
 }
```
Detach while waiting on IMP_LOCK.
acquire_lock() can now block in IMP_LOCK.lock() while the thread stays ATTACHED. That gives stop_the_world() the same failure mode this PR is fixing elsewhere: the requester can wait forever on a thread parked inside the import lock instead of reaching check_signals(). The Python-facing path should use vm.allow_threads(...) around the blocking acquire.
Suggested fix

```diff
 #[pyfunction]
-fn acquire_lock(_vm: &VirtualMachine) {
-    acquire_lock_for_fork()
+fn acquire_lock(vm: &VirtualMachine) {
+    vm.allow_threads(acquire_lock_for_fork)
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
#[pyfunction]
fn acquire_lock(vm: &VirtualMachine) {
    vm.allow_threads(acquire_lock_for_fork)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/stdlib/imp.rs` around lines 17 - 20, acquire_lock currently
calls acquire_lock_for_fork and may block on IMP_LOCK while the Python thread
remains ATTACHED, preventing stop_the_world from observing signals; wrap the
blocking lock acquisition in the Python-facing path with vm.allow_threads(...)
so the thread detaches while waiting — modify the acquire_lock function to call
vm.allow_threads(|| acquire_lock_for_fork()) (or the equivalent closure/callback
pattern used by VirtualMachine) so the IMP_LOCK.lock() happens while threads are
allowed, ensuring check_signals() can run on the requester.
```rust
fn write(
    fd: crt_fd::Borrowed<'_>,
    data: ArgBytesLike,
    vm: &VirtualMachine,
) -> io::Result<usize> {
    data.with_ref(|b| vm.allow_threads(|| crt_fd::write(fd, b)))
```
Retry os.write() on EINTR after releasing the VM thread.
read() and readinto() already loop on EINTR, but write() now returns the interrupted syscall directly. That regresses signal behavior for a changed blocking path.
Suggested fix

```diff
-    fn write(
-        fd: crt_fd::Borrowed<'_>,
-        data: ArgBytesLike,
-        vm: &VirtualMachine,
-    ) -> io::Result<usize> {
-        data.with_ref(|b| vm.allow_threads(|| crt_fd::write(fd, b)))
+    fn write(
+        fd: crt_fd::Borrowed<'_>,
+        data: ArgBytesLike,
+        vm: &VirtualMachine,
+    ) -> PyResult<usize> {
+        data.with_ref(|b| loop {
+            match vm.allow_threads(|| crt_fd::write(fd, b)) {
+                Ok(n) => return Ok(n),
+                Err(e) if e.raw_os_error() == Some(libc::EINTR) => {
+                    vm.check_signals()?;
+                    continue;
+                }
+                Err(e) => return Err(e.into_pyexception(vm)),
+            }
+        })
     }
```

📝 Committable suggestion
```rust
fn write(
    fd: crt_fd::Borrowed<'_>,
    data: ArgBytesLike,
    vm: &VirtualMachine,
) -> PyResult<usize> {
    data.with_ref(|b| loop {
        match vm.allow_threads(|| crt_fd::write(fd, b)) {
            Ok(n) => return Ok(n),
            Err(e) if e.raw_os_error() == Some(libc::EINTR) => {
                vm.check_signals()?;
                continue;
            }
            Err(e) => return Err(e.into_pyexception(vm)),
        }
    })
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/stdlib/os.rs` around lines 325 - 330, The write implementation
in fn write (using ArgBytesLike::with_ref and vm.allow_threads(||
crt_fd::write(fd, b))) should loop and retry when the syscall returns an EINTR
(io::ErrorKind::Interrupted) similar to read/readinto: call crt_fd::write inside
vm.allow_threads, if it returns Err(e) with kind Interrupted then continue the
loop and retry, otherwise return the Ok or Err result; ensure you preserve the
Borrowed fd and the bytes slice across retries and break only on non-Interrupted
errors or a successful write.
📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_fork1.py
    dependencies:
    dependent tests: (no tests depend on fork1)
[x] lib: cpython/Lib/os.py
    dependencies:
dependent tests: (169 tests)
Add stop-the-world thread suspension around fork() to prevent
deadlocks from locks held by dead parent threads in the child.
- Thread states: DETACHED / ATTACHED / SUSPENDED with atomic CAS
transitions matching _PyThreadState_{Attach,Detach,Suspend}
- stop_the_world / start_the_world: park all non-requester threads
before fork, resume after (parent) or reset (child)
- allow_threads (Py_BEGIN/END_ALLOW_THREADS): detach around blocking
syscalls (os.read/write, waitpid, Lock.acquire, time.sleep) so
stop_the_world can force-park via CAS
- Acquire/release import lock around fork lifecycle
- zero_reinit_after_fork: generic lock reset for parking_lot types
- gc_clear_raw: detach dict instead of clearing entries
- Lock-free double-check for descriptor cache reads (no read-side
seqlock); write-side seqlock retained for writer serialization
- fork() returns PyResult, checks PythonFinalizationError, calls
sys.audit
* Suspend Python threads before fork()
Replace transmute+AtomicCell::swap with ptr::write_bytes(0) for resetting parking_lot mutexes in _at_fork_reinit methods. Simpler and avoids relying on AtomicCell's internal layout matching the mutex type.