Use lock_api's Mutex::raw() to access the underlying RawMutex instead of casting &PyMutex<T> directly. This avoids layout assumptions about lock_api::Mutex<R, T> field ordering.
📝 Walkthrough

This PR refactors after-fork lock handling by replacing force-unlock semantics with centralized reinitialization helpers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Parent as Parent Process
    participant Fork as fork() syscall
    participant Child as Child Process
    participant Phase1 as Phase 1: Reinit
    participant Phase2 as Phase 2: Atomic State
    participant Phase3 as Phase 3: Thread State
    participant Phase4 as Phase 4: Python Callbacks

    Parent->>Fork: fork()
    Fork->>Child: Child created (threads stopped)
    Child->>Phase1: reinit_locks_after_fork(vm)
    Phase1->>Phase1: reinit_mutex_after_fork (collecting, garbage, callbacks)
    Phase1->>Phase1: reinit_rwlock_after_fork (generation_objects, trackers)
    Phase1-->>Child: All locks reset
    Child->>Phase2: Reset atomic state (signal handling)
    Phase2-->>Child: Low-level state reset
    Child->>Phase3: after_fork_child cleanup
    Phase3->>Phase3: Acquire reinitialized locks
    Phase3->>Phase3: Update thread state
    Phase3-->>Child: Thread state cleaned up
    Child->>Phase4: Execute Python at-fork callbacks
    Phase4->>Phase4: run_after_forkers_child()
    Phase4-->>Child: Callbacks complete
    Child-->>Parent: Fork handling done
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Replace all force_unlock() + try_lock() patterns with zero-based reinit that bypasses parking_lot internals entirely. After fork(), the child is single-threaded so reinited locks won't contend. Add reinit_rwlock_after_fork to common::lock alongside the existing reinit_mutex_after_fork. Replace force_unlock_after_fork methods in codecs, intern, and gc_state with reinit_after_fork equivalents. This fixes after_fork_child silently dropping thread handles when try_lock() failed on per-handle Arc<Mutex> locks.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 931-946: Early-return branches inside the thread handle cleanup
are returning before the done_event mutex is reinitialized, which can leave a
stale lock and cause later deadlocks; update the branches that check
inner_guard.ident == current_ident and inner_guard.state ==
ThreadHandleState::NotStarted (and the analogous branch around the block
referenced by the reviewer) to reinitialize and set the done_event before
returning: call reinit_parking_lot_mutex on the done_event lock, set
*lock.lock() = true and cvar.notify_all() (using the same done_event tuple)
prior to the return, ensuring the done_event is always reset whether you take
the early-return or the normal cleanup path; look for symbols done_event,
reinit_parking_lot_mutex, inner_guard, ThreadHandleState::NotStarted to apply
the fix.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- crates/common/src/lock.rs
- crates/vm/src/codecs.rs
- crates/vm/src/gc_state.rs
- crates/vm/src/intern.rs
- crates/vm/src/stdlib/posix.rs
- crates/vm/src/stdlib/thread.rs
- crates/vm/src/vm/thread.rs
```diff
  if inner_guard.ident == current_ident {
      return true;
  }
  if inner_guard.state == ThreadHandleState::NotStarted {
      return true;
  }

  // Mark as done and notify waiters
  inner_guard.state = ThreadHandleState::Done;
- inner_guard.join_handle = None; // Can't join OS thread from child
+ inner_guard.join_handle = None;
  drop(inner_guard);

- // Try to notify waiters - skip if we can't acquire the lock
+ // Reinit and set the done event
  let (lock, cvar) = &*done_event;
- if let Some(mut done) = lock.try_lock() {
-     *done = true;
-     cvar.notify_all();
- }
+ reinit_parking_lot_mutex(lock);
+ *lock.lock() = true;
+ cvar.notify_all();
```
Reinitialize done_event mutexes before early-return branches.
Current/not-started entries are retained before done-event mutex reset, so a stale lock state from dead parent threads can survive and later deadlock join/shutdown paths.
🔧 Proposed fix

```diff
@@
-    if inner_guard.ident == current_ident {
+    let (done_lock, _) = &*done_event;
+    reinit_parking_lot_mutex(done_lock);
+
+    if inner_guard.ident == current_ident {
         return true;
     }
     if inner_guard.state == ThreadHandleState::NotStarted {
         return true;
     }
@@
-    let (lock, cvar) = &*done_event;
-    reinit_parking_lot_mutex(lock);
+    let (lock, cvar) = &*done_event;
     *lock.lock() = true;
     cvar.notify_all();
@@
-    if inner_guard.ident == current_ident {
+    let (done_lock, _) = &*done_event;
+    reinit_parking_lot_mutex(done_lock);
+
+    if inner_guard.ident == current_ident {
         return true;
     }
     if inner_guard.state == ThreadHandleState::NotStarted {
         return true;
     }
@@
-    let (lock, cvar) = &*done_event;
-    reinit_parking_lot_mutex(lock);
+    let (lock, cvar) = &*done_event;
     *lock.lock() = true;
     cvar.notify_all();
```

Also applies to: 966-979
Summary by CodeRabbit

- Bug Fixes
- Refactor
- Documentation