
reinit-after-fork #7321

Merged
youknowone merged 2 commits into RustPython:main from youknowone:force-unlock
Mar 3, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 3, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved post-fork lock state handling across VM components, replacing manual unlock patterns with centralized reinitialization logic for greater reliability.
  • Refactor

    • Unified fork-handling mechanism with new reinitialization helpers for internal synchronization primitives.
  • Documentation

    • Updated safety notes and control flow descriptions for post-fork operations.

Use lock_api's Mutex::raw() to access the underlying RawMutex
instead of casting &PyMutex<T> directly. This avoids layout
assumptions about lock_api::Mutex<R, T> field ordering.
@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

This PR refactors post-fork lock handling by replacing force-unlock semantics with centralized reinitialization helpers. It introduces reinit_mutex_after_fork and reinit_rwlock_after_fork in crates/common/src/lock.rs, updates multiple VM state modules to use these helpers instead of manual unlocking, reorganizes the after-fork callback phases in POSIX initialization, and changes feature gates from host_env to threading for consistency.

Changes

Cohort / File(s) Summary
Lock reinitialization helpers
crates/common/src/lock.rs
Adds reinit_mutex_after_fork and a new reinit_rwlock_after_fork function to reset locks by zeroing the underlying raw lock bytes instead of manipulating memory layout directly.
VM state management
crates/vm/src/codecs.rs, crates/vm/src/gc_state.rs, crates/vm/src/intern.rs
Renames force_unlock_after_fork to reinit_after_fork, changes cfg gate from host_env to threading, and delegates to centralized reinit_mutex_after_fork and reinit_rwlock_after_fork helpers. GcState reinitializes all tracked mutexes and rwlocks; GcGeneration reinitializes stats.
After-fork callback phases
crates/vm/src/stdlib/posix.rs
Introduces Phase 1-4 after-fork handling: Phase 1 calls reinit_locks_after_fork(vm) to reset all internal locks before Python code runs; Phase 2 resets atomic state; Phase 3 moves thread-state cleanup; Phase 4 runs Python-level callbacks. Replaces previous force-unlock approach with centralized reinit logic.
Thread lock reinitialization
crates/vm/src/stdlib/thread.rs
Replaces force-unlock paths with explicit reinitialization of per-handle and per-lock state, including new reinit_parking_lot_mutex helper. Updates thread handle and shutdown handle cleanup to account for reinitialized locks.
VM frame slot reset
crates/vm/src/vm/thread.rs
Changes from try_lock() with fallback force_unlock() to direct lock() call, relying on prior reinit_locks_after_fork() to have reinitialized locks. Adds clarifying comments about lock state preconditions.

Sequence Diagram(s)

sequenceDiagram
    participant Parent as Parent Process
    participant Fork as fork() syscall
    participant Child as Child Process
    participant Phase1 as Phase 1: Reinit
    participant Phase2 as Phase 2: Atomic State
    participant Phase3 as Phase 3: Thread State
    participant Phase4 as Phase 4: Python Callbacks

    Parent->>Fork: fork()
    Fork->>Child: Child created (threads stopped)
    
    Child->>Phase1: reinit_locks_after_fork(vm)
    Phase1->>Phase1: reinit_mutex_after_fork (collecting, garbage, callbacks)
    Phase1->>Phase1: reinit_rwlock_after_fork (generation_objects, trackers)
    Phase1-->>Child: All locks reset
    
    Child->>Phase2: Reset atomic state (signal handling)
    Phase2-->>Child: Low-level state reset
    
    Child->>Phase3: after_fork_child cleanup
    Phase3->>Phase3: Acquire reinitialized locks
    Phase3->>Phase3: Update thread state
    Phase3-->>Child: Thread state cleaned up
    
    Child->>Phase4: Execute Python at-fork callbacks
    Phase4->>Phase4: run_after_forkers_child()
    Phase4-->>Child: Callbacks complete
    
    Child-->>Parent: Fork handling done

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Forks fork clean and new,
Locks reboot with morning dew,
No more force-unlocks in sight,
Reinit magic sets things right! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'reinit-after-fork' clearly and specifically describes the main change: refactoring post-fork lock handling from force-unlock to reinitialization semantics across multiple modules.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



Replace all force_unlock() + try_lock() patterns with zero-based
reinit that bypasses parking_lot internals entirely. After fork(),
the child is single-threaded so reinited locks won't contend.

Add reinit_rwlock_after_fork to common::lock alongside the existing
reinit_mutex_after_fork. Replace force_unlock_after_fork methods in
codecs, intern, and gc_state with reinit_after_fork equivalents.

This fixes after_fork_child silently dropping thread handles when
try_lock() failed on per-handle Arc<Mutex> locks.
@youknowone youknowone marked this pull request as ready for review March 3, 2026 01:16
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 931-946: Early-return branches inside the thread handle cleanup
are returning before the done_event mutex is reinitialized, which can leave a
stale lock and cause later deadlocks; update the branches that check
inner_guard.ident == current_ident and inner_guard.state ==
ThreadHandleState::NotStarted (and the analogous branch around the block
referenced by the reviewer) to reinitialize and set the done_event before
returning: call reinit_parking_lot_mutex on the done_event lock, set
*lock.lock() = true and cvar.notify_all() (using the same done_event tuple)
prior to the return, ensuring the done_event is always reset whether you take
the early-return or the normal cleanup path; look for symbols done_event,
reinit_parking_lot_mutex, inner_guard, ThreadHandleState::NotStarted to apply
the fix.

ℹ️ Review info

Configuration used: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43e2df3 and 3e8d703.

📒 Files selected for processing (7)
  • crates/common/src/lock.rs
  • crates/vm/src/codecs.rs
  • crates/vm/src/gc_state.rs
  • crates/vm/src/intern.rs
  • crates/vm/src/stdlib/posix.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/thread.rs

Comment on lines 931 to +946

  if inner_guard.ident == current_ident {
      return true;
  }
  if inner_guard.state == ThreadHandleState::NotStarted {
      return true;
  }

  // Mark as done and notify waiters
  inner_guard.state = ThreadHandleState::Done;
- inner_guard.join_handle = None; // Can't join OS thread from child
+ inner_guard.join_handle = None;
  drop(inner_guard);

- // Try to notify waiters - skip if we can't acquire the lock
+ // Reinit and set the done event
  let (lock, cvar) = &*done_event;
- if let Some(mut done) = lock.try_lock() {
-     *done = true;
-     cvar.notify_all();
- }
+ reinit_parking_lot_mutex(lock);
+ *lock.lock() = true;
+ cvar.notify_all();

⚠️ Potential issue | 🟠 Major

Reinitialize done_event mutexes before early-return branches.

Current/not-started entries are retained before done-event mutex reset, so a stale lock state from dead parent threads can survive and later deadlock join/shutdown paths.

🔧 Proposed fix
@@
-                if inner_guard.ident == current_ident {
+                let (done_lock, _) = &*done_event;
+                reinit_parking_lot_mutex(done_lock);
+
+                if inner_guard.ident == current_ident {
                     return true;
                 }
                 if inner_guard.state == ThreadHandleState::NotStarted {
                     return true;
                 }
@@
-                let (lock, cvar) = &*done_event;
-                reinit_parking_lot_mutex(lock);
+                let (lock, cvar) = &*done_event;
                 *lock.lock() = true;
                 cvar.notify_all();
@@
-                if inner_guard.ident == current_ident {
+                let (done_lock, _) = &*done_event;
+                reinit_parking_lot_mutex(done_lock);
+
+                if inner_guard.ident == current_ident {
                     return true;
                 }
                 if inner_guard.state == ThreadHandleState::NotStarted {
                     return true;
                 }
@@
-                let (lock, cvar) = &*done_event;
-                reinit_parking_lot_mutex(lock);
+                let (lock, cvar) = &*done_event;
                 *lock.lock() = true;
                 cvar.notify_all();

Also applies to: 966-979


@youknowone youknowone merged commit 7eb1821 into RustPython:main Mar 3, 2026
13 checks passed
@youknowone youknowone deleted the force-unlock branch March 3, 2026 02:25
