Use lock_api's Mutex::raw() to access the underlying RawMutex instead of casting &PyMutex<T> directly. This avoids layout assumptions about lock_api::Mutex<R, T> field ordering.
📝 Walkthrough

This PR refactors after-fork lock handling by replacing force-unlock semantics with centralized reinitialization helpers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Parent as Parent Process
    participant Fork as fork() syscall
    participant Child as Child Process
    participant Phase1 as Phase 1: Reinit
    participant Phase2 as Phase 2: Atomic State
    participant Phase3 as Phase 3: Thread State
    participant Phase4 as Phase 4: Python Callbacks

    Parent->>Fork: fork()
    Fork->>Child: Child created (threads stopped)
    Child->>Phase1: reinit_locks_after_fork(vm)
    Phase1->>Phase1: reinit_mutex_after_fork (collecting, garbage, callbacks)
    Phase1->>Phase1: reinit_rwlock_after_fork (generation_objects, trackers)
    Phase1-->>Child: All locks reset
    Child->>Phase2: Reset atomic state (signal handling)
    Phase2-->>Child: Low-level state reset
    Child->>Phase3: after_fork_child cleanup
    Phase3->>Phase3: Acquire reinitialized locks
    Phase3->>Phase3: Update thread state
    Phase3-->>Child: Thread state cleaned up
    Child->>Phase4: Execute Python at-fork callbacks
    Phase4->>Phase4: run_after_forkers_child()
    Phase4-->>Child: Callbacks complete
    Child-->>Parent: Fork handling done
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Replace all force_unlock() + try_lock() patterns with zero-based reinit that bypasses parking_lot internals entirely. After fork(), the child is single-threaded so reinited locks won't contend. Add reinit_rwlock_after_fork to common::lock alongside the existing reinit_mutex_after_fork. Replace force_unlock_after_fork methods in codecs, intern, and gc_state with reinit_after_fork equivalents. This fixes after_fork_child silently dropping thread handles when try_lock() failed on per-handle Arc<Mutex> locks.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/stdlib/thread.rs`:
- Around line 931-946: Early-return branches inside the thread handle cleanup
are returning before the done_event mutex is reinitialized, which can leave a
stale lock and cause later deadlocks; update the branches that check
inner_guard.ident == current_ident and inner_guard.state ==
ThreadHandleState::NotStarted (and the analogous branch around the block
referenced by the reviewer) to reinitialize and set the done_event before
returning: call reinit_parking_lot_mutex on the done_event lock, set
*lock.lock() = true and cvar.notify_all() (using the same done_event tuple)
prior to the return, ensuring the done_event is always reset whether you take
the early-return or the normal cleanup path; look for symbols done_event,
reinit_parking_lot_mutex, inner_guard, ThreadHandleState::NotStarted to apply
the fix.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- crates/common/src/lock.rs
- crates/vm/src/codecs.rs
- crates/vm/src/gc_state.rs
- crates/vm/src/intern.rs
- crates/vm/src/stdlib/posix.rs
- crates/vm/src/stdlib/thread.rs
- crates/vm/src/vm/thread.rs
```diff
  if inner_guard.ident == current_ident {
      return true;
  }
  if inner_guard.state == ThreadHandleState::NotStarted {
      return true;
  }

  // Mark as done and notify waiters
  inner_guard.state = ThreadHandleState::Done;
- inner_guard.join_handle = None; // Can't join OS thread from child
+ inner_guard.join_handle = None;
  drop(inner_guard);

- // Try to notify waiters - skip if we can't acquire the lock
+ // Reinit and set the done event
  let (lock, cvar) = &*done_event;
- if let Some(mut done) = lock.try_lock() {
-     *done = true;
-     cvar.notify_all();
- }
+ reinit_parking_lot_mutex(lock);
+ *lock.lock() = true;
+ cvar.notify_all();
```
Reinitialize done_event mutexes before early-return branches.
Current/not-started entries are retained before done-event mutex reset, so a stale lock state from dead parent threads can survive and later deadlock join/shutdown paths.
🔧 Proposed fix

```diff
@@
-    if inner_guard.ident == current_ident {
+    let (done_lock, _) = &*done_event;
+    reinit_parking_lot_mutex(done_lock);
+
+    if inner_guard.ident == current_ident {
         return true;
     }
     if inner_guard.state == ThreadHandleState::NotStarted {
         return true;
     }
@@
-    let (lock, cvar) = &*done_event;
-    reinit_parking_lot_mutex(lock);
+    let (lock, cvar) = &*done_event;
     *lock.lock() = true;
     cvar.notify_all();
@@
-    if inner_guard.ident == current_ident {
+    let (done_lock, _) = &*done_event;
+    reinit_parking_lot_mutex(done_lock);
+
+    if inner_guard.ident == current_ident {
         return true;
     }
     if inner_guard.state == ThreadHandleState::NotStarted {
         return true;
     }
@@
-    let (lock, cvar) = &*done_event;
-    reinit_parking_lot_mutex(lock);
+    let (lock, cvar) = &*done_event;
     *lock.lock() = true;
     cvar.notify_all();
```

Also applies to: 966-979
Summary by CodeRabbit

- Bug Fixes
- Refactor
- Documentation