Skip to content

Fix deadlock after fork#7205

Merged
youknowone merged 1 commit intoRustPython:mainfrom
youknowone:fork-deadlock
Feb 23, 2026
Merged

Fix deadlock after fork#7205
youknowone merged 1 commit intoRustPython:mainfrom
youknowone:fork-deadlock

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Feb 23, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced fork safety to prevent potential deadlocks and lock contention in child processes following multiprocessing operations.
  • Chores

    • Improved internal synchronization to better integrate with Python's threading model for increased reliability.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

Replaces standard synchronization primitives with Python-threaded equivalents in the GC state structures and adds fork-safety utilities to forcibly release locks in child processes after fork operations, preventing potential deadlocks.

Changes

Cohort / File(s) Summary
GC State Fork-Safety Implementation
crates/vm/src/gc_state.rs
Replaces Mutex and RwLock with PyMutex and PyRwLock across GC state structures; adds force_unlock_after_fork() and force_unlock_stats_after_fork() methods for Unix; introduces helper function to forcibly unlock PyRwLock variants; updates object tracking, untracking, finalization, and resurrection logic to use new locking primitives.
Post-Fork Cleanup Integration
crates/vm/src/stdlib/posix.rs
Integrates GC state fork-safety by calling force_unlock_after_fork() in post-fork child process cleanup routine.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐰 With forks that split and locks that bind,
We force them free, no deadlock kind!
PyRwLocks dance where mutexes stood,
The garbage collected—all is good!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix deadlock after fork' directly corresponds to the main objective of the changeset, which addresses deadlock prevention through fork-safety utilities and lock management in the GC state.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review February 23, 2026 07:13
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

ℹ️ Review info

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccfa938 and 45e47ad.

📒 Files selected for processing (2)
  • crates/vm/src/gc_state.rs
  • crates/vm/src/stdlib/posix.rs
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm/src/gc_state.rs`:
- Around line 793-856: The code incorrectly uses parking_lot's
force_unlock_read/force_unlock_write (and force_unlock) in
force_unlock_after_fork and force_unlock_rwlock_after_fork to recover locks held
by other threads after fork, which is undefined behavior; remove any calls to
force_unlock_read/force_unlock_write/force_unlock and stop trying to forcibly
recover locks from dead threads. Instead, change force_unlock_after_fork and
force_unlock_rwlock_after_fork to not perform unsafe forcible unlocks: either
(A) make them a no-op that logs/returns an Err indicating fork-without-exec is
unsupported, or (B) abort/panic with a clear message instructing callers to use
pthread_atfork handlers or ensure fork is followed by exec; also update any
callers to handle the error or avoid calling these functions after fork. Ensure
references: update function force_unlock_after_fork, helper
force_unlock_rwlock_after_fork, and remove usages of
PyRwLock::force_unlock_read, PyRwLock::force_unlock_write, and
Mutex::force_unlock in this module.

Comment on lines +793 to 856

/// Force-unlock all locks after fork() in the child process.
///
/// After fork(), only the forking thread survives. Any lock held by another
/// thread is permanently stuck. This method releases all such stuck locks.
///
/// # Safety
/// Must only be called after fork() in the child process when no other
/// threads exist. The calling thread must NOT hold any of these locks.
#[cfg(unix)]
pub unsafe fn force_unlock_after_fork(&self) {
// Force-unlock the collecting mutex
if self.collecting.try_lock().is_none() {
unsafe { self.collecting.force_unlock() };
}

// Force-unlock garbage and callbacks mutexes
if self.garbage.try_lock().is_none() {
unsafe { self.garbage.force_unlock() };
}
if self.callbacks.try_lock().is_none() {
unsafe { self.callbacks.force_unlock() };
}

// Force-unlock generation stats mutexes
for generation in &self.generations {
unsafe { generation.force_unlock_stats_after_fork() };
}
unsafe { self.permanent.force_unlock_stats_after_fork() };

// Force-unlock RwLocks
for rw in &self.generation_objects {
unsafe { force_unlock_rwlock_after_fork(rw) };
}
unsafe { force_unlock_rwlock_after_fork(&self.permanent_objects) };
unsafe { force_unlock_rwlock_after_fork(&self.tracked_objects) };
unsafe { force_unlock_rwlock_after_fork(&self.finalized_objects) };
}
}

/// Force-unlock a PyRwLock after fork() in the child process.
///
/// # Safety
/// Must only be called after fork() in the child process when no other
/// threads exist. The calling thread must NOT hold this lock.
#[cfg(unix)]
unsafe fn force_unlock_rwlock_after_fork<T>(lock: &PyRwLock<T>) {
if lock.try_write().is_some() {
return;
}
let is_shared = lock.try_read().is_some();
if is_shared {
loop {
// SAFETY: Lock is shared-locked by dead thread(s).
unsafe { lock.force_unlock_read() };
if lock.try_write().is_some() {
return;
}
self.generations[2].count.fetch_add(count, Ordering::SeqCst);
}
} else {
// SAFETY: Lock is exclusively locked by a dead thread.
unsafe { lock.force_unlock_write() };
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -nP 'fn force_unlock_read|fn force_unlock_write' --type=rust -C3

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

rg -nP 'type PyRwLock|struct PyRwLock|impl.*PyRwLock' --type=rust -C5

Repository: RustPython/RustPython

Length of output: 2107


🏁 Script executed:

rg -nP 'type RawRwLock|struct RawRwLock|impl.*RawRwLock|fn force_unlock' --type=rust -C3

Repository: RustPython/RustPython

Length of output: 5373


🏁 Script executed:

rg -nP 'force_unlock_read|force_unlock_write' --type=rust -B2 -A2

Repository: RustPython/RustPython

Length of output: 2026


🏁 Script executed:

rg -nP 'fn force_unlock_read|fn force_unlock_write' --type=rust -B5 -A5

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

rg -nP 'parking_lot|force_unlock' crates/common/src/lock.rs -B3 -A3

Repository: RustPython/RustPython

Length of output: 306


🏁 Script executed:

rg -nP 'trait.*force_unlock|extension|impl.*RwLock' --type=rust -C3

Repository: RustPython/RustPython

Length of output: 50377


🏁 Script executed:

cat crates/common/src/lock/cell_lock.rs | grep -A10 "force_unlock"

Repository: RustPython/RustPython

Length of output: 47


🌐 Web query:

parking_lot RwLock force_unlock_read force_unlock_write fork safety

💡 Result:

parking_lot’s RwLock::{force_unlock_read, force_unlock_write} (actually provided by lock_api::RwLock) are not “recovery” tools for fork().

  • They are unsafe escape hatches meant only for the case where the current thread still logically owns the lock but you intentionally leaked the guard (e.g., mem::forget(guard) for FFI), and you need to unlock without having the guard object anymore. Calling them when the lock isn’t held in that mode is undefined behavior. [1]
  • After fork() in a multi-threaded process, only the forking thread exists in the child, but the child inherits the memory state of all mutex/rwlock objects. If another (now-nonexistent) thread held a lock at fork time, that lock can remain locked forever in the child, and attempting to use such locks can deadlock or worse. [2]
  • POSIX’s intended mitigation is “fork then immediately exec” (avoid touching library code/allocators/locks in between), or to use pthread_atfork() handlers to put your process into a consistent lock state across fork—but that only works if you can reliably coordinate all relevant locks. [2][3]

So: do not use force_unlock_read/write to “make the child safe after fork”. It violates the ownership assumptions those methods require [1], and it cannot make the protected data consistent if a different thread was in the middle of updating it at fork time. [2]

Sources: [1] [2] [3]


Critical unsafe code misuse: force_unlock_read and force_unlock_write are not safe to use for fork() recovery.

The force_unlock_read() and force_unlock_write() methods from parking_lot are designed only for cases where the current thread still logically owns the lock but intentionally leaked the guard (e.g., via mem::forget). They are not meant to recover from locks held by dead threads after fork(), as they violate the ownership assumptions these methods require. Calling them when a different (now-nonexistent) thread held the lock is undefined behavior.

After fork(), using pthread_atfork() handlers or relying on the POSIX principle of "fork then immediately exec" are the intended mitigations. The current approach of force-unlocking locks held by other threads is unsafe and should not be used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/gc_state.rs` around lines 793 - 856, The code incorrectly uses
parking_lot's force_unlock_read/force_unlock_write (and force_unlock) in
force_unlock_after_fork and force_unlock_rwlock_after_fork to recover locks held
by other threads after fork, which is undefined behavior; remove any calls to
force_unlock_read/force_unlock_write/force_unlock and stop trying to forcibly
recover locks from dead threads. Instead, change force_unlock_after_fork and
force_unlock_rwlock_after_fork to not perform unsafe forcible unlocks: either
(A) make them a no-op that logs/returns an Err indicating fork-without-exec is
unsupported, or (B) abort/panic with a clear message instructing callers to use
pthread_atfork handlers or ensure fork is followed by exec; also update any
callers to handle the error or avoid calling these functions after fork. Ensure
references: update function force_unlock_after_fork, helper
force_unlock_rwlock_after_fork, and remove usages of
PyRwLock::force_unlock_read, PyRwLock::force_unlock_write, and
Mutex::force_unlock in this module.

@youknowone youknowone merged commit f096bb7 into RustPython:main Feb 23, 2026
13 checks passed
@youknowone youknowone deleted the fork-deadlock branch February 23, 2026 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant