Fix SSL read over-consuming TCP data and ZeroReturn handling#7418
Fix SSL read over-consuming TCP data and ZeroReturn handling#7418youknowone merged 1 commit intoRustPython:mainfrom
Conversation
📝 WalkthroughWalkthroughSimplified SSL read handling: zero-return now unconditionally returns empty bytes; introduced and renamed single-record TLS intake helpers and switched handshake and post-handshake reads to boundary-aware, single-record semantics to preserve control records and adjust EOF/WantRead/WantWrite propagation. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant PySSLSock as PySSLSocket
participant Recv as recv_one_tls_record / recv_one_tls_record_for_data
participant Sock as KernelSocket
participant VM as VirtualMachine
App->>PySSLSock: ssl_read()
PySSLSock->>Recv: request single TLS record (vm)
Recv->>Sock: peek/read up to one TLS record header/body
Sock-->>Recv: record data or control (close_notify) / EAGAIN / EOF
Recv-->>PySSLSock: record bytes or EOF/WantRead/WantWrite info
PySSLSock-->>App: return data or empty-bytes (ZeroReturn) or propagate WantRead/WantWrite
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] lib: cpython/Lib/ftplib.py dependencies:
dependent tests: (34 tests)
Legend:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/stdlib/src/ssl/compat.rs (1)
1196-1198:⚠️ Potential issue | 🔴 CriticalReturn empty bytes on EOF instead of
WantReadat line 1196.When
sock_peek()returns empty bytes, it signals TCP EOF (peer closure). Currently line 1196–1198 converts this toSslError::WantRead, which bypasses the EOF detection logic inssl_read_tls_records()(lines 1985–2012). The function explicitly handles empty socket-mode bytes by callingprocess_new_packets()to distinguish clean shutdown (ZeroReturn) from ragged EOF (Eof). ReturningWantReadinstead prevents that logic from executing and can cause retry loops or incorrect error reporting. Callers at lines 1719 and 2125 also expect to receive bytes and checkbytes_read == 0for EOF detection; the current path short-circuits this.Return
Ok(vm.ctx.new_bytes(vec![]).into())to allow empty bytes to flow downstream for proper EOF classification.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl/compat.rs` around lines 1196 - 1198, When sock_peek() returns an empty Vec, don't convert that to SslError::WantRead; instead return an empty-byte VM object so downstream EOF logic can run. Replace the branch in compat::sock_peek handling that currently does "return Err(SslError::WantRead)" with returning Ok(vm.ctx.new_bytes(vec![]).into()) (so callers like the code at ssl_read_tls_records() and callers at the sites that check bytes_read == 0 receive empty bytes). This ensures ssl_read_tls_records() can call process_new_packets() and correctly distinguish ZeroReturn vs Eof rather than short-circuiting with WantRead.
🧹 Nitpick comments (1)
crates/stdlib/src/ssl.rs (1)
3642-3648: Extract the exception value once, then returnErr(...)once.Line 3642–Line 3648 duplicates the same return path and only changes the exception constructor.
♻️ Proposed refactor
- if our_shutdown_state == ShutdownState::SentCloseNotify - || our_shutdown_state == ShutdownState::Completed - { - Err(create_ssl_zero_return_error(vm).upcast()) - } else { - Err(create_ssl_eof_error(vm).upcast()) - } + let exc = if our_shutdown_state == ShutdownState::SentCloseNotify + || our_shutdown_state == ShutdownState::Completed + { + create_ssl_zero_return_error(vm).upcast() + } else { + create_ssl_eof_error(vm).upcast() + }; + Err(exc)As per coding guidelines: "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl.rs` around lines 3642 - 3648, The two branches return the same Err(...) path but construct different exceptions; refactor by computing the exception value first (choose between create_ssl_zero_return_error(vm) and create_ssl_eof_error(vm) based on our_shutdown_state and the ShutdownState::SentCloseNotify/ShutdownState::Completed checks) and then return Err(exception.upcast()) once. Use the existing symbols our_shutdown_state, ShutdownState::SentCloseNotify, ShutdownState::Completed, create_ssl_zero_return_error, create_ssl_eof_error and vm to locate code and replace the duplicate Err(...) branches with a single common return that upcasts the previously selected exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/stdlib/src/ssl/compat.rs`:
- Around line 1196-1198: When sock_peek() returns an empty Vec, don't convert
that to SslError::WantRead; instead return an empty-byte VM object so downstream
EOF logic can run. Replace the branch in compat::sock_peek handling that
currently does "return Err(SslError::WantRead)" with returning
Ok(vm.ctx.new_bytes(vec![]).into()) (so callers like the code at
ssl_read_tls_records() and callers at the sites that check bytes_read == 0
receive empty bytes). This ensures ssl_read_tls_records() can call
process_new_packets() and correctly distinguish ZeroReturn vs Eof rather than
short-circuiting with WantRead.
---
Nitpick comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 3642-3648: The two branches return the same Err(...) path but
construct different exceptions; refactor by computing the exception value first
(choose between create_ssl_zero_return_error(vm) and create_ssl_eof_error(vm)
based on our_shutdown_state and the
ShutdownState::SentCloseNotify/ShutdownState::Completed checks) and then return
Err(exception.upcast()) once. Use the existing symbols our_shutdown_state,
ShutdownState::SentCloseNotify, ShutdownState::Completed,
create_ssl_zero_return_error, create_ssl_eof_error and vm to locate code and
replace the duplicate Err(...) branches with a single common return that upcasts
the previously selected exception.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 62131672-c4e9-41b0-8b79-13a1afe8f296
⛔ Files ignored due to path filters (1)
Lib/test/test_ftplib.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rs
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/stdlib/src/ssl/compat.rs (2)
1184-1189:⚠️ Potential issue | 🟠 MajorKeep EOF/reset normalization in the new helper.
recv_one_tls_record()now only special-casesBlockingIOError. On the non-BIO paths that switched to this helper,ConnectionResetError/ConnectionAbortedErrorfrom either the peek or the recv will bubble out as raw Python exceptions instead ofSslError::Eof, which regresses the SSL EOF path andsuppress_ragged_eofs.Proposed fix
Err(e) => { if is_blocking_io_error(&e, vm) { return Err(SslError::WantRead); } + if is_connection_closed_error(&e, vm) { + return Err(SslError::Eof); + } return Err(SslError::Py(e)); } }; @@ socket.sock_recv(recv_size, vm).map_err(|e| { if is_blocking_io_error(&e, vm) { SslError::WantRead + } else if is_connection_closed_error(&e, vm) { + SslError::Eof } else { SslError::Py(e) } })If the old
conn.process_new_packets()check is still needed before classifying EOF, keep that logic at the non-BIO call sites. As per coding guidelines, "Follow Rust best practices for error handling and memory management".Also applies to: 1234-1240
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl/compat.rs` around lines 1184 - 1189, The new helper recv_one_tls_record() currently only maps BlockingIOError to SslError::WantRead and lets ConnectionResetError/ConnectionAbortedError escape as raw Python exceptions, breaking EOF normalization and suppress_ragged_eofs; update recv_one_tls_record() (and the duplicate logic around the other occurrence at ~1234-1240) to map connection-reset/aborted errors to SslError::Eof (as the previous code did), and if the original conn.process_new_packets() check is required for correct EOF classification on non-BIO paths, reintroduce that check at the non-BIO call sites before calling recv_one_tls_record() so EOF/reset are normalized to SslError::Eof instead of propagating raw Python errors.
1182-1228:⚠️ Potential issue | 🟠 MajorPeek a full wire-format record; 16 KiB is insufficient for both TLS 1.2 and TLS 1.3.
SSL3_RT_MAX_PLAIN_LENGTH(16384) is a plaintext bound, not a wire-format record bound. TLS 1.3 records can reach 5 + 16640 = 16645 bytes (header + ciphertext expansion). TLS 1.2 records can reach 5 + 18432 = 18437 bytes. The current 16384-byte peek may capture only a prefix of a valid ciphertext record, causing the remaining bytes to be misinterpreted as a new record header on the next call.The proposed fix of adding 256 bytes (16645 total) correctly handles TLS 1.3 but is insufficient for TLS 1.2 support, which requires approximately 18437 bytes. Increase the peek bound to cover the maximum record size for the configured TLS version(s), or read the 5-byte header first and then peek the remaining
record_lengthbytes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl/compat.rs` around lines 1182 - 1228, The current peek uses SSL3_RT_MAX_PLAIN_LENGTH (16384) which is only plaintext size and can truncate TLS wire-format records (TLS1.3 up to ~16645 bytes and TLS1.2 up to ~18437 bytes); update the logic in the function using socket.sock_peek and SSL3_RT_MAX_PLAIN_LENGTH so we never mis-parse a partial record: first call sock_peek for exactly TLS_RECORD_HEADER_SIZE (5) to obtain the header, parse record_body_len from bytes [3..5] (as done with record_body_len = u16::from_be_bytes(...)), then call sock_peek again for TLS_RECORD_HEADER_SIZE + record_body_len (or ensure you peek that many bytes) before deciding recv_size; keep the same WantRead/Eof/Py error handling around sock_peek and socket.sock_recv.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/stdlib/src/ssl/compat.rs`:
- Around line 1184-1189: The new helper recv_one_tls_record() currently only
maps BlockingIOError to SslError::WantRead and lets
ConnectionResetError/ConnectionAbortedError escape as raw Python exceptions,
breaking EOF normalization and suppress_ragged_eofs; update
recv_one_tls_record() (and the duplicate logic around the other occurrence at
~1234-1240) to map connection-reset/aborted errors to SslError::Eof (as the
previous code did), and if the original conn.process_new_packets() check is
required for correct EOF classification on non-BIO paths, reintroduce that check
at the non-BIO call sites before calling recv_one_tls_record() so EOF/reset are
normalized to SslError::Eof instead of propagating raw Python errors.
- Around line 1182-1228: The current peek uses SSL3_RT_MAX_PLAIN_LENGTH (16384)
which is only plaintext size and can truncate TLS wire-format records (TLS1.3 up
to ~16645 bytes and TLS1.2 up to ~18437 bytes); update the logic in the function
using socket.sock_peek and SSL3_RT_MAX_PLAIN_LENGTH so we never mis-parse a
partial record: first call sock_peek for exactly TLS_RECORD_HEADER_SIZE (5) to
obtain the header, parse record_body_len from bytes [3..5] (as done with
record_body_len = u16::from_be_bytes(...)), then call sock_peek again for
TLS_RECORD_HEADER_SIZE + record_body_len (or ensure you peek that many bytes)
before deciding recv_size; keep the same WantRead/Eof/Py error handling around
sock_peek and socket.sock_recv.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 17e1ee9f-471f-4a5c-92db-938969bdaca7
⛔ Files ignored due to path filters (1)
Lib/test/test_ftplib.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/stdlib/src/ssl.rs
Use single-record reading (recv_one_tls_record) for all SSL reads, not just handshake. This prevents rustls from eagerly consuming close_notify alongside application data, which left the TCP buffer empty and caused select()-based servers to miss readability and time out. Also fix recv_one_tls_record to return Eof (not WantRead) when peek returns empty bytes, since empty peek means the peer has closed the TCP connection.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/stdlib/src/ssl/compat.rs (1)
1167-1241:⚠️ Potential issue | 🔴 CriticalThe generalized single-record helper can still split a valid TLS record mid-stream.
recv_one_tls_record()still peeks onlySSL3_RT_MAX_PLAIN_LENGTH, which is just the record body cap. A legal 16 KiB record is already 16,389 bytes on the wire before any ciphertext overhead, so this logic will treat it as “incomplete”, consume only the first 16,384 bytes, and leave the next call starting in the middle of that same record. Once this PR routes normalssl_read()/ssl_ensure_data_available()traffic through the helper, the next header parse is reading payload bytes rather than a real TLS header and the stream state is corrupted. Please size the boundary detection for the maximum wire-record length rustls accepts before reusing this helper outside the handshake.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl/compat.rs` around lines 1167 - 1241, recv_one_tls_record currently peeks only SSL3_RT_MAX_PLAIN_LENGTH which can split a legal max-size TLS record; change the initial peek size passed to socket.sock_peek in recv_one_tls_record to the maximum possible on‑the‑wire TLS record length rustls accepts (i.e. TLS_RECORD_HEADER_SIZE + SSL3_RT_MAX_PLAIN_LENGTH plus any max cipher/nonce/tag overhead your TLS implementation permits) instead of SSL3_RT_MAX_PLAIN_LENGTH so the header/length parsing will always see a full record boundary; update any associated constant (introduce something like MAX_WIRE_RECORD_LEN or TLS_MAX_ON_WIRE_SIZE) and use that symbol in the peek and related logic so the helper never consumes only the first fragment of a legal record.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@crates/stdlib/src/ssl/compat.rs`:
- Around line 1167-1241: recv_one_tls_record currently peeks only
SSL3_RT_MAX_PLAIN_LENGTH which can split a legal max-size TLS record; change the
initial peek size passed to socket.sock_peek in recv_one_tls_record to the
maximum possible on‑the‑wire TLS record length rustls accepts (i.e.
TLS_RECORD_HEADER_SIZE + SSL3_RT_MAX_PLAIN_LENGTH plus any max cipher/nonce/tag
overhead your TLS implementation permits) instead of SSL3_RT_MAX_PLAIN_LENGTH so
the header/length parsing will always see a full record boundary; update any
associated constant (introduce something like MAX_WIRE_RECORD_LEN or
TLS_MAX_ON_WIRE_SIZE) and use that symbol in the peek and related logic so the
helper never consumes only the first fragment of a legal record.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: df9ee0ba-8364-4ee7-bb49-81dabb3b0d10
⛔ Files ignored due to path filters (1)
Lib/test/test_ftplib.pyis excluded by!Lib/**
📒 Files selected for processing (2)
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/stdlib/src/ssl.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
crates/stdlib/src/ssl/compat.rs (2)
2162-2162: Replace BIO recv chunk magic number with a named constantLine 2162 uses a raw
2048. Consider introducing a constant (e.g.,BIO_RECV_CHUNK_SIZE) for readability and easier tuning.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl/compat.rs` at line 2162, Replace the magic number 2048 used in the call to socket.sock_recv(2048, vm) with a named constant (e.g., BIO_RECV_CHUNK_SIZE) for clarity and tunability: define const BIO_RECV_CHUNK_SIZE: usize = 2048 (or the appropriate integer type used by sock_recv) near the top of the ssl compat module, then update the call to socket.sock_recv(BIO_RECV_CHUNK_SIZE, vm); ensure the constant's visibility (pub(crate) if needed) and type match other uses of sock_recv in functions like socket.sock_recv and any related recv logic.
1248-1272: Factor duplicatedprocess_new_packets()handling in the new helperLine 1256-Line 1258 and Line 1262-Line 1264 repeat the same
process_new_packets()+ rustls-error mapping. This is small, but extracting it once would reduce branching noise and future drift.♻️ Optional refactor sketch
+fn process_pending_tls(conn: &mut TlsConnection) -> SslResult<()> { + conn.process_new_packets().map_err(SslError::from_rustls)?; + Ok(()) +} + fn recv_one_tls_record_for_data( conn: &mut TlsConnection, socket: &PySSLSocket, vm: &VirtualMachine, ) -> SslResult<PyObjectRef> { match recv_one_tls_record(socket, vm) { Ok(data) => Ok(data), Err(SslError::Eof) => { - if let Err(rustls_err) = conn.process_new_packets() { - return Err(SslError::from_rustls(rustls_err)); - } + process_pending_tls(conn)?; Ok(vm.ctx.new_bytes(vec![]).into()) } Err(SslError::Py(e)) => { - if let Err(rustls_err) = conn.process_new_packets() { - return Err(SslError::from_rustls(rustls_err)); - } + process_pending_tls(conn)?; if is_connection_closed_error(&e, vm) { return Err(SslError::Eof); } Err(SslError::Py(e)) } Err(e) => Err(e), } }As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/stdlib/src/ssl/compat.rs` around lines 1248 - 1272, In recv_one_tls_record_for_data, extract the duplicated process_new_packets() + rustls->SslError mapping into a small helper (e.g., ensure_process_new_packets(conn: &mut TlsConnection) -> Result<(), SslError>) that calls conn.process_new_packets() and maps any rustls error with SslError::from_rustls; then replace both occurrences in the Err(SslError::Eof) and Err(SslError::Py(e)) arms with a single call to ensure_process_new_packets(...) and early-return the mapped SslError on Err, leaving the existing empty-bytes return and is_connection_closed_error(&e, vm) logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/stdlib/src/ssl.rs`:
- Around line 3632-3634: The SslError::ZeroReturn handling currently always
returns empty bytes (via return_data(vec![], &buffer, vm)), losing the
shutdown-state distinction; change the branch that handles SslError::ZeroReturn
to inspect the SSL shutdown state (and the suppress_ragged_eofs behavior) and,
if shutdown has not been initiated, raise SSLEOFError instead of returning empty
data, while preserving the empty-byte return only for shutdown-driven close
paths; update the logic around the SslError::ZeroReturn match arm (the code that
calls return_data(vec![], &buffer, vm)) to check the SSL connection shutdown
flag and call the VM error raising path for SSLEOFError when appropriate.
---
Nitpick comments:
In `@crates/stdlib/src/ssl/compat.rs`:
- Line 2162: Replace the magic number 2048 used in the call to
socket.sock_recv(2048, vm) with a named constant (e.g., BIO_RECV_CHUNK_SIZE) for
clarity and tunability: define const BIO_RECV_CHUNK_SIZE: usize = 2048 (or the
appropriate integer type used by sock_recv) near the top of the ssl compat
module, then update the call to socket.sock_recv(BIO_RECV_CHUNK_SIZE, vm);
ensure the constant's visibility (pub(crate) if needed) and type match other
uses of sock_recv in functions like socket.sock_recv and any related recv logic.
- Around line 1248-1272: In recv_one_tls_record_for_data, extract the duplicated
process_new_packets() + rustls->SslError mapping into a small helper (e.g.,
ensure_process_new_packets(conn: &mut TlsConnection) -> Result<(), SslError>)
that calls conn.process_new_packets() and maps any rustls error with
SslError::from_rustls; then replace both occurrences in the Err(SslError::Eof)
and Err(SslError::Py(e)) arms with a single call to
ensure_process_new_packets(...) and early-return the mapped SslError on Err,
leaving the existing empty-bytes return and is_connection_closed_error(&e, vm)
logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: a0683761-f69e-42a5-9bf0-9bc481a0d9f6
📒 Files selected for processing (2)
crates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rs
crates/stdlib/src/ssl.rs
Outdated
| // Python's read() returns empty bytes on SSL_ERROR_ZERO_RETURN | ||
| // when the peer's close_notify has been received. | ||
| return_data(vec![], &buffer, vm) |
There was a problem hiding this comment.
Restore shutdown-aware ZeroReturn handling (behavior regression).
At Line 3632, SslError::ZeroReturn now always returns empty data. This removes the shutdown-state distinction and can suppress EOF signaling that callers rely on when suppress_ragged_eofs=False. Keep ZeroReturn state-aware: raise SSLEOFError when shutdown has not started; return empty only for shutdown-driven close paths.
Suggested fix
Err(crate::ssl::compat::SslError::ZeroReturn) => {
@@
- // Python's read() returns empty bytes on SSL_ERROR_ZERO_RETURN
- // when the peer's close_notify has been received.
- return_data(vec![], &buffer, vm)
+ if shutdown_state == ShutdownState::NotStarted {
+ // Preserve EOF signaling for non-shutdown reads.
+ Err(create_ssl_eof_error(vm).upcast())
+ } else {
+ // During/after shutdown, return empty data on close_notify.
+ return_data(vec![], &buffer, vm)
+ }
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/stdlib/src/ssl.rs` around lines 3632 - 3634, The SslError::ZeroReturn
handling currently always returns empty bytes (via return_data(vec![], &buffer,
vm)), losing the shutdown-state distinction; change the branch that handles
SslError::ZeroReturn to inspect the SSL shutdown state (and the
suppress_ragged_eofs behavior) and, if shutdown has not been initiated, raise
SSLEOFError instead of returning empty data, while preserving the empty-byte
return only for shutdown-driven close paths; update the logic around the
SslError::ZeroReturn match arm (the code that calls return_data(vec![], &buffer,
vm)) to check the SSL connection shutdown flag and call the VM error raising
path for SSLEOFError when appropriate.
Use single-record reading (recv_one_tls_record) for all SSL reads, not just handshake. This prevents rustls from eagerly consuming close_notify alongside application data, which left the TCP buffer empty and caused select()-based servers to miss readability and time out.
Also fix ZeroReturn to raise SSLEOFError when shutdown has not started, so suppress_ragged_eofs=True can suppress it properly.
Summary by CodeRabbit
Bug Fixes
Refactor