STF-322: Bounded transport-failure retry in WebServiceClient#693
STF-322: Bounded transport-failure retry in WebServiceClient#693
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient, defaulting to one retry for issues like connection resets or broken pipes. The implementation includes updates to the Builder for maxRetries configuration, detailed documentation in the README, and comprehensive test coverage for various failure scenarios. Feedback suggests refactoring the retry loop to prevent potential integer overflow and adopting case-insensitive string matching for exception messages to improve robustness across different environments.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient. It adds a maxRetries option to the builder (defaulting to 1), implements logic to retry on specific exceptions like connection resets and connect timeouts, and includes comprehensive tests to verify the behavior. Additionally, it adds tool configuration files for mise and updates the documentation and changelog. I have no feedback to provide as the changes are well-implemented and the existing review comments were purely evaluative.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for transient errors like connection resets or broken pipes. It includes updates to the Builder for configuring maxRetries, expanded documentation in the README and CHANGELOG, and new test cases. Review feedback identifies a discrepancy in the retry logic regarding HttpConnectTimeoutException, which is currently excluded because it inherits from HttpTimeoutException. Additionally, a test case for connection timeouts was found to be misnamed and lacks verification that retries are actually being attempted.
I am having trouble creating individual review comments. Click here to see my feedback.
src/main/java/com/maxmind/geoip2/WebServiceClient.java (446-448)
There is a discrepancy between the code and the PR summary regarding HttpConnectTimeoutException. The summary states that connect timeouts are retried, but since HttpConnectTimeoutException extends HttpTimeoutException, it is currently excluded by this check. Additionally, the summary describes a "narrow" predicate that "walks the cause chain," whereas the implementation is broad, retrying all IOExceptions except for timeouts and interruptions. If the intention is to exclude all timeouts (as suggested by the README and CHANGELOG), the PR summary should be corrected. If connect timeouts should be retried, this logic needs to be adjusted to specifically allow them while still excluding request-phase timeouts.
src/test/java/com/maxmind/geoip2/WebServiceClientTest.java (590-608)
The test testRetriesOnConnectTimeout is misnamed and lacks verification of the retry logic. It triggers a ConnectException (connection refused) by connecting to a closed port, which is distinct from a "Connect Timeout" (HttpConnectTimeoutException). Furthermore, the test does not verify that a retry actually occurred (e.g., by checking the number of attempts). Consider renaming the test to reflect that it tests connection refusal and adding a verification step (such as using a mock or checking WireMock logs if applicable) to ensure the retry mechanism is triggered as expected.
bce4c64 to
52b3c83
Compare
When the JDK HttpClient pool reuses an idle connection that an intermediary (load balancer, proxy, NAT) has silently closed, the next send() fails with "Connection reset", "Broken pipe", or related transport errors. A single retry recovers transparently without exposing this race to callers. The default JDK keep-alive timeout exceeds many intermediaries' idle timeout, making this mismatch the common case. The retry predicate is permissive by exclusion: any IOException from httpClient.send() is retried EXCEPT HttpTimeoutException (covering both request-phase and connect-phase timeouts, since HttpConnectTimeoutException is a subclass) and InterruptedIOException. Both timeouts are customer-set budgets that retrying would silently extend; InterruptedIOException is a user-cancellation signal. HTTP 4xx and 5xx responses are surfaced as HttpException (and subclasses) from a separate code path -- they come back as HttpResponse objects rather than IOExceptions, so the predicate is structurally unable to retry them. Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total attempts). The interrupt flag is restored before rewrapping InterruptedException, and a pre-set interrupt short-circuits the predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The existing catch (InterruptedException) block in responseFor() rewraps into GeoIp2Exception without restoring the thread's interrupt status, silently swallowing the cancellation signal. Per Java's interruption protocol, code that catches InterruptedException without rethrowing it should re-set the flag so callers up the stack can observe the cancellation. This is an independent bug fix bundled into the STF-322 retry work because the retry feature exposes the path more often. Per project commit hygiene it lands as a separate commit so it can be cherry-picked or reverted on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cover all 9 scenarios: connection-reset retry on country, city, and insights endpoints, no retry on HttpTimeoutException, retry on connect timeout (deterministic via a closed local ServerSocket), no retry on 4xx/5xx, .maxRetries(0) opt-out, and pre-interrupt short-circuit. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Mirrors minfraud-api-java's mise.toml and mise.lock so Java and Maven versions are pinned and auto-installed on directory entry. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
WebServiceClient.Builder.maxRetries(int)(defaults to 1) so a single transient transport failure (Connection reset,Broken pipe, EOF, closed channel, etc.) is retried transparently.IOExceptionfromhttpClient.send()is retried EXCEPT:HttpTimeoutException(which coversHttpConnectTimeoutExceptionsince it extendsHttpTimeoutException) — customer-set timeout budgets we honor;InterruptedIOException— user cancellation we honor;UnknownHostException,ConnectException,SSLHandshakeException,SSLPeerUnverifiedException— typically deterministic failures where retrying just delays surfacing a config bug.HTTP 4xx/5xx come back as
HttpResponseobjects (notIOExceptions) and are surfaced through the existing exception hierarchy, so the predicate is structurally unable to retry them.InterruptedExceptionrewrap path (responseFor) — bundled in this PR but lives in its own commit..maxRetries(0).mise.toml/mise.lockto pin the Java + Maven versions used locally, mirroringminfraud-api-java.The predicate has been diff'd byte-identical against
minfraud-api-java#619before merge.Fixes STF-322.
Test plan
mvn checkstyle:checkcleanmvn verify -Dgpg.skip=true— allWebServiceClientTestcases pass (including new retries-exhausted, custom-HttpClient, addSuppressed assertion); two strict-count tests@DisabledOnOs(WINDOWS)due to JDK-internal idempotent-GET retry on the HTTP/1.1 fallback pathjdk.httpclient.keepalive.timeoutas the complementary workaround🤖 Generated with Claude Code