feat(mint): Automatic Reconnection with Exponential Backoff #512
Open
arctarus wants to merge 1 commit into elixir-grpc:master
Implements retry logic in GRPC.Client.Adapters.Mint.ConnectionProcess so that dropped HTTP/2 connections are transparently re-established without requiring a new channel.

Changes:
- ConnectionProcess.State: add :scheme, :host, :port, :connect_opts, :retry, and :retry_attempt fields so the process can reconnect autonomously.
- ConnectionProcess.init/1: persists connection params in state; pops :retry from opts before forwarding to Mint.HTTP.connect/4.
- ConnectionProcess: add attempt_reconnect/1, handle_info(:reconnect), and retry_timeout/1 (exponential backoff, base 1.6, capped at 120s, with jitter). finish_all_pending_requests/1 triggers reconnection when retry > 0 instead of immediately notifying the parent.
- Mint.connect/2: extracts :retry from adapter opts and passes it through to ConnectionProcess; documents the new option.
- Remove Stub.retry_timeout/1: dead code that was never called and had a broken guard making it fail for curr >= 11. The correct implementation now lives in ConnectionProcess.

Tests:
- connection_process_test.exs: unit tests for retry_timeout/1, immediate reconnect on drop, exhaustion notification, scheduled retry on failure, and successful reconnect resetting the attempt counter.
- mint_test.exs: integration tests verifying :retry propagation to state and correct default of 0.

Docs:
- README.md / grpc_client/README.md: document the :retry option under the Mint adapter section with usage example and behaviour notes.

Made-with: Cursor
Problem
The Mint adapter had no way to recover from a dropped HTTP/2 connection. Once the server restarted, went through a rolling deploy, or the network blipped, the ConnectionProcess would detect the close event, drain any in-flight requests with an error, notify the parent via {:elixir_grpc, :connection_down, self()}, and then sit inert, leaving the caller responsible for creating a whole new channel.

The Gun adapter already supports transparent reconnection through its built-in :retry/:retry_timeout options. This PR brings equivalent behaviour to the Mint adapter so that operators who prefer pure-Elixir HTTP/2 are not at a disadvantage.

A secondary cleanup is also included: GRPC.Stub.retry_timeout/1 is removed. It was dead code that was never called, and its broken guard (when curr < 11) made it raise FunctionClauseError for any input ≥ 11. The correct implementation now lives where it is actually used, inside ConnectionProcess.

What changed
GRPC.Client.Adapters.Mint.ConnectionProcess.State
Six new fields were added to the state struct:
- :scheme
- :host
- :port
- :connect_opts (options for Mint.HTTP.connect/4)
- :retry (maximum number of reconnection attempts; 0 = disabled)
- :retry_attempt (current attempt counter; reset to 0 after a successful reconnect)

The State.new/2 constructor signature changed from new(conn, parent) to new(conn, opts) to accommodate all fields from a single keyword list, keeping callers clean.

GRPC.Client.Adapters.Mint.ConnectionProcess
Three areas were extended:
init/1
Pops :retry from opts before forwarding to Mint.HTTP.connect/4 (Mint doesn't understand that key), then builds the state with all connection parameters persisted for future reconnection.

Reconnect logic
A new handle_info(:reconnect, state) clause dispatches to attempt_reconnect/1, which has two clauses:
- Retries exhausted (retry_attempt >= retry): logs a warning and sends {:elixir_grpc, :connection_down, self()} to the parent, the same signal the process would have sent immediately without retry enabled.
- Otherwise: attempts to reconnect via Mint.HTTP.connect/4. On success, replaces the dead conn and resets retry_attempt to 0. On failure, increments retry_attempt, schedules Process.send_after(self(), :reconnect, timeout) with backoff, and stays alive to handle the next attempt.

finish_all_pending_requests/1
After draining in-flight requests and the request queue (sending errors to waiting callers), the function now branches on retry > 0:
- retry > 0 → calls attempt_reconnect/1 instead of notifying the parent.
- retry == 0 → original behaviour: immediately sends :connection_down.

retry_timeout/1 (public for testability)
Computes the delay before the next reconnection attempt using exponential backoff (base 1.6, capped at 120 s) with jitter.
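A minimal sketch of a backoff of this shape, for illustration only. The module name BackoffSketch, the zero-based attempt index, the 1 s initial delay, and the ±20% jitter range are assumptions; the PR text only states base 1.6, a 120 s cap, and jitter.

```elixir
defmodule BackoffSketch do
  @moduledoc "Illustrative only; not the code from this PR."

  @base 1.6
  @initial_ms 1_000
  @cap_ms 120_000
  # Assumed jitter range of +/-20%; the PR only says "with jitter".
  @jitter 0.2

  # 1.6^11 * 1 s already exceeds the cap, so skip the power entirely.
  # This also avoids float overflow in :math.pow/2 for very large attempts.
  def base_delay(attempt) when is_integer(attempt) and attempt >= 11, do: @cap_ms

  # Deterministic part: 1 s * 1.6^attempt, never above 120 s.
  def base_delay(attempt) when is_integer(attempt) and attempt >= 0 do
    round(min(@initial_ms * :math.pow(@base, attempt), @cap_ms))
  end

  # Randomised delay in milliseconds, usable as the Process.send_after/3 timeout.
  # Jitter is applied after the cap, so the result may slightly exceed 120 s.
  def retry_timeout(attempt) do
    base = base_delay(attempt)
    # :rand.uniform/0 returns a float in [0.0, 1.0), so factor lies in [0.8, 1.2).
    factor = 1 + (:rand.uniform() * 2 - 1) * @jitter
    round(base * factor)
  end
end
```

Under these assumptions the un-jittered delays are 1000 ms, 1600 ms, 2560 ms, and so on, reaching the 120 s cap from attempt 11 onward. Keeping the deterministic part in its own function makes the growth and the cap easy to assert on without controlling the random seed.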
The retry_timeout/1 algorithm matches the one already used by Gun.retry_fun/2, giving consistent backoff behaviour across both adapters.

GRPC.Client.Adapters.Mint
The :retry option is extracted from adapter_opts in connect/2 before the opts are forwarded to connect_opts/2 (which only knows about Mint transport options), then re-injected into the final keyword list that reaches ConnectionProcess.start_link/4. The @doc was updated to document the new option.

GRPC.Stub
retry_timeout/1 was removed. It was never called anywhere in the codebase, its guard when curr < 11 made it always raise FunctionClauseError for inputs ≥ 11, and the else 120_000 branch inside the body was unreachable dead code.

Usage
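A minimal sketch of enabling the option, assuming the channel is opened with GRPC.Stub.connect/2 and that adapter options are passed under :adapter_opts, as the description of connect/2 above suggests. The address is a placeholder, and running it requires the grpc dependency with this PR applied and a reachable server.

```elixir
# Hypothetical address; replace with a real gRPC endpoint.
{:ok, channel} =
  GRPC.Stub.connect("localhost:50051",
    adapter: GRPC.Client.Adapters.Mint,
    # :retry is the option added by this PR; 0 (the default) disables reconnection.
    adapter_opts: [retry: 5]
  )
```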
With retry: 5, if the connection drops the adapter will try to reconnect up to 5 times. Delays start at ~1 s and grow exponentially with jitter, up to a cap of ~120 s. After 5 failed attempts the parent receives {:elixir_grpc, :connection_down, pid}.

By default :retry is 0: no reconnection, preserving the existing behaviour.

Tests
- connection_process_test.exs: retry_timeout/1 backoff values and cap; immediate reconnect on TCP close when server is still up; retry exhaustion notifies parent; failed reconnect schedules next attempt; successful reconnect resets counter
- mint_test.exs: :retry option propagated to ConnectionProcess state; default is 0

The existing test suite (224 tests) passes without modification.