feat(mint): Automatic Reconnection with Exponential Backoff #512

Open
arctarus wants to merge 1 commit into elixir-grpc:master from arctarus:feat/mint-retry-on-connection-drop

Problem

The Mint adapter had no way to recover from a dropped HTTP/2 connection. When the server restarted, went through a rolling deploy, or the network blipped, the ConnectionProcess would detect the close event, fail any in-flight requests with an error, and notify the parent via {:elixir_grpc, :connection_down, self()}. It then sat inert, leaving the caller responsible for creating a whole new channel.

The Gun adapter already supports transparent reconnection through its built-in :retry / :retry_timeout options. This PR brings equivalent behaviour to the Mint adapter so that operators who prefer pure-Elixir HTTP/2 are not at a disadvantage.

A secondary cleanup is also included: GRPC.Stub.retry_timeout/1 is removed. It was dead code that was never called, and its guard (when curr < 11) made it raise FunctionClauseError for any input ≥ 11. The correct implementation now lives where it is actually used, inside ConnectionProcess.


What changed

GRPC.Client.Adapters.Mint.ConnectionProcess.State

Six new fields were added to the state struct:

| Field | Purpose |
| --- | --- |
| :scheme | Stored on connect so the process can reconnect autonomously |
| :host | Same as :scheme |
| :port | Same as :scheme |
| :connect_opts | The full keyword list passed to Mint.HTTP.connect/4 |
| :retry | Maximum number of reconnection attempts (default 0 = disabled) |
| :retry_attempt | Running counter; reset to 0 after a successful reconnect |

The State.new/2 constructor signature changed from new(conn, parent) to new(conn, opts) to accommodate all fields from a single keyword list, keeping callers clean.
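A minimal sketch of what the extended struct and the new(conn, opts) constructor might look like. The module name StateSketch is hypothetical, the real struct's other fields (such as request tracking) are omitted, and passing :parent inside opts is an assumption, not something confirmed by the diff.

```elixir
defmodule StateSketch do
  # Hypothetical stand-in for ConnectionProcess.State. Field names follow the
  # table above; every default except retry: 0 is an assumption.
  defstruct conn: nil,
            parent: nil,
            scheme: nil,
            host: nil,
            port: nil,
            connect_opts: [],
            retry: 0,
            retry_attempt: 0

  # Builds the state from the Mint connection plus a single keyword list.
  def new(conn, opts) do
    %__MODULE__{
      conn: conn,
      parent: Keyword.get(opts, :parent),
      scheme: Keyword.get(opts, :scheme),
      host: Keyword.get(opts, :host),
      port: Keyword.get(opts, :port),
      connect_opts: Keyword.get(opts, :connect_opts, []),
      retry: Keyword.get(opts, :retry, 0)
    }
  end
end
```

Because :retry falls back to 0 and :retry_attempt always starts at 0, a caller that passes no retry option gets a state with reconnection disabled.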

GRPC.Client.Adapters.Mint.ConnectionProcess

Three areas were extended:

init/1
Pops :retry from opts before forwarding to Mint.HTTP.connect/4 (Mint doesn't understand that key), then builds the state with all connection parameters persisted for future reconnection.

Reconnect logic
A new handle_info(:reconnect, state) clause dispatches to attempt_reconnect/1, which has two clauses:

  • Retries exhausted (retry_attempt >= retry): logs a warning and sends {:elixir_grpc, :connection_down, self()} to the parent — the same signal the process would have sent immediately without retry enabled.
  • Attempt available: calls Mint.HTTP.connect/4. On success, replaces the dead conn and resets retry_attempt to 0. On failure, increments retry_attempt, schedules Process.send_after(self(), :reconnect, timeout) with backoff, and stays alive to handle the next attempt.
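The two clauses can be modelled roughly as follows. This is a simplified sketch, not the code in the diff: ReconnectSketch is a hypothetical module, the state is a plain map, and connect_fun and delay_fun are injected stand-ins for Mint.HTTP.connect/4 and retry_timeout/1 so the example runs without Mint. Whether the delay is computed from the incremented or the previous attempt number is an assumption here.

```elixir
defmodule ReconnectSketch do
  # Retries exhausted: send the same signal the no-retry path would send.
  def attempt_reconnect(%{retry_attempt: attempt, retry: retry} = state, _connect_fun, _delay_fun)
      when attempt >= retry do
    send(state.parent, {:elixir_grpc, :connection_down, self()})
    state
  end

  # Attempt available: try to connect, then either reset the counter or back off.
  def attempt_reconnect(state, connect_fun, delay_fun) do
    case connect_fun.() do
      {:ok, conn} ->
        # Replace the dead connection and start counting from zero again.
        %{state | conn: conn, retry_attempt: 0}

      {:error, _reason} ->
        next_attempt = state.retry_attempt + 1
        # Assumption: the delay is derived from the incremented attempt number.
        Process.send_after(self(), :reconnect, delay_fun.(next_attempt))
        %{state | retry_attempt: next_attempt}
    end
  end
end
```

In the real process the :reconnect message would arrive in handle_info/2, which calls attempt_reconnect/1 again with the updated state.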

finish_all_pending_requests/1
After draining in-flight requests and the request queue (sending errors to waiting callers), the function now branches on retry > 0:

  • retry > 0 → calls attempt_reconnect/1 instead of notifying the parent.
  • retry == 0 → original behaviour: immediately sends :connection_down.

retry_timeout/1 (public for testability)
Computes the delay before the next reconnection attempt:

timeout = 1.6^(attempt - 1) * 1000 ms   (capped at 120 000 ms for attempt >= 11)
jitter  = ±20% of timeout

This matches the algorithm already used by Gun.retry_fun/2, giving consistent backoff behaviour across both adapters.
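One way to express that formula, as a sketch under stated assumptions: BackoffSketch and base_timeout/1 are hypothetical names introduced for illustration, and splitting the deterministic base from the jitter is a choice made here to keep the base values easy to test, not necessarily how the PR structures it.

```elixir
defmodule BackoffSketch do
  @max_timeout 120_000

  # Base delay in milliseconds, before jitter.
  # From attempt 11 onward the fixed 120 s ceiling applies.
  def base_timeout(attempt) when attempt >= 11, do: @max_timeout

  def base_timeout(attempt) when attempt >= 1 do
    round(:math.pow(1.6, attempt - 1) * 1000)
  end

  # Adds a uniform jitter in the range [-20%, +20%) of the base delay.
  def retry_timeout(attempt) do
    base = base_timeout(attempt)
    jitter = (:rand.uniform() - 0.5) * 0.4
    round(base + base * jitter)
  end
end
```

With this sketch, BackoffSketch.base_timeout(1) is 1000 ms and BackoffSketch.base_timeout(11) is 120_000 ms; retry_timeout/1 returns a value within 20% of the corresponding base.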

GRPC.Client.Adapters.Mint

The :retry option is extracted from adapter_opts in connect/2 before the opts are forwarded to connect_opts/2 (which only knows about Mint transport options), then re-injected into the final keyword list that reaches ConnectionProcess.start_link/4. The @doc was updated to document the new option.
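The extract-then-re-inject step could look something like the sketch below. AdapterOptsSketch and transport_only/1 are hypothetical; transport_only/1 merely stands in for the adapter's own connect_opts/2 builder, whose actual behaviour is not shown in this description.

```elixir
defmodule AdapterOptsSketch do
  # Takes the adapter options, keeps :retry away from the transport option
  # builder, then puts it back for the connection process.
  def build(adapter_opts) do
    {retry, rest} = Keyword.pop(adapter_opts, :retry, 0)

    rest
    |> transport_only()
    |> Keyword.put(:retry, retry)
  end

  # Hypothetical stand-in for connect_opts/2: assumed to keep only keys
  # that Mint's transport layer understands.
  defp transport_only(opts), do: Keyword.take(opts, [:transport_opts])
end
```

Popping with a default of 0 means the downstream process always sees an explicit :retry value, even when the caller never set one.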

GRPC.Stub

retry_timeout/1 was removed. It was never called anywhere in the codebase, its guard (when curr < 11) made it raise FunctionClauseError for every input ≥ 11, and the else 120_000 branch inside the body was therefore unreachable.


Usage

{:ok, channel} = GRPC.Stub.connect("localhost:50051",
  adapter: GRPC.Client.Adapters.Mint,
  adapter_opts: [retry: 5]
)

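With retry: 5, if the connection drops the adapter will try to reconnect up to 5 times, with delays following exponential backoff with jitter. By the formula above, the base delay starts at about 1 s and is about 6.6 s at attempt 5; the 120 s ceiling applies only from attempt 11 onward, so it is reached only when :retry is 11 or more. After 5 failed attempts the parent receives {:elixir_grpc, :connection_down, pid}.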

By default :retry is 0 — no reconnection, preserving the existing behaviour.

Note: In-flight requests at the time of the drop fail immediately. Reconnection re-establishes the transport only — it does not replay requests.
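Since requests are not replayed, callers that want a failed call retried have to do so themselves. A minimal caller-side sketch follows; CallerRetrySketch and with_retries/3 are hypothetical and not part of this PR or of the library, and call_fun is any zero-arity function returning {:ok, reply} or {:error, reason}, for example a wrapped stub call.

```elixir
defmodule CallerRetrySketch do
  # Runs call_fun up to `attempts` times, pausing `pause_ms` between tries.
  def with_retries(call_fun, attempts, pause_ms \\ 0)

  # Last attempt: return whatever the call produces.
  def with_retries(call_fun, 1, _pause_ms), do: call_fun.()

  def with_retries(call_fun, attempts, pause_ms) when attempts > 1 do
    case call_fun.() do
      {:ok, _reply} = ok ->
        ok

      {:error, _reason} ->
        # Give the adapter time to re-establish the transport.
        Process.sleep(pause_ms)
        with_retries(call_fun, attempts - 1, pause_ms)
    end
  end
end
```

Retrying like this is only safe for idempotent RPCs, because a request that failed mid-flight may already have been processed by the server.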


Tests

| File | What is covered |
| --- | --- |
| connection_process_test.exs | retry_timeout/1 backoff values and cap; immediate reconnect on TCP close when server is still up; retry exhaustion notifies parent; failed reconnect schedules next attempt; successful reconnect resets counter |
| mint_test.exs | :retry option propagated to ConnectionProcess state; default is 0 |

The existing test suite (224 tests) passes without modification.

@arctarus arctarus force-pushed the feat/mint-retry-on-connection-drop branch from 5820c2b to af91a16 Compare March 27, 2026 15:40
Implements retry logic in GRPC.Client.Adapters.Mint.ConnectionProcess so
that dropped HTTP/2 connections are transparently re-established without
requiring a new channel.

Changes:
- ConnectionProcess.State: add :scheme, :host, :port, :connect_opts, :retry,
  and :retry_attempt fields so the process can reconnect autonomously.
- ConnectionProcess.init/1: persists connection params in state; pops :retry
  from opts before forwarding to Mint.HTTP.connect/4.
- ConnectionProcess: add attempt_reconnect/1, handle_info(:reconnect),
  and retry_timeout/1 (exponential backoff, base 1.6, capped at 120s, with
  jitter). finish_all_pending_requests/1 triggers reconnection when retry > 0
  instead of immediately notifying the parent.
- Mint.connect/2: extracts :retry from adapter opts and passes it through to
  ConnectionProcess; documents the new option.
- Remove Stub.retry_timeout/1 — dead code that was never called and had a
  broken guard making it fail for curr >= 11. The correct implementation now
  lives in ConnectionProcess.

Tests:
- connection_process_test.exs: unit tests for retry_timeout/1, immediate
  reconnect on drop, exhaustion notification, scheduled retry on failure,
  and successful reconnect resetting the attempt counter.
- mint_test.exs: integration tests verifying :retry propagation to state
  and correct default of 0.

Docs:
- README.md / grpc_client/README.md: document the :retry option under the
  Mint adapter section with usage example and behaviour notes.

Made-with: Cursor
@arctarus arctarus force-pushed the feat/mint-retry-on-connection-drop branch from af91a16 to 6c6ccd8 Compare March 27, 2026 15:40