fix(tunnel): add timeout to keepAliveLoop to detect dead connections after sleep/wake #581

Open
ncodespot wants to merge 2 commits into jpillora:master from ncodespot:fix/keepalive-timeout-on-sleep-wake

@ncodespot

Problem

After OS sleep/wake, port forwarding becomes unresponsive even with --keepalive set. Root cause: SendRequest("ping") in keepAliveLoop has no timeout of its own, so it blocks on a dead TCP connection. When the system wakes from sleep, the TCP connection is stale, but no RST/FIN has been received, so the OS has not yet detected the failure. Both the keepalive ping and any concurrent OpenChannel calls then block until the TCP retransmission timeout fires (potentially 15+ minutes).

Fix

Run SendRequest in a goroutine and race it against time.After(KeepAlive). If no pong is received within the keepalive interval, treat the connection as dead and call sshConn.Close(). Closing it also unblocks any goroutines stuck in OpenChannel on the same connection, which triggers reconnection.

With --keepalive 15s, worst-case detection is now ~30s (15s sleep + 15s ping timeout) instead of being bounded only by the TCP retransmission timeout.
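The same bound for a general keepalive interval k, assuming the loop sleeps k between pings and waits at most k for each reply, as described above: a connection that dies just after a successful pong is noticed only when the next ping, sent one sleep later, times out.

```math
t_{\text{detect}} \le \underbrace{k}_{\text{sleep}} + \underbrace{k}_{\text{ping timeout}} = 2k = 2 \times 15\,\text{s} = 30\,\text{s}
```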

Tests

  • Unit test (share/tunnel/tunnel_keepalive_test.go): mock ssh.Conn whose SendRequest blocks forever; asserts Close() is called within 2×keepalive.
  • E2E test (test/e2e/keepalive_test.go): freezable TCP proxy between client and server simulates sleep/wake (silent packet drop, no RST); verifies tunnel recovers automatically.
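A minimal sketch of how a freezable proxy like the one in the E2E test could work. It assumes the proxy drops bytes in user space rather than through firewall rules. `freezableProxy`, `Freeze`, `must` and `demo` are hypothetical names, not taken from test/e2e/keepalive_test.go.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"sync/atomic"
	"time"
)

// freezableProxy is a hypothetical test helper (not the PR's code). It relays
// TCP traffic to upstream; while frozen it keeps reading from both sides but
// discards the bytes, so neither peer ever sees an RST or FIN.
type freezableProxy struct {
	ln       net.Listener
	upstream string
	frozen   atomic.Bool
}

func newFreezableProxy(upstream string) *freezableProxy {
	p := &freezableProxy{ln: must(net.Listen("tcp", "127.0.0.1:0")), upstream: upstream}
	go p.serve()
	return p
}

func (p *freezableProxy) Freeze() { p.frozen.Store(true) }

func (p *freezableProxy) serve() {
	for {
		down, err := p.ln.Accept()
		if err != nil {
			return // listener closed
		}
		up, err := net.Dial("tcp", p.upstream)
		if err != nil {
			down.Close()
			continue
		}
		go p.pipe(up, down)
		go p.pipe(down, up)
	}
}

// pipe copies src to dst, dropping every chunk read while the proxy is frozen.
// It returns (closing dst) once src fails, e.g. because the peer pipe closed it.
func (p *freezableProxy) pipe(dst, src net.Conn) {
	defer dst.Close()
	buf := make([]byte, 32*1024)
	for {
		n, err := src.Read(buf)
		if n > 0 && !p.frozen.Load() {
			dst.Write(buf[:n])
		}
		if err != nil {
			return
		}
	}
}

// must panics on setup errors; acceptable in a demo, not in production code.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// demo echoes "hi" through the proxy, freezes it, and reports whether the next
// read then times out instead of failing fast with EOF or a reset.
func demo() (echoed string, frozenTimedOut bool) {
	echo := must(net.Listen("tcp", "127.0.0.1:0"))
	defer echo.Close()
	go func() {
		if c, err := echo.Accept(); err == nil {
			io.Copy(c, c) // single-connection echo server
		}
	}()
	p := newFreezableProxy(echo.Addr().String())
	defer p.ln.Close()
	client := must(net.Dial("tcp", p.ln.Addr().String()))
	defer client.Close()
	client.SetDeadline(time.Now().Add(time.Second))
	client.Write([]byte("hi"))
	buf := make([]byte, 2)
	must(io.ReadFull(client, buf))
	echoed = string(buf)
	p.Freeze()
	client.Write([]byte("xx")) // swallowed by the proxy, so never echoed back
	client.SetReadDeadline(time.Now().Add(200 * time.Millisecond))
	_, err := client.Read(buf)
	var ne net.Error
	return echoed, errors.As(err, &ne) && ne.Timeout()
}

func main() {
	fmt.Println(demo())
}
```

Because the proxy keeps reading while frozen, its kernel still ACKs the client's data. This differs from a real sleeping host. TCP retransmission never kicks in at all, so an application-level timeout such as the keepalive ping is the only way the client can notice the failure. That makes this a strict test of the fix.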

Your Name and others added 2 commits March 10, 2026 10:37
…after sleep/wake

SendRequest blocks indefinitely on a dead TCP connection (e.g. after OS
sleep/wake where no RST/FIN is received). Run it in a goroutine and use
select with time.After(KeepAlive) as a deadline. On timeout, close the
SSH connection so that blocked OpenChannel calls are also unblocked and
the client can reconnect.

Also add unit tests for keepAliveLoop and an E2E test with a freezable
TCP proxy that simulates the sleep/wake scenario end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>