Skip to content

feat(pam tunnel): add --foreground, --background, --run flags and cross-process tunnel registry#1848

Open
msawczynk wants to merge 3 commits intoKeeper-Security:releasefrom
msawczynk:feat/pam-tunnel-foreground
Open

feat(pam tunnel): add --foreground, --background, --run flags and cross-process tunnel registry#1848
msawczynk wants to merge 3 commits intoKeeper-Security:releasefrom
msawczynk:feat/pam-tunnel-foreground

Conversation

@msawczynk
Copy link
Copy Markdown
Contributor

@msawczynk msawczynk commented Mar 6, 2026

Summary

Add --foreground / -fg, --pid-file, --run / -R, and --timeout flags to pam tunnel start that enable tunnels to be used in non-interactive contexts: shell scripts, systemd services, CI/CD pipelines, and headless automation -- without requiring an interactive keeper shell session.

  • --foreground: Blocks the Commander process after the tunnel connects, keeping it alive until SIGTERM/SIGINT/Ctrl+C. Now waits for the WebRTC connection to reach "connected" state before printing the status banner and writing the PID file.
  • --pid-file: Writes the process PID to a file for external signal-based management.
  • --run <COMMAND>: Starts the tunnel, waits for connection, executes the given shell command, stops the tunnel, and exits with the command's exit code. Ideal for single-script workflows.
  • --timeout <SECONDS>: Controls how long to wait for the tunnel to connect (default: 30s). Used with --foreground, --background, and --run.
  • --background / -bg: Launches a separate Commander process with --foreground, polls the file-based tunnel registry for readiness, then returns control to the caller. The tunnel continues running as an independent background process. Works on all platforms.
  • Batch mode fix: When --target-host/--target-port are missing in non-interactive mode, raises CommandError instead of calling input() (which hangs in batch/script contexts).
  • Cross-process tunnel visibility: File-based tunnel registry (<tempdir>/keeper-tunnel-sessions/) enables pam tunnel list to discover tunnels from any Commander session. pam tunnel stop <RECORD_UID> sends SIGTERM to the owning process. Stale entries (dead PIDs) are auto-cleaned.

Motivation

Currently, pam tunnel start works only inside an interactive keeper shell session because tunnels run as in-process background threads (spawned by start_rust_tunnel() via keeper_pam_webrtc_rs). When Commander is invoked in single-command mode (keeper "pam tunnel start <UID>"), PAMTunnelStartCommand.execute() returns after starting the tunnel, Commander exits, and all tunnel threads die with the process.

This is a common pain point for users automating infrastructure with Keeper PAM:

  • Headless servers (NUCs, embedded devices) that need persistent tunnels to databases or other resources
  • CI/CD pipelines that need ephemeral tunnels for deployment tasks (e.g., pg_dump through a tunnel)
  • systemd services that manage tunnel lifecycle with auto-restart
  • Bash/automation scripts that start a tunnel, run commands against it, then stop the tunnel

Changes

File: keepercommander/commands/tunnel_and_connections.py (1 file changed)

  1. New imports: signal, subprocess, threading (all stdlib); unregister_tunnel_session, wait_for_tunnel_connection (from tunnel_helpers)

  2. New arguments on PAMTunnelStartCommand.pam_cmd_parser:

    pam_cmd_parser.add_argument('--foreground', '-fg', ...)
    pam_cmd_parser.add_argument('--pid-file', ...)
    pam_cmd_parser.add_argument('--run', '-R', dest='run_command', ...)
    pam_cmd_parser.add_argument('--timeout', dest='connect_timeout', type=int, default=30, ...)
    pam_cmd_parser.add_argument('--background', '-bg', ...)
  3. --background logic (subprocess-based, before start_rust_tunnel()):

    • Builds a keeper command with --foreground and all user-supplied flags
    • Launches it via subprocess.Popen(start_new_session=True) as an independent process
    • Polls the file-based tunnel registry (<tempdir>/keeper-tunnel-sessions/) for readiness
    • Once the subprocess registers, prints connection info and returns
    • If the subprocess exits early, reports the error
    • Works on all platforms (Linux, macOS, Windows)
  4. --run logic (new branch, checked before --foreground):

    • Starts the tunnel, waits for WebRTC connection via wait_for_tunnel_connection(result, timeout=connect_timeout)
    • On connected: prints endpoint summary, executes command via subprocess.run(command, shell=True)
    • On command completion: stops tunnel via close_tube() + unregister_tunnel_session()
    • Exits with the command's exit code via sys.exit(proc.returncode)
    • On KeyboardInterrupt: exits with code 130
  5. --foreground connection readiness: Before printing the status banner and writing the PID file, calls wait_for_tunnel_connection(result, timeout=connect_timeout) to ensure the tunnel is actually usable.

  6. Interactive shell detection: When --foreground or --background is used inside keeper shell (interactive mode, batch_mode=False), the blocking logic is skipped and a message informs the user that tunnels already persist in the shell.

  7. File-based tunnel registry: New module-level functions (_register_tunnel, _unregister_tunnel, _list_registered_tunnels, _tunnel_registry_dir, _is_pid_alive) manage JSON metadata files in <tempdir>/keeper-tunnel-sessions/. Each foreground/background/run tunnel registers on connect and unregisters on cleanup.

  8. Enhanced PAMTunnelListCommand: Now reads both the in-process PyTubeRegistry and the file-based registry. Cross-process tunnels appear in the listing with their mode and PID. Stale entries (dead PIDs) are auto-cleaned.

  9. Enhanced PAMTunnelStopCommand: When a tunnel is not found in the in-process registry, falls back to the file registry and sends SIGTERM to the owning process. --all also stops cross-process tunnels.

  10. Batch mode fix for --target-host/--target-port: When params.batch_mode is True and these values are missing, raises CommandError instead of calling input(), which would hang in scripts.

No other files are modified. No new external dependencies are introduced. Only Python stdlib modules (json, signal, subprocess, threading, time) are used.

Usage

# Simplest single-command workflow
keeper pam tunnel start <UID> --port 5432 --run "pg_dump -h localhost -p 5432 mydb > backup.sql"

# Multiple commands: daemonize, work, stop
keeper pam tunnel start <UID> --port 5432 --background --pid-file /tmp/tunnel.pid
pg_dump -h localhost -p 5432 mydb > backup.sql
psql -h localhost -p 5432 -c "VACUUM ANALYZE"
kill -SIGTERM $(cat /tmp/tunnel.pid)

# With custom connection timeout
keeper pam tunnel start <UID> --port 5432 --run "my_command" --timeout 60

# Run tunnel in foreground -- stays alive until killed
keeper pam tunnel start <UID> --port 5432 --foreground

# With PID file for external management
keeper pam tunnel start <UID> --port 5432 --foreground --pid-file /tmp/tunnel.pid

# Background it in a script (manual pattern)
keeper pam tunnel start <UID> --port 5432 --foreground --pid-file /tmp/tunnel.pid &
sleep 5
pg_dump -h localhost -p 5432 mydb > backup.sql
kill -SIGTERM $(cat /tmp/tunnel.pid)

# As a systemd service
# [Service]
# ExecStart=/usr/bin/keeper pam tunnel start <UID> --port 5432 --foreground --pid-file /run/keeper-tunnel.pid
# ExecStop=/bin/kill -SIGTERM $MAINPID
# PIDFile=/run/keeper-tunnel.pid

# Inside keeper shell (--foreground is gracefully skipped):
# My Vault> pam tunnel start <UID> --port 5432 --foreground
# Note: --foreground is not needed inside the interactive shell.
# The tunnel is already running. Use 'pam tunnel list' / 'pam tunnel stop'.

Testing

  • Parser: --foreground accepted -- argument parser recognizes the flag
  • Parser: -fg shorthand accepted -- short form works
  • Parser: defaults to False -- without the flag, foreground is False
  • Parser: --pid-file accepted -- argument parser recognizes the flag
  • Parser: --pid-file defaults to None -- without the flag, pid_file is None
  • Parser: --run accepted -- argument parser recognizes the flag, stores string value
  • Parser: --run defaults to None -- without the flag, run_command is None
  • Parser: --run with --port -- --run and --port work together
  • Parser: --timeout accepted -- argument parser recognizes the flag, stores int value
  • Parser: --timeout defaults to 30 -- without the flag, connect_timeout is 30
  • Parser: --timeout custom value -- --timeout 120 sets connect_timeout to 120
  • Parser: --background accepted -- argument parser recognizes the flag
  • Parser: -bg shorthand accepted -- short form works
  • Parser: --background defaults to False -- without the flag, background is False
  • Parser: --background with --pid-file -- both flags work together
  • Parser: compatible with existing args -- --host, --port, --no-trickle-ice, --target-host, --target-port all work alongside new flags
  • Import: unregister_tunnel_session -- importable from tunnel_and_connections
  • Import: _stop_tunnel_process -- importable from tunnel_and_connections
  • Parser: mutual exclusivity -- --foreground, --background, --run all parse individually but runtime rejects combinations
  • Integration: tunnel persists -- tunnel stays alive in single-command mode when --foreground is set
  • Integration: --run executes and exits -- tunnel starts, command runs, tunnel stops, exit code propagated
  • Registry: tunnels registered on connect -- JSON metadata written to <tempdir>/keeper-tunnel-sessions/
  • Registry: tunnels unregistered on cleanup -- JSON file removed on shutdown
  • Registry: stale PID cleanup -- dead PIDs auto-cleaned on tunnel list
  • Integration: tunnel list shows cross-process tunnels -- tunnels visible from any Commander session
  • Integration: tunnel stop stops cross-process tunnels -- sends SIGTERM to owning process
  • Integration: --background daemonizes and returns -- tunnel starts, waits, daemonizes, returns prompt
  • Integration: --foreground waits for connection -- status banner only prints after WebRTC connection is established
  • Integration: SIGTERM stops cleanly -- kill -SIGTERM triggers close_tube() and process exits 0
  • Integration: Ctrl+C stops cleanly -- KeyboardInterrupt triggers the same clean shutdown
  • Integration: PID file written and cleaned up -- file created on start, removed on stop
  • Integration: interactive shell skip -- --foreground inside keeper shell prints info message, does not block
  • Integration: batch mode error -- missing --target-host/--target-port in batch mode raises CommandError
  • Backward compat: without flag -- behavior is identical to current master when new flags are not used
  • systemd: works as service -- Type=simple systemd unit starts, runs, and stops cleanly

Backward Compatibility

  • All new flags (--foreground, --pid-file, --run, --timeout, --background) are optional and default to False/None/None/30/False
  • When not set, the execute() method follows the exact same code path as before (the replaced pass was a no-op)
  • The batch mode error for missing --target-host/--target-port only triggers when params.batch_mode is True and the resource requires host/port supply -- interactive shell users still get the input() prompt
  • No changes to the argument parser's existing arguments
  • PAMTunnelListCommand and PAMTunnelStopCommand are enhanced to read the file-based registry but continue to work identically for in-process tunnels
  • No changes to PAMTunnelEditCommand, PAMTunnelDiagnoseCommand, or any other command
  • No new external dependencies; only json, signal, subprocess, threading, time from stdlib
  • No changes to the PAMTunnelCommand group command registration
  • --foreground, --background, and --run are mutually exclusive (enforced at runtime)

@craiglurey craiglurey requested a review from miroberts March 20, 2026 01:18
@craiglurey
Copy link
Copy Markdown
Contributor

needs to be rebased from release branch

@craiglurey craiglurey changed the base branch from master to release March 20, 2026 01:19
Copy link
Copy Markdown
Contributor

@miroberts miroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@msawczynk msawczynk force-pushed the feat/pam-tunnel-foreground branch from 485ea3e to 8800be8 Compare March 24, 2026 11:11
…urn, port safety

- Add Windows hard-termination warning to --background and --run banners (was only on --foreground)
- Use clean_stale=False in --background polling loop to avoid filesystem churn every 0.5s
- Guard port comparison in register_tunnel against non-int-coercible values
- Document normalize_bind_host limitation (0.0.0.0 vs 127.0.0.1 caught at OS level)
- Note thread-safety characteristic of _registry_dir_initialized flag
@msawczynk
Copy link
Copy Markdown
Contributor Author

@miroberts — Ready for final review. Here's a summary of all changes from the original PR and why.

Structural

  • Extracted tunnel_registry.py — the file-based registry (register_tunnel, unregister_tunnel, list_registered_tunnels, is_pid_alive, stop_tunnel_process) was a self-contained subsystem inlined in tunnel_and_connections.py. Moving it to its own module makes it independently testable and cuts review surface on the main file.
  • Added unit-tests/test_tunnel_registry.py — 15 unit tests covering registry CRUD, stale cleanup, duplicate port detection, PID liveness, parser defaults, mutual exclusivity, batch-mode input() guard, and the PARENT_GRACE_SECONDS constant. All tests use temp dirs; no Keeper account required.
  • Rebased onto upstream/release (per Craig's request). 3 commits: feature, tests, review fixes.

Behavioral fixes (vs original PR)

Change Why
sys.exit(proc.returncode)raise SystemExit(rc) with None1 fallback Prevents silent exit-0 when returncode is None; raise SystemExit is clearer intent than sys.exit
Windows stop_tunnel_process calls unregister_tunnel(pid) before os.kill On Windows, SIGTERM maps to TerminateProcess (hard kill) — the target can't clean up its own registry entry
Windows hard-termination warning in --foreground, --background, and --run banners Original only warned in --foreground; all three modes use the same stop mechanism
--background parent timeout = connect_timeout + PARENT_GRACE_SECONDS (10s constant) Original used connect_timeout + 20 (magic number); named constant is tunable and self-documenting
--background polling checks registry before poll() Avoids race: child could register and then exit before parent checks, which the old order would misreport as failure
--background polling uses clean_stale=False Avoids deleting stale files every 0.5s during the poll loop; stale cleanup still runs on register_tunnel and first dir access
register_tunnel checks for duplicate host:port (normalized) Catches accidental double-start before OS bind fails; localhost127.0.0.1 normalized; TOCTOU acceptable for v1
register_tunnel triggers stale cleanup on every call Dead entries are cleaned even if nobody runs tunnel list
Port comparison guards against non-int-coercible values int(entry.get('port')) could crash on corrupt registry files; now skips gracefully
shell=True in --run kept but documented Code comment + expanded --run help text explain intentional shell use (pipes, redirects, env vars)

Not changed (intentional)

  • Argument names and semantics (--foreground, --background, --run, --timeout, --pid-file)
  • WebRTC readiness wait before banner
  • Interactive shell detection (skip --foreground inside keeper shell)
  • In-process PyTubeRegistry — file registry augments it, doesn't replace it
  • Other command classes (PAMTunnelEditCommand, PAMTunnelDiagnoseCommand, etc.)

Known limitations (v1, documented)

  • normalize_bind_host doesn't handle 0.0.0.0 vs 127.0.0.1 (OS catches this at bind)
  • _registry_dir_initialized flag is not thread-safe (concurrent first-access double-cleans; harmless)
  • clean_stale=False returns only live entries but doesn't delete dead files (naming could be clearer)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants