Skip to content

fix: enforce max waiting on control plane also if response is slow#2656

Open
alepane21 wants to merge 3 commits intomainfrom
ale/eng-9201-router-router-should-work-if-controlplane-is-down
Open

fix: enforce max waiting on control plane also if response is slow#2656
alepane21 wants to merge 3 commits intomainfrom
ale/eng-9201-router-router-should-work-if-controlplane-is-down

Conversation

@alepane21
Copy link
Contributor

@alepane21 alepane21 commented Mar 17, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced control plane registration resilience by implementing a 15-second timeout on HTTP requests, ensuring bounded registration attempts and preventing potential hangs.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/docs-website.
  • I have read the Contributors Guide.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

Walkthrough

Modified control plane registration to introduce a timeout-bounded HTTP client. Adjusted imports to include net/http and time, configured a new HTTP client with a 15-second timeout, and updated NodeServiceClient construction to use the new client instead of the retryablehttp client.

Changes

Cohort / File(s) Summary
HTTP Client Configuration
router/pkg/controlplane/selfregister/self_register.go
Added net/http and time imports; introduced httpClient with 15-second timeout; replaced retryablehttp client usage with the new timeout-bounded httpClient for NodeServiceClient initialization.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the main change: introducing a 15-second HTTP client timeout to enforce maximum waiting on control plane communication, addressing slow responses.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.

Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present.

@github-actions
Copy link

github-actions bot commented Mar 17, 2026

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-34b58fb3ac4f15afd917c5d4d768466a6c56adf7-nonroot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
router/pkg/controlplane/selfregister/self_register.go (1)

65-71: Timeout may limit retry effectiveness when control plane is slow.

The 15-second http.Client.Timeout applies to the entire operation including all retries, since StandardClient() wraps retry logic in its RoundTripper. With RetryMax=3 (4 total attempts) and DefaultBackoff waits (~7s total), only ~8 seconds remain for actual HTTP round-trips.

If the control plane is slow (e.g., first request takes 10s before failing), retries may never execute. This is fine if the intent is a hard 15-second cap regardless of retries, but consider:

  1. Increasing the timeout (e.g., 30-45s) to allow retries when responses are slow, or
  2. Setting retryClient.HTTPClient.Timeout instead, which applies per-attempt rather than total

If the current behavior is intentional (hard 15s cap trumps retries), the code is correct as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/controlplane/selfregister/self_register.go` around lines 65 - 71,
The 15s httpClient.Timeout set on the client returned by
retryClient.StandardClient() caps the entire multi-attempt operation (including
retries) and can prevent retries when the control plane is slow; to fix, either
increase the timeout (e.g., set httpClient.Timeout = 30*time.Second or
45*time.Second after calling retryClient.StandardClient()) if you want a larger
hard cap, or instead set the per-attempt timeout on the retry client itself by
configuring retryClient.HTTPClient.Timeout = 30*time.Second before calling
retryClient.StandardClient(); update the code around
retryClient.StandardClient(), httpClient.Timeout and where c.nodeServiceClient
is created to apply the chosen change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@router/pkg/controlplane/selfregister/self_register.go`:
- Around line 65-71: The 15s httpClient.Timeout set on the client returned by
retryClient.StandardClient() caps the entire multi-attempt operation (including
retries) and can prevent retries when the control plane is slow; to fix, either
increase the timeout (e.g., set httpClient.Timeout = 30*time.Second or
45*time.Second after calling retryClient.StandardClient()) if you want a larger
hard cap, or instead set the per-attempt timeout on the retry client itself by
configuring retryClient.HTTPClient.Timeout = 30*time.Second before calling
retryClient.StandardClient(); update the code around
retryClient.StandardClient(), httpClient.Timeout and where c.nodeServiceClient
is created to apply the chosen change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d4981a0b-c497-4125-8c0a-d12688f7a40b

📥 Commits

Reviewing files that changed from the base of the PR and between 02ce559 and 35aa64b.

📒 Files selected for processing (1)
  • router/pkg/controlplane/selfregister/self_register.go

@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.89%. Comparing base (3426dd3) to head (1082f10).

Files with missing lines Patch % Lines
...ter/pkg/controlplane/selfregister/self_register.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2656       +/-   ##
===========================================
- Coverage   89.54%   62.89%   -26.66%     
===========================================
  Files          20      244      +224     
  Lines        4360    26134    +21774     
  Branches     1199        0     -1199     
===========================================
+ Hits         3904    16436    +12532     
- Misses        456     8357     +7901     
- Partials        0     1341     +1341     
Files with missing lines Coverage Δ
...ter/pkg/controlplane/selfregister/self_register.go 0.00% <0.00%> (ø)

... and 263 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants