feat: add live-update support #2
Merged
16 commits
0a55b02 feat(bot): add update_binary instruction handler for live self-update (Embers-of-the-Fire)
8f06ea8 feat(service): add update_binary instruction and API endpoints (Embers-of-the-Fire)
7cd2f04 feat(workstation): add binary update management UI (Embers-of-the-Fire)
252b3a1 docs: add live-update feature documentation (Embers-of-the-Fire)
9c6ba39 fix(bot): redact artifact URL query params from info log (Embers-of-the-Fire)
620c22e fix(bot): add request timeout and context cancellation to binary down… (Embers-of-the-Fire)
c67657d fix(service): properly aggregate per-robot results in update_binary_all (Embers-of-the-Fire)
5b03f7b fix(workstation): include server error body in client error messages (Embers-of-the-Fire)
fa4bb73 docs: align update_binary_all response example with service implement… (Embers-of-the-Fire)
3ccef8f fix(bot): replace fixed 1s sleep with write-completion signal before … (Embers-of-the-Fire)
1ac6088 fix(service): treat unknown or malformed bot responses as failures (Embers-of-the-Fire)
04403fd docs: fix batch update status example (Embers-of-the-Fire)
fce49fe fix: gate bot restart on successful ws writes (Embers-of-the-Fire)
afa09a8 fix: bound update binary api waits (Embers-of-the-Fire)
d42ceef fix(live-update): chore fix job (Embers-of-the-Fire)
67c36cf perf(service): run update_binary_all robot requests concurrently (Embers-of-the-Fire)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,159 @@ | ||
| # Live-Update Feature | ||
|
|
||
| Remote binary update mechanism for edge bot devices. The service sends an update instruction with an artifact URL to a connected bot over WebSocket. The bot downloads, validates, and atomically replaces its own executable, then restarts in-place. | ||
|
|
||
| ## End-to-End Flow | ||
|
|
||
| ``` | ||
| Frontend Service Bot | ||
| │ │ │ | ||
| │ POST /action/ │ │ | ||
| │ update_binary │ │ | ||
| │────────────────────────>│ │ | ||
| │ │ WS: update_binary │ | ||
| │ │ instruction │ | ||
| │ │───────────────────────>│ | ||
| │ │ │ 1. Resolve exec path | ||
| │ │ │ 2. Download binary | ||
| │ │ │ 3. Validate ELF magic | ||
| │ │ │ 4. chmod 0755 | ||
| │ │ │ 5. Atomic rename | ||
| │ │ │ | ||
| │ │ WS: response │ | ||
| │ │<───────────────────────│ | ||
| │ 200 OK │ │ | ||
| │<────────────────────────│ │ | ||
| │ │ │ 6. Wait for WS flush | ||
| │ │ │ 7. syscall.Exec | ||
| │ │ │ (process restarts) | ||
| │ │ │ | ||
| │ │ WS: reconnect │ | ||
| │ │<───────────────────────│ | ||
| ``` | ||
|
|
||
| ## Wire Protocol | ||
|
|
||
| ### Instruction (Service → Bot) | ||
|
|
||
| ```json | ||
| { | ||
| "instruction": "update_binary", | ||
| "message": { | ||
| "artifact_url": "https://artifacts.example.com/bot/v1.2.3/bot-linux-amd64" | ||
| } | ||
| } | ||
| ``` | ||
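As an illustration of how a receiver might decode the instruction frame shown above, here is a minimal Go sketch. The `Envelope` type and the `decodeUpdateBinary` helper are hypothetical names introduced for this example; only the JSON field names (`instruction`, `message`, `artifact_url`) come from the documented wire format, and the empty-URL check is an added assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the documented frame. Message stays raw so the payload
// type can be chosen per instruction. (Hypothetical type for this sketch.)
type Envelope struct {
	Instruction string          `json:"instruction"`
	Message     json.RawMessage `json:"message"`
}

// UpdateBinaryRequest matches the "message" object of update_binary.
type UpdateBinaryRequest struct {
	ArtifactURL string `json:"artifact_url"`
}

// decodeUpdateBinary parses a raw frame and returns the payload only when
// the instruction name is "update_binary". (Hypothetical helper.)
func decodeUpdateBinary(frame []byte) (UpdateBinaryRequest, error) {
	var env Envelope
	if err := json.Unmarshal(frame, &env); err != nil {
		return UpdateBinaryRequest{}, fmt.Errorf("decode envelope: %w", err)
	}
	if env.Instruction != "update_binary" {
		return UpdateBinaryRequest{}, fmt.Errorf("unexpected instruction %q", env.Instruction)
	}
	var req UpdateBinaryRequest
	if err := json.Unmarshal(env.Message, &req); err != nil {
		return UpdateBinaryRequest{}, fmt.Errorf("decode message: %w", err)
	}
	// Assumption: an empty URL is rejected early rather than at download time.
	if req.ArtifactURL == "" {
		return UpdateBinaryRequest{}, fmt.Errorf("artifact_url is empty")
	}
	return req, nil
}

func main() {
	frame := []byte(`{"instruction":"update_binary","message":{"artifact_url":"https://artifacts.example.com/bot/v1.2.3/bot-linux-amd64"}}`)
	req, err := decodeUpdateBinary(frame)
	fmt.Println(req.ArtifactURL, err)
}
```

Keeping `message` as `json.RawMessage` is one possible design; it defers payload decoding until the instruction name is known.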
|
|
||
| ### Response (Bot → Service) | ||
|
|
||
| **Success:** | ||
|
|
||
| ```json | ||
| { | ||
| "status": "post_update", | ||
| "message": "success, restarting..." | ||
| } | ||
| ``` | ||
|
|
||
| **Error:** | ||
|
|
||
| ```json | ||
| { | ||
| "status": "error", | ||
| "message": "downloaded file is not a valid ELF binary" | ||
| } | ||
| ``` | ||
|
|
||
| ## REST API Endpoints | ||
|
|
||
| ### `POST /action/update_binary` | ||
|
|
||
| Update a single bot's binary. | ||
|
|
||
| **Request:** | ||
|
|
||
| ```json | ||
| { | ||
| "robot_id": "550e8400-e29b-41d4-a716-446655440000", | ||
| "artifact_url": "https://artifacts.example.com/bot/v1.2.3/bot-linux-amd64" | ||
| } | ||
| ``` | ||
|
|
||
| **Response:** | ||
|
|
||
| ```json | ||
| { | ||
| "status": "post_update", | ||
| "message": "success, restarting..." | ||
| } | ||
| ``` | ||
|
|
||
| If the bot reports an instruction failure or the operation times out, the | ||
| endpoint still responds with HTTP 200 and carries the failure in the JSON body, so clients must check `status` rather than the HTTP code: | ||
|
|
||
| ```json | ||
| { | ||
| "status": "error", | ||
| "message": "instruction timed out after 60 seconds" | ||
| } | ||
| ``` | ||
|
|
||
| ### `POST /action/update_binary_all` | ||
|
|
||
| Update all connected bots. Returns per-robot results with individual status and message fields. | ||
|
|
||
| **Request:** | ||
|
|
||
| ```json | ||
| { | ||
| "artifact_url": "https://artifacts.example.com/bot/v1.2.3/bot-linux-amd64" | ||
| } | ||
| ``` | ||
|
|
||
| **Response:** | ||
|
|
||
| ```json | ||
| { | ||
| "status": "partial_failure", | ||
| "results": [ | ||
| { | ||
| "robot_id": "550e8400-e29b-41d4-a716-446655440000", | ||
| "status": "post_update", | ||
| "message": "success, restarting..." | ||
| }, | ||
| { | ||
| "robot_id": "660e8400-e29b-41d4-a716-446655440001", | ||
| "status": "error", | ||
| "message": "downloaded file is not a valid ELF binary" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| The overall `status` is `"ok"` when every bot succeeds and `"partial_failure"` when any bot reports an error or returns an unrecognised response. | ||
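The aggregation rule can be sketched in a few lines of Go. This is an illustration of the rule as documented, not the service's actual code: `RobotResult` and `overallStatus` are hypothetical names, and treating an empty result list as `"ok"` is an assumption the text does not address.

```go
package main

import "fmt"

// RobotResult matches one entry of the documented "results" array.
type RobotResult struct {
	RobotID string `json:"robot_id"`
	Status  string `json:"status"`
	Message string `json:"message"`
}

// overallStatus returns "ok" only if every bot reported "post_update".
// Any other status, including unrecognised ones, yields "partial_failure".
// Assumption: an empty slice counts as "ok".
func overallStatus(results []RobotResult) string {
	for _, r := range results {
		if r.Status != "post_update" {
			return "partial_failure"
		}
	}
	return "ok"
}

func main() {
	results := []RobotResult{
		{RobotID: "550e8400-e29b-41d4-a716-446655440000", Status: "post_update", Message: "success, restarting..."},
		{RobotID: "660e8400-e29b-41d4-a716-446655440001", Status: "error", Message: "downloaded file is not a valid ELF binary"},
	}
	fmt.Println(overallStatus(results))
}
```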
|
|
||
| ## Safety Guarantees | ||
|
|
||
| | Mechanism | Purpose | | ||
| | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | | ||
| | **Same-directory temp file** | Ensures temp file and target are on the same filesystem, which is required for `os.Rename` to be atomic | | ||
| | **ELF magic validation** | Checks first 4 bytes (`\x7fELF`) to prevent replacing the binary with an HTML error page or other invalid content | | ||
| | **Atomic rename** | `os.Rename` on the same filesystem is an atomic operation at the VFS level — the old binary is fully replaced in a single syscall | | ||
| | **Temp file cleanup** | On any error path, the temp file is removed before returning | | ||
|
|
||
| ## Restart Semantics | ||
|
|
||
| The bot uses `syscall.Exec(execPath, os.Args, os.Environ())` to restart: | ||
|
|
||
| - **In-place replacement**: The current process image is replaced with the new binary. The PID remains the same. | ||
| - **Write-completion gated**: A goroutine waits for the send-done signal from the eventloop (indicating the WebSocket response has been flushed) before calling `syscall.Exec`, instead of relying on a fixed delay. | ||
| - **Re-initialization**: The new binary runs `main()` from scratch, re-authenticates with the service, and re-establishes the WebSocket connection via the existing retry loop. | ||
| - **No rollback (v1)**: If the new binary fails to start, the bot stays down. Rollback is a future enhancement. | ||
|
|
||
| ## Batch Update Behavior | ||
|
|
||
| When using `update_binary_all`: | ||
|
|
||
| - The instruction is sent to every connected bot; as of commit 67c36cf the per-robot requests run concurrently rather than sequentially. | ||
| - Per-bot results are collected and returned in the response, including individual status and error messages. | ||
| - Bot restarts will drop the WebSocket connection, which is expected — the bot reconnects automatically after restart. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,159 @@ | ||
| package instructions | ||
|
|
||
| import ( | ||
| "context" | ||
| "fmt" | ||
| "io" | ||
| "net/http" | ||
| "net/url" | ||
| "os" | ||
| "path/filepath" | ||
| "syscall" | ||
| "time" | ||
|
|
||
| "github.com/Alliance-Algorithm/rmcs-actions/packages/bot/eventloop/share" | ||
| "github.com/Alliance-Algorithm/rmcs-actions/packages/bot/lib" | ||
| "github.com/Alliance-Algorithm/rmcs-actions/packages/bot/logger" | ||
| "go.uber.org/zap" | ||
| ) | ||
|
|
||
| const InstructionUpdateBinary = "update_binary" | ||
|
|
||
| // UpdateBinaryRequest is the request payload sent from the service. | ||
| type UpdateBinaryRequest struct { | ||
| ArtifactUrl string `json:"artifact_url"` | ||
| } | ||
|
|
||
| // UpdateBinaryResponse is the response payload sent back to the service. | ||
| type UpdateBinaryResponse struct { | ||
| Status string `json:"status"` | ||
| Message string `json:"message"` | ||
| } | ||
|
|
||
| // UpdateBinaryHandler registers the update_binary instruction using the | ||
| // ResponseAction pattern. | ||
| var UpdateBinaryHandler = InstructionHandler{ | ||
| Instruction: InstructionUpdateBinary, | ||
| Action: share.WrapResponseAction(UpdateBinaryAction), | ||
| } | ||
|
|
||
| // elfMagic is the first 4 bytes of any valid ELF binary. | ||
| var elfMagic = []byte{0x7f, 'E', 'L', 'F'} | ||
|
|
||
| // sanitizeURL returns a host/path summary with query parameters stripped to | ||
| // avoid leaking presigned URL credentials into logs. | ||
| func sanitizeURL(raw string) string { | ||
| u, err := url.Parse(raw) | ||
| if err != nil { | ||
| return "<invalid-url>" | ||
| } | ||
| return u.Host + u.Path | ||
| } | ||
|
|
||
| // UpdateBinaryAction downloads a new binary from the given artifact URL, | ||
| // validates it as an ELF executable, atomically replaces the current | ||
| // executable, and schedules a restart via syscall.Exec. | ||
| func UpdateBinaryAction(ctx context.Context, req UpdateBinaryRequest) UpdateBinaryResponse { | ||
| logger.Logger().Info("UpdateBinaryAction called", zap.String("artifact_url", sanitizeURL(req.ArtifactUrl))) | ||
|
|
||
| execPath, err := os.Executable() | ||
| if err != nil { | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to get executable path: %v", err)} | ||
| } | ||
| execPath, err = filepath.EvalSymlinks(execPath) | ||
| if err != nil { | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to resolve symlinks: %v", err)} | ||
| } | ||
|
|
||
| execDir := filepath.Dir(execPath) | ||
|
|
||
| // Create temp file in the same directory to ensure same-filesystem for | ||
| // atomic rename. | ||
| tmpFile, err := os.CreateTemp(execDir, ".update_binary_*") | ||
| if err != nil { | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to create temp file: %v", err)} | ||
| } | ||
| tmpPath := tmpFile.Name() | ||
|
|
||
| // Cleanup helper — removes the temp file on any error path. | ||
| cleanup := func() { | ||
| tmpFile.Close() | ||
| os.Remove(tmpPath) | ||
| } | ||
|
|
||
| // Download the binary. | ||
| httpClient := &http.Client{Timeout: 30 * time.Second} | ||
| httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, req.ArtifactUrl, nil) | ||
| if err != nil { | ||
| cleanup() | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to create request: %v", err)} | ||
| } | ||
| resp, err := httpClient.Do(httpReq) | ||
| if err != nil { | ||
| cleanup() | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to download binary: %v", err)} | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| cleanup() | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("download returned status %d", resp.StatusCode)} | ||
| } | ||
|
|
||
| _, err = io.Copy(tmpFile, resp.Body) | ||
| if err != nil { | ||
| cleanup() | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to write binary: %v", err)} | ||
| } | ||
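| // A failed Close can mean buffered data never reached disk, so treat it as an error. | ||
| if err := tmpFile.Close(); err != nil { | ||
| os.Remove(tmpPath) | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to close temp file: %v", err)} | ||
| } | ||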
|
|
||
| // Validate ELF magic bytes. | ||
| header := make([]byte, 4) | ||
| f, err := os.Open(tmpPath) | ||
| if err != nil { | ||
| os.Remove(tmpPath) | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to open temp file for validation: %v", err)} | ||
| } | ||
| _, err = io.ReadFull(f, header) | ||
| f.Close() | ||
| if err != nil { | ||
| os.Remove(tmpPath) | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to read header: %v", err)} | ||
| } | ||
| for i := 0; i < 4; i++ { | ||
| if header[i] != elfMagic[i] { | ||
| os.Remove(tmpPath) | ||
| return UpdateBinaryResponse{Status: "error", Message: "downloaded file is not a valid ELF binary"} | ||
| } | ||
| } | ||
|
|
||
| // Set executable permissions. | ||
| if err := os.Chmod(tmpPath, 0755); err != nil { | ||
| os.Remove(tmpPath) | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to chmod: %v", err)} | ||
| } | ||
|
|
||
| // Atomic replace via same-filesystem rename. | ||
| if err := os.Rename(tmpPath, execPath); err != nil { | ||
| os.Remove(tmpPath) | ||
| return UpdateBinaryResponse{Status: "error", Message: fmt.Sprintf("failed to replace binary: %v", err)} | ||
| } | ||
|
|
||
| logger.Logger().Info("Binary replaced successfully, scheduling restart", zap.String("path", execPath)) | ||
|
|
||
| // Schedule restart after the WebSocket response has been flushed. | ||
| // WsSendDoneCtxKey carries a channel that is closed by the eventloop | ||
| // send goroutine once wsjson.Write completes for this response. | ||
| done, _ := ctx.Value(lib.WsSendDoneCtxKey{}).(chan struct{}) | ||
| go func() { | ||
| if done != nil { | ||
| <-done | ||
| } | ||
| logger.Logger().Info("Restarting via syscall.Exec", zap.String("path", execPath)) | ||
| if err := syscall.Exec(execPath, os.Args, os.Environ()); err != nil { | ||
| logger.Logger().Error("Failed to exec new binary", zap.Error(err)) | ||
| } | ||
| }() | ||
|
|
||
| return UpdateBinaryResponse{Status: "post_update", Message: "success, restarting..."} | ||
| } | ||
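To see what the log redaction does in practice, `sanitizeURL` from the file above can be run on its own. The function body in the Go sketch below is copied from the diff; the presigned-style query string and the malformed input in `main` are made-up examples.

```go
package main

import (
	"fmt"
	"net/url"
)

// sanitizeURL returns host + path only, dropping the scheme, any userinfo,
// the query string, and the fragment.
func sanitizeURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<invalid-url>"
	}
	return u.Host + u.Path
}

func main() {
	// Hypothetical presigned URL: the signature lives in the query string.
	fmt.Println(sanitizeURL("https://artifacts.example.com/bot/v1.2.3/bot-linux-amd64?X-Amz-Signature=abc123"))
	// prints "artifacts.example.com/bot/v1.2.3/bot-linux-amd64"

	// A URL with no scheme before "://" fails to parse.
	fmt.Println(sanitizeURL("://bad"))
	// prints "<invalid-url>"
}
```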