fix: use public Chainguard base images instead of private ECR #170
Merged
RoxyFarhad merged 1 commit into main on Mar 18, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
scale-ballen approved these changes on Mar 18, 2026
scale-ballen added a commit that referenced this pull request on Mar 18, 2026
PR #170 switched to cgr.dev/chainguard/python which requires authentication. Since scale-agentex is a public open-source repo, keep python:3.12-slim-trixie (0 OS CVEs, no auth required). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8 tasks
sayakmaity added a commit that referenced this pull request on Mar 18, 2026
The public Chainguard base image change (#170) causes "no users found" CreateContainerError on the dev cluster. Revert to the golden ECR base image that was working.
2 tasks
sayakmaity added a commit that referenced this pull request on Mar 18, 2026
…rror (#171)

## Summary

The public Chainguard base image change (#170) uses `USER nonroot`, but the golden base image has the user named `node` (not `nonroot`) at UID 65532. This causes `CreateContainerError: no users found` on the dev cluster.

Switches to `USER 65532` (numeric UID) which works with both base images. This unblocks deployment of the SGPINF-1217 fix (#165).

## Test plan

- [ ] Image builds successfully
- [ ] Pod starts without CreateContainerError on dev cluster

<!-- greptile_comment -->

<h3>Greptile Summary</h3>

This PR fixes the `CreateContainerError: no users found` regression on the dev cluster by changing `USER nonroot` to `USER 65532` (numeric UID) in `agentex-ui/Dockerfile`. Using a numeric UID avoids a `/etc/passwd` lookup, which is the correct approach for minimal/distroless-style images like Chainguard where named users may not be registered.

**Important discrepancy:** The PR title and description say this restores the golden ECR base image (`022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/node:20-dev`), but the `FROM` line is not changed: the image remains `cgr.dev/chainguard/node:latest-dev`. The only actual code change is the `USER` directive on line 53. This should be clarified to avoid misleading git history.

Key points:
- The `USER 65532` numeric-UID fix directly addresses the `no users found` error and is technically sound.
- The base image (`cgr.dev/chainguard/node:latest-dev`) uses a **floating `latest-dev` tag**, so builds remain non-reproducible. This was already the case before and is not introduced by this PR.
- If the golden ECR image is needed for reasons beyond the user setup (e.g., internal CA trust, private registry hardening), the `FROM` line still needs to be updated.

<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge as a targeted fix for the pod-start error; the discrepancy between the PR description and the actual change (FROM line not updated) should be confirmed as intentional before landing.
- The change is minimal (one line, `USER nonroot` → `USER 65532`) and directly resolves the described `CreateContainerError: no users found`. Numeric UIDs are the idiomatic fix for Chainguard distroless images. The only concern is that the PR description claims a base-image revert that did not actually happen, which could cause confusion. Once that intent is confirmed/clarified, the risk is very low.
- agentex-ui/Dockerfile line 2: the `FROM` image is still the public Chainguard image, not the golden ECR image described in the PR.

</details>

<h3>Important Files Changed</h3>

| Filename | Overview |
|----------|----------|
| agentex-ui/Dockerfile | Single-line change switching `USER nonroot` to `USER 65532` (numeric UID) to fix `CreateContainerError: no users found`; base image (public `cgr.dev/chainguard/node:latest-dev`) is unchanged despite PR description claiming a revert to the golden ECR image. |

</details>

<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Docker as Docker Build
    participant Image as cgr.dev/chainguard/node:latest-dev
    participant App as /app (Node.js)
    Docker->>Image: FROM cgr.dev/chainguard/node:latest-dev
    Docker->>App: USER root → apk add libvips-dev, python3, etc.
    Docker->>App: npm ci (all deps incl. dev)
    Docker->>App: npm run build
    Docker->>App: npm prune --omit=dev
    Docker->>App: chown -R 65532:65532 /app
    Note over Docker,App: PR #171 change: USER nonroot → USER 65532
    Docker->>App: USER 65532 (numeric UID, no /etc/passwd lookup)
    App-->>Docker: EXPOSE 3000, CMD ["npm", "start"]
```

</details>

<details><summary>Prompt To Fix All With AI</summary>

`````markdown
This is a comment left during a code review.
Path: agentex-ui/Dockerfile
Line: 2

Comment:
**Base image still public Chainguard, not the golden ECR image**

The PR description states this revert restores the golden ECR base image (`022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/node:20-dev`), but the `FROM` line is unchanged and still points to the public image `cgr.dev/chainguard/node:latest-dev`.

The actual fix is only the `USER nonroot` → `USER 65532` change on line 53. That numeric-UID approach is the correct workaround for `CreateContainerError: no users found` in distroless/minimal images that lack a full `/etc/passwd`, and it will likely resolve the immediate pod-start failure.

However, if the golden ECR image provides additional hardening (internal CAs, registry access control, pre-vetted dependency pinning, etc.) that the team relies on, this PR does not restore that. Please confirm whether the intent is:

1. **Just fix the USER directive** (current state, acceptable): update the PR title/description to avoid confusion in git history.
2. **Actually revert to the golden ECR image**: the `FROM` line needs to be changed to `022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/node:20-dev`.

How can I resolve this? If you propose a fix, please make it concise.
`````

</details>

<sub>Last reviewed commit: ["fix: use numeric UID..."](https://github.com/scaleapi/scale-agentex/commit/f2a90eff90ec89f0e1a824fcbcd9608e5fb1fc6e)</sub>

<!-- /greptile_comment -->
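The numeric-UID reasoning in the review above can be illustrated with a small model. This is only a sketch of the idea, not the container runtime's actual resolution code: `resolve_user` and the `GOLDEN_PASSWD` contents are hypothetical, built from the one fact stated in the commit message (the golden image registers UID 65532 under the name `node`, not `nonroot`).

```python
from typing import Optional


def resolve_user(user_spec: str, passwd_text: str) -> Optional[int]:
    """Return the UID a USER value maps to, or None if it cannot be resolved.

    Simplified model (an assumption for illustration): a numeric value is
    used directly, while a name must match the first field of a line in the
    image's /etc/passwd. Forms such as "uid:gid" are not handled here.
    """
    if user_spec.isdecimal():
        return int(user_spec)  # numeric UID: no passwd lookup needed
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == user_spec:
            return int(fields[2])
    return None  # name is not registered in this image


# Hypothetical /etc/passwd for the golden image: UID 65532 is named "node".
GOLDEN_PASSWD = (
    "root:x:0:0:root:/root:/bin/sh\n"
    "node:x:65532:65532::/home/node:/bin/sh\n"
)

for spec in ("nonroot", "node", "65532"):
    print(spec, resolve_user(spec, GOLDEN_PASSWD))
# nonroot None
# node 65532
# 65532 65532
```

In this model the name `nonroot` fails to resolve while `65532` succeeds regardless of what the image calls that account, which is why the numeric form works with both base images.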
Testing:
Greptile Summary
This PR removes the dependency on private ECR golden base images by switching both the Python backend (`agentex/Dockerfile`) and the Node.js frontend (`agentex-ui/Dockerfile`) to publicly available Chainguard images pulled directly from `cgr.dev`. The corresponding AWS OIDC authentication and ECR login steps are removed from the integration-test workflow. As a secondary change, the Python Dockerfile is refactored to use a standard `/opt/venv` virtual environment instead of installing packages directly into the system Python prefix, which considerably simplifies the multi-stage `COPY` logic. The UI Dockerfile also fixes a pre-existing bug where `NODE_ENV=production` was set before `npm ci`, inadvertently excluding dev dependencies needed for the Next.js build.

Key changes:
- Base images are pulled from `cgr.dev/chainguard` instead of `022465994601.dkr.ecr.us-west-2.amazonaws.com`
- The integration-test workflow drops the AWS OIDC (`id-token: write`) permissions and the ECR login steps; the job-level permissions block is dropped and the job correctly inherits workflow-level `contents: read` + `packages: read`
- The Python Dockerfile installs into `/opt/venv` instead of the system Python, simplifying the production-stage `COPY` to a single directory
- The UI Dockerfile fixes the `NODE_ENV=production`-before-`npm ci` ordering
- Both Dockerfiles use floating `latest-dev` tags (previously `python:3.12-dev` and `node:20-dev`), which introduces non-determinism and risks silent major-version upgrades on future builds

Confidence Score: 3/5
- The use of floating `latest-dev` tags introduces build non-determinism that could silently break things on future runs.
- The new base images are `cgr.dev/chainguard/python:latest-dev` and `cgr.dev/chainguard/node:latest-dev`, without version pins. Chainguard's `latest` tag tracks the newest stable release and can advance major versions (e.g., Python 3.12 → 3.13, Node 20 → 22) automatically. The original ECR images were pinned to `python:3.12` and `node:20`, so this is a regression in reproducibility. Additionally, using `latest-dev` independently in both the `base` and `production` stages of the Python Dockerfile creates a subtle risk: if Chainguard pushes an update between the two pulls within a single build, the build-time and run-time Python versions could differ.
- `agentex/Dockerfile` and `agentex-ui/Dockerfile` use unpinned `latest-dev` image tags and should be revisited to restore version pins.

Important Files Changed
- … `latest-dev` tags, which introduces non-determinism and the risk of silent major-version upgrades.
- … `latest-dev` tag.

Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Before
        A1[Private ECR\n022465994601.dkr.ecr.us-west-2.amazonaws.com] -->|AWS OIDC + ECR Login| B1[python:3.12-dev / node:20-dev]
        B1 --> C1[Build image]
    end
    subgraph After
        A2[Public Chainguard Registry\ncgr.dev/chainguard] -->|No auth required| B2[python:latest-dev / node:latest-dev]
        B2 --> C2[Build image]
    end
    subgraph Workflow Change
        W1[run-integration-tests job] -->|Removed| W2[AWS credentials step]
        W1 -->|Removed| W3[ECR login step]
        W1 -->|Removed| W4[Job-level permissions block]
        W1 -->|Inherits from workflow level| W5[contents: read\npackages: read]
    end
Last reviewed commit: "fix: use public Chai..."
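The unpinned-tag concern raised in the summary above lends itself to a mechanical check. The following is a minimal sketch, assuming a simple line-based reading of a Dockerfile; `floating_base_images`, the `FLOATING_TAG` pattern, and the `SAMPLE` layout are hypothetical and not part of this repository. It treats an image as pinned when it carries a digest or a tag other than `latest` / `latest-*`.

```python
import re
from typing import List

# Tags treated as floating in this sketch (an assumption, not an official list).
FLOATING_TAG = re.compile(r"^latest(-.*)?$")


def floating_base_images(dockerfile_text: str) -> List[str]:
    """Return image references from FROM lines that are not pinned.

    Pinned means: has a digest (@sha256:...) or a tag that is not
    `latest` / `latest-*`. A missing tag is treated as `latest`.
    """
    stages = set()  # names introduced with "AS <name>"; these are not images
    unpinned = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Ignore flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_stage_ref = image in stages
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
        if is_stage_ref or image == "scratch" or "@" in image:
            continue
        # The tag follows ":" in the last path segment (avoids registry ports).
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
        if FLOATING_TAG.match(tag):
            unpinned.append(image)
    return unpinned


# Hypothetical two-stage layout using image names mentioned in this thread.
SAMPLE = """\
FROM cgr.dev/chainguard/python:latest-dev AS base
FROM cgr.dev/chainguard/python:latest-dev AS production
FROM python:3.12-slim-trixie
"""

print(floating_base_images(SAMPLE))
# ['cgr.dev/chainguard/python:latest-dev', 'cgr.dev/chainguard/python:latest-dev']
```

A check like this only flags the floating references; it cannot tell whether two pulls of the same floating tag within one build resolved to the same image, which is the subtler risk the review describes for the `base` and `production` stages.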