fix: use numeric UID in agentex-ui Dockerfile to fix CreateContainerError #171

Merged
sayakmaity merged 1 commit into main from fix/revert-chainguard-dockerfile
Mar 18, 2026

Conversation

sayakmaity (Contributor) commented Mar 18, 2026

Summary

The public Chainguard base image change (#170) uses USER nonroot, but the golden base image has the user named node (not nonroot) at UID 65532. This causes CreateContainerError: no users found on the dev cluster.

Switches to USER 65532 (numeric UID) which works with both base images.

This unblocks deployment of the SGPINF-1217 fix (#165).

Test plan

  • Image builds successfully
  • Pod starts without CreateContainerError on dev cluster

Greptile Summary

This PR fixes the CreateContainerError: no users found regression on the dev cluster by changing USER nonroot to USER 65532 (numeric UID) in agentex-ui/Dockerfile. Using a numeric UID avoids a /etc/passwd lookup, which is the correct approach for minimal/distroless-style images like Chainguard where named users may not be registered.
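To make the lookup behaviour concrete, here is a simplified shell model of how a `USER` value might be resolved to a UID. It is an illustration only, not the container runtime's actual code; the function name `resolve_user` and its passwd-file argument are hypothetical.

```shell
# Simplified model of USER resolution (hypothetical helper, not runtime code).
# $1: the USER value from the Dockerfile, $2: path to the image's passwd file.
resolve_user() {
    spec=$1
    passwd_file=$2
    case $spec in
        ''|*[!0-9]*)
            # Named user: the name must exist in the passwd file.
            uid=$(awk -F: -v name="$spec" '$1 == name { print $3; exit }' "$passwd_file")
            if [ -z "$uid" ]; then
                echo "no users found" >&2
                return 1
            fi
            echo "$uid"
            ;;
        *)
            # Numeric UID: used as-is, so the passwd file is never consulted.
            echo "$spec"
            ;;
    esac
}
```

Against a passwd file that defines only `node` at UID 65532, `resolve_user nonroot` fails with `no users found`, while `resolve_user 65532` succeeds without reading the file at all.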

Important discrepancy: The PR title and description say this restores the golden ECR base image (022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/node:20-dev), but the FROM line is not changed — the image remains cgr.dev/chainguard/node:latest-dev. The only actual code change is the USER directive on line 53. This should be clarified to avoid misleading git history.

Key points:

  • The USER 65532 numeric-UID fix directly addresses the no users found error and is technically sound.
  • The base image (cgr.dev/chainguard/node:latest-dev) uses a floating latest-dev tag, so builds remain non-reproducible — this was already the case before and is not introduced by this PR.
  • If the golden ECR image is needed for reasons beyond the user setup (e.g., internal CA trust, private registry hardening), the FROM line still needs to be updated.
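The reproducibility point about the floating `latest-dev` tag could be checked mechanically. The sketch below assumes that pinning by `@sha256:` digest is the desired policy, which the PR does not state; `from_lines_pinned` is a hypothetical helper name.

```shell
# Hypothetical check: succeed only if every FROM line pins its image by digest.
# Note: this simple version also flags "FROM scratch" and references to earlier stages.
from_lines_pinned() {
    dockerfile=$1
    # Collect FROM lines that lack an @sha256: digest.
    unpinned=$(grep -iE '^[[:space:]]*FROM[[:space:]]' "$dockerfile" | grep -v '@sha256:' || true)
    if [ -n "$unpinned" ]; then
        echo "unpinned base image: $unpinned" >&2
        return 1
    fi
    return 0
}
```

Run against the current Dockerfile, this would report `FROM cgr.dev/chainguard/node:latest-dev` as unpinned, since a tag alone can point to different images over time.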

Confidence Score: 4/5

  • Safe to merge as a targeted fix for the pod-start error; the discrepancy between the PR description and the actual change (FROM line not updated) should be confirmed as intentional before landing.
  • The change is minimal (one line, USER nonroot → USER 65532) and directly resolves the described CreateContainerError: no users found. Numeric UIDs are the idiomatic fix for Chainguard distroless images. The only concern is that the PR description claims a base-image revert that did not actually happen, which could cause confusion. Once that intent is confirmed/clarified, the risk is very low.
  • agentex-ui/Dockerfile line 2 — the FROM image is still the public Chainguard image, not the golden ECR image described in the PR.

Important Files Changed

Filename: agentex-ui/Dockerfile
Overview: Single-line change switching USER nonroot to USER 65532 (numeric UID) to fix CreateContainerError: no users found; base image (public cgr.dev/chainguard/node:latest-dev) is unchanged despite PR description claiming a revert to the golden ECR image.

Sequence Diagram

sequenceDiagram
    participant Docker as Docker Build
    participant Image as cgr.dev/chainguard/node:latest-dev
    participant App as /app (Node.js)

    Docker->>Image: FROM cgr.dev/chainguard/node:latest-dev
    Docker->>App: USER root → apk add libvips-dev, python3, etc.
    Docker->>App: npm ci (all deps incl. dev)
    Docker->>App: npm run build
    Docker->>App: npm prune --omit=dev
    Docker->>App: chown -R 65532:65532 /app
    Note over Docker,App: PR #171 change: USER nonroot → USER 65532
    Docker->>App: USER 65532 (numeric UID — no /etc/passwd lookup)
    App-->>Docker: EXPOSE 3000, CMD ["npm", "start"]
This is a comment left during a code review.
Path: agentex-ui/Dockerfile
Line: 2

Comment:
**Base image still public Chainguard, not the golden ECR image**

The PR description states this revert restores the golden ECR base image (`022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/node:20-dev`), but the `FROM` line is unchanged and still points to the public image `cgr.dev/chainguard/node:latest-dev`.

The actual fix is only the `USER nonroot` → `USER 65532` change on line 53. That numeric-UID approach is the correct workaround for `CreateContainerError: no users found` in distroless/minimal images that lack a full `/etc/passwd`, and it will likely resolve the immediate pod-start failure.

However, if the golden ECR image provides additional hardening (internal CAs, registry access control, pre-vetted dependency pinning, etc.) that the team relies on, this PR does not restore that. Please confirm whether the intent is:

1. **Just fix the USER directive** (current state — acceptable) — update the PR title/description to avoid confusion in git history.
2. **Actually revert to the golden ECR image** — the `FROM` line needs to be changed to `022465994601.dkr.ecr.us-west-2.amazonaws.com/golden/chainguard/node:20-dev`.


Last reviewed commit: "fix: use numeric UID..."

sayakmaity requested a review from a team as a code owner on March 18, 2026 14:54
Comment on lines 29 to 38
  # Install dependencies with better reliability settings
  RUN npm config set registry https://registry.npmjs.org/ && \
-     npm ci --verbose
+     npm ci --omit=dev --verbose

  # Copy source code (node_modules and .next excluded by .dockerignore)
  COPY ${SOURCE_DIR} .
  COPY LICENSE /app/LICENSE

  # Build the application (creates fresh .next directory)
- ENV NODE_ENV=production
  RUN npm run build

P2 Dev dependencies omitted before build step — may break npm run build

The previous Dockerfile explicitly commented "Install all dependencies (including dev) needed for build" and ran npm ci --verbose (without --omit=dev) before setting NODE_ENV=production. The build step then had access to TypeScript, ESLint plugins, CSS processors, and any other build-time devDependencies.

This PR moves ENV NODE_ENV=production to line 20 (before the install step) and adds --omit=dev to npm ci. Both changes together mean devDependencies are never installed, yet RUN npm run build (line 38) still expects them. If any devDependency is consumed during the Next.js build (e.g. typescript, @types/*, Sass/CSS-in-JS processors, custom ESLint configs), the build will silently fail or error out.

The correct pattern for a multi-stage-style single-stage build is:

  1. Install all deps (omit NODE_ENV=production / --omit=dev at this step)
  2. Run the build
  3. Prune dev deps after the build
Suggested change

Current lines 29 to 38 (as shown in the PR diff):
  # Install dependencies with better reliability settings
  RUN npm config set registry https://registry.npmjs.org/ && \
-     npm ci --verbose
+     npm ci --omit=dev --verbose
  # Copy source code (node_modules and .next excluded by .dockerignore)
  COPY ${SOURCE_DIR} .
  COPY LICENSE /app/LICENSE
  # Build the application (creates fresh .next directory)
- ENV NODE_ENV=production
  RUN npm run build

Suggested replacement:
# Install all dependencies (including dev) needed for build
RUN npm config set registry https://registry.npmjs.org/ && \
    npm ci --verbose
# Copy source code (node_modules and .next excluded by .dockerignore)
COPY ${SOURCE_DIR} .
COPY LICENSE /app/LICENSE
# Build the application (creates fresh .next directory)
RUN npm run build
# Remove dev dependencies after build
RUN npm prune --omit=dev

If the project has already confirmed that no devDependencies are consumed during next build (e.g. Next.js SWC handles all transpilation and all type/lint checks are CI-only), this can be left as-is — but that should be validated against the test plan item "Image builds successfully."
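The ordering concern in this comment can be expressed as a small lint. This is a rough sketch, not part of the PR: `check_install_order` is a hypothetical helper, and it only recognises the literal strings `npm ci`, `--omit=dev`, `NODE_ENV=production`, and `npm run build` on non-comment lines.

```shell
# Hypothetical lint: fail if dev dependencies are skipped before "npm run build".
# Dev deps are skipped when npm ci uses --omit=dev, or when NODE_ENV=production
# is already set before npm ci runs.
check_install_order() {
    awk '
        /^[[:space:]]*#/ { next }
        /NODE_ENV=production/ && !prod { prod = NR }
        /npm ci/ && !ci { ci = NR; if ($0 ~ /--omit=dev/) omit = 1 }
        /npm run build/ && !build { build = NR }
        END {
            if (ci && build && ci < build && (omit || (prod && prod < ci))) {
                print "dev dependencies are skipped before the build at line " ci
                exit 1
            }
            print "ok"
        }
    ' "$1"
}
```

The version under review (install with `--omit=dev`, then build) makes the function return non-zero, while the suggested install, build, prune order prints `ok`. Whether the first case actually breaks the build still depends on which devDependencies `next build` uses.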


The golden base image has user 'node' at UID 65532, while the public
Chainguard image uses 'nonroot' at the same UID. Using `USER nonroot`
causes "no users found" CreateContainerError on the golden image.

Switch to `USER 65532` which works with both base images.
sayakmaity force-pushed the fix/revert-chainguard-dockerfile branch from 74721e3 to f2a90ef on March 18, 2026 15:05
sayakmaity changed the title from "revert: restore agentex-ui Dockerfile to golden base image" to "fix: use numeric UID in agentex-ui Dockerfile to fix CreateContainerError" on Mar 18, 2026
sayakmaity enabled auto-merge (squash) on March 18, 2026 16:06
sayakmaity merged commit 9dd3ac3 into main on Mar 18, 2026
12 checks passed
sayakmaity deleted the fix/revert-chainguard-dockerfile branch on March 18, 2026 16:08

2 participants