fix: pin Docker builds to workspace lockfile #166

Merged
scale-ballen merged 2 commits into main from fix/dockerfile-lockfile-pinning
Mar 17, 2026

scale-ballen commented Mar 17, 2026

Summary

Fixes two Docker build failures in the release workflow:

  1. 401 Unauthorized pulling golden base image: The golden image migration (PR #159, "sec: migrate standard Dockerfiles to Chainguard golden base images") switched the base image from public Docker Hub to private ECR, but the release workflow was never updated to authenticate to ECR.

  2. Broken transitive dependency resolution: The Dockerfile didn't use the workspace lockfile, so uv sync resolved dependencies fresh from PyPI on every build. This caused agentex-sdk to resolve to 0.9.4 (latest), which transitively pulls in claude-agent-sdk==0.1.49, a broken release with only a macOS ARM64 wheel.

Root Cause #1: ECR Auth (build-agentex.yml)

The release workflow only authenticated to GHCR, not ECR. The Dockerfile's FROM 022465994601.dkr.ecr... requires ECR credentials.

Fix: Add OIDC auth + ECR login steps, matching the existing pattern in integration-tests.yml.

Root Cause #2: Unpinned Dependencies (Dockerfile)

agentex/pyproject.toml (dev group)
  └── agentex-sdk (unpinned)
        └── resolves to 0.9.4 (latest on PyPI, without lockfile)
              └── claude-agent-sdk>=0.1.0
                    └── resolves to 0.1.49
                          └── only has macosx_11_0_arm64 wheel → FAILS on Linux

The workspace root pyproject.toml pins agentex-sdk==0.4.18 and uv.lock locks all transitive deps, but the Dockerfile only copied the member pyproject.toml — no lockfile, fresh resolution every build.
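
The last step of the chain above can be illustrated with a minimal sketch of wheel platform-tag matching. The wheel filename below is hypothetical (only the macosx_11_0_arm64 platform tag comes from this PR), and the Linux check is deliberately simplified: it ignores CPU architecture and glibc versions, which real installers also compare.

```python
# Hypothetical wheel filename: only the platform tag (macosx_11_0_arm64)
# is taken from the PR description; the python and ABI tags are assumed.
ONLY_WHEEL = "claude_agent_sdk-0.1.49-py3-none-macosx_11_0_arm64.whl"


def platform_tags(wheel_filename: str) -> list[str]:
    """Return the platform tags encoded in a wheel filename.

    Wheel filenames follow name-version[-build]-python-abi-platform.whl,
    so the platform field is the last dash-separated part. Compressed tag
    sets are joined with '.'.
    """
    stem = wheel_filename.removesuffix(".whl")
    return stem.rsplit("-", 1)[1].split(".")


def installable_on_linux(wheel_filename: str) -> bool:
    """Simplified check: usable on Linux if the wheel is platform-independent
    ('any') or carries a Linux-family platform tag."""
    linux_prefixes = ("linux_", "manylinux", "musllinux")
    return any(
        tag == "any" or tag.startswith(linux_prefixes)
        for tag in platform_tags(wheel_filename)
    )


print(installable_on_linux(ONLY_WHEEL))
```

If no published wheel passes a check like this and there is no source distribution to fall back on, the installer has nothing it can install, which matches the Linux build failure described here.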

Fix: Copy workspace root files + use --frozen --package agentex-backend.

Changes

| File | Change |
| --- | --- |
| .github/workflows/build-agentex.yml | Add id-token: write permission for OIDC |
| .github/workflows/build-agentex.yml | Add AWS OIDC auth + ECR login steps before Docker build |
| agentex/Dockerfile | Copy workspace root pyproject.toml + uv.lock into build context |
| agentex/Dockerfile | Add --frozen to all uv sync commands (base, dev, docs stages) |
| agentex/Dockerfile | Add --package agentex-backend to target member deps, not root |
| agentex/Dockerfile | Fix default SOURCE_DIR from public/agentex to agentex, matching workspace members = ["agentex"] |
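
The "--frozen on every uv sync" rule could be guarded against regressions with a small lint. This is only a sketch: the Dockerfile excerpt is hypothetical (the real agentex/Dockerfile is not reproduced on this page, and the `docs` group name is assumed), and the check works line by line, so it would miss flags placed on a backslash-continued line.

```python
import re

# Hypothetical Dockerfile excerpt for illustration only.
DOCKERFILE = """\
FROM base AS dev
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --package agentex-backend --group dev
FROM base AS docs
RUN uv sync --package agentex-backend --group docs
"""


def unfrozen_sync_lines(dockerfile_text: str) -> list[int]:
    """Return 1-based line numbers of `uv sync` invocations lacking --frozen."""
    offenders = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        if re.search(r"\buv\s+sync\b", line) and "--frozen" not in line.split():
            offenders.append(number)
    return offenders


# The second `uv sync` in the excerpt deliberately omits --frozen.
print(unfrozen_sync_lines(DOCKERFILE))
```

A check like this could run in CI next to the Docker build so that a new stage added without --frozen is caught before it resolves fresh from PyPI.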

Test Evidence

All tests run locally against Docker images built without --build-arg overrides (using the corrected default SOURCE_DIR=agentex).

Build tests (all 3 Dockerfile stages)

  • base stage builds (--no-cache)
  • dev stage builds (--no-cache)
  • production stage builds (--no-cache, includes docs)

Production runtime tests

  • ✅ All 21 core dependencies import (fastapi, ddtrace, uvicorn, redis, sqlalchemy, temporalio, litellm, aiohttp, pymongo, httpx, docker, alembic, asyncpg, aiodocker, kubernetes_asyncio, websockets, json_log_formatter, datadog, opentelemetry, dotenv, multipart)
  • ✅ Console scripts present: /usr/bin/uvicorn, /usr/bin/ddtrace-run, /usr/bin/python3
  • ✅ FastAPI app module loads (src.api.app)
  • ✅ Docs built at /app/docs/site/index.html
  • ✅ Runs as nonroot (uid=65532)
  • agentex-sdk NOT installed (dev-only, correctly excluded by --no-dev)
  • claude-agent-sdk NOT installed

Dev runtime tests

  • agentex-sdk==0.4.18 (lockfile pin, NOT 0.9.4 from PyPI)
  • claude-agent-sdk NOT installed (0.4.18 doesn't depend on it)

Negative test (reproducing the original failure)

  • ✅ Confirmed: without lockfile, uv sync --group dev resolves agentex-sdk==0.9.4 → claude-agent-sdk==0.1.49 → fails on Linux with "no matching distribution"

ECR auth pattern verification

  • ✅ OIDC role (github-action-scale-agentex-ecr-read) matches integration-tests.yml
  • ✅ ECR registry (022465994601) matches golden image account
  • id-token: write permission added for OIDC federation

🤖 Generated with Claude Code

…tive deps

The Dockerfile was not copying the workspace root pyproject.toml or uv.lock
into the build context. Without the lockfile, `uv sync` resolved dependencies
fresh from PyPI on every build, causing agentex-sdk to resolve to 0.9.4
(latest) instead of the pinned 0.4.18. Version 0.9.4 introduced a transitive
dependency on claude-agent-sdk>=0.1.0, and the latest release (0.1.49) only
published a macOS ARM64 wheel — breaking all Linux Docker builds.

Changes:
- Copy workspace root pyproject.toml and uv.lock into Docker build context
- Add --frozen flag to all uv sync commands (base, dev, docs stages)
- Add --package agentex-backend to target member dependencies, not root

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

scale-ballen requested a review from a team as a code owner March 17, 2026 19:14

The default ARG SOURCE_DIR=public/agentex placed the member pyproject.toml
at /app/public/agentex/ which doesn't match the workspace root's
members = ["agentex"] declaration. This would cause uv to fail resolving
agentex-backend as a workspace member. The public/agentex path also doesn't
exist in the repo — docker-compose already overrides to SOURCE_DIR=agentex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

scale-ballen merged commit af5dc58 into main Mar 17, 2026
28 checks passed
scale-ballen deleted the fix/dockerfile-lockfile-pinning branch March 17, 2026 19:49