feat(lib): capture client-attested build provenance#454
max-parke-scale wants to merge 3 commits into
Add agentex.lib.utils.build_provenance, the single producer of source identity for agent builds (git coordinates plus a deterministic content hash of the build context). prepare_cloud_build_context now writes build-info.json into the staged context (populating runtime registration_metadata with no server change) and exposes provenance on CloudBuildContext so the upload can send source_* fields. Archive member order is now deterministic via a sorted enumeration shared with the hash. The hash is computed only when there is no clean commit to identify the build (dirty tree or non-git context). First of three surfaces for AGX1-418 (Phase 1, client-attested); the SGP build-record columns and the sgpctl/Gitea uploaders follow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
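The commit describes archive members being enumerated in sorted order, with the same enumeration feeding the content hash. A minimal sketch of that idea follows, assuming a plain `os.walk` over the staged context; `sorted_build_inputs` and `build_archive` are hypothetical names, not the functions in `agentex.lib.utils.build_provenance`. Besides sorting, it zeroes per-machine tar metadata (mtime, uid/gid, owner names), which the commit does not mention but which byte-identical archives also need.

```python
import io
import os
import tarfile
from pathlib import Path

# Hypothetical sketch: not the agentex implementation.


def sorted_build_inputs(root: Path) -> list[str]:
    """Return POSIX-style relative paths of every file and symlink under `root`, sorted."""
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk reports symlinked directories in `dirnames` without following them,
        # so add them explicitly as single (symlink) members.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + links:
            found.append(Path(dirpath, name).relative_to(root).as_posix())
    return sorted(found)


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Strip per-machine metadata so identical inputs produce identical bytes.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def build_archive(root: Path) -> bytes:
    """Write an uncompressed tar whose member order follows sorted_build_inputs()."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for rel in sorted_build_inputs(root):
            tar.add(root / rel, arcname=rel, recursive=False, filter=_normalize)
    return buf.getvalue()
```

The archive is left uncompressed on purpose: `tarfile`'s `w:gz` mode writes the current time into the gzip header, so a compressed variant would also need a fixed gzip `mtime`.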
Address Greptile review on the build-provenance capture util:
- Always compute working_tree_hash (drop the "skip on clean commit" path). A tree that `git status` reports as clean can still contain files that are .gitignore'd but not .dockerignore'd, which the commit can't reproduce; an always-present content hash identifies the exact shipped bytes and closes that gap.
- Guard the hash (_safe_working_tree_hash) so a permission error or filesystem race degrades to None instead of aborting the build; the module contract is that capture never raises into a build.
- Record dirtiness as a first-class `dirty` flag (surfaced as `source_dirty` / `dirty`) rather than overloading hash presence, matching Go's vcs.modified and Nix's dirtyRev. The flag is None outside a git work tree.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addressed both Greptile findings in cf9994d:
Also, per design discussion: dirtiness is now a first-class `dirty` flag.
🧑💻🤖 posted via Claude Code
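A `dirty` flag that is `None` outside a git work tree could be captured roughly as follows. This sketch assumes the `git` CLI is on `PATH` and shells out to `git rev-parse --is-inside-work-tree` and `git status --porcelain`; `capture_dirty` is a hypothetical name. It treats untracked files as dirty and checks the whole repository rather than a monorepo subpath; the source does not say how the real util decides either point.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical sketch: not the agentex implementation.


def capture_dirty(path: Path) -> Optional[bool]:
    """True/False inside a git work tree; None outside one or if git is unavailable."""
    try:
        inside = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path, capture_output=True, text=True, check=False,
        )
        if inside.returncode != 0 or inside.stdout.strip() != "true":
            return None  # not a git work tree: dirtiness is undefined
        status = subprocess.run(
            ["git", "status", "--porcelain", "--untracked-files=normal"],
            cwd=path, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # best-effort: never raise into a build
    # Any porcelain line (modified, staged, or untracked) marks the tree dirty.
    return bool(status.stdout.strip())
```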
    context_root=build_context_root,
    content_root=staged_root,
)
(staged_root / _BUILD_INFO_FILENAME).write_text(json.dumps(provenance.build_info(), indent=2, sort_keys=True))
`build-info.json` is added to the archive root, but the generated Dockerfiles only copy the project subdirectory contents into the image, such as `COPY {{ project_path_from_build_root }}/project /app/{{ project_path_from_build_root }}/project`. `FastACP.locate_build_info_path()` then looks next to the importing `project/acp.py`. For the default, temporal, and sync templates, the file can be present in the tarball but absent from `/app/<agent>/project/build-info.json`, so runtime registration sends no provenance metadata. Please stage the file under the path that is copied and read at runtime, or update the Dockerfiles/runtime lookup to handle the root-level file.
Artifacts: a focused build-context mismatch repro script, and its output showing `build-info.json` only at the archive root with the project `build-info.json` path missing. Ran code and verified through T-Rex.
Path: src/agentex/lib/cli/handlers/agent_handlers.py
Line: 279
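The mismatch the review describes can be checked mechanically: given the build-context tarball and the template's `project_path_from_build_root`, look for `build-info.json` under the `<project>/project/` subtree that the Dockerfile copies. `build_info_reaches_image` and `make_archive` are hypothetical helpers, and the check assumes the `COPY {{ project_path_from_build_root }}/project ...` layout quoted in the comment.

```python
import io
import tarfile

# Hypothetical helpers for illustration; not part of agentex.


def build_info_reaches_image(archive: bytes, project_path_from_build_root: str) -> bool:
    """Return True if build-info.json is inside the subtree the Dockerfile COPYs."""
    expected = f"{project_path_from_build_root}/project/build-info.json"
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        # Tolerate archives whose member names carry a leading "./".
        names = {name.removeprefix("./") for name in tar.getnames()}
    return expected in names


def make_archive(members: dict[str, bytes]) -> bytes:
    """Build a small in-memory tar from {member_name: content} for experimenting."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

A root-level `build-info.json` fails this check, which matches the T-Rex repro's finding.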
Good catch, fixed in 923a110 by removing the build-info.json write entirely. As you found, it was dead on arrival (written to the archive root, which is not `COPY`'d into the image and not read by locate_build_info_path()), and it's redundant anyway: AgentexCloudDeploy.build_id is an FK to AgentexCloudBuild, so a deployment's source provenance can be derived from the build record (the source_* columns this work adds) over that FK, with no need to denormalize it onto registration_metadata. The runtime sink can be revived (correctly placed) if a real consumer for deployment-history provenance shows up.
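On the upload side, the `source_*` fields could be produced from a single provenance record, as in this sketch. The field set follows the PR description (repo/commit/ref/subpath, `working_tree_hash`, `dirty`); the `BuildProvenance` class, the `source_fields()` method, and every `source_*` key other than `source_dirty` are naming assumptions, not the actual schema of `POST /v5/builds`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional

# Hypothetical record shape; not the agentex or SGP schema.


@dataclass(frozen=True)
class BuildProvenance:
    repo: Optional[str]               # normalized remote, None without a remote
    commit: Optional[str]             # HEAD commit, None outside git
    ref: Optional[str]                # branch/tag, None on detached HEAD or outside git
    subpath: Optional[str]            # build context relative to the repo root
    working_tree_hash: Optional[str]  # None only if hashing failed
    dirty: Optional[bool]             # None outside a git work tree

    def source_fields(self) -> dict[str, Any]:
        # In this sketch None is kept, so "not captured" stays distinct from False.
        return {f"source_{key}": value for key, value in asdict(self).items()}
```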
Greptile (T-Rex repro) showed that build-info.json was written to the archive root, which the templates' Dockerfiles don't COPY and the runtime locate_build_info_path() doesn't read, so it never reached the image and the registration_metadata sink stayed empty. Beyond the placement bug, the sink is redundant: AgentexCloudDeploy.build_id is an FK to AgentexCloudBuild, so a deployment's source provenance derives from the build record (the source_* columns this work adds, Surface C) over that join, the same Build->Deploy edge lineage already traverses. There is no need to denormalize provenance onto registration_metadata/DeploymentHistory, which has had no producer since its read path landed in 2025-09, so its git fields have never been populated. #454 now ships only the shared capture util (agentex.lib.utils.build_provenance) plus a deterministic build-archive ordering. Provenance is delivered via the build-record sink; the runtime sink can be revived (correctly placed) if a real consumer for deployment-history provenance ever appears.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
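The FK argument can be illustrated with a toy schema in which provenance lives only on the build row and a deployment recovers it through `build_id`. The sketch uses an in-memory SQLite database; the table and column names (`cloud_build`, `cloud_deploy`, `source_*`) are hypothetical stand-ins for `AgentexCloudBuild`, `AgentexCloudDeploy`, and the Surface C columns.

```python
import sqlite3
from typing import Any, Optional

# Toy schema for illustration; not the real SGP tables.
SCHEMA = """
CREATE TABLE cloud_build (
    id INTEGER PRIMARY KEY,
    source_repo TEXT,
    source_commit TEXT,
    source_working_tree_hash TEXT,
    source_dirty INTEGER              -- SQLite has no boolean type: 0/1/NULL
);
CREATE TABLE cloud_deploy (
    id INTEGER PRIMARY KEY,
    build_id INTEGER NOT NULL REFERENCES cloud_build(id)
);
"""


def deploy_provenance(conn: sqlite3.Connection, deploy_id: int) -> Optional[dict[str, Any]]:
    """Derive a deployment's source provenance by joining through its build_id FK."""
    row = conn.execute(
        """
        SELECT b.source_repo, b.source_commit, b.source_working_tree_hash, b.source_dirty
        FROM cloud_deploy AS d
        JOIN cloud_build AS b ON b.id = d.build_id
        WHERE d.id = ?
        """,
        (deploy_id,),
    ).fetchone()
    if row is None:
        return None
    repo, commit, tree_hash, dirty = row
    return {
        "source_repo": repo,
        "source_commit": commit,
        "source_working_tree_hash": tree_hash,
        "source_dirty": None if dirty is None else bool(dirty),
    }
```

Because the deploy row stores only `build_id`, there is a single copy of the provenance to keep correct, which is the point of not denormalizing it onto `registration_metadata`.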
Adds `agentex.lib.utils.build_provenance`, the shared capture util for client-attested build provenance: git coordinates (repo/commit/ref/subpath), a deterministic `working_tree_hash` over the build inputs (not the tarball), a `dirty` flag (Go `vcs.modified` / Nix `dirtyRev` shape), and `normalize_remote`. Capture is best-effort and never raises into a build. Also makes the build archive's member order deterministic via a sorted enumeration shared with the hash.
First of three surfaces for AGX1-418 (Phase 1, client-attested). Provenance is delivered via the build-record sink: `source_*` columns on `POST /v5/builds` (Surface C, scaleapi), consumed by the sgpctl + CI uploaders (Surface B, scaleapi/sgp). This PR lands the util + archive determinism where `agentex.lib` lives; the uploaders/columns follow.
Scope notes
`build-info.json` / runtime sink. An earlier revision wrote `build-info.json` into the build context for the `register_agent()` → `registration_metadata` path. Greptile (T-Rex) correctly flagged it as dead on arrival: it was written to the archive root, which the templates' Dockerfiles don't `COPY` and `locate_build_info_path()` doesn't read. It's also redundant: `AgentexCloudDeploy.build_id` is an FK to `AgentexCloudBuild`, so a deployment's source provenance derives from the build record over that join, the same Build→Deploy edge lineage already traverses. Dropped; it can be revived (correctly placed) if a real consumer for deployment-history provenance ever appears.
Identity model
`working_tree_hash` is always computed (content identity); `commit`/`ref`/`repo` anchor it to source when in a git work tree; `dirty` records uncommitted changes (`None` outside git).
Tests
20 provenance unit tests (clean/dirty/untracked/detached-HEAD/no-remote/non-git/monorepo-subpath, hash determinism + one-byte/added/exec-bit/symlink sensitivity, and a never-raises-on-hash-failure guard).
`ruff`/`pyright` clean; full `lib` suite green.
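The description lists `normalize_remote` without specifying what it normalizes. One plausible behavior, sketched here purely as an assumption, maps the common remote spellings (scp-like `git@host:org/repo.git`, `ssh://`, and `https://` with embedded credentials) to a single `host/org/repo` key and drops any user, token, or port so that secrets never land in a provenance record. Local paths and `file://` remotes return `None` in this sketch.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed behavior for illustration; the real normalize_remote may differ.
_SCP_LIKE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/]+):(?P<path>[^/].*)$")


def normalize_remote(url: Optional[str]) -> Optional[str]:
    """Canonicalize a git remote URL to 'host/owner/repo', or None if unrecognized."""
    if not url:
        return None
    url = url.strip()
    if "://" in url:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased; excludes user:password@ and :port
        path = parts.path
    else:
        match = _SCP_LIKE.match(url)  # e.g. git@github.com:org/repo.git
        if not match:
            return None
        host, path = match.group("host"), match.group("path")
    if not host:
        return None  # e.g. file:///srv/repo.git has no host
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path}" if path else None
```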
Greptile Summary
This PR adds client-side build provenance for agent build contexts. The main changes are:
Confidence Score: 4/5
The main provenance utility work is well-scoped, but the cloud build packaging path currently omits the generated build metadata needed for the feature to function.
The review focused on the changed provenance and packaging flow, and runtime evidence confirmed that the returned build context archive can be produced without build-info.json or exposed provenance metadata.
src/agentex/lib/cli/handlers/agent_handlers.py
Reviews (3): Last reviewed commit: "refactor(lib): drop the build-info.json ..."