fix(clone): clone agentic project documents + resolve exported agentic tools #23

Merged
chandrasekharan-zipstack merged 4 commits into main from fix/clone-agentic-docs-and-tool
Jun 23, 2026

@chandrasekharan-zipstack chandrasekharan-zipstack commented Jun 23, 2026

Contributor

What

Completes org-to-org clone for cloud Agentic ("Agentic Prompt Studio") projects, which previously cloned only the project shell:

  1. Documents — each project's uploaded docs are cloned (download raw bytes from source → skip names already on target → upload to the target project). Honours file_strategy="skip" like the files/lookups phases.
  2. Verified data — the curated ground-truth rows are cloned, re-pointed to the cloned document by filename. Extracted/comparison data is regenerable and intentionally not cloned.
  3. Exported agentic tools — a workflow tool_instance.tool_id now resolves against agentic_studio_registry in addition to prompt_studio_registry.
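Item 2 above (verified-data rows re-pointed to the cloned document by filename) can be sketched roughly as follows. This is a minimal illustration, not the code in agentic_studio.py: the function name `repoint_verified_rows`, the row fields `document` and `data`, and the lookup dictionaries are all hypothetical.

```python
from typing import Any, Optional


def repoint_verified_rows(
    source_rows: list[dict[str, Any]],
    source_doc_names: dict[str, str],   # hypothetical: source document id -> filename
    target_doc_ids: dict[str, str],     # hypothetical: filename -> target document id
    already_verified: set[str],         # target document ids that already have a row
) -> tuple[list[dict[str, Any]], int]:
    """Return (payloads to create on target, number of rows skipped)."""
    to_create: list[dict[str, Any]] = []
    skipped = 0
    for row in source_rows:
        filename: Optional[str] = source_doc_names.get(row["document"])
        target_doc = target_doc_ids.get(filename) if filename else None
        # Skip when the document never landed on target, or the row already exists.
        if target_doc is None or target_doc in already_verified:
            skipped += 1
            continue
        to_create.append({"document": target_doc, "data": row["data"]})
    return to_create, skipped
```

Matching on filename rather than on id is what lets the row survive the clone, since document ids differ between source and target orgs.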

Why

  • Agentic projects keep uploads in a separate store (agentic/documents/), distinct from Prompt Studio prompt-documents. The files phase only iterates the custom_tool remap, so agentic docs were silently dropped — a clone landed the project with zero documents.
  • agentic_verified_data is "ground truth manually verified by user": curated input (the accuracy baseline), not regenerable output. Without it, the target org can measure extraction accuracy only after a human re-verifies every doc.
  • A workflow tool_instance.tool_id is a registry id. Exported agentic projects register under agentic_studio_registry, but ToolInstancePhase resolved only via prompt_studio_registry, so an agentic-tool instance found no remap and was skipped (no registry remap for tool_id …). The dependent workflow (e.g. "Agentic tool API") then landed with no tool wired.

All three reproduced on a staging org→org run.
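The tool_id resolution gap described in the third bullet can be illustrated with a small sketch. It assumes each registry remap is a plain dictionary from source registry id to target registry id; the function name `resolve_tool_id` and both parameter names are hypothetical and not taken from tool_instance.py.

```python
from typing import Optional


def resolve_tool_id(
    source_tool_id: str,
    prompt_registry_remap: dict[str, str],   # assumed shape: source id -> target id
    agentic_registry_remap: dict[str, str],  # assumed shape: source id -> target id
) -> Optional[str]:
    """Resolve a workflow tool_id, trying the Prompt Studio registry first."""
    target = prompt_registry_remap.get(source_tool_id)
    if target is None:
        # Fallback only on a miss, so existing Prompt Studio lookups are unchanged.
        target = agentic_registry_remap.get(source_tool_id)
    return target  # None means no remap exists and the instance would be skipped
```

Before the fix, the equivalent of the fallback branch did not exist, so an id present only in the agentic remap resolved to nothing.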

How

  • client.py: list_agentic_documents, download_agentic_document (raw binary, like download_lookup_file), upload_agentic_document (multipart to agentic/projects/{id}/documents/upload/ — the real upload route; the documents viewset upload action is a backend stub); list_agentic_verified_data, create_agentic_verified_data.
  • agentic_studio.py: _clone_documents / _clone_one_document and _clone_verified_data run after schemas, before registry republish; idempotent by filename; honour max_file_size and file_strategy; dry-run plan counts both.
  • tool_instance.py: tool_id resolves via prompt_studio_registry or agentic_studio_registry; corrected the misleading "custom tool unpublished" skip message.

Like Prompt Studio uploads, this clones the file + creates the document row; extraction/summary stays a UI step.
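The document path described above (idempotent by filename, honouring max_file_size and file_strategy) might look roughly like the sketch below. It is an assumption-laden illustration: `clone_documents`, `DocCloneResult`, the `id`/`name` fields, the `"copy"` default strategy value, and the injected `download`/`upload` callables are hypothetical stand-ins for the real client methods.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class DocCloneResult:
    uploaded: int = 0
    skipped: int = 0


def clone_documents(
    source_docs: Iterable[dict],
    target_names: set[str],
    download: Callable[[str], bytes],       # stand-in for the raw-bytes download
    upload: Callable[[str, bytes], None],   # stand-in for the multipart upload
    file_strategy: str = "copy",            # "copy" is a hypothetical non-skip value
    max_file_size: Optional[int] = None,
) -> DocCloneResult:
    result = DocCloneResult()
    seen = set(target_names)
    for doc in source_docs:
        name = doc["name"]
        # Under "skip" nothing is transferred; existing names keep re-runs idempotent.
        if file_strategy == "skip" or name in seen:
            result.skipped += 1
            continue
        data = download(doc["id"])
        if max_file_size is not None and len(data) > max_file_size:
            result.skipped += 1
            continue
        upload(name, data)
        seen.add(name)
        result.uploaded += 1
    return result
```

Passing the transfer functions in as callables keeps the sketch testable without a network; the real phase calls the client directly.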

Can this PR break any existing features?

No. New client methods are additive. The doc/verified-data paths are gated to the cloud-only AgenticStudioPhase (probe-skipped on OSS) and add work that previously didn't happen at all. The tool_instance change only adds a fallback resolve when the primary lookup misses, so existing Prompt Studio tool instances are unaffected.

Database Migrations

  • None.

Env Config

  • None.

Relevant Docs

  • N/A

Related Issues or PRs

  • Extends the agentic-studio clone support in AgenticStudioPhase.

Dependencies Versions

  • None.

Notes on Testing

  • tests/clone/ green (192 passing). Added: agentic doc-clone (skip-existing, dry-run count, file_strategy=skip), verified-data clone (filename mapping, skip-existing, skip-when-doc-missing, dry-run count), and tool_instance resolve-via-agentic-registry.
  • Pre-existing ruff format/E501 drift in untouched files left alone; changed lines are lint + format clean.

Checklist

I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

…c tools

Two gaps surfaced cloning agentic ("Agentic Prompt Studio") projects:

1. Documents were dropped. Agentic projects keep their uploads in their own
   store (agentic/documents/), separate from Prompt Studio prompt-documents.
   The files phase only iterates the custom_tool remap, so agentic docs were
   never cloned. AgenticStudioPhase now clones them per project (download raw
   bytes from source, skip names already on target, upload to the target
   project), and counts them in the dry-run plan.

2. Exported agentic tools were skipped in workflows. A workflow tool_instance
   references a registry id; exported agentic projects register under
   agentic_studio_registry, but ToolInstancePhase resolved tool_id only via
   prompt_studio_registry. It now falls back to the agentic registry, so the
   "Agentic tool API" workflow lands with its tool wired.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS

greptile-apps Bot commented Jun 23, 2026

Greptile Summary

This PR completes the org-to-org clone for Agentic Prompt Studio projects by adding document upload/download, ground-truth verified-data cloning, and a fallback agentic_studio_registry resolver for workflow tool instances. The three previously flagged issues have all been resolved.

  • Document cloning: _clone_documents / _clone_one_document download raw bytes from source and upload to target, and correctly short-circuit on file_strategy="skip" and max_file_size.
  • Verified-data cloning: _clone_verified_data re-points ground-truth rows to the cloned target document via filename, skips rows whose document is missing, and returns early under file_strategy="skip".
  • Tool-instance resolution: ToolInstancePhase now falls back to agentic_studio_registry when the prompt_studio_registry lookup misses.

Confidence Score: 5/5

Safe to merge

All three previously flagged issues are resolved, new methods are additive, and the agentic-registry fallback is inside the existing lock.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/unstract/clone/client.py | Adds five new client methods following the established request/error-handling pattern. |
| src/unstract/clone/phases/agentic_studio.py | Adds document and verified-data clone methods with correct file_strategy/dry-run guards. |
| src/unstract/clone/phases/tool_instance.py | Adds agentic_studio_registry fallback inside the existing lock; thread-safe and correct. |
| tests/clone/test_agentic_studio_phase.py | Eight new test cases covering all critical edge cases. |
| tests/clone/test_tool_instance_phase.py | Adds test verifying the agentic-registry fallback resolve path. |

Reviews (4): Last reviewed commit: "fix(clone): honour skip strategy in _clo..."

Comment thread src/unstract/clone/phases/agentic_studio.py
Agentic verified-data ("ground truth manually verified by user") is curated
input, not regenerable output, so it must be cloned. AgenticStudioPhase now
re-points each source verified-data row to the cloned target document by
filename and recreates it (skipping docs absent on target and rows already
present). Extracted/comparison data stays uncloned — both regenerate on a
re-run + re-verify.

Also honour file_strategy="skip" in the new document path, matching the files
and lookups phases: under skip, agentic documents are listed and counted as
skipped (operator re-uploads), not transferred.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS
Comment thread src/unstract/clone/phases/agentic_studio.py Outdated
…tegy

Verified data FKs a document; with file_strategy=skip no docs land on
target, so a dry-run must predict the rows as skipped rather than as
creates that the real run silently drops.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS
Comment thread src/unstract/clone/phases/agentic_studio.py
_plan_children already forecasts verified-data as skipped under
file_strategy=skip, but the runtime path lacked the matching guard: on a
re-run where documents reached the target by other means, it would create
verified rows the plan said it would skip. Add the early-return guard,
mirroring _clone_documents.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS
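
The plan/runtime mismatch addressed by the last two commits comes down to both paths sharing the same file_strategy guard. A minimal sketch, assuming the plan and the run each report simple create/skip counts; the function names and the count dictionary are hypothetical and do not mirror _plan_children or _clone_verified_data exactly.

```python
from typing import Callable


def _skip_verified_data(file_strategy: str) -> bool:
    # Verified rows reference a document; under "skip" no documents are transferred.
    return file_strategy == "skip"


def plan_verified_data(rows: list[dict], file_strategy: str) -> dict[str, int]:
    """Dry-run forecast: count rows without touching the target."""
    if _skip_verified_data(file_strategy):
        return {"create": 0, "skip": len(rows)}
    return {"create": len(rows), "skip": 0}


def clone_verified_data(
    rows: list[dict],
    file_strategy: str,
    create: Callable[[dict], None],  # stand-in for the create call on the client
) -> dict[str, int]:
    """Real run: the early return mirrors the plan, so the counts agree."""
    if _skip_verified_data(file_strategy):
        return {"create": 0, "skip": len(rows)}
    for row in rows:
        create(row)
    return {"create": len(rows), "skip": 0}
```

Routing both paths through one predicate is a design choice for the sketch; it makes it hard for the forecast and the run to drift apart again.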
@chandrasekharan-zipstack chandrasekharan-zipstack merged commit d769942 into main Jun 23, 2026
3 checks passed
@chandrasekharan-zipstack chandrasekharan-zipstack deleted the fix/clone-agentic-docs-and-tool branch June 23, 2026 12:22