feat(core): parse Promptfoo llm-rubric results #1664

Merged: christso merged 1 commit into main from grading-llm-rubric-parsing on Jul 5, 2026

christso (Collaborator) commented Jul 5, 2026

Summary

  • Accept Promptfoo-style llm-rubric judge responses shaped as { reason, pass, score } and normalize them into AgentV grader scores.
  • Map optional Promptfoo-style checks[] into current internal assertion entries while preserving normalized pass/score/reason in details for the downstream artifact contract work.
  • Let llm-rubric custom prompts receive the authored value through {{ rubric }} and use a Promptfoo-compatible output schema for free-form llm-rubric judging.
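
The parsing and templating behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: every type and function name (`RubricJudgeResponse`, `NormalizedGrade`, `renderRubricPrompt`, `normalizeRubricResponse`) is hypothetical, and the fallback rules for a missing `pass` or `score` are assumptions rather than confirmed AgentV behavior.

```typescript
// All names below are hypothetical and for illustration only;
// they are not taken from the AgentV or Promptfoo source.
interface RubricCheck {
  text: string;
  pass: boolean;
  reason?: string;
}

interface RubricJudgeResponse {
  reason?: string;
  pass?: boolean;
  score?: number;
  checks?: RubricCheck[];
}

interface NormalizedGrade {
  score: number; // always within [0, 1]
  pass: boolean;
  assertions: { text: string; passed: boolean; evidence: string }[];
  details: { pass: boolean; score: number; reason: string };
}

// Substitute the authored rubric into a custom prompt via {{ rubric }}.
// A replacer function is used so "$" sequences in the rubric stay literal.
function renderRubricPrompt(template: string, rubric: string): string {
  return template.replace(/\{\{\s*rubric\s*\}\}/g, () => rubric);
}

// Normalize a judge reply shaped as { reason, pass, score } plus optional checks[].
function normalizeRubricResponse(raw: string): NormalizedGrade {
  const parsed = JSON.parse(raw) as RubricJudgeResponse;
  const checks = Array.isArray(parsed.checks) ? parsed.checks : [];
  const passedChecks = checks.filter((c) => c.pass).length;

  // Assumption: without a top-level pass, require every check to pass.
  const pass =
    typeof parsed.pass === "boolean"
      ? parsed.pass
      : checks.length > 0 && passedChecks === checks.length;

  // Assumption: without a numeric score, fall back to the fraction of
  // passing checks, or to 1/0 from the pass flag when there are no checks.
  let score: number;
  if (typeof parsed.score === "number" && Number.isFinite(parsed.score)) {
    score = Math.min(1, Math.max(0, parsed.score));
  } else if (checks.length > 0) {
    score = passedChecks / checks.length;
  } else {
    score = pass ? 1 : 0;
  }

  return {
    score,
    pass,
    // checks[] become per-assertion entries.
    assertions: checks.map((c) => ({
      text: c.text,
      passed: c.pass,
      evidence: c.reason ?? "",
    })),
    // The normalized triple is preserved for downstream consumers.
    details: { pass, score, reason: parsed.reason ?? "" },
  };
}
```

For example, `normalizeRubricResponse('{"reason":"ok","pass":true,"score":0.9}')` yields a passing grade with score 0.9 and no assertion entries, while a reply containing only `checks` derives its aggregate from the checks under the assumed fallback.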

Validation

  • bun test packages/core/test/evaluation/graders.test.ts
  • bunx biome check packages/core/src/evaluation/graders/llm-grader.ts packages/core/src/evaluation/registry/builtin-graders.ts packages/core/test/evaluation/graders.test.ts
  • bun --filter @agentv/core typecheck
  • bun --filter @agentv/core build
  • bun run build (passed before the final prompt-contract adjustment; final core build passed after)

Live Dogfood

  • Command: LOCAL_OPENAI_PROXY_BASE_URL=http://127.0.0.1:10531/v1 LOCAL_OPENAI_PROXY_API_KEY=dummy-local-key LOCAL_OPENAI_PROXY_MODEL=gpt-5.4-mini bun apps/cli/src/cli.ts eval run .agentv/results/av-kfik-28-3-dogfood/eval.yaml --targets .agentv/results/av-kfik-28-3-dogfood/targets.yaml --target local-openai --grader-target local-openai --workers 1 --threshold 0.8 --output .agentv/results/av-kfik-28-3-dogfood-final
  • Result: PASS, 1/1 scored >= 80%, mean 100%.
  • Private evidence branch: EntityProcess/agentv-private:evidence/av-kfik-28-3-llm-rubric-parsing at commit f961c63.
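
To make the Result line easier to read against `--threshold 0.8`, here is a small sketch of how such a summary could be computed from per-test scores. `summarizeRun` is a hypothetical helper, not the CLI's reporting code, and it assumes the summary counts tests whose score meets the threshold and reports the arithmetic mean.

```typescript
// Hypothetical helper for illustration; not the AgentV CLI's actual reporting code.
function summarizeRun(scores: number[], threshold: number): string {
  if (scores.length === 0) return "no scored tests";

  // Count the tests whose score meets or exceeds the threshold.
  const meeting = scores.filter((s) => s >= threshold).length;

  // Arithmetic mean of all scores.
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;

  // Render a fraction in [0, 1] as a whole-number percentage.
  const pct = (x: number): string => `${Math.round(x * 100)}%`;

  return `${meeting}/${scores.length} scored >= ${pct(threshold)}, mean ${pct(mean)}`;
}

console.log(summarizeRun([1], 0.8)); // prints "1/1 scored >= 80%, mean 100%"
```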

Post-Deploy Monitoring & Validation

  • No additional production monitoring required. This changes local/CI grader parsing behavior and is covered by focused parser tests plus live grader dogfood evidence.

Notes

  • This intentionally does not rewrite public run artifacts from assertion_results to the final graders/checks shape; that remains in av-kfik.28.6.
  • Code review: the dedicated ce-code-review dispatch was unavailable in this Codex session, so a manual diff scan was performed and a checks-only aggregate edge case was fixed before final validation.

cloudflare-workers-and-pages Bot commented Jul 5, 2026

Deploying agentv with Cloudflare Pages

Latest commit: e1f24a0
Status: ✅  Deploy successful!
Preview URL: https://9839a143.agentv.pages.dev
Branch Preview URL: https://grading-llm-rubric-parsing.agentv.pages.dev

christso force-pushed the grading-llm-rubric-parsing branch from bf768c6 to e1f24a0 on July 5, 2026 04:18
christso marked this pull request as ready for review on July 5, 2026 04:30
christso merged commit 75ca32a into main on Jul 5, 2026
8 checks passed
christso deleted the grading-llm-rubric-parsing branch on July 5, 2026 04:36