feat(core): parse Promptfoo llm-rubric results by christso · Pull Request #1664 · EntityProcess/agentv

christso · 2026-07-05T04:15:35Z

Summary

Accept Promptfoo-style llm-rubric judge responses shaped as { reason, pass, score } and normalize them into AgentV grader scores.
Map optional Promptfoo-style checks[] into current internal assertion entries while preserving normalized pass/score/reason in details for the downstream artifact contract work.
Let llm-rubric custom prompts receive the authored value through {{ rubric }} and use a Promptfoo-compatible output schema for free-form llm-rubric judging.
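
A minimal sketch of the normalization described above, not the code from this PR. The type names (`RubricJudgeResponse`, `AssertionEntry`, `GraderScore`), the field names inside `checks[]`, the default threshold, and the checks-only fallback are all assumptions for illustration; only the `{ reason, pass, score }` shape, the optional `checks[]`, and the `{{ rubric }}` placeholder come from the summary.

```typescript
// Hypothetical shapes for illustration; the real AgentV types may differ.
interface RubricCheck {
  text: string;
  pass: boolean;
  reason?: string;
}

interface RubricJudgeResponse {
  reason?: string;
  pass?: boolean;
  score?: number;
  checks?: RubricCheck[];
}

interface AssertionEntry {
  text: string;
  passed: boolean;
  evidence: string;
}

interface GraderScore {
  score: number;
  assertions: AssertionEntry[];
  details: { pass: boolean; score: number; reason: string };
}

const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

// Parse a judge reply and normalize it into a single grader score.
// The 0.5 default threshold is an assumption, not an AgentV constant.
function normalizeRubricResponse(raw: string, threshold = 0.5): GraderScore {
  const parsed = JSON.parse(raw) as RubricJudgeResponse;
  const checks = Array.isArray(parsed.checks) ? parsed.checks : [];

  // Each optional check becomes one assertion entry.
  const assertions: AssertionEntry[] = checks.map((c) => ({
    text: c.text,
    passed: c.pass === true,
    evidence: c.reason ?? "",
  }));

  let score: number;
  if (typeof parsed.score === "number" && Number.isFinite(parsed.score)) {
    score = clamp01(parsed.score);
  } else if (typeof parsed.pass === "boolean") {
    score = parsed.pass ? 1 : 0;
  } else if (assertions.length > 0) {
    // Assumed checks-only fallback: fraction of passing checks.
    score = assertions.filter((a) => a.passed).length / assertions.length;
  } else {
    throw new Error("judge response has no score, pass, or checks");
  }

  const pass = typeof parsed.pass === "boolean" ? parsed.pass : score >= threshold;
  return { score, assertions, details: { pass, score, reason: parsed.reason ?? "" } };
}

// Substitute the authored rubric into a custom prompt template.
// A replacer function is used so "$" sequences in the rubric are inserted literally.
function renderRubricPrompt(template: string, rubric: string): string {
  return template.replace(/\{\{\s*rubric\s*\}\}/g, () => rubric);
}
```

In this sketch an explicit `score` takes precedence over `pass`, and `checks[]` only drives the aggregate when both are absent; the actual precedence in the grader may be different.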

Validation

bun test packages/core/test/evaluation/graders.test.ts
bunx biome check packages/core/src/evaluation/graders/llm-grader.ts packages/core/src/evaluation/registry/builtin-graders.ts packages/core/test/evaluation/graders.test.ts
bun --filter @agentv/core typecheck
bun --filter @agentv/core build
bun run build (passed before the final prompt-contract adjustment; final core build passed after)

Live Dogfood

Command: LOCAL_OPENAI_PROXY_BASE_URL=http://127.0.0.1:10531/v1 LOCAL_OPENAI_PROXY_API_KEY=dummy-local-key LOCAL_OPENAI_PROXY_MODEL=gpt-5.4-mini bun apps/cli/src/cli.ts eval run .agentv/results/av-kfik-28-3-dogfood/eval.yaml --targets .agentv/results/av-kfik-28-3-dogfood/targets.yaml --target local-openai --grader-target local-openai --workers 1 --threshold 0.8 --output .agentv/results/av-kfik-28-3-dogfood-final
Result: PASS, 1/1 scored >= 80%, mean 100%.
Private evidence branch: EntityProcess/agentv-private:evidence/av-kfik-28-3-llm-rubric-parsing at commit f961c63.
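
For readers unfamiliar with the result line, the sketch below shows one way a summary such as "1/1 scored >= 80%, mean 100%" can be derived from per-case scores and `--threshold 0.8`. The `summarize` and `formatSummary` helpers are hypothetical, and the rule that a run passes only when every case meets the threshold is an assumption, not confirmed AgentV behavior.

```typescript
// Hypothetical summary shape; not an AgentV type.
interface RunSummary {
  passed: number;
  total: number;
  mean: number;
  ok: boolean;
}

// Count cases at or above the threshold and compute the mean score.
function summarize(scores: number[], threshold: number): RunSummary {
  const total = scores.length;
  const passed = scores.filter((s) => s >= threshold).length;
  const mean = total === 0 ? 0 : scores.reduce((a, b) => a + b, 0) / total;
  // Assumption: the run passes only if every case meets the threshold.
  return { passed, total, mean, ok: total > 0 && passed === total };
}

// Render the summary in the same style as the result line above.
function formatSummary(s: RunSummary, threshold: number): string {
  const pct = (x: number): string => `${Math.round(x * 100)}%`;
  const verdict = s.ok ? "PASS" : "FAIL";
  return `${verdict}, ${s.passed}/${s.total} scored >= ${pct(threshold)}, mean ${pct(s.mean)}.`;
}
```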

Post-Deploy Monitoring & Validation

No additional production monitoring required. This changes local/CI grader parsing behavior and is covered by focused parser tests plus live grader dogfood evidence.

Notes

This intentionally does not rewrite public run artifacts from assertion_results to the final graders/checks shape; that remains in av-kfik.28.6.
Code review: the dedicated ce-code-review dispatch was unavailable in this Codex session, so a manual diff scan was performed instead; it found a checks-only aggregate edge case, which was fixed before final validation.

cloudflare-workers-and-pages · 2026-07-05T04:15:57Z

Deploying agentv with Cloudflare Pages

Latest commit:	`e1f24a0`
Status:	✅ Deploy successful!
Preview URL:	https://9839a143.agentv.pages.dev
Branch Preview URL:	https://grading-llm-rubric-parsing.agentv.pages.dev

feat(core): parse promptfoo llm-rubric results

e1f24a0

christso force-pushed the grading-llm-rubric-parsing branch from bf768c6 to e1f24a0 Compare July 5, 2026 04:18

christso marked this pull request as ready for review July 5, 2026 04:30

christso merged commit 75ca32a into main Jul 5, 2026
8 checks passed

christso deleted the grading-llm-rubric-parsing branch July 5, 2026 04:36

christso merged 1 commit into main from grading-llm-rubric-parsing
