fix(run-store): route caller-passed read clients to the owning store's primary #4153
…primary RoutingRunStore accepted `client` on every routed read but dropped it, so read-your-writes reads (execution snapshots, waitpoints, task run attempts, batches) silently fell back to the read replica. Under replica lag the dequeue re-read of a just-written snapshot could return stale or missing data and fail the run. A caller-passed client is never forwarded verbatim (it is bound to the control-plane database, the wrong one for a NEW-resident run); its presence now routes the read to the owning sub-store's own primary via the new `primaryReadClient` handle. Reads without a client keep using the replica.
No actionable comments were generated in the recent review. 🎉
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | | The description covers bug, fix, and testing, but it misses required template sections like Closes #, checklist items, changelog, and screenshots. | Add the issue-closing line, complete the checklist, and include Changelog and Screenshots sections to match the repository template. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly states the main routing fix and matches the changed run-store read path behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
fix/run-store-routing-primary-read
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts (1)
178-183: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win
Raw SQL against Prisma's implicit join table is fragile.
Direct `$executeRaw` insert into `_completedWaitpoints` relies on Prisma's undocumented column-ordering convention (`A`/`B` assigned alphabetically by model name). If the underlying models are ever renamed or the relation becomes explicit, this insert silently breaks or inserts into the wrong columns without a compile-time signal.
Consider seeding this via the Prisma relation API (e.g., updating the snapshot's `completedWaitpoints` connect) if the schema exposes it, to avoid depending on internal join-table layout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2baa1606-d1cf-4cbe-beea-03ae9e9535d5
📒 Files selected for processing (5)
- .server-changes/fix-run-store-routing-read-your-writes.md
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/runOpsStore.ts
- internal-packages/run-store/src/types.ts
📜 Review details
⏰ Context from checks skipped due to timeout. (24)
- GitHub Check: internal / 🧪 Unit Tests: Internal (1, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (4, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (8, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (11, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (10, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (5, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (6, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (12, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
- GitHub Check: internal / 🧪 Unit Tests: Internal (3, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (9, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (2, 12)
- GitHub Check: internal / 🧪 Unit Tests: Internal (7, 12)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
- GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
- GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
- GitHub Check: typecheck / typecheck
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
Files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic `import()`; only use dynamic imports when resolving circular dependencies, enabling real code splitting, or conditionally loading a module at runtime.
Always import from `@trigger.dev/sdk`; never import from `@trigger.dev/sdk/v3` or use deprecated `client.defineJob`.
In code that imports `@trigger.dev/core`, use subpath imports only and never import from the package root.
Files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
**/*.{test,spec}.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use vitest for all tests in the Trigger.dev repository
Files:
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
**/*.test.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.test.{ts,tsx,js,jsx}: Place test files next to their source files (for example, `MyService.ts` -> `MyService.test.ts`).
Use Vitest exclusively for tests, and do not mock dependencies; use testcontainers instead.
Files:
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
🧠 Learnings (12)
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.
Applied to files:
.server-changes/fix-run-store-routing-read-your-writes.md
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.
Applied to files:
- internal-packages/run-store/src/PostgresRunStore.ts
- internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
- internal-packages/run-store/src/types.ts
- internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In the triggerdotdev/trigger.dev repo, the policy “Never mock anything — use testcontainers instead” should only be enforced for integration tests that interact with real external services (e.g., Redis, Postgres) via actual infrastructure. For unit tests that exercise pure in-memory logic (e.g., cache semantics) it is OK to stub collaborators such as `ApiClient` using Vitest (`vi.fn()`) to assert call counts or control behavior. Do not flag `vi.fn()`-based `ApiClient` stubs in unit tests as violations of the testcontainers policy.
Applied to files:
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
📚 Learning: 2026-06-16T09:19:47.637Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3960
File: apps/webapp/test/prismaInfrastructureErrorCapture.test.ts:0-0
Timestamp: 2026-06-16T09:19:47.637Z
Learning: In this repo’s Vitest setup, `vitest.config.ts` uses `globals: true`, so identifiers like `vi`, `describe`, `it`, and `expect` are available as globals in Vitest test files. During code review, do not flag missing `vi`/`describe`/`it`/`expect` imports as a runtime error or correctness issue when they’re used in `*.test.ts/tsx` or `*.spec.ts/tsx` files. Explicit imports are still preferred for consistency, but they’re not required for runtime behavior.
Applied to files:
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
🔇 Additional comments (9)
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts (1)
1-326: LGTM!
.server-changes/fix-run-store-routing-read-your-writes.md (1)
1-6: LGTM!
internal-packages/run-store/src/PostgresRunStore.ts (1)
510-516: LGTM!
internal-packages/run-store/src/runOpsStore.ts (6)
50-66: LGTM! Also applies to: 78-90, 150-170
214-264: LGTM! Also applies to: 266-375, 607-656
723-778: LGTM! Also applies to: 785-822, 831-901
941-990: LGTM! Also applies to: 992-1125
1198-1275: LGTM! Also applies to: 1296-1324
1372-1444: LGTM! Also applies to: 1464-1473, 1527-1538
… writer/tx
Review follow-up to the read-your-writes routing fix. The presence-only signal (`client != null`) escalated EVERY caller-passed read client to the owning store's primary, including an explicit read replica (e.g. `$replica`) passed by span/trace/session lookups and the run-ops read-through, which defeated replica scaling.
Writer and replica Prisma clients are structurally identical at runtime (a replica is a `new PrismaClient(...)` too, so it also exposes `$transaction`), so shape can't tell them apart. The client builder now brands replica handles (`markReadReplicaClient`) and the routing store reads the brand: a writer/tx still escalates to the owning primary (read-your-writes, incl. the dequeue hot path), while a branded replica or no client keeps the owning store's replica.
Also probe the PRIMARY in `forWaitpointCompletion`'s store-resolution (it selects the store a subsequent write lands on), mirroring `#resolveWaitpointStore`: a replica-lagged probe could mis-resolve the owner and strand the run.
Tests: branded-replica-stays-on-replica vs writer-escalates (routed reads); and `forWaitpointCompletion` resolving the owner under replica lag. Seed the snapshot↔waitpoint link via the Prisma relation API instead of a raw join-table insert.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
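The branding idea in this commit can be sketched as follows. This is a minimal illustration, not the actual implementation: only the name `markReadReplicaClient` comes from the commit message, while the `WeakSet` registry, `isReadReplicaClient`, `routeRead`, and the `FakeClient` type are assumptions made for the example. The real client builder may attach the brand differently.

```typescript
// Hypothetical stand-in for a Prisma client. Writer and replica have the
// same shape, so structural checks cannot distinguish them.
type FakeClient = { $transaction: () => void };
const makeClient = (): FakeClient => ({ $transaction: () => {} });

// Assumed branding mechanism: a WeakSet registry keyed by object identity.
const replicaClients = new WeakSet<object>();

function markReadReplicaClient<T extends object>(client: T): T {
  replicaClients.add(client);
  return client;
}

function isReadReplicaClient(client: object): boolean {
  return replicaClients.has(client);
}

type ReadTarget = "owning-primary" | "owning-replica";

// Routing rule as described in the commit message: a writer/tx escalates to
// the owning primary; a branded replica or no client stays on the replica.
function routeRead(client?: object): ReadTarget {
  if (client === undefined) return "owning-replica";
  return isReadReplicaClient(client) ? "owning-replica" : "owning-primary";
}

const writer = makeClient();
const replica = markReadReplicaClient(makeClient());

console.log(routeRead(writer)); // "owning-primary"
console.log(routeRead(replica)); // "owning-replica"
console.log(routeRead()); // "owning-replica"
```

A registry keyed by identity avoids mutating the client object; a symbol property on the client would be another plausible way to carry the brand.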
Thanks both — addressed in cc72e87. The factual dispute first (Devin's MAJOR): does
The bug
When the run-ops routing store is active (both run-ops DB URLs configured), `RoutingRunStore` accepted a caller-supplied `client` on its read methods but dropped it, so every routed read fell back to the owning sub-store's `readOnlyPrisma` (the read replica).
The run engine's hot paths pass the writer/tx into these reads deliberately, to get read-your-writes. On the dequeue path the engine writes the `QUEUED` snapshot to the primary and reads it back milliseconds later; with the client dropped, that read hit the replica. On a single-DB install (or local/CI, where `$replica` falls back to the primary) this is invisible, but against a genuinely separate, lagging replica under write load the read returns a stale or missing snapshot. That surfaces as `TASK_DEQUEUED_INVALID_STATE` and `No execution snapshot found for TaskRun …`, i.e. runs failing on dequeue.
The fix
Honor the caller's read-consistency intent, but map it to the owning sub-store's own primary (the caller's client is bound to the control-plane DB, which is the wrong database for a NEW-resident row):
- Add `readonly primaryReadClient: ReadClient` to the `RunStore` interface; `PostgresRunStore` returns its writer, `RoutingRunStore` throws (a router has no single primary).
- `#ownPrimary(store, client)` = `client === undefined ? undefined : store.primaryReadClient`: client passed → owning primary; `undefined` → replica (byte-identical to before).
- Applied to the routed reads (`findLatestExecutionSnapshot`, `findExecutionSnapshot`, `findManyExecutionSnapshots`, `findSnapshotCompletedWaitpointIds`, `findRun`/`findRunOrThrow`, the waitpoint read family, batch reads, `findTaskRunAttempt`) plus the cross-DB relation-hydration chain.
- `readYourWrites` now keys on client presence rather than writer identity, which also fixes tx clients (they lack `$transaction` at runtime) being silently downgraded.
Single-DB / self-host behavior is unchanged (no routing store is built).
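The `#ownPrimary` rule above can be sketched in isolation. The types here are simplified stand-ins rather than the real run-store types: `replicaReadClient` and `resolveReadClient` are hypothetical names added for illustration, and `ownPrimary` is written as a plain function instead of a private method.

```typescript
// Simplified stand-ins; the real ReadClient and RunStore are richer.
type ReadClient = { name: string };

type RunStore = {
  readonly primaryReadClient: ReadClient;
  // Assumed name for the sub-store's replica handle (readOnlyPrisma in the PR text).
  readonly replicaReadClient: ReadClient;
};

// The rule from the description: a passed client selects the owning store's
// primary; undefined leaves the default (the replica) in place. The caller's
// client itself is never forwarded.
function ownPrimary(store: RunStore, client: ReadClient | undefined): ReadClient | undefined {
  return client === undefined ? undefined : store.primaryReadClient;
}

// Hypothetical helper showing which handle a read ends up using.
function resolveReadClient(store: RunStore, client?: ReadClient): ReadClient {
  return ownPrimary(store, client) ?? store.replicaReadClient;
}

const newStore: RunStore = {
  primaryReadClient: { name: "new-primary" },
  replicaReadClient: { name: "new-replica" },
};
const controlPlaneWriter: ReadClient = { name: "control-plane-writer" };

console.log(resolveReadClient(newStore, controlPlaneWriter).name); // "new-primary"
console.log(resolveReadClient(newStore).name); // "new-replica"
```

The control-plane writer only acts as a signal here: the read lands on the NEW store's own primary, which matches the "never forwarded verbatim" constraint.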
Testing
- `runOpsStore.routedReadPrimary.test.ts`: two physically distinct Postgres containers, writer→DB-A (rows) and replica→DB-B (empty = unbounded lag), so there's no `replica == primary` aliasing to mask the bug. Fails before, passes after, for snapshot reads, fan-out `findRuns`, batch friendlyId, and waitpoint reads; the NEW-arm case asserts a control-plane client resolves to the NEW store's own primary (never forwarded verbatim).
- `@internal/run-store`, `@internal/run-engine`, and `webapp` typecheck clean.
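The "empty replica = unbounded lag" setup can be illustrated without containers. The sketch below swaps the two Postgres instances for in-memory maps, so it only demonstrates the shape of the assertion, not the real test. All names (`FakeDb`, `findLatestSnapshotBefore`, `findLatestSnapshotAfter`) are hypothetical.

```typescript
type Snapshot = { runId: string; status: string };
type FakeDb = Map<string, Snapshot>;

// The primary holds the just-written row; the replica is empty, which models
// a replica that has not caught up yet.
const primary: FakeDb = new Map([["run_1", { runId: "run_1", status: "QUEUED" }]]);
const replica: FakeDb = new Map();

// Pre-fix behavior: the caller's client is accepted but dropped, so the read
// always goes to the replica.
function findLatestSnapshotBefore(runId: string, _client?: object): Snapshot | undefined {
  return replica.get(runId);
}

// Post-fix behavior: the presence of a client routes the read to the primary.
function findLatestSnapshotAfter(runId: string, client?: object): Snapshot | undefined {
  const db = client === undefined ? replica : primary;
  return db.get(runId);
}

const writerClient = {};

console.log(findLatestSnapshotBefore("run_1", writerClient)); // undefined
console.log(findLatestSnapshotAfter("run_1", writerClient)?.status); // "QUEUED"
console.log(findLatestSnapshotAfter("run_1")); // undefined
```

Because the two stores never share data, a read that silently falls back to the replica cannot pass by accident, which is the property the container-based test relies on.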