fix(run-store): route caller-passed read clients to the owning store's primary #4153

Merged: d-cs merged 2 commits into main from fix/run-store-routing-primary-read on Jul 4, 2026
@d-cs d-cs commented Jul 4, 2026

Collaborator

The bug

When the run-ops routing store is active (both run-ops DB URLs configured), RoutingRunStore accepted a caller-supplied client on its read methods but dropped it, so every routed read fell back to the owning sub-store's readOnlyPrisma (the read replica).

The run engine's hot paths pass the writer/tx into these reads deliberately, to get read-your-writes. On the dequeue path the engine writes the QUEUED snapshot to the primary and reads it back milliseconds later; with the client dropped that read hit the replica. On a single-DB install (or local/CI, where $replica falls back to the primary) this is invisible, but against a genuinely separate, lagging replica under write load the read returns a stale or missing snapshot — surfacing as TASK_DEQUEUED_INVALID_STATE and No execution snapshot found for TaskRun …, i.e. runs failing on dequeue.
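The failure mode can be illustrated with a toy model. This is a minimal sketch, not the run engine's code: two in-memory maps stand in for the primary and a replica that never catches up, and `writeSnapshot` and `findLatestSnapshot` are hypothetical helpers invented for the illustration.

```typescript
// Toy model only: two in-memory maps stand in for the primary and a lagging replica.
type Snapshot = { runId: string; status: "QUEUED" | "EXECUTING" };

const primaryRows = new Map<string, Snapshot>();
const replicaRows = new Map<string, Snapshot>(); // never replicated to: models unbounded lag

// Writes always land on the primary.
function writeSnapshot(snapshot: Snapshot): void {
  primaryRows.set(snapshot.runId, snapshot);
}

// Reads go to whichever database the caller ends up routed to.
function findLatestSnapshot(rows: Map<string, Snapshot>, runId: string): Snapshot | undefined {
  return rows.get(runId);
}

writeSnapshot({ runId: "run_1", status: "QUEUED" });

// Re-reading right after the write: the replica has not seen the row yet.
const fromReplica = findLatestSnapshot(replicaRows, "run_1");
const fromPrimary = findLatestSnapshot(primaryRows, "run_1");

console.log(fromReplica); // undefined
console.log(fromPrimary?.status); // "QUEUED"
```

When the replica aliases the primary, as on a single-DB install, both reads return the row, which is why the bug stays hidden there.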

The fix

Honor the caller's read-consistency intent, but map it to the owning sub-store's own primary (the caller's client is bound to the control-plane DB, which is the wrong database for a NEW-resident row):

  • Add readonly primaryReadClient: ReadClient to the RunStore interface; PostgresRunStore returns its writer, RoutingRunStore throws (a router has no single primary).
  • #ownPrimary(store, client) = client === undefined ? undefined : store.primaryReadClient — client passed → owning primary; undefined → replica (byte-identical to before).
  • Applied across the whole class of routed reads (findLatestExecutionSnapshot, findExecutionSnapshot, findManyExecutionSnapshots, findSnapshotCompletedWaitpointIds, findRun/findRunOrThrow, the waitpoint read family, batch reads, findTaskRunAttempt) plus the cross-DB relation-hydration chain.
  • readYourWrites now keys on client presence rather than writer identity, which also fixes tx clients (they lack $transaction at runtime) being silently downgraded.
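The mapping in the second bullet can be sketched roughly as follows. The `ReadClient` and `SubStore` shapes, `replicaReadClient`, and `resolveReadClient` are simplified assumptions made for illustration; only the `client === undefined ? undefined : store.primaryReadClient` expression comes from the description above.

```typescript
// Hypothetical, simplified shapes; the real ReadClient and RunStore types are richer.
type ReadClient = { name: string };

type SubStore = {
  primaryReadClient: ReadClient; // the store's own writer
  replicaReadClient: ReadClient; // stand-in for the store's readOnlyPrisma
};

// Client passed -> owning store's primary; undefined -> undefined (store falls back to its replica).
// The caller's client itself is never forwarded.
function ownPrimary(store: SubStore, client: ReadClient | undefined): ReadClient | undefined {
  return client === undefined ? undefined : store.primaryReadClient;
}

// Stand-in for a sub-store read method that defaults to the replica when no client is given.
function resolveReadClient(store: SubStore, client?: ReadClient): ReadClient {
  return ownPrimary(store, client) ?? store.replicaReadClient;
}

const newStore: SubStore = {
  primaryReadClient: { name: "new-primary" },
  replicaReadClient: { name: "new-replica" },
};
const controlPlaneWriter: ReadClient = { name: "control-plane-writer" };

console.log(resolveReadClient(newStore, controlPlaneWriter).name); // "new-primary"
console.log(resolveReadClient(newStore).name); // "new-replica"
```

The caller's client acts only as a signal of intent here: the read never runs against the control-plane connection, which matches the "never forwarded verbatim" behavior the tests assert.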

Single-DB / self-host behavior is unchanged (no routing store is built).

Testing

  • New deterministic regression harness (runOpsStore.routedReadPrimary.test.ts): two physically distinct Postgres containers — writer→DB-A (rows), replica→DB-B (empty = unbounded lag) — so there's no replica == primary aliasing to mask the bug. Fails before, passes after, for snapshot reads, fan-out findRuns, batch friendlyId, and waitpoint reads; the NEW-arm case asserts a control-plane client resolves to the NEW store's own primary (never forwarded verbatim).
  • Full run-store suite: 191 tests pass. Run-engine dequeue/snapshot/waitpoint suites: 30 pass. @internal/run-store, @internal/run-engine, and webapp typecheck clean.

…primary

RoutingRunStore accepted `client` on every routed read but dropped it, so
read-your-writes reads (execution snapshots, waitpoints, task run attempts,
batches) silently fell back to the read replica. Under replica lag the
dequeue re-read of a just-written snapshot could return stale or missing
data and fail the run.

A caller-passed client is never forwarded verbatim (it is bound to the
control-plane database, the wrong one for a NEW-resident run); its presence
now routes the read to the owning sub-store's own primary via the new
`primaryReadClient` handle. Reads without a client keep using the replica.
changeset-bot Bot commented Jul 4, 2026


⚠️ No Changeset found

Latest commit: cc72e87

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@d-cs d-cs self-assigned this Jul 4, 2026
coderabbitai Bot commented Jul 4, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 781da218-bd65-45f2-9dd9-3a9e0b2fc3dc

📥 Commits

Reviewing files that changed from the base of the PR and between 40fd064 and cc72e87.

📒 Files selected for processing (6)
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/index.ts
  • internal-packages/run-store/src/readReplicaClient.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
  • internal-packages/run-store/src/runOpsStore.ts
✅ Files skipped from review due to trivial changes (1)
  • internal-packages/run-store/src/index.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/runOpsStore.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout. (22)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (11, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (12, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (9, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (10, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 12)
⚠️ CI failures not shown inline (4)

GitHub Actions: 📝 Agent Instructions Audit / 0_audit.txt: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

   build-batching-rc.1         -> build-batching-rc.1
  * [new tag]             build-batching-rc.2         -> build-batching-rc.2
  * [new tag]             build-billing-0.0.1         -> build-billing-0.0.1
  * [new tag]             build-billing-0.0.2         -> build-billing-0.0.2
  * [new tag]             build-billing-0.0.3         -> build-billing-0.0.3
  * [new tag]             build-buildinfo-rc.0        -> build-buildinfo-rc.0
  * [new tag]             build-buildinfo-rc.1        -> build-buildinfo-rc.1
  * [new tag]             build-checkpoint-failover-rc.1 -> build-checkpoint-failover-rc.1
  * [new tag]             build-checkpoint-race-condition-1 -> build-checkpoint-race-condition-1
  * [new tag]             build-checkpoint-race-condition-2 -> build-checkpoint-race-condition-2
  * [new tag]             build-checkpoint-race-condition-3 -> build-checkpoint-race-condition-3
  * [new tag]             build-chris-test-blacksmith -> build-chris-test-blacksmith
  * [new tag]             build-chris-test-blacksmith-2 -> build-chris-test-blacksmith-2
  * [new tag]             build-cli-build-upgrade-rc.1 -> build-cli-build-upgrade-rc.1
  * [new tag]             build-clickhouse-reads-rc0  -> build-clickhouse-reads-rc0
  * [new tag]             build-clickhouse-reads-rc1  -> build-clickhouse-reads-rc1
  * [new tag]             build-compute.rc0           -> build-compute.rc0
  * [new tag]             build-compute.rc1           -> build-compute.rc1
  * [new tag]             build-compute.rc2           -> build-compute.rc2
  * [new tag]             build-compute.rc3           -> build-compute.rc3
  * [new tag]             build-compute.rc4           -> build-compute.rc4
  * [new tag]             build-compute.rc5           -> build-compute.rc5
  * [new tag]             build-compute.rc6           -> build-compute.rc6
  * [new tag]             build-corepack-offline-rc.0 -> build-corepack-offline-rc.0
  * [new tag]             build-current-deployment-rc.0 ->...

GitHub Actions: 📝 Agent Instructions Audit / audit: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 25
--model claude-opus-4-8
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are reviewing a PR to check whether any agent instruction files need updating.
In this repo:
- Root shared agent guidance lives in `AGENTS.md`.
- Root `CLAUDE.md` is only a Claude Code adapter that imports `AGENTS.md`.
- Subdirectories may still have scoped `CLAUDE.md` files.
- `.claude/rules/` contains additional Claude Code guidance.
## Your task
1. Run `git diff origin/main...HEAD --name-only` to see which files changed in this PR.
2. For each changed directory, check the applicable instruction files: root `AGENTS.md`, any `CLAUDE.md` in that directory or a parent directory, and relevant `.claude/rules/` files.
3. Determine if any instruction file should be updated based on the changes. Consider:
   - New files/directories that aren't covered by existing documentation
   - Changed architecture or patterns that contradict current agent guidance
   - New dependencies, services, or infrastructure that agents should know about
   - Renamed or moved files that are referenced in an instruction file
   - Changes to build commands, test patterns, or development workflows
## Response format
If NO updates are needed, respond with exactly:
✅ Agent instruction files look current for this PR.
If updates ARE needed, respond with a short list:
📝 **Agent instruction updates suggested:**
- `AGENTS.md`: [what should be added/changed]
- `path/to/CLAUDE.md`: [what should be added/changed]
- `.claude/rules/file.md`: [what should be added/changed]
Keep suggestions specific and brief. Only flag things that would actually mislead agents in future sessions.
Do NOT suggest updates for trivial changes (bug fixes, small refactors within existing patterns).
Do NOT suggest creating new...

GitHub Actions: 🔎 REVIEW.md Drift Audit / 0_audit.txt: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

-> build-legacy-run-engine.fix3
  * [new tag]             build-manual-checkpoints.rc1 -> build-manual-checkpoints.rc1
  * [new tag]             build-metadata-upgrade-logging.rc1 -> build-metadata-upgrade-logging.rc1
  * [new tag]             build-metadata-upgrade-logging.rc2 -> build-metadata-upgrade-logging.rc2
  * [new tag]             build-metadata-upgrade-logging.rc3 -> build-metadata-upgrade-logging.rc3
  * [new tag]             build-new-build-system.rc.1 -> build-new-build-system.rc.1
  * [new tag]             build-otel-upgrade-rc.0     -> build-otel-upgrade-rc.0
  * [new tag]             build-otel-upgrade-rc.1     -> build-otel-upgrade-rc.1
  * [new tag]             build-pre-pull-deployments-rc.1 -> build-pre-pull-deployments-rc.1
  * [new tag]             build-prod-rescue-rc.1      -> build-prod-rescue-rc.1
  * [new tag]             build-rate-limiter-fix-rc.1 -> build-rate-limiter-fix-rc.1
  * [new tag]             build-re2.rc0               -> build-re2.rc0
  * [new tag]             build-realtime-v2-stream-fix -> build-realtime-v2-stream-fix
  * [new tag]             build-realtime-v2-stream-fix-2 -> build-realtime-v2-stream-fix-2
  * [new tag]             build-realtime-v2-stream-fix-3 -> build-realtime-v2-stream-fix-3
  * [new tag]             build-realtime-v2-stream-fix-4 -> build-realtime-v2-stream-fix-4
  * [new tag]             build-realtime-v2-stream-fix-5 -> build-realtime-v2-stream-fix-5
  * [new tag]             build-realtimestreams-dedupe -> build-realtimestreams-dedupe
  * [new tag]             build-registry-maintenance-rc.1 -> build-registry-maintenance-rc.1
  * [new tag]             build-registry-maintenance-rc.2 -> build-registry-maintenance-rc.2
  * [new tag]             build-remote-ecr-rc.0       -> build-remote-ecr-rc.0
  * [new tag]             build-reschedule-hotfix.rc1 -> build-reschedule-hotfix.rc1
  * [new tag]             build-resume-fixes.rc1      -> build-resume-fixes.rc1
  * [new tag]             build-resum...

GitHub Actions: 🔎 REVIEW.md Drift Audit / audit: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 30
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are auditing this PR for drift against `.claude/REVIEW.md`.
## Context
`.claude/REVIEW.md` is the repo's source of truth for what AI / agent code reviewers should treat as critical findings (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, Lua versioning, etc.). It is consumed by review agents to calibrate severity. If REVIEW.md goes stale, every future agent review degrades.
## Strategy — read this first
You have a hard turn budget. Spend it on signal, not coverage. The audit is allowed to miss things; it is NOT allowed to time out.
1. Read `.claude/REVIEW.md` once, in full.
2. Run `git diff origin/main...HEAD --name-only` to get the list of changed files. Do NOT read the diff content yet.
3. Scan the file-list for relevance to REVIEW.md scope. Relevance signals: changes to Prisma schema, Redis / queue / Lua code, hot tables, recovery / restart loops, new packages, deletions of paths REVIEW.md cites. Skim everything else.
4. Open at most **5 files** total — only the ones most likely to surface a real signal. If nothing in the file-list looks relevant to any REVIEW.md rule, do NOT read any files; go straight to the verdict.
5. Form a verdict and stop. Do not exhaust the turn budget exploring.
Large PRs (>50 files changed) are a strong signal to be MORE selective, not more thorough. Pick 3-5 files at most.
## What to look for
- **Stale references** — does any REVIEW.md rule cite a file, directory, function, table, Prisma model, or package name that has been removed or renamed in this PR (or is already gone from `main`)?
- **Contradictions** — does code in this PR clearly violate a current REVIEW.md rule? (Don't re-review the PR. Only flag if REVIE...
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic import(); only use dynamic imports when resolving circular dependencies, enabling real code splitting, or conditionally loading a module at runtime.
Always import from @trigger.dev/sdk; never import from @trigger.dev/sdk/v3 or use deprecated client.defineJob.
In code that imports @trigger.dev/core, use subpath imports only and never import from the package root.

Files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/db.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Always use findFirst instead of findUnique for Prisma queries.

Files:

  • apps/webapp/app/db.server.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • internal-packages/run-store/src/runOpsStore.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Place test files next to their source files (for example, MyService.ts -> MyService.test.ts).
Use Vitest exclusively for tests, and do not mock dependencies; use testcontainers instead.

Files:

  • internal-packages/run-store/src/runOpsStore.test.ts
🧠 Learnings (15)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
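The P1001 predicate described in the learning above could look roughly like this. It is a minimal structural sketch: `isDatabaseUnreachable` is a hypothetical helper name, and the check inspects plain properties rather than importing Prisma's error classes.

```typescript
// Hypothetical helper: detects Prisma's P1001 ("Can't reach database server")
// whether it arrives as `code` (known request error) or `errorCode` (initialization error).
function isDatabaseUnreachable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const candidate = err as { code?: unknown; errorCode?: unknown };
  return candidate.code === "P1001" || candidate.errorCode === "P1001";
}

console.log(isDatabaseUnreachable({ code: "P1001" })); // true
console.log(isDatabaseUnreachable({ errorCode: "P1001" })); // true
console.log(isDatabaseUnreachable({ code: "P2002" })); // false
console.log(isDatabaseUnreachable(new Error("boom"))); // false
```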
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.

Applied to files:

  • internal-packages/run-store/src/readReplicaClient.ts
  • apps/webapp/app/db.server.ts
  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.

Applied to files:

  • apps/webapp/app/db.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.

Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.

Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.

Applied to files:

  • apps/webapp/app/db.server.ts
📚 Learning: 2026-06-25T18:21:51.905Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4039
File: apps/webapp/app/routes/invite-revoke.tsx:0-0
Timestamp: 2026-06-25T18:21:51.905Z
Learning: During the Zod v4 migration in the triggerdotdev/trigger.dev webapp, ensure any imports from `conform-to/zod` use the Zod-4 subpath: `conform-to/zod/v4` (e.g., `import { parseWithZod } from "conform-to/zod/v4"`). Do not import from the package root `conform-to/zod`, because it is the Zod 3 implementation and may load Zod-3-only symbols (e.g., `ZodBranded`, `ZodEffects`), which can throw at module load (notably with `zod4.4.3`). This should be enforced across `apps/webapp/**/*` where helpers like `parseWithZod` and `conformZodMessage` are used.

Applied to files:

  • apps/webapp/app/db.server.ts
📚 Learning: 2026-07-03T17:10:21.498Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 4148
File: apps/webapp/app/models/orgMember.server.ts:149-168
Timestamp: 2026-07-03T17:10:21.498Z
Learning: In triggerdotdev/trigger.dev, `User.email` (Prisma schema: `internal-packages/database/prisma/schema.prisma`) currently does NOT use `citext` and does NOT have a `lower(email)` functional unique index. Therefore, do not introduce Prisma queries like `where: { email: { equals: <value>, mode: "insensitive" } }` (or any case-insensitive lookup) against `User.email`, because it can force sequential scans of the `users` table under load. During review, ensure email is normalized (e.g., lowercased/trimmed) before both writes and subsequent lookups, and if true case-insensitive behavior/uniqueness is required, implement it via a separate app-wide migration (e.g., switch to `citext` and/or add a functional unique index with backfill) rather than bolting it onto individual feature PRs.

Applied to files:

  • apps/webapp/app/db.server.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In the triggerdotdev/trigger.dev repo, the policy “Never mock anything — use testcontainers instead” should only be enforced for integration tests that interact with real external services (e.g., Redis, Postgres) via actual infrastructure. For unit tests that exercise pure in-memory logic (e.g., cache semantics) it is OK to stub collaborators such as `ApiClient` using Vitest (`vi.fn()`) to assert call counts or control behavior. Do not flag `vi.fn()`-based `ApiClient` stubs in unit tests as violations of the testcontainers policy.

Applied to files:

  • internal-packages/run-store/src/runOpsStore.test.ts
📚 Learning: 2026-06-16T09:19:47.637Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3960
File: apps/webapp/test/prismaInfrastructureErrorCapture.test.ts:0-0
Timestamp: 2026-06-16T09:19:47.637Z
Learning: In this repo’s Vitest setup, `vitest.config.ts` uses `globals: true`, so identifiers like `vi`, `describe`, `it`, and `expect` are available as globals in Vitest test files. During code review, do not flag missing `vi`/`describe`/`it`/`expect` imports as a runtime error or correctness issue when they’re used in `*.test.ts/tsx` or `*.spec.ts/tsx` files. Explicit imports are still preferred for consistency, but they’re not required for runtime behavior.

Applied to files:

  • internal-packages/run-store/src/runOpsStore.test.ts
🔇 Additional comments (4)
internal-packages/run-store/src/readReplicaClient.ts (1)

1-26: LGTM!

apps/webapp/app/db.server.ts (2)

172-179: LGTM!


262-270: 🎯 Functional Correctness

No issue here: the replica brand survives the wrapper

> Likely an incorrect or invalid review comment.
internal-packages/run-store/src/runOpsStore.test.ts (1)

1260-1291: LGTM!


Walkthrough

This PR updates run-store routing so routed reads can distinguish replica clients from primary-capable clients. It adds a primaryReadClient contract to RunStore, implements it on PostgresRunStore, and introduces read-replica branding helpers. RoutingRunStore now routes read paths through owning-store primary clients when appropriate, updates snapshot, waitpoint, task-run, and batch read flows, and changes the read-your-writes signal to rely on non-replica client presence. New integration tests cover split-topology routing and lagged replica behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description covers bug, fix, and testing, but it misses required template sections like Closes #, checklist items, changelog, and screenshots. | Add the issue-closing line, complete the checklist, and include Changelog and Screenshots sections to match the repository template. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly states the main routing fix and matches the changed run-store read path behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

devin-ai-integration[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (1)
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts (1)

178-183: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Raw SQL against Prisma's implicit join table is fragile.

Direct $executeRaw insert into _completedWaitpoints relies on Prisma's undocumented column-ordering convention (A/B assigned alphabetically by model name). If the underlying models are ever renamed or the relation becomes explicit, this insert silently breaks or inserts into the wrong columns without a compile-time signal.

Consider seeding this via the Prisma relation API (e.g., updating the snapshot's completedWaitpoints connect) if the schema exposes it, to avoid depending on internal join-table layout.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2baa1606-d1cf-4cbe-beea-03ae9e9535d5

📥 Commits

Reviewing files that changed from the base of the PR and between 119189f and 40fd064.

📒 Files selected for processing (5)
  • .server-changes/fix-run-store-routing-read-your-writes.md
  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/runOpsStore.ts
  • internal-packages/run-store/src/types.ts
📜 Review details
⏰ Context from checks skipped due to timeout. (24)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (1, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (4, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (8, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (11, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (10, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (5, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (6, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (12, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (3, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (9, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (2, 12)
  • GitHub Check: internal / 🧪 Unit Tests: Internal (7, 12)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: typecheck / typecheck
⚠️ CI failures not shown inline (4)

GitHub Actions: 🔎 REVIEW.md Drift Audit / 0_audit.txt: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

View job details

GitHub Actions: 🔎 REVIEW.md Drift Audit / audit: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

View job details

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 30
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are auditing this PR for drift against `.claude/REVIEW.md`.
## Context
`.claude/REVIEW.md` is the repo's source of truth for what AI / agent code reviewers should treat as critical findings (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, Lua versioning, etc.). It is consumed by review agents to calibrate severity. If REVIEW.md goes stale, every future agent review degrades.
## Strategy — read this first
You have a hard turn budget. Spend it on signal, not coverage. The audit is allowed to miss things; it is NOT allowed to time out.
1. Read `.claude/REVIEW.md` once, in full.
2. Run `git diff origin/main...HEAD --name-only` to get the list of changed files. Do NOT read the diff content yet.
3. Scan the file-list for relevance to REVIEW.md scope. Relevance signals: changes to Prisma schema, Redis / queue / Lua code, hot tables, recovery / restart loops, new packages, deletions of paths REVIEW.md cites. Skim everything else.
4. Open at most **5 files** total — only the ones most likely to surface a real signal. If nothing in the file-list looks relevant to any REVIEW.md rule, do NOT read any files; go straight to the verdict.
5. Form a verdict and stop. Do not exhaust the turn budget exploring.
Large PRs (>50 files changed) are a strong signal to be MORE selective, not more thorough. Pick 3-5 files at most.
## What to look for
- **Stale references** — does any REVIEW.md rule cite a file, directory, function, table, Prisma model, or package name that has been removed or renamed in this PR (or is already gone from `main`)?
- **Contradictions** — does code in this PR clearly violate a current REVIEW.md rule? (Don't re-review the PR. Only flag if REVIE...

GitHub Actions: 📝 Agent Instructions Audit / audit: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

View job details

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 25
--model claude-opus-4-8
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are reviewing a PR to check whether any agent instruction files need updating.
In this repo:
- Root shared agent guidance lives in `AGENTS.md`.
- Root `CLAUDE.md` is only a Claude Code adapter that imports `AGENTS.md`.
- Subdirectories may still have scoped `CLAUDE.md` files.
- `.claude/rules/` contains additional Claude Code guidance.
## Your task
1. Run `git diff origin/main...HEAD --name-only` to see which files changed in this PR.
2. For each changed directory, check the applicable instruction files: root `AGENTS.md`, any `CLAUDE.md` in that directory or a parent directory, and relevant `.claude/rules/` files.
3. Determine if any instruction file should be updated based on the changes. Consider:
   - New files/directories that aren't covered by existing documentation
   - Changed architecture or patterns that contradict current agent guidance
   - New dependencies, services, or infrastructure that agents should know about
   - Renamed or moved files that are referenced in an instruction file
   - Changes to build commands, test patterns, or development workflows
## Response format
If NO updates are needed, respond with exactly:
✅ Agent instruction files look current for this PR.
If updates ARE needed, respond with a short list:
📝 **Agent instruction updates suggested:**
- `AGENTS.md`: [what should be added/changed]
- `path/to/CLAUDE.md`: [what should be added/changed]
- `.claude/rules/file.md`: [what should be added/changed]
Keep suggestions specific and brief. Only flag things that would actually mislead agents in future sessions.
Do NOT suggest updates for trivial changes (bug fixes, small refactors within existing patterns).
Do NOT suggest creating new...

GitHub Actions: 📝 Agent Instructions Audit / 0_audit.txt: fix(run-store): route caller-passed read clients to the owning store's primary

Conclusion: failure

View job details

🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

**/*.{ts,tsx,js,jsx}: Prefer static imports over dynamic import(); only use dynamic imports when resolving circular dependencies, enabling real code splitting, or conditionally loading a module at runtime.
Always import from @trigger.dev/sdk; never import from @trigger.dev/sdk/v3 or use deprecated client.defineJob.
In code that imports @trigger.dev/core, use subpath imports only and never import from the package root.

Files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js,jsx}: Place test files next to their source files (for example, MyService.ts -> MyService.test.ts).
Use Vitest exclusively for tests, and do not mock dependencies; use testcontainers instead.

Files:

  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
🧠 Learnings (12)
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.

Applied to files:

  • .server-changes/fix-run-store-routing-read-your-writes.md
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
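The predicate described in this learning can be sketched as follows. This is a minimal illustration that assumes only the two field names quoted above (`code` and `errorCode`); the helper name `isDatabaseUnreachable` and the plain-object error shapes are hypothetical and do not use Prisma's real error classes.

```typescript
// Hypothetical structural view of an error; only the two fields named in the
// learning are inspected.
type MaybePrismaError = { code?: unknown; errorCode?: unknown };

// Returns true when either field carries P1001, whichever error class raised it.
function isDatabaseUnreachable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as MaybePrismaError;
  return e.code === "P1001" || e.errorCode === "P1001";
}

// Plain objects standing in for the two error shapes (assumed for illustration).
const midQueryDrop = { code: "P1001" };
const startupFailure = { errorCode: "P1001" };
const unrelated = { code: "P2002" };

const p1001Results = [
  isDatabaseUnreachable(midQueryDrop),
  isDatabaseUnreachable(startupFailure),
  isDatabaseUnreachable(unrelated),
];
```

Checking both fields in one predicate keeps callers from having to know which Prisma error class surfaced the failure.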
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.

Applied to files:

  • internal-packages/run-store/src/PostgresRunStore.ts
  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
  • internal-packages/run-store/src/types.ts
  • internal-packages/run-store/src/runOpsStore.ts
📚 Learning: 2026-05-18T14:40:02.173Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3658
File: packages/core/src/v3/realtimeStreams/manager.test.ts:1-147
Timestamp: 2026-05-18T14:40:02.173Z
Learning: In the triggerdotdev/trigger.dev repo, the policy “Never mock anything — use testcontainers instead” should only be enforced for integration tests that interact with real external services (e.g., Redis, Postgres) via actual infrastructure. For unit tests that exercise pure in-memory logic (e.g., cache semantics) it is OK to stub collaborators such as `ApiClient` using Vitest (`vi.fn()`) to assert call counts or control behavior. Do not flag `vi.fn()`-based `ApiClient` stubs in unit tests as violations of the testcontainers policy.

Applied to files:

  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
📚 Learning: 2026-06-16T09:19:47.637Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3960
File: apps/webapp/test/prismaInfrastructureErrorCapture.test.ts:0-0
Timestamp: 2026-06-16T09:19:47.637Z
Learning: In this repo’s Vitest setup, `vitest.config.ts` uses `globals: true`, so identifiers like `vi`, `describe`, `it`, and `expect` are available as globals in Vitest test files. During code review, do not flag missing `vi`/`describe`/`it`/`expect` imports as a runtime error or correctness issue when they’re used in `*.test.ts/tsx` or `*.spec.ts/tsx` files. Explicit imports are still preferred for consistency, but they’re not required for runtime behavior.

Applied to files:

  • internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts
🔇 Additional comments (9)
internal-packages/run-store/src/runOpsStore.routedReadPrimary.test.ts (1)

1-326: LGTM!

.server-changes/fix-run-store-routing-read-your-writes.md (1)

1-6: LGTM!

internal-packages/run-store/src/PostgresRunStore.ts (1)

510-516: LGTM!

internal-packages/run-store/src/runOpsStore.ts (6)

50-66: LGTM!

Also applies to: 78-90, 150-170


214-264: LGTM!

Also applies to: 266-375, 607-656


723-778: LGTM!

Also applies to: 785-822, 831-901


941-990: LGTM!

Also applies to: 992-1125


1198-1275: LGTM!

Also applies to: 1296-1324


1372-1444: LGTM!

Also applies to: 1464-1473, 1527-1538

Comment thread internal-packages/run-store/src/types.ts
… writer/tx

Review follow-up to the read-your-writes routing fix. The presence-only signal
(`client != null`) escalated EVERY caller-passed read client to the owning
store's primary, including an explicit read replica (e.g. `$replica`) passed by
span/trace/session lookups and the run-ops read-through — defeating replica
scaling.

Writer and replica Prisma clients are structurally identical at runtime (a
replica is a `new PrismaClient(...)` too, so it also exposes `$transaction`), so
shape can't tell them apart. The client builder now brands replica handles
(`markReadReplicaClient`) and the routing store reads the brand: a writer/tx
still escalates to the owning primary (read-your-writes, incl. the dequeue hot
path), while a branded replica or no client keeps the owning store's replica.

Also probe the PRIMARY in `forWaitpointCompletion`'s store-resolution (it selects
the store a subsequent write lands on), mirroring `#resolveWaitpointStore` — a
replica-lagged probe could mis-resolve the owner and strand the run.

Tests: branded-replica-stays-on-replica vs writer-escalates (routed reads); and
forWaitpointCompletion resolving the owner under replica lag. Seed the snapshot↔
waitpoint link via the Prisma relation API instead of a raw join-table insert.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-cs

d-cs commented Jul 4, 2026

Collaborator Author

Thanks both — addressed in cc72e87.

The factual dispute first (Devin's MAJOR): does $replica have $transaction at runtime?

Yes. Verified empirically against the generated Prisma 6.14 client: a base PrismaClient exposes $transaction as a function, and it survives .$extends() (single and double — exactly how $replica is built via captureInfrastructureErrors(tagDatasource("replica", replica))). So PrismaReplicaClient is only a compile-time Omit<PrismaClient, "$transaction">; the runtime object still has the method.

Consequence: the pre-PR isWriterClient check (typeof x.$transaction === "function") ALSO returned true for $replica, so those four callers were already being escalated to the primary before this PR — this specific change was not a new regression for them. BUT the underlying point is right and worth fixing properly: a caller-passed replica should never be escalated to the primary. It also means $transaction-presence can't discriminate writer from replica at all (both have it; only a transaction client lacks it), so the old check was doubly wrong — it failed to escalate transaction clients while always escalating replicas.
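To illustrate why a `$transaction`-presence check cannot separate writer from replica, here is a minimal sketch with stand-in types. `FakeClient`, `FakeReplicaClient`, and `FakeTxClient` are hypothetical and are not the generated Prisma client; the body of `isWriterClient` is assumed from the `typeof x.$transaction === "function"` description above.

```typescript
// Hypothetical stand-in for a client that exposes $transaction at runtime.
class FakeClient {
  async $transaction<T>(fn: () => Promise<T>): Promise<T> {
    return fn();
  }
  async findRun(id: string): Promise<string> {
    return `run:${id}`;
  }
}

// Compile-time only: the type hides $transaction, but the object still has it.
type FakeReplicaClient = Omit<FakeClient, "$transaction">;

// A transaction-scoped client genuinely lacks $transaction.
type FakeTxClient = Pick<FakeClient, "findRun">;

// Shape-based check as described in the comment above (body assumed).
function isWriterClient(client: unknown): boolean {
  return (
    typeof client === "object" &&
    client !== null &&
    typeof (client as { $transaction?: unknown }).$transaction === "function"
  );
}

const writer = new FakeClient();
const replica: FakeReplicaClient = new FakeClient();
const tx: FakeTxClient = { findRun: async (id) => `run:${id}` };

const shapeCheck = {
  writer: isWriterClient(writer), // has $transaction
  replica: isWriterClient(replica), // also has it at runtime, despite the Omit type
  tx: isWriterClient(tx), // the only one that lacks it
};
```

In this sketch the check agrees for writer and replica and differs only for the transaction client, which is the opposite of what a read-your-writes signal needs.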

1. Replica reads no longer escalate to the primary (Devin MAJOR)

Since writer and replica are structurally identical at runtime, I brand replica handles at construction (markReadReplicaClient, a Symbol.for marker in run-store) and have the router read the brand:

  • writer / tx (incl. the engine's dequeue re-read) -> owning store's primary (read-your-writes preserved)
  • branded replica, or no client -> owning store's replica (scaling preserved)

Applied in db.server.ts to $replica and the run-ops new replica, branding only genuine replicas (never the writer, and never the prisma fallback when no replica is configured). readYourWrites and #ownPrimary now escalate only when the client is present and not a branded replica. All four flagged callers (spans, trace, sessionRunManager via $replica; runEngineHandlers via the read-through runOpsNewReplica/runOpsLegacyReplica) now read from a replica again.
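The branding approach can be sketched roughly like this. It is an illustration under assumptions, not the code in the PR: the symbol key, the `ReadClient` and `Store` shapes, and the free functions `ownPrimary` / `pickReadClient` are hypothetical; only the names `markReadReplicaClient`, `primaryReadClient`, and the use of a `Symbol.for` marker come from the discussion.

```typescript
// Hypothetical marker key; the real key used in run-store is not shown in this thread.
const READ_REPLICA_BRAND = Symbol.for("example.run-store.readReplica");

type ReadClient = { name: string };
type Store = { primaryReadClient: ReadClient; replica: ReadClient };

// Brand a genuine replica handle at construction time.
function markReadReplicaClient<T extends object>(client: T): T {
  Object.defineProperty(client, READ_REPLICA_BRAND, { value: true });
  return client;
}

function isReadReplicaClient(client: unknown): boolean {
  return (
    typeof client === "object" &&
    client !== null &&
    Reflect.get(client, READ_REPLICA_BRAND) === true
  );
}

// Escalate only when a client is present and is not a branded replica.
function ownPrimary(store: Store, client?: ReadClient): ReadClient | undefined {
  if (client === undefined || isReadReplicaClient(client)) return undefined;
  return store.primaryReadClient;
}

// Falls back to the owning store's replica when no escalation applies.
function pickReadClient(store: Store, client?: ReadClient): ReadClient {
  return ownPrimary(store, client) ?? store.replica;
}

const owningStore: Store = {
  primaryReadClient: { name: "owning-primary" },
  replica: { name: "owning-replica" },
};
const callerWriter: ReadClient = { name: "control-plane-writer" };
const callerReplica = markReadReplicaClient<ReadClient>({ name: "control-plane-replica" });

const routed = {
  writer: pickReadClient(owningStore, callerWriter).name,
  brandedReplica: pickReadClient(owningStore, callerReplica).name,
  noClient: pickReadClient(owningStore).name,
};
```

Note that the caller's client is never used for the read itself in this sketch; it only signals intent, and the owning store supplies the actual handle.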

New test (RED before / GREEN after): a branded replica keeps a routed read on the owning store's replica (misses the lagging replica), while an unbranded writer escalates and finds the fresh row.

2. forWaitpointCompletion replica probe (Devin minor)

It is write-after-read: it selects the store a subsequent updateManyWaitpoints lands on. Its two findWaitpoint probes now read each store's primary (store.primaryReadClient), mirroring #resolveWaitpointStore(onPrimary). New test (RED before / GREEN after): under replica lag the owner still resolves correctly instead of mis-routing to the id-shape default and stranding the run.
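A small in-memory sketch of the failure mode, assuming two stores whose replicas lag behind their primaries. `FakeStore`, `resolveOwner`, and the fallback argument are hypothetical simplifications of the store-resolution step; they are not the actual `forWaitpointCompletion` implementation.

```typescript
type Waitpoint = { id: string };

// Hypothetical store: `primary` holds every committed row, `replica` may lag.
class FakeStore {
  readonly name: string;
  readonly primary = new Map<string, Waitpoint>();
  readonly replica = new Map<string, Waitpoint>();

  constructor(name: string) {
    this.name = name;
  }

  // Writes land on the primary only; the replica has not caught up yet.
  write(wp: Waitpoint): void {
    this.primary.set(wp.id, wp);
  }

  findWaitpoint(id: string, onPrimary: boolean): Waitpoint | undefined {
    return (onPrimary ? this.primary : this.replica).get(id);
  }
}

// Probe each store in order; fall back to a default owner when nothing is found.
function resolveOwner(
  id: string,
  stores: FakeStore[],
  fallback: FakeStore,
  onPrimary: boolean
): FakeStore {
  for (const store of stores) {
    if (store.findWaitpoint(id, onPrimary) !== undefined) return store;
  }
  return fallback;
}

const legacyStore = new FakeStore("legacy");
const newStore = new FakeStore("new");
legacyStore.write({ id: "wp_1" }); // the row actually lives in the legacy store

// Assumption for this sketch: the default owner for this id is `newStore`.
const viaReplica = resolveOwner("wp_1", [newStore, legacyStore], newStore, false).name;
const viaPrimary = resolveOwner("wp_1", [newStore, legacyStore], newStore, true).name;
```

Here the replica probe finds nothing in either store and falls through to the default, so the follow-up write would target the wrong store, while the primary probe resolves the real owner.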

3. Fragile raw SQL in the new test (CodeRabbit)

Replaced the raw INSERT INTO "_completedWaitpoints" ("A","B") with the Prisma relation API (taskRunExecutionSnapshot.update({ data: { completedWaitpoints: { connect: { id } } } })), so a relation/column rename fails at compile time instead of silently seeding nothing.

Verification

  • run-store: full suite 193/193 green; both new tests confirmed RED on the pre-fix logic, GREEN after.
  • run-engine: waitpoint / snapshot / dequeue / residency suites green.
  • typecheck --filter webapp, format, lint all clean.

@d-cs d-cs enabled auto-merge (squash) July 4, 2026 20:42
@d-cs d-cs disabled auto-merge July 4, 2026 22:08
@d-cs d-cs enabled auto-merge (squash) July 4, 2026 22:08
@d-cs d-cs merged commit d977691 into main Jul 4, 2026
46 checks passed
@d-cs d-cs deleted the fix/run-store-routing-primary-read branch July 4, 2026 23:17
