
refactor(web): resolve language model capabilities from models.dev #1372

Merged
whoisthey merged 13 commits into main from whoisthey/language-model-input-modalities on Jun 27, 2026

whoisthey commented Jun 26, 2026

Lays the groundwork for chat file attachments by teaching Sourcebot which inputs a configured language model can accept, resolved automatically from the models.dev catalog (the same catalog that already backs context-window resolution).

What this adds

  • Automatic capability resolution. resolveModelCapabilities (packages/web/src/features/chat/modelCapabilities.server.ts) looks up a model's modalities.input list in the models.dev catalog and partitions it into two buckets:
    • inputModalities (text | image | audio | video): perceptual channels the model encodes natively.
    • supportedDocumentTypes (pdf): rich compound container formats that providers decompose server-side. models.dev folds pdf into modalities.input; we split it back out, since a document is not a modality.
  • Shared catalog module. The catalog fetch/cache plumbing (TTL, negative cache, stale-while-revalidate, in-flight dedupe, provider-id overrides) lives in modelsDevCatalog.server.ts, so context-window and capability resolution share a single cached fetch.
  • Both resolved fields are exposed on the client-safe LanguageModelInfo (via getConfiguredLanguageModelsInfo and the MCP ask path).
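A minimal sketch of the caching behaviours listed above (TTL, negative cache, stale-while-revalidate, in-flight dedupe), assuming an injectable fetcher and clock. `createCatalogCache`, `CacheOptions` and its option names are hypothetical and are not taken from modelsDevCatalog.server.ts; provider-id overrides are left out here.

```typescript
// Hypothetical sketch; not the contents of modelsDevCatalog.server.ts.
type Catalog = Record<string, { models: Record<string, unknown> }>;

interface CacheOptions {
  ttlMs: number; // how long a successful fetch counts as fresh
  negativeTtlMs: number; // how long a failed fetch suppresses retries
  now?: () => number; // injectable clock, defaults to Date.now
}

function createCatalogCache(fetchCatalog: () => Promise<Catalog>, opts: CacheOptions) {
  const now = opts.now ?? Date.now;
  let value: Catalog | undefined;
  let fetchedAt = 0;
  let failedAt: number | undefined;
  let inFlight: Promise<Catalog | undefined> | undefined;

  // Start a fetch, or join the one already running (in-flight dedupe).
  function refresh(): Promise<Catalog | undefined> {
    if (!inFlight) {
      inFlight = fetchCatalog().then(
        (catalog) => {
          value = catalog;
          fetchedAt = now();
          failedAt = undefined;
          inFlight = undefined;
          return catalog;
        },
        () => {
          failedAt = now(); // negative cache: remember when the failure happened
          inFlight = undefined;
          return value; // keep serving the last good copy, if there is one
        },
      );
    }
    return inFlight;
  }

  return async function getCatalog(): Promise<Catalog | undefined> {
    const t = now();
    if (failedAt !== undefined && t - failedAt < opts.negativeTtlMs) {
      return value; // recent failure: do not hit the endpoint again yet
    }
    if (value === undefined) {
      return refresh(); // cold cache: wait for the first fetch
    }
    if (t - fetchedAt >= opts.ttlMs) {
      void refresh(); // stale: revalidate in the background
    }
    return value; // fresh, or stale while revalidating
  };
}
```

Serving the stale copy while a single background refresh runs keeps latency flat when the TTL expires, and the negative cache stops an unreachable catalog from being re-requested on every chat turn.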

This is pure capability plumbing. It adds no attachment UI (future work).

Behavior

  • Catalogued models light up their real capabilities automatically, with no config.json changes.
  • Uncatalogued models (e.g. openai-compatible / self-hosted endpoints) fall back to text-only with no document support: the model stays fully usable for normal chat, and richer attachments stay gated off until support can be positively confirmed.
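The partitioning and the text-only fallback can be sketched as below. This is an illustrative sketch, not the PR's resolveModelCapabilities: `resolveCapabilitiesSketch` is a hypothetical name, and the catalog shape (`{ [providerId]: { models: { [modelId]: { modalities: { input: string[] } } } } }`) is an assumption about the models.dev payload.

```typescript
// Illustrative sketch only; the real resolveModelCapabilities may differ.
type InputModality = "text" | "image" | "audio" | "video";
type DocumentType = "pdf";

interface ModelCapabilities {
  inputModalities: InputModality[];
  supportedDocumentTypes: DocumentType[];
}

// Assumed subset of the models.dev catalog payload.
type ModelsDevCatalog = Record<
  string,
  { models?: Record<string, { modalities?: { input?: string[] } }> }
>;

const MODALITIES: ReadonlySet<string> = new Set(["text", "image", "audio", "video"]);
const DOCUMENT_TYPES: ReadonlySet<string> = new Set(["pdf"]);

function resolveCapabilitiesSketch(
  catalog: ModelsDevCatalog | undefined,
  providerId: string,
  modelId: string,
): ModelCapabilities {
  const input = catalog?.[providerId]?.models?.[modelId]?.modalities?.input;

  // Uncatalogued model or unavailable catalog: fail closed to text-only, no documents.
  // Treating an empty list the same way is an assumption of this sketch.
  if (!input || input.length === 0) {
    return { inputModalities: ["text"], supportedDocumentTypes: [] };
  }

  // Channels the model encodes natively.
  const inputModalities = input.filter((m): m is InputModality => MODALITIES.has(m));
  // Container formats that models.dev folds into modalities.input, split back out.
  const supportedDocumentTypes = input.filter((m): m is DocumentType => DOCUMENT_TYPES.has(m));
  // Any other value is ignored rather than guessed at.
  return { inputModalities, supportedDocumentTypes };
}
```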

Add an optional `inputModalities` declaration to language model config and
expose a resolved capability set to the client.

- Schema: add optional `inputModalities` (`text` | `image` | `pdf`) to every
  provider definition in `schemas/v3/languageModel.json` and regenerate the
  schema types/snippets.
- Add a fail-closed `resolveModelInputModalities` resolver that defaults to
  text-only when a model does not declare its input modalities.
- Expose the resolved `inputModalities` on the client-safe `LanguageModelInfo`
  (populated via `getConfiguredLanguageModelsInfo` and the MCP ask path).

This is groundwork for chat file attachments. It adds no attachment UI and no
live provider capability probing yet.

Co-authored-by: Cursor <cursoragent@cursor.com>
coderabbitai Bot commented Jun 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


Walkthrough

Language-model capability fields were added to schema and type contracts, mirrored in generated JSON/docs, and propagated into chat metadata plus MCP askCodebase/server payloads. The web code resolves omitted modality and document-type values to text-only and no-document defaults.

Changes

Type contracts

  • Provider model interfaces (packages/schemas/src/v3/index.type.ts, packages/schemas/src/v3/languageModel.type.ts): Provider language-model interfaces add optional inputModalities and supportedDocumentTypes fields across the supported provider types.

Schema contracts

  • Schema definitions (packages/schemas/src/v3/index.schema.ts, packages/schemas/src/v3/languageModel.schema.ts): Matching JSON-schema definitions add the same optional fields and fail-closed descriptions in both the base definitions and the oneOf provider variants.

Published artifacts

  • Generated schema and docs mirrors (schemas/v3/languageModel.json, docs/snippets/schemas/v3/index.schema.mdx, docs/snippets/schemas/v3/languageModel.schema.mdx, CHANGELOG.md): The generated JSON schema, docs snippets, and changelog mirror the new capability fields and defaults.

Capability helpers and tooling

  • Chat capability metadata (packages/web/src/features/chat/types.ts, packages/web/src/features/chat/modelCapabilities.ts, packages/web/src/features/chat/utils.ts): Chat metadata types add explicit modality and document-type fields with defaults, the capability resolvers apply fail-closed behavior, and getLanguageModelKey accepts only identifying fields.

Payload wiring

  • Server payload enrichment (packages/web/src/features/chat/utils.server.ts, packages/web/src/ee/features/mcp/askCodebase.ts): Server-side language-model mappings and the MCP AskCodebaseResult.languageModel payload now include the resolved capability fields.

Sequence Diagram(s)

sequenceDiagram
  participant modelCapabilities as "packages/web/src/features/chat/modelCapabilities.ts"
  participant utilsServer as "packages/web/src/features/chat/utils.server.ts"
  participant askCodebase as "packages/web/src/ee/features/mcp/askCodebase.ts"
  utilsServer->>modelCapabilities: resolveModelInputModalities(languageModelConfig)
  utilsServer->>modelCapabilities: resolveModelSupportedDocumentTypes(languageModelConfig)
  askCodebase->>modelCapabilities: resolveModelInputModalities(languageModelConfig)
  askCodebase->>modelCapabilities: resolveModelSupportedDocumentTypes(languageModelConfig)
  utilsServer->>utilsServer: add inputModalities and supportedDocumentTypes to LanguageModelInfo
  askCodebase->>askCodebase: attach inputModalities and supportedDocumentTypes to AskCodebaseResult.languageModel

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • msukkari
🚥 Pre-merge checks | ✅ 5 passed

  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The title clearly reflects the main change: resolving language model capabilities from models.dev.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

Co-authored-by: Cursor <cursoragent@cursor.com>
mintlify Bot commented Jun 26, 2026

Preview deployment for your docs.

Project: sourcebot | Status: 🟢 Ready | Updated (UTC): Jun 26, 2026, 4:07 AM


mintlify Bot commented Jun 26, 2026

Preview deployment for your docs.

Project: sourcebot | Status: 🟡 Building | Updated (UTC): Jun 26, 2026, 3:58 AM


inputModalities now only enumerates true perceptual channels
(text | image | audio | video). Document/container formats like PDF
move to a separate fail-closed `supportedDocumentTypes` field, since
PDF is not a model modality but a format providers decompose into
text/image internally.

Co-authored-by: Cursor <cursoragent@cursor.com>
Tighten the inputModalities / supportedDocumentTypes descriptions to
remove the implication that omitting supportedDocumentTypes blocks all
non-text attachments. Clarify the taxonomy: single-medium files
(images, audio, video) and plain-text files (.txt, .md) are governed by
inputModalities; supportedDocumentTypes only gates rich compound
container formats like PDF.
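The attachment UI is future work, so the following is only a hypothetical illustration of the taxonomy above: given a file's MIME type, which field would gate it. `isAttachmentAllowed` and the MIME-to-document-type mapping are assumptions, not code from this PR.

```typescript
// Hypothetical helper; attachment UI is not part of this PR.
type InputModality = "text" | "image" | "audio" | "video";
type DocumentType = "pdf";

interface Capabilities {
  inputModalities: InputModality[];
  supportedDocumentTypes: DocumentType[];
}

// Rich compound containers, gated by supportedDocumentTypes (illustrative mapping).
const DOCUMENT_MIME_TYPES = new Map<string, DocumentType>([["application/pdf", "pdf"]]);

// Single-medium files, gated by the matching input modality.
const MEDIA_MODALITIES = ["image", "audio", "video"] as const;

function isAttachmentAllowed(mimeType: string, caps: Capabilities): boolean {
  const documentType = DOCUMENT_MIME_TYPES.get(mimeType);
  if (documentType !== undefined) {
    return caps.supportedDocumentTypes.includes(documentType);
  }
  // Plain-text files such as .txt and .md ride on the text modality.
  if (mimeType.startsWith("text/")) {
    return caps.inputModalities.includes("text");
  }
  for (const modality of MEDIA_MODALITIES) {
    if (mimeType.startsWith(`${modality}/`)) {
      return caps.inputModalities.includes(modality);
    }
  }
  // Anything unrecognised is rejected (fail closed).
  return false;
}
```

Under this split, a text-only model still accepts .md files, while an image or PDF stays blocked until the corresponding capability is present.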

Co-authored-by: Cursor <cursoragent@cursor.com>
whoisthey and others added 2 commits June 26, 2026 10:25
LanguageModelInfo now has required inputModalities/supportedDocumentTypes,
so a raw LanguageModel config (where those are optional) is no longer
assignable to it. getLanguageModelKey only reads provider/model/displayName,
so type its parameter as that Pick subset, letting both LanguageModel and
LanguageModelInfo be keyed. Fixes the docker build type check.
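A simplified sketch of that typing change. The interfaces are hypothetical, trimmed-down stand-ins (the real LanguageModel is a per-provider union) and the key format is made up; the point is that a `Pick` of the identifying fields is satisfied by both a raw config and a resolved `LanguageModelInfo`.

```typescript
// Trimmed-down, hypothetical shapes; the real types have more, per-provider fields.
type InputModality = "text" | "image" | "audio" | "video";
type DocumentType = "pdf";

// Raw config: capability fields optional (as at this commit).
interface LanguageModel {
  provider: string;
  model: string;
  displayName?: string;
  inputModalities?: InputModality[];
  supportedDocumentTypes?: DocumentType[];
}

// Client-safe info: capability fields resolved, hence required.
interface LanguageModelInfo {
  provider: string;
  model: string;
  displayName?: string;
  inputModalities: InputModality[];
  supportedDocumentTypes: DocumentType[];
}

// Only the identifying fields, so both shapes are accepted.
type LanguageModelKeyFields = Pick<LanguageModel, "provider" | "model" | "displayName">;

function getLanguageModelKey(model: LanguageModelKeyFields): string {
  // Key format is invented for this sketch.
  return [model.provider, model.model, model.displayName ?? ""].join(":");
}
```

Had the parameter been typed as `LanguageModelInfo`, passing a `LanguageModel` would fail type-checking because its `inputModalities` is optional; narrowing the parameter avoids that without loosening `LanguageModelInfo` itself.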

Co-authored-by: Cursor <cursoragent@cursor.com>
Two dev-experience fixes for the stale-build-output footgun:

- schemas watch now runs `yarn build` (generate + tsc) instead of
  generate-only, so editing a schema JSON during `yarn dev` refreshes
  dist (both the .d.ts types and the runtime index.schema.js used by
  ajv), not just the generated source.
- web tsconfig maps @sourcebot/schemas/v3|v2/* to the package source,
  so type-checking and the IDE read committed source directly instead
  of stale built .d.ts. Web only imports .type files (erased at
  compile), so there is no bundling/runtime impact.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions


License Audit

⚠️ Status: PASS

Metric Count
Total packages 2129
Resolved (non-standard) 11
Unresolved 0
Strong copyleft 0
Weak copyleft 39

Weak Copyleft Packages (informational)

Package Version License
@img/sharp-libvips-darwin-arm64 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-darwin-arm64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-darwin-x64 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-darwin-x64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-arm 1.0.5 LGPL-3.0-or-later
@img/sharp-libvips-linux-arm 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-arm64 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-arm64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-ppc64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-riscv64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-s390x 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-s390x 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-x64 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-linux-x64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linuxmusl-arm64 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-linuxmusl-arm64 1.2.4 LGPL-3.0-or-later
@img/sharp-libvips-linuxmusl-x64 1.0.4 LGPL-3.0-or-later
@img/sharp-libvips-linuxmusl-x64 1.2.4 LGPL-3.0-or-later
@img/sharp-wasm32 0.33.5 Apache-2.0 AND LGPL-3.0-or-later AND MIT
@img/sharp-wasm32 0.34.5 Apache-2.0 AND LGPL-3.0-or-later AND MIT
@img/sharp-win32-arm64 0.34.5 Apache-2.0 AND LGPL-3.0-or-later
@img/sharp-win32-ia32 0.33.5 Apache-2.0 AND LGPL-3.0-or-later
@img/sharp-win32-ia32 0.34.5 Apache-2.0 AND LGPL-3.0-or-later
@img/sharp-win32-x64 0.33.5 Apache-2.0 AND LGPL-3.0-or-later
@img/sharp-win32-x64 0.34.5 Apache-2.0 AND LGPL-3.0-or-later
axe-core 4.10.3 MPL-2.0
dompurify 3.4.11 (MPL-2.0 OR Apache-2.0)
lightningcss 1.32.0 MPL-2.0
lightningcss-android-arm64 1.32.0 MPL-2.0
lightningcss-darwin-arm64 1.32.0 MPL-2.0
lightningcss-darwin-x64 1.32.0 MPL-2.0
lightningcss-freebsd-x64 1.32.0 MPL-2.0
lightningcss-linux-arm-gnueabihf 1.32.0 MPL-2.0
lightningcss-linux-arm64-gnu 1.32.0 MPL-2.0
lightningcss-linux-arm64-musl 1.32.0 MPL-2.0
lightningcss-linux-x64-gnu 1.32.0 MPL-2.0
lightningcss-linux-x64-musl 1.32.0 MPL-2.0
lightningcss-win32-arm64-msvc 1.32.0 MPL-2.0
lightningcss-win32-x64-msvc 1.32.0 MPL-2.0
Resolved Packages (11)
Package Version Original Resolved Source
@react-grab/cli 0.1.23 UNKNOWN MIT GitHub repo (github.com/aidenybai/react-grab README + LICENSE)
@react-grab/cli 0.1.29 UNKNOWN MIT GitHub repo (github.com/aidenybai/react-grab README + LICENSE)
@react-grab/mcp 0.1.29 UNKNOWN MIT GitHub repo (github.com/aidenybai/react-grab README + LICENSE)
codemirror-lang-elixir 4.0.0 UNKNOWN Apache-2.0 npm registry (registry.npmjs.org top-level license field)
element-source 0.0.3 UNKNOWN MIT GitHub repo (github.com/aidenybai/element-source LICENSE)
lezer-elixir 1.1.2 UNKNOWN Apache-2.0 npm registry (registry.npmjs.org top-level license field)
map-stream 0.1.0 UNKNOWN MIT npm registry (registry.npmjs.org top-level license field)
memorystream 0.3.1 UNKNOWN MIT npm registry (licenses array object [{type:MIT}])
valid-url 1.0.9 UNKNOWN MIT GitHub repo (raw LICENSE file github.com/ogt/valid-url)
pause-stream 0.0.11 ["MIT","Apache2"] MIT extracted from object (license array ["MIT","Apache2"])
posthog-js 1.369.0 SEE LICENSE IN LICENSE MIT npm registry (registry.npmjs.org top-level license field)

….json

Re-source language model input-modality / document capabilities from the
models.dev catalog instead of hand-declared config.json fields, aligning
with the move to de-emphasize on-disk config in favor of automatic
resolution (the same catalog already backs context-window resolution).

- Revert the inputModalities/supportedDocumentTypes additions to
  schemas/v3/languageModel.json and all regenerated artifacts; capabilities
  are no longer declared in config.json.
- Extract the shared models.dev catalog plumbing (fetch/TTL/negative-cache/
  stale-while-revalidate/provider-id overrides) into modelsDevCatalog.server.ts,
  now consumed by both context-window and capability resolution.
- Add models.dev-backed resolveModelCapabilities (modelCapabilities.server.ts),
  partitioning the catalog's modalities.input list into Sourcebot's
  inputModalities (channels) and supportedDocumentTypes (containers); falls back
  to text-only for uncatalogued / self-hosted models.
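One way to picture a single catalog lookup feeding both context-window and capability resolution, with a provider-id override applied first. The override entry, the function names, and the `limit.context` / `modalities.input` field names are assumptions about the models.dev payload, not code from modelsDevCatalog.server.ts.

```typescript
// Assumed subset of a models.dev model entry; field names are assumptions.
interface ModelEntry {
  limit?: { context?: number };
  modalities?: { input?: string[] };
}
type Catalog = Record<string, { models?: Record<string, ModelEntry> }>;

// Hypothetical override: Sourcebot provider id -> models.dev provider id.
const PROVIDER_ID_OVERRIDES = new Map<string, string>([
  ["example-sourcebot-provider", "example-catalog-provider"],
]);

// Single lookup shared by every resolver.
function findModelEntry(catalog: Catalog, provider: string, model: string): ModelEntry | undefined {
  const catalogProvider = PROVIDER_ID_OVERRIDES.get(provider) ?? provider;
  return catalog[catalogProvider]?.models?.[model];
}

function resolveContextWindow(catalog: Catalog, provider: string, model: string): number | undefined {
  return findModelEntry(catalog, provider, model)?.limit?.context;
}

function resolveRawInputList(catalog: Catalog, provider: string, model: string): string[] {
  // Fail closed: uncatalogued models are treated as text-only.
  return findModelEntry(catalog, provider, model)?.modalities?.input ?? ["text"];
}
```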

The client-safe LanguageModelInfo contract is unchanged; only the resolution
backend moved.

Co-authored-by: Cursor <cursoragent@cursor.com>
@whoisthey whoisthey changed the title feat(web): add language model inputModalities capability plumbing refactor(web): resolve language model capabilities from models.dev Jun 27, 2026
jsourcebot previously approved these changes Jun 27, 2026
Comment thread packages/web/src/features/chat/types.ts Outdated
brendan-kellam previously approved these changes Jun 27, 2026
Comment thread packages/web/src/features/chat/types.ts Outdated
@whoisthey whoisthey dismissed stale reviews from brendan-kellam and jsourcebot via bf79260 June 27, 2026 19:57
@whoisthey whoisthey merged commit d546511 into main Jun 27, 2026
9 checks passed

3 participants