refactor(web): resolve language model capabilities from models.dev #1372
Add an optional `inputModalities` declaration to language model config and expose a resolved capability set to the client.

- Schema: add optional `inputModalities` (`text` | `image` | `pdf`) to every provider definition in `schemas/v3/languageModel.json` and regenerate the schema types/snippets.
- Add a fail-closed `resolveModelInputModalities` resolver that defaults to text-only when a model does not declare its input modalities.
- Expose the resolved `inputModalities` on the client-safe `LanguageModelInfo` (populated via `getConfiguredLanguageModelsInfo` and the MCP ask path).

This is groundwork for chat file attachments. It adds no attachment UI and no live provider capability probing yet.

Co-authored-by: Cursor <cursoragent@cursor.com>
Walkthrough

Language-model capability fields were added to schema and type contracts, mirrored in generated JSON/docs, and propagated into chat metadata plus MCP askCodebase/server payloads. The web code resolves omitted modality and document-type values to text-only and no-document defaults.

Changes

- Type contracts
- Schema contracts
- Published artifacts
- Capability helpers and tooling
- Payload wiring
Sequence Diagram(s)

sequenceDiagram
participant modelCapabilities as "packages/web/src/features/chat/modelCapabilities.ts"
participant utilsServer as "packages/web/src/features/chat/utils.server.ts"
participant askCodebase as "packages/web/src/ee/features/mcp/askCodebase.ts"
utilsServer->>modelCapabilities: resolveModelInputModalities(languageModelConfig)
utilsServer->>modelCapabilities: resolveModelSupportedDocumentTypes(languageModelConfig)
askCodebase->>modelCapabilities: resolveModelInputModalities(languageModelConfig)
askCodebase->>modelCapabilities: resolveModelSupportedDocumentTypes(languageModelConfig)
utilsServer->>utilsServer: add inputModalities and supportedDocumentTypes to LanguageModelInfo
askCodebase->>askCodebase: attach inputModalities and supportedDocumentTypes to AskCodebaseResult.languageModel
Estimated code review effort: 3 (Moderate), ~25 minutes
Co-authored-by: Cursor <cursoragent@cursor.com>
inputModalities now only enumerates true perceptual channels (text | image | audio | video). Document/container formats like PDF move to a separate fail-closed `supportedDocumentTypes` field, since PDF is not a model modality but a format providers decompose into text/image internally. Co-authored-by: Cursor <cursoragent@cursor.com>
Tighten the inputModalities / supportedDocumentTypes descriptions to remove the implication that omitting supportedDocumentTypes blocks all non-text attachments. Clarify the taxonomy: single-medium files (images, audio, video) and plain-text files (.txt, .md) are governed by inputModalities; supportedDocumentTypes only gates rich compound container formats like PDF. Co-authored-by: Cursor <cursoragent@cursor.com>
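The taxonomy in the two commits above can be sketched as a small gating check. This is a minimal illustration, not the project's code: the `ModelCapabilities` shape, the `isAttachmentAllowed` helper, and the MIME-type mapping are all assumptions made here; only the field names `inputModalities` and `supportedDocumentTypes` come from the commit messages.

```typescript
// Hypothetical types for illustration; only the two field names come from the PR.
type InputModality = "text" | "image" | "audio" | "video";
type DocumentType = "pdf";

interface ModelCapabilities {
    inputModalities: InputModality[];
    supportedDocumentTypes: DocumentType[];
}

// Assumed MIME-based mapping; not taken from the Sourcebot codebase.
function isAttachmentAllowed(mimeType: string, caps: ModelCapabilities): boolean {
    // Rich compound containers are gated by supportedDocumentTypes.
    if (mimeType === "application/pdf") {
        return caps.supportedDocumentTypes.includes("pdf");
    }
    // Single-medium and plain-text files are gated by inputModalities.
    const [topLevel] = mimeType.split("/");
    switch (topLevel) {
        case "text":
            return caps.inputModalities.includes("text");
        case "image":
            return caps.inputModalities.includes("image");
        case "audio":
            return caps.inputModalities.includes("audio");
        case "video":
            return caps.inputModalities.includes("video");
        default:
            // Fail closed on anything unrecognized.
            return false;
    }
}
```

Under this sketch, a model with an empty `supportedDocumentTypes` still accepts `.txt` or `.md` files as long as `text` is among its input modalities, which is the distinction the tightened descriptions are meant to make.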
LanguageModelInfo now has required inputModalities/supportedDocumentTypes, so a raw LanguageModel config (where those are optional) is no longer assignable to it. getLanguageModelKey only reads provider/model/displayName, so type its parameter as that Pick subset, letting both LanguageModel and LanguageModelInfo be keyed. Fixes the docker build type check. Co-authored-by: Cursor <cursoragent@cursor.com>
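The typing fix described above can be illustrated roughly as follows. The two interfaces are simplified stand-ins (the real `LanguageModel` is generated from the schema and covers many providers), and the key format is a hypothetical choice made for this sketch; only the idea of narrowing the parameter to a `Pick` of `provider`, `model`, and `displayName` comes from the commit message.

```typescript
// Simplified stand-ins for the real types (assumptions for illustration).
interface LanguageModel {
    provider: string;
    model: string;
    displayName?: string;
    inputModalities?: string[];        // optional in raw config
    supportedDocumentTypes?: string[]; // optional in raw config
}

interface LanguageModelInfo {
    provider: string;
    model: string;
    displayName?: string;
    inputModalities: string[];         // required once resolved
    supportedDocumentTypes: string[];  // required once resolved
}

// Only the fields the key actually reads.
type LanguageModelKeyFields = Pick<LanguageModelInfo, "provider" | "model" | "displayName">;

// Hypothetical key format; the real implementation may differ.
function getLanguageModelKey(model: LanguageModelKeyFields): string {
    return `${model.provider}/${model.model}/${model.displayName ?? ""}`;
}

const rawConfig: LanguageModel = { provider: "openai", model: "gpt-4o" };
const resolved: LanguageModelInfo = {
    provider: "openai",
    model: "gpt-4o",
    inputModalities: ["text"],
    supportedDocumentTypes: [],
};

// Both shapes satisfy the narrowed parameter type.
const keyFromConfig = getLanguageModelKey(rawConfig);
const keyFromInfo = getLanguageModelKey(resolved);
```

Typing the parameter by what the function reads, rather than by a full model type, keeps the helper usable from both the config side and the client-safe side without casts.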
Two dev-experience fixes for the stale-build-output footgun:

- schemas watch now runs `yarn build` (generate + tsc) instead of generate-only, so editing a schema JSON during `yarn dev` refreshes dist (both the .d.ts types and the runtime index.schema.js used by ajv), not just the generated source.
- web tsconfig maps @sourcebot/schemas/v3|v2/* to the package source, so type-checking and the IDE read committed source directly instead of stale built .d.ts. Web only imports .type files (erased at compile), so there is no bundling/runtime impact.

Co-authored-by: Cursor <cursoragent@cursor.com>
….json

Re-source language model input-modality / document capabilities from the models.dev catalog instead of hand-declared config.json fields, aligning with the move to de-emphasize on-disk config in favor of automatic resolution (the same catalog already backs context-window resolution).

- Revert the inputModalities/supportedDocumentTypes additions to schemas/v3/languageModel.json and all regenerated artifacts; capabilities are no longer declared in config.json.
- Extract the shared models.dev catalog plumbing (fetch/TTL/negative-cache/stale-while-revalidate/provider-id overrides) into modelsDevCatalog.server.ts, now consumed by both context-window and capability resolution.
- Add models.dev-backed resolveModelCapabilities (modelCapabilities.server.ts), partitioning the catalog's modalities.input list into Sourcebot's inputModalities (channels) and supportedDocumentTypes (containers); falls back to text-only for uncatalogued / self-hosted models.

The client-safe LanguageModelInfo contract is unchanged; only the resolution backend moved.

Co-authored-by: Cursor <cursoragent@cursor.com>
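The caching behaviors named in this commit (TTL, negative cache, stale-while-revalidate) can be sketched generically as below. This is not the contents of `modelsDevCatalog.server.ts`: `createCatalogCache`, its options, and the `Catalog` type are hypothetical names, and the fetcher and clock are injected so the sketch stays independent of any real endpoint.

```typescript
// Hypothetical catalog shape; the real models.dev payload is richer.
type Catalog = Record<string, unknown>;

interface CatalogCacheOptions {
    fetchCatalog: () => Promise<Catalog>; // injected fetcher (assumption)
    ttlMs: number;                        // how long a good result stays fresh
    negativeTtlMs: number;                // how long to back off after a failure
    now?: () => number;                   // injectable clock
}

function createCatalogCache(opts: CatalogCacheOptions): () => Promise<Catalog | undefined> {
    const now = opts.now ?? Date.now;
    let cached: Catalog | undefined;
    let fetchedAt = 0;
    let failedAt: number | undefined;
    let inFlight: Promise<Catalog | undefined> | undefined;

    // Start a fetch unless one is already running; never rejects.
    const refresh = (): Promise<Catalog | undefined> => {
        if (!inFlight) {
            inFlight = opts.fetchCatalog().then(
                (catalog) => {
                    cached = catalog;
                    fetchedAt = now();
                    failedAt = undefined;
                    inFlight = undefined;
                    return cached;
                },
                () => {
                    failedAt = now();
                    inFlight = undefined;
                    return cached; // keep serving the last good copy, if any
                },
            );
        }
        return inFlight;
    };

    return async function getCatalog(): Promise<Catalog | undefined> {
        const t = now();
        // Negative cache: do not retry shortly after a failed fetch.
        if (failedAt !== undefined && t - failedAt < opts.negativeTtlMs) {
            return cached;
        }
        // Cold start: nothing to serve, so wait for the fetch.
        if (cached === undefined) {
            return refresh();
        }
        // Stale-while-revalidate: serve the old copy, refresh in the background.
        if (t - fetchedAt >= opts.ttlMs) {
            void refresh();
        }
        return cached;
    };
}
```

A caller that receives `undefined` (cold start with the catalog unreachable) would be expected to fall back to its defaults, which matches the text-only fallback described for uncatalogued models.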
bf79260
Lays the groundwork for chat file attachments by teaching Sourcebot which inputs a configured language model can accept, resolved automatically from the models.dev catalog (the same catalog that already backs context-window resolution).
What this adds
- `resolveModelCapabilities` (`packages/web/src/features/chat/modelCapabilities.server.ts`) looks up a model's `modalities.input` in the models.dev catalog and partitions it into two buckets:
  - `inputModalities` (`text` | `image` | `audio` | `video`): perceptual channels the model encodes natively.
  - `supportedDocumentTypes` (`pdf`): rich compound container formats that providers decompose server-side. models.dev folds `pdf` into `modalities.input`; we split it back out, since a document is not a modality.
- Extracts the shared models.dev catalog plumbing into `modelsDevCatalog.server.ts` and backs both context-window and capability resolution off a single cached fetch.
- Exposes the resolved capabilities on the client-safe `LanguageModelInfo` (via `getConfiguredLanguageModelsInfo` and the MCP ask path).

This is pure capability plumbing. It adds no attachment UI (future work).
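The partitioning step described above might look roughly like the following. It is a sketch under stated assumptions: `partitionCatalogInputs` and `ResolvedCapabilities` are hypothetical names, the input is assumed to be the plain string list from the catalog's `modalities.input`, and unrecognized values are simply dropped here, which the PR does not specify.

```typescript
type InputModality = "text" | "image" | "audio" | "video";
type DocumentType = "pdf";

// Hypothetical result shape mirroring the two fields named in the PR.
interface ResolvedCapabilities {
    inputModalities: InputModality[];
    supportedDocumentTypes: DocumentType[];
}

const CHANNELS: readonly string[] = ["text", "image", "audio", "video"];
const DOCUMENTS: readonly string[] = ["pdf"];

// `catalogInputs` is undefined when the model is not in the catalog.
function partitionCatalogInputs(catalogInputs: readonly string[] | undefined): ResolvedCapabilities {
    // Fail closed: uncatalogued models are treated as text-only.
    if (!catalogInputs || catalogInputs.length === 0) {
        return { inputModalities: ["text"], supportedDocumentTypes: [] };
    }
    return {
        // Perceptual channels stay in inputModalities.
        inputModalities: catalogInputs.filter(
            (m): m is InputModality => CHANNELS.includes(m),
        ),
        // Container formats such as pdf are split out.
        supportedDocumentTypes: catalogInputs.filter(
            (m): m is DocumentType => DOCUMENTS.includes(m),
        ),
    };
}

const visionModel = partitionCatalogInputs(["text", "image", "pdf"]);
const unknownModel = partitionCatalogInputs(undefined);
```

Here `visionModel` ends up with `text` and `image` as modalities and `pdf` as its only document type, while `unknownModel` gets the text-only fallback.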
Behavior
- No `config.json` changes.
- Uncatalogued models (e.g. `openai-compatible` / self-hosted endpoints) fall back to text-only with no document support: the model stays fully usable for normal chat, and richer attachments stay gated off until support can be positively confirmed.