modelcontextprotocol · maxisbey · Jun 29, 2026 · Jun 28, 2026 · Jun 29, 2026 · claude
diff --git a/docs/advanced/caching.md b/docs/advanced/caching.md
@@ -0,0 +1,64 @@
+# Caching hints
+
+Every result a server returns for `tools/list`, `prompts/list`, `resources/list`, `resources/templates/list`, `resources/read` and `server/discover` carries two fields on the 2026-07-28 protocol: `ttlMs`, how many milliseconds a client may treat the result as fresh, and `cacheScope`, whether a cached result may be shared across users (`"public"`) or belongs to one authorization context (`"private"`).
+
+The server doesn't cache anything. The fields are a *declaration*: "this tool list is the same for everyone and won't change for a minute." A client (or a gateway in front of you) may then skip the round trip. Honoring the hints is the client's choice; emitting them is the server's job, and the SDK does it for you.
+
+Out of the box every result says `ttlMs: 0, cacheScope: "private"` — immediately stale, never shared. That is always safe and always conformant. If your lists really are stable and identical for all callers, say so at construction:
+
+```python title="server.py" hl_lines="5-8"
+--8<-- "docs_src/caching/tutorial001.py"
+```
+
+* The map is keyed by **method name**, and the six cacheable methods are the only legal keys. The parameter is typed `Mapping[CacheableMethod, CacheHint]`, so your editor autocompletes the keys and flags a typo before you run; anything that slips past the type checker raises at construction: a `ValueError` for an unknown key, a `TypeError` for a value that is not a `CacheHint`.
+* A method you don't mention keeps the defaults. The map is a set of overrides, not a manifest.
+* `CacheHint(ttl_ms=5_000)` left `scope` unset, so it stays `"private"`: five seconds of freshness, per caller. Scope and TTL are independent decisions.
+* `"server/discover"` is a legal key too — the handshake result is cacheable like any list.
+
+!!! warning
+    `cacheScope: "public"` means *anyone* may be served your cached response — a shared
+    gateway will happily hand one user's result to another, even when the request was
+    authenticated. Mark a result `"public"` only when it is identical for every caller, and
+    never use `cacheScope` as access control: it is a label, not a lock.
+
+## Per-handler override
+
+On the low-level `Server`, handlers build their results by hand — and `ttl_ms` / `cache_scope` are just fields on the result models. A handler that sets them explicitly always wins over the constructor map, field by field:
+
+```python title="server.py" hl_lines="11 17"
+--8<-- "docs_src/caching/tutorial002.py"
+```
+
+The handler said `ttl_ms=1_000` and nothing about scope. On the wire: `ttlMs: 1000` (the handler's, not the map's `60_000`) and `cacheScope: "public"` (the map's — the handler left it unset). Explicit beats configured, configured beats default — per field, so a handler can pin one field and leave the other to the server-wide policy.
+
+This is also the escape hatch for dynamics the constructor can't know: a handler that filters `resources/read` per user can return `cache_scope="private"` for one URI from an otherwise-public server.
+
+One caveat on paginated lists: the protocol requires the **same `cacheScope` on every page** of one list. The constructor map satisfies that by construction — it's keyed by method, not by page. But a handler that overrides the scope itself owns that consistency: override it on *every* page, never only when a cursor is present, or page one and page two will disagree.
+
+## What the client sees
+
+On the client, the hints arrive as plain fields on every cacheable result — `ttl_ms` and `cache_scope`, already parsed:
+
+```python title="client.py" hl_lines="15"
+--8<-- "docs_src/caching/tutorial003.py"
+```
+
+The SDK parses; it does not (yet) act. There is no built-in response cache: calling `list_tools()` twice makes two round trips, whatever the TTL said. The spec makes honoring optional — a client that ignores the hints entirely is fully conformant — so until the SDK grows a response cache, the supported path is to read the fields and do your own bookkeeping:
+
+* **Freshness** is `now < t_received + ttl_ms / 1000`, with both clock readings in seconds: record the clock when the response arrives, and treat the result as reusable until the TTL runs out. `ttl_ms == 0` means *immediately stale*, so don't reuse it at all.
+* **Scope is a sharing rule, not a suggestion.** A `"private"` result may be reused only within the same authorization context — same access token, same cache. Never put `"private"` results in a cache shared across users.
+* **Notifications beat TTL.** If the server sends a `list_changed` notification for a list while your cached copy is still fresh, the copy is stale now, so re-fetch.
+
+Against an **older server** (pre-2026 protocol), the fields are simply absent from the wire, and the models show their conservative defaults: `ttl_ms == 0`, `cache_scope == "private"` — stale and unshared, the right assumption for a server that declared nothing. If you need to distinguish "the server said 0" from "the server said nothing", check `"ttl_ms" in result.model_fields_set`: it's only set when the field actually arrived.
+
+## Older clients
+
+Clients on pre-2026 protocol versions never see either field — the SDK strips them at serialization for those connections. Configure your hints once; there is nothing version-specific to write.
+
+## Recap
+
+* Six methods carry `ttlMs`/`cacheScope`; the SDK defaults them to `0`/`"private"` — stale and unshared, always safe.
+* `cache_hints={method: CacheHint(...)}` at construction (both `MCPServer` and `Server`) sets server-wide values per method.
+* A handler that sets the fields on its result overrides the map, per field.
+* `"public"` is a promise that the result is identical for every caller. It is not access control.
+* Clients read the hints as `result.ttl_ms` / `result.cache_scope` and own the caching decision themselves — the SDK has no built-in response cache yet.
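
Reviewer sketch for the pagination caveat in `docs/advanced/caching.md`: a handler that overrides `cache_scope` only when a cursor is present makes page one and later pages disagree. The helper names below (`scope_buggy`, `scope_consistent`, `wire_scope`, `scopes_agree`) are hypothetical and use plain strings rather than SDK result models; `None` stands for "the handler left the field unset", and the configured scope is assumed to be `"public"`.

```python
from __future__ import annotations

from typing import Callable, Optional


def scope_buggy(cursor: Optional[str]) -> Optional[str]:
    """Overrides the scope only on follow-up pages: the mistake to avoid."""
    return "private" if cursor is not None else None


def scope_consistent(cursor: Optional[str]) -> Optional[str]:
    """Overrides the scope on every page, with or without a cursor."""
    return "private"


def wire_scope(explicit: Optional[str], configured: str = "public") -> str:
    """An explicit handler value wins; otherwise the configured value applies."""
    return explicit if explicit is not None else configured


def scopes_agree(
    handler: Callable[[Optional[str]], Optional[str]],
    cursors: list[Optional[str]],
) -> bool:
    """True when every page of the list ends up with the same cacheScope."""
    return len({wire_scope(handler(cursor)) for cursor in cursors}) == 1


cursors = [None, "page-2", "page-3"]
print(scopes_agree(scope_buggy, cursors))       # False: page one is "public", the rest "private"
print(scopes_agree(scope_consistent, cursors))  # True: "private" on every page
```

A check along these lines could sit in a handler's own test suite, since the constructor map cannot catch a per-page override.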
diff --git a/docs/migration.md b/docs/migration.md
@@ -1598,7 +1598,7 @@ The implementation is responsible for validating the assertion per RFC 7523 §3
 
 ### 2025-11-25 and 2026-07-28 protocol fields modeled
 
-`mcp_types` models the 2025-11-25 and 2026-07-28 protocol fields (e.g. `resultType`, `ttlMs`/`cacheScope` on cacheable results, `inputResponses`/`requestState` on retried requests), so inbound payloads carrying these keys parse into typed fields and round-trip. `ttlMs`/`cacheScope` default to `0`/`"private"` (immediately stale, not shared-cacheable); `resultType` defaults to `"complete"` on concrete results (`None` on `EmptyResult`); the server strips all of them from the wire at pre-2026 versions.
+`mcp_types` models the 2025-11-25 and 2026-07-28 protocol fields (e.g. `resultType`, `ttlMs`/`cacheScope` on cacheable results, `inputResponses`/`requestState` on retried requests), so inbound payloads carrying these keys parse into typed fields and round-trip. `ttlMs`/`cacheScope` default to `0`/`"private"` (immediately stale, not shared-cacheable); `resultType` defaults to `"complete"` on concrete results (`None` on `EmptyResult`); the server strips all of them from the wire at pre-2026 versions. Servers set per-method values with `cache_hints={method: CacheHint(...)}` on the `Server`/`MCPServer` constructor — see [Caching hints](advanced/caching.md).
 
 ### `streamable_http_app()` available on lowlevel Server
 

diff --git a/docs_src/caching/__init__.py b/docs_src/caching/__init__.py
diff --git a/docs_src/caching/tutorial001.py b/docs_src/caching/tutorial001.py
@@ -0,0 +1,19 @@
+from mcp.server import CacheHint, MCPServer
+
+mcp = MCPServer(
+    "Weather",
+    cache_hints={
+        "tools/list": CacheHint(ttl_ms=60_000, scope="public"),
+        "resources/read": CacheHint(ttl_ms=5_000),
+    },
+)
+
+
+@mcp.tool()
+def forecast(city: str) -> str:
+    return f"Sunny in {city}"
+
+
+@mcp.resource("config://units")
+def units() -> str:
+    return "metric"
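
Reviewer sketch for `tutorial001.py`: the constructor map overrides only the methods it names, so the other four keep the defaults. This is a standard-library illustration of the resulting per-method hints, assuming the documented defaults (`ttl_ms=0`, `scope="private"`); `Hint` and `effective_hints` are hypothetical stand-ins, not SDK names.

```python
from dataclasses import dataclass

# The six cacheable methods listed in src/mcp/server/caching.py.
CACHEABLE_METHODS = (
    "prompts/list",
    "resources/list",
    "resources/read",
    "resources/templates/list",
    "server/discover",
    "tools/list",
)


@dataclass(frozen=True)
class Hint:
    """Hypothetical stand-in for CacheHint, with the same defaults."""

    ttl_ms: int = 0
    scope: str = "private"


def effective_hints(overrides: dict) -> dict:
    """Return the hint each method ends up with: the override, else the default."""
    return {method: overrides.get(method, Hint()) for method in CACHEABLE_METHODS}


hints = effective_hints(
    {
        "tools/list": Hint(ttl_ms=60_000, scope="public"),
        "resources/read": Hint(ttl_ms=5_000),  # scope left unset, stays "private"
    }
)
for method, hint in hints.items():
    print(f"{method}: ttlMs={hint.ttl_ms} cacheScope={hint.scope}")
```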
diff --git a/docs_src/caching/tutorial002.py b/docs_src/caching/tutorial002.py
@@ -0,0 +1,18 @@
+from typing import Any
+
+from mcp_types import ListToolsResult, PaginatedRequestParams, Tool
+
+from mcp.server import CacheHint, Server, ServerRequestContext
+
+TOOLS = [Tool(name="forecast", input_schema={"type": "object"})]
+
+
+async def list_tools(ctx: ServerRequestContext[Any], params: PaginatedRequestParams | None) -> ListToolsResult:
+    return ListToolsResult(tools=TOOLS, ttl_ms=1_000)
+
+
+server = Server(
+    "Weather",
+    on_list_tools=list_tools,
+    cache_hints={"tools/list": CacheHint(ttl_ms=60_000, scope="public")},
+)
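
Reviewer sketch for `tutorial002.py`: the "explicit beats configured, configured beats default" rule, resolved per field. The real code tracks explicitly set fields through pydantic's `model_fields_set`; this standard-library version assumes a plain dict of the fields the handler set plays that role. `Hint`, `DEFAULTS` and `resolve` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Values a result carries when neither the handler nor the map says anything.
DEFAULTS = {"ttl_ms": 0, "cache_scope": "private"}


@dataclass(frozen=True)
class Hint:
    """Hypothetical stand-in for CacheHint."""

    ttl_ms: int = 0
    scope: str = "private"


def resolve(explicit: dict, hint: Optional[Hint]) -> dict:
    """Merge the three layers; `explicit` holds only fields the handler set."""
    configured = {} if hint is None else {"ttl_ms": hint.ttl_ms, "cache_scope": hint.scope}
    # Later mappings win: defaults < configured < explicit.
    return {**DEFAULTS, **configured, **explicit}


server_hint = Hint(ttl_ms=60_000, scope="public")

# The tutorial's handler: ttl_ms set explicitly, scope left to the map.
print(resolve({"ttl_ms": 1_000}, server_hint))  # {'ttl_ms': 1000, 'cache_scope': 'public'}

# A handler that explicitly sets a field to its default value still wins.
print(resolve({"cache_scope": "private"}, server_hint))  # {'ttl_ms': 60000, 'cache_scope': 'private'}
```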
diff --git a/docs_src/caching/tutorial003.py b/docs_src/caching/tutorial003.py
@@ -0,0 +1,15 @@
+from mcp import Client
+from mcp.server import CacheHint, MCPServer
+
+mcp = MCPServer("Weather", cache_hints={"tools/list": CacheHint(ttl_ms=60_000, scope="public")})
+
+
+@mcp.tool()
+def forecast(city: str) -> str:
+    return f"Sunny in {city}"
+
+
+async def main() -> None:
+    async with Client(mcp) as client:
+        tools = await client.list_tools()
+        print(f"{len(tools.tools)} tools, fresh for {tools.ttl_ms / 1000:.0f}s, scope={tools.cache_scope}")
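
Reviewer sketch for `tutorial003.py`: since the SDK has no built-in response cache yet, the "do your own bookkeeping" advice in the docs could look roughly like this. `HintCache` and its methods are hypothetical, the `auth_context` string is assumed to identify one authorization context (for example an access token), and nothing here is SDK API.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional


class HintCache:
    """Hypothetical client-side cache following the three rules in the docs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, expiry time in clock seconds)
        self._entries: dict = {}

    @staticmethod
    def _key(method: str, scope: str, auth_context: str) -> tuple:
        # "public" results share one slot; "private" ones are keyed per caller.
        return (method, "public", None) if scope == "public" else (method, "private", auth_context)

    def store(self, method: str, auth_context: str, value: Any, ttl_ms: int, scope: str) -> None:
        if ttl_ms <= 0:
            return  # immediately stale: never reuse
        expires_at = self._clock() + ttl_ms / 1000
        self._entries[self._key(method, scope, auth_context)] = (value, expires_at)

    def lookup(self, method: str, auth_context: str) -> Optional[Any]:
        now = self._clock()
        for scope in ("public", "private"):
            entry = self._entries.get(self._key(method, scope, auth_context))
            if entry is not None and now < entry[1]:
                return entry[0]
        return None  # miss or stale: make the round trip

    def invalidate(self, method: str) -> None:
        """Call when a list_changed notification arrives for `method`."""
        for key in [k for k in self._entries if k[0] == method]:
            del self._entries[key]


now = [0.0]  # fake clock so the example is deterministic
cache = HintCache(clock=lambda: now[0])
cache.store("tools/list", "token-a", ["forecast"], ttl_ms=60_000, scope="public")
print(cache.lookup("tools/list", "token-b"))  # ['forecast'], public results are shared
now[0] = 61.0
print(cache.lookup("tools/list", "token-b"))  # None, the 60 s TTL has run out
```

The injectable clock is a design choice for testability; in real use the default `time.monotonic` avoids surprises from wall-clock adjustments.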
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -43,6 +43,7 @@ nav:
       - The low-level Server: advanced/low-level-server.md
       - URI templates: advanced/uri-templates.md
       - Pagination: advanced/pagination.md
+      - Caching hints: advanced/caching.md
       - Middleware: advanced/middleware.md
       - Extensions: advanced/extensions.md
       - MCP Apps: advanced/apps.md

diff --git a/src/mcp/server/__init__.py b/src/mcp/server/__init__.py
@@ -1,6 +1,7 @@
+from .caching import CacheHint
 from .context import ServerRequestContext
 from .lowlevel import NotificationOptions, Server
 from .mcpserver import MCPServer
 from .models import InitializationOptions
 
-__all__ = ["Server", "ServerRequestContext", "MCPServer", "NotificationOptions", "InitializationOptions"]
+__all__ = ["CacheHint", "Server", "ServerRequestContext", "MCPServer", "NotificationOptions", "InitializationOptions"]
diff --git a/src/mcp/server/caching.py b/src/mcp/server/caching.py
@@ -0,0 +1,98 @@
+"""Server-side caching hints (SEP-2549, protocol revision 2026-07-28).
+
+Results for the cacheable methods carry `ttlMs`/`cacheScope` freshness hints.
+A handler sets them by returning a result with explicit `ttl_ms`/`cache_scope`
+values; `Server(cache_hints={method: CacheHint(...)})` fills them for handlers
+that don't. Fields the handler set win, per field, so a server-wide hint never
+overrides a handler's explicit choice.
+"""
+
+from __future__ import annotations
+
+from collections.abc import Mapping
+from dataclasses import dataclass
+from typing import Any, Final, Literal, TypeVar, get_args
+
+import mcp_types as types
+
+__all__ = ["CACHEABLE_METHODS", "CacheHint", "CacheableMethod", "apply_cache_hint", "validate_cache_hints"]
+
+CacheableMethod = Literal[
+    "prompts/list",
+    "resources/list",
+    "resources/read",
+    "resources/templates/list",
+    "server/discover",
+    "tools/list",
+]
+"""The methods whose results carry `ttlMs`/`cacheScope`. Closed set: the spec
+defines caching hints on exactly these six (tests pin it to the result models
+that mix in `CacheableResult`)."""
+
+CACHEABLE_METHODS: Final[frozenset[str]] = frozenset(get_args(CacheableMethod))
+"""Runtime mirror of `CacheableMethod`, for callers the type checker can't see."""
+
+
+@dataclass(frozen=True, slots=True)
+class CacheHint:
+    """Freshness hint for one cacheable method's results.
+
+    `ttl_ms` is how long, in milliseconds, a client may consider the result
+    fresh (`0` means immediately stale). `scope` is whether a cached result may
+    be shared across authorization contexts (`"public"`) or only reused within
+    the one that produced it (`"private"`).
+    """
+
+    ttl_ms: int = 0
+    scope: Literal["public", "private"] = "private"
+
+    def __post_init__(self) -> None:
+        if self.ttl_ms < 0:
+            raise ValueError(f"ttl_ms must be >= 0, got {self.ttl_ms}")
+        if self.scope not in ("public", "private"):
+            raise ValueError(f"scope must be 'public' or 'private', got {self.scope!r}")
+
+
+CacheableResultT = TypeVar("CacheableResultT", bound=types.CacheableResult)
+
+
+def apply_cache_hint(result: CacheableResultT, hint: CacheHint) -> CacheableResultT:
+    """Fill `ttl_ms`/`cache_scope` on `result` from `hint`.
+
+    Per-field: a field the handler set explicitly - even to its default value,
+    tracked via `model_fields_set` - is left alone; only unset fields take the
+    hint. A handler constructing results with `model_construct` bypasses that
+    tracking and is treated as having set nothing.
+    """
+    update: dict[str, int | str] = {}
+    if "ttl_ms" not in result.model_fields_set:
+        update["ttl_ms"] = hint.ttl_ms
+    if "cache_scope" not in result.model_fields_set:
+        update["cache_scope"] = hint.scope
+    return result.model_copy(update=update) if update else result
+
+
+def validate_cache_hints(cache_hints: Mapping[Any, Any] | None) -> dict[str, CacheHint]:
+    """Validate a `cache_hints` constructor argument into a plain dict.
+
+    The `Server`/`MCPServer` signatures already close the key set and value
+    type for type-checked callers; this runtime gate is deliberately loose in
+    its parameter so it covers everyone else (e.g. a map deserialized from
+    config) - a bad entry fails at construction, not on the first request to
+    that method.
+
+    Raises:
+        ValueError: If a key is not a cacheable method.
+        TypeError: If a value is not a `CacheHint`.
+    """
+    if cache_hints is None:
+        return {}
+    unknown = sorted(str(method) for method in cache_hints if method not in CACHEABLE_METHODS)
+    if unknown:
+        raise ValueError(f"cache_hints keys must be cacheable methods (see CacheableMethod); got: {', '.join(unknown)}")
+    validated: dict[str, CacheHint] = {}
+    for method, hint in cache_hints.items():
+        if not isinstance(hint, CacheHint):
+            raise TypeError(f"cache_hints[{method!r}] must be a CacheHint, got {type(hint).__name__}")
+        validated[method] = hint
+    return validated
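
Reviewer sketch for `caching.py`: what the eager validation in `CacheHint.__post_init__` looks like from a caller's side. The class below is a trimmed copy of the one in this diff (without `slots=True`, and with `scope` typed as a plain `str`) so the snippet runs without the SDK installed.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class CacheHint:
    """Trimmed copy of the class in the diff, for illustration only."""

    ttl_ms: int = 0
    scope: str = "private"

    def __post_init__(self) -> None:
        if self.ttl_ms < 0:
            raise ValueError(f"ttl_ms must be >= 0, got {self.ttl_ms}")
        if self.scope not in ("public", "private"):
            raise ValueError(f"scope must be 'public' or 'private', got {self.scope!r}")


# Bad values fail when the hint is built, not on the first request.
rejected = []
for kwargs in ({"ttl_ms": -1}, {"scope": "shared"}):
    try:
        CacheHint(**kwargs)
    except ValueError as exc:
        rejected.append(str(exc))
        print(f"rejected {kwargs}: {exc}")

hint = CacheHint(ttl_ms=5_000)
try:
    hint.ttl_ms = 0  # frozen dataclass: assignment after construction fails
except FrozenInstanceError:
    print("hints are immutable once built")
```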
diff --git a/src/mcp/server/lowlevel/server.py b/src/mcp/server/lowlevel/server.py
@@ -38,7 +38,7 @@ async def main():
 
 import logging
 import warnings
-from collections.abc import AsyncIterator, Awaitable, Callable
+from collections.abc import AsyncIterator, Awaitable, Callable, Mapping
 from contextlib import AbstractAsyncContextManager, asynccontextmanager
 from dataclasses import dataclass
 from importlib.metadata import version as importlib_version
@@ -59,6 +59,7 @@ async def main():
 from mcp.server.auth.provider import OAuthAuthorizationServerProvider, TokenVerifier
 from mcp.server.auth.routes import build_resource_metadata_url, create_auth_routes, create_protected_resource_routes
 from mcp.server.auth.settings import AuthSettings
+from mcp.server.caching import CacheableMethod, CacheHint, validate_cache_hints
 from mcp.server.context import HandlerResult, ServerMiddleware, ServerRequestContext
 from mcp.server.models import InitializationOptions
 from mcp.server.runner import serve_loop
@@ -140,6 +141,7 @@ def __init__(
         instructions: str | None = None,
         website_url: str | None = None,
         icons: list[types.Icon] | None = None,
+        cache_hints: Mapping[CacheableMethod, CacheHint] | None = None,
         lifespan: Callable[
             [Server[LifespanResultT]],
             AbstractAsyncContextManager[LifespanResultT],
@@ -222,6 +224,7 @@ def __init__(
         instructions: str | None = None,
         website_url: str | None = None,
         icons: list[types.Icon] | None = None,
+        cache_hints: Mapping[CacheableMethod, CacheHint] | None = None,
         lifespan: Callable[
             [Server[LifespanResultT]],
             AbstractAsyncContextManager[LifespanResultT],
@@ -313,6 +316,7 @@ def __init__(
         instructions: str | None = None,
         website_url: str | None = None,
         icons: list[types.Icon] | None = None,
+        cache_hints: Mapping[CacheableMethod, CacheHint] | None = None,
         lifespan: Callable[
             [Server[LifespanResultT]],
             AbstractAsyncContextManager[LifespanResultT],
@@ -420,6 +424,9 @@ def __init__(
         self.instructions = instructions
         self.website_url = website_url
         self.icons = icons
+        # Per-method `ttl_ms`/`cache_scope` fills, applied by `ServerRunner`
+        # after the handler returns; fields the handler set explicitly win.
+        self.cache_hints: dict[str, CacheHint] = validate_cache_hints(cache_hints)
         self.lifespan = lifespan
         self._request_handlers: dict[str, HandlerEntry[LifespanResultT]] = {}
         self._notification_handlers: dict[str, HandlerEntry[LifespanResultT]] = {}

diff --git a/src/mcp/server/mcpserver/server.py b/src/mcp/server/mcpserver/server.py
@@ -4,7 +4,7 @@
 
 import base64
 import inspect
-from collections.abc import AsyncIterator, Awaitable, Callable, Iterable, Sequence
+from collections.abc import AsyncIterator, Awaitable, Callable, Iterable, Mapping, Sequence
 from contextlib import AbstractAsyncContextManager, asynccontextmanager
 from typing import Any, Generic, Literal, TypeVar, overload
 
@@ -58,6 +58,7 @@
 from mcp.server.auth.middleware.bearer_auth import BearerAuthBackend, RequireAuthMiddleware
 from mcp.server.auth.provider import OAuthAuthorizationServerProvider, ProviderTokenVerifier, TokenVerifier
 from mcp.server.auth.settings import AuthSettings
+from mcp.server.caching import CacheableMethod, CacheHint
 from mcp.server.context import HandlerResult, ServerRequestContext
 from mcp.server.extension import (
     Extension,
@@ -169,6 +170,7 @@ def __init__(
         lifespan: Callable[[MCPServer[LifespanResultT]], AbstractAsyncContextManager[LifespanResultT]] | None = None,
         auth: AuthSettings | None = None,
         resource_security: ResourceSecurity = DEFAULT_RESOURCE_SECURITY,
+        cache_hints: Mapping[CacheableMethod, CacheHint] | None = None,
     ):
         self._resource_security = resource_security
         self.settings = Settings(
@@ -196,6 +198,7 @@ def __init__(
             website_url=website_url,
             icons=icons,
             version=version,
+            cache_hints=cache_hints,
             on_list_tools=self._handle_list_tools,
             on_call_tool=self._handle_call_tool,
             on_list_resources=self._handle_list_resources,

diff --git a/src/mcp/server/runner.py b/src/mcp/server/runner.py
@@ -28,6 +28,7 @@
     INVALID_PARAMS,
     METHOD_NOT_FOUND,
     PROTOCOL_VERSION_META_KEY,
+    CacheableResult,
     ErrorData,
     Implementation,
     InitializeRequestParams,
@@ -40,6 +41,7 @@
 from pydantic import BaseModel, ValidationError
 from typing_extensions import TypeVar
 
+from mcp.server.caching import apply_cache_hint
 from mcp.server.connection import Connection
 from mcp.server.context import CallNext, HandlerResult, ServerMiddleware, ServerRequestContext
 from mcp.server.models import InitializationOptions
@@ -196,6 +198,12 @@ async def _inner(ctx: ServerRequestContext[LifespanT, Any]) -> HandlerResult:
             if isinstance(result, ErrorData):
                 # Raise inside the chain so middleware observes the failure.
                 raise MCPError.from_error_data(result)
+            # Fill cache hints on the typed result, before the serialize sieve
+            # decides whether the negotiated version carries the fields at all.
+            # `input_required` interim results are not `CacheableResult` models,
+            # so the MRTR carve-out (no hints on them) holds by shape.
+            if isinstance(result, CacheableResult) and (hint := self.server.cache_hints.get(method)) is not None:
+                result = apply_cache_hint(result, hint)
             # Dump and serialize inside the chain so the OpenTelemetry span (the
             # outermost middleware) records a failing handler return shape too.
             return self._serialize(method, version, result)
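
Reviewer sketch for the ordering in `runner.py`: hints are filled on the typed result first, and serialization then decides whether the negotiated version carries them. The real `_serialize` is not shown in this diff, so the function below is an assumption about its observable effect only. `serialize_for_version` and the version constant are hypothetical; it relies on protocol versions being ISO dates, which compare correctly as strings.

```python
HINT_KEYS = ("ttlMs", "cacheScope")
FIRST_VERSION_WITH_HINTS = "2026-07-28"  # assumed from the docs in this PR


def serialize_for_version(result: dict, negotiated_version: str) -> dict:
    """Keep the hint fields only for versions that define them."""
    if negotiated_version >= FIRST_VERSION_WITH_HINTS:
        return dict(result)
    return {key: value for key, value in result.items() if key not in HINT_KEYS}


# Already filled from the constructor map, as in the hunk above.
filled = {"tools": [], "ttlMs": 60_000, "cacheScope": "public"}

print(serialize_for_version(filled, "2026-07-28"))  # hints stay on the wire
print(serialize_for_version(filled, "2025-11-25"))  # hints stripped for the older client
```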