Harden the dual-era stream loop's era-lock and rejection semantics#3040
Open
maxisbey wants to merge 3 commits into
Open
Harden the dual-era stream loop's era-lock and rejection semantics#3040maxisbey wants to merge 3 commits into
maxisbey wants to merge 3 commits into
Conversation
Post-merge review follow-up to #3038, fixing the cases where a failed request could still lock the connection's era: - The modern era now locks only when a request SUCCEEDS, mirroring the legacy side's lock-on-success rule. Previously the lock committed at classification time, so a first request with a well-formed envelope but malformed content - or subscriptions/listen, or an unknown method - locked the connection modern after failing; the client's initialize fallback then got -32022, the one code auto-negotiating clients do not fall back from, stranding the connection for its lifetime (a whole subprocess on stdio). - initialize is legacy-distinctive by definition (the method does not exist at modern versions), so it always takes the handshake path now, even when a confused client stamps the envelope triple on it. Previously it routed modern, failed METHOD_NOT_FOUND, and locked the era modern - bricking the handshake the same way. - A bare server/discover on a legacy-locked connection falls through to the loop runner's per-version surface validation again, restoring the byte-identical METHOD_NOT_FOUND a handshake-only server produced; only envelope-bearing frames get the INVALID_REQUEST rejection (a conforming legacy client can never send the reserved triple). - classify_inbound_request rejects a present-but-non-string protocol version as a descriptive INVALID_PARAMS instead of raising ValidationError out of its own -32022 payload construction (`requested` is a str field); the dispatcher's exception ladder happened to mask that as a generic -32602. - _NoServerRequestsDispatchContext masks transport.can_send_request so the per-message transport metadata agrees with the wrapper's denial, matching the modern HTTP entry's context. - NotifyOnlyOutbound inherits _NoChannelOutbound's refusal instead of duplicating it, and the has_standalone_channel / Raises contracts now document that a modern connection refuses requests on a real channel. - Docs: the migration guide and protocol-versions page note the discover probe is transport-independent; the subscriptions page documents the streamable-HTTP-only limitation of subscriptions/listen; serve_loop's stale shared-recipe claim is corrected. Also replaces two pyright suppressions in the stdio test doubles with typeshed-compatible Buffer signatures.
Contributor
📚 Documentation preview
|
There was a problem hiding this comment.
1 issue found across 9 files
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
The dual-era loop's lock-on-success rule approximated "client-visible success" with "the handler returned", leaving windows where a request the client saw fail still moved the era: a pipelined envelope-bearing request completing after an inline initialize overwrote the committed legacy lock (stranding the freshly-handshaked client), and a peer cancel landing during the handler's shielded teardown locked the era while the client received "Request cancelled". Era commits now run through a single era_settles predicate: monotone (the first success wins, a straggler from the other era can never move a committed lock) and skipped when a pending cancel is about to replace the response. Two modern-path divergences between the stream loop and the HTTP entry close by sharing one pipeline: Connection.from_envelope takes the classifier's raw envelope values and owns the tolerant coercion (custom methods now degrade mis-shaped clientInfo identically everywhere), and a shared modern_error_data boundary maps unmapped handler exceptions to a generic INTERNAL_ERROR instead of the legacy code-0 catch-all putting str(e) on the wire. The classifier's non-string protocol-version guard moves below the header rung, so the HTTP wire is untouched whenever the version header is present, while the guard still backstops the header-absent null-version corner that previously escaped as a ValidationError.
There was a problem hiding this comment.
1 issue found across 6 files (changes from recent commits).
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
A null body protocolVersion slipped the header equality check (None == None), so the absent-header outcome depended on the body value: an int gave HEADER_MISMATCH while null fell through to the string guard's INVALID_PARAMS. The rung now checks presence explicitly, making an absent version header uniformly HEADER_MISMATCH and the string guard reachable only on header-less transports. Also updates the public Server.run docstring to the first-success era-lock wording.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Post-merge review follow-up to #3038. An adversarial review pass over the dual-era driver found a family of cases where a request that failed could still lock the connection's era — and on stdio a connection is a whole subprocess, so a locked-then-stranded connection has no recovery short of a restart.
The era-lock fixes
The invariant the driver documents is: released auto-negotiating clients fall back on any error code except -32022, so a failed probe must leave the legacy handshake available. Three openers violated it:
clientInfo: 42): classification checks key presence only, so the era locked modern before pydantic rejected the content. The request failed with a fallback-eligible -32602, but the fallback then hit -32022 forever. The lock now commits only after the request succeeds, mirroring the legacy side's existing "lock only on success" rule.initializestamped with the envelope triple: routed modern (the triple won), failed METHOD_NOT_FOUND (initializedoesn't exist at 2026-07-28), and locked the era anyway.initializeis legacy-distinctive by definition, so it now always takes the handshake path regardless of decoration.subscriptions/listenas the opener: the not-served-on-this-transport rejection used to land after the lock; now, like every failed request, it locks nothing.No conforming client — released SDK or otherwise — produces any of these frames; the fixes are robustness against buggy third-party modern clients.
Compatibility restoration
A bare
server/discoveron a legacy-locked connection now falls through to the loop runner's per-version validation, restoring the byte-identical METHOD_NOT_FOUND (-32601, same message and data) a handshake-only server produced — #3038 had it rejecting with -32600. Only envelope-bearing frames get the cross-era INVALID_REQUEST; a conforming legacy client can never send the reserved triple, so legacy traffic is byte-identical again in full.Note for anyone bisecting client behavior: v2.0.0b1 ships the -32600 variant for this one frame shape (it carries #3038 but not this fix), so a legacy-locked bare
server/discoveranswers -32600 on b1 and -32601 before #3038 and from this PR onward.Smaller fixes
classify_inbound_requestrejects a present-but-non-string protocol version with a descriptive INVALID_PARAMS instead of raisingValidationErrorout of its own -32022 payload construction (requestedis astrfield). The dispatcher's exception ladder happened to mask that as a generic -32602 — same code, now deliberate, with a message that names the defect. Unreachable over HTTP (the header rung fires first), so the wire delta is stdio-only.transport.can_send_requestso per-message transport metadata agrees with its denial, matching the modern HTTP entry's context — closing a latent inconsistency before anything consumes the field.NotifyOnlyOutboundinherits the no-channel refusal instead of duplicating it; thehas_standalone_channel/Raisescontracts now document that a modern connection refuses requests on a real channel.subscriptions/listen; a staleserve_loopdocstring claim is corrected. Two pyright suppressions in the stdio test doubles are replaced with typeshed-compatibleBuffersignatures.Validation
Every fix has a pinning test (rejection codes, lock neutrality of each failing opener, the byte-identical legacy fall-through, the classifier guard, the transport mask). Beyond the suite (100% coverage), each formerly-stranding opener was driven against a live stdio subprocess and shown to leave the legacy handshake working on the same pipe, with the full original probe matrix (auto/legacy/pinned negotiation, pipelined probe ordering, -32022 round-trip) passing unchanged.
Round 2 — review follow-ups
Review found that the round-1 lock-on-success rule still approximated "client-visible success" with "the handler returned". Two windows violated the invariant, plus two cross-transport divergences on the modern path:
initializeoverwrote the committed legacy lock — bricking the freshly-handshaked client — and the mirror direction (a legacy commit clobbering a modern lock) was possible too. The era cell is now monotone: oneera_settlessettlement predicate with site-local atomic commits, first success wins, and a straggler from the other era can never move a committed lock. Its response still stands — it was already earned."Request cancelled", but the era had already locked. The settlement predicate now skips the lock when the request's cancel event is set, tying "success" to what the client actually observes. (A cancel racing a genuinely blocked response write can still slip through; that residue belongs to the dispatcher's existing respond-after-cancel divergence and dissolves with it.)str(e)on the wire — while the identical request over modern HTTP produced the generic-32603 Internal server error. The code-0 pin exists for v1 compat, which the modern era doesn't have. Both modern entries now share onemodern_error_databoundary; debugraise_exceptionsstill re-raises.clientInfo/clientCapabilitiesto not-supplied while the stream path rejected them atConnection.from_envelopeconstruction. For spec methods both transports already rejected identically (the kernel's per-version params surface types the reserved_metakeys strictly); custom methods diverged.from_envelopenow takes the classifier's raw values and owns the coercion, so every modern entry — HTTP, stream loop, andmodern_on_request— builds connection identity identically.None == Noneequality and make the absent-header outcome depend on the body value. An absent or disagreeing version header is uniformlyHEADER_MISMATCH, the HTTP wire is untouched wherever it was well-defined, and the string guard is reachable only on header-less transports, making the round-1 "stdio-only delta" claim above exactly true. (Through the shipped HTTP manager the era routing is header-only, so these cells harden the modern entry's own contract.) The publicServer.rundocstring also picked up the first-success wording.Each behavioral fix carries a regression test that failed on the round-1 code (the overwrite and cancelled-away tests reproduce the races deterministically), and the scenarios were re-driven against a live stdio subprocess: the pipelined-race brick, the sanitized internal error, the failed-opener fallback, and the cancel-never-locks path all behave as documented, with auto/legacy negotiation unchanged.
AI Disclaimer