test(tutorials): fix flaky streaming test that broke on first terminal event#453
Open
max-parke-scale wants to merge 1 commit into
test_send_event_and_stream_with_reasoning broke out of the stream loop on
the first `done` event. A single turn emits several messages (user echo,
reasoning, agent text), each ending in a `full` or `done` event. When a
reasoning message's terminal event arrived before the agent's text `done`,
the loop exited with agent_response_found still False, failing at the
assertion ("Agent response not found in stream") rather than timing out.
The failure signature confirms this: an AssertionError (loop broke early),
not a TimeoutError (latency). Consume terminal events until both the user
echo and the agent's text reply are seen, keying off message content rather
than the first terminal signal.
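The consumption strategy above can be sketched as follows. This is a minimal, hypothetical model, not the actual test: `Message`, `StreamEvent`, their `author`/`kind`/`type` fields and the `retrieve` helper are illustrative stand-ins for the SDK types and the test's message-retrieval call, and the timeout is shortened so the example finishes quickly. It replays a turn in which the reasoning message terminates before the agent text, then a turn in which the agent never replies, to show that the latter surfaces as a `TimeoutError` rather than an `AssertionError`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable


@dataclass
class Message:
    """Hypothetical stand-in for a retrieved task message."""
    author: str  # "user" or "agent"
    kind: str  # "text" or "reasoning"
    content: str


@dataclass
class StreamEvent:
    """Hypothetical stand-in for one stream update."""
    type: str  # "delta", "full" or "done"
    message_id: str


TERMINAL_TYPES = {"full", "done"}


async def wait_for_user_and_agent(
    events: AsyncIterator[StreamEvent],
    retrieve: Callable[[str], Message],
    user_text: str,
) -> None:
    """Consume terminal events until both the user echo and the agent's text
    reply are seen, keying off message content, not the first terminal event."""
    user_echo_found = False
    agent_response_found = False
    async for event in events:
        if event.type not in TERMINAL_TYPES:
            continue  # deltas are partial; wait for the message to terminate
        message = retrieve(event.message_id)
        if message.author == "user" and message.content == user_text:
            user_echo_found = True
        elif message.author == "agent" and message.kind == "text":
            agent_response_found = True  # reasoning messages fall through
        if user_echo_found and agent_response_found:
            return
    # Only reached if the stream closed before both messages arrived.
    assert user_echo_found, "User echo not found in stream"
    assert agent_response_found, "Agent response not found in stream"


MESSAGES = {
    "u1": Message("user", "text", "Hello"),
    "r1": Message("agent", "reasoning", "Considering a greeting"),
    "a1": Message("agent", "text", "Hi there!"),
}


def retrieve(message_id: str) -> Message:
    return MESSAGES[message_id]


async def reasoning_first_turn() -> AsyncIterator[StreamEvent]:
    # The reasoning message terminates before the agent text does.
    for event in [
        StreamEvent("full", "u1"),
        StreamEvent("full", "r1"),
        StreamEvent("delta", "a1"),
        StreamEvent("done", "a1"),
    ]:
        yield event


async def stalled_turn() -> AsyncIterator[StreamEvent]:
    # The agent never replies; the stream stays open without new events.
    yield StreamEvent("full", "u1")
    await asyncio.sleep(3600)


async def main() -> list[str]:
    outcomes = []
    await asyncio.wait_for(
        wait_for_user_and_agent(reasoning_first_turn(), retrieve, "Hello"), timeout=1.0
    )
    outcomes.append("reasoning-first turn: passed")
    try:
        await asyncio.wait_for(
            wait_for_user_and_agent(stalled_turn(), retrieve, "Hello"), timeout=0.1
        )
    except asyncio.TimeoutError:  # the timeout backstop, not an early break
        outcomes.append("stalled turn: TimeoutError")
    return outcomes


outcomes = asyncio.run(main())
print(outcomes)
# ['reasoning-first turn: passed', 'stalled turn: TimeoutError']
```

Only a stream that closes before the agent text arrives reaches the assertions, so the two failure modes stay distinguishable in the same way the commit message relies on.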
Test-side only: the producer (streaming.py) correctly emits one terminal
event per message, and real consumers key those events by message id. It
was just repaired in #449, so the fix stays in the test.
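The point that real consumers key terminal events by message id, instead of stopping at the first one, can be illustrated with a hypothetical sketch. `StreamEvent` and its `text` field are illustrative stand-ins rather than the SDK's types, and the semantics assumed here (a `delta` appends a chunk, a `full` carries the complete content, a `done` closes the accumulated message) are a simplification, not a description of `streaming.py`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class StreamEvent:
    """Hypothetical stream update: `text` is a chunk for "delta", the complete
    content for "full", and unused for "done"."""
    type: str
    message_id: str
    text: Optional[str] = None


def assemble(events: Iterable[StreamEvent]) -> Dict[str, str]:
    """Build every message in a turn, keyed by message id. A terminal event
    closes only its own message, so the loop never stops early."""
    buffers: Dict[str, str] = {}  # in-progress messages
    completed: Dict[str, str] = {}  # messages closed by a terminal event
    for event in events:
        if event.type == "delta":
            buffers[event.message_id] = buffers.get(event.message_id, "") + (event.text or "")
        elif event.type == "full":
            completed[event.message_id] = event.text or ""
            buffers.pop(event.message_id, None)
        elif event.type == "done":
            completed[event.message_id] = buffers.pop(event.message_id, "")
    return completed


turn = [
    StreamEvent("full", "u1", "Hello"),
    StreamEvent("full", "r1", "Considering a greeting"),
    StreamEvent("delta", "a1", "Hi "),
    StreamEvent("delta", "a1", "there!"),
    StreamEvent("done", "a1"),
]
result = assemble(turn)
print(result)
# {'u1': 'Hello', 'r1': 'Considering a greeting', 'a1': 'Hi there!'}
```

Because each terminal event closes only its own message, the order in which the reasoning and the agent text finish does not matter to a consumer built this way.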
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dba8df3 to
00a4351
Problem
`test_send_event_and_stream_with_reasoning` in `010_agent_chat` intermittently turns CI red on unrelated PRs (e.g. it passes on one run, then fails minutes later on a sibling PR whose diff touches only version strings and changelogs).

Root cause: test-side early break
A single turn emits several task messages, each terminated by exactly one stream event:

- User echo (`adk.messages.create`): `full`
- Reasoning (`summary="detailed"`): `full` (then `close()` is a no-op)
- Agent text: `done`

The stream loop broke unconditionally on the first `done`. When a reasoning message's terminal event reached the consumer before the agent text's `done`, the loop exited with `agent_response_found` still `False`.

The failure signature confirms it is an early break, not latency: the run fails with `AssertionError: Agent response not found in stream` at `tests/test_agent.py:287`, not a `TimeoutError` at the `await stream_task` line. A latency/timeout problem would surface as the latter. (It also fails at the agent assertion, not the user one, confirming that the user echo reliably arrives.)

Fix
Consume terminal events (`full` and `done`) until both the user echo and the agent's text reply are observed, keying off the retrieved message's content rather than stopping at the first terminal signal. The 90s timeout remains the backstop if the agent genuinely never responds.

As a bonus, the unified handler now catches the agent text whether it arrives as a `full` or a `done`, so it is robust to changes in how messages are emitted.

Why test-side, not `streaming.py`

The producer correctly emits one terminal event per message, and real consumers key by message id (they don't break on the first `done`). That code path was just repaired in #449 for a duplicate-publish symptom; making reasoning also emit a `done` risks reintroducing it. The correct and lower-risk layer is the test.

Testing
Full repro requires the `scale-agentex` server image + Redis + Temporal + real gpt-5 calls, which isn't reproducible in this environment (no OpenAI key). Verified locally: the file compiles and `ruff check` passes. The fix is validated by the failure-signature analysis above rather than by a mock-based regression test, which for a tutorial integration test would add more coupling than coverage.

🤖 Posted via Claude Code
Greptile Summary
This PR updates the streaming tutorial test to avoid stopping on the first terminal event. The main changes are:
- `full` and `done` terminal events are handled through one path.

Confidence Score: 5/5
Safe to merge; the change is isolated to a tutorial integration test and aligns the stream consumer with the documented multi-message terminal-event behavior.
The update narrows the stopping condition without touching production streaming code, preserving the timeout backstop while making the assertion depend on observed message content.
Reviews (2): Last reviewed commit: "test(tutorials): fix flaky streaming tes..."