fix: surface task_id in interrupted tool error text for LLM resume#35222
fix: surface task_id in interrupted tool error text for LLM resume#35222flaxodev wants to merge 1 commit into
Conversation
…resume them
When a user aborts a running sub-agent, the Task tool's sessionId is persisted in the tool part's metadata but never surfaced to the LLM. The agent only sees "Tool execution aborted" with no way to know the sub-agent's session ID, so it cannot resume the aborted task via the task_id parameter.
Include the task_id in the error text rendered to the LLM for any
interrupted tool call that has sessionId in its state metadata.
The agent will see:
Tool execution aborted
`task_id: ses_0d76b70c6ffekT4JcAvvXEuaeI`
And can pass that task_id back to the Task tool to resume the
sub-agent with its full conversation history intact.
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
|
I've been thinking for a long time to make the task_id return even if the agent stalls out, errors, etc. Just too many higher priority issues to work on. A fix is needed. The recovery pattern for agents is typically trying to resume the session and observing it fail without understanding why, and then just retasking a fresh subagent. Which works, but its sometimes inefficient and creates more token burn. The additional feature that's needed though is for when subagents fail to return, due to interruption or an error (such as a connection timeout), relaying the error message or the last section of text and message type (e.g. an agent was interrupted or stalled out with last messages being thinking blocks). That way the tasking agent can have enough information to better infer and understand why the subagent failed to return, and initiate more appropriate recovery processes. This would help make more autonomous workflows; less need for humans to help agents understand why a subagent session failed to return. I'd suggest expanding the scope of the fix as needed to fully address matters. |
So you mean would be better to also give the agent context from the sub agents? I don't really think it would be a good idea, the point of sub-agents is to do tasks without flooding the parent agent's context, imagine a case where someone has 7 sub-agents running, giving their context to the parent agent would eat up a lot of it's context... I think it might be good to give the agent the error that happened so the sub-agent was aborted, however it might be an API request error to provider, and I guess agent might hallucinate stuff up on it Let me know if I misunderstood you or you have something else in mind |
I'm proposing that when the last message in a subagent session is not an assistant text return, present that with a notice to the primary agent so it can understand that: 1) the subagent did work the task, but got interrupted, stalled, or errored out, 2) based upon the text showing the error message or stall nature can decide if resuming the session is the best course of action, or task a fresh session, or investigate the failure prior to any further tasking. I think if we give the agent the message type and middle-truncate messages to a set character limit, it'll be information that helps agents more efficiently resolve a somewhat common failure pattern. That overall, it's less costly to supply the data than have the agent operate without it. The hallucination risk can be addressed by packaging the subagent-tail message and it's message-type data as part of an agent return. Proper message labeling, such as 'Agent Return Failure • the agent did not issue return text. The last text in the agents session is: {: '}. While developing a plugin that modifies session context prior to sending to the provider API, built off a fork of DCP, I have observed If you inject session context utilization info labeled as as a synthetic tool part, the agent hallucinates having called a tool. Move it to being a user message, and label it as as system directive, and provide a system prompt that informs the agent 'this text with this format is a system notice from the environment', and it doesn't confuse it coming from the user and properly uses the injected information as environmental data. I would expect the wording of the Agent Return Failure notice would be sufficient by itself. Adding a concise 1-2 sentence instruction explaining what an Agent Return Failure is to the task tool prompt could be an additive safeguard. |
The sub-agent session ID was persisted in DB metadata on abort but
never rendered to the LLM. Include task_id in the error text so the
agent can resume aborted sub-agents via the Task tool's task_id
parameter.
Issue for this PR
Related to #35177
Closes #
Closes #35177
Type of change
What does this PR do?
The sub-agent session ID was persisted in DB metadata on abort but
never rendered to the LLM. Include task_id in the error text so the
agent can resume aborted sub-agents via the Task tool's task_id
parameter.
How did you verify your code works?
I tested my changes locally
Screenshots / recordings
If this is a UI change, please include a screenshot or recording.
Checklist