Description
Environment
- models: claude-sonnet-4-5, claude-sonnet-4-6
- thinking (both extended and adaptive): {"type": "enabled", "budget_tokens": N} (also reproduced with {"type": "adaptive"}, with or without specifying the effort)
- output_config: {"format": {"type": "json_schema", "schema": {...}}} or using the pydantic schema inside output_format
- Tools are present in the request input (multi-turn conversation with tool results)
Bug 1 — thinking block generated instead of tool_use blocks
In tool-calling turns, the model occasionally returns only a thinking block plus an empty text block (or no text block at all), with stop_reason: "end_turn", instead of generating tool_use blocks.
The thinking content correctly identifies which tools should be called, but no tool_use blocks follow.
Actual behavior
{
  ...
  "content": [
    {"type": "thinking", "thinking": "I need to call tool_1 and tool_2 in parallel..."},
    {"type": "text", "text": ""}  // empty, no tool_use blocks
  ],
  "stop_reason": "end_turn"
}
Expected behavior
If the model determines that tools must be called, it should emit one or more tool_use blocks (not end the turn with only thinking/empty text).
Impact
In this scenario, I cannot call anthropic.messages.parse, because the SDK attempts to parse an invalid/unfinished model response.
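As a stopgap, the response shape described above can be detected before the result is used, so the turn can be retried instead of treated as final. The following is only a minimal sketch: it assumes the response has already been converted to a plain dict with the same keys as the example above, and the helper name is_degenerate_turn is hypothetical, not part of the SDK.

```python
def is_degenerate_turn(response: dict) -> bool:
    """Return True if the turn ended with neither tool_use blocks nor usable text.

    Hypothetical helper: `response` is assumed to be a plain dict shaped like
    the "Actual behavior" example (keys "content" and "stop_reason").
    """
    if response.get("stop_reason") != "end_turn":
        return False
    blocks = response.get("content", [])
    has_tool_use = any(b.get("type") == "tool_use" for b in blocks)
    # A text block counts only if it holds something other than whitespace.
    has_text = any(
        b.get("type") == "text" and bool(b.get("text", "").strip())
        for b in blocks
    )
    return not has_tool_use and not has_text


# Response shaped like the one reported in Bug 1.
bad_response = {
    "content": [
        {"type": "thinking", "thinking": "I need to call tool_1 and tool_2 in parallel..."},
        {"type": "text", "text": ""},
    ],
    "stop_reason": "end_turn",
}

# Response that does request a tool call, for comparison.
ok_response = {
    "content": [
        {"type": "thinking", "thinking": "I need to call tool_1..."},
        {"type": "tool_use", "id": "toolu_example", "name": "tool_1", "input": {}},
    ],
    "stop_reason": "tool_use",
}

print(is_degenerate_turn(bad_response), is_degenerate_turn(ok_response))
```

This only flags the symptom; it does not explain why the tool_use blocks are missing.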
Bug 2 — final text block contains invalid JSON
In the final turn (after tool results), the structured output text block often contains invalid content.
Observed cases
- Case A: the text block mixes free text and JSON, so it cannot be parsed.
- Case B: a malformed/partial JSON, followed by 20+ blank lines, followed by the valid JSON — both in a single text block:
{
  ...
  "content": [
    {
      "type": "text",
      // invalid json + actual valid output, but appended in same text block
      "text": "{\"my_json\": \"is_broken \n\n\n\n\n\n\n\n\n {\"my_json\": \"is_not_broken\"}"
    }
  ]
}
Expected behavior
The final text block should contain exactly one valid JSON object matching the provided json_schema, with no duplicated or partial content.
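For Case B, one possible client-side workaround is to scan the text block and keep the last JSON object that parses cleanly. The sketch below uses only the standard library (json.JSONDecoder.raw_decode); the function name last_json_object is hypothetical. It is a heuristic, not a fix: if the trailing valid object is missing, it may return a nested fragment of the broken prefix, so the result should still be validated against the schema.

```python
import json


def last_json_object(text: str):
    """Return the last JSON object in `text` that parses cleanly, or None.

    Heuristic sketch: tries to decode at every "{" and keeps the last success.
    """
    decoder = json.JSONDecoder()
    found = None
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return found
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            pos = start + 1
            continue
        if isinstance(obj, dict):
            found = obj
        pos = end


# Text block shaped like Case B: broken prefix, blank lines, then valid JSON.
case_b = '{"my_json": "is_broken \n\n\n\n\n\n\n\n\n {"my_json": "is_not_broken"}'
print(last_json_object(case_b))
```

The broken prefix is rejected because the raw newlines inside its unterminated string are invalid in strict JSON, so only the trailing object survives.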
Notes
- If I remove output_config.format, everything seems to work, but I no longer get structured JSON output.
- If I remove thinking and use messages.parse, the response sometimes contains empty content instead.
Minimal code
import asyncio

from anthropic import AsyncAnthropic
from pydantic import BaseModel


class Output(BaseModel):
    param1: str
    param2: str
    param3: str


client = AsyncAnthropic()
tools = []  # tool definitions omitted


def extract_tools_use(response):
    """Extract the tool_use blocks from the response."""
    return [block for block in response.content if block.type == "tool_use"]


def execute_tools(tools_to_use):
    """Execute the tools and create one tool_result block per tool_use block."""
    return [
        {"type": "tool_result", "tool_use_id": block.id, "content": "..."}
        for block in tools_to_use
    ]


async def main():
    query = "What is the capital of France?"
    messages = [{"role": "user", "content": query}]
    # ReAct loop
    for _ in range(10):
        response = await client.messages.parse(
            model="claude-sonnet-4-6",
            max_tokens=10000,
            system="...prompt...",
            tool_choice={"type": "auto"},
            messages=messages,  # pass the accumulated conversation, not just the query
            tools=tools,
            thinking={"type": "adaptive"},
            output_config={"effort": "medium"},
            output_format=Output,
        )
        tools_use = extract_tools_use(response)
        if len(tools_use) == 0:
            # final response in structured output
            print(response.parsed_output)
            break
        # Append thinking and tool calls to messages for the next iteration
        messages.append({"role": "assistant", "content": response.content})
        # Execute tools and get the tool_result blocks
        tool_response = execute_tools(tools_use)
        # Tool results go back to the model in a user message
        messages.append({"role": "user", "content": tool_response})


if __name__ == "__main__":
    asyncio.run(main())
Conclusion
At the moment, the interaction between thinking, tool calling, and json_schema structured output appears to be broken or highly unstable.
- With thinking + structured output + tools, the model frequently fails to emit tool_use blocks and ends the turn prematurely, or produces invalid structured output.
- With structured output + tools (without thinking), I sometimes receive empty content in the response.
- If I remove structured output and keep tools enabled, the flow appears to work more reliably, but I no longer receive output that conforms to the required schema.
In practice, this makes it difficult to reliably use thinking + tools together with structured JSON output in production.