# Structured output + thinking + tool use: two bugs in multi-turn conversations #1204

## Environment

- models: claude-sonnet-4-5, claude-sonnet-4-6
- thinking: both extended (`{"type": "enabled", "budget_tokens": N}`) and adaptive (`{"type": "adaptive"}`, with or without an explicit effort); the bugs reproduce with either
- output_config: `{"format": {"type": "json_schema", "schema": {...}}}`, or a Pydantic model passed as `output_format` to `messages.parse`
- tools: present in the request (multi-turn conversation with tool results)
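For reference, the request shape described in this list can be sketched as a plain dict. This is a minimal sketch, not a verified payload: the field names (`thinking`, `tools`, `output_config.format`) come from the list above, while `build_request`, `OUTPUT_SCHEMA`, and the `max_tokens`/`budget_tokens` values are hypothetical. The schema mirrors the `Output` model from the minimal code, and `"additionalProperties": False` is an assumption about what strict structured output expects.

```python
from typing import Any

# Hypothetical hand-written schema mirroring the Output model (three required strings).
# "additionalProperties": False is an assumption: strict schema modes usually want closed objects.
OUTPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "param1": {"type": "string"},
        "param2": {"type": "string"},
        "param3": {"type": "string"},
    },
    "required": ["param1", "param2", "param3"],
    "additionalProperties": False,
}


def build_request(messages: list[dict], tools: list[dict], budget_tokens: int = 4000) -> dict[str, Any]:
    """Hypothetical helper: combine extended thinking, tools and json_schema output in one request body."""
    return {
        "model": "claude-sonnet-4-5",
        "max_tokens": 10000,  # kept larger than budget_tokens, as extended thinking requires
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "tools": tools,
        "messages": messages,
        "output_config": {"format": {"type": "json_schema", "schema": OUTPUT_SCHEMA}},
    }
```

The body could then be sent with `client.messages.create(**build_request(...))`, which skips the SDK-side parsing that `messages.parse` performs.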

### Bug 1 — thinking block generated instead of tool_use blocks

In tool-calling turns, the model occasionally returns only a thinking block plus an empty text block (or no text block at all), with `stop_reason: "end_turn"`, instead of generating tool_use blocks.

The thinking content correctly identifies which tools should be called, but no tool_use blocks follow.

#### Actual behavior
```json
{
  ...
  "content": [
    {"type": "thinking", "thinking": "I need to call tool_1 and tool_2 in parallel..."},
    {"type": "text", "text": ""}   // empty, no tool_use blocks
  ],
  "stop_reason": "end_turn"
}
```
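This pattern can be detected client-side on the raw response, for example on `response.model_dump()` from a `messages.create` call. A minimal sketch, assuming the response is a plain dict shaped like the JSON above; `is_premature_end_turn` is a hypothetical helper, not part of the SDK.

```python
from typing import Any


def is_premature_end_turn(response: dict[str, Any]) -> bool:
    """Detect the Bug 1 shape in a raw Messages API response dict.

    True when the turn ended with end_turn, contains a thinking block,
    has no tool_use blocks, and every text block (if any) is empty or whitespace.
    """
    blocks = response.get("content", [])
    types = [block.get("type") for block in blocks]
    texts = [block.get("text", "") for block in blocks if block.get("type") == "text"]
    return (
        response.get("stop_reason") == "end_turn"
        and "thinking" in types
        and "tool_use" not in types
        and all(not text.strip() for text in texts)  # also True when there is no text block
    )
```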

#### Expected behavior

If the model determines that tools must be called, it should emit one or more tool_use blocks (not end the turn with only thinking/empty text).

#### Impact

In this scenario, I cannot use `client.messages.parse`: the SDK tries to parse the unfinished response (thinking only, empty or missing text) and fails.
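One possible workaround (an assumption, not a documented recommendation) is to call `messages.create` with the same arguments and parse the final text yourself, returning `None` instead of raising so the caller can retry the turn. A minimal sketch; `parse_final_text` is hypothetical and takes the content blocks as plain dicts.

```python
import json
from typing import Any, Optional


def parse_final_text(content: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Hypothetical tolerant parser for a messages.create response's content blocks.

    Joins the text blocks and parses them as JSON. Returns None when the text is
    empty or invalid, so the caller can retry instead of crashing.
    """
    text = "".join(block.get("text", "") for block in content if block.get("type") == "text")
    if not text.strip():
        return None  # Bug 1 shape: thinking only, empty or missing text block
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None  # Bug 2 shape: malformed or duplicated JSON
    return parsed if isinstance(parsed, dict) else None
```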

### Bug 2 — final text block contains invalid JSON
Often, in the final turn (after tool results), the structured-output text block contains invalid content.

#### Observed cases
- *Case A*: the text block mixes prose and JSON, so it cannot be parsed.
- *Case B*: a malformed/partial JSON, followed by 20+ blank lines, followed by the valid JSON — both in a single text block:

```json
{
  ...
  "content": [
    {
      "type": "text",
      // invalid JSON followed by the actual valid output, in the same text block
      "text": "{\"my_json\": \"is_broken \n\n\n\n\n\n\n\n\n {\"my_json\": \"is_not_broken\"}"
    }
  ]
}
```
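Until this is fixed, Case B can sometimes be salvaged by keeping the JSON object that ends the text block. This is a heuristic sketch, not a fix: `recover_trailing_json` is hypothetical, it uses the standard library's `json.JSONDecoder.raw_decode`, it may pick the wrong object for other malformed shapes, and it does not check the result against the schema.

```python
import json
from typing import Any, Optional


def recover_trailing_json(text: str) -> Optional[Any]:
    """Return the outermost JSON object that ends the text (ignoring trailing whitespace), else None."""
    decoder = json.JSONDecoder()
    end_of_text = len(text.rstrip())
    start = text.find("{")
    while start != -1:
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # this "{" does not start a valid object (e.g. the truncated prefix)
        else:
            if end == end_of_text:
                return obj  # earliest start that reaches the end is the outermost object
        start = text.find("{", start + 1)
    return None
```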
#### Expected behavior

The final text block should contain exactly one valid JSON object matching the provided json_schema, with no duplicated or partial content.
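The expected contract can be written as a check. A minimal sketch using only the standard library; `validate_final_text` and `REQUIRED_FIELDS` are hypothetical and hard-code the three string fields of the `Output` model from the minimal code. `json.loads` already rejects trailing content ("Extra data"), which covers the duplicated-JSON case.

```python
import json
from typing import Any

REQUIRED_FIELDS = ("param1", "param2", "param3")  # mirrors the Output model


def validate_final_text(text: str) -> dict[str, Any]:
    """Raise ValueError unless text is exactly one JSON object with the required string fields."""
    try:
        obj = json.loads(text)  # fails on partial JSON and on anything appended after it
    except json.JSONDecodeError as exc:
        raise ValueError(f"not a single valid JSON document: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("top-level value is not an object")
    for field in REQUIRED_FIELDS:
        if not isinstance(obj.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return obj
```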

## Notes
- If I remove `output_config.format`, everything seems to work, but I no longer get structured JSON output.
- If I remove `thinking` and keep `messages.parse`, the response sometimes has empty content instead.

## Minimal code
```py
import asyncio

from anthropic import AsyncAnthropic
from pydantic import BaseModel


class Output(BaseModel):
    param1: str
    param2: str
    param3: str


client = AsyncAnthropic()

# Tool definitions omitted; the bugs only appear when tools are present
tools = []


def extract_tools_use(response):
    """Return the tool_use blocks from the response content."""
    return [block for block in response.content if block.type == "tool_use"]


def execute_tools(tools_to_use):
    """Execute the tools and build one tool_result block per tool_use block."""
    return [
        {"type": "tool_result", "tool_use_id": block.id, "content": "..."}
        for block in tools_to_use
    ]


async def main():
    query = "What is the capital of France?"
    messages = [{"role": "user", "content": query}]

    # ReAct loop
    for _ in range(10):
        response = await client.messages.parse(
            model="claude-sonnet-4-6",
            max_tokens=10000,
            system="...prompt...",
            tool_choice={"type": "auto"},
            messages=messages,  # send the accumulated history, not only the first query
            tools=tools,
            thinking={"type": "adaptive"},
            output_config={"effort": "medium"},
            output_format=Output,
        )

        tools_use = extract_tools_use(response)

        if not tools_use:
            # Final response in structured output
            print(response.parsed_output)
            break

        # Append the assistant turn unchanged (thinking + tool_use blocks)
        messages.append({"role": "assistant", "content": response.content})

        # Tool results go back in a user turn; the Messages API has no "tool" role
        messages.append({"role": "user", "content": execute_tools(tools_use)})


if __name__ == "__main__":
    asyncio.run(main())
```
## Conclusion

At the moment, the interaction between thinking, tool calling, and json_schema structured output appears to be broken or highly unstable.
- With thinking + structured output + tools, the model frequently fails to emit tool_use blocks and ends the turn prematurely, or produces invalid structured output.
- With structured output + tools (without thinking), I sometimes receive empty content in the response.
- If I remove structured output and keep tools enabled, the flow appears to work more reliably, but I no longer receive output that conforms to the required schema.

In practice, this makes it difficult to reliably use thinking + tools together with structured JSON output in production.
