Structured output + thinking + tool use: two bugs in multi-turn conversations #1204

@ootkin

Environment

  • models: claude-sonnet-4-5, claude-sonnet-4-6
  • thinking (both extended and adaptive): {"type": "enabled", "budget_tokens": N} (also reproduced with {"type": "adaptive"} with or without specifying the effort)
  • output_config: {"format": {"type": "json_schema", "schema": {...}}}, or a Pydantic model passed as output_format
  • Tools are present in the request input (multi-turn conversation with tool results)

Bug 1 — thinking block generated instead of tool_use blocks

In tool-calling turns, the model occasionally returns only a thinking block plus an empty text block (or no text block at all), with stop_reason: "end_turn", instead of generating tool_use blocks.

The thinking content correctly identifies which tools should be called, but no tool_use blocks follow.

Actual behavior

{
  ...
  content: [
    {"type": "thinking", "thinking": "I need to call tool_1 and tool_2 in parallel..."},
    {"type": "text", "text": ""}   // empty, no tool_use blocks
  ],
  stop_reason: "end_turn"
}

Expected behavior

If the model determines that tools must be called, it should emit one or more tool_use blocks (not end the turn with only thinking/empty text).

Impact

In this scenario, I cannot call anthropic.messages.parse, because the SDK attempts to parse an invalid/unfinished model response.
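
A client-side guard can detect this response shape before any parsing is attempted. The following is a minimal sketch, assuming the response has been converted to a plain dict with the same keys as the example above; the helper name is_premature_end_turn is hypothetical and not part of the SDK.

```python
def is_premature_end_turn(response: dict) -> bool:
    """Return True when the turn ended with no tool_use block and no non-empty text."""
    if response.get("stop_reason") != "end_turn":
        return False
    content = response.get("content", [])
    has_tool_use = any(block.get("type") == "tool_use" for block in content)
    has_text = any(
        block.get("type") == "text" and block.get("text", "").strip()
        for block in content
    )
    return not has_tool_use and not has_text


# The response shape from this report, written as a plain dict.
broken = {
    "content": [
        {"type": "thinking", "thinking": "I need to call tool_1 and tool_2 in parallel..."},
        {"type": "text", "text": ""},
    ],
    "stop_reason": "end_turn",
}
print(is_premature_end_turn(broken))
```

A check like this only detects the failure so the caller can retry or raise; it does not fix the missing tool_use blocks.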

Bug 2 — final text block contains invalid JSON

Often, in the final turn (after tool results), the structured output text block contains invalid content.

Observed cases

  • Case A: the text block mixes free text and JSON, so it cannot be parsed.
  • Case B: a malformed/partial JSON, followed by 20+ blank lines, followed by the valid JSON — both in a single text block:
{
  ...
  "content": [
    {
      "type": "text",
      // invalid json +  actual valid output, but appended in same text block
      "text": "{\"my_json\": \"is_broken \n\n\n\n\n\n\n\n\n {\"my_json\": \"is_not_broken\"}"
    }
  ]
}

Expected behavior

The final text block should contain exactly one valid JSON object matching the provided json_schema, with no duplicated or partial content.
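
For Case B, the valid object can sometimes be recovered on the client side by scanning the text block for the last JSON object that decodes cleanly. The sketch below uses only the standard library's json.JSONDecoder.raw_decode; the helper name last_json_object is hypothetical, and this is a stopgap for illustration rather than a fix for the underlying behavior.

```python
import json


def last_json_object(text: str):
    """Return the last top-level JSON object that decodes cleanly from text, or None."""
    decoder = json.JSONDecoder()
    found = None
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return found
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            pos = start + 1
            continue
        if isinstance(obj, dict):
            found = obj
        pos = end


# The Case B text block: a broken fragment, blank lines, then the valid object.
text = '{"my_json": "is_broken ' + "\n" * 9 + ' {"my_json": "is_not_broken"}'
print(last_json_object(text))
```

The recovered object should still be validated against the schema, since a nested object inside a broken outer object would also be accepted by this scan.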

Notes

  • If I remove output_config.format, everything seems to work, but I no longer get structured JSON output.
  • If I remove thinking and keep messages.parse, the response sometimes has empty content.

Minimal code

import asyncio

from anthropic import AsyncAnthropic
from pydantic import BaseModel


class Output(BaseModel):
    param1: str
    param2: str
    param3: str


client = AsyncAnthropic()

tools = []  # tool definitions elided


def extract_tools_use(response):
    """Return the tool_use blocks from the response."""
    return [block for block in response.content if block.type == "tool_use"]


def execute_tools(tools_to_use):
    """Execute the tools and build one tool_result block per tool_use block."""
    return [
        {"type": "tool_result", "tool_use_id": block.id, "content": "..."}
        for block in tools_to_use
    ]


async def main():
    query = "What is the capital of France?"
    messages = [{"role": "user", "content": query}]

    # ReAct loop
    for _ in range(10):
        response = await client.messages.parse(
            model="claude-sonnet-4-6",
            max_tokens=10000,
            system="...prompt...",
            tool_choice={"type": "auto"},
            messages=messages,  # pass the accumulated history, not just the query
            tools=tools,
            thinking={"type": "adaptive"},
            output_config={"effort": "medium"},
            output_format=Output,
        )

        tools_use = extract_tools_use(response)

        if not tools_use:
            # Final response as structured output
            print(response.parsed_output)
            break

        # Append the assistant turn (thinking and tool_use blocks) for the next iteration
        messages.append({"role": "assistant", "content": response.content})

        # Execute the tools and return the tool_result blocks in a user turn
        messages.append({"role": "user", "content": execute_tools(tools_use)})


if __name__ == "__main__":
    asyncio.run(main())

Conclusion

At the moment, the interaction between thinking, tool calling, and json_schema structured output appears to be broken or highly unstable.

  • With thinking + structured output + tools, the model frequently fails to emit tool_use blocks and ends the turn prematurely, or produces invalid structured output.
  • With structured output + tools (without thinking), I sometimes receive empty content in the response.
  • If I remove structured output and keep tools enabled, the flow appears to work more reliably, but I no longer receive output that conforms to the required schema.

In practice, this makes it difficult to reliably use thinking + tools together with structured JSON output in production.
