Technical information was last verified in April 2026. The AI/LLM field moves fast; re-check the official docs if more than 6 months have passed.
Summary: There are three ways to get structured data from an LLM. Structured Output (constrained decoding) guarantees 100% schema compliance. JSON Mode guarantees valid JSON but not schema compliance. Function Calling is designed for tool invocation. In 2026, Structured Output is supported by every major provider, making it the right answer for most cases.
Who should read this
This article is for backend developers who need to parse structured, machine-readable data from LLM API responses.
Comparing the 3 approaches
| | JSON Mode | Function Calling | Structured Output |
|---|---|---|---|
| JSON syntax guarantee | Guaranteed | Guaranteed | Guaranteed |
| JSON Schema compliance | Not guaranteed | Not guaranteed (without strict) | 100% guaranteed |
| Primary use case | Simple JSON extraction | Agent tool invocation | Schema-based data extraction |
| Overhead | Near zero | Slight (tool definition tokens) | Near zero (XGrammar) |
| OpenAI support | GPT-3.5+ | GPT-4+ | GPT-4o, 4.1 (strict) |
| Anthropic support | -- | Tool use | Claude 3.5+ (Nov 2025-) |
| Open-source support | vLLM, TGI | Limited | vLLM + XGrammar |
JSON Mode — simplest but incomplete
Setting response_format: { type: "json_object" } makes the LLM return syntactically valid JSON. However, you cannot specify a schema. You might expect { "name": "Jane Doe" } but get { "user": "Jane Doe", "extra": true } instead.
Using JSON Mode alone in production means your response parsing code needs defensive logic, and schema mismatches cause runtime errors. Suitable for quick prototyping only.
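What that defensive logic looks like in practice: a minimal sketch, assuming a hypothetical `UserRecord` shape and `isUserRecord` type guard (neither is part of any SDK).

```typescript
// JSON Mode promises syntactically valid JSON, not this shape, so the
// keys must be checked before use. Hypothetical expected shape:
interface UserRecord {
  name: string;
}

// Type guard: narrows unknown JSON to UserRecord only if the key exists.
function isUserRecord(value: unknown): value is UserRecord {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).name === "string"
  );
}

// The model is free to return any valid JSON, e.g. with a renamed key:
const raw = '{ "user": "Jane Doe", "extra": true }';
const parsed: unknown = JSON.parse(raw);
// Guard fails here, so the caller must fall back or retry.
const record = isUserRecord(parsed) ? parsed : null;
```

With Structured Output this guard becomes unnecessary, which is the core argument of the rest of this article.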
Function Calling — agent tool invocation
Function Calling is a mechanism where “the LLM decides to call an external function.” Its purpose is action selection, not data extraction.
// OpenAI Function Calling
const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Tell me the weather in Seoul' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
});
// → model decides to call get_weather({ city: "Seoul" })
Adding strict: true to the function definition guarantees schema compliance, but at that point it is effectively the same mechanism as Structured Output.
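On the receiving side, the chosen tool call comes back inside the response. A sketch of extracting it, using a mock `response` object in the shape the OpenAI chat completions SDK returns rather than a live call:

```typescript
// Mock of the relevant slice of an OpenAI chat completion response.
// Note: `arguments` arrives as a JSON *string*, not an object.
const response = {
  choices: [{
    message: {
      tool_calls: [{
        type: "function",
        function: { name: "get_weather", arguments: '{"city":"Seoul"}' },
      }],
    },
  }],
};

const call = response.choices[0].message.tool_calls?.[0];
// Parse the argument string before dispatching to the real function.
const args = call ? (JSON.parse(call.function.arguments) as { city: string }) : null;
// Without strict: true, this string may still deviate from the declared
// schema, so real dispatch code should validate `args` before using it.
```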
Structured Output — the 2026 production standard
Constrained Decoding: When the LLM generates tokens, it sets the probability of any token that violates the JSON Schema to zero, making schema violations structurally impossible. This is not “validate then retry” but “invalid tokens can never be selected in the first place.”
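A toy illustration of the masking step (not how a real engine like XGrammar is implemented, and the token IDs are made up): before sampling, every token the grammar disallows gets its logit forced to negative infinity, so its probability is exactly zero.

```typescript
// Zero out every token the current grammar state disallows.
function maskLogits(logits: number[], allowed: Set<number>): number[] {
  return logits.map((logit, tokenId) =>
    allowed.has(tokenId) ? logit : -Infinity
  );
}

// Standard softmax; exp(-Infinity) evaluates to 0, so masked tokens
// end up with probability exactly 0, not merely "very small".
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits.filter(Number.isFinite));
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Suppose only tokens 0 and 2 are legal in the current grammar state:
const probs = softmax(maskLogits([1.0, 3.0, 2.0], new Set([0, 2])));
// probs[1] is 0 — token 1 cannot be sampled, even though its raw
// logit was the highest. This is why retries are never needed.
```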
// OpenAI Structured Output
const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Extract sentiment and keywords from this review' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'review_analysis',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
          keywords: { type: 'array', items: { type: 'string' } },
          confidence: { type: 'number', minimum: 0, maximum: 1 },
        },
        required: ['sentiment', 'keywords', 'confidence'],
        additionalProperties: false, // required by OpenAI strict mode
      },
    },
  },
});
// → 100% schema compliance guaranteed
Engines like XGrammar and llguidance have reduced the performance overhead of constrained decoding to near zero. As of 2026, there is no reason not to use Structured Output in production.
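On the consuming side, the SDK still hands back `message.content` as a string. A thin typed parse keeps the schema guarantee visible in the type system; the `ReviewAnalysis` interface below is an assumed mirror of the json_schema above, not an SDK type.

```typescript
// TypeScript mirror of the review_analysis schema (assumed shape).
interface ReviewAnalysis {
  sentiment: "positive" | "negative" | "neutral";
  keywords: string[];
  confidence: number;
}

// With strict: true the model cannot emit anything outside the schema,
// so the cast is safe by construction — no guard or retry needed.
function parseReviewAnalysis(content: string): ReviewAnalysis {
  return JSON.parse(content) as ReviewAnalysis;
}

// Example: parsing a content string in the guaranteed shape.
const analysis = parseReviewAnalysis(
  '{"sentiment":"positive","keywords":["fast","cheap"],"confidence":0.92}'
);
```

Contrast this with the JSON Mode guard earlier in the article: the validation burden has moved from your parsing code into the decoding engine itself.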
What to avoid
- Relying on JSON Mode alone in production: valid JSON is not schema compliance, and every consumer ends up re-implementing defensive parsing.
- Using Function Calling for pure data extraction: it is designed for action selection, and without strict it does not guarantee the schema.
Further reading
- RAG Pipeline Design: From Chunking to Retrieval Quality Monitoring — Architecture for integrating structured output into a RAG pipeline
- Prompt Version Management for Production AI Services — How to manage prompts when schemas change
- REST vs GraphQL vs tRPC: 2026 Selection Guide — Choosing the API layer that wraps your structured output