Technical information was last verified in April 2026. The AI/LLM field moves fast; re-check the official docs if more than 6 months have passed.
Summary: There are three ways to get structured data from an LLM. Structured Output (constrained decoding) guarantees 100% schema compliance. JSON Mode guarantees valid JSON but not schema compliance. Function Calling is designed for tool invocation. In 2026, Structured Output is supported by every major provider, making it the right answer for most cases.
Who should read this
This article is for backend developers who need to parse structured, machine-readable data from LLM API responses.
Comparing the 3 approaches
| | JSON Mode | Function Calling | Structured Output |
|---|---|---|---|
| JSON syntax guarantee | Guaranteed | Guaranteed | Guaranteed |
| JSON Schema compliance | Not guaranteed | Not guaranteed (without strict) | 100% guaranteed |
| Primary use case | Simple JSON extraction | Agent tool invocation | Schema-based data extraction |
| Overhead | Near zero | Slight (tool definition tokens) | Near zero (XGrammar) |
| OpenAI support | GPT-3.5+ | GPT-4+ | GPT-4o, 4.1 (strict) |
| Anthropic support | -- | Tool use | Claude 3.5+ (Nov 2025-) |
| Open-source support | vLLM, TGI | Limited | vLLM + XGrammar |
JSON Mode — simplest but incomplete
Setting response_format: { type: "json_object" } makes the LLM return syntactically valid JSON. However, you cannot specify a schema. You might expect { "name": "Jane Doe" } but get { "user": "Jane Doe", "extra": true } instead.
Using JSON Mode alone in production means your response parsing code needs defensive logic, and schema mismatches cause runtime errors. Suitable for quick prototyping only.
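What that defensive logic looks like in practice: a minimal sketch, assuming a hypothetical `UserRecord` shape and `isUserRecord` type guard (neither is part of any SDK).

```typescript
// JSON Mode promises syntactically valid JSON, not this shape, so the
// keys must be checked before use. Hypothetical expected shape:
interface UserRecord {
  name: string;
}

// Type guard: narrows unknown JSON to UserRecord only if the key exists.
function isUserRecord(value: unknown): value is UserRecord {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).name === "string"
  );
}

// The model is free to return any valid JSON, e.g. with a renamed key:
const raw = '{ "user": "Jane Doe", "extra": true }';
const parsed: unknown = JSON.parse(raw);
// Guard fails here, so the caller must fall back or retry.
const record = isUserRecord(parsed) ? parsed : null;
```

With Structured Output this guard becomes unnecessary, which is the core argument of the rest of this article.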
Function Calling — agent tool invocation
Function Calling is a mechanism where “the LLM decides to call an external function.” Its purpose is action selection, not data extraction.
// OpenAI Function Calling
const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Tell me the weather in Seoul' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
});
// → model decides to call get_weather({ city: "Seoul" })
Adding strict: true to the function definition guarantees schema compliance, but at that point it is effectively the same mechanism as Structured Output.
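On the receiving side, the chosen tool call comes back inside the response. A sketch of extracting it, using a mock `response` object in the shape the OpenAI chat completions SDK returns rather than a live call:

```typescript
// Mock of the relevant slice of an OpenAI chat completion response.
// Note: `arguments` arrives as a JSON *string*, not an object.
const response = {
  choices: [{
    message: {
      tool_calls: [{
        type: "function",
        function: { name: "get_weather", arguments: '{"city":"Seoul"}' },
      }],
    },
  }],
};

const call = response.choices[0].message.tool_calls?.[0];
// Parse the argument string before dispatching to the real function.
const args = call ? (JSON.parse(call.function.arguments) as { city: string }) : null;
// Without strict: true, this string may still deviate from the declared
// schema, so real dispatch code should validate `args` before using it.
```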
Structured Output — the 2026 production standard
Constrained Decoding: When the LLM generates tokens, it sets the probability of any token that violates the JSON Schema to zero, making schema violations structurally impossible. This is not “validate then retry” but “invalid tokens can never be selected in the first place.”
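A toy illustration of the masking step (not how a real engine like XGrammar is implemented, and the token IDs are made up): before sampling, every token the grammar disallows gets its logit forced to negative infinity, so its probability is exactly zero.

```typescript
// Zero out every token the current grammar state disallows.
function maskLogits(logits: number[], allowed: Set<number>): number[] {
  return logits.map((logit, tokenId) =>
    allowed.has(tokenId) ? logit : -Infinity
  );
}

// Standard softmax; exp(-Infinity) evaluates to 0, so masked tokens
// end up with probability exactly 0, not merely "very small".
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits.filter(Number.isFinite));
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Suppose only tokens 0 and 2 are legal in the current grammar state:
const probs = softmax(maskLogits([1.0, 3.0, 2.0], new Set([0, 2])));
// probs[1] is 0 — token 1 cannot be sampled, even though its raw
// logit was the highest. This is why retries are never needed.
```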
// OpenAI Structured Output
const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Extract sentiment and keywords from this review' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'review_analysis',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
          keywords: { type: 'array', items: { type: 'string' } },
          confidence: { type: 'number', minimum: 0, maximum: 1 },
        },
        required: ['sentiment', 'keywords', 'confidence'],
        additionalProperties: false, // required by OpenAI strict mode
      },
    },
  },
});
// → 100% schema compliance guaranteed
Engines like XGrammar and llguidance have reduced the performance overhead of constrained decoding to near zero. As of 2026, there is no reason not to use Structured Output in production.
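On the consuming side, the SDK still hands back `message.content` as a string. A thin typed parse keeps the schema guarantee visible in the type system; the `ReviewAnalysis` interface below is an assumed mirror of the json_schema above, not an SDK type.

```typescript
// TypeScript mirror of the review_analysis schema (assumed shape).
interface ReviewAnalysis {
  sentiment: "positive" | "negative" | "neutral";
  keywords: string[];
  confidence: number;
}

// With strict: true the model cannot emit anything outside the schema,
// so the cast is safe by construction — no guard or retry needed.
function parseReviewAnalysis(content: string): ReviewAnalysis {
  return JSON.parse(content) as ReviewAnalysis;
}

// Example: parsing a content string in the guaranteed shape.
const analysis = parseReviewAnalysis(
  '{"sentiment":"positive","keywords":["fast","cheap"],"confidence":0.92}'
);
```

Contrast this with the JSON Mode guard earlier in the article: the validation burden has moved from your parsing code into the decoding engine itself.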
What to avoid
- Relying on JSON Mode alone in production: valid JSON is not schema compliance, and every consumer ends up re-implementing defensive parsing.
- Using Function Calling for pure data extraction: it is designed for action selection, and without strict it does not guarantee the schema.
Further reading
- RAG Pipeline Design: From Chunking to Retrieval Quality Monitoring — Architecture for integrating structured output into a RAG pipeline
- Prompt Version Management for Production AI Services — How to manage prompts when schemas change
- REST vs GraphQL vs tRPC: 2026 Selection Guide — Choosing the API layer that wraps your structured output