If you’ve ever built an AI agent that suddenly returns "I'm not sure" where your code expected a numeric price, or watched a production pipeline crash because a language model decided to invent a field name on a whim — you already understand why structured output enforcement is one of the most important skills in an AI agent engineer’s toolkit.
In this tutorial, we’re going to walk through the full picture: what structured outputs are, why LLMs fight against them, and how to lock down your agent’s responses using JSON Schema validation, Python type safety, and layered fallback strategies. By the end, you’ll have a reliable, production-grade approach you can drop into any agent you build.
## What Are Structured Outputs and Why Do They Matter?
A structured output is any response from an AI model that conforms to a predefined shape — a specific set of fields, data types, and constraints. Instead of a free-form paragraph, you get something like this:
```json
{
  "product_name": "Wireless Keyboard",
  "price": 49.99,
  "in_stock": true,
  "category": "Electronics"
}
```
This matters enormously in agentic systems because agents don’t just answer questions — they take actions. An agent might:
- Pass a result to a downstream API that expects a specific payload
- Store data in a database with strict column types
- Trigger a workflow based on a classification label
- Feed output into another agent’s input
When any of these steps receives malformed data, the entire pipeline breaks. Worse, if the output looks valid but contains hallucinated field values, you get silent corruption — the hardest class of bug to debug.
The core challenge: Language models are trained to produce natural language. Every structured response you ask for is, from the model’s perspective, a creative writing task. Without enforcement, models will omit fields, add unrequested ones, switch data types, or fill gaps with plausible-sounding fiction.
## The Three Layers of Output Enforcement
Reliable structured outputs aren’t a single technique — they’re a defense-in-depth stack with three layers:
- Prompt-level constraints — telling the model what shape you want
- Schema-level validation — enforcing that shape programmatically
- Type-level safety — catching mismatches at the Python (or TypeScript) layer before they propagate
Let’s build each layer from the ground up.
## Layer 1: Prompt-Level Constraints
Before any code runs, your prompt does a lot of heavy lifting. A weak prompt like “Extract product info and return it as JSON” will produce inconsistent results. A strong prompt specifies the contract explicitly.
### Writing a Tight System Prompt
```python
SYSTEM_PROMPT = """
You are a product data extraction assistant.
You MUST respond with a valid JSON object that exactly matches this structure:

{
  "product_name": string,
  "price": number (float, no currency symbols),
  "in_stock": boolean,
  "category": string (one of: "Electronics", "Clothing", "Home", "Books", "Other")
}

Rules:
- Do not add extra fields.
- Do not include explanation or prose — only the JSON object.
- If a value cannot be determined, use null for that field.
- Never invent data. If uncertain, use null.
"""
```
Notice what this prompt does:
- Declares the exact schema inline
- Specifies data types explicitly
- Constrains enum values (the category field)
- Handles the “I don’t know” case with null instead of hallucination
- Bans freeform prose entirely
This alone won’t be sufficient, but it dramatically reduces the surface area for errors and primes the model for the validation steps ahead.
## Layer 2: JSON Schema Validation
Even with a perfect prompt, you need programmatic enforcement. JSON Schema is the industry standard for defining and validating the structure of JSON data.
### Defining Your Schema
```python
from jsonschema import validate, ValidationError

PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": ["number", "null"]},
        "in_stock": {"type": "boolean"},
        "category": {
            "type": "string",
            "enum": ["Electronics", "Clothing", "Home", "Books", "Other"]
        }
    },
    "required": ["product_name", "in_stock", "category"],
    "additionalProperties": False
}
```
The "additionalProperties": False line is critical — it rejects any response that adds fields not in your schema, which is one of the most common LLM failure modes.
### Parsing and Validating the Response
```python
import json
import re

def parse_and_validate(raw_response: str) -> dict:
    # Step 1: Extract JSON from the response (handles markdown code blocks)
    json_match = re.search(r'\{.*\}', raw_response, re.DOTALL)
    if not json_match:
        raise ValueError("No JSON object found in response")
    json_str = json_match.group()

    # Step 2: Parse JSON
    try:
        data = json.loads(json_str)
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON: {e}")

    # Step 3: Validate against schema
    try:
        validate(instance=data, schema=PRODUCT_SCHEMA)
    except ValidationError as e:
        raise ValueError(f"Schema validation failed: {e.message}")

    return data
```
The regex extraction step handles a very common real-world problem: models often wrap their JSON in markdown code fences (```json ... ```), even when told not to. Always strip these defensively.
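A quick stdlib-only sanity check of that extraction step against a fenced response:

````python
import json
import re

fenced = '```json\n{"product_name": "Keyboard", "price": 49.99}\n```'

# The greedy DOTALL pattern grabs everything from the first "{" to the
# last "}", ignoring the surrounding markdown fence entirely.
match = re.search(r'\{.*\}', fenced, re.DOTALL)
data = json.loads(match.group())
print(data["product_name"])  # Keyboard
````

The fence never reaches json.loads, so the parse succeeds even when the model disobeys the "no markdown" instruction.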
### Using OpenAI’s Native Structured Outputs
If you’re using OpenAI’s API, you can take this further with their native response_format parameter, which uses constrained decoding to guarantee valid JSON:
```python
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional, Literal

client = OpenAI()

class ProductExtraction(BaseModel):
    product_name: str
    price: Optional[float]
    in_stock: bool
    category: Literal["Electronics", "Clothing", "Home", "Books", "Other"]

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract product information from the text."},
        {"role": "user", "content": "The Sony WH-1000XM5 headphones are $349.99 and currently available in Electronics."}
    ],
    response_format=ProductExtraction,
)

product = response.choices[0].message.parsed
print(product.product_name)  # "Sony WH-1000XM5"
print(product.price)         # 349.99
print(product.in_stock)      # True
```
This approach uses the model’s own tokenization constraints to make non-conforming responses structurally impossible. It’s the gold standard when your provider supports it.
## Layer 3: Type Safety with Pydantic
Raw dictionaries are error-prone. Once your JSON is validated, you want to work with typed objects that give you IDE autocompletion, runtime coercion, and field-level validation. Pydantic is the standard library for this in Python AI agent work.
### Building a Pydantic Model with Field Validation
```python
from pydantic import BaseModel, Field, field_validator
from typing import Optional, Literal

class ProductData(BaseModel):
    product_name: str = Field(min_length=1, max_length=200)
    price: Optional[float] = Field(default=None, ge=0)
    in_stock: bool
    category: Literal["Electronics", "Clothing", "Home", "Books", "Other"]

    @field_validator("product_name")
    @classmethod
    def name_must_not_be_placeholder(cls, v: str) -> str:
        placeholders = {"unknown", "n/a", "none", "null", "undefined"}
        if v.lower().strip() in placeholders:
            raise ValueError("product_name appears to be a hallucinated placeholder")
        return v

    @field_validator("price")
    @classmethod
    def price_sanity_check(cls, v: Optional[float]) -> Optional[float]:
        if v is not None and v > 1_000_000:
            raise ValueError(f"Price {v} is suspiciously high — possible hallucination")
        return v
```
The custom validators are doing something powerful here: they encode domain knowledge about what values are plausible. A price over $1,000,000 for a typical product isn’t just technically valid — it’s a hallucination signal. Catching it at the model layer prevents it from corrupting downstream systems.
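To see the placeholder validator fire, here is a trimmed-down sketch (a one-field model rather than the full ProductData, so the snippet stands alone):

```python
from pydantic import BaseModel, ValidationError, field_validator

class Product(BaseModel):
    product_name: str

    @field_validator("product_name")
    @classmethod
    def reject_placeholders(cls, v: str) -> str:
        # Same placeholder set used in ProductData above
        if v.lower().strip() in {"unknown", "n/a", "none", "null", "undefined"}:
            raise ValueError("product_name appears to be a hallucinated placeholder")
        return v

Product(product_name="Wireless Keyboard")  # accepted

try:
    Product(product_name="N/A")  # a classic hallucinated placeholder
except ValidationError as e:
    print("Rejected:", e.errors()[0]["msg"])
```

Note that "N/A" is a perfectly valid string as far as JSON Schema is concerned; only the domain-aware validator catches it.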
### Integrating the Full Pipeline
```python
from pydantic import ValidationError

def extract_product(text: str) -> ProductData:
    # 1. Call the LLM (call_llm is a thin wrapper around your provider's chat API)
    response = call_llm(SYSTEM_PROMPT, text)
    # 2. Parse and validate JSON structure
    raw_data = parse_and_validate(response)
    # 3. Create typed Pydantic object with field-level validation
    return ProductData(**raw_data)

# Usage
try:
    product = extract_product("Buy the Logitech MX Master 3 for $99.99. In stock.")
    print(f"Extracted: {product.product_name} at ${product.price}")
except (ValueError, ValidationError) as e:
    print(f"Extraction failed: {e}")
    # Trigger retry logic or fallback
```
## Handling Failures Gracefully: Retry and Fallback Strategies
Even with all three layers in place, failures happen. A robust agent doesn’t just crash — it recovers. Here are the patterns you need.
### Pattern 1: Retry with Error Feedback
When validation fails, feed the error back to the model in a follow-up message:
```python
from pydantic import ValidationError

def extract_with_retry(text: str, max_retries: int = 3) -> ProductData:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text}
    ]
    for attempt in range(max_retries):
        # call_llm_with_messages: a thin wrapper around your provider's chat API
        raw = call_llm_with_messages(messages)
        try:
            data = parse_and_validate(raw)
            return ProductData(**data)
        except (ValueError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise
            # Feed the error back to the model
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Your response failed validation: {e}. Please correct it and return only the valid JSON object."
            })
    raise RuntimeError("Max retries exceeded")
```
This technique — sometimes called self-correction prompting — leverages the model’s ability to fix its own mistakes when given precise feedback about what went wrong.
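You can exercise the retry loop's mechanics without a live model by stubbing the LLM call. The fake_llm function below is an illustrative stand-in, not a real provider API, and validation is simplified to bare JSON parsing; the stub returns broken JSON once, then a corrected object after "seeing" the error feedback:

```python
import json

RESPONSES = iter([
    '{"product_name": "Keyboard", "price": }',       # attempt 1: malformed JSON
    '{"product_name": "Keyboard", "price": 49.99}',  # attempt 2: corrected
])

def fake_llm(messages):
    # Stand-in for a real chat completion call
    return next(RESPONSES)

def extract_with_retry(text, max_retries=3):
    messages = [{"role": "user", "content": text}]
    for attempt in range(max_retries):
        raw = fake_llm(messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            if attempt == max_retries - 1:
                raise
            # Feed the parse error back, exactly as in the real loop
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Your response failed validation: {e}. Return only valid JSON."
            })
    raise RuntimeError("Max retries exceeded")

result = extract_with_retry("Extract the product.")
print(result)  # {'product_name': 'Keyboard', 'price': 49.99}
```

The first attempt fails to parse, the error is appended to the conversation, and the second attempt succeeds, which is the same control flow your real pipeline follows.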
### Pattern 2: Partial Extraction Fallback
For non-critical fields, consider accepting partial results rather than rejecting the entire response:
```python
from typing import Optional
from pydantic import BaseModel, Field

class PartialProductData(BaseModel):
    product_name: str                  # Required
    price: Optional[float] = None      # Gracefully absent
    in_stock: Optional[bool] = None    # Gracefully absent
    category: Optional[str] = None     # Gracefully absent
    extraction_confidence: float = Field(default=1.0, ge=0, le=1)
```
Use a confidence field to signal to downstream systems that some data may be missing or uncertain, rather than blocking the pipeline entirely.
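A quick check of the fallback behavior (the model is restated so the snippet runs standalone):

```python
from typing import Optional
from pydantic import BaseModel, Field

class PartialProductData(BaseModel):
    product_name: str
    price: Optional[float] = None
    in_stock: Optional[bool] = None
    category: Optional[str] = None
    extraction_confidence: float = Field(default=1.0, ge=0, le=1)

# A sparse extraction: only the name could be determined
partial = PartialProductData(
    product_name="Wireless Keyboard",
    extraction_confidence=0.4,
)
print(partial.price)  # None: absent, but the pipeline keeps moving
print(partial.extraction_confidence)  # 0.4
```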
## Real-World Example: A Multi-Step Agent with Structured Handoffs
Let’s put it all together. Here’s a simplified order processing agent where each step passes structured data to the next:
```python
import json
from typing import Literal, Optional
from pydantic import BaseModel, Field

class OrderIntent(BaseModel):
    action: Literal["purchase", "refund", "inquiry"]
    product_id: Optional[str] = None
    quantity: Optional[int] = Field(default=None, ge=1, le=100)
    urgency: Literal["low", "medium", "high"]

class InventoryCheck(BaseModel):
    product_id: str
    available_quantity: int
    warehouse_location: str
    estimated_ship_date: str  # ISO 8601 format

class OrderConfirmation(BaseModel):
    order_id: str
    total_price: float
    estimated_delivery: str
    confirmation_sent: bool

# Each agent step validates its input and output.
# INTENT_PROMPT, INVENTORY_PROMPT, and ORDER_PROMPT are step-specific system
# prompts defined elsewhere. Note that parse_and_validate from Layer 2 is bound
# to PRODUCT_SCHEMA; in practice each step would pass its own schema, or rely
# on the Pydantic model alone for validation.
def classify_intent(user_message: str) -> OrderIntent:
    raw = call_llm(INTENT_PROMPT, user_message)
    return OrderIntent(**parse_and_validate(raw))

def check_inventory(intent: OrderIntent) -> InventoryCheck:
    raw = call_llm(INVENTORY_PROMPT, intent.model_dump_json())
    return InventoryCheck(**parse_and_validate(raw))

def confirm_order(intent: OrderIntent, inventory: InventoryCheck) -> OrderConfirmation:
    context = {"intent": intent.model_dump(), "inventory": inventory.model_dump()}
    raw = call_llm(ORDER_PROMPT, json.dumps(context))
    return OrderConfirmation(**parse_and_validate(raw))
```
Each handoff is validated on both sides. An OrderIntent can only enter check_inventory if it’s been validated. The InventoryCheck that check_inventory returns is validated before it reaches confirm_order. No hallucinated product IDs, no phantom quantities, no ambiguous delivery strings slip through.
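To see a handoff boundary doing its job, here is a sketch in which a hallucinated action value is stopped before it could enter check_inventory (OrderIntent is restated so the snippet stands alone, and the canned model output is invented for illustration):

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError

class OrderIntent(BaseModel):
    action: Literal["purchase", "refund", "inquiry"]
    product_id: Optional[str] = None
    quantity: Optional[int] = Field(default=None, ge=1, le=100)
    urgency: Literal["low", "medium", "high"]

# Suppose the intent-classification model invented an action
# outside the allowed enum:
hallucinated = {"action": "cancel", "product_id": "SKU-123",
                "quantity": 1, "urgency": "low"}

try:
    intent = OrderIntent(**hallucinated)
except ValidationError:
    print("Handoff rejected: 'cancel' is not a valid action")
```

Because the boundary rejects the payload, downstream steps never see the phantom action and the agent can fall back to its retry logic instead.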
## Common Hallucination Patterns and How to Block Them
Understanding how models hallucinate in structured contexts helps you write better validators:
| Hallucination Pattern | Example | Defense |
|---|---|---|
| Invented enum values | `"category": "Gadgets"` | Enum constraint in schema |
| Type coercion guesses | `"price": "$49.99"` | `type: "number"` in schema |
| Placeholder strings | `"product_name": "N/A"` | Custom string validator |
| Extra helpful fields | `"discount_percentage": 10` | `additionalProperties: false` |
| Confident nulls | `"in_stock": true` when unknown | Domain knowledge validators |
| Date format drift | `"date": "April 24, 2026"` | Regex pattern validators |
Each of these is a pattern the model learned from training data. Your validation layer needs to anticipate them and reject them before they propagate.
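The last row, date format drift, can be blocked with a pattern constraint. A minimal sketch for an ISO 8601 date field (the ShipmentInfo model here is illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class ShipmentInfo(BaseModel):
    # Accept only YYYY-MM-DD, rejecting prose dates like "April 24, 2026"
    estimated_ship_date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")

ShipmentInfo(estimated_ship_date="2026-04-24")  # passes

try:
    ShipmentInfo(estimated_ship_date="April 24, 2026")
except ValidationError:
    print("Rejected: date format drift caught by the pattern")
```

A stricter version could also parse the string with datetime.date.fromisoformat to reject impossible dates like "2026-13-45".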
## Tooling and Libraries Reference
Here’s a quick reference for the ecosystem around structured outputs:
- Pydantic v2 — Python’s gold standard for data validation and type coercion
- jsonschema — Pure Python JSON Schema validator (supports Draft 7, 2019-09, 2020-12)
- Instructor — Open-source library that wraps OpenAI/Anthropic calls with Pydantic validation and auto-retry
- Outlines — Library for constrained text generation at the token level (for self-hosted models)
- LangChain’s output parsers — Built-in structured output parsing with retry logic
- Anthropic’s tool use / OpenAI’s function calling — Native structured output via tool definitions
For most production agents, Instructor is worth evaluating early. It handles the retry-with-feedback loop automatically and integrates cleanly with Pydantic models, saving significant boilerplate.
## Testing Your Structured Output Pipeline
Output enforcement only works if you test it adversarially. Build a test suite that includes:
````python
import pytest

ADVERSARIAL_RESPONSES = [
    # Missing required field
    '{"price": 49.99, "in_stock": true, "category": "Electronics"}',
    # Invalid enum
    '{"product_name": "Keyboard", "price": 49.99, "in_stock": true, "category": "Gadgets"}',
    # Wrong type
    '{"product_name": "Keyboard", "price": "$49.99", "in_stock": true, "category": "Electronics"}',
    # Extra field
    '{"product_name": "Keyboard", "price": 49.99, "in_stock": true, "category": "Electronics", "discount": 10}',
    # Markdown wrapped (valid once the fence is stripped)
    '```json\n{"product_name": "Keyboard", "price": 49.99, "in_stock": true, "category": "Electronics"}\n```',
    # Plain prose
    "The product is a keyboard priced at $49.99 and it is in stock.",
]

# Every response except the markdown-wrapped one (index 4) should be rejected
INVALID_RESPONSES = ADVERSARIAL_RESPONSES[:4] + ADVERSARIAL_RESPONSES[5:]

@pytest.mark.parametrize("response", INVALID_RESPONSES)
def test_invalid_responses_are_rejected(response):
    # parse_and_validate wraps schema failures in ValueError
    with pytest.raises(ValueError):
        parse_and_validate(response)

def test_markdown_wrapped_json_is_accepted():
    result = parse_and_validate(ADVERSARIAL_RESPONSES[4])
    assert result["product_name"] == "Keyboard"
````
Run these tests every time you change your schema or validation logic. They’re your contract with reality.
## Key Takeaways
Building reliable structured outputs in AI agents is not optional — it’s the foundation that makes everything else possible. Here’s what to carry forward:
- Use all three layers: prompt constraints set expectations, JSON Schema enforces structure, Pydantic enforces types and domain rules.
- Make schema violations specific: the more precisely your error messages describe what went wrong, the better your retry prompts will perform.
- Anticipate hallucination patterns: model the failure modes your specific use case is prone to and build validators for them.
- Test adversarially: your validation logic is only as good as the bad inputs you’ve thought to test against.
- Consider native structured output features: provider-level constrained decoding (OpenAI’s response_format, Anthropic’s tool use) eliminates entire classes of failure.
## Continue Your Learning
Ready to go deeper? Here’s what to explore next on Harness Engineering Academy:
- [Building Reliable Multi-Agent Pipelines] — how to chain validated handoffs across multiple specialized agents
- [Pydantic v2 for AI Engineers] — a full course on using Pydantic’s validator ecosystem for agent data contracts
- [Evaluating Agent Output Quality] — moving beyond validation to semantic correctness scoring
Start with the fundamentals: if you’re new to agent engineering, our AI Agent Engineering Learning Path will walk you through everything from basic prompt design to production-grade architectures — with structured outputs as a first-class concern from day one.
Written by Jamie Park — Educator and Career Coach at Harness Engineering Academy. Jamie specializes in making complex agent engineering concepts approachable for developers at every level.