<aside>

TL;DR: Prompting makes structured output more likely. Schema-constrained decoding makes invalid continuations impossible by masking out tokens that would violate a JSON/schema/grammar contract.

</aside>


The problem: “Return valid JSON” isn’t reliable

In real LLM workflows, you’ll ask for output like:

```json
{
  "verdict": "pass",
  "confidence": "high",
  "reason": "The answer satisfies the rubric."
}
```
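
One way to make that contract explicit is a schema the output gets validated against. Here is a minimal sketch using Pydantic; the field names come from the example above, while the allowed values for `verdict` and `confidence` are assumptions for illustration:

```python
from typing import Literal

from pydantic import BaseModel


class Judgment(BaseModel):
    """Machine-checkable contract for the judge output above."""

    verdict: Literal["pass", "fail"]              # assumed allowed set
    confidence: Literal["low", "medium", "high"]  # assumed allowed set
    reason: str


# Pydantic v2 style: parse and validate in one step.
judgment = Judgment.model_validate_json(
    '{"verdict": "pass", "confidence": "high", '
    '"reason": "The answer satisfies the rubric."}'
)
```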

And then you’ll see failures such as a sentence of prose before the object, a Markdown code fence wrapped around it, a trailing comma after the last field, single quotes instead of double quotes, or a `verdict` value that isn’t in your allowed set.

A human reader can often “see what it meant.” A parser can’t.
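
To make the parser’s side concrete, here is a hypothetical near-miss fed to Python’s standard `json` module (the raw string is invented for illustration):

```python
import json

# A typical near-miss: the model adds a sentence of prose before the
# object and leaves a trailing comma after the last field.
raw = (
    'Sure, here is the verdict:\n'
    '{\n'
    '  "verdict": "pass",\n'
    '  "confidence": "high",\n'
    '  "reason": "The answer satisfies the rubric.",\n'
    '}\n'
)

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    # A human sees valid intent; json.loads sees a syntax error.
    print(f"Parse failed: {err}")
```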


The key idea: influence vs constrain

Prompting affects what the model is likely to generate.

Schema-constrained decoding affects what the decoder is allowed to generate.

Prompting = reshaping the probability distribution over next tokens.

Constrained decoding = removing format-breaking tokens from the candidate set before sampling.

This difference is the whole mechanism.
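
Here is a toy sketch of that masking step, assuming a hypothetical `prefix_is_valid` check standing in for a real grammar or schema engine. Real implementations compile the grammar into an automaton over the tokenizer’s vocabulary, but the shape is the same:

```python
import math
import random


def sample_constrained(logits, vocab, prefix, prefix_is_valid):
    """Sample the next token, allowing only tokens that keep the
    output a valid prefix of the target format.

    logits          : dict mapping token -> raw score from the model
    vocab           : iterable of candidate tokens
    prefix          : text generated so far
    prefix_is_valid : callable(str) -> bool, e.g. backed by a JSON
                      grammar or schema automaton (assumed here)
    """
    # 1. Constrained decoding: drop tokens that would break the format.
    allowed = [t for t in vocab if prefix_is_valid(prefix + t)]
    if not allowed:
        raise ValueError("no valid continuation under the grammar")

    # 2. Renormalize the model's probabilities over the surviving tokens.
    #    Prompting only reshapes `logits`; masking removes options entirely.
    weights = [math.exp(logits[t]) for t in allowed]
    total = sum(weights)
    probs = [w / total for w in weights]

    return random.choices(allowed, weights=probs, k=1)[0]
```

With the mask enforcing the format, prompting is free to influence which valid output the model picks, not whether the output is valid at all.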