<aside>

TL;DR: Prompting makes structured output more likely. Schema-constrained decoding makes invalid continuations impossible by masking out tokens that would violate a JSON/schema/grammar contract.

</aside>


The problem: “Return valid JSON” isn’t reliable

In real LLM workflows, you’ll ask for output like:

```json
{
  "verdict": "pass",
  "confidence": "high",
  "reason": "The answer satisfies the rubric."
}
```
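
One way to make that contract explicit is a schema the output gets validated against. Here is a minimal sketch using Pydantic; the field names come from the example above, while the allowed values for `verdict` and `confidence` are assumptions for illustration:

```python
from typing import Literal

from pydantic import BaseModel


class Judgment(BaseModel):
    """Machine-checkable contract for the judge output above."""

    verdict: Literal["pass", "fail"]              # assumed allowed set
    confidence: Literal["low", "medium", "high"]  # assumed allowed set
    reason: str


# Pydantic v2 style: parse and validate in one step.
judgment = Judgment.model_validate_json(
    '{"verdict": "pass", "confidence": "high", '
    '"reason": "The answer satisfies the rubric."}'
)
```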

And then you’ll see failures such as a sentence of prose before the object, a Markdown code fence wrapped around it, a trailing comma after the last field, single quotes instead of double quotes, or a `verdict` value that isn’t in your allowed set.

A human reader can often “see what it meant.” A parser can’t.
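
To make the parser’s side concrete, here is a hypothetical near-miss fed to Python’s standard `json` module (the raw string is invented for illustration):

```python
import json

# A typical near-miss: the model adds a sentence of prose before the
# object and leaves a trailing comma after the last field.
raw = (
    'Sure, here is the verdict:\n'
    '{\n'
    '  "verdict": "pass",\n'
    '  "confidence": "high",\n'
    '  "reason": "The answer satisfies the rubric.",\n'
    '}\n'
)

try:
    json.loads(raw)
except json.JSONDecodeError as err:
    # A human sees valid intent; json.loads sees a syntax error.
    print(f"Parse failed: {err}")
```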


The key idea: influence vs constrain

Prompting affects what the model is likely to generate.

Schema-constrained decoding affects what the decoder is allowed to generate.

Prompting = reshaping the probability distribution over next tokens.

Constrained decoding = removing format-breaking tokens from the candidate set before sampling.

This difference is the whole mechanism.
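
Here is a toy sketch of that masking step, assuming a hypothetical `prefix_is_valid` check standing in for a real grammar or schema engine. Real implementations compile the grammar into an automaton over the tokenizer’s vocabulary, but the shape is the same:

```python
import math
import random


def sample_constrained(logits, vocab, prefix, prefix_is_valid):
    """Sample the next token, allowing only tokens that keep the
    output a valid prefix of the target format.

    logits          : dict mapping token -> raw score from the model
    vocab           : iterable of candidate tokens
    prefix          : text generated so far
    prefix_is_valid : callable(str) -> bool, e.g. backed by a JSON
                      grammar or schema automaton (assumed here)
    """
    # 1. Constrained decoding: drop tokens that would break the format.
    allowed = [t for t in vocab if prefix_is_valid(prefix + t)]
    if not allowed:
        raise ValueError("no valid continuation under the grammar")

    # 2. Renormalize the model's probabilities over the surviving tokens.
    #    Prompting only reshapes `logits`; masking removes options entirely.
    weights = [math.exp(logits[t]) for t in allowed]
    total = sum(weights)
    probs = [w / total for w in weights]

    return random.choices(allowed, weights=probs, k=1)[0]
```

With the mask enforcing the format, prompting is free to influence which valid output the model picks, not whether the output is valid at all.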