In a recent evaluation benchmark for sales-style AI agents, I ran into a pattern that felt surprisingly consistent:

The benchmark penalized overcommitment and rewarded phased, cautious language. But one question kept coming up:

Why does the same model sometimes sound overconfident and other times cautious, even under similar conditions?

This turns out not to be random, and not just about prompt wording. It’s an inference-time effect, driven by how token probabilities, prompt conditioning, and decoding strategies interact under uncertainty.


1. The Core Mechanism: Token Probabilities

At its core, a language model generates text one token at a time.

For each next token, the model:

  1. Assigns a score (called a logit) to every token in its vocabulary
  2. Converts those scores into probabilities (typically with a softmax)
  3. Selects one token based on those probabilities

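Here is a minimal sketch of a single decoding step. The vocabulary and logit values are made up for illustration; they don't come from any real model, and real vocabularies contain tens of thousands of tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and hypothetical raw scores (logits) for the next token.
vocab = ["recommend", "suggest", "guarantee", "propose"]
logits = np.array([2.1, 1.7, 0.4, 1.2])

def softmax(x, temperature=1.0):
    """Convert logits into a probability distribution."""
    z = (x - x.max()) / temperature  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Step 2: scores -> probabilities. Step 3: sample one token from that distribution.
probs = softmax(logits)
next_token = rng.choice(vocab, p=probs)

print(dict(zip(vocab, probs.round(3))), "->", next_token)
```

Run it a few times with different seeds and the sampled token changes, even though the logits stay fixed: that is the seed of the overconfident-versus-cautious variation discussed below.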
So when generating a sentence like:

“Based on your requirements, …”

the model might internally assign probabilities like: