Abstract
Autoregressive LLMs generate text by sampling from estimated probability distributions over the next token, conditional on preceding context. We leverage these conditional probabilities to construct an entropy-based measure of prediction uncertainty, which we term inner confidence. Predictions with higher inner confidence are systematically more accurate.
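The paper does not specify the exact formula in the abstract; as a minimal sketch, assuming inner confidence is the negative Shannon entropy of the model's next-token distribution (the function name and sign convention are illustrative, not the authors'):

```python
import math

def inner_confidence(token_probs):
    """Hypothetical sketch: inner confidence as negative Shannon
    entropy of the next-token probability distribution, so that a
    more peaked (less uncertain) distribution scores higher."""
    entropy = -sum(p * math.log(p) for p in token_probs if p > 0)
    return -entropy

# A peaked distribution is more confident than a near-uniform one.
peaked = [0.9, 0.05, 0.05]
flat = [1 / 3, 1 / 3, 1 / 3]
assert inner_confidence(peaked) > inner_confidence(flat)
```

Under this convention, predictions could then be sorted or bucketed by confidence before evaluating accuracy within each bucket.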
To assess the measure’s economic relevance, we use an LLM to predict daily stock returns based on firm-specific news and evaluate the performance of long-short portfolios built on these predictions. Conditioning on inner confidence significantly improves performance: high-confidence predictions achieve Sharpe ratios roughly 20% higher than the unconditional benchmark, while low-confidence predictions yield no excess returns. By contrast, the LLM’s self-declared confidence exhibits strong biases and delivers no comparable gains.