
Task-Awareness Improves LLM Generations and Uncertainty

About

In many applications of LLMs, natural language responses carry an underlying structure, such as discrete labels, numerical values, or graphs. Yet existing decoding and uncertainty estimation methods operate only in language space and largely disregard this structural information. We address this by modeling LLM outputs directly in a task-dependent latent structure. By equipping this structure with a dissimilarity measure, we can compute Bayes-optimal responses. These are not selected from the sampled generations but are newly synthesized by combining individual responses in the latent space. Across different tasks, Bayes-optimal responses consistently outperform standard decoding methods such as beam search. Moreover, quantifying uncertainty via the induced Bayesian risk captures variation in the latent structure and improves alignment with output quality and correctness. Our decision-theoretic framework is applicable to any problem that admits a latent response structure and enables reliable, task-aware LLM predictions.

Tim Tomov, Dominik Fuchsgruber, Stephan Günnemann • 2026
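
The sketch below illustrates the decision-theoretic idea from the abstract under simplifying assumptions: latent representations are plain numbers or strings, the Bayes-optimal response argmin_y E[d(y, Y)] is found by brute-force search over a candidate set, and the attained minimum (the Bayesian risk) is reported as an uncertainty score. All names here (bayes_optimal, grid, the toy samples) are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def bayes_optimal(samples, dissimilarity, candidates=None):
    """Return (response, risk): the candidate minimizing the mean
    dissimilarity to the sampled latent representations, and the
    attained minimum (the Bayesian risk) as an uncertainty score."""
    if candidates is None:
        candidates = samples  # fall back to searching over the samples themselves
    risks = [np.mean([dissimilarity(c, s) for s in samples]) for c in candidates]
    best = int(np.argmin(risks))
    return candidates[best], float(risks[best])

# Numerical task: under squared error the minimizer is the sample mean,
# a value that may never appear verbatim among the generations.
numeric_samples = [3.0, 3.2, 2.9, 10.0, 3.1]
grid = list(np.linspace(0.0, 12.0, 1201))
y_num, risk_num = bayes_optimal(numeric_samples, lambda a, b: (a - b) ** 2, grid)

# Discrete-label task: under 0/1 loss the minimizer is the majority label,
# and the risk equals the fraction of disagreeing samples.
label_samples = ["paris", "paris", "lyon", "paris"]
y_lbl, risk_lbl = bayes_optimal(label_samples, lambda a, b: float(a != b))

print(y_num, risk_num)  # ~4.44 (the sample mean); the outlier inflates the risk
print(y_lbl, risk_lbl)  # 'paris', risk 0.25
```

For richer structures (e.g. graphs), the same recipe applies once a dissimilarity on the structure is chosen; only the candidate search and closed-form minimizers change.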

Related benchmarks

Task                             Dataset         Metric             Result   Rank
Question Answering               TriviaQA        EM                 71.9     116
Multi-answer Question Answering  MAQA-ΔK−1       KL Divergence      0.467    48
Uncertainty Estimation           TriviaQA        --                 --       37
Uncertainty Quantification       MAQA            Hamming AUC        83.5     28
Uncertainty Quantification       CNN/DailyMail   Hamming AUC        0.745    28
Uncertainty Quantification       MAQA-ΔK−1       KL Divergence AUC  0.757    28
Machine Translation              WMT19           COMET Score        0.333    28
Uncertainty Quantification       WMT19           COMET AUC          0.608    28
Question Answering               TriviaQA        Exact Match AUC    78.5     28
Multi-answer Question Answering  MAQA            Hamming Distance   0.66     28
Showing 10 of 14 rows.
