Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations

About

In black-box large language model (LLM) services, response reliability is often only partially observable at decision time, while stronger inference pathways incur substantial computational cost. This induces a budgeted sequential decision problem: for each request, the system must decide whether the default low-cost response is sufficiently reliable or whether additional computation should be allocated to improve response quality. In this paper, we propose Verifiable Observations for Risk-aware Inference Control (Veroic), a framework for adaptive inference control in black-box LLM settings. Veroic formulates request-time control as a partially observable Markov decision process (POMDP) to capture both partial observability and sequential budget coupling. It constructs a lightweight verifiable observation channel from the input-output pair, aggregating heterogeneous quality signals into a belief state over latent response reliability. A budget-aware policy then uses this belief state to decide whether to return the default output or trigger a higher-cost inference pathway. Experiments on diverse tasks show that Veroic achieves better quality-cost trade-offs, stronger risk estimation and calibration, and more robust long-horizon inference control than competitive baselines.
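The control loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how signals are aggregated or how the policy is parameterized, so everything here is an assumption made for illustration. In particular, `update_belief` uses a naive weighted log-odds combination of signals, `BudgetedController` uses a simple risk threshold that tightens as the remaining budget per request shrinks, and all names, weights, and thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_belief(prior: float, signals: dict, weights: dict) -> float:
    """Aggregate quality signals into a belief that the response is reliable.

    Assumption (not from the paper): each signal is a score in [0, 1] where
    0.5 is neutral, and signals are combined by weighted log-odds addition.
    """
    z = logit(prior)
    for name, score in signals.items():
        score = min(max(score, 1e-6), 1.0 - 1e-6)  # avoid infinite log-odds
        z += weights.get(name, 1.0) * logit(score)
    return sigmoid(z)


@dataclass
class BudgetedController:
    """Hypothetical budget-aware policy: escalate only when risk is high
    relative to how much budget remains for the rest of the horizon."""

    budget: float            # remaining escalation budget
    horizon: int             # number of requests still expected
    escalate_cost: float = 1.0
    base_threshold: float = 0.5

    def decide(self, belief: float) -> str:
        risk = 1.0 - belief
        can_afford = self.budget >= self.escalate_cost
        # Fraction of remaining requests that could still be escalated.
        frac = min(1.0, self.budget / (self.escalate_cost * max(self.horizon, 1)))
        # Scarcer budget raises the risk threshold toward 1 (never escalate).
        tau = self.base_threshold + (1.0 - self.base_threshold) * (1.0 - frac)
        escalate = can_afford and risk > tau
        if escalate:
            self.budget -= self.escalate_cost
        self.horizon -= 1
        return "escalate" if escalate else "default"


if __name__ == "__main__":
    controller = BudgetedController(budget=2.0, horizon=2)
    # Hypothetical signal names; both scores suggest an unreliable response.
    belief = update_belief(0.5, {"format_check": 0.2, "self_consistency": 0.3}, {})
    print(round(belief, 3), controller.decide(belief))
```

Coupling the threshold to the remaining budget is one simple way to reflect the sequential budget coupling the abstract mentions: an escalation now reduces what is available for later requests, so the same risk level can lead to different actions at different points in the horizon.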

Wenhao Yuan, Chenchen Lin, Jian Chen, Jinfeng Xu, Shuo Yang, Edith Cheuk Han Ngai • 2026

Related benchmarks

Task                Dataset           Result                 Rank
Code Generation     HumanEval (test)  Pass@1 72.9            612
Code Generation     MBPP (test)       Pass@1 73.2            405
Math Reasoning      GSM8K (test)      --                     250
Question Answering  PopQA (test)      Accuracy 35.3          111
Question Answering  HotpotQA (test)   EM 56.2                32
Math Reasoning      MATH (test)       Exact Match (EM) 52.7  14
Question Answering  2WikiMHQA (test)  EM 45.8                14
Code Generation     HumanEval         AUROC 81               7
Math Reasoning      GSM8K             AUROC 0.87             7
Question Answering  HotpotQA          AUROC 81               7
Showing 10 of 13 rows
