
A Regime Theory of Controller Class Selection for LLM Action Decisions

About

Deployed language and vision-language models must decide, on each input, whether to answer directly, retrieve evidence, defer to a stronger model, or abstain. Contrary to the common monotonicity intuition, greater per-input expressivity is not uniformly beneficial in finite samples: under identical strict cross-validation, different benchmarks prefer different controller classes. This reflects a finite-sample limitation of instance-level uncertainty signals, which can be exhausted at a distribution-dependent scale. We organize controllers into a nested lattice of four classes: fixed actions, partition routers, instance-level controllers, and prior-gated controllers, ordered by complexity. We prove a regime theory that turns three data-estimable bottlenecks into a class choice: how much improvement is possible beyond the best fixed action, whether there are enough samples for instance-level controllers to make reliable decisions, and how much improvement a coarse partition router can recover when instance-level signal is unreliable. The resulting Bernstein-tight threshold has a matching information-theoretic lower bound, and strict nested cross-validation provably selects a near-best class. Across SMS-Spam, HallusionBench, A-OKVQA, and FOLIO, the predicted class matches the empirical winner; the prior-gated controller wins on TextVQA when OCR tokens supply a label-free prediction-time prior. Code is available at https://github.com/Anonymous-Awesome-Submissions/Regime-Theory.
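The class-selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a toy setting with two hypothetical actions (answer directly, abstain), a single scalar uncertainty score per input, and a one-dimensional threshold rule standing in for an instance-level controller. The prior-gated class is omitted, and all function names and the synthetic data are illustrative assumptions.

```python
import random
from statistics import mean

# Each example is (group, score, losses), where losses[a] is the loss of action a.
ACTIONS = (0, 1)  # hypothetical actions: 0 = answer directly, 1 = abstain

def fit_fixed(train):
    # Fixed action: always play the action with the lowest mean training loss.
    best = min(ACTIONS, key=lambda a: mean(ex[2][a] for ex in train))
    return lambda ex: best

def fit_partition(train):
    # Partition router: best fixed action per group, global best for unseen groups.
    default = fit_fixed(train)(None)
    table = {}
    for g in {ex[0] for ex in train}:
        rows = [ex for ex in train if ex[0] == g]
        table[g] = min(ACTIONS, key=lambda a: mean(r[2][a] for r in rows))
    return lambda ex: table.get(ex[0], default)

def fit_instance(train):
    # Instance-level stand-in: answer if score <= t, else abstain; t fit on train.
    def total(t):
        return sum(ex[2][0] if ex[1] <= t else ex[2][1] for ex in train)
    candidates = [float("-inf")] + sorted({ex[1] for ex in train})
    t = min(candidates, key=total)
    return lambda ex: 0 if ex[1] <= t else 1

# Ordered from simplest to most expressive, so ties go to the simpler class.
FITTERS = {"fixed": fit_fixed, "partition": fit_partition, "instance": fit_instance}

def risk(policy, data):
    return mean(ex[2][policy(ex)] for ex in data)

def folds(data, k):
    return [data[i::k] for i in range(k)]

def cv_risk(fit, data, k):
    # Mean held-out risk of one controller class over k folds.
    parts = folds(data, k)
    scores = []
    for i in range(k):
        train = [ex for j, p in enumerate(parts) if j != i for ex in p]
        scores.append(risk(fit(train), parts[i]))
    return mean(scores)

def nested_cv(data, outer_k=5, inner_k=4):
    # The inner loop picks a class using outer-train data only;
    # the outer fold is touched once, to score the refit winner.
    parts = folds(data, outer_k)
    picks, losses = [], []
    for i in range(outer_k):
        train = [ex for j, p in enumerate(parts) if j != i for ex in p]
        name = min(FITTERS, key=lambda n: cv_risk(FITTERS[n], train, inner_k))
        picks.append(name)
        losses.append(risk(FITTERS[name](train), parts[i]))
    return picks, mean(losses)

def make_toy_data(n=200, seed=0):
    # Synthetic data for illustration only: answering is correct iff score <= 0.5,
    # and abstaining always costs 0.3.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        score = rng.random()
        data.append((0, score, (0.0 if score <= 0.5 else 1.0, 0.3)))
    return data

picks, loss = nested_cv(make_toy_data())
print(picks, round(loss, 3))
```

On this toy data the score is fully informative, so the threshold controller should be selected in each outer fold. If the per-action losses do not depend on the score, the simpler classes tie or win instead, which is the kind of regime dependence the abstract describes.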

Zhaoyang Jiang, Zhizhong Fu, Yunsoo Kim, Jiacong Mi, Zicheng Li, Xuanqi Peng, Honghan Wu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Visual Question Answering | TextVQA OCR n=5000 (val) | Answer Loss: 0.8212 | 4 |
| Knowledge-based Visual Question Answering | A-OKVQA n=1145 (held-out) | Per-Class Loss: 0.3805 | 3 |
| Logical reasoning | FOLIO n=203 (held-out) | Per-Class Loss: 0.7195 | 3 |
| Spam Detection | SMS-Spam n=1114 (held-out) | Per-Class Loss: 0.059 | 3 |
| Visual Hallucination Detection | HallusionBench n=920 (held-out) | Per-Class Loss: 0.897 | 3 |
