A Regime Theory of Controller Class Selection for LLM Action Decisions
About
Deployed language and vision-language models must decide, on each input, whether to answer directly, retrieve evidence, defer to a stronger model, or abstain. Contrary to the common monotonicity intuition, greater per-input expressivity is not uniformly beneficial in finite samples: under identical strict cross-validation, different benchmarks prefer different controller classes. This reflects a finite-sample limitation of instance-level uncertainty signals, which can be exhausted at a distribution-dependent scale. We organize controllers into a nested lattice of four classes: fixed actions, partition routers, instance-level controllers, and prior-gated controllers, ordered by complexity. We prove a regime theory that turns three data-estimable bottlenecks into a class choice: how much improvement is possible beyond the best fixed action, whether there are enough samples for instance-level controllers to make reliable decisions, and how much improvement a coarse partition router can recover when instance-level signal is unreliable. The resulting Bernstein-tight threshold has a matching information-theoretic lower bound, and strict nested cross-validation provably selects a near-best class. Across SMS-Spam, HallusionBench, A-OKVQA, and FOLIO, the predicted class matches the empirical winner; the prior-gated controller wins on TextVQA when OCR tokens supply a label-free prediction-time prior. Code is available at https://github.com/Anonymous-Awesome-Submissions/Regime-Theory.
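The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (`group`, `score`, per-action `loss`), the two-action setup, the threshold form of the instance-level controller, and every function name are assumptions made for illustration. The prior-gated class and the outer loop of strict nested cross-validation are omitted. The sketch fits three controller classes on training folds and picks the class with the lowest mean held-out loss.

```python
# Illustrative sketch only: all names and the data layout are hypothetical.
ACTIONS = (0, 1)  # assumed actions: 0 = answer directly, 1 = abstain


def avg_loss(controller, data):
    """Mean loss incurred by a controller (example -> action) on a dataset."""
    return sum(ex["loss"][controller(ex)] for ex in data) / len(data)


def best_action(data):
    """Single action with the lowest mean loss on the given data."""
    return min(ACTIONS, key=lambda a: sum(ex["loss"][a] for ex in data))


def fit_fixed(train):
    """Fixed-action class: always play the best training action."""
    action = best_action(train)
    return lambda ex: action


def fit_partition(train):
    """Partition router: one best action per coarse group."""
    fallback = best_action(train)  # used for groups unseen in training
    by_group = {}
    for ex in train:
        by_group.setdefault(ex["group"], []).append(ex)
    table = {g: best_action(rows) for g, rows in by_group.items()}
    return lambda ex: table.get(ex["group"], fallback)


def fit_instance(train):
    """Instance-level class: threshold a per-input uncertainty score."""
    def rule(t):
        return lambda ex: 0 if ex["score"] < t else 1
    candidates = sorted({ex["score"] for ex in train})
    best_t = min(candidates, key=lambda t: avg_loss(rule(t), train))
    return rule(best_t)


CLASSES = {"fixed": fit_fixed, "partition": fit_partition, "instance": fit_instance}


def cv_select(data, k=5):
    """Pick the controller class with the lowest mean held-out loss over k folds."""
    scores = {}
    for name, fit in CLASSES.items():
        fold_losses = []
        for f in range(k):
            train = [ex for i, ex in enumerate(data) if i % k != f]
            held_out = [ex for i, ex in enumerate(data) if i % k == f]
            fold_losses.append(avg_loss(fit(train), held_out))
        scores[name] = sum(fold_losses) / k
    return min(scores, key=scores.get), scores


# Synthetic data: the best action depends only on the group, and the score is
# uninformative, so the partition router should win the comparison.
data = [
    {
        "group": i % 2,
        "score": (i * 37 % 100) / 100,
        "loss": (0.0, 1.0) if i % 2 == 0 else (1.0, 0.0),
    }
    for i in range(100)
]
best, scores = cv_select(data)
print(best, scores)
```

On this synthetic data the instance-level score carries no usable signal, so the more expressive class does not win the held-out comparison. That is a toy version of the abstract's point that more per-input expressivity is not uniformly beneficial in finite samples.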
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Question Answering | TextVQA OCR n=5000 (val) | Answer Loss | 0.8212 | 4 |
| Knowledge-based Visual Question Answering | A-OKVQA n=1145 (held-out) | Per-Class Loss | 0.3805 | 3 |
| Logical reasoning | FOLIO n=203 (held-out) | Per-Class Loss | 0.7195 | 3 |
| Spam Detection | SMS-Spam n=1114 (held-out) | Per-Class Loss | 0.059 | 3 |
| Visual Hallucination Detection | HallusionBench n=920 (held-out) | Per-Class Loss | 0.897 | 3 |