GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
About
Large Reasoning Models (LRMs) achieve remarkable performance by explicitly generating multi-step chains of thought, but this capability incurs substantial inference latency and computational cost. Collaborative inference offers a promising solution by selectively allocating work between lightweight and large models, yet a fundamental challenge remains: determining whether a reasoning step requires the capacity of a large model or can be left to a more efficient small model. Existing routing strategies rely either on local token probabilities or on post-hoc verification, both of which introduce significant inference overhead. In this work, we propose a novel perspective on step-wise collaboration: the difficulty of a reasoning step can be inferred from its very first token. Inspired by the "Aha Moment" phenomenon in LRMs, we show that the entropy of the initial token serves as a strong predictor of step difficulty. Building on this insight, we introduce GlimpRouter, a training-free step-wise collaboration framework. GlimpRouter employs a lightweight model to generate only the first token of each reasoning step and routes the step to a larger model only when the initial token entropy exceeds a threshold. Experiments on multiple benchmarks demonstrate that our approach significantly reduces inference latency while preserving accuracy. For instance, on AIME25, GlimpRouter attains a 10.7% improvement in accuracy while reducing inference latency by 25.9% compared to a standalone large model. These results suggest a simple yet effective mechanism for reasoning: allocating computation based on a glimpse of thought rather than full-step evaluation.
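The routing rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`first_token_entropy`, `route_step`), the use of raw logits as input, the natural-log entropy, and the threshold value in the usage note are all assumptions made for illustration.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the small model's distribution
    over the first token of a reasoning step."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_step(logits: Sequence[float], threshold: float) -> str:
    """Hypothetical routing rule: send the step to the large model only
    when the first-token entropy exceeds the threshold."""
    return "large" if first_token_entropy(logits) > threshold else "small"
```

For example, with an assumed threshold of 1.0 nats, a sharply peaked first-token distribution such as `route_step([10.0, 0.0, 0.0, 0.0], 1.0)` stays on the small model, while a uniform one such as `route_step([0.0, 0.0, 0.0, 0.0], 1.0)` is escalated to the large model. In practice the threshold would have to be tuned per model pair and task.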
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Table Reasoning | TableBench | Accuracy 50.52 | 39 |
| Math Reasoning | MATH 500 | BA @2076 | 20 |
| Table Reasoning | TabFact | Accuracy 88.85 | 18 |
| Table Reasoning | HiTab | Accuracy 76.22 | 18 |
| Table Reasoning | WikiTQ, TabFact, TableBench, HiTab, FinQA Average | Accuracy 71.72 | 18 |
| Table Reasoning | WikiTQ | Accuracy 79.04 | 18 |
| Table Reasoning | FinQA | Accuracy 65.81 | 18 |
| Mathematical Reasoning | AIME24 | Accuracy 53.3 | 5 |
| Code Generation | LiveCodeBench v6 | Accuracy 33.14 | 5 |