AlphaToken: Decoupling Adaptation and Stability for Path-Aware Response Token Valuation in LLM Post-Training
About
Token selection is pivotal for effective LLM post-training. However, existing methods mostly rely on local heuristics and rarely formulate token selection as a principled valuation of individual response tokens. We introduce $\textbf{AlphaToken}$, a response token valuation framework that decouples valuation into $\textbf{adaptation}$ (promoting target-task learning) and $\textbf{stability}$ (preserving pre-trained capabilities), and makes each objective $\textbf{path-aware}$ by combining the direct-path signal from local token gradients with the downstream causal-path signal in autoregressive generation. Since retention data are typically unavailable, AlphaToken approximates stability via a $\textbf{Fisher-drift proxy}$ anchored at the pre-trained reference model. For efficient computation, we extend Ghost Dot-Product to token-level valuation. AlphaToken masks low-value response tokens during fine-tuning and preference optimization, concentrating training signals on more valuable positions. Experiments show that AlphaToken improves post-training performance and mitigates catastrophic forgetting.
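The masking step described above can be illustrated with a minimal sketch. It assumes per-token value scores have already been computed; the abstract does not specify the scoring formula or the selection threshold. The function names, the `values` input, and the `keep_ratio` parameter are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def mask_low_value_tokens(values, response_mask, keep_ratio=0.5):
    """Return a boolean mask keeping the highest-valued response tokens.

    values:        (seq_len,) hypothetical per-token value scores
    response_mask: (seq_len,) 1 for response tokens, 0 for prompt tokens
    keep_ratio:    fraction of response tokens to keep (assumed hyperparameter)
    """
    values = np.asarray(values, dtype=float)
    response_mask = np.asarray(response_mask, dtype=bool)
    keep = np.zeros_like(response_mask, dtype=bool)
    n_resp = int(response_mask.sum())
    if n_resp == 0:
        return keep
    k = max(1, int(np.ceil(keep_ratio * n_resp)))
    resp_idx = np.flatnonzero(response_mask)
    # Positions of the k largest values among response tokens only.
    top = resp_idx[np.argsort(-values[resp_idx], kind="stable")[:k]]
    keep[top] = True
    return keep

def masked_nll(token_logprobs, keep):
    """Mean negative log-likelihood over kept tokens only."""
    kept = np.asarray(token_logprobs, dtype=float)[keep]
    return float(-kept.mean()) if kept.size else 0.0

# Toy example: two prompt tokens followed by four response tokens.
values = [0.0, 0.0, 0.9, 0.1, 0.5, 0.2]
response_mask = [0, 0, 1, 1, 1, 1]
token_logprobs = [-1.0, -1.0, -2.0, -0.5, -1.0, -0.3]

keep = mask_low_value_tokens(values, response_mask, keep_ratio=0.5)
loss = masked_nll(token_logprobs, keep)
```

Prompt tokens are never selected, so the loss is averaged only over the retained response positions. A fixed keep ratio is one simple selection rule; a value threshold would be an equally plausible alternative.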
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instruction Following | AlpacaEval 2.0 | Win Rate | 38.76 | 722 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy | 56.21 | 711 |
| Multitask Language Understanding | MMLU | Accuracy | 67.05 | 520 |
| Instruction Following | Arena Hard | Win Rate | 34.6 | 263 |
| Code Generation | HumanEval | HumanEval Score | 78.88 | 128 |
| General Capability Evaluation | General Capability Suite (MMLU, GSM8K, HumanEval, IFEval) | Common Average Score | 72.59 | 39 |
| General Capability Evaluation | General Capability Suite (ARC-C, HellaSwag, MMLU, GSM8K) | ARC-C Accuracy | 53.13 | 27 |
| Science Question Answering | ARC-C | Accuracy (ARC-C) | 50.74 | 25 |
| Overall Performance Evaluation | Consolidated Evaluation Benchmark | Overall Average Score | 49.49 | 18 |
| Preference Aggregation | Preference Evaluation Suite Aggregate | Average Preference Win Rate | 36.68 | 18 |