IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
About
Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLM reasoning, yet its data inefficiency remains a major bottleneck. Existing methods address this problem only partially, each missing at least one of subset-level coverage, verifier signal use, or interpretability. To address this gap, we present IRDS (Interpretable RLVR Data Selection), which selects RLVR training instances on a sparse autoencoder (SAE) cluster basis so the selection itself is auditable on recognizable problem motifs. To select instances the model both fails on and can still learn from, we introduce a verifier-coupled coverage objective on the SAE basis and solve it by greedy log-determinant maximization. Experiments on three instruction-tuned models and six math reasoning benchmarks show that IRDS achieves the highest overall accuracy, exceeding the strongest baseline by +3.9/+4.0 pp on the two Qwen models and by +0.5 pp on Llama-3.1-8B, while running an order of magnitude cheaper than the trajectory-based baseline.
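The abstract does not spell out the objective, so the following is only a minimal sketch of what greedy log-determinant maximization over weighted feature vectors can look like. Everything specific here is an assumption rather than the authors' formulation: the objective log det(I + λ Σ wᵢ fᵢ fᵢᵀ), the function names `greedy_logdet_select` and `verifier_weight`, the parameter `lam`, and in particular the weight p(1 − p), used as a hypothetical stand-in for "fails on but can still learn from" given a per-instance verifier pass rate p. The rows of `features` stand in for per-instance activations on the SAE cluster basis.

```python
import numpy as np


def verifier_weight(pass_rate):
    """Hypothetical verifier-derived weight (an assumption, not from the paper).

    p * (1 - p) is zero for instances that are always solved or never solved
    and largest when the model succeeds about half the time.
    """
    p = np.asarray(pass_rate, dtype=float)
    return p * (1.0 - p)


def greedy_logdet_select(features, weights, k, lam=1.0):
    """Greedily pick up to k rows to maximize log det(I + lam * sum_i w_i f_i f_i^T).

    features: (n, d) array, one row per candidate instance.
    weights:  (n,) non-negative array, one weight per instance.
    Returns the selected row indices in the order they were picked.
    """
    n, d = features.shape
    a_inv = np.eye(d)  # inverse of the current d x d matrix A (starts at I)
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(min(k, n)):
        # Matrix determinant lemma:
        # log det(A + c f f^T) - log det(A) = log(1 + c * f^T A^{-1} f)
        proj = features @ a_inv
        quad = np.einsum("ij,ij->i", proj, features)
        gains = np.log1p(lam * weights * quad)
        gains[~available] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        available[best] = False
        # Sherman-Morrison rank-one update keeps A^{-1} current without re-inverting.
        c = lam * weights[best]
        u = a_inv @ features[best]
        a_inv -= c * np.outer(u, u) / (1.0 + c * (features[best] @ u))
    return selected
```

Under these assumptions, an instance whose feature direction is already covered by earlier picks gets a smaller marginal gain than an equally weighted instance in an uncovered direction, and an instance with pass rate 0 or 1 contributes zero gain. Because each chosen index maps back to feature dimensions, the selection can be inspected per dimension, which is the kind of auditability the abstract attributes to working on an SAE basis.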
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AMC23 | Average@16 | 82.4 | 63 |
| Mathematical Reasoning | Math Benchmarks Aggregate | -- | -- | 62 |
| Mathematical Reasoning | Olympiad | Avg@16 Accuracy | 67.2 | 47 |
| Math Reasoning | Olympiad | Average Rate @16 | 65.6 | 38 |
| Mathematical Reasoning | AIME 25 | Average@16 Score | 26 | 33 |
| Math Reasoning | AMC 2023 | Avg@16 | 79.5 | 29 |
| Math Reasoning | MATH 500 | Mean@16 Accuracy | 91.3 | 24 |
| Math Reasoning | Minerva | Mean@16 | 43.8 | 24 |
| Math Reasoning | AIME 2024 | Mean@16 | 42 | 24 |
| Math Reasoning | AIME 2025 | Mean Score (AIME 2025) | 35.6 | 24 |