GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
About
Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, and selecting reasoning data well requires assessing them individually. We present GRACE, a gradient-aligned curation method that views each reasoning trace as a sequence of optimization events and scores every step by two complementary signals: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. Step-level scores are aggregated into a sample-level value for subset selection, using only the model's internal optimization signals and no external reward models or step annotations. To make this scalable, GRACE introduces a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. Post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reaches 108.8% of the full-data performance with 20% of the data and retains 100.2% with only 5%, with subsets that transfer effectively across model backbones.
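The scoring idea described above can be illustrated with a small toy sketch. It is not the authors' implementation: it assumes per-step gradients and an answer-oriented gradient are already available as plain vectors, uses cosine similarity for both signals, combines them with a hypothetical weight `lam`, and aggregates by a simple mean. The function names (`step_scores`, `sample_value`, `select_top_fraction`) are illustrative only, and the representation-level gradient proxy is not modeled here.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity; eps guards against zero-norm vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def step_scores(step_grads, answer_grad, lam=0.5):
    """Score each step by (assumed) alignment plus weighted consistency.

    alignment   : cosine between the step gradient and the answer gradient
    consistency : cosine between the step gradient and the sum of the
                  preceding step gradients (0 for the first step)
    """
    scores = []
    running = np.zeros_like(answer_grad, dtype=float)
    for t, g in enumerate(step_grads):
        align = cosine(g, answer_grad)
        consist = cosine(g, running) if t > 0 else 0.0
        scores.append(align + lam * consist)
        running = running + g
    return np.array(scores)


def sample_value(step_grads, answer_grad, lam=0.5):
    """Aggregate step-level scores into one sample-level value (mean, by assumption)."""
    return float(step_scores(step_grads, answer_grad, lam).mean())


def select_top_fraction(values, fraction):
    """Return sorted indices of the highest-valued samples for a given fraction."""
    k = max(1, int(round(fraction * len(values))))
    order = np.argsort(values)[::-1]
    return sorted(order[:k].tolist())


# Toy data: two samples with two steps each, in a 2-D "gradient" space.
answer = np.array([1.0, 0.0])
sample_a = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]   # steps point toward the answer
sample_b = [np.array([-1.0, 0.0]), np.array([0.0, 1.0])]  # steps do not

values = [sample_value(sample_a, answer), sample_value(sample_b, answer)]
kept = select_top_fraction(values, fraction=0.5)
```

In this toy setup the sample whose steps point along the answer direction and agree with each other receives the higher value and is the one kept. How the real method weights, normalizes, or aggregates the two signals is not specified in the abstract.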
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Multimodal Evaluation | MME | MME-P Score | 1.51e+3 | 114 |
| Multimodal Benchmarking | MMBench | Score | 73.8 | 73 |
| Mathematical Reasoning | MathVista | MathVista | 54.3 | 55 |
| Science Question Answering | SQA | SQA Score | 85 | 26 |
| Multimodal Reasoning | Multiple Evaluation Benchmarks Aggregate (test) | Relative Average Performance | 105.2 | 24 |
| Hallucination Detection | HallusionBench | Hallusion Score | 48.5 | 20 |
| Mathematical Vision Reasoning | MathVision | Score (MINI) | 21.7 | 11 |
| Multi-task and Multi-image Reasoning | MMT-Bench | SI Score | 58.5 | 11 |