Verifying Meta-Awareness via Predictive Rewards in Reasoning Models

About

Recent research on reasoning models explores the meta-awareness of language models, including their ability to determine an optimal thinking duration, recognize the boundaries of their knowledge, and structure their thinking at the concept level. Whereas current large reasoning models depend solely on answer-based verification, we show that adding meta-awareness objectives leads to significant performance gains over models without such meta-knowledge. MAPR (Meta-Awareness via Predictive Reward) uses a self-generated task of predicting rollout statistics (specifically length, pass rate, and concepts used), so that each prediction can be verified against the actual statistics. Furthermore, by leveraging this self-predictive capability, the model can regulate its reasoning behavior by i) filtering out trivial or unsolvable prompts, ii) reducing lengthy generations that tend to be incorrect, and iii) generating hints relevant to the problem. The results are encouraging: MAPR yields significant improvements in both accuracy and training efficiency on various reasoning benchmarks. More specifically, our method speeds up GRPO training by over 1.28x to reach the same performance, and achieves an 83.18% gain in accuracy on AIME25 and a 13.04% average gain over six mathematics benchmarks. The code is publicly available at https://github.com/akatigre/MAPR-RL.
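
To make the predictive-reward idea more concrete, here is a minimal Python sketch of how a prediction of length, pass rate, and concepts might be scored against the statistics of actual rollouts, and how a predicted pass rate might be used to filter prompts. It is not taken from the MAPR code: the class names, the scoring functions (relative length error, absolute pass-rate error, Jaccard overlap of concepts), the equal weighting, and the filter thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Rollout:
    """One sampled solution for a prompt (hypothetical structure)."""
    num_tokens: int
    correct: bool
    concepts: Set[str]


@dataclass
class MetaPrediction:
    """The model's own forecast of its rollout statistics (hypothetical)."""
    length: float
    pass_rate: float
    concepts: Set[str]


def rollout_statistics(rollouts: List[Rollout]) -> Tuple[float, float, Set[str]]:
    """Return mean length, pass rate, and the union of concepts used.

    Assumes a non-empty list of rollouts.
    """
    n = len(rollouts)
    mean_length = sum(r.num_tokens for r in rollouts) / n
    pass_rate = sum(r.correct for r in rollouts) / n
    concepts = set().union(*(r.concepts for r in rollouts))
    return mean_length, pass_rate, concepts


def predictive_reward(pred: MetaPrediction, rollouts: List[Rollout]) -> float:
    """Score a prediction in [0, 1] against the actual rollout statistics.

    Assumption: the three components are weighted equally.
    """
    mean_length, pass_rate, concepts = rollout_statistics(rollouts)
    # Relative length error, clipped so the score never goes negative.
    length_score = max(0.0, 1.0 - abs(pred.length - mean_length) / max(mean_length, 1.0))
    # Absolute error between predicted and observed pass rate.
    pass_score = 1.0 - abs(pred.pass_rate - pass_rate)
    # Jaccard overlap between predicted and observed concept sets.
    union = pred.concepts | concepts
    concept_score = len(pred.concepts & concepts) / len(union) if union else 1.0
    return (length_score + pass_score + concept_score) / 3.0


def keep_prompt(pred: MetaPrediction, low: float = 0.1, high: float = 0.9) -> bool:
    """Drop prompts predicted to be unsolvable (< low) or trivial (> high).

    The thresholds are illustrative assumptions.
    """
    return low <= pred.pass_rate <= high


rollouts = [
    Rollout(100, True, {"algebra"}),
    Rollout(300, False, {"algebra", "geometry"}),
]
exact = MetaPrediction(200.0, 0.5, {"algebra", "geometry"})
rough = MetaPrediction(100.0, 1.0, {"algebra"})
print(predictive_reward(exact, rollouts))
print(predictive_reward(rough, rollouts))
print(keep_prompt(exact), keep_prompt(rough))
```

In this sketch a prediction that matches the observed statistics exactly earns the full reward, while a prediction that is off on all three components is penalized proportionally. A real training setup would combine such a reward with the usual answer-based reward.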

Yoonjeon Kim, Doohyuk Jang, Eunho Yang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | EvalPlus | Pass@1 | 77.66 | 115 |
| Scientific Reasoning | ARC Challenge | -- | -- | 115 |
| Scientific Reasoning | GPQA Diamond | Pass@1 Accuracy | 53.72 | 67 |
| Mathematical Reasoning | Olympiad | Pass@1 | 61.59 | 41 |
| Coding | MBPP | -- | -- | 37 |
| Logical Reasoning | Logical Deduction | Pass@1 | 81.03 | 20 |
| Scientific Reasoning | SciBench | Pass@1 | 29.64 | 12 |
| Coding | LiveCodeBench | Total Pass Rate | 31.61 | 11 |
| Mathematical Reasoning | AIME 2024 | Pass@1 | 44.27 | 6 |
| Mathematical Reasoning | AIME 25 | Pass@1 | 31.25 | 6 |
Showing 10 of 17 rows
