Value Modeling on DAPO-Math-17k Qwen2.5-7B-Instruct policy (Held-out)
[Chart: Intra AUC. V0: 0.693 (Feb 3, 2026)]
Metrics: Intra AUC, Pairwise Accuracy, Calibration MSE
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Intra AUC | Pairwise Accuracy | Calibration MSE |
|---|---|---|---|---|---|
| V0 | Protocol=Strict Genera... | 2026.02 | 0.693 | 0.84 | 0.165 |
| Vanilla Value Model | Protocol=Strict Genera... | 2026.02 | 0.527 | 0.507 | 0.583 |