Value Modeling on DAPO-Math-17k DeepSeek-R1-Distill-Qwen-1.5B policy (Held-out)
[Chart: Intra AUC by date. V0: 71 (Feb 3, 2026). Metrics shown: Intra AUC, Pairwise Accuracy, Calibration MSE.]
Updated 1mo ago
Evaluation Results

Method               Protocol           Date     Intra AUC  Pairwise Accuracy  Calibration MSE
V0                   Strict Genera...   2026.02  71         89.5               0.139
Vanilla Value Model  Strict Genera...   2026.02  56         46.7               0.267