
Utility Evaluation on MATH500

93.6 Pass@1 Accuracy

RealSafe-R1

[Chart: Pass@1 accuracy; plotted date Aug 6, 2025]
Updated 27d ago

Evaluation Results

Date      Pass@1 Accuracy
2025.08   93.6
2025.08   93.6
2025.08   93.6
2025.08   92.9
2025.08   91.8
2025.08   90.9
2025.08   90
2025.08   86.4
2025.08   86.4
2025.08   85.5
2025.08   85.5
2025.08   85.5
2025.08   84.5
2025.08   83.6
2025.08   83.6
2025.08   81.8
2025.08   81.8
2025.08   77.3
2025.08   74.5
2025.08   71.8
2025.08   52.7
2025.08   32.7
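
For readers unfamiliar with the metric, the scores above can be read as the percentage of benchmark problems answered correctly on a single attempt. The following is a minimal sketch of that arithmetic, not the leaderboard's actual scoring code: the function name `pass_at_1`, the per-problem correctness flags, and the assumption of 500 problems with one sample each are illustrative. Real evaluations may average over several samples per problem.

```python
def pass_at_1(correct_flags):
    """Return Pass@1 as a percentage.

    correct_flags holds one boolean per problem, True when the single
    sampled answer for that problem was judged correct (assumed setup).
    """
    if not correct_flags:
        raise ValueError("need at least one problem")
    return 100.0 * sum(bool(f) for f in correct_flags) / len(correct_flags)


# Hypothetical run: 468 of 500 problems answered correctly.
flags = [True] * 468 + [False] * 32
print(round(pass_at_1(flags), 1))  # 93.6
```

Under these assumptions, a score of 93.6 corresponds to 468 correct answers out of 500, and each additional solved problem moves the score by 0.2 points.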