
Step-level reasoning evaluation on Rooms (test)

0 Error Rate

Qwen2.5-Math-7B-PRM800k

Apr 20, 2026
Updated 1mo ago

Evaluation Results

Method    Links
2026.04     0      100     0
2026.04     9.5    100     17.3
2026.04     14.7   0       0
2026.04     14.7   0       0
2026.04     14.8   100     25.7
2026.04     46.9   99.5    63.7
2026.04     75.4   86.8    80.7
2026.04     75.7   82.9    79.1
2026.04     81.3   99.9    89.6
2026.04     83.1   98.9    90.3