Reasoning on AIME 2024, GSM8k, MATH 500, and GPQA

[Figure: AIME 2024 Score over time (Jun 3, 2025 to Jan 30, 2026); highlighted entry: Critique-GRPO, 60]

Evaluation Results

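Method names and links are not present in this copy. The first four score columns are assumed to follow the title order (AIME 2024, GSM8k, MATH 500, GPQA); the labels of columns 5 to 13 are unknown, so they are numbered. The 2025.06 rows appear to use a 0-100 scale and the 2026.01 rows a 0-1 scale.

| Date | AIME 2024 | GSM8k | MATH 500 | GPQA | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2025.06 | 60 | - | 92.6 | - | 68.13 | 52.6 | 66.2 | 95 | 60.6 | 47.98 | 70.03 | 60 | 46.7 |
| 2025.06 | 50 | - | 91 | - | 63.75 | 52.6 | 65.6 | 82.5 | 57.9 | 40.4 | 70 | 53.33 | 43.3 |
| 2025.06 | 40 | - | 82 | - | 53.23 | 41.2 | 44.1 | 67.5 | 46.9 | 35.86 | 68.25 | 30 | 23.3 |
| 2025.06 | 40 | - | 83.2 | - | 56.25 | 43.8 | 46.4 | 82.5 | 48.9 | 38.38 | 66.81 | 33.3 | 40 |
| 2026.01 | 0.7458 | 0.8431 | 0.9363 | 0.5941 | 0.7798 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7458 | 0.8489 | 0.9385 | 0.6124 | 0.7864 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7417 | 0.8528 | 0.9418 | 0.5069 | 0.7608 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7333 | 0.8806 | 0.9353 | 0.5379 | 0.7718 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7292 | 0.88 | 0.931 | 0.5967 | 0.785 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.725 | 0.9029 | 0.9373 | 0.5783 | 0.7859 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7208 | 0.8445 | 0.9333 | 0.5354 | 0.7585 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7 | 0.9298 | 0.9353 | 0.5821 | 0.7868 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.7 | 0.9244 | 0.933 | 0.5991 | 0.7891 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.6833 | 0.8258 | 0.932 | 0.5303 | 0.7429 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.675 | 0.8469 | 0.9343 | 0.5227 | 0.7447 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.6708 | 0.9179 | 0.9287 | 0.5183 | 0.7589 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.625 | 0.9097 | 0.9305 | 0.5196 | 0.7462 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.6208 | 0.8959 | 0.9303 | 0.5101 | 0.7393 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.4625 | 0.8438 | 0.883 | 0.4116 | 0.6502 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.4458 | 0.8431 | 0.8885 | 0.4173 | 0.6487 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.4417 | 0.838 | 0.8905 | 0.4053 | 0.6439 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.4375 | 0.8278 | 0.881 | 0.4129 | 0.6398 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.4333 | 0.8433 | 0.8832 | 0.4242 | 0.646 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.3833 | 0.8431 | 0.8812 | 0.4003 | 0.627 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.3458 | 0.8529 | 0.8572 | 0.3813 | 0.6093 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.1042 | 0.7251 | 0.7173 | 0.2513 | 0.4495 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.0958 | 0.7236 | 0.7065 | 0.233 | 0.4397 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.0792 | 0.7126 | 0.7177 | 0.2607 | 0.4426 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.0625 | 0.6815 | 0.6817 | 0.2418 | 0.4169 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.0583 | 0.643 | 0.6753 | 0.2481 | 0.4062 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.0542 | 0.7158 | 0.6617 | 0.2494 | 0.4203 | - | - | - | - | - | - | - | - |
| 2026.01 | 0.0458 | 0.6868 | 0.6242 | 0.2374 | 0.3986 | - | - | - | - | - | - | - | - |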
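
In the 2026.01 rows, the fifth value appears to be the unweighted mean of the first four scores. The page does not state this, so treat it as an assumption. A minimal Python sketch, using three rows copied from the results above, checks the relationship to within rounding; the helper name `matches_mean` and the tolerance are illustrative choices, not part of the leaderboard.

```python
from statistics import mean

# Three 2026.01 rows copied from the results:
# (four benchmark scores, reported fifth value).
rows = [
    ([0.7458, 0.8431, 0.9363, 0.5941], 0.7798),
    ([0.7417, 0.8528, 0.9418, 0.5069], 0.7608),
    ([0.7208, 0.8445, 0.9333, 0.5354], 0.7585),
]

def matches_mean(scores, reported, tol=1e-4):
    """Return True if `reported` is within `tol` of the mean of `scores`.

    The tolerance is an assumption that allows for 4-decimal rounding.
    """
    return abs(mean(scores) - reported) <= tol

for scores, reported in rows:
    ok = matches_mean(scores, reported)
    print(f"mean={mean(scores):.4f} reported={reported:.4f} match={ok}")
```

Not every row fits: for the row beginning 0.7292, the four scores average about 0.7842 while the listed fifth value is 0.785, so the mean interpretation should be read as approximate for that entry.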