
General Language Understanding and Reasoning on IFEval, BBH, MATH, GPQA, MUSR, MMLU-PRO

Top IFEval score: 69.07 (RMiPO)

Chart: IFEval score, Apr 27, 2026
Updated 1mo ago

Evaluation Results

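| Date    | IFEval | BBH   | MATH | GPQA | MUSR | MMLU-PRO | Avg.  | Avg. rank |
|---------|--------|-------|------|------|------|----------|-------|-----------|
| 2026.04 | 69.07  | 29.01 | 8.96 | 3.5  | 8.41 | 30.46    | 24.9  | 1.83      |
| 2026.04 | 68.15  | 29.01 | 8.94 | 3.27 | 8.42 | 29.85    | 24.61 | 2.5       |
| 2026.04 | 68.12  | 29.13 | 8.75 | 4.15 | 5.67 | 29.8     | 24.27 | 3         |
| 2026.04 | 67.93  | 28.85 | 8.41 | 3.46 | 6.16 | 28.07    | 23.81 | 5         |
| 2026.04 | 67.57  | 28.51 | 8.46 | 2.91 | 3.93 | 29.61    | 23.5  | 5.67      |
| 2026.04 | 65.91  | 27.97 | 8.74 | 5.91 | 4.15 | 28.67    | 23.56 | 4.67      |
| 2026.04 | 65.04  | 26.71 | 8.61 | 5.82 | 8.15 | 27.66    | 23.67 | 5.17      |
| 2026.04 | 52.76  | 22.04 | 3.5  | 3.67 | 9.85 | 19.67    | 18.58 | 2.5       |
| 2026.04 | 52.1   | 19.2  | 3.35 | 2.8  | 6.05 | 19.88    | 17.23 | 3.83      |
| 2026.04 | 52.05  | 17.25 | 3.5  | 2.85 | 6.88 | 19.75    | 17.05 | 3.83      |
| 2026.04 | 51.95  | 17.05 | 3.27 | 2.74 | 8.91 | 19.84    | 17.29 | 4.67      |
| 2026.04 | 51.76  | 16.88 | 3.1  | 2.46 | 5.75 | 19.43    | 16.56 | 6.33      |
| 2026.04 | 47.1   | 22.5  | 2.95 | 3.91 | 9.47 | 20.05    | 17.66 | 3         |
| 2026.04 | 46.87  | 22.38 | 2.87 | 3.8  | 9.76 | 19.96    | 17.61 | 3.67      |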