Reasoning on MMLU-Pro (Subject Averages)

History Score: 57.5

RAT

Aug 8, 2025
Updated 1mo ago

Evaluation Results

Date     History  Col. 2  Col. 3  Col. 4  Average
2025.08     57.5    55.3      73    34.2       55
2025.08       57    52.3    68.6    36.2     53.5
2025.08     56.7    53.1    70.4    38.2     54.5
2025.08     56.3    48.5    67.8    31.2       51
2025.08     53.5    55.3    69.8    35.4     53.5
2025.08     53.2      53    72.2      35     53.3
2025.08     52.2    43.5      64    33.8     48.4
2025.08     51.2    49.7    67.8    32.6     50.3
2025.08       50    46.6    67.5    34.2     49.6
2025.08     48.3    38.6      58    28.8     43.4
2025.08     48.3    38.6      58    28.8     43.4
2025.08     47.9    44.2    65.9    30.4     47.1
2025.08     47.8    49.3    63.9      30     47.8
2025.08     47.8    49.3    63.9      30     47.8
2025.08     47.2    44.7    64.4      30     46.6
2025.08     44.6    41.1    57.8      26     42.4
2025.08     44.6    36.9      53    26.4     40.2
2025.08     44.6    36.9      53    26.4     40.2
2025.08     43.6    33.9      51    26.6     38.8
2025.08     43.6    41.2    56.8      28     42.4
2025.08     43.6    33.9      51    26.6     38.8
2025.08       43    42.6    60.6    26.6     43.2
2025.08     42.8    45.9    63.2    29.6     45.4
2025.08     42.3    45.7    63.4    26.6     44.5
2025.08     42.3    37.7    52.6    28.6     40.3
2025.08     40.7    34.7      55    24.6     38.7
2025.08     40.7    42.1      60    26.2     42.3
2025.08     40.7    34.7      55    24.6     38.7
2025.08     39.8    32.1      47    23.4     35.6
2025.08     37.8    36.5    51.4    23.2     37.1
2025.08     37.8    40.9    53.4      29     40.3
2025.08     37.8    36.5    51.4    23.2     37.1
2025.08     37.8    33.5    50.6    18.4     35.1
2025.08     37.8    40.9    53.4      29     40.3
2025.08     35.7    36.9    55.2    25.4     38.3
2025.08     33.9    34.5    50.6    20.6     34.9
2025.08     33.6    32.3    48.8    20.6     33.8
2025.08     33.6    32.3    48.8    20.6     33.8
2025.08     33.1    30.5    44.2    20.2       32
2025.08     32.6    32.5      46      28     34.8