
Calibration on MMLU

0.0559 Brier Score

Verbalized confidence
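The page does not say how the leaderboard computes its score, so the following is only a minimal sketch of the standard binary Brier score: the mean squared difference between a stated confidence and the 0/1 outcome, where lower is better. It assumes "verbalized confidence" means the model states a probability that its answer is correct. The function name `brier_score` and the sample values are hypothetical, not taken from the leaderboard.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")
    total = sum((c - float(o)) ** 2 for c, o in zip(confidences, correct))
    return total / len(confidences)


# Hypothetical data: confidence the model stated for each of four answers,
# and whether each answer turned out to be correct.
stated = [0.9, 0.8, 0.6, 0.95]
was_correct = [True, True, False, True]
score = brier_score(stated, was_correct)
print(f"Brier score: {score:.4f}")
```

Under this definition, a model that always states 0.5 scores 0.25 regardless of accuracy, which gives a rough reference point when reading the table.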

[Chart: scores over time, Jan 6, 2026 to Feb 14, 2026; y-axis ticks at 0.042172, 0.134836, 0.2275, 0.320164]
Updated 4d ago

Evaluation Results

Date      Brier Score   (unlabeled)   Method   Links
2026.02   0.0559        -
2026.02   0.0686        -
2026.02   0.0817        -
2026.02   0.0922        -
2026.02   0.0977        -
2026.02   0.1163        -
2026.02   0.1176        -
2026.02   0.12          -
2026.02   0.1255        -
2026.02   0.1293        -
2026.02   0.13          -
2026.01   0.1465        0.0393
2026.01   0.1473        0.0375
2026.01   0.15          0.0336
2026.02   0.1561        -
2026.02   0.1597        -
2026.01   0.1618        0.0703
2026.01   0.1662        0.0889
2026.01   0.1717        0.047
2026.01   0.1768        0.0525
2026.01   0.1804        0.0473
2026.01   0.1898        0.0555
2026.02   0.2006        -
2026.01   0.2021        0.0307
2026.01   0.2152        0.1071
2026.02   0.2164        -
2026.01   0.2275        0.1707
2026.01   0.2354        0.2261
2026.01   0.2546        0.1972
2026.01   0.2549        0.194
2026.02   0.2565        -
2026.01   0.2607        0.2569
2026.01   0.2633        0.2011
2026.01   0.2747        0.1533
2026.01   0.2762        0.2465
2026.01   0.2856        0.2858
2026.02   0.2868        -
2026.02   0.3018        -
2026.01   0.3079        0.2971
2026.01   0.3256        0.3204
2026.01   0.3607        0.3132
2026.01   0.3991        0.4085