Over-refusal evaluation on MMMU (test)

37 Math Score

Prompt-based

Jan 31, 2026
Updated 4d ago

Evaluation Results

Method  Links
2026.01   37     5      32.5
2026.01   35.5   27     45
2026.01   35     4.5    32
2026.01   34.5   27.5   44
2026.01   24.5   11.5   47.5
2026.01   24     46.5   33
2026.01   24     46.5   32.5
2026.01   22     11     50
2026.01   21.5   22     30
2026.01   21     21     27.5
2026.01   13     2      6
2026.01   12.5   3.5    9.5
2026.01   12     2      6.5
2026.01   11.5   1.5    9.5
2026.01   11.5   9.5    8.5
2026.01   10.5   2.5    6
2026.01   9.5    1.5    4
2026.01   9      7      8.5
2026.01   9      6      5
2026.01   7.5    2.5    5
2026.01   7      0      4
2026.01   6      0      5
2026.01   4.5    2.5    5.5
2026.01   3.5    0      1.5
2026.01   3      1.5    11.5
2026.01   3      0.5    7
2026.01   2      0      1
2026.01   1.5    1.5    10.5
2026.01   1.5    2      3
2026.01   1.5    0      2.5
2026.01   0.5    0      0
2026.01   0      0.5    4