
Instruction Following on MM-IFEval

Top score: 83.1 (GPT-5)

[Chart: score over time, Nov 28, 2025 to May 28, 2026]

Evaluation Results

Date     Score  Links
2026.04  83.1   -
2026.04  80.2   -
2026.04  79.6   -
2026.04  78     -
2026.04  77.7   -
2026.04  75.8   -
2026.04  75.7   -
2026.04  75     -
2026.04  73.5   -
2026.04  69.9   -
2026.04  69.2   -
2026.04  68.2   -
2026.04  66.5   -
2026.04  66.3   -
2026.04  66.1   -
2026.04  65.7   -
2026.05  61.61  -
2026.04  59.4   -
2026.05  58.4   -
2026.05  57.23  -
2026.03  57     -
2026.04  56.3   -
2026.04  54.4   -
2026.05  54.32  -
2026.04  53.6   -
2026.03  53.3   -
2026.03  53     -
2026.03  52     -
2026.05  51.74  -
2026.03  50     -
2026.03  50     -
2026.03  49.3   -
2026.03  49.2   -
2026.05  48.83  -
2026.03  48.8   -
2026.03  48.7   -
2026.03  48     -
2026.03  46.8   -
2026.03  46.7   -
2025.11  46.35  -
2026.05  41.25  -
2026.03  39.6   -
2025.11  36.17  -
2025.11  33.09  -
2026.05  28.83  -
2025.11  11.56  -
2025.11  -      51.83
2025.11  -      19.42
2025.11  -      47.31
2025.11  -      51.35
2025.11  -      38.62