
Real-world Multimodal Reasoning on RealWorldQA

75.4 Accuracy

GPT-4o

[Chart: accuracy over time, Apr 25, 2024 to Apr 11, 2026]
Updated 5d ago

Evaluation Results

Date     Accuracy  (unlabeled)
2024.09  75.4      -
2024.09  75.4      -
2025.10  75        -
2024.09  71        -
2024.09  69        -
2024.04  68.7      -
(none)   68.7      -
2024.09  67.8      -
2024.04  67.5      -
2024.04  67.5      -
2024.09  67.5      -
2024.09  66.8      -
2024.04  66        -
2026.04  65.49     52.81
2024.09  64.1      -
2024.09  62.6      -
2024.09  62.6      -
2024.09  62.5      -
2025.10  61.8      -
2024.04  61.4      -
2024.09  61.4      -
2024.09  60.7      -
2024.09  60.5      -
2024.09  59.4      -
2024.09  59.4      -
2024.09  59        -
2025.10  58.7      -
2026.04  57.91     52.54
2026.04  57.86     43
2024.09  57.8      -
2026.04  57.69     49.92
2024.09  57.4      -
2026.04  57.34     50.96
2024.12  57        -
2024.09  56.9      -
2024.09  56.5      -
2024.09  55.8      -
2024.09  55.8      -
2024.09  55.7      -
2024.09  55.6      -
2024.12  54.6      -
2025.10  54.3      -
2024.12  53.7      -
2024.12  53.7      -
2024.09  53.3      -
2026.04  52.46     47.39
2024.04  51.9      -
2026.04  51.33     48.43
2024.09  51.2      -
2024.04  49.8      -
2026.04  49.41     44.53
2026.04  48.28     28.42
2026.04  47.49     39.93
2024.12  42.3      -
2026.04  35.69     26.46
2026.04  31.15     23.25
2025.10  1.7       -