Multimodal Reasoning on HLE

Accuracy: 48.8

CoT2-Meta

[Chart: accuracy over time, Aug 5, 2025 to Mar 30, 2026]
Updated 12d ago

Evaluation Results

Date      Accuracy
2026.03   48.8
2026.03   44.5
2026.03   42.1
2026.03   38.4
2025.08   34.5
2026.03   32.5
2025.08   31.58
2025.08   26.61
2025.08   23.1
2025.08   16.67
2025.08   14.62
2025.08   14.04
2025.08   13.74
2025.08   13.45
2025.08   13.45
2025.08   12.87
2025.08   12.28
2025.08   11.99
2025.08   11.4
2025.08   11.4
2025.08   11.4
2025.08   11.11
2025.08   11.11
2025.08   11.11
2025.08   11.11
2025.08   11.11
2025.08   11.11
2025.08   10.82
2025.08   10.82
2025.08   10.82
2025.08   10.53
2025.08   10.53
2025.08   10.23