
Multiple-Choice Question Answering on Average (OBQA, ARC-C, ARC-E, SCIQ, SIQA)

87.1 Average Accuracy

NTF

[Chart: Average Accuracy; axis ticks 41.34, 53.22, 65.1, 76.98; May 31, 2026]
Updated 1d ago

Evaluation Results

Date      Average Accuracy    (unlabeled)
2026.05   87.1                101.3
2026.05   87                  -
2026.05   86.5                93.4
2026.05   86.1                88.2
2026.05   86                  86.8
2026.05   85.7                82.9
2026.05   85.2                -
2026.05   85.1                75
2026.05   84.9                96.2
2026.05   84.8                94.9
2026.05   83.8                82.3
2026.05   83.8                82.3
2026.05   82.9                70.9
2026.05   82.6                67.1
2026.05   81.2                -
2026.05   80.9                95.9
2026.05   80.6                -
2026.05   80                  93
2026.05   79.4                -
2026.05   79.4                86
2026.05   78.8                79.1
2026.05   78.5                75.6
2026.05   78.4                62.2
2026.05   77.9                55.4
2026.05   77.3                -
2026.05   77.2                60.5
2026.05   77                  58.1
2026.05   76.7                39.2
2026.05   75.1                17.6
2026.05   75                  103.5
2026.05   74.8                -
2026.05   74.7                12.2
2026.05   74.4                93
2026.05   74.3                91.2
2026.05   74.1                87.7
2026.05   74                  86
2026.05   73.8                -
2026.05   73.8                -
2026.05   73.7                98.9
2026.05   72                  -
2026.05   72                  79.3
2026.05   71.7                45.6
2026.05   70.9                66.7
2026.05   69.3                48.3
2026.05   69.2                47.1
2026.05   69.1                -
2026.05   68.8                42.5
2026.05   65.1                -
2026.05   61.9                -
2026.05   61.8                98.9
2026.05   61.6                96.8
2026.05   59.7                76.3
2026.05   59.4                73.1
2026.05   59.2                71
2026.05   59                  68.8
2026.05   52.6                -
2026.05   52.6                -
2026.05   50.5                113.9
2026.05   50                  106.2
2026.05   49.6                -
2026.05   49.2                93.8
2026.05   49.2                93.8
2026.05   49                  90.8
2026.05   46.1                46.2
2026.05   43.1                -
2026.05   43.1                -