Question Answering on ARC Challenge (Accuracy and PEEM Scores)

93.7 Accuracy

GPT-4o-mini

[Chart: results over time, Mar 11, 2026 to Apr 26, 2026]
Updated 1mo ago

Evaluation Results

Date     Accuracy  PEEM scores
2026.03  93.7      4.815  4.81   4.702
2026.04  92.89     -      -      -
2026.03  91.4      4.721  4.782  4.595
2026.03  89.8      4.702  4.692  4.555
2026.04  89.8      -      -      -
2026.04  85.78     -      -      -
2026.04  84.44     -      -      -
2026.04  84.3      -      -      -
2026.03  83.4      4.461  4.577  4.35
2026.04  82.78     -      -      -
2026.04  81.67     -      -      -
2026.04  78.33     -      -      -
2026.04  74.67     -      -      -
2026.04  71.89     -      -      -
2026.03  67.9      4.168  4.246  4.105
2026.04  56        -      -      -
2026.04  54.78     -      -      -
2026.04  50.67     -      -      -
2026.03  43.9      -      -      -
2026.03  40.2      -      -      -
2026.03  39.8      -      -      -
2026.03  39.3      -      -      -
2026.03  38.4      -      -      -
2026.03  38.3      -      -      -
2026.03  37.8      -      -      -
2026.03  37.8      -      -      -
2026.03  37        -      -      -
2026.03  36.8      -      -      -
2026.03  36        -      -      -
2026.03  35.5      -      -      -
2026.03  35        -      -      -
2026.03  34.6      -      -      -
2026.03  29.9      -      -      -
2026.03  29.2      -      -      -
2026.03  28.2      -      -      -
2026.04  27        -      -      -
2026.04  26        -      -      -
2026.04  25        -      -      -
2026.04  24        -      -      -
2026.04  24        -      -      -
2026.04  24        -      -      -
2026.04  23        -      -      -
2026.04  23        -      -      -
2026.04  23        -      -      -
2026.04  22        -      -      -
2026.04  22        -      -      -
2026.04  21        -      -      -
2026.04  21        -      -      -
2026.04  20        -      -      -
2026.04  19        -      -      -