Question Answering on ARC Challenge (Accuracy and PEEM Scores)

93.7 Accuracy

GPT-4o-mini

[Chart: results over time, Mar 11, 2026 to Apr 14, 2026]
Updated 3d ago

Evaluation Results

Date     Accuracy  PEEM scores
2026.03  93.7      4.815  4.81   4.702
2026.04  92.89     -      -      -
2026.03  91.4      4.721  4.782  4.595
2026.03  89.8      4.702  4.692  4.555
2026.04  89.8      -      -      -
2026.04  85.78     -      -      -
2026.04  84.44     -      -      -
2026.04  84.3      -      -      -
2026.03  83.4      4.461  4.577  4.35
2026.04  82.78     -      -      -
2026.04  81.67     -      -      -
2026.04  78.33     -      -      -
2026.04  74.67     -      -      -
2026.04  71.89     -      -      -
2026.03  67.9      4.168  4.246  4.105
2026.04  54.78     -      -      -
2026.04  50.67     -      -      -
2026.03  43.9      -      -      -
2026.03  40.2      -      -      -
2026.03  39.8      -      -      -
2026.03  39.3      -      -      -
2026.03  38.4      -      -      -
2026.03  38.3      -      -      -
2026.03  37.8      -      -      -
2026.03  37.8      -      -      -
2026.03  37        -      -      -
2026.03  36.8      -      -      -
2026.03  36        -      -      -
2026.03  35.5      -      -      -
2026.03  35        -      -      -
2026.03  34.6      -      -      -
2026.03  29.9      -      -      -
2026.03  29.2      -      -      -
2026.03  28.2      -      -      -