Commonsense Question Answering on OBQA
[Chart: Accuracy over time, Mar 15, 2026 to Jun 1, 2026; best result 93.4 (IoT)]
Evaluation Results

Method  Model                Date     Accuracy (%)
IoT     GPT-4o mini          2026.03  93.4
CoT     GPT-4o mini          2026.03  88.8
IoT     Olmo-2-13B           2026.03  87.6
CoT     Olmo-2-13B           2026.03  85.4
IoT     Olmo-2-7B            2026.03  84.2
EoT     Olmo-2-13B           2026.03  83.4
SC      Olmo-2-13B           2026.03  82.0
CoT     Olmo-2-7B            2026.03  80.8
IoT     Llama-3.3-8B         2026.03  78.4
SC      Llama-3.3-8B         2026.03  77.8
SC      Olmo-2-7B            2026.03  75.8
EoT     Llama-3.3-8B         2026.03  75.8
EoT     Olmo-2-7B            2026.03  74.6
CoT     Llama-3.3-8B         2026.03  74.6
SUBFIT  Qwen2.5-7B-Instr...  2026.06  40.2
SUBFIT  Llama-3.1-8B-Ins...  2026.06  39.2
SUBFIT  Llama-3.2-3B-Ins...  2026.06  34.2
SUBFIT  DeepSeek-7B-chat...  2026.06  34.0
SUBFIT  Qwen3-4B-Instruc...  2026.06  33.2
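Four base models (GPT-4o mini, Olmo-2-13B, Olmo-2-7B, Llama-3.3-8B) report results for both IoT and CoT, so the gain of one method over the other can be read off per model. The following is a minimal sketch of that comparison. The accuracy values are copied from the results listed on this page. The `RESULTS` dictionary and the `gain` helper are hypothetical illustrations, not part of any leaderboard API.

```python
# Accuracy (%) on OBQA for the base models that report both IoT and CoT,
# copied from the results listed on this page. Keys are (method, model).
RESULTS = {
    ("IoT", "GPT-4o mini"): 93.4,
    ("CoT", "GPT-4o mini"): 88.8,
    ("IoT", "Olmo-2-13B"): 87.6,
    ("CoT", "Olmo-2-13B"): 85.4,
    ("IoT", "Olmo-2-7B"): 84.2,
    ("CoT", "Olmo-2-7B"): 80.8,
    ("IoT", "Llama-3.3-8B"): 78.4,
    ("CoT", "Llama-3.3-8B"): 74.6,
}


def gain(method: str, baseline: str) -> dict[str, float]:
    """Return (method - baseline) accuracy in percentage points per model.

    Only models that have an entry for both methods are included; the
    difference is rounded to one decimal to remove float noise.
    """
    models = {model for (_, model) in RESULTS}
    return {
        model: round(RESULTS[(method, model)] - RESULTS[(baseline, model)], 1)
        for model in sorted(models)
        if (method, model) in RESULTS and (baseline, model) in RESULTS
    }


if __name__ == "__main__":
    for model, delta in gain("IoT", "CoT").items():
        print(f"{model}: {delta:+.1f} pp")
```

On these four models, IoT leads CoT by 2.2 to 4.6 percentage points. Keying on `(method, model)` allows the same helper to compare any other pair of methods once the corresponding rows are added.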