Commonsense Question Answering on OBQA
[Chart: Accuracy over time. Best result: 93.4 (IoT), Mar 15, 2026]
Updated 10d ago
Evaluation Results
Method  Model          Date     Accuracy
IoT     GPT-4o mini    2026.03  93.4
CoT     GPT-4o mini    2026.03  88.8
IoT     Olmo-2-13B     2026.03  87.6
CoT     Olmo-2-13B     2026.03  85.4
IoT     Olmo-2-7B      2026.03  84.2
EoT     Olmo-2-13B     2026.03  83.4
SC      Olmo-2-13B     2026.03  82
CoT     Olmo-2-7B      2026.03  80.8
IoT     Llama-3.3-8B   2026.03  78.4
SC      Llama-3.3-8B   2026.03  77.8
SC      Olmo-2-7B      2026.03  75.8
EoT     Llama-3.3-8B   2026.03  75.8
EoT     Olmo-2-7B      2026.03  74.6
CoT     Llama-3.3-8B   2026.03  74.6