Commonsense Reasoning on Common sense QA (AUCOAA)
[Figure: AUCOAA by date. Best result: Adaptive-Answer, 81.4 AUCOAA, Jan 6, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | AUCOAA |
|---|---|---|---|
| Adaptive-Answer | Qwen3-8B | 2026.01 | 81.4 |
| Format-Adaptive-Answer | Qwen3-8B | 2026.01 | 81.3 |
| Normalized-Length | Qwen3-8B | 2026.01 | 78.6 |
| SFT | Qwen3-8B | 2026.01 | 76.8 |
| Hard-Length 8k → 4k | Qwen3-8B | 2026.01 | 76.5 |
| TWYN | Qwen3-8B | 2026.01 | 74.8 |
| Hard-Length 8k | Qwen3-8B | 2026.01 | 73.3 |
| Soft-Length | Qwen3-8B | 2026.01 | 73.3 |
| Hard-Length 16k | Qwen3-8B | 2026.01 | 72.5 |
| Base model | Qwen3-8B | 2026.01 | 72.1 |
| No-Thinking | Qwen3-8B | 2026.01 | 48 |