Commonsense Question Answering on Commonsense QA
[Chart: Reusability Score; highlighted point: Phi, 50.97, Feb 19, 2026]
Evaluation Results
Model   Method                  Date      Reusability Score
Phi     Executor=Strong Comm.   2026.02   50.97
Phi     Executor=Full Comm.     2026.02   43.52
Llama   Executor=Strong Comm.   2026.02   39.12
Llama   Executor=Full Comm.     2026.02   36.55
Phi     Executor=Weak Comm.     2026.02   36.07
Gemma   Executor=Strong Comm.   2026.02   34.92
R1      Executor=Strong Comm.   2026.02   34.69
Gemma   Executor=Full Comm.     2026.02   34.29
Llama   Executor=Weak Comm.     2026.02   33.97
Gemma   Executor=Weak Comm.     2026.02   33.67
R1      Executor=Full Comm.     2026.02   31.91
R1      Executor=Weak Comm.     2026.02   29.13