Commonsense Reasoning on Com^2-hard Intervention (test)
[Chart: Accuracy over time. Top result: 54.77 (Generator Baseline), Feb 5, 2026.]
Evaluation Results
Method | Configuration | Date | Accuracy
Generator Baseline | Model=Qwen3-8B-Base | 2026.02 | 54.77
MENTORCOLLAB FREE | Generator=Qwen3-8B-Bas... | 2026.02 | 54.77
MENTORCOLLAB MLP | Generator=Qwen3-1.7B,... | 2026.02 | 13.69
CoSD | Generator=Llama3.2-3B-... | 2026.02 | 8.3
R-Stitch | Generator=Gemma-3-4B-P... | 2026.02 | 4.98