Reasoning on CLF (test)
[Chart: Accuracy over time; best result TCR-gold at 99 (Jan 29, 2026)]
Evaluation Results
| Method | Setting | Date | Accuracy |
|---|---|---|---|
| TCR-gold | Backbone=Qwen3-8B-Inst... | 2026.01 | 99 |
| TCR | Backbone=Qwen3-8B-Inst... | 2026.01 | 97 |
| Qwen3-8B-Instruct | Backbone=Qwen3-8B-Inst... | 2026.01 | 96.9 |
| DoLa | Backbone=Qwen3-8B-Inst... | 2026.01 | 94.4 |
| TCR-gold | Backbone=Qwen2.5-7B-In... | 2026.01 | 71.3 |
| TCR | Backbone=Qwen2.5-7B-In... | 2026.01 | 66.6 |
| Qwen2.5-7B-Instruct | Backbone=Qwen2.5-7B-In... | 2026.01 | 56.8 |
| DoLa | Backbone=Qwen2.5-7B-In... | 2026.01 | 52.3 |
| TCR-gold | Backbone=LLaMA3-8B-Ins... | 2026.01 | 32.7 |
| TCR | Backbone=LLaMA3-8B-Ins... | 2026.01 | 28.2 |
| TCR-gold | Backbone=Phi-3-Instruc... | 2026.01 | 20.3 |
| LLaMA3-8B-Instruct | Backbone=LLaMA3-8B-Ins... | 2026.01 | 15.2 |
| TCR | Backbone=Phi-3-Instruc... | 2026.01 | 11.2 |
| Phi-3-Instruct | Backbone=Phi-3-Instruc... | 2026.01 | 9.2 |
| DoLa | Backbone=LLaMA3-8B-Ins... | 2026.01 | 8.8 |
| DoLa | Backbone=Phi-3-Instruc... | 2026.01 | 7.6 |
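Because the leaderboard is sorted by raw accuracy, rows that share a backbone end up scattered. The sketch below regroups the results by backbone and computes each method's gain over the plain instruct model for that backbone. Two assumptions are worth flagging: the backbone keys (`"Qwen3-8B"`, `"Phi-3"`, and so on) are shortened labels introduced here for illustration, since the table truncates the names, and treating the bare instruct-model row as the "baseline" is one way of reading the table, not something the leaderboard states.

```python
# Accuracy values copied from the leaderboard table, grouped by backbone.
# ASSUMPTION: keys are shortened, hypothetical labels (the table truncates
# backbone names), and "baseline" is the plain instruct-model row.
RESULTS = {
    "Qwen3-8B":   {"baseline": 96.9, "TCR-gold": 99.0, "TCR": 97.0, "DoLa": 94.4},
    "Qwen2.5-7B": {"baseline": 56.8, "TCR-gold": 71.3, "TCR": 66.6, "DoLa": 52.3},
    "LLaMA3-8B":  {"baseline": 15.2, "TCR-gold": 32.7, "TCR": 28.2, "DoLa": 8.8},
    "Phi-3":      {"baseline": 9.2,  "TCR-gold": 20.3, "TCR": 11.2, "DoLa": 7.6},
}


def gains_over_baseline(results):
    """Return {backbone: {method: accuracy - baseline}}, rounded to 1 decimal."""
    gains = {}
    for backbone, scores in results.items():
        base = scores["baseline"]
        gains[backbone] = {
            method: round(acc - base, 1)  # round to hide float noise like 2.0999...
            for method, acc in scores.items()
            if method != "baseline"
        }
    return gains


if __name__ == "__main__":
    for backbone, deltas in gains_over_baseline(RESULTS).items():
        row = ", ".join(f"{m}: {d:+.1f}" for m, d in deltas.items())
        print(f"{backbone}: {row}")
```

Read this way, TCR-gold improves on the plain instruct model for every backbone (by 2.1 to 17.5 points), while DoLa scores below it in all four groups.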