Our new X account is live! Follow @wizwand_team for updates
Home
/
Benchmarks
Reasoning on Parity-NL (test)
Loading...
100
Accuracy
DoLa
22.832
42.866
62.9
82.934
Jan 29, 2026
Accuracy
Updated 4d ago
Evaluation Results
Method
Method
Links
Accuracy
DoLa
Backbone=Qwen3-8B-Inst...
2026.01
100
TCR
Backbone=Qwen3-8B-Inst...
2026.01
100
TCR-gold
Backbone=Qwen3-8B-Inst...
2026.01
100
Qwen3-8B-Instruct
Backbone=Qwen3-8B-Inst...
2026.01
98.7
TCR-gold
Backbone=LLaMA3-8B-Ins...
2026.01
88
TCR
Backbone=LLaMA3-8B-Ins...
2026.01
82
TCR-gold
Backbone=Qwen2.5-7B-In...
2026.01
81.2
DoLa
Backbone=LLaMA3-8B-Ins...
2026.01
71.6
LLaMA3-8B-Instruct
Backbone=LLaMA3-8B-Ins...
2026.01
70
TCR-gold
Backbone=Phi-3-Instruc...
2026.01
65.2
TCR
Backbone=Qwen2.5-7B-In...
2026.01
60.4
DoLa
Backbone=Qwen2.5-7B-In...
2026.01
58.1
TCR
Backbone=Phi-3-Instruc...
2026.01
55.8
Phi-3-Instruct
Backbone=Phi-3-Instruc...
2026.01
51.2
Qwen2.5-7B-Instruct
Backbone=Qwen2.5-7B-In...
2026.01
48.3
DoLa
Backbone=Phi-3-Instruc...
2026.01
25.8
Feedback
Search any
task
Search any
task