Deductive logical reasoning on ProverQA hard (test)
[Chart: Error Rate for ICL, Jan 14, 2026. Metrics: Error Rate, Accuracy.]
Updated 3d ago
Evaluation Results
| Method | Method details | Date | Error Rate | Accuracy |
|---|---|---|---|---|
| ICL | Model=Qwen3-4B-Instruc... | 2026.01 | 0 | 0 |
| ICL | Model=Qwen2.5-3B-Instr... | 2026.01 | 7.6 | 3 |
| ICL | Model=Gemma-3-4B-Instr... | 2026.01 | 28.4 | 9.4 |
| SFT+ | Model=Phi-4-mini-Instr... | 2026.01 | 42.2 | 24.4 |
| ICL | Model=Phi-4-mini-Instr... | 2026.01 | 48.4 | 18.2 |
| Increment | Model=Qwen2.5-3B-Instr... | 2026.01 | 50.2 | 20.2 |
| SFT+ | Model=Qwen2.5-3B-Instr... | 2026.01 | 50.4 | 22.2 |
| Increment | Model=Phi-4-mini-Instr... | 2026.01 | 53.4 | 31 |
| SFT+ | Model=Qwen3-4B-Instruc... | 2026.01 | 64.8 | 40.8 |
| SFT+ | Model=Gemma-3-4B-Instr... | 2026.01 | 65.8 | 33.6 |
| Increment | Model=Gemma-3-4B-Instr... | 2026.01 | 69.4 | 35.6 |
| Increment | Model=Qwen3-4B-Instruc... | 2026.01 | 70.4 | 43 |