Question Answering on FinanceBench
[Chart: leaderboard results by EM and F1 Score. Highlighted point: Token-Guard, EM 45, Jan 29, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | EM | F1 Score |
| --- | --- | --- | --- | --- |
| Token-Guard | Qwen3-8B | 2026.01 | 45 | 45.37 |
| Token-Guard | Meta-Llama-3.... | 2026.01 | 30 | 30.8 |
| Chain-of-Thoughts | Qwen3-8B | 2026.01 | 29 | 35.12 |
| Tree-of-Thought | Qwen3-8B | 2026.01 | 28 | 34.91 |
| Guided Decoding | Qwen3-8B | 2026.01 | 21 | 23.56 |
| BaseModel | Qwen3-8B | 2026.01 | 20 | 26.67 |
| BaseModel | Meta-Llama-3.... | 2026.01 | 16 | 16 |
| Guided Decoding | Meta-Llama-3.... | 2026.01 | 14 | 16.44 |
| Chain-of-Thoughts | Meta-Llama-3.... | 2026.01 | 11 | 11.01 |
| Tree-of-Thought | Meta-Llama-3.... | 2026.01 | 10 | 14.44 |
| Predictive Decoding | Meta-Llama-3.... | 2026.01 | 9 | 8.79 |
| Predictive Decoding | Qwen3-8B | 2026.01 | 4 | 11.25 |
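
The page does not say how the EM and F1 Score columns are computed. As a point of reference only, the sketch below shows the common SQuAD-style definitions: exact match after light normalization, and token-overlap F1. The function names and the normalization rules (`normalize`, `exact_match`, `token_f1`) are assumptions for illustration, not the benchmark's actual scoring code.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumed normalization (SQuAD-style); the leaderboard may use another.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The net income", "net income"))
    print(token_f1("net income 2019", "net income"))
```

Two caveats apply. Stripping punctuation turns "1.2" into "12", so a financial benchmark may well normalize numbers differently. Also, under these definitions the average F1 can never fall below the average EM, yet the Predictive Decoding row with the Meta-Llama-3.... backbone reports EM 9 and F1 8.79, which suggests the leaderboard's metrics are computed in some other way.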