Mathematical Reasoning on FinanceMath
[Chart: Accuracy over time. Highlighted point: F-1, Accuracy 64, Jan 27, 2026.]
Evaluation Results
Method      Method details               Date      Accuracy
F-1         Model=GPT-5, Model Cat...    2026.01   64
F-1         Model=Qwen3-235B, Mode...    2026.01   61
PoT         Model=Qwen3-235B, Mode...    2026.01   59.5
F-1         Model=Gemini 2.5 Pro,...     2026.01   56.5
CoT         Model=Gemini 2.5 Pro,...     2026.01   56
F-1         Model=Qwen3-30B, Model...    2026.01   56
PoT         Model=Gemini 2.5 Pro,...     2026.01   55.5
PoT         Model=GPT-5, Model Cat...    2026.01   54
CoT         Model=Qwen3-30B, Model...    2026.01   53.5
PoT         Model=Qwen3-30B, Model...    2026.01   53.5
F-1         Model=DeepSeek-V3.1, M...    2026.01   44
CoT         Model=GPT-5, Model Cat...    2026.01   42.5
PoT         Model=DeepSeek-V3.1, M...    2026.01   37.5
Zero-Shot   Model=Qwen3-235B, Mode...    2026.01   35.5
CoT         Model=Qwen3-235B, Mode...    2026.01   35
Zero-Shot   Model=Gemini 2.5 Pro,...     2026.01   32.5
Zero-Shot   Model=Qwen3-30B, Model...    2026.01   28.5
CoT         Model=DeepSeek-V3.1, M...    2026.01   28
Zero-Shot   Model=DeepSeek-V3.1, M...    2026.01   27.5
Zero-Shot   Model=GPT-5, Model Cat...    2026.01   26