Financial Reasoning on S&P 500 Scenario-based MCQs Stage I
Top result: DeepSeek-v3.1, 87.14 accuracy
[Chart: Accuracy over time, Apr 18, 2026]
Updated 1 month ago
Evaluation Results

| Method | Details | Date | Accuracy |
|---|---|---|---|
| DeepSeek-v3.1 | Provider=DeepSeek-AI | 2026.04 | 87.14 |
| Claude-4.5-Sonnet | Provider=Anthropic | 2026.04 | 86.19 |
| Ours (Full Model) | Backbone=Llama-3.1-8B-... | 2026.04 | 82.38 |
| GPT-5.1 | Provider=OpenAI | 2026.04 | 80.95 |
| Ours w/o DARA | Backbone=Llama-3.1-8B-... | 2026.04 | 76.67 |
| Qwen3-8B | Parameters=8B, Provide... | 2026.04 | 71.9 |
| Ours w/o CORA, DARA & Verification | Backbone=Llama-3.1-8B-... | 2026.04 | 70.95 |
| Mistral-7B-Instruct-v0.3 | Parameters=7B | 2026.04 | 64.29 |
| Llama-3.1-8B-Instruct | Parameters=8B, Provide... | 2026.04 | 57.62 |
| Ours w/o CORA | Backbone=Llama-3.1-8B-... | 2026.04 | 46.19 |
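The four "Ours" rows form an ablation, and each row's effect is the accuracy difference from the full model. The sketch below only subtracts numbers copied from the table; the dictionary and the helper name `ablation_deltas` are illustrative, not part of the benchmark or the authors' code. The table does not say what CORA, DARA, or Verification do, so the sketch makes no claim about why the numbers differ.

```python
# Accuracy figures copied from the Evaluation Results table (illustrative container).
results = {
    "Ours (Full Model)": 82.38,
    "Ours w/o DARA": 76.67,
    "Ours w/o CORA, DARA & Verification": 70.95,
    "Ours w/o CORA": 46.19,
}

def ablation_deltas(scores, reference="Ours (Full Model)"):
    """Hypothetical helper: each variant's accuracy change vs. the reference, in points."""
    base = scores[reference]
    # Round to two decimals to match the precision reported in the table.
    return {name: round(acc - base, 2) for name, acc in scores.items() if name != reference}

for name, delta in ablation_deltas(results).items():
    print(f"{name}: {delta:+.2f}")
```

This prints -5.71 for removing DARA, -11.43 for removing CORA, DARA and Verification together, and -36.19 for removing CORA alone. Read directly off the table, removing only CORA scores lower than removing all three components, and also lower than Llama-3.1-8B-Instruct (57.62).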