Financial Reasoning on S&P 500 Scenario-based MCQs Stage I

Accuracy: 87.14

DeepSeek-v3.1

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy
DeepSeek-v3.1	2026.04	87.14
Claude-4.5-Sonnet	2026.04	86.19
Ours (Full Model)	2026.04	82.38
GPT-5.1	2026.04	80.95
Ours w/o DARA	2026.04	76.67
Qwen3-8B	2026.04	71.90
Ours w/o CORA, DARA & Verification	2026.04	70.95
Mistral-7B-Instruct-v0.3	2026.04	64.29
Llama-3.1-8B-Instruct	2026.04	57.62
Ours w/o CORA	2026.04	46.19