
Financial Reasoning on FinQA

77.6 Accuracy

GPT-5 mini-high

Updated 1mo ago

Evaluation Results

Method  Links
2026.03  77.6  77.1   -      -
2025.12  77.1  86.2   -      -
2026.03  73.3  74.6   -      -
2026.03  72.4  72.5   -      -
2026.03  72.4  74.7   -      -
2025.12  72.2  83.82  -      -
2026.03  72.2  71.5   -      -
2026.03  72    67.7   -      -
2026.03  69.8  72.1   -      -
2025.12  68.3  81.47  -      -
2026.03  67.7  61.4   -      -
2026.02  67.3  -      -      -
2026.03  67.2  70.3   -      -
2026.02  67.1  -      -      -
2026.02  66.7  -      -      -
2025.12  65.6  75.88  -      -
2025.12  65.5  82.56  -      -
2026.03  64.6  68     -      -
2026.02  64.2  -      -      -
2026.02  64.2  -      -      -
2025.12  63.5  82.61  -      -
2026.02  60.1  -      -      -
2026.03  57.6  56.9   -      -
2025.12  56.7  71.85  -      -
2026.03  52.4  47.8   -      -
2026.02  51.6  -      -      -
2026.02  51.3  -      -      -
2026.02  51.1  -      -      -
2026.02  49.8  -      -      -
2026.02  49.2  -      -      -
2026.02  48.6  -      -      -
2026.03  23.4  36.7   -      -
2026.03  5.5   36.2   -      -
-        -     -      57.87  60.02
-        -     -      58.49  60.96
-        -     -      59.7   62.11
-        -     -      60.54  63.48
-        -     -      56.49  58.33
-        -     -      56.75  58.81
-        -     -      53.95  55.76
-        -     -      54.64  56.97
-        -     -      49.53  52.08
-        -     -      48.77  50.9