FinQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Financial Reasoning | FinQA | Accuracy 77.6 | 69 |
| Financial Question Answering | FinQA (test) | Accuracy 76.05 | 57 |
| Numerical Question Answering | FinQA (test) | Execution Accuracy 91.16 | 33 |
| Financial Question Answering | FinQA | Accuracy 83.46 | 30 |
| Reasoning Question Answering | FinQA (test) | Precision@1 76.9 | 28 |
| Table Reasoning | FinQA | Accuracy 69.4 | 18 |
| Financial Open-ended QA | FinQA (test) | Token Accuracy 29.67 | 16 |
| Financial Open-ended Question Answering | FinQA (test) | Token Perplexity 3.9697 | 16 |
| Numerical Question Answering | FinQA 1.0 (test) | Execution Accuracy 91.16 | 14 |
| Financial Numerical Reasoning | FinQA (test) | Execution Accuracy 84.66 | 13 |
| Financial Numerical Reasoning | FinQA (dev) | Execution Accuracy 84.71 | 13 |
| Attribution Consistency and Downstream Performance | FinQA-TAS | F1 Score 76.9 | 12 |
| Proactive information probing | FinQA | PC 4.8 | 12 |
| Question Answering | FinQA (val) | Execution Accuracy 0.6122 | 10 |
| Cross-modal multi-expert orchestration | FinQA | Accuracy 86.1 | 9 |
| Financial Document QA | FinQA (test) | Execution Accuracy 76.81 | 9 |
| Question Answering | FinQA | Prog Acc 59.37 | 9 |
| Mathematical Reasoning | FINQA | Accuracy 72.2 | 7 |
| RAG Poisoning Attack (Document-Level Targeting) | FinQA | RSR@5 47.1 | 7 |
| Fact-Level RAG Poisoning Attack | FinQA | RSR@5 99.8 | 7 |
| Numerical Reasoning Question Answering | FinQA v1 (dev) | Execution Accuracy 72.91 | 7 |
| Fact Retrieval | FinQA (test) | Recall@3 93.31 | 7 |
| Fact Retrieval | FinQA (dev) | R@3 95.03 | 7 |
| Multi-step Reasoning over Code Dependencies | FinQA hard | Accuracy 65.56 | 6 |
| Hallucination Detection | FinQA retrieval-equalized (test) | P95 Latency (s) 2.1 | 5 |
Showing 25 of 31 rows