| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Financial Reasoning | FinQA | Accuracy | 77.6 | 69 |
| Financial Question Answering | FinQA (test) | Accuracy | 76.05 | 57 |
| Numerical Question Answering | FinQA (test) | Execution Accuracy | 91.16 | 33 |
| Financial Question Answering | FinQA | Accuracy | 83.46 | 30 |
| Reasoning Question Answering | FinQA (test) | Precision@1 | 76.9 | 28 |
| Table Reasoning | FinQA | Accuracy | 69.4 | 18 |
| Financial Open-ended QA | FinQA (test) | Token Accuracy | 29.67 | 16 |
| Financial Open-ended Question Answering | FinQA (test) | Token Perplexity | 3.9697 | 16 |
| Numerical Question Answering | FinQA 1.0 (test) | Execution Accuracy | 91.16 | 14 |
| Financial Numerical Reasoning | FinQA (test) | Execution Accuracy | 84.66 | 13 |
| Financial Numerical Reasoning | FinQA (dev) | Execution Accuracy | 84.71 | 13 |
| Attribution Consistency and Downstream Performance | FinQA-TAS | F1 Score | 76.9 | 12 |
| Proactive information probing | FinQA | PC | 4.8 | 12 |
| Question Answering | FinQA (val) | Execution Accuracy | 0.6122 | 10 |
| Cross-modal multi-expert orchestration | FinQA | Accuracy | 86.1 | 9 |
| Financial Document QA | FinQA (test) | Execution Accuracy | 76.81 | 9 |
| Question Answering | FinQA | Prog Acc | 59.37 | 9 |
| Mathematical Reasoning | FINQA | Accuracy | 72.2 | 7 |
| RAG Poisoning Attack (Document-Level Targeting) | FinQA | RSR@5 | 47.1 | 7 |
| Fact-Level RAG Poisoning Attack | FinQA | RSR@5 | 99.8 | 7 |
| Numerical Reasoning Question Answering | FinQA v1 (dev) | Execution Accuracy | 72.91 | 7 |
| Fact Retrieval | FinQA (test) | Recall@3 | 93.31 | 7 |
| Fact Retrieval | FinQA (dev) | R@3 | 95.03 | 7 |
| Multi-step Reasoning over Code Dependencies | FinQA hard | Accuracy | 65.56 | 6 |
| Hallucination Detection | FinQA retrieval-equalized (test) | P95 Latency (s) | 2.1 | 5 |