| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| Prompt Leakage Attack | FinanceBench | ASR50088 | 16 |
| Question Answering | FinanceBench N=150 | Accuracy 98.7 | 14 |
| Question Answering | FinanceBench | EM 45 | 12 |
| Information Retrieval | FinanceBench 150 samples | DocRec@5 95 | 11 |
| Long-document Question Answering | FinanceBench (FB) | Accuracy 89.33 | 10 |
| Question Answering | FinanceBench Single-Document 9 | Accuracy 84 | 9 |
| Financial Question Answering | FinanceBench | Cost Saving 89.1 | 8 |
| Agentic Workflow Performance (Static) | FinanceBench + CrewAI | Latency (s) 65.01 | 6 |
| Hallucination Detection | FinanceBench | F1 Score 73.4 | 6 |
| Question Answering | FinanceBench (test) | F1 Score 42.74 | 5 |
| Financial Question Answering | FinanceBench (test) | ROUGE-L 20 | 4 |
| Question Answering | FinanceBench | F1 Score 28.4 | 3 |