FinanceBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Prompt Leakage Attack | FinanceBench | ASR50088 | 16 |
| Question Answering | FinanceBench N=150 | Accuracy 98.7 | 14 |
| Question Answering | FinanceBench | EM 45 | 12 |
| Information Retrieval | FinanceBench 150 samples | DocRec@5 95 | 11 |
| Long-document Question Answering | FinanceBench (FB) | Accuracy 89.33 | 10 |
| Question Answering | FinanceBench Single-Document 9 | Accuracy 84 | 9 |
| Financial Question Answering | FinanceBench | Cost Saving 89.1 | 8 |
| Agentic Workflow Performance (Static) | FinanceBench + CrewAI | Latency (s) 65.01 | 6 |
| Hallucination Detection | FinanceBench | F1 Score 73.4 | 6 |
| Question Answering | FinanceBench (test) | F1 Score 42.74 | 5 |
| Financial Question Answering | FinanceBench (test) | ROUGE-L 20 | 4 |
| Question Answering | FinanceBench | F1 Score 28.4 | 3 |