Question Answering on FinanceBench N=150
[Chart: Accuracy over time, May 7, 2026 to May 24, 2026. Top entry: Mafin 2.5 (Vectify AI), 98.7 accuracy.]
Updated 8d ago
Evaluation Results
| Method                          | Settings                 | Date    | Accuracy |
|---------------------------------|--------------------------|---------|----------|
| Mafin 2.5 (Vectify AI)          | Context Width=–, Model... | 2026.05 | 98.7     |
| Golden Evidence + GPT-5-mini    | LLM Backbone=GPT-5-min... | 2026.05 | 94       |
| Our Oracle (evidence pages)     | Context Width=Evidence... | 2026.05 | 93.3     |
| AgenticRAG                      | LLM Backbone=GPT-5-min... | 2026.05 | 92       |
| AgenticRAG                      | LLM Backbone=Claude So... | 2026.05 | 91.78    |
| Oracle (evidence pages)         | Context Width=Evidence... | 2026.05 | 85       |
| OODA                            | Context Width=–, Model... | 2026.05 | 82       |
| Full filing in context          | Context Width=95K toke... | 2026.05 | 79       |
| Databricks                      | Context Width=64K toke... | 2026.05 | 75       |
| Single vector store per filing  | Context Width=–, Model... | 2026.05 | 50       |
| Agentic w. keyword search tools | Tool Suite=pdfgrep, rg... | 2026.05 | 32.71    |
| Traditional RAG                 | Retrieval Strategy=Tra... | 2026.05 | 24.24    |
| Shared vector store             | Context Width=–, Model... | 2026.05 | 19       |
| Closed book                     | Context Width=–, Model... | 2026.05 | 9        |