| Dataset Name | SOTA Method | Metric and Score | Trend | Updated | |
|---|---|---|---|---|---|
| WCEP | Thought-Retriever | F1 Score 23.8 | 13 | 3d ago | |
| Gov Report | OpenOrca | F1 Score 24.4 | 13 | 3d ago | |
| AcademicEval Related-multi | Thought-Retriever | F1 Score 21.6 | 13 | 3d ago | |
| AcademicEval Abstract-multi | Thought-Retriever | F1 Score 27.5 | 13 | 3d ago | |
| AcademicEval Abstract-single | Thought-Retriever | F1 29 | 13 | 3d ago | |
| Person-Sport | + LTV | Accuracy (%) 89.5 | 12 | 1mo ago | |
| Person-Occupation | + LTV | Accuracy 49.6 | 12 | 1mo ago | |
| Person-Instrument | + LTV | Accuracy 60.5 | 12 | 1mo ago | |
| Landmark-Country | + LTV | Accuracy 76.2 | 12 | 1mo ago | |
| Product-Company | + LTV | Accuracy 74.2 | 9 | 1mo ago | |
| FinQA (test) | APOLLO (Ensemble model) | Recall@3 93.31 | 7 | 1mo ago | |
| FinQA (dev) | APOLLO (Ensemble model) | R@3 95.03 | 7 | 1mo ago | |
| DATESET | Toolformer | Accuracy 27.3 | 6 | 1mo ago | |
| TEMPLAMA | Toolformer | Accuracy 16.3 | 6 | 1mo ago | |
| ConvFinQA (dev) | APOLLO (Ensemble model) | R@3 92.4 | 5 | 1mo ago | |