| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Information Retrieval | BRIGHT | Mean nDCG@10: 54.8 | 94 | |
| Passage Reranking | BRIGHT | NDCG@10 (Avg): 40.3 | 54 | |
| Information Retrieval | BRIGHT 1.0 (test) | nDCG@10 (Avg): 37.9 | 35 | |
| Long-context retrieval | BRIGHT StackExchange | Biology Score: 62.3 | 29 | |
| Downstream retrieval | BRIGHT | Biology nDCG@5: 20 | 24 | |
| Information Retrieval | BRIGHT v1 (test) | nDCG@10 (Avg): 49.1 | 22 | |
| Reasoning-intensive Retrieval | BRIGHT | BRIGHT Score (Biology): 33.9 | 20 | |
| Retrieval | BRIGHT 12 datasets aggregate (test) | NDCG@10: 12.74 | 20 | |
| Reasoning-based Retrieval | BRIGHT 1.0 (test) | NDCG@10 (Bio.): 61.77 | 16 | |
| First-stage retrieval | BRIGHT (test) | nDCG@10 (Biology): 54.5 | 13 | |
| Information Retrieval | BRIGHT static retrieval setting PRO | NDCG@10 (Overall): 33.8 | 13 | |
| Retrieval | BRIGHT | nDCG@1 (Econ): 65.8 | 13 | |
| Retrieval | BRIGHT v1 (leaderboard) | Average Retrieval Score: 46.8 | 12 | |
| Multi-class Classification | BRIGHT 6class (test) | Accuracy: 43.4 | 11 | |
| Building Damage Assessment | BRIGHT | F1 (bcd): 91.71 | 10 | |
| Information Retrieval | BRIGHT unseen 6 subsets (test) | nDCG@10: 11.79 | 7 | |
| Change Detection | BRIGHT DFC25-T2 | mIoU: 43.75 | 6 | |
| Image Classification | BRIGHT (test) | Accuracy: 0.6458 | 3 |