| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| HotpotQA | Ensemble-A | AUROC | 81 | 57 | 1mo ago |
| EuroSAT | Ours | AuROC | 98.1 | 48 | 1mo ago |
| Flowers102 | Ours-D | AuROC | 99.38 | 46 | 1mo ago |
| Average All shifts (test) | ENS-F | AUC | 90.99 | 40 | 3mo ago |
| Corruptions (test) | MDSall | AUC | 99.1 | 40 | 3mo ago |
| Adversarial Attacks (test) | ENS-V | AUC | 89.52 | 40 | 3mo ago |
| In-distribution (test) | MSP | AUC | 0.8916 | 40 | 3mo ago |
| MuSiQue (val) | IC-IDK | Precision | 1 | 36 | 3mo ago |
| Mintaka (val) | IC-IDK | Precision | 100 | 36 | 3mo ago |
| HotpotQA (val) | IC-IDK | Precision | 100 | 36 | 3mo ago |
| FRAMES (test) | AYS | Precision | 97 | 36 | 3mo ago |
| CRAG multi-hop subset (train) | AYS | Precision | 92 | 36 | 3mo ago |
| Bamboogle Full | IC-IDK | Precision | 100 | 36 | 3mo ago |
| MuSiQue | Ensemble-A | F1 Score | 0.93 | 36 | 3mo ago |
| Mintaka | Ensemble-A | F1 Score | 88 | 36 | 3mo ago |
| FRAMES | Ensemble-I | F1 Score | 95 | 36 | 3mo ago |
| CRAG | Ensemble-A | F1 Score | 91 | 36 | 3mo ago |
| Bamboogle | Ensemble-A | F1 Score | 0.94 | 36 | 3mo ago |
| ImageNet | | FDR | 47 | 36 | 8d ago |
| Food101 | ANTS | AuROC | 99.92 | 29 | 2mo ago |
| CoSPlan | CoT | Robo-VQA-E Score | 9.1 | 20 | 3mo ago |
| FCE (test) | GOPar | F0.5 Score | 74.1 | 16 | 3mo ago |
| KITTI DK:test^noise noise-induced (test) | DETR | Recall | 96 | 15 | 1mo ago |
| CoSPlan Blocks-World-E | CoG-VLM | Accuracy | 44.5 | 15 | 3mo ago |
| CoSPlan Maze-E | GPT-4o | Accuracy | 0.403 | 15 | 3mo ago |