| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Average All shifts (test) | ENS-F | AUC | 90.99 | 40 | 3d ago |
| Corruptions (test) | MDSall | AUC | 99.1 | 40 | 3d ago |
| Adversarial Attacks (test) | ENS-V | AUC | 89.52 | 40 | 3d ago |
| In-distribution (test) | MSP | AUC | 0.8916 | 40 | 3d ago |
| MuSiQue (val) | IC-IDK | Precision | 1 | 36 | 3d ago |
| Mintaka (val) | IC-IDK | Precision | 100 | 36 | 3d ago |
| HotpotQA (val) | IC-IDK | Precision | 100 | 36 | 3d ago |
| FRAMES (test) | AYS | Precision | 97 | 36 | 3d ago |
| CRAG multi-hop subset (train) | AYS | Precision | 92 | 36 | 3d ago |
| Bamboogle Full | IC-IDK | Precision | 100 | 36 | 3d ago |
| MuSiQue | Ensemble-A | F1 Score | 0.93 | 36 | 3d ago |
| Mintaka | Ensemble-A | F1 Score | 88 | 36 | 3d ago |
| HotpotQA | Ensemble-A | F1 Score | 91 | 36 | 3d ago |
| FRAMES | Ensemble-I | F1 Score | 95 | 36 | 3d ago |
| CRAG | Ensemble-A | F1 Score | 91 | 36 | 3d ago |
| Bamboogle | Ensemble-A | F1 Score | 0.94 | 36 | 3d ago |
| ImageNet | Ours | AuROC | 88.57 | 35 | 3d ago |
| EuroSAT | Ours | AuROC | 98.1 | 27 | 3d ago |
| Food101 | Ours-D | AuROC | 95.06 | 27 | 3d ago |
| Flowers102 | Ours-D | AuROC | 99.38 | 27 | 3d ago |
| CoSPlan | CoT | Robo-VQA-E Score | 9.1 | 20 | 3d ago |
| FCE (test) | GOPar | F0.5 Score | 74.1 | 16 | 3d ago |
| CoSPlan Blocks-World-E | CoG-VLM | Accuracy | 44.5 | 15 | 3d ago |
| CoSPlan Maze-E | GPT-4o | Accuracy | 0.403 | 15 | 3d ago |
| CoSPlan Robo-VQA-E | GPT-4o | Accuracy | 45.3 | 15 | 3d ago |