| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| SIB200 | Gemma3-12B | SLFR 86.7 | 85 | 21d ago | |
| TAXI1500 | Gemma3-12B | SLFR 93.7 | 67 | 21d ago | |
| SNLI Hypothesis | iFlip-Conf | LFR 83 | 37 | 3mo ago | |
| SNLI Premise | FIZLE | LFR 0.759 | 37 | 3mo ago | |
| AG News | iFlip-NL | LFR 0.915 | 37 | 3mo ago | |
| IMDb | iFlip-Conf | LFR 100 | 37 | 3mo ago | |
| SST2 (test) | POLYJUICE | SLFR 29 | 29 | 3mo ago | |
| AG News (test) | ZEROCF | SLFR 98 | 29 | 3mo ago | |
| AVICI (test) | DoWhy | LIN RMSE (IN) 0 | 16 | 1mo ago | |
| Credit | TabChange | Runtime (minutes) 0 | 9 | 1d ago | |
| COMPAS | TabChange | Latency (mins) 0 | 9 | 1d ago | |
| Adult | TabChange | Latency (mins) 0 | 9 | 1d ago | |
| Simple Dataset | | Retained Rate 99.7 | 9 | 1d ago | |
| COMPAS | TabChange | VCR 100 | 9 | 1d ago | |
| Adult | TabChange | VCR 100 | 9 | 1d ago | |
| AI-READI (Class 1) | Llama* | Validity 98 | 9 | 3mo ago | |
| AI-READI (Class 0) | GPT-4 | Validity 0.99 | 9 | 3mo ago | |
| hayes-roth | ExDBSCAN | Validity Score 100 | 8 | 4d ago | |
| disclosure noise | ExDBSCAN | Validity Score 100 | 8 | 4d ago | |
| diabetes numeric | ExDBSCAN | Validity Score 100 | 8 | 4d ago | |
| breast-w | ExDBSCAN | Validity Score 100 | 8 | 4d ago | |
| MIMIC-CXR (test) | FCFG | Target AUC 18.8 | 8 | 23d ago | |
| rabe 131 | ExDBSCAN | Diversity 2.5 | 7 | 4d ago | |
| machine cpu | ExDBSCAN | Diversity 1.08 | 7 | 4d ago | |
| disclosure x noise | ExDBSCAN | Diversity 0.97 | 7 | 4d ago | |