| Dataset Name | SOTA Method | Metric | Value | Count | Updated |
|---|---|---|---|---|---|
| Chest X Pneumothorax (test) | ViT-L/16 | Relevance Rank Accuracy (FeatPerm) | 2.2 | 11 | 22d ago |
| Oxford-IIIT Pet (test) | ResNet-50 | Rank Acc (FeatPerm) | 37.2 | 11 | 22d ago |
| COVID-Qu-Ex (test) | DenseNet-169 | RRA (FeatPerm) | 37.7 | 11 | 22d ago |
| Deepfake Detection Dataset DDIM, PixArt, SD, SiT, StyleGAN | PRPO | CAC | 4.42 | 9 | 23d ago |
| LIAR RAW | | Meaningfulness Score | 2.29 | 7 | 1mo ago |
| RAW-FC | | M Score | 2.07 | 7 | 1mo ago |
| LIAR-RAW (test) | | ChatGPT Meaningfulness Score | 1.53 | 7 | 1mo ago |
| Synthetic (test) | Qwen3-VL-8b-SVR-FT | Helpfulness | 87.6 | 6 | 3mo ago |
| In-house Dataset | Qwen3-VL-8b-SVR-FT | Helpfulness | 80.8 | 6 | 3mo ago |
| DFD 100 randomly selected samples (test) | VRAG-DFD | GPT-4o Score | 7.55 | 3 | 1mo ago |
| MMLU-CK (test) | PubMed Reasoner | Reasoning Soundness Loss (%) | 44 | 2 | 2mo ago |
| PubMedQA (test) | PubMed Reasoner | Reasoning Soundness Loss | 39.7 | 2 | 2mo ago |