| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| ARC Challenge | BERT-Judge | Accuracy | 99.4 | 24 | 6d ago |
| GPQA | BERT-Judge | Accuracy | 93.5 | 24 | 6d ago |
| TinyMMLU | PA-GRPO | Accuracy | 86.8 | 21 | 25d ago |
| PathMMU PathCLS (test-tiny, n = 177) | Expert performance | Accuracy | 78.9 | 13 | 20d ago |
| PathQABench MCQ | PathChat+ | Accuracy | 94.3 | 12 | 20d ago |
| PathMMU PathCLS (test-all) | PathChat+ | Accuracy | 74.8 | 12 | 20d ago |
| AgMMU | GPT-o4-mini | Accuracy (Disease) | 77.9 | 11 | 1mo ago |
| Lu Xun's essay collections | CharacterBot | Accuracy | 88 | 7 | 1mo ago |