| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Multiple-Choice Suite | | MC Avg 72.1 | 49 | 2mo ago | |
| Multiple-choice QA Benchmarks (PIQA, OpenBookQA, HellaSwag, ARC) | NexusFormer | PIQA Accuracy 68.39 | 16 | 1mo ago | |
| PubMedQA (test) | | AUROC 81.8 | 9 | 2mo ago | |
| Kazakh socio-cultural MC QA (test) | qwen-1.5b | Accuracy 37.1 | 8 | 2mo ago | |
| CLOTH | | Accuracy 84.8 | 8 | 3mo ago | |
| RQA-MC | Decoding_r | Accuracy 81 | 6 | 2mo ago | |