| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| HellaSwag | Falcon-180B | Accuracy | 87.5 | 364 | 22h ago |
| COPA | T5(3B) + PE w/ ROE (ORC.) | Accuracy | 92.88 | 48 | 3mo ago |
| HellaSwag (test) | | Accuracy | 82.1 | 19 | 1mo ago |
| IndoCulture native prompts (test) | Qwen2.5-7B-IT | Avg Sentence Similarity | 42 | 18 | 3mo ago |
| ArabCulture native prompts (test) | Qwen2.5-7B-IT | Avg Sentence Similarity | 44.7 | 18 | 3mo ago |
| ArabCulture | Qwen2.5-7B-IT (+CCKG N-Asrt) | Sentence Similarity Score | 42.5 | 18 | 3mo ago |
| IndoCulture | Qwen2.5-7B-IT (+CCKG N-Path) | Sentence Similarity Score | 43 | 18 | 3mo ago |
| HellaSwag v1 (test) | Mistral-7B | Normalized Accuracy | 81 | 16 | 1mo ago |
| Reuters (test) | ARC-II | P@1 | 49.62 | 8 | 3mo ago |
| KORANI Sentence Completion | mT-En-CI | Kobest Copa | 80.8 | 5 | 3mo ago |
| P3 | | COPA Accuracy | 85.3 | 5 | 3mo ago |
| HellaSwag 0-shot | | Accuracy | 79.3 | 4 | 3mo ago |
| HellaSwag DE 10-shot (test) | T-Free | Normalized Log Accuracy | 68.7 | 3 | 2mo ago |
| HellaSwag DE | HATified | Normalized Log Accuracy | 59.6 | 2 | 2mo ago |
| Hellaswag | Flexora | Time (h) | 4.71 | 2 | 3mo ago |