| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| Humanity's Last Exam 2,158 text-only | | Avg@3 Score 54.2 | 15 | 2mo ago | |
| GPQA Diamond | openPangu-Embedded KD | Pass@1 Score 50.51 | 14 | 13d ago | |
| HLE (Humanity's Last Exam) text-only subset (val) | ReThinker | Inference Accuracy 52.2 | 13 | 3mo ago | |
| XBench-DeepSearch 1.0 (test) | ReThinker | Inference Accuracy 0.9 | 12 | 3mo ago | |
| GAIA text-only (val) | ReThinker | Inference Accuracy 81.6 | 12 | 3mo ago | |