| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Large Model Performance Prediction dataset 1.0 (40% masking) | STAR | RMSE | 6.13 | 10 | 4d ago |
| Large Model Performance Prediction (60% masking) | STAR | RMSE | 6.77 | 10 | 4d ago |
| OpenCompass (95% masking, September 30, 2024 cutoff, temporal split) | STAR | RMSE | 8.75 | 10 | 4d ago |
| Benchmark-side Pattern Shift Math | | Average Score | 46.59 | 6 | 4d ago |
| 285 models on one Math benchmark | | Top-10 Recall | 100 | 5 | 4d ago |
| Benchmark Chinese pattern shift | STAR | RMSE | 16.94 | 3 | 4d ago |
| Benchmark OCR pattern shift | STAR | RMSE | 25.18 | 3 | 4d ago |
| Frontier Top-20 pattern shift | STAR | RMSE | 9.71 | 3 | 4d ago |
| Paradigm RLHF pattern shift | STAR | RMSE | 9.55 | 3 | 4d ago |
| Architecture pattern shift MoE | STAR | RMSE | 10.68 | 3 | 4d ago |
| Benchmark-side Pattern Shift Chinese | PMF | Average Score | 42.16 | 3 | 4d ago |
| Benchmark-side Pattern Shift OCR | PMF | Average Score | 47.6 | 3 | 4d ago |
| Model-side Pattern Shift Frontier | PMF | Average Score | 13.22 | 3 | 4d ago |
| Model-side Pattern Shift Paradigm | PMF | Average Score | 11.74 | 3 | 4d ago |
| Model-side Pattern Shift Architecture | | Average Score | 10.64 | 3 | 4d ago |
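The table does not define how RMSE or Top-10 Recall is computed for these performance-prediction datasets. Below is a minimal sketch that assumes the standard definitions. RMSE is taken only over the held-out (masked) model-benchmark cells. Top-k recall is the share of the true top-k models that also appear in the predicted top-k. The function names, the dict-based data layout, and the random masking helper are all hypothetical illustrations and are not the leaderboard's actual evaluation code.

```python
import math
import random


def masked_rmse(true_scores, pred_scores, mask):
    """RMSE over the held-out (masked) cells only.

    Hypothetical layout: true_scores and pred_scores map
    (model, benchmark) -> score; mask lists the hidden keys.
    """
    errors = [(pred_scores[k] - true_scores[k]) ** 2 for k in mask]
    if not errors:
        raise ValueError("mask is empty")
    return math.sqrt(sum(errors) / len(errors))


def top_k_recall(true_scores, pred_scores, k=10):
    """Fraction of the true top-k models that also rank in the predicted top-k.

    Hypothetical layout: both dicts map model -> score on one benchmark.
    Returns a value in [0, 1]; multiply by 100 for a percentage.
    """
    true_top = set(sorted(true_scores, key=true_scores.get, reverse=True)[:k])
    pred_top = set(sorted(pred_scores, key=pred_scores.get, reverse=True)[:k])
    return len(true_top & pred_top) / len(true_top)


def random_mask(keys, ratio, seed=0):
    """Hide a random `ratio` of cells (e.g. 0.4 for 40% masking)."""
    keys = sorted(keys)
    rng = random.Random(seed)
    return rng.sample(keys, round(len(keys) * ratio))


if __name__ == "__main__":
    # Toy 2x2 score matrix: two models, two benchmarks.
    true = {("m1", "b1"): 50.0, ("m1", "b2"): 60.0,
            ("m2", "b1"): 70.0, ("m2", "b2"): 80.0}
    pred = {("m1", "b1"): 53.0, ("m1", "b2"): 60.0,
            ("m2", "b1"): 70.0, ("m2", "b2"): 77.0}
    print(masked_rmse(true, pred, [("m1", "b1"), ("m2", "b2")]))  # 3.0

    bench_true = {"a": 90, "b": 80, "c": 70, "d": 60}
    bench_pred = {"a": 85, "b": 60, "c": 75, "d": 65}
    print(top_k_recall(bench_true, bench_pred, k=2))  # 0.5
```

Scoring only the masked cells keeps the metric from rewarding a predictor that simply copies the observed entries. Read this way, a Top-10 Recall of 100 would mean all ten true top models were recovered, assuming the reported value is a percentage.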