| Dataset | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| SQuAD | QZO | F1 Score | 88.3 | 44 | 4d ago |
| DROP | SubZero-GV (Prefix) | F1 Score | 32.9 | 29 | 4d ago |
| Big-Bench Hard (test) | FLAN-PaLM 540B | Exact Match | 57.9 | 17 | 4d ago |
| VPTT-Bench 1.0 (test) | Comb. (BRAG + VPRAG) | VPTT Score (Novelty Adjusted) | 0.644 | 15 | 4d ago |
| MLLMU-Bench (Forget Set) | | ROUGE Score | 64.5 | 7 | 4d ago |
| SAGEO Arena (test) | StageAware | Cite Score | 0.58 | 6 | 4d ago |
| Urdu Generation | Alif-1.0-8B-Inst. | Urdu Generation Score | 90.2 | 5 | 4d ago |
| RIR | GeAR | ROUGE-1 | 87.6 | 3 | 4d ago |
| WizardLM (test) | ATKD | LLM-as-a-Judge Score | 48.37 | 2 | 4d ago |
| SelfInst (test) | ATKD | LLM-as-a-Judge Score | 60.16 | 2 | 4d ago |
| VicunaEval (test) | ATKD | LLM Judge Score | 56.07 | 2 | 4d ago |
| DollyEval (test) | ATKD | LLM-as-a-Judge Score | 62.02 | 2 | 4d ago |