| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Prefill KV-cache memory measurement | TULU 3 (dev) | Active KV-cache Memory (GiB) | 0.109 | 32 |
| Stage-aware Prefill | TULU-3 (dev) | Total FLOPs (teraFLOPs) | 13.13 | 32 |
| Instruction Following | Tulu3 Evaluation Suite pool (test) | ARC | 92.54 | 25 |
| Tulu generation | Tulu | Grammar Accuracy | 85 | 12 |
| SFT Generalization | Tulu3 SFT (Original) | General Score | 79.9 | 8 |
| SFT Generalization | Tulu3 SFT (Expanded) | SFT Score | 65.77 | 8 |
| Membership Inference Attack | Tulu3 Mix Aya | AUROC | 68 | 8 |
| Model Fingerprinting | Tulu 2 DPO 7B | Similarity Score | 0.9999 | 7 |
| Helpful assistant task | Tulu-2 13B | HV Score | 1.2562 | 3 |