| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Pruning | Pythia 1B (16L), 410M (24L) (val) | ΔPPL | 4.4 | 36 |
| Activation Reconstruction | Pythia model activations | Pearson Correlation Coefficient | 0.8074 | 18 |
| Language Modeling | Pythia-160M | Delta Perplexity (ΔPPL) | 14.47 | 16 |
| Memorization mitigation | Pythia 6.9B | Memory Usage (%) | 89.31 | 9 |
| Memorization mitigation | Pythia 2.8B | Memory Usage (%) | 5.94 | 9 |
| Model Extrapolation | Pythia ≤6.9B (k=5) | Pooled R² | 0.847 | 8 |
| Model Extrapolation | Pythia k=4 (≤2.8B) | Pooled R² | 0.837 | 8 |
| Model Extrapolation | Pythia k=3 (1B, 410M, 160M) | Pooled R² | 0.605 | 8 |
| Scaling Law Modeling | Pythia quanto 2-bit | R² Score | 0.9031 | 8 |
| Scaling Law Modeling | Pythia bnb 4-bit | R² Score | 99.36 | 8 |
| Scaling Law Modeling | Pythia AWQ 4-bit | R² Score | 0.9935 | 8 |
| Token Extrapolation | Pythia Predict 272.6B–307B | Pooled R² | 0.945 | 8 |
| Token Extrapolation | Pythia Predict 180.4B–307B | Pooled R² | 0.781 | 8 |
| Token Extrapolation | Pythia Predict 75.5B–307B | Pooled R² | 80.5 | 8 |
| Scaling Law Fitting | Pythia Suite | Performance (4-bit) | 99.53 | 7 |
| Sparse Autoencoder Reconstruction | Pythia-1B activations | Delta R² | 0.017 | 6 |
| Untargeted Poisoning Attack | Pythia-12B | Benign F1 Score | 54.82 | 5 |
| Sparse Autoencoding | Pythia activations (The Pile) 160M (layer 12) | MSE | 0.148 | 4 |
| Frobenius reconstruction | Pythia attention output projections 1.4B | Dimensionless Reconstruction Error Fraction | 0.269 | 4 |
| Frobenius reconstruction | Pythia attention output projections 70M | Reconstruction Error Fraction | 16.8 | 4 |
| Sparse Probing | Pythia-1.4b | Avg. F1 | 76.2 | 4 |
| Activation Reconstruction | Pythia 1.4b | MSE | 0.22 | 4 |
| Sparse Probing | Pythia 410m | Average F1 Score | 77.5 | 4 |
| Activation Reconstruction | Pythia 410m | MSE | 0.03 | 4 |
| Sparse Autoencoding | Pythia 70M (layer-3 residuals) | Achieved L0 | 33.9 | 4 |