
Pythia

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result
Pruning	Pythia 1B (16L), 410M (24L) (val)	ΔPPL	4.4	36
Activation Reconstruction	Pythia model activations	Pearson Correlation Coefficient	0.8074	18
Membership Inference Attack	Pythia 6.9B	TPR@1%FPR	13.1	16
Language Modeling	Pythia-160M	Delta Perplexity (ΔPPL)	14.47	16
Sparse autoencoding	Pythia-70M layer 3	Relative Error	0.342	13
Memorization mitigation	Pythia 6.9B	Memory Usage (%)	89.31	9
Memorization mitigation	Pythia 2.8B	Memory Usage (%)	5.94	9
Model Extrapolation	Pythia ≤6.9B (k=5)	Pooled R^2	0.847	8
Model Extrapolation	Pythia k=4 (≤2.8B)	Pooled R^2	0.837	8
Model Extrapolation	Pythia k=3 (1B, 410M, 160M)	Pooled R^2	0.605	8
Scaling Law Modeling	Pythia quanto 2-bit	R2 Score	0.9031	8
Scaling Law Modeling	Pythia bnb 4-bit	R2 Score	99.36	8
Scaling Law Modeling	Pythia AWQ 4-bit	R2 Score	0.9935	8
Token Extrapolation	Pythia Predict 272.6B–307B	Pooled R2	0.945	8
Token Extrapolation	Pythia Predict 180.4B–307B	Pooled R2	0.781	8
Token Extrapolation	Pythia Predict 75.5B–307B	Pooled R2	80.5	8
Scaling Law Fitting	Pythia Suite	Performance (4-bit)	99.53	7
Downstream behavior evaluation	Pythia-160m layer 8 residual-stream activations	Delta CE	9.7	6
Sparse Autoencoder Reconstruction	Pythia-1B activations	Delta R^2	0.017	6
Untargeted Poisoning Attack	Pythia-12B	Benign F1 Score	54.82	5
Sparse Autoencoding	Pythia-70M Layer 3 activations (held-out)	Relative Error	0.342	4
Sparse Autoencoding	Pythia activations (The Pile) 160M (layer 12)	MSE	0.148	4
Frobenius reconstruction	Pythia attention output projections 1.4B	Dimensionless Reconstruction Error Fraction	0.269	4
Frobenius reconstruction	Pythia attention output projections 70M	Reconstruction Error Fraction	16.8	4
Sparse Probing	Pythia-1.4b	Avg. F1	76.2	4

25 of 33 benchmark rows shown.
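The table mixes many different metrics. As a rough illustration only, the sketch below uses common textbook definitions of two of them: ΔPPL (perplexity of a modified model minus that of its baseline) and TPR@1%FPR (the true-positive rate of a membership-inference score at a threshold that flags at most 1% of non-members). The papers behind individual entries may compute these differently, for example on other evaluation sets or with other score conventions. The function names `perplexity`, `delta_ppl`, and `tpr_at_fpr` are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def delta_ppl(modified_nlls, baseline_nlls):
    """ΔPPL under an assumed definition: modified-model PPL minus baseline PPL."""
    return perplexity(modified_nlls) - perplexity(baseline_nlls)


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """Highest TPR over thresholds whose FPR on non-members is <= max_fpr.

    Assumed convention: a higher score means "more likely a training member",
    and a sample is flagged when its score is >= the threshold.
    """
    best_tpr = 0.0
    # Sweep candidate thresholds from strictest (highest) to loosest.
    for t in sorted(set(member_scores) | set(nonmember_scores), reverse=True):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr > max_fpr:
            break  # FPR only grows as the threshold drops, so stop here.
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        best_tpr = max(best_tpr, tpr)
    return best_tpr
```

Because both FPR and TPR can only increase as the threshold is lowered, the sweep can stop at the first threshold that exceeds the FPR budget.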