Pythia

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Pruning | Pythia 1B (16L), 410M (24L) (val) | ΔPPL | 4.4 | 36 |
| Activation Reconstruction | Pythia model activations | Pearson Correlation Coefficient | 0.8074 | 18 |
| Language Modeling | Pythia-160M | Delta Perplexity (ΔPPL) | 14.47 | 16 |
| Memorization mitigation | Pythia 6.9B | Memory Usage (%) | 89.31 | 9 |
| Memorization mitigation | Pythia 2.8B | Memory Usage (%) | 5.94 | 9 |
| Model Extrapolation | Pythia ≤6.9B (k=5) | Pooled R^2 | 0.847 | 8 |
| Model Extrapolation | Pythia k=4 (≤2.8B) | Pooled R^2 | 0.837 | 8 |
| Model Extrapolation | Pythia k=3 (1B, 410M, 160M) | Pooled R^2 | 0.605 | 8 |
| Scaling Law Modeling | Pythia quanto 2-bit | R2 Score | 0.9031 | 8 |
| Scaling Law Modeling | Pythia bnb 4-bit | R2 Score | 99.36 | 8 |
| Scaling Law Modeling | Pythia AWQ 4-bit | R2 Score | 0.9935 | 8 |
| Token Extrapolation | Pythia Predict 272.6B–307B | Pooled R2 | 0.945 | 8 |
| Token Extrapolation | Pythia Predict 180.4B–307B | Pooled R2 | 0.781 | 8 |
| Token Extrapolation | Pythia Predict 75.5B–307B | Pooled R2 | 80.5 | 8 |
| Scaling Law Fitting | Pythia Suite | Performance (4-bit) | 99.53 | 7 |
| Sparse Autoencoder Reconstruction | Pythia-1B activations | Delta R^2 | 0.017 | 6 |
| Untargeted Poisoning Attack | Pythia-12B | Benign F1 Score | 54.82 | 5 |
| Sparse Autoencoding | Pythia activations (The Pile) 160M (layer 12) | MSE | 0.148 | 4 |
| Frobenius reconstruction | Pythia attention output projections 1.4B | Dimensionless Reconstruction Error Fraction | 0.269 | 4 |
| Frobenius reconstruction | Pythia attention output projections 70M | Reconstruction Error Fraction | 16.8 | 4 |
| Sparse Probing | Pythia-1.4b | Avg. F1 | 76.2 | 4 |
| Activation Reconstruction | Pythia 1.4b | MSE | 0.22 | 4 |
| Sparse Probing | Pythia 410m | Average F1 Score | 77.5 | 4 |
| Activation Reconstruction | Pythia 410m | MSE | 0.03 | 4 |
| Sparse Autoencoding | Pythia 70M (layer-3 residuals) | Achieved L0 | 33.9 | 4 |

Showing 25 of 28 rows