
Olmo

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Membership Inference Attack | OLMo near-IID Dolma 3 (test) | AUC: 0.723 | 13 |
| Downstream Policy Evaluation | OLMo3 Adapt | GSM8K: 96 | 10 |
| Membership Inference Attack | OLMO initial checkpoint | AUC: 0.54 | 8 |
| Training Data Attribution | Olmo-7B | Tail-patch (%): 98.6 | 5 |
| Pretraining Data Mixture Estimation | OLMo Pretraining Mixture (Temporal held-out) | Web Source Estimate: 95.5 | 3 |
| Language Model Training Performance | OLMo-1B 24k sequence length | Training Time (ms): 6,161 | 2 |
| General Language Evaluation | OLMo-2 Held-out Evals | AGIEval Score: 24.4 | 2 |
| Question Answering | OLMo Benchmarks 2 (dev) | NQ Score: 16.1 | 2 |
| Language Modeling | OLMo (val) | Base CE: 2.24 | 1 |