
Tulu

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Prefill KV-cache memory measurement | TULU 3 (dev) | Active KV-cache Memory (GiB): 0.109 | 32 |
| Stage-aware Prefill | TULU-3 (dev) | Total FLOPs (teraFLOPs): 13.13 | 32 |
| Instruction Following | Tulu3 Evaluation Suite pool (test) | ARC: 92.54 | 25 |
| Tulu generation | Tulu | Grammar Accuracy: 85 | 12 |
| SFT Generalization | Tulu3 SFT (Original) | General Score: 79.9 | 8 |
| SFT Generalization | Tulu3 SFT (Expanded) | SFT Score: 65.77 | 8 |
| Membership Inference Attack | Tulu3 Mix Aya | AUROC: 68 | 8 |
| Model Fingerprinting | Tulu 2 DPO 7B | Similarity Score: 0.9999 | 7 |
| Helpful assistant task | Tulu-2 13B | HV Score: 1.2562 | 3 |