Differential Transformer (Diff) implementation on New-feature tasks
[Chart: Session Duration by method. Highlighted point: PithTrain, 38.2 (May 29, 2026). Available metrics: Session Duration, Active GPU Time, Agent Turns, Context Size (K), Output Tokens (K).]
Updated 2d ago
Evaluation Results
| Method | Session Duration | Active GPU Time | Agent Turns | Context Size (K) | Output Tokens (K) |
|---|---|---|---|---|---|
| PithTrain (Framework=PithTrain, 2026.05) | 38.2 | 27.6 | 47 | 69,500 | 25,400 |
| Megatron-LM (Framework=Megatron-LM, 2026.05) | 47.1 | 33.7 | 125 | 118,700 | 57,100 |
| TorchTitan (Framework=TorchTitan, 2026.05) | 49.6 | 40.3 | 58 | 103,200 | 36,000 |