| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Llama 3.3 70B Emulation | Kareus | Time Reduction | 9.3 | 8 | 3d ago |
| Qwen2.5-7B 256K context (train) | OOMB (Sparse Attn) | Throughput (tokens/sec) | 1,301.6 | 7 | 3d ago |
| Qwen2.5-7B 128K context (train) | OOMB (Sparse Attn) | Throughput (tokens/sec) | 1,394.16 | 5 | 3d ago |
| Qwen2.5-7B 64K context (train) | OOMB (Sparse Attn) | Throughput (tokens/sec) | 1,560.38 | 5 | 3d ago |
| Llama2-70B (64 x H100-8) | Megatron-LM | Iteration Time (s) | 7.8 | 4 | 3d ago |
| Llama2 7B | | Iteration Time (s) | 1.4 | 4 | 3d ago |
| Llama2-7B (tpu-v5p-512) | AXLearn | Iteration Time (s) | 2.5 | 3 | 3d ago |
| Qwen-3 30B-A3B (64 x B200-8) | Megatron-LM | Iteration Time (s) | 4.1 | 2 | 3d ago |
| Qwen-3 30B-A3B (tpu-v5p-1024) | AXLearn | Iteration Time (s) | 12.86 | 2 | 3d ago |
| Llama2 70B (tpu-v5p-1024) | AXLearn | Iteration Time (s) | 11.6 | 2 | 3d ago |
| Llama2 70B (64 x Trainium2-16) | AXLearn | Iteration Time (s) | 11.2 | 1 | 3d ago |
| Llama2 7B (64 x Trainium2-16) | AXLearn | Iteration Time (s) | 1.2 | 1 | 3d ago |