| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Llama 3.3 70B Emulation | Kareus | Time Reduction | 9.3 | 8 | 3d ago |
| Qwen2.5-7B 256K context (train) | OOMB (Sparse Attn) | Throughput (tokens/sec) | 1,301.6 | 7 | 3d ago |
| Qwen2.5-7B 128K context (train) | OOMB (Sparse Attn) | Throughput (tokens/sec) | 1,394.16 | 5 | 3d ago |
| Qwen2.5-7B 64K context (train) | OOMB (Sparse Attn) | Throughput (tokens/sec) | 1,560.38 | 5 | 3d ago |
| Llama2-70B (64 x H100-8) | Megatron-LM | Iteration Time (s) | 7.8 | 4 | 3d ago |
| Llama2 7B | | Iteration Time (s) | 1.4 | 4 | 3d ago |
| Llama2-7B (tpu-v5p-512) | AXLearn | Iteration Time (s) | 2.5 | 3 | 3d ago |
| Qwen-3 30B-A3B (64 x B200-8) | Megatron-LM | Iteration Time (s) | 4.1 | 2 | 3d ago |
| Qwen-3 30B-A3B (tpu-v5p-1024) | AXLearn | Iteration Time (s) | 12.86 | 2 | 3d ago |
| Llama2 70B (tpu-v5p-1024) | AXLearn | Iteration Time (s) | 11.6 | 2 | 3d ago |
| Llama2 70B (64 x Trainium2-16) | AXLearn | Iteration Time (s) | 11.2 | 1 | 3d ago |
| Llama2 7B (64 x Trainium2-16) | AXLearn | Iteration Time (s) | 1.2 | 1 | 3d ago |