Training Efficiency on Mixtral-8x22B Coarse-grained
[Chart: MFU by date. Top entry: MCore w/ Folding, MFU 49.3 (Apr 21, 2025).]
Evaluation Results
Method            Configuration              Date     MFU (%)
MCore w/ Folding  GPUs=128, Global batch...  2025.04  49.3
MCore             GPUs=128, Global batch...  2025.04  46.3
TP+EP+DP          GPUs=128, Global batch...  2025.04  36.6
FSDP + EP         GPUs=128, Global batch...  2025.04  23.4
FSDP              GPUs=128, Global batch...  2025.04  4.3
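
To compare the entries, the MFU values listed above can be turned into ratios against a chosen baseline. The following is a minimal sketch, not part of the benchmark itself: the helper name `relative_to` and the choice of MCore as the baseline are illustrative assumptions. It also assumes all rows share the same hardware and workload, which the truncated configuration column suggests but does not confirm.

```python
# MFU values copied from the results listed above (method name -> MFU).
mfu = {
    "MCore w/ Folding": 49.3,
    "MCore": 46.3,
    "TP+EP+DP": 36.6,
    "FSDP + EP": 23.4,
    "FSDP": 4.3,
}


def relative_to(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each method's MFU divided by the baseline method's MFU.

    Hypothetical helper for illustration. The ratio only approximates a
    throughput ratio if every row uses the same hardware and workload.
    """
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


# Assumption: MCore is used as the baseline purely for illustration.
ratios = relative_to("MCore", mfu)

# Print methods from highest to lowest ratio.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} MFU {mfu[name]:>5.1f}  {ratio:.2f}x vs MCore")
```

Under these assumptions, MCore w/ Folding comes out at roughly 1.06x the MFU of plain MCore.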