
Training Efficiency on Mixtral-8x22b-G8T8 Fine-grained

28.8 MFU

MCore w/ Folding

Updated 4mo ago

Evaluation Results

Method	Date	Links	MFU
MCore w/ Folding	2025.04		28.8
MCore	2025.04		17.1
FSDP + EP	2025.04		9
TP+EP+DP	2025.04		8.7
FSDP	2025.04		2.2