Training Efficiency on Llama3-8x70B Coarse-grained

41.6 MFU

MCore w/ Folding

Updated 4mo ago

Evaluation Results

Method	Date	MFU
MCore w/ Folding	2025.04	41.6
MCore	2025.04	38.8
FSDP + EP	2025.04	19.6