
MoE models

Benchmarks

Task Name                             | Dataset Name                                      | SOTA Result                | Trend
Kernel Throughput Evaluation          | MoE Models (OLMoE, Qwen3, DSv3, Mixtral) beta=0.5 | Latency: 67                | 12
Expert Pruning Efficiency             | MoE Models                                        | Calibration Time (h): 0.22 | 6
Loss curve fitting across model sizes | MoE models (various sizes)                        | ASMT MAPE: 0.341           | 3