Inference Efficiency on MoE LLMs: DSV2-16B, QW3-30B, QW3-80B-I
[Chart: Decode Speed (tokens/sec); leading entry BITSMOE, 12.46 tokens/sec, May 22, 2026]
Evaluation Results
| Method | Model | Date | Decode Speed (tokens/sec) | TTFT (sec) | Speedup Ratio | Total GPU Memory (GB) | Attention GPU Memory (GB) | MoE GPU Memory (GB) | Memory Saving Factor |
|---|---|---|---|---|---|---|---|---|---|
| BITSMOE | DSV2-16B | 2026.05 | 12.46 | 0.64 | - | - | - | 5.08 | 5.44 |
| FP16 | DSV2-16B | 2026.05 | 10.39 | 0.47 | - | 29.51 | 0.69 | 27.65 | - |
| GPTQ | DSV2-16B | 2026.05 | 7.43 | 1.27 | - | - | - | - | - |
| BITSMOE | QW3-30B | 2026.05 | 5.71 | 1.51 | - | - | - | 8.58 | 6.29 |
| BITSMOE | QW3-80B-I | 2026.05 | 5.01 | 1.06 | - | - | - | 21.98 | 6.56 |
| GPTQ | QW3-30B | 2026.05 | 3.25 | 2.94 | - | - | - | - | - |
| FP16 | QW3-30B | 2026.05 | 3.07 | 2.35 | - | 56.95 | 1.69 | 54 | - |
| GPTQ | QW3-80B-I | 2026.05 | 2.59 | 7.32 | - | - | - | - | - |
| FP16 | QW3-80B-I | 2026.05 | 1.65 | 8.35 | - | 148.69 | 0.61 | 144.28 | - |
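The Memory Saving Factor values in the Evaluation Results table appear to equal the FP16 MoE GPU memory divided by the BITSMOE MoE GPU memory for the same model (for DSV2-16B, 27.65 / 5.08 ≈ 5.44). The sketch below assumes that definition, recomputes the factor from the table values, and checks it against the reported column. It also computes BITSMOE decode throughput relative to FP16. That ratio is derived here for illustration and is not the leaderboard's Speedup Ratio, which has no entries in this snapshot. All function and variable names are hypothetical.

```python
# Values copied from the Evaluation Results table (FP16 and BITSMOE rows only).
# MoE GPU memory is in GB; decode speed is in tokens/sec.
MOE_MEMORY_GB = {
    "DSV2-16B": {"FP16": 27.65, "BITSMOE": 5.08},
    "QW3-30B": {"FP16": 54.0, "BITSMOE": 8.58},
    "QW3-80B-I": {"FP16": 144.28, "BITSMOE": 21.98},
}
DECODE_TOKENS_PER_SEC = {
    "DSV2-16B": {"FP16": 10.39, "BITSMOE": 12.46},
    "QW3-30B": {"FP16": 3.07, "BITSMOE": 5.71},
    "QW3-80B-I": {"FP16": 1.65, "BITSMOE": 5.01},
}
REPORTED_SAVING_FACTOR = {"DSV2-16B": 5.44, "QW3-30B": 6.29, "QW3-80B-I": 6.56}


def memory_saving_factor(model: str) -> float:
    """Assumed definition: FP16 MoE memory divided by BITSMOE MoE memory."""
    mem = MOE_MEMORY_GB[model]
    return mem["FP16"] / mem["BITSMOE"]


def decode_ratio_vs_fp16(model: str) -> float:
    """BITSMOE decode throughput relative to FP16 (a derived value, not the table's Speedup Ratio)."""
    speed = DECODE_TOKENS_PER_SEC[model]
    return speed["BITSMOE"] / speed["FP16"]


if __name__ == "__main__":
    for model in MOE_MEMORY_GB:
        factor = memory_saving_factor(model)
        # The recomputed factor should match the reported one to two decimal places.
        assert round(factor, 2) == REPORTED_SAVING_FACTOR[model], model
        ratio = decode_ratio_vs_fp16(model)
        print(f"{model}: memory saving {factor:.2f}x, decode {ratio:.2f}x vs FP16")
```

Running it prints one line per model. If the assertion held for every row, the assumed definition is consistent with the reported Memory Saving Factor values at the table's two-decimal precision.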