Inference Efficiency on Qwen2.5-7B
[Chart: Throughput (tokens/s) by date, Feb 27 to Mar 9, 2026; selectable metrics: Throughput (tokens/s), Latency (ms/token), TTFT (ms). Top entry: AWQ at 1,480.2 tokens/s.]
Evaluation Results
| Method | Configuration | Date | Throughput (tokens/s) | Latency (ms/token) | TTFT (ms) |
|---|---|---|---:|---:|---:|
| AWQ | Quantization=4-bit | 2026.02 | 1,480.2 | 0.675 | - |
| DACQ Hybrid | Quantization=4-bit | 2026.02 | 1,093.1 | 0.915 | - |
| DACQ Logistic | Quantization=4-bit | 2026.02 | 1,035.4 | 0.966 | - |
| Unquantized | Quantization=Unquantized | 2026.02 | 929.4 | 1.0759 | - |
| FineRMoE | #Param (B)=26.65, #A-P... | 2026.03 | 27.3 | - | 178.3 |
| DU | #Param (B)=47.54, #A-P... | 2026.03 | 25.6 | - | 84.8 |
| S16A4 | #Param (B)=7.62, #A-Pa... | 2026.03 | 24 | - | 78.5 |
| NVShard | #Param (B)=47.55, #A-P... | 2026.03 | 18.9 | - | 137.8 |
| C32A2 | #Param (B)=184.42, #A-... | 2026.03 | 0.2 | - | 50,245.9 |
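In the four 2026.02 rows, the reported latency agrees with 1000 divided by throughput, so latency in ms/token appears to be just the reciprocal of tokens/s, scaled by 1000. That relationship is inferred from the numbers and is not stated on the page. The minimal Python sketch below assumes it, checks it against the reported values, and computes each method's throughput relative to the Unquantized row. The dictionary is hand-copied from the table, and the names `QUANT_ROWS`, `latency_ms_per_token` and `summarize` are hypothetical. The 2026.03 rows are deliberately left out because they report TTFT rather than latency, and nothing here says the same relation holds for them.

```python
from typing import Dict, List, Tuple

# Hand-copied from the 2026.02 rows of the table above:
# method -> (throughput in tokens/s, reported latency in ms/token)
QUANT_ROWS: Dict[str, Tuple[float, float]] = {
    "AWQ": (1480.2, 0.675),
    "DACQ Hybrid": (1093.1, 0.915),
    "DACQ Logistic": (1035.4, 0.966),
    "Unquantized": (929.4, 1.0759),
}


def latency_ms_per_token(throughput_tok_s: float) -> float:
    """Per-token latency implied by a throughput.

    Assumption (inferred from the table, not documented): latency = 1000 / throughput.
    """
    if throughput_tok_s <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / throughput_tok_s


def summarize(
    rows: Dict[str, Tuple[float, float]], baseline: str = "Unquantized"
) -> List[Tuple[str, float, float, float]]:
    """Return (method, derived latency, reported latency, speedup vs. baseline) per row."""
    base_tput = rows[baseline][0]
    out = []
    for method, (tput, reported) in rows.items():
        out.append((method, latency_ms_per_token(tput), reported, tput / base_tput))
    return out


if __name__ == "__main__":
    for method, derived, reported, speedup in summarize(QUANT_ROWS):
        print(
            f"{method:<14} derived {derived:.4f} ms/token "
            f"(reported {reported}), {speedup:.2f}x vs. Unquantized"
        )
```

When run as a script, it prints one line per method. Every derived latency falls within 0.001 ms/token of the reported value, and AWQ comes out at about 1.59x the unquantized throughput, DACQ Hybrid at about 1.18x and DACQ Logistic at about 1.11x.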