Constrained LLM Decoding on Llama-2-70B
[Chart: Latency (ms) by date. Plotted point: Pre³, 27.2 ms, Jun 4, 2025.]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Latency (ms) | Reduction (%) |
|---|---|---|---|---|
| Pre³ | Batch Size=16, Hardwar... | 2025.06 | 27.2 | 5.39 |
| XGrammar | Batch Size=16, Hardwar... | 2025.06 | 28.75 | - |
| Pre³ | Batch Size=64, Hardwar... | 2025.06 | 48.48 | 14.85 |
| Pre³ | Batch Size=32, Hardwar... | 2025.06 | 54.24 | 1.6 |
| XGrammar | Batch Size=32, Hardwar... | 2025.06 | 55.12 | - |
| Pre³ | Batch Size=128, Hardwa... | 2025.06 | 55.39 | 19.48 |
| XGrammar | Batch Size=64, Hardwar... | 2025.06 | 56.94 | - |
| XGrammar | Batch Size=128, Hardwa... | 2025.06 | 68.79 | - |
| Pre³ | Batch Size=256, Hardwa... | 2025.06 | 75.72 | 11.87 |
| XGrammar | Batch Size=256, Hardwa... | 2025.06 | 85.92 | - |
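The Reduction values are not defined on the page. They appear to be the percentage by which Pre³ lowers latency relative to XGrammar at the same batch size. The sketch below checks that reading against the listed latencies. The formula and the helper name `reduction_pct` are assumptions for illustration, not something the benchmark page states.

```python
# Latencies (ms) copied from the results listed above, keyed by batch size.
xgrammar = {16: 28.75, 32: 55.12, 64: 56.94, 128: 68.79, 256: 85.92}
pre3 = {16: 27.2, 32: 54.24, 64: 48.48, 128: 55.39, 256: 75.72}


def reduction_pct(baseline_ms: float, method_ms: float) -> float:
    """Relative latency reduction versus the baseline, in percent.

    Assumed definition: (baseline - method) / baseline * 100.
    """
    return (baseline_ms - method_ms) / baseline_ms * 100.0


# Compare each Pre³ entry with the XGrammar entry at the same batch size.
reductions = {bs: reduction_pct(xgrammar[bs], pre3[bs]) for bs in sorted(pre3)}

for bs, r in reductions.items():
    print(f"batch size {bs:>3}: {r:.2f}% lower latency than XGrammar")
```

Under this assumed definition the computed values agree with the listed Reduction figures to within about 0.01 percentage points, which is consistent with rounding of the reported latencies.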