Constrained LLM Decoding on DeepSeek-V2-Lite-Chat 15.7B
[Chart: Inference Time (ms) by evaluation date; best reported result 49.91 ms (Pre³), Jun 4, 2025]
Evaluation Results

Method   | Setting                   | Date    | Inference Time (ms) | Reduction (%) vs XGrammar
---------|---------------------------|---------|---------------------|--------------------------
Pre³     | Batch Size=16, Hardwar... | 2025.06 | 49.91               | 3.57
XGrammar | Batch Size=16, Hardwar... | 2025.06 | 51.76               | -
Pre³     | Batch Size=32, Hardwar... | 2025.06 | 53.71               | 9.65
Pre³     | Batch Size=64, Hardwar... | 2025.06 | 54.41               | 30.01
XGrammar | Batch Size=32, Hardwar... | 2025.06 | 59.45               | -
Pre³     | Batch Size=128, Hardwa... | 2025.06 | 61.63               | 40.78
Pre³     | Batch Size=256, Hardwa... | 2025.06 | 75.47               | 37.86
XGrammar | Batch Size=64, Hardwar... | 2025.06 | 77.74               | -
XGrammar | Batch Size=128, Hardwa... | 2025.06 | 104.06              | -
XGrammar | Batch Size=256, Hardwa... | 2025.06 | 121.46              | -
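For context on what these numbers measure: constrained decoding forces an LLM's output to follow a formal grammar by masking illegal next tokens at every step, and systems like Pre³ and XGrammar compete on how cheaply that mask can be computed. The following is a minimal sketch of the idea only, using a toy vocabulary and a hand-written automaton; it is not the Pre³ or XGrammar implementation, and all names in it are hypothetical.

```python
# Toy illustration of grammar-constrained decoding via logit masking.
# VOCAB and DFA are made-up stand-ins for a real tokenizer and a
# compiled grammar (the part Pre³ / XGrammar accelerate).
VOCAB = {0: "{", 1: "}", 2: '"k"', 3: ":", 4: "1"}

# Tiny automaton accepting exactly {"k":1} : state -> {allowed token: next state}
DFA = {
    0: {0: 1},   # expect "{"
    1: {2: 2},   # expect the key
    2: {3: 3},   # expect ":"
    3: {4: 4},   # expect the value
    4: {1: 5},   # expect "}"
    5: {},       # accepting state, nothing more allowed
}

def mask_logits(logits, state):
    """Set logits of grammar-illegal tokens to -inf so they cannot be picked."""
    allowed = DFA[state]
    return [l if t in allowed else float("-inf") for t, l in enumerate(logits)]

def constrained_decode(model_logits_per_step):
    """Greedy decoding under the DFA; input yields one logit row per step."""
    state, out = 0, []
    for logits in model_logits_per_step:
        masked = mask_logits(logits, state)
        tok = max(range(len(masked)), key=masked.__getitem__)
        out.append(VOCAB[tok])
        state = DFA[state][tok]
        if not DFA[state]:  # reached the accepting state
            break
    return "".join(out)

# Even a "model" that always prefers token 4 ("1") is forced into valid output:
print(constrained_decode([[0.0, 0.0, 0.0, 0.0, 1.0]] * 10))  # -> {"k":1}
```

The per-step mask computation is the overhead being benchmarked above: with a realistic vocabulary (100k+ tokens) and a full context-free grammar, building this mask naively dominates decoding time, which is why lower inference time at a fixed batch size indicates a faster grammar engine.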