
Spec-Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Speculative Decoding | Spec-Bench | MT Score 195.6 | 57
Inference Acceleration | Spec-Bench | Speedup 5.31 | 53
Text Generation | Spec-Bench Overall | SD Score 2.33 | 21
Translation | Spec-Bench Trans. | CR 6.41 | 21
Summarization | Spec-Bench Sum. | CR Score 4.73 | 21
Retrieval-Augmented Generation | Spec-Bench RAG | CR 5.48 | 21
Question Answering | Spec-Bench QA | CR 4.54 | 21
Multi-turn Dialogue | Spec-Bench Multi. | CR 3.22 | 21
Mathematical Reasoning | Spec-Bench Math | CR 4.09 | 21
Language Model Decoding | Spec-Bench | Conv. Acc 267.6 | 11
Speculative Decoding Throughput | Spec-Bench | Throughput (Conv.) 519.7 | 10
Speculative Decoding | Spec-Bench OLMo 2 7B | Conversation Score 5.12 | 5
Speculative Decoding | Spec-Bench Llama2-7B v1.0 (test) | MT Score 2.73 | 4
Speculative Decoding Throughput | Spec-Bench (test) | Throughput (Conv.) - | 0