
LiveCodeBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Reasoning | LiveCodeBench | Accuracy 87.4 | 90 |
| Code Generation | LiveCodeBench | Pass@1 90.7 | 89 |
| Code Generation | LiveCodeBench | Pass@1 95.8 | 86 |
| Code Generation | LiveCodeBench | Accuracy 88.6 | 84 |
| Code Generation | LiveCodeBench | Pass@1 88.1 | 76 |
| Code Generation | LiveCodeBench v6 | Accuracy 100 | 75 |
| Code Generation | LiveCodeBench | Average Score 168 | 68 |
| Speculative Decoding | LiveCodeBench | Speedup Factor 7.16 | 66 |
| Code Generation | LiveCodeBench | Accuracy 73.2 | 64 |
| Predicting code correctness | LiveCodeBench Python | ECE 0.015 | 60 |
| Code correctness prediction | LiveCodeBench Python | AUROC 86.7 | 60 |
| Code Correctness Prediction | LiveCodeBench Python | Brier Score 0.067 | 60 |
| Code Generation | LiveCodeBench | Pass@1 1,784 | 51 |
| Programming | LiveCodeBench V3 V4 (test) | Accuracy 61.4 | 42 |
| Code Generation | LiveCodeBench (test) | Pass@1 Overall 53.6 | 42 |
| Code Generation | LiveCodeBench v6 | Score 91.7 | 41 |
| Coding | LiveCodeBench | Accuracy 70 | 38 |
| Code | LiveCodeBench V5-6 | Accuracy 50.8 | 33 |
| Code | LiveCodeBench V1-4 | Accuracy 47.1 | 33 |
| Competitive Programming | LiveCodeBench Pro 25Q2 | Easy Score 94.8 | 33 |
| Competitive Programming | LiveCodeBench Pro 25Q1 | Easy Score 96.6 | 33 |
| Code Verification | LiveCodeBench | Pass@1 39.31 | 32 |
| Code Generation | LiveCodeBench v6 (2025-02 to 2025-05) | Accuracy 74.1 | 31 |
| Coding | LiveCodeBench v6 | Score (%) 75.1 | 31 |
| Code Generation | LiveCodeBench v5 | Pass@1 61.5 | 30 |
Showing 25 of 223 rows
...