APPS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Generation | APPS | Pass@1: 91.2 | 111 |
| Code Correctness Evaluation | APPS | Accuracy: 80 | 53 |
| Code Generation | APPS (test) | Introductory Score: 56.3 | 36 |
| Code Generation | APPS Intermediate | Pass Rate: 81.95 | 32 |
| Code Safety Evaluation | APPS 1.0 (test) | Safety Score: 0.988 | 30 |
| Code Generation | APPS | Accuracy: 45 | 29 |
| Code Generation | APPS Introductory | pass@1: 92.1 | 25 |
| Code Generation | APPS Competition | pass@1: 38 | 20 |
| Code Generation | APPS Overall | PR: 21.38 | 18 |
| Meta-reasoning quality assessment | APPS | Thoroughness: 85.6 | 12 |
| Code Generation | APPS | Pass@4: 83.2 | 12 |
| Program Synthesis | APPS 1.0 (test) | Pass@5 (Introductory): 25.61 | 11 |
| Code Generation | APPS | Tau: 5.65 | 10 |
| Code Generation | APPS Interview | Pass@1: 2.64 | 9 |
| Code Generation | APPS interview-level (test) | Mean Score: 0.5717 | 8 |
| Watermark message recovery | APPS-G | Message Accuracy: 100 | 8 |
| Code Peak-Memory Prediction | APPS | Correlation (rho): 0.96 | 7 |
| Competitive Programming | APPS (val) | Pass@1: 72.72 | 6 |
| Monitoring | APPS (test) | pAUC: 81.6 | 6 |
| Code metric regression | APPS Leetcode (test) | RMSE: 0.474 | 6 |
| Code Generation | APPS+ | Pass@1 (Introductory): 1.94 | 5 |
| Code Generation | APPS+ Competition | Pass@1: 2.67 | 5 |
| Coding Reasoning | Apps | Pass Rate: 68.3 | 5 |
| Program Synthesis | APPS | Pass@5 (Introductory): 25.61 | 5 |
| Dafny Code Synthesis | APPS Vericoding-derived (test) | Pass Rate: 83 | 4 |
Showing 25 of 35 rows