
APPS

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | APPS | Pass@1: 91.2 | 69 |
| Code Generation | APPS (test) | Introductory Score: 56.3 | 36 |
| Code Generation | APPS Intermediate | Pass Rate: 81.95 | 32 |
| Code Safety Evaluation | APPS 1.0 (test) | Safety Score: 0.988 | 30 |
| Code Correctness Evaluation | APPS | F1: 66.7 | 25 |
| Code Generation | APPS Introductory | PR: 85.18 | 21 |
| Code Generation | APPS Competition | pass@1: 38 | 20 |
| Code Generation | APPS Overall | PR: 21.38 | 18 |
| Code Generation | APPS | Precision Rate: 60.33 | 12 |
| Program Synthesis | APPS 1.0 (test) | Pass@5 (Introductory): 25.61 | 11 |
| Code Generation | APPS | Tau: 5.65 | 10 |
| Code Generation | APPS Interview | Pass@1: 2.64 | 9 |
| Code metric regression | APPS Leetcode (test) | RMSE: 0.474 | 6 |
| Coding Reasoning | Apps | Pass Rate: 68.3 | 5 |
| Program Synthesis | APPS | Pass@5 (Introductory): 25.61 | 5 |
| Code Generation | APPS | Avg@8: 33.7 | 4 |
| Code Generation Oversight | APPS | Safety Score: 63 | 4 |
| Program Repair | APPS (test) | Strict Accuracy: 21.7 | 4 |
| Program Discrimination | APPS (test) | Accuracy: 42.9 | 4 |
| Code Generation | APPS stdin-style Plus | Syntax Validity: 83.4 | 3 |
Showing 20 of 20 rows
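Several rows report pass@k (Pass@1, pass@1, Pass@5) or Strict Accuracy, but the page does not say how these are computed for each entry. A common convention for pass@k is the unbiased estimator from the Codex paper (Chen et al., 2021): draw n samples per problem, count the c that pass all tests, and estimate pass@k = 1 - C(n - c, k) / C(n, k), averaged over problems. The APPS paper defines strict accuracy as the fraction of problems whose generated program passes every test case. The sketch below assumes those definitions; the function names are hypothetical and not taken from any leaderboard's tooling, and individual entries may have used different protocols.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumed definition).

    n: number of sampled programs, c: number that passed all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of samples must contain at least one passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def strict_accuracy(test_outcomes: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems whose program passes every one of its test cases."""
    return sum(all(outcomes) for outcomes in test_outcomes) / len(test_outcomes)
```

For example, with n = 10 samples of which c = 3 pass, `pass_at_k(10, 3, 1)` gives 0.3 (pass@1 reduces to c/n), while `pass_at_k(10, 3, 5)` gives 1 - C(7, 5) / C(10, 5) = 1 - 21/252, about 0.917. Averaging per-problem estimates, rather than pooling samples across problems, keeps each problem weighted equally.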