
EvalPlus

Benchmarks

Task Name        Dataset Name     SOTA Result   Trend
Code Generation  EvalPlus         Pass@1: 89    61
Code generation  EvalPlus (test)  Eval+: 86.7   23