CRUXEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Output Prediction | CRUXEval-O | Pass@1: 49.8 | 47 |
| Code Input Prediction | CRUXEval-I | Pass@1: 50 | 47 |
| Code Reasoning | CRUXEval | Input-CoT Accuracy: 73.8 | 27 |
| Code Reasoning | CRUXEval | Accuracy: 68.6 | 21 |
| Code Reasoning | CruxEval Output | Score: 51 | 12 |
| Code Reasoning | CRUXEval-O | Accuracy: 83.5 | 12 |
| Coding | CRUXEval-O | Score: 87.5 | 10 |
| Coding | CRUXEval | Pass@1: 55.9 | 6 |
| Code Reasoning | CRUXEval I | Accuracy: 74 | 4 |
| Code | CruxEval o | Exact Match: 35.3 | 4 |
| Code | CruxEval-i | Exact Match: 36.2 | 4 |
| Code Reasoning (Output Prediction) | CRUXEval-O 1-shot | Accuracy: 84.01 | 3 |
| Code Reasoning (Input Prediction) | CRUXEval-I 1-shot | Accuracy: 79.75 | 3 |