Code execution on Tutorial (test)
[Chart: top result on this benchmark is CEL-S2 with Output Accuracy 79.51 (May 8, 2023). Selectable metrics: Output Accuracy, Trace Accuracy, Line Precision, Line Recall, Line F1 Score, Identifier Precision, Identifier Recall, Identifier F1 Score.]
Evaluation Results
| Method | Configuration | Date | Output Accuracy | Trace Accuracy | Line Precision | Line Recall | Line F1 Score | Identifier Precision | Identifier Recall | Identifier F1 Score |
|---|---|---|---|---|---|---|---|---|---|---|
| CEL-S2 | training_stage=Stage 2... | 2023.05 | 79.51 | 85.59 | 95.94 | 84.24 | 89.71 | 97.29 | 87.3 | 92.02 |
| CodeExecutor | curriculum_learning=true | 2023.05 | 76.42 | 80.09 | 94.49 | 76.74 | 84.7 | 95.91 | 69.15 | 80.36 |
| Codex | shots=3 | 2023.05 | 13.07 | - | - | - | - | - | - | - |
| CEL-S3 | training_stage=Stage 3... | 2023.05 | 7.89 | 8.35 | 26.58 | 21.33 | 23.67 | 26.36 | 19.47 | 22.4 |
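The page does not say how the F1 columns are derived. As a rough sanity check, the sketch below assumes each Line F1 Score is the harmonic mean of the reported Line Precision and Line Recall (an assumption, not something the benchmark states) and compares the result with the reported figure. The helper name `f1_score` and the `rows` dictionary are illustrative only; the numbers are copied from the results above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent).

    Assumption: the leaderboard's F1 columns follow this standard
    definition; the page itself does not confirm it.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Line Precision, Line Recall, reported Line F1 Score) per method.
# Codex is omitted because the page reports only its Output Accuracy.
rows = {
    "CEL-S2": (95.94, 84.24, 89.71),
    "CodeExecutor": (94.49, 76.74, 84.70),
    "CEL-S3": (26.58, 21.33, 23.67),
}

for method, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    gap = abs(computed - reported)
    print(f"{method}: computed {computed:.2f}, reported {reported:.2f}, gap {gap:.4f}")
```

The same check can be run on the Identifier columns by swapping in those values. A small gap only suggests that the reported F1 is consistent with the harmonic-mean definition; it does not show how the scores were aggregated across test examples.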