| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Code Generation | APPS | Pass@1 | 91.2 | 69 |
| Code Generation | APPS (test) | Introductory Score | 56.3 | 36 |
| Code Generation | APPS Intermediate | Pass Rate | 81.95 | 32 |
| Code Safety Evaluation | APPS 1.0 (test) | Safety Score | 0.988 | 30 |
| Code Generation | APPS Introductory | PR | 85.18 | 21 |
| Code Generation | APPS Competition | Accuracy | 69.66 | 20 |
| Code Generation | APPS Overall | PR | 21.38 | 18 |
| Code Generation | APPS | Precision Rate | 60.33 | 12 |
| Program Synthesis | APPS 1.0 (test) | Pass@5 (Introductory) | 25.61 | 11 |
| Code metric regression | APPS Leetcode (test) | RMSE | 0.474 | 6 |
| Coding Reasoning | Apps | Pass Rate | 68.3 | 5 |
| Program Synthesis | APPS | Pass@5 (Introductory) | 25.61 | 5 |
| Code Generation | APPS Interview | Pass@1 | 2.64 | 5 |
| Code Generation | APPS | Avg@8 | 33.7 | 4 |
| Code Generation Oversight | APPS | Safety Score | 63 | 4 |
| Program Repair | APPS (test) | Strict Accuracy | 21.7 | 4 |
| Program Discrimination | APPS (test) | Accuracy | 42.9 | 4 |
| Code Generation | APPS stdin-style Plus | Syntax Validity | 83.4 | 3 |