| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| C-RASP Synthesis Benchmark Suite Regular, Counting, and Context-free Languages | C-RASP | Synthesis Result Score 27 | 34 | 1mo ago | |
| HumanEval | HILA | Accuracy 72.15 | 32 | 1mo ago | |
| VizDoom (test) | EVAPS | Exact Match 62.22 | 18 | 1mo ago | |
| HumanEval (test) | SC-MAS | Accuracy 92.37 | 16 | 1mo ago | |
| PSB 1 | HOTGP | Compare String Length 100 | 15 | 1mo ago | |
| MBPP | | ASR 88.4 | 12 | 3d ago | |
| APPS 1.0 (test) | CodeRL+CodeT5 | Pass@5 (Introductory) 25.61 | 11 | 1mo ago | |
| PSB1 1 (val) | DSLS | Last Index of Zero 62 | 10 | 1mo ago | |
| SPoC (TestW) | DrRepair w/ pseudocode | Success Rate 57 | 10 | 1mo ago | |
| SPoC (TestP) | DrRepair w/ pseudocode | Success Rate 0.385 | 10 | 1mo ago | |
| MBPP | WavefrontDiffusion | Accuracy 59.03 | 9 | 1mo ago | |
| C dataset (test) | LaSynth | Accuracy 55.2 | 7 | 1mo ago | |
| CodeContests (test) | GRPO-RLVR | Pass@1 0.2045 | 6 | 1mo ago | |
| PSB1 (train) | HOTGP | Compare String Lengths 100 | 5 | 1mo ago | |
| APPS | CodeRL+CodeT5 | Pass@5 (Introductory) 25.61 | 5 | 1mo ago | |
| MBPP+ | AutoAdapt | Pass Rate 68 | 4 | 1mo ago | |
| PSB1 | | Checksum Correctness 89 | 4 | 1mo ago | |
| openai_humaneval (test) | BLOOM-176B | Pass@1 15.52 | 4 | 1mo ago | |
| Karel (test) | | Generalization Accuracy 86.04 | 4 | 1mo ago | |
| PSB2 | Origami AC/DC | Basement 40 | 3 | 1mo ago | |
| HumanEval Standard Relaxed (test) | QualityFlow | pass@1 0.988 | 3 | 1mo ago | |
| PolyPSB | Origami AC/DC | Area of Rectangle 100 | 2 | 1mo ago | |
| PSB2 (test) | | Basement 95 | 2 | 1mo ago | |
| HumanEval-EvalPlus Standard (test) | QualityFlow | pass@1 89.6 | 2 | 1mo ago | |
| MBPP-EvalPlus Standard (test) | QualityFlow | Pass@1 79.9 | 2 | 1mo ago | |