| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Efficiency | Mercury | Beyond@1 83.6 | 29 |
| Out-of-distribution AI-generated text detection | Mercury Out-of-distribution (OOD) 2 (test unseen domains) | Legal Accuracy 95.7 | 16 |
| Code Generation | Mercury (test) | Pass Rate (Easy) 90.3 | 8 |
| Black-box Vulnerability Detection | Mercury | Vulnerabilities Found Count 22 | 4 |