| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Response Correctness and Completeness Evaluation | Coding | F1 Score | 85 | 32 |
| Prompt Injection Detection | Coding Direct Prompt Injection | FPR | 0 | 7 |
| Code Generation | Coding Gender (test) | Cor (%) | 40 | 5 |
| Code Generation | Coding Race (test) | Correctness Rate | 57 | 5 |