| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | Code | Accuracy 98.7 | 242 |
| AI-generated text detection | Code | AUC 0.979 | 24 |
| Machine-generated text detection | Code Llama-3-70B-Instruct (test) | AUC 0.951 | 12 |
| AI-generated text detection | Code GPT-3.5 Turbo | AUC 0.906 | 12 |
| Code Generation and Understanding | Code Crux, MultiPL_E, MBPP | Crux Score 60.12 | 12 |
| Code Generation | Code latest (test) | HumanEval 83.9 | 12 |
| Multi-turn conversation performance | Code | Avg Performance 98.3 | 9 |
| ECG Abnormality Detection | CODE 15% | AUC 91.5 | 8 |
| Code Generation | Code | ASR 86.9 | 7 |
| Critique Quality Evaluation | Code | Win Rate 67.5 | 6 |
| Language Modeling | Code 24B tokens | Cross-Entropy Loss 0.6994 | 5 |
| Coding | CODE (test) | Turns 3.62 | 4 |
| Code Generation | CODE | F1 Score 99 | 4 |
| Tokenization | Code | Average Tokens per Sample 1,694.26 | 3 |
| Code Generation | Code | Throughput (token/s) 183.54 | 3 |