| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | Code | Accuracy 98.7 | 242 |
| ECG Classification | CODE-15% (test) | Macro AUROC 97 | 36 |
| Continual Learning | Code Domain-level stream | AA 62.09 | 34 |
| AI-generated text detection | Code | AUC 0.979 | 24 |
| Speculative Decoding | Code | Throughput (tokens/s) 138.72 | 22 |
| Code Reasoning | Code HumanEval+ LiveCodeBench v5 | HEval+ (Pass@1) 79.88 | 18 |
| Code Generation | Code Category Average (test) | Accuracy 77.43 | 18 |
| Incremental BPE Tokenization | Code | End-to-end CPU Time (s) 1.369 | 15 |
| ECG Interpretation | CODE 15% | AUC 86.4 | 15 |
| Code Generation | Code | Performance Score 52.67 | 12 |
| BPE Tokenization | Code | Speedup Factor 2.88 | 12 |
| Machine-generated text detection | Code Llama-3-70B-Instruct (test) | AUC 0.951 | 12 |
| AI-generated text detection | Code GPT-3.5 Turbo | AUC 0.906 | 12 |
| Code Generation and Understanding | Code Crux, MultiPL_E, MBPP | Crux Score 60.12 | 12 |
| Code Generation | Code latest (test) | HumanEval 83.9 | 12 |
| Agentic Routing | Code MBPP HumanEval | Accuracy 76 | 10 |
| Answerability Prediction | CODE n=30 (matched pairs) | AUC 85 | 9 |
| ECG Interpretation | CODE (test) | AUC 96.79 | 9 |
| Multi-turn conversation performance | Code | Avg Performance 98.3 | 9 |
| Speculative Decoding | Code Stack-Edu | Speedup 2.61 | 8 |
| Code Generation | Code Out-of-Domain | Accuracy 63.99 | 8 |
| ECG Abnormality Detection | CODE 15% | AUC 91.5 | 8 |
| Code Generation | Code | ASR 86.9 | 7 |
| Speculative Decoding | Code | Speedup 2.38 | 6 |
| Code Generation | Code (test) | Accuracy 36.5 | 6 |