| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Rubric satisfaction evaluation | Medical | Claude-4 Sonnet Score 50.9 | 21 |
| Hypernym discovery | medical Gold standard domain-specific (test) | MRR 77.32 | 18 |
| Preference Evaluation | Medical | Avg Score 8.58 | 14 |
| Importance-based Node Leakage | Medical | Leakage (Deg) 36.2 | 10 |
| Factual Precision Evaluation | Medical | SAFE 87.3 | 10 |
| Machine Translation | Medical (test) | BLEU 55.42 | 9 |
| MRI to CT translation | medical MRI→CT 256 × 256 (test) | NFE 4 | 7 |
| Machine Translation | Medical All-domain datastore (test) | BLEU 55.1 | 6 |
| Access Control | Medical | Accuracy 100 | 5 |
| Machine Translation | Medical out-of-domain (test) | BLEU 15.4 | 5 |
| Mixed Linear Regression | medical | Minimal Error (K=2) 0.1591 | 5 |
| Machine Translation | Medical multi-domain (test) | Decoding Throughput (Tok/Sec) 3,152.59 | 2 |
| DSL Evaluation | Medical | Opinion 4.4 | 1 |