| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Aspect-based Sentiment Analysis | ASAP (test) | F1 Score: 94.59 | 95 |
| Automated Essay Scoring | ASAP 1.0 (test) | Prompt 1 QWK: 0.836 | 51 |
| Sheet music transcription | ASAP n=102 | OMR-NED: 64.3 | 16 |
| Evaluation Alignment | ASAP 2.0 | QWK: 0.7276 | 16 |
| Automatic Text Evaluation | ASAP | QWK: 0.379 | 15 |
| Automated Essay Scoring | ASAP Kaggle 2.0 (test) | QWK: 0.84 | 13 |
| Trait-wise Automated Essay Scoring | ASAP and ASAP++ (five-fold cross-val) | Overall Score: 77.8 | 11 |
| Automated Essay Scoring | ASAP and ASAP++ (five-fold cross-validation) | Score P1: 0.73 | 11 |
| Essay Scoring | ASAP++ five-fold averaged results | Overall Score: 0.712 | 10 |
| Essay Scoring | ASAP-SAS | QWK (Prompt 3): 0.661 | 10 |
| Automated Essay Scoring | ASAP++ full-data setting | Score P1: 0.734 | 10 |
| Multi-trait Automated Essay Scoring | ASAP++ (full-data) | Overall Score: 0.781 | 10 |
| Automatic Text Scoring | ASAP (test) | QWK: 0.764 | 9 |
| Automatic Essay Scoring | ASAP In-domain (5-fold cross-validation) | Overall QWK: 0.785 | 8 |
| Automated Essay Scoring | ASAP 2.0 | QWK: 49.51 | 7 |
| Rhythm Quantization | ASAP ACPAS definitions (test) | Epsilon Onset Error: 12.3 | 7 |
| Expressive Piano Performance Rendering | ASAP (test) | Velocity JS Div: 0.0427 | 7 |
| Standard-Cell Performance Prediction | ASAP 7nm (test) | Rise Delay MAPE: 0.94 | 6 |
| Multi-trait automated essay scoring | ASAP Prompt 8 (test) | Ideas: 0.694 | 6 |
| Multi-trait automated essay scoring | ASAP Prompt 7 (test) | Ideas Score: 69.5 | 6 |
| Automated Essay Scoring | ASAP++ | QWK: 0.726 | 5 |
| Automated Essay Scoring | ASAP | QWK: 0.743 | 5 |
| Trait-level Essay Scoring | ASAP (test) | Content Score: 65.1 | 4 |
| Automated Essay Scoring | ASAP Long Essays (Prompts 1, 2, 8) | Score (P1): 83.6 | 4 |
| Alignment | (n)ASAP Dataset piano performances | Mean Error (ms): 6 | 3 |