| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| General Language Understanding | P3 v1 (unseen) | RTE Accuracy | 80.83 | 11 |
| Constrained Bayesian Optimization | P3 | Log10 Median Utility Gap | 1.28 | 10 |
| Investment Decision Alignment | P3 v1 (test) | Overall MSE | 1.59 | 6 |
| Word Sense Disambiguation | P3 | WiC Score | 53.3 | 5 |
| Coreference Resolution | P3 | Winogrades Score | 61.6 | 5 |
| Sentence Completion | P3 | COPA Accuracy | 85.3 | 5 |
| Natural Language Inference | P3 | RTE | 81.3 | 5 |
| Multiple-Choice Question Answering | P3 | Dream | 77.6 | 5 |
| Summarization | P3 | Mul. News Score | 7.8 | 5 |
| Sentiment Analysis | P3 | Emotion Accuracy | 49.4 | 5 |
| Minimal Problem Solving | P3.5P focal | Template Size (R×C) | 2,043 | 4 |