| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Answer-position balance evaluation | MCQ | MCQ TV | 0.061 | 34 |
| Distractor Generation | MCQ (test) | P@1 | 22.39 | 17 |
| Series Comparison | MCQ2 | Accuracy | 67 | 15 |
| Analogical Reasoning | MCQ | Accuracy | 46 | 14 |
| Distractor Generation | MCQ | P@1 | 30.5 | 12 |
| Detection | MCQ | Detection Score | 71.6 | 5 |
| Prevention | MCQ | gpt-5.1 Score | 99.3 | 5 |
| Distractor Generation | MCQ dataset | Relevance | 4.45 | 5 |