| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reinforcement Learning from Verifiable Rewards | HEAD-QA | AR | 100 | 30 |
| Medical Multiple-Choice Question Answering | Head-QA (test) | Accuracy | 28.79 | 10 |
| Question Answering | HEAD-QA | AR (%) | 100 | 7 |
| Question Answering | HEAD-QA (eval) | Pathwise Violations (PathV) | 0 | 3 |