| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Explainable AI | CRAFTER-XAI-Bench (test) | Faithfulness | 0.54 | 25 |
| Sequential environment decision making | Crafter BALROG protocol | Peak Task Score (%) | 37.9 | 8 |
| Question Answering | Crafter four-actor rollout robustness | Average Exact Match (EM) | 59.3 | 7 |
| Raster-to-SVG conversion | 80 CRAFTER outputs (test) | Position Score | 8.1 | 5 |
| World Modeling | Crafter-OO | Rank @ 1 | 18.7 | 5 |
| Reinforcement Learning | Crafter 1M steps | Score | 12.1 | 5 |
| Reinforcement Learning | Crafter | Score | 16.2 | 4 |
| World Model Prediction | Crafter | Avg Imagination MSE | 5.07 | 2 |