| Dataset Name | SOTA Method | Metric | Value | Count | Last Updated |
|---|---|---|---|---|---|
| RockSample (8, 4, 10, -1) | | Average Reward | 16.7 | 30 | 8d ago |
| RockSample (7, 8, 20, 0) | | Average Reward | 28.5 | 30 | 8d ago |
| Tag | | Average Reward | -2.3 | 30 | 8d ago |
| ALFWorld (test) | SMaRT | Success Rate | 96.27 | 26 | 2mo ago |
| HotPotQA | | Average Steps per Episode | 4.7 | 18 | 2mo ago |
| WebShop | SYMPHONY-L | Score | 0.88 | 11 | 3mo ago |
| Jericho ID (meta-train) | META-TTL | Score (Detective) | 270.5 | 9 | 2mo ago |
| FrozenLake (val) | EVPO | Success Rate | 68.4 | 6 | 1mo ago |
| Sokoban (val) | EVPO | Success Rate | 60.4 | 6 | 1mo ago |
| ScienceWorld | AutoRefine | Pass@1 Success Rate | 70.4 | 4 | 3mo ago |
| COMPAS | OFS-Grid | Reward | 0.7556 | 3 | 3mo ago |
| German (tail mean over the last 200 iterations) | OFS-Grid | Reward | 0.4546 | 3 | 3mo ago |
| MVE-S (tail mean over the last 200 iterations) | OFS-Grid | Reward | 0.3068 | 3 | 3mo ago |
| ALFWorld (val_seen) | Selective-rollout gate | Total Wall-Clock Time (s) | 15,527 | 2 | 26d ago |