| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| ALFWorld (test) | SMaRT | Success Rate: 96.27 | 26 | 1mo ago | |
| HotPotQA | | Average Steps per Episode: 4.7 | 18 | 1mo ago | |
| WebShop | SYMPHONY-L | Score: 0.88 | 11 | 1mo ago | |
| Jericho ID (meta-train) | META-TTL | Score (Detective): 270.5 | 9 | 17d ago | |
| ScienceWorld | AutoRefine | Pass@1 Success Rate: 70.4 | 4 | 1mo ago | |
| COMPAS | OFS-Grid | Reward: 0.7556 | 3 | 1mo ago | |
| German tail mean over the last 200 iterations | OFS-Grid | Reward: 0.4546 | 3 | 1mo ago | |
| MVE-S tail mean over the last 200 iterations | OFS-Grid | Reward: 0.3068 | 3 | 1mo ago | |