| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| J-TTL Jericho (test-time learning) | EvoTest | Detective Score 95 | 20 | 1mo ago | |
| Jericho Zork3 (test) | JitRL | Avg Score 3.1 | 7 | 3mo ago | |
| Jericho Zork1 (test) | JitRL | Average Score 53 | 7 | 3mo ago | |
| Jericho Library (test) | JitRL | Average Score 25.9 | 7 | 3mo ago | |
| Jericho | APEX | Zork1 Score 73 | 6 | 13d ago | |
| TW-Cooking 1000 (test) | REUSERL-SegCost (no buffer) | Pass@1 83.5 | 5 | 2d ago | |
| TWC out-of-distribution (Hard) | - | - | 0 | 3mo ago | |
| TWC out-of-distribution (Medium) | - | - | 0 | 3mo ago | |
| TWC out-of-distribution (Easy) | - | - | 0 | 3mo ago | |