| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Bandit | Qwen2.5-1.5B-It + Evolving Stage | Success Rate (pass@1) | 95 | 19 | 3mo ago |
| Text-based Game | | Average Reward | 0.531 | 13 | 1mo ago |
| hard MAB instance | | Mean Average Reward | 53.1 | 10 | 1mo ago |
| Multi-Armed Bandit (MAB) Horizon Generalization T=100 | Iterative RMFT | Average Regret | 22.37 | 7 | 2d ago |
| Yahoo! dataset Day 9 | EFF-RAW-UCB | Average Computational Time (s) | 34 | 7 | 1mo ago |
| Yahoo! Day 8 | EFF-RAW-UCB | Average Latency (s) | 44 | 7 | 1mo ago |
| Yahoo! Day 7 | EFF-RAW-UCB | Average Computational Time (s) | 41 | 7 | 1mo ago |
| Yahoo! Day 6 | EFF-RAW-UCB | Average Computational Time (s) | 46 | 7 | 1mo ago |
| Yahoo! Day 5 | EFF-RAW-UCB | Average Latency (s) | 47 | 7 | 1mo ago |
| Yahoo! Day 4 | EFF-RAW-UCB | Avg Time (s) | 43 | 7 | 1mo ago |
| Yahoo! Day 3 | EFF-RAW-UCB | Average Time (s) | 33 | 7 | 1mo ago |
| Yahoo! Day 2 | EFF-RAW-UCB | Average Time (s) | 35 | 7 | 1mo ago |