| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| Random topology (test) | | Finished Tasks Ratio: 94.6 | 18 | 3mo ago | |
| Synthetic personalized interaction datasets (evaluation) | | Task Completion Score: 8.48 | 10 | 3mo ago | |
| ALFWorld Unseen | EAGLET | Average Steps: 8.2 | 6 | 1mo ago | |
| ALFWorld Seen | EAGLET | Average Steps: 8.6 | 6 | 1mo ago | |
| ScienceWorld Unseen | EAGLET | Average Steps: 10.6 | 6 | 1mo ago | |
| ScienceWorld Seen | EAGLET | Average Steps: 10.2 | 6 | 1mo ago | |
| Real-world (test) | | Score: 8.09 | 6 | 3mo ago | |
| L-IVA 1.0 (test) | ORCA | Task Success Rate - Kit: 73.8 | 4 | 3mo ago | |
| Internal Task Benchmark | | Avg Connection Time (hours): 0 | 3 | 3mo ago | |