| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Simulation Task Planning | BEHAVIOR-1K 15 tasks | BT Valid 100 | 14 |
| Progress Estimation | Behavior | MRA 87.08 | 12 |
| Long-Horizon Household Tasks | BEHAVIOR-1K | Fitting 44.7 | 12 |
| Embodied AI Planning | BEHAVIOR-1K | Success Rate 100 | 11 |
| Visual Planning | MINIBEHAVIOR | EM 7,580 | 8 |
| Autonomous Driving | Behavior Shifted Environment (test) | Testing Reward 1.02 | 8 |
| Robot Learning | BEHAVIOR 2025 (private) | Binary Success 12.4 | 5 |
| Robot Learning | BEHAVIOR 2025 (public) | Binary Success 14.4 | 5 |
| Household Planning | Behavior-1K | Success Rate 84.4 | 5 |
| collecting_childrens_toys | BEHAVIOR-1K | Q-Score 0.56 | 4 |
| Pick up Soda Can | BEHAVIOR | Navigational Success Rate 84 | 3 |
| Pick up Radio | BEHAVIOR | Navigation Success Rate 88 | 3 |
| ADS Testing | Behavior | Execution Time (s) 43.66 | 3 |
| Motion Planning | BEHAVIOR Franka MM (test) | Motion Completion Time (sec) 5.03 | 3 |
| Motion Planning | BEHAVIOR HSR 1488 samples (test) | Motion Completion Time (sec) 5.01 | 3 |
| loading_the_car | BEHAVIOR-1K | Q-Score 0.3 | 2 |
| moving_boxes_to_storage | BEHAVIOR-1K | Q-Score 0.8 | 2 |
| set_up_a_coffee_station_in_your_kitchen | BEHAVIOR-1K | Q-Score 0.2 | 2 |
| carrying_in_groceries | BEHAVIOR-1K | Q-Score 0 | 2 |
| storing_food | BEHAVIOR-1K | Q-Score 0 | 2 |
| putting_shoes_on_rack | BEHAVIOR-1K | Q-Score 0.49 | 2 |
| hanging_pictures | BEHAVIOR-1K | Q-Score 0 | 2 |
| turning_on_radio | BEHAVIOR-1K | Q-Score 0 | 2 |
| wash_dog_toys | BEHAVIOR-1K | Q-Score 0 | 2 |
| make_pizza | BEHAVIOR-1K | Q-Score 0 | 2 |