Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

About

We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge, a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation, requiring bimanual manipulation, navigation, and context-aware decision making. Building on the Pi0.5 architecture, we introduce several innovations. Our primary contribution is correlated noise for flow matching, which improves training efficiency and enables correlation-aware inpainting for smooth action sequences. We also apply learnable mixed-layer attention and System 2 stage tracking for ambiguity resolution. Training employs multi-sample flow matching to reduce variance, while inference uses action compression and challenge-specific correction rules. Our approach achieves 26% q-score across all 50 tasks on both public and private leaderboards.
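The abstract does not say how the noise is correlated or how many samples are drawn, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical AR(1) covariance across the time steps of an action chunk, a linear interpolation path from noise to actions, and several (noise, time) draws per action chunk to stand in for multi-sample flow matching. All function names, the `rho` parameter, and the chunk shape (16 steps, 7 action dimensions) are illustrative assumptions.

```python
import numpy as np


def ar1_covariance(horizon, rho=0.9):
    """Assumed temporal covariance: corr(step i, step j) = rho ** |i - j|."""
    idx = np.arange(horizon)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sample_correlated_noise(rng, chol, n_samples, action_dim):
    """Draw Gaussian noise whose time steps have covariance chol @ chol.T."""
    white = rng.standard_normal((n_samples, chol.shape[0], action_dim))
    # Mix independent draws along the time axis with the Cholesky factor.
    return np.einsum("ts,nsd->ntd", chol, white)


def flow_matching_samples(rng, actions, chol, n_samples=4):
    """Build several (x_t, t, target velocity) triples for one action chunk.

    Assumed convention: t = 0 is pure noise, t = 1 is the clean action chunk.
    """
    horizon, action_dim = actions.shape
    noise = sample_correlated_noise(rng, chol, n_samples, action_dim)
    t = rng.uniform(size=(n_samples, 1, 1))
    x_t = (1.0 - t) * noise + t * actions[None]  # point on the straight path
    target = actions[None] - noise               # constant velocity of that path
    return x_t, t, target


rng = np.random.default_rng(0)
cov = ar1_covariance(horizon=16)
chol = np.linalg.cholesky(cov)
actions = rng.standard_normal((16, 7))  # placeholder action chunk
x_t, t, target = flow_matching_samples(rng, actions, chol)
```

In a training loop under these assumptions, a network would predict `target` from `x_t`, `t`, and the observation, and the squared error would be averaged over the `n_samples` draws so that one observation encoding is reused across several noise samples.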

Ilia Larchenko, Gleb Zarin, Akash Karnatak • 2025

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO	--	957
Robotic Manipulation	RoboCasa	Average Success Rate: 13.2	39
Robotic Manipulation	Meta-World	Average Success Rate: 7.1	27
Robotic Manipulation	RoboMimic	Success Rate: 24	8
Robot Learning	BEHAVIOR 2025 (private)	Binary Success: 12.4	5
Robot Learning	BEHAVIOR 2025 (public)	Binary Success: 11.2	5
Bring water	B1K Challenge	Q: 0.233	4
Average	B1K Challenge	Q Score: 0.256	2
Box books	B1K Challenge	Q: 15	2
Can Food	B1K Challenge	Q: 3	2

Showing 10 of 27 rows
