COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control
About
Visual reinforcement learning (RL) suffers from poor sample efficiency due to high-dimensional observations in complex tasks. While existing work has shown that vision-language models (VLMs) can assist RL, it typically focuses on distilling knowledge from the VLM into the RL agent, overlooking the potential of RL-generated interaction data to enhance the VLM. To address this, we propose COVR, a collaborative optimization framework that enables mutual enhancement of the VLM and the RL policy. Specifically, COVR fine-tunes the VLM on RL-generated data to strengthen semantic reasoning aligned with the target task, and uses the enhanced VLM to further guide policy learning via action priors. To improve fine-tuning efficiency, we introduce two key modules: (1) an Exploration-Driven Dynamic Filter that preserves valuable exploration samples using adaptive thresholds based on the degree of exploration, and (2) a Return-Aware Adaptive Loss Weight that stabilizes training by quantifying the inconsistency of sampled actions via RL return signals. We further design a progressive fine-tuning strategy to reduce resource consumption. Extensive experiments show that COVR achieves strong performance across a range of challenging visual control tasks.
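The two fine-tuning modules above can be sketched in a few lines. This is a minimal, hypothetical illustration only, not the paper's implementation: the sample fields (`exploration_score`), the threshold rule, and the softmax weighting scheme are all assumptions made for the sketch.

```python
import numpy as np

def dynamic_filter(samples, base_threshold=0.5):
    """Exploration-Driven Dynamic Filter (hypothetical sketch):
    keep samples whose exploration score clears a threshold that
    adapts to the batch's overall degree of exploration."""
    scores = np.array([s["exploration_score"] for s in samples])
    threshold = base_threshold * scores.mean()  # adaptive threshold
    return [s for s, sc in zip(samples, scores) if sc >= threshold]

def return_aware_weights(returns, temperature=1.0):
    """Return-Aware Adaptive Loss Weight (hypothetical sketch):
    use RL return signals to down-weight inconsistent sampled
    actions so they contribute less to the VLM fine-tuning loss."""
    returns = np.asarray(returns, dtype=float)
    # Normalize returns, then convert to softmax loss weights.
    z = (returns - returns.mean()) / (returns.std() + 1e-8)
    w = np.exp(z / temperature)
    return w / w.sum()
```

In this sketch, high-return (more consistent) samples receive larger loss weights and low-exploration samples are filtered out before fine-tuning; the actual COVR modules may differ in how exploration and inconsistency are measured.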
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Reinforcement Learning | DMControl Cartpole, Swingup | Episode Return | 872 | 16 |
| Visual Reinforcement Learning | DMControl Reacher Easy | Episode Return | 969 | 16 |
| Visual Reinforcement Learning | DMControl Cheetah Run | Episode Return | 504 | 16 |
| Visual Reinforcement Learning | DMControl Walker Walk | Episode Return | 802 | 16 |
| Visual Reinforcement Learning | DMControl Finger, Spin | Episode Return | 976 | 16 |
| Visual Reinforcement Learning | DMControl Ball in cup, Catch | Episode Return | 960 | 16 |
| Autonomous Driving | CARLA (#HW) | Error Rate | 248 | 15 |
| Visual Reinforcement Learning | CARLA (#GP scenario) | ER | 235 | 15 |
| Visual Reinforcement Learning | CarRacing v0 (test) | Environment Reward | 6.19e+5 | 11 |
| Visual Reinforcement Learning | CARLA Scenario A (test) | ER | 47 | 6 |