
FOSP: Fine-tuning Offline Safe Policy through World Models

About

Offline safe reinforcement learning (RL) seeks to address safety constraints by learning from static datasets and restricting exploration. However, these approaches rely heavily on the dataset and struggle to generalize safely to unseen scenarios. In this paper, we aim to improve safety during the deployment of vision-based robotic tasks through online fine-tuning of an offline-pretrained policy. To facilitate effective fine-tuning, we adopt model-based RL, which is known for its data efficiency. Specifically, our method employs in-sample optimization to improve offline training efficiency while incorporating reachability guidance to ensure safety. Once an offline safe policy is obtained, a safe policy expansion approach is used for online fine-tuning. We validate the method on simulation benchmarks with five vision-only tasks and through real-world robot deployment using limited data. The results demonstrate that our approach significantly improves the generalization of offline policies to unseen safety-constrained scenarios. To the best of our knowledge, this is the first work to explore offline-to-online RL for safe generalization tasks.
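As a hedged illustration of what "in-sample optimization" commonly means in offline RL, the sketch below implements the expectile regression loss popularized by Implicit Q-Learning (IQL). The abstract does not specify FOSP's exact loss, so treating it as expectile regression is an assumption, and the helper names `expectile_loss` and `fit_expectile` are hypothetical. The property it shows is that the value estimate is fitted only to targets that appear in the dataset, while a larger expectile `tau` pushes that estimate toward the best in-sample target without querying out-of-distribution actions.

```python
from typing import Sequence


def expectile_loss(diff: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2 (IQL-style expectile loss).

    Assumption: used here only to illustrate in-sample optimization, not as FOSP's loss.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(targets: Sequence[float], tau: float,
                  lr: float = 0.1, steps: int = 2000) -> float:
    """Fit a scalar value v to in-sample targets by gradient descent on the
    mean expectile loss of (target - v). Hypothetical helper for illustration."""
    v = 0.0
    n = len(targets)
    for _ in range(steps):
        grad = 0.0
        for q in targets:
            diff = q - v
            weight = tau if diff >= 0 else 1.0 - tau
            # d/dv [weight * (q - v)^2] = -2 * weight * (q - v)
            grad += -2.0 * weight * diff
        v -= lr * grad / n
    return v


if __name__ == "__main__":
    # Q-value targets for actions that actually appear in the dataset (made-up numbers).
    in_sample_q = [1.0, 2.0, 3.0, 10.0]
    print(round(fit_expectile(in_sample_q, tau=0.5), 4))  # 4.0 (the mean)
    print(round(fit_expectile(in_sample_q, tau=0.9), 4))  # 8.0 (pulled toward the max)
```

With `tau = 0.5` the loss reduces to a scaled mean-squared error and `v` settles on the mean of the targets; raising `tau` to 0.9 moves `v` toward the largest in-sample target, while every update still uses only values present in the data.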

Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang · 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Autonomous Driving | CarDreamer Four Lane | Driving Cost | 19.8 | 10
Autonomous Driving | CarDreamer Roundabout | Driving Cost | 4.57 | 5
Autonomous Driving | CarDreamer Lane Merge | Arrive Rate | 15 | 5
Driving | CarDreamer Lane Merge | Driving Score | 131.2 | 5
Driving | CarDreamer Left Turn | Driving Score | 231.1 | 5
Driving | CarDreamer Right Turn | Driving Score | 108 | 5
Autonomous Driving | CarDreamer Lane Merge | Cost | 2.61 | 5
Autonomous Driving | CarDreamer Navigation | Driving Cost | 45.74 | 5
Autonomous Driving | CarDreamer Right Turn | Driving Cost | 2.58 | 5
Autonomous Driving | CarDreamer Left Turn | Driving Cost | 1.85 | 5

(Showing 10 of 17 rows.)
