
Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning

About

Recent studies suggest that Reinforcement Fine-Tuning (RFT) is inherently more resilient to catastrophic forgetting than Supervised Fine-Tuning (SFT). However, it remains an open problem whether RFT (e.g., GRPO) can effectively overcome forgetting in challenging visual continual learning settings such as class-incremental learning (CIL) and domain-incremental learning (DIL). Through a pilot study, we confirm that while RFT consistently outperforms SFT, it still suffers from non-negligible forgetting. We empirically trace this bottleneck to Trajectory-level Drift Agnosticism: among candidate rollouts that achieve identical task rewards, the KL divergence from the preceding-task policy varies substantially, and this variation strongly correlates with catastrophic forgetting across sequential tasks. Motivated by this insight, we propose Retention-aware Policy Optimization (RaPO), a simple yet effective RFT method that explicitly mitigates forgetting through trajectory-level reward shaping. Specifically, RaPO comprises two core components: (1) a Retention Reward, which converts trajectory-level distribution drift into a continuous reward signal and preferentially reinforces knowledge-preserving rollouts within each group; and (2) Cross-Task Advantage Normalization (CTAN), which maintains a persistent exponential moving average of reward statistics across task boundaries to stabilize optimization during continual learning. Leveraging the free-form textual generalization of MLLMs, we comprehensively evaluate RaPO across five visual continual learning settings. Extensive experiments demonstrate that RaPO achieves leading performance, substantially reducing catastrophic forgetting while preserving strong plasticity. To the best of our knowledge, this work is the first systematic exploration of RFT in visual continual learning, and we hope its insights will inspire future research.
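The two RaPO components described above can be illustrated with a minimal sketch. The abstract does not give exact formulas, so every concrete choice below is an assumption made for illustration only: trajectory drift is estimated with the non-negative per-token "k3" KL estimator, the Retention Reward uses a hypothetical mapping exp(-drift / tau), it is added to the task reward with a hypothetical weight `lam`, and CTAN is approximated by an exponential moving average of reward mean and variance that is never reset at task boundaries. The names `trajectory_drift`, `retention_rewards`, `CrossTaskNormalizer`, and `group_advantages` are hypothetical and are not the authors' code.

```python
import math
from dataclasses import dataclass


def trajectory_drift(logp_cur, logp_prev):
    """Hypothetical drift estimate of KL(pi_cur || pi_prev) for one rollout.

    Averages the non-negative "k3" estimator exp(r) - r - 1 over sampled
    tokens, with r = log pi_prev(y_t) - log pi_cur(y_t).
    """
    if len(logp_cur) != len(logp_prev) or not logp_cur:
        raise ValueError("need equal-length, non-empty token log-prob lists")
    total = 0.0
    for lc, lp in zip(logp_cur, logp_prev):
        r = lp - lc
        total += math.exp(r) - r - 1.0
    return total / len(logp_cur)


def retention_rewards(drifts, tau=0.1):
    """Assumed mapping: lower drift gives a reward closer to 1, range (0, 1]."""
    return [math.exp(-d / tau) for d in drifts]


@dataclass
class CrossTaskNormalizer:
    """Assumed CTAN-style normalizer: EMA of reward mean and variance that
    persists across task boundaries instead of being recomputed per group."""
    momentum: float = 0.9
    eps: float = 1e-6
    mean: float = 0.0
    var: float = 1.0
    initialized: bool = False

    def update(self, rewards):
        n = len(rewards)
        m = sum(rewards) / n
        v = sum((r - m) ** 2 for r in rewards) / n
        if not self.initialized:
            # First group seeds the running statistics directly.
            self.mean, self.var, self.initialized = m, v, True
        else:
            a = self.momentum
            self.mean = a * self.mean + (1 - a) * m
            self.var = a * self.var + (1 - a) * v

    def advantages(self, rewards):
        # Fold the current group into the running stats, then normalize.
        self.update(rewards)
        std = math.sqrt(self.var)
        return [(r - self.mean) / (std + self.eps) for r in rewards]


def group_advantages(task_rewards, drifts, normalizer, lam=0.5, tau=0.1):
    """Shaped reward = task reward + lam * retention reward, then normalized."""
    ret = retention_rewards(drifts, tau)
    shaped = [t + lam * r for t, r in zip(task_rewards, ret)]
    return normalizer.advantages(shaped)


if __name__ == "__main__":
    norm = CrossTaskNormalizer()
    # Two rollouts with the same task reward (1.0) but different drift.
    drifts = [
        trajectory_drift([-0.5, -0.2], [-0.5, -0.2]),  # no drift
        trajectory_drift([-0.5, -0.2], [-1.5, -1.2]),  # drifted rollout
    ]
    print(group_advantages([1.0, 1.0], drifts, norm))
```

In this sketch the lower-drift rollout receives the larger advantage even though both rollouts earn the same task reward. Because the normalizer carries its statistics over instead of recomputing them per group, a later group whose rewards are all equal still receives non-zero advantages relative to the running statistics, whereas per-group GRPO normalization would assign them zero.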

Meng Lou, Hanzhong Guo, Linwei Chen, Yizhou Yu · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | ImageNet-R | Last Accuracy | 85.92 | 147 |
| Class-incremental learning | ImageNet A | -- | -- | 110 |
| Object Detection | COCO 2017 | Ab | 19.31 | 16 |
| Class-incremental image classification | TinyImageNet | Last Accuracy | 62.36 | 14 |
| Class-incremental image classification | CUB-200 | Last Accuracy | 45.15 | 14 |
| Object Detection | Pascal Series 4 Domains | Ab Score | 37.18 | 8 |
| Video Classification | UCF-101 (5 Tasks) | Last Accuracy | 79.57 | 8 |
| Video Classification | Kinetics 200 (5 Tasks) | Last Accuracy | 74.18 | 8 |
| Image Classification | DomainNet (6 Domains) | Metric A | 66.27 | 8 |
| Image Classification | OfficeHome (4 Domains) | Accuracy | 93.63 | 8 |
