
Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

About

With the accumulation of resources in the era of big data and the rise of pre-trained models in deep learning, optimizing neural networks for various tasks often calls for different strategies depending on whether a pre-trained model is fine-tuned or a model is trained from scratch. However, existing optimizers primarily focus on reducing the loss function by updating model parameters, without fully addressing the distinct demands of these two paradigms. In this paper, we propose DualOpt, a novel approach that decouples optimization techniques and tailors them to each training scenario. For training from scratch, we introduce real-time layer-wise weight decay, designed to enhance both convergence and generalization by aligning with the characteristics of weight updates and network architecture. More importantly, for fine-tuning, we integrate weight rollback with the optimizer, incorporating a rollback term into each weight update step. This keeps the weight distribution of the downstream model consistent with that of the upstream model, mitigating knowledge forgetting and improving fine-tuning performance. Additionally, we extend the layer-wise weight decay to dynamically adjust the rollback level of each layer, adapting to the varying demands of different downstream tasks. Extensive experiments across diverse tasks, including image classification, object detection, semantic segmentation, and instance segmentation, demonstrate the broad applicability and state-of-the-art performance of DualOpt. Code is available at https://github.com/qklee-lz/OLOR-AAAI-2024.
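The abstract describes a rollback term added to each weight update, with a per-layer rollback level. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the polynomial schedule, the `power` parameter, and the exact form of the update (plain SGD plus a pull toward the pre-trained weights) are all assumptions made for illustration. The repository linked above should be consulted for the actual method.

```python
def layerwise_strengths(num_layers, max_strength, min_strength, power=2.0):
    """Hypothetical schedule: one rollback strength per layer.

    Assumption: the first layer (index 0) gets max_strength and the last
    layer gets min_strength, with a polynomial decay in between.
    """
    if num_layers == 1:
        return [max_strength]
    strengths = []
    for i in range(num_layers):
        frac = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        span = max_strength - min_strength
        strengths.append(min_strength + span * (1.0 - frac) ** power)
    return strengths


def sgd_step_with_rollback(weights, grads, pretrained, lr, strengths):
    """One illustrative update: gradient step plus a rollback term.

    Each argument except lr is a list with one entry per layer; weights,
    grads and pretrained hold flat lists of floats for that layer.
    Assumed update rule: w <- w - lr * g - lam * (w - w_pretrained).
    """
    new_weights = []
    for w_layer, g_layer, w0_layer, lam in zip(weights, grads, pretrained, strengths):
        new_weights.append([
            w - lr * g - lam * (w - w0)
            for w, g, w0 in zip(w_layer, g_layer, w0_layer)
        ])
    return new_weights


# Tiny example: three single-weight layers, zero gradients, so only the
# rollback term moves the weights toward their pre-trained values.
strengths = layerwise_strengths(3, max_strength=0.1, min_strength=0.0)
updated = sgd_step_with_rollback(
    weights=[[1.0], [1.0], [1.0]],
    grads=[[0.0], [0.0], [0.0]],
    pretrained=[[0.0], [0.0], [0.0]],
    lr=0.01,
    strengths=strengths,
)
```

In this sketch, passing all-zero `pretrained` weights turns the rollback term into ordinary layer-wise weight decay, which is one way to picture how a single mechanism could serve both the from-scratch and the fine-tuning setting.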

Xin Ning, Qiankun Li, Xiaolong Huang, Qiupu Chen, Feng He, Weijun Li, Prayag Tiwari, Xinwang Liu • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic Segmentation | ADE20K | mIoU 44.62 | 559 |
| Image Classification | StanfordCars | Accuracy 88.99 | 384 |
| Image Classification | CUB-200 2011 | Accuracy 89.47 | 374 |
| Object Detection | COCO 2017 | -- | 345 |
| Image Classification | ImageNet | Top-1 Accuracy 83.89 | 343 |
| Instance Segmentation | COCO 2017 | -- | 236 |
| Image Classification | SVHN | Top-1 Accuracy 97.35 | 186 |
| Image Classification | OfficeHome | Average Accuracy 92.59 | 161 |
| Image Classification | CIFAR100 | Average Accuracy 92.89 | 150 |
| Image Classification | PACS | Accuracy 96.63 | 130 |
Showing 10 of 12 rows
