SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism

About

Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard end-to-end backpropagation (BP) algorithm is a primary driver of modern AI, but it is also a source of inefficiency when training deep networks: each layer must wait for the gradients of all subsequent layers before it can update its parameters. This paper introduces a new training methodology, Supervised Contrastive Parallel Learning (SCPL), that addresses this issue by decoupling BP, transforming one long gradient flow into multiple short ones. This design enables the parameter gradients of different layers to be computed simultaneously, achieving a higher degree of model parallelism and improving training throughput. Detailed experiments demonstrate the efficiency and effectiveness of SCPL compared with BP, Early Exit, GPipe, and Associated Learning (AL), a state-of-the-art method for decoupling backpropagation. By mitigating a fundamental performance bottleneck, SCPL provides a practical pathway for organizations to develop and deploy advanced information systems more cost-effectively and with greater agility. The experimental code is released for reproducibility at https://github.com/minyaho/scpl/
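To make the decoupling idea concrete, the following is a minimal NumPy sketch (not the authors' implementation) of training two network modules with independent local losses. All names (`module_grads`, the linear local heads, the toy data) are illustrative assumptions; the point is that each module's gradient depends only on its own parameters and a detached copy of its input, so the two gradient computations could run in parallel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: two "modules", each trained with its own local
# logistic loss on a small linear head, so no gradient crosses module
# boundaries (the core idea behind decoupled backpropagation).
X = rng.normal(size=(8, 4))           # batch of inputs
y = rng.integers(0, 2, size=8)        # binary labels

W1 = rng.normal(size=(4, 4)) * 0.1    # module 1 weights
h1 = rng.normal(size=4) * 0.1         # module 1 local head
W2 = rng.normal(size=(4, 4)) * 0.1    # module 2 weights
h2 = rng.normal(size=4) * 0.1         # module 2 local head

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def module_grads(Xin, W, h, y):
    """Forward pass plus local-loss gradients for one module."""
    A = np.tanh(Xin @ W)              # module output
    p = sigmoid(A @ h)                # local head prediction
    dlogit = (p - y) / len(y)         # d(logistic loss)/d(logit)
    dh = A.T @ dlogit                 # head gradient
    dA = np.outer(dlogit, h) * (1 - A**2)
    dW = Xin.T @ dA                   # module weight gradient
    return A, dW, dh

# Module 2 receives a *detached* copy of module 1's output, so dW1 depends
# only on (X, W1, h1) and dW2 only on (A1_detached, W2, h2): the long
# gradient flow is cut into two short, independent ones.
A1, dW1, dh1 = module_grads(X, W1, h1, y)
A1_detached = A1.copy()
A2, dW2, dh2 = module_grads(A1_detached, W2, h2, y)
```

SCPL replaces the simple local heads above with supervised contrastive objectives, but the parallelism argument is the same: once the modules' inputs are detached, their gradient computations have no mutual dependencies.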

Ming-Yao Ho, Cheng-Kai Wang, You-Teng Lin, Hung-Hsuan Chen • 2026

Related benchmarks

Task                 | Dataset              | Metric   | Result | Rank
Image Classification | Tiny ImageNet (test) | Accuracy | 48.95  | 265
Text Classification  | AG News (test)       | Accuracy | 92.12  | 210
Text Classification  | IMDB (test)          | Accuracy | 89.84  | 77
