Effort-Optimized, Accuracy-Driven Labelling and Validation of Test Inputs for DL Systems: A Mixed-Integer Linear Programming Approach

About

Software systems increasingly include AI components based on deep learning (DL). Reliable testing of such systems requires near-perfect test-input validity and label accuracy, with minimal human effort. Yet the DL community has largely overlooked the need to build highly accurate datasets with minimal effort, since DL training is generally tolerant of labelling errors. This challenge instead reflects concerns more familiar to software engineering, where a central goal is to construct high-accuracy test inputs, with accuracy as close to 100% as possible, while keeping associated costs in check. In this article, we introduce OPAL, a human-assisted labelling method that can be configured to target a desired accuracy level while minimizing the manual effort required for labelling. The main contribution of OPAL is a mixed-integer linear programming (MILP) formulation that minimizes labelling effort subject to a specified accuracy target. To evaluate OPAL, we instantiate it for two tasks in the context of testing vision systems: automatic labelling of test inputs and automated validation of test inputs. Our evaluation, based on more than 2500 experiments performed on nine datasets and comparing OPAL with eight baseline methods, shows that OPAL, relying on its MILP formulation, achieves an average accuracy of 98.8% while cutting manual labelling by more than half. OPAL significantly outperforms automated labelling baselines in labelling accuracy across all nine datasets when all methods are given the same manual-labelling budget. For automated test-input validation, OPAL reduces manual effort by 28.8% on average while achieving 4.5% higher accuracy than the state-of-the-art test-input validation baselines. Finally, we show that augmenting OPAL with an active-learning loop leads to an additional 4.5% reduction in required manual labelling, without compromising accuracy.
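The abstract does not spell out OPAL's MILP, so the following is only a toy illustration of the general idea of "minimize manual effort subject to an accuracy target", not the authors' formulation. It assumes the inputs are split into hypothetical groups, each with a size and an estimated accuracy for automatic labels, that manual labels are always correct, and that each group is either fully hand-labelled or fully auto-labelled. The resulting 0-1 linear program is small enough to solve by exhaustive enumeration; the function name `min_effort_plan` and the group data are invented for this sketch.

```python
from itertools import product


def min_effort_plan(groups, target):
    """Pick which groups to label manually (toy 0-1 linear program).

    groups: list of (size, estimated_auto_label_accuracy) pairs (hypothetical data).
    target: required overall expected label accuracy, between 0 and 1.
    Returns (manual_effort, plan), where plan[i] == 1 means group i is
    labelled by hand, or None if no plan meets the target.
    """
    total = sum(size for size, _ in groups)
    best = None
    # Enumerate every 0/1 assignment; fine for a handful of groups.
    for plan in product((0, 1), repeat=len(groups)):
        # Objective: number of inputs a human has to label.
        effort = sum(size for (size, _), x in zip(groups, plan) if x)
        # Expected correct labels; manual labels are assumed perfect.
        correct = sum(
            size if x else size * acc for (size, acc), x in zip(groups, plan)
        )
        # Accuracy constraint, written linearly: correct >= target * total.
        if correct >= target * total and (best is None or effort < best[0]):
            best = (effort, plan)
    return best


# Three hypothetical confidence buckets: (size, estimated auto-label accuracy).
groups = [(500, 0.999), (300, 0.97), (200, 0.80)]
effort, plan = min_effort_plan(groups, target=0.98)
```

With these made-up numbers, hand-labelling only the least reliable bucket is enough to reach the 98% target. A real instance would replace the enumeration with a MILP solver and would likely need finer-grained decision variables than whole groups.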

Mohammad Hossein Amini, Mehrdad Sabetzadeh, Shiva Nejati • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | Fashion MNIST | Accuracy | 98.8 | 240 |
| Labelling Accuracy | CIFAR-10 | Accuracy | 98.4 | 5 |
| Labelling Accuracy | MNIST | Accuracy | 99.5 | 5 |
| Labelling Accuracy | SVHN | Accuracy | 98.7 | 5 |
| Labelling Accuracy | CelebA Hair | Accuracy | 99 | 5 |
| Labelling Accuracy | CelebA M/F | Accuracy (CelebA M/F) | 99.1 | 5 |
| Labelling Accuracy | Synthetic Pub1 | Accuracy | 97.9 | 5 |
| Labelling Accuracy | Synthetic Pub 2 | Accuracy | 99.2 | 5 |
| Labelling Accuracy | Industry | Accuracy | 98.6 | 5 |
