DeepGini: Prioritizing Massive Tests to Enhance the Robustness of Deep Neural Networks

About

Deep neural networks (DNNs) have been deployed in many software systems to assist in various classification tasks. Despite their effectiveness in classification, DNNs can also exhibit incorrect behaviors that result in accidents and losses. Testing techniques that detect incorrect DNN behaviors and improve DNN quality are therefore critical. However, the test oracle, which defines the correct output for a given input, is often unavailable in automated testing. Obtaining oracle information usually requires expensive human effort to label the test data, which significantly slows quality assurance. To mitigate this problem, we propose DeepGini, a test prioritization technique based on a statistical perspective of DNNs. This perspective allows us to reduce the problem of measuring misclassification probability to the problem of measuring set impurity, which lets us quickly identify possibly misclassified tests. To evaluate DeepGini, we conduct an extensive empirical study on popular datasets and prevalent DNN models. The experimental results demonstrate that DeepGini outperforms existing coverage-based techniques in prioritizing tests, in both effectiveness and efficiency. We also observe that the tests DeepGini ranks first are more effective at improving DNN quality than those selected by coverage-based techniques.

Yang Feng, Qingkai Shi, Xinyu Gao, Jun Wan, Chunrong Fang, Zhenyu Chen • 2019
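
The abstract describes ranking tests by set impurity rather than by coverage. A minimal sketch of that idea follows, assuming the impurity measure is the Gini impurity of a model's softmax output (as the name DeepGini suggests) and that tests with higher impurity are placed first. The function names and the sample probability vectors are hypothetical and used only for illustration; this is not the authors' implementation.

```python
def gini_impurity(probs):
    """Gini impurity of one output probability vector: 1 - sum(p_i^2).

    The value is 0 when all probability mass sits on a single class and
    grows as the mass spreads evenly across classes.
    """
    return 1.0 - sum(p * p for p in probs)


def prioritize(outputs):
    """Return test indices ordered from highest to lowest impurity.

    Assumption: a more impure output marks a test that is more likely to
    be misclassified, so it should be labeled and inspected earlier.
    """
    scores = [gini_impurity(p) for p in outputs]
    return sorted(range(len(outputs)), key=lambda i: scores[i], reverse=True)


# Hypothetical softmax outputs for three unlabeled tests (3 classes each).
outputs = [
    [0.98, 0.01, 0.01],  # confident prediction, low impurity
    [0.34, 0.33, 0.33],  # near-uniform prediction, high impurity
    [0.60, 0.30, 0.10],  # in between
]

order = prioritize(outputs)
print(order)  # [1, 2, 0]
```

Under these assumptions no labels are needed: only the model's output probabilities are used, which is why such a ranking can be computed before any human labeling takes place.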

Related benchmarks

Task                                     Dataset             Result                      Rank
Fault Detection                          CIFAR-10            FDR 92                      36
Fault Detection                          CIFAR-100           FDR 69                      36
Error detection                          ImageNet            FDR 51                      36
Debugging Effectiveness Estimation       SVHN                ATPF 81.68                  21
Debugging Effectiveness Estimation       CIFAR-10            ATPF 71.15                  21
Debugging Effectiveness Estimation       STL10               ATPF 73.34                  21
Fault-revealing probability estimation   CIFAR-10 (test)     Execution Time (s) 3.502    6
Fault-revealing probability estimation   CIFAR-100 (test)    Execution Time (s) 29.01    6
Fault-revealing probability estimation   ImageNet (test)     Execution Time (s) 105.1    6