
Benchmarking Generative AI Models for Deep Learning Test Input Generation

About

Test Input Generators (TIGs) are crucial to assess the ability of Deep Learning (DL) image classifiers to provide correct predictions for inputs beyond their training and test sets. Recent advancements in Generative AI (GenAI) models have made them a powerful tool for creating and manipulating synthetic images, though they also imply increased complexity and resource demands for training. In this work, we benchmark and combine different GenAI models with TIGs, assessing their effectiveness, efficiency, and the quality of the generated test images in terms of domain validity and label preservation. We conduct an empirical study involving three different GenAI architectures (VAEs, GANs, Diffusion Models), five classification tasks of increasing complexity, and 364 human evaluations. Our results show that simpler architectures, such as VAEs, are sufficient for less complex datasets like MNIST. However, when dealing with feature-rich datasets, such as ImageNet, more sophisticated architectures like Diffusion Models achieve superior performance by generating a higher number of valid, misclassification-inducing inputs.
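The core TIG idea described above can be sketched as a latent-space search: perturb a generative model's latent vector, decode the result into an image, and keep the inputs that the classifier under test mislabels. The snippet below is a minimal, hypothetical sketch of that loop; `decode` and `classify` are toy stand-ins for a trained GenAI decoder (VAE/GAN/diffusion) and the DL classifier under test, not the benchmark's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained generative decoder:
# maps a 2-d latent vector to a flat 4-"pixel" image.
def decode(z):
    weights = np.array([[1.0, 0.5, -0.5, 0.2],
                        [0.3, -1.0, 0.8, 0.1]])
    return np.tanh(z @ weights)

# Hypothetical stand-in for the classifier under test:
# label 1 if the pixel sum is positive, else 0.
def classify(img):
    return int(img.sum() > 0)

def generate_failing_inputs(seed_z, expected_label, n_trials=200, sigma=0.3):
    """Perturb the latent seed with Gaussian noise and collect decoded
    images that the classifier mislabels (candidate failing test inputs)."""
    failures = []
    for _ in range(n_trials):
        z = seed_z + rng.normal(0.0, sigma, size=seed_z.shape)
        img = decode(z)
        if classify(img) != expected_label:
            failures.append((z, img))
    return failures

seed = np.array([0.5, -0.2])
label = classify(decode(seed))
fails = generate_failing_inputs(seed, label)
print(f"{len(fails)} misclassification-inducing inputs out of 200 trials")
```

In the benchmarked setting, the kept inputs would additionally be filtered for domain validity and label preservation (e.g. via the human evaluations mentioned in the abstract), since a misclassified image is only a useful test input if it still belongs to the input domain and its expected label is unchanged.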

Maryam, Matteo Biagiola, Andrea Stocco, Vincenzo Riccio • 2024

Related benchmarks

Task | Dataset | Result | Rank
Binary Classification | ImageNet | Runtime (s): 205.7 | 4
Binary Classification | CelebA | Runtime (s): 1.22e+3 | 4
Targeted Failure Generation | CelebA | Misclassification Rate: 47 | 3
Image Perturbation Quality Assessment | ImageNet Human Evaluation | Ambiguity Cases: 7.5 | 3
Targeted Failure Generation | ImageNet | Misclassification Rate: 1 | 3
Targeted Failure Generation | Driving | Diversity T: 0.102 | 2
Object Detection | Driving | Runtime (s): 174.6 | 2
