Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Gen-n-Val: Agentic Image Data Generation and Validation

About

The data scarcity, label noise, and long-tailed category imbalance remain important and unresolved challenges in many computer vision tasks, such as object detection and instance segmentation, especially on large-vocabulary benchmarks like LVIS, where most categories appear in only a few images. Current synthetic data generation methods still suffer from multiple objects per mask, inaccurate segmentation, incorrect category labels, and other issues, limiting their effectiveness. To address these issues, we introduce Gen-n-Val, a novel agentic data generation framework that leverages Layer Diffusion (LD), a Large Language Model (LLM), and a Vision Large Language Model (VLLM) to produce high-quality and diverse instance masks and images for object detection and instance segmentation. Gen-n-Val consists of two agents: (1) the LD prompt agent, an LLM, optimizes rompts to encourage LD to generate high-quality foreground single-object images and corresponding segmentation masks; and (2) the data validation agent, a VLLM, filters out low-quality synthetic instance images. The system prompts for both agents are optimized by TextGrad. Compared to state-of-the-art synthetic data approaches like MosaicFusion, our approach reduces invalid synthetic data from 50% to 7% and improves performance by 7.6% on rare classes in LVIS instance segmentation with Mask R-CNN, and by 3.6% mAP on rare classes in COCO instance segmentation with YOLOv9c and YOLO11m. Furthermore, Gen-n-Val shows significant improvements (7.1% mAP) over YOLO-Worldv2-M in open-vocabulary object detection benchmarks with YOLO11m. Moreover, Gen-n-Val has scalability in model capacity and dataset size. The code is available at https://github.com/aiiu-lab/Gen-n-Val.

Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Yu-Lun Liu, Chih-Yu Wang, Jun-Cheng Chen• 2025

Related benchmarks

TaskDatasetResultRank
Object DetectionCOCO 2017 (val)--
2643
Instance SegmentationCOCO 2017 (val)--
1201
Object DetectionLVIS v1.0 (val)
APbbox51.5
529
Instance SegmentationLVIS 1.0 (val)
AP (Mask)45.9
33
Open-vocabulary object detectionCOCO 5K-image subset of LVIS 2017 (val)
mAP (Box)49.8
2
Showing 5 of 5 rows

Other info

Follow for update