
Positive-First Most Ambiguous: A Simple Active Learning Criterion for Interactive Retrieval of Rare Categories

About

Real-world fine-grained visual retrieval often requires discovering a rare concept from large unlabeled collections with minimal supervision. This is especially critical in biodiversity monitoring, ecological studies, and long-tailed visual domains, where the target may represent only a tiny fraction of the data, creating highly imbalanced binary problems. Interactive retrieval with relevance feedback offers a practical solution: starting from a small query, the system selects candidates for binary user annotation and iteratively refines a lightweight classifier. While Active Learning (AL) is commonly used to guide selection, conventional AL assumes symmetric class priors and large annotation budgets, limiting effectiveness in imbalanced, low-budget, low-latency settings. We introduce Positive-First Most Ambiguous (PF-MA), a simple yet effective AL criterion that explicitly addresses the class imbalance asymmetry: it prioritizes near-boundary samples while favoring likely positives, enabling rapid discovery of subtle visual categories while maintaining informativeness. Unlike standard methods that oversample negatives, PF-MA consistently returns small batches with a high proportion of relevant samples, improving early retrieval and user satisfaction. To capture retrieval diversity, we also propose a class coverage metric that measures how well selected positives span the visual variability of the target class. Experiments on long-tailed datasets, including fine-grained botanical data, demonstrate that PF-MA consistently outperforms strong baselines in both coverage and classifier performance, across varying class sizes and descriptors. Our results highlight that aligning AL with the asymmetric and user-centric objectives of interactive fine-grained retrieval enables simple yet powerful solutions for retrieving rare and visually subtle categories in realistic human-in-the-loop settings.

Kawtar Zaher, Olivier Buisson, Alexis Joly • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | PlantNet-300K | F1 Score: 50.7 | 48 |
| Image Classification | CIFAR100 LT | F1 Score: 86.1 | 48 |
| Image Classification | ImageNet LT | F1 Score: 73 | 48 |
| Active Learning | CIFAR100 LT | Coverage@5: 56 | 16 |
| Active Learning | ImageNet LT | Coverage@5: 49.3 | 16 |
| Active Learning | PlantNet300k | Coverage@5: 29.8 | 16 |
| Interactive Retrieval | CIFAR100 LT (iteration 25) | Coverage@25: 95.4 | 16 |
| Interactive Retrieval | ImageNet-LT (iteration 25) | Coverage@25: 86.1 | 16 |
| Interactive Retrieval | PlantNet300K (iteration 25) | Coverage@25: 68.4 | 16 |
