Robust Multi-Species Agricultural Segmentation Across Devices, Seasons, and Sensors Using Hierarchical DINOv2 Models
About
Reliable plant species and damage segmentation for herbicide field research trials requires models that can withstand substantial real-world variation across seasons, geographies, devices, and sensing modalities. Most deep learning approaches trained on controlled datasets fail to generalize under these domain shifts, limiting their suitability for operational phenotyping pipelines. This study evaluates a segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference to improve robustness across heterogeneous agricultural conditions. We train on a large, multi-year dataset collected in Germany and Spain (2018-2020), comprising 14 plant species and 4 herbicide damage classes, and assess generalization under increasingly challenging shifts: temporal and device changes (2023), geographic transfer to the United States, and extreme sensor shift to drone imagery (2024). Results show that the foundation-model backbone consistently outperforms prior baselines, improving species-level F1 from 0.52 to 0.87 on in-distribution data and maintaining significant advantages under moderate (0.77 vs. 0.24) and extreme (0.44 vs. 0.14) shift conditions. Hierarchical inference provides an additional layer of robustness, enabling meaningful predictions even when fine-grained species classification degrades (family F1: 0.68, class F1: 0.88 on aerial imagery). Error analysis reveals that failures under severe shift stem primarily from vegetation-soil confusion, suggesting that taxonomic distinctions remain preserved despite background and viewpoint variability. The system is now deployed within BASF's phenotyping workflow for herbicide research trials across multiple regions, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.
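The hierarchical inference described above can be sketched as a simple back-off scheme: when the species-level prediction is uncertain, per-species probabilities are aggregated up the taxonomy to the family and then class level. The sketch below is illustrative only, assuming a toy taxonomy and a hypothetical confidence threshold; the paper's actual label set, aggregation rule, and threshold may differ.

```python
# Hedged sketch of hierarchical taxonomic inference: if the fine-grained
# (species-level) prediction is uncertain, back off to coarser taxonomic
# ranks (family, then class) by summing per-species probabilities.
# The taxonomy and threshold below are illustrative assumptions.

from collections import defaultdict

# Hypothetical taxonomy: species -> (family, class)
TAXONOMY = {
    "Chenopodium album": ("Amaranthaceae", "dicot"),
    "Amaranthus retroflexus": ("Amaranthaceae", "dicot"),
    "Echinochloa crus-galli": ("Poaceae", "monocot"),
    "Alopecurus myosuroides": ("Poaceae", "monocot"),
}

def hierarchical_predict(species_probs, threshold=0.6):
    """Return the most specific prediction whose aggregated
    probability clears `threshold` (an assumed hyperparameter)."""
    # Species level: take the argmax directly.
    species, p = max(species_probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return ("species", species, p)
    # Coarser levels: sum probability mass within each family / class.
    family_probs = defaultdict(float)
    class_probs = defaultdict(float)
    for sp, prob in species_probs.items():
        fam, cls = TAXONOMY[sp]
        family_probs[fam] += prob
        class_probs[cls] += prob
    family, p = max(family_probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return ("family", family, p)
    # Last resort: class level (e.g. monocot vs. dicot).
    cls, p = max(class_probs.items(), key=lambda kv: kv[1])
    return ("class", cls, p)

# A confident species prediction stays at species rank...
print(hierarchical_predict({"Chenopodium album": 0.8,
                            "Amaranthus retroflexus": 0.1,
                            "Echinochloa crus-galli": 0.05,
                            "Alopecurus myosuroides": 0.05}))
# ...while probability spread across one family backs off to family rank.
print(hierarchical_predict({"Chenopodium album": 0.35,
                            "Amaranthus retroflexus": 0.35,
                            "Echinochloa crus-galli": 0.2,
                            "Alopecurus myosuroides": 0.1}))
```

This kind of back-off is consistent with the reported behavior under extreme shift, where species-level F1 drops but family- and class-level predictions remain usable.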
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Species Identification | DRONE | Balanced Accuracy | 0.935 | 12 |
| Species Identification | BASE dataset 2019A2 (test) | Balanced Accuracy | 97.7 | 12 |
| Species Identification | REALITY | Balanced Accuracy | 96 | 12 |
| Taxonomic Classification | DRONE dataset excluding misc class | Balanced Accuracy | 99.1 | 10 |
| Damage Assessment (Damage Classes) | BASE dataset 2019A2 (test) | Balanced Accuracy | 92.2 | 2 |
| Damage Assessment (Healthy/Damaged) | BASE dataset 2019A2 (test) | Balanced Accuracy | 94.3 | 2 |