Robust Multi-Species Agricultural Segmentation Across Devices, Seasons, and Sensors Using Hierarchical DINOv2 Models
About
Reliable plant species and damage segmentation for herbicide field research trials requires models that can withstand substantial real-world variation across seasons, geographies, devices, and sensing modalities. Most deep learning approaches trained on controlled datasets fail to generalize under these domain shifts, limiting their suitability for operational phenotyping pipelines. This study evaluates a segmentation framework that integrates vision foundation models (DINOv2) with hierarchical taxonomic inference to improve robustness across heterogeneous agricultural conditions. We train on a large, multi-year dataset collected in Germany and Spain (2018-2020), comprising 14 plant species and 4 herbicide damage classes, and assess generalization under increasingly challenging shifts: temporal and device changes (2023), geographic transfer to the United States, and extreme sensor shift to drone imagery (2024). Results show that the foundation-model backbone consistently outperforms prior baselines, improving species-level F1 from 0.52 to 0.87 on in-distribution data and maintaining significant advantages under moderate (0.77 vs. 0.24) and extreme (0.44 vs. 0.14) shift conditions. Hierarchical inference provides an additional layer of robustness, enabling meaningful predictions even when fine-grained species classification degrades (family F1: 0.68, class F1: 0.88 on aerial imagery). Error analysis reveals that failures under severe shift stem primarily from vegetation-soil confusion, suggesting that taxonomic distinctions remain preserved despite background and viewpoint variability. The system is now deployed within BASF's phenotyping workflow for herbicide research trials across multiple regions, illustrating the practical viability of combining foundation models with structured biological hierarchies for scalable, shift-resilient agricultural monitoring.
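The hierarchical inference described above can be sketched as a simple back-off scheme: when the species-level prediction is uncertain, per-species probabilities are aggregated up the taxonomy to the family and then class level. The sketch below is illustrative only, assuming a toy taxonomy and a hypothetical confidence threshold; the paper's actual label set, aggregation rule, and threshold may differ.

```python
# Hedged sketch of hierarchical taxonomic inference: if the fine-grained
# (species-level) prediction is uncertain, back off to coarser taxonomic
# ranks (family, then class) by summing per-species probabilities.
# The taxonomy and threshold below are illustrative assumptions.

from collections import defaultdict

# Hypothetical taxonomy: species -> (family, class)
TAXONOMY = {
    "Chenopodium album": ("Amaranthaceae", "dicot"),
    "Amaranthus retroflexus": ("Amaranthaceae", "dicot"),
    "Echinochloa crus-galli": ("Poaceae", "monocot"),
    "Alopecurus myosuroides": ("Poaceae", "monocot"),
}

def hierarchical_predict(species_probs, threshold=0.6):
    """Return the most specific prediction whose aggregated
    probability clears `threshold` (an assumed hyperparameter)."""
    # Species level: take the argmax directly.
    species, p = max(species_probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return ("species", species, p)
    # Coarser levels: sum probability mass within each family / class.
    family_probs = defaultdict(float)
    class_probs = defaultdict(float)
    for sp, prob in species_probs.items():
        fam, cls = TAXONOMY[sp]
        family_probs[fam] += prob
        class_probs[cls] += prob
    family, p = max(family_probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return ("family", family, p)
    # Last resort: class level (e.g. monocot vs. dicot).
    cls, p = max(class_probs.items(), key=lambda kv: kv[1])
    return ("class", cls, p)

# A confident species prediction stays at species rank...
print(hierarchical_predict({"Chenopodium album": 0.8,
                            "Amaranthus retroflexus": 0.1,
                            "Echinochloa crus-galli": 0.05,
                            "Alopecurus myosuroides": 0.05}))
# ...while probability spread across one family backs off to family rank.
print(hierarchical_predict({"Chenopodium album": 0.35,
                            "Amaranthus retroflexus": 0.35,
                            "Echinochloa crus-galli": 0.2,
                            "Alopecurus myosuroides": 0.1}))
```

This kind of back-off is consistent with the reported behavior under extreme shift, where species-level F1 drops but family- and class-level predictions remain usable.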
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Species Identification | DRONE | Balanced Accuracy | 0.935 | 12 |
| Species Identification | BASE dataset 2019A2 (test) | Balanced Accuracy | 97.7 | 12 |
| Species Identification | REALITY | Balanced Accuracy | 96 | 12 |
| Taxonomic Classification | DRONE dataset excluding misc class | Balanced Accuracy | 99.1 | 10 |
| Damage Assessment (Damage Classes) | BASE dataset 2019A2 (test) | Balanced Accuracy | 92.2 | 2 |
| Damage Assessment (Healthy/Damaged) | BASE dataset 2019A2 (test) | Balanced Accuracy | 94.3 | 2 |