Domain-Specific Self-Supervised Pre-training for Agricultural Disease Classification: A Hierarchical Vision Transformer Study

About

We investigate the impact of domain-specific self-supervised pre-training on agricultural disease classification using hierarchical vision transformers. Our key finding is that SimCLR pre-training on just 3,000 unlabeled agricultural images provides a +4.57% accuracy improvement, exceeding the +3.70% gain from hierarchical architecture design. Critically, we show that this SSL benefit is architecture-agnostic: applying the same pre-training to Swin-Base yields +4.08% and to ViT-Base +4.20%, confirming that practitioners should prioritize domain data collection over architectural choices. Using HierarchicalViT (HVT), a Swin-style hierarchical transformer, we evaluate on three datasets: Cotton Leaf Disease (7 classes, 90.24% accuracy), PlantVillage (38 classes, 96.3%), and PlantDoc (27 classes, 87.1%). At comparable parameter counts, HVT-Base (78M) achieves 88.91% vs. 87.23% for Swin-Base (88M), a +1.68% improvement. For deployment reliability, we report a calibration analysis showing that HVT achieves an expected calibration error (ECE) of 3.56% (1.52% after temperature scaling). Code: https://github.com/w2sg-arnav/HierarchicalViT

Arnav S. Sonavane • 2026
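
The calibration figures quoted above (ECE before and after temperature scaling) can be illustrated with a minimal sketch. This is not the author's implementation: the function names are hypothetical, the 10 equal-width confidence bins are an assumed default, and the temperature is chosen here by a simple grid search over validation negative log-likelihood rather than by whatever optimizer the paper used.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(probs, labels, n_bins=10):
    """Sample-weighted mean of |accuracy - confidence| over confidence bins.

    Assumption: equal-width bins [k/n_bins, (k+1)/n_bins); a confidence of
    exactly 1.0 is placed in the last bin.
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, confidence sum, correct
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(pred == y)
    n = len(labels)
    ece = 0.0
    for count, conf_sum, correct in bins:
        if count:
            ece += (count / n) * abs(correct / count - conf_sum / count)
    return ece

def nll(logits_list, labels, temperature):
    """Mean negative log-likelihood at a given temperature (log-sum-exp form)."""
    total = 0.0
    for logits, y in zip(logits_list, labels):
        scaled = [z / temperature for z in logits]
        m = max(scaled)
        lse = m + math.log(sum(math.exp(z - m) for z in scaled))
        total += lse - scaled[y]
    return total / len(labels)

def fit_temperature(logits_list, labels, candidates=None):
    """Pick the temperature with the lowest NLL from a grid (assumed 0.5 to 5.0)."""
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(91)]
    return min(candidates, key=lambda t: nll(logits_list, labels, t))

# Toy, made-up validation logits: the model is very confident but only 75% correct.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 0, 0]

ece_before = expected_calibration_error([softmax(z) for z in val_logits], val_labels)
T = fit_temperature(val_logits, val_labels)
ece_after = expected_calibration_error([softmax(z, T) for z in val_logits], val_labels)
print(f"T={T:.2f}  ECE before={ece_before:.4f}  ECE after={ece_after:.4f}")
```

Temperature scaling divides every logit by the same scalar, so it leaves the predicted class (and therefore accuracy) unchanged and only adjusts confidence. In practice the temperature would be fitted on a held-out validation split and ECE reported on the test split.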

Related benchmarks

Task                          Dataset                              Result            Rank
Image Classification          ImageNet-C (val)                     mCE 81.6          97
Leaf Disease Classification   PlantVillage                         Accuracy 96.3     35
Leaf Disease Classification   PlantDoc                             Accuracy 87.1     18
Image Classification          Cotton Leaf Disease 1.0 (test)       F1 Score 0.89     9
Image Classification          Cotton Leaf Disease Dataset (test)   Accuracy 90.24    7
Leaf Disease Classification   Cotton 7 cls (test)                  Accuracy 90.24    5