AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network

About

Adapting vision-language models to remote sensing imagery remains challenging due to two key factors: limited semantic coverage in textual representations and insufficient adaptability of visual features. These issues are particularly significant in aerial scenes, which involve various visual appearances and fine-grained object distinctions. We propose AVION, a knowledge distillation framework tailored for remote sensing adaptation of vision-language models. The teacher module constructs semantically rich textual prototypes by collecting descriptions from a large language model and verifying validity using remote sensing image features. The student module integrates lightweight and learnable prompts into both vision and language encoders, guided by the teacher to align embeddings and their cross-modal relationships. Once trained, the student operates independently during inference. Experiments on six optical remote sensing benchmarks show that AVION improves few-shot classification and base-class accuracy without degrading generalization to novel categories. It also enhances mean recall for cross-modal retrieval, with minimal additional trainable parameters.
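The abstract says the student is guided by the teacher to align both embeddings and their cross-modal relationships, but it does not give the loss itself. The following is only a minimal sketch of what such a two-part objective could look like, not the authors' formulation. It assumes a cosine-distance term for embedding alignment and a KL-divergence term over image-to-text similarity distributions for relational alignment. All function names, the temperature, and the weights `alpha` and `beta` are hypothetical, and teacher and student embeddings are assumed to share the same dimension.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def embedding_loss(student, teacher):
    """Assumed alignment term: mean (1 - cosine) over matching embedding pairs."""
    return sum(1.0 - cosine(s, t) for s, t in zip(student, teacher)) / len(student)

def relation_loss(s_img, s_txt, t_img, t_txt, temperature=0.07):
    """Assumed relational term: mean KL(teacher || student) between the
    per-image similarity distributions over the text prototypes."""
    total = 0.0
    for s_i, t_i in zip(s_img, t_img):
        p_teacher = softmax([cosine(t_i, t) / temperature for t in t_txt])
        p_student = softmax([cosine(s_i, s) / temperature for s in s_txt])
        total += sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student))
    return total / len(s_img)

def distillation_loss(s_img, s_txt, t_img, t_txt, alpha=1.0, beta=1.0):
    """Weighted sum of the two assumed terms (alpha and beta are hypothetical)."""
    emb = embedding_loss(s_img, t_img) + embedding_loss(s_txt, t_txt)
    rel = relation_loss(s_img, s_txt, t_img, t_txt)
    return alpha * emb + beta * rel

# Toy example: two image embeddings and two text prototypes in 2-D.
teacher_img = [[1.0, 0.0], [0.0, 1.0]]
teacher_txt = [[1.0, 0.1], [0.1, 1.0]]
swapped_img = [[0.0, 1.0], [1.0, 0.0]]  # a student that confuses the two images

loss_matched = distillation_loss(teacher_img, teacher_txt, teacher_img, teacher_txt)
loss_swapped = distillation_loss(swapped_img, teacher_txt, teacher_img, teacher_txt)
```

In this toy setup a student that reproduces the teacher exactly incurs (numerically) zero loss, while a student that swaps the two images is penalized by both terms. Because the teacher appears only inside the loss, it can be dropped after training, which is consistent with the abstract's statement that the student operates independently at inference.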

Yu Hu, Jianyang Gu, Hao Liu, Yue Cao, Jozsef Hamari, Zheng Liu, Mohsen Zardadi • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | RESISC45 | Accuracy | 76.13 | 349 |
| Image-Text Retrieval | RSICD | Mean Recall | 39.8 | 119 |
| Image Classification | AID | Accuracy | 77.99 | 45 |
| Image Classification | PatternNet | Accuracy | 92.09 | 34 |
| Image-to-Text Retrieval | RSITMD | Rank | 52.92 | 24 |
| Text-to-Image Retrieval | RSITMD | mR | 52.92 | 24 |
| Base-to-novel generalization | 6 remote sensing datasets average | Base Score | 95.64 | 8 |
| Classification | 6 Remote Sensing Datasets 2-shot | Average Accuracy (2-shot) | 81.86 | 7 |
| Classification | 6 Remote Sensing Datasets 4-shot | Average Accuracy | 88.31 | 7 |
| Classification | 6 Remote Sensing Datasets 8-shot | Average Accuracy (8-shot) | 91.85 | 7 |
Showing 10 of 14 rows
