SOTA: Self-adaptive Optimal Transport for Zero-Shot Classification with Multiple Foundation Models

About

Foundation models have attracted widespread attention across domains due to their powerful zero-shot classification capabilities. This work is motivated by two key observations: (1) Vision-Language Models (VLMs), such as CLIP, often over-rely on class-level textual priors and struggle to capture fine-grained visual cues, whereas Vision-only Foundation Models (VFMs), such as DINO, provide rich and discriminative visual features but lack semantic alignment; (2) the performance of different VLMs varies considerably across datasets owing to differences in pre-training. To address these challenges, we propose SOTA (Self-adaptive Optimal TrAnsport), a training-free ensemble framework that integrates the outputs of multiple foundation models (VFMs or VLMs) by learning a self-adaptive transport plan. Notably, SOTA is prior-free and automatically balances model contributions. Extensive experiments across diverse domains, including natural images, medical pathology, and remote sensing, validate the generalizability of SOTA. The results consistently show that it effectively leverages the complementary strengths of different foundation models and achieves substantial improvements over individual models. The implementation code is available at: https://github.com/Afleve/self-adaptive-Optimal-Transport.
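To make the idea of a training-free, optimal-transport ensemble more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm (see the linked repository for that). It is a generic entropic optimal transport (Sinkhorn) ensemble written under several assumptions: each model supplies an (N, C) matrix of class probabilities, classes are assumed to be roughly balanced (a prior the paper says SOTA does not need), and model weights are re-estimated from how well each model agrees with the current transport plan. The names `sinkhorn` and `ot_ensemble`, the parameters `eps` and `n_rounds`, and the reweighting rule are all hypothetical.

```python
import numpy as np


def sinkhorn(cost, row_marginal, col_marginal, eps=0.1, n_iters=200):
    """Entropic-regularised OT via Sinkhorn scaling (generic, not paper-specific)."""
    # Shifting each row by its minimum leaves the plan unchanged but avoids underflow.
    kernel = np.exp(-(cost - cost.min(axis=1, keepdims=True)) / eps)
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (kernel.T @ u)
        u = row_marginal / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def ot_ensemble(probs, eps=0.1, n_rounds=5):
    """Combine per-model class probabilities with an OT plan.

    probs: list of (N, C) arrays, one per model (assumed input format).
    Returns predicted labels, final model weights, and the transport plan.
    """
    probs = [np.clip(p, 1e-8, 1.0) for p in probs]
    n, c = probs[0].shape
    weights = np.full(len(probs), 1.0 / len(probs))
    rows = np.full(n, 1.0 / n)
    cols = np.full(c, 1.0 / c)  # ASSUMPTION: balanced classes
    for _ in range(n_rounds):
        # Cost of assigning sample i to class j: weighted negative log-probability.
        cost = -sum(w * np.log(p) for w, p in zip(weights, probs))
        plan = sinkhorn(cost, rows, cols, eps)
        # Hypothetical reweighting: models that agree with the plan gain weight.
        scores = np.array([(plan * np.log(p)).sum() for p in probs])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
    return plan.argmax(axis=1), weights, plan
```

Calling `ot_ensemble([probs_a, probs_b])` with two probability matrices of the same shape returns one label per sample plus the learned weights. The balanced-class column marginal is the main simplification here; a prior-free method would have to replace it.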

Zhanxuan Hu, Qiyu Xu, Yu Duan, Yonghang Tai, Huafeng Li • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	DTD	Accuracy 57.5	599
Image Classification	EuroSAT	Accuracy 71.7	569
Image Classification	Flowers102	Accuracy 85.1	558
Image Classification	RESISC45	Accuracy 88.8	472
Image Classification	Food101	Accuracy 90.2	457
Image Classification	SUN397	Accuracy 73	450
Image Classification	StanfordCars	Accuracy 78.8	384
Image Classification	Aircraft	Accuracy 31.8	340
Image Classification	Pets	Accuracy 95.4	308
Image Classification	Caltech101	Accuracy 96.9	228

Showing 10 of 27 rows
