Cross-Modal Fine-Tuning: Align then Refine
About
Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a series of ablation studies and demonstrate ORCA's utility in data-limited regimes.
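The align-then-refine workflow above can be sketched in a toy form. The snippet below is a minimal illustration, not ORCA's implementation: a simple moment-matching affine map stands in for the learned embedding network that minimizes a distance between the embedded target distribution and the pretraining feature distribution, and ridge regression stands in for fine-tuning the pretrained model on the embedded data. All array names (`X_target`, `F_reference`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: target-modality inputs (e.g. PDE snapshots) and
# reference features drawn from the pretraining modality.
X_target = rng.normal(loc=3.0, scale=2.0, size=(256, 8))
F_reference = rng.normal(loc=0.0, scale=1.0, size=(512, 8))
y_target = X_target @ rng.normal(size=8) + 0.1 * rng.normal(size=256)

# --- Align: fit an affine embedder that matches the first two moments of the
# reference distribution (a crude proxy for ORCA's learned distribution
# alignment between embedded target data and the pretraining modality).
mu_t, sd_t = X_target.mean(axis=0), X_target.std(axis=0)
mu_r, sd_r = F_reference.mean(axis=0), F_reference.std(axis=0)

def embed(x):
    """Whiten target features, then re-color to the reference statistics."""
    return (x - mu_t) / sd_t * sd_r + mu_r

Z = embed(X_target)

# --- Refine: fit a task head on the embedded data (ridge regression stands in
# for fine-tuning the pretrained model body and task head).
lam = 1e-3
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y_target)
pred = Z @ w

print("embedded mean gap:", np.abs(Z.mean(axis=0) - mu_r).max())
```

The two stages are deliberately decoupled, mirroring the paper's workflow: alignment only touches the embedder, and refinement then exploits the (here, simulated) pretrained knowledge on distributionally matched inputs.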
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| PDE solving | CodePDE | Advection Error | 0.98 | 72 |
| PDE solving | PDEBench Diff.Reac 1D (test) | nRMSE | 0.0032 | 13 |
| Cross-modal adaptation | NAS-Bench-360 | Darcy (Relative L2) | 0.0075 | 9 |
| Diverse Prediction Tasks | NAS-Bench-360 (test) | Darcy Score | 0.0075 | 9 |
| PDE solving | PDEBench Advection (test) | nRMSE | 0.0098 | 9 |
| PDE solving | PDEBench Diff.Sorp (test) | nRMSE | 0.0018 | 9 |
| PDE solving | PDEBench Darcy (test) | nRMSE | 0.081 | 8 |
| Darcy | PDEBench | nRMSE | 0.081 | 5 |
| Diff.Reac (2D) | PDEBench | nRMSE | 0.82 | 5 |
| Navier-Stokes | PDEBench | nRMSE | 0.066 | 5 |