Cross-Modal Fine-Tuning: Align then Refine
About
Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a series of ablation studies and demonstrate ORCA's utility in data-limited regimes.
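The align-then-refine workflow above can be sketched in a toy form. The snippet below is a minimal illustration, not ORCA's implementation: a simple moment-matching affine map stands in for the learned embedding network that minimizes a distance between the embedded target distribution and the pretraining feature distribution, and ridge regression stands in for fine-tuning the pretrained model on the embedded data. All array names (`X_target`, `F_reference`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: target-modality inputs (e.g. PDE snapshots) and
# reference features drawn from the pretraining modality.
X_target = rng.normal(loc=3.0, scale=2.0, size=(256, 8))
F_reference = rng.normal(loc=0.0, scale=1.0, size=(512, 8))
y_target = X_target @ rng.normal(size=8) + 0.1 * rng.normal(size=256)

# --- Align: fit an affine embedder that matches the first two moments of the
# reference distribution (a crude proxy for ORCA's learned distribution
# alignment between embedded target data and the pretraining modality).
mu_t, sd_t = X_target.mean(axis=0), X_target.std(axis=0)
mu_r, sd_r = F_reference.mean(axis=0), F_reference.std(axis=0)

def embed(x):
    """Whiten target features, then re-color to the reference statistics."""
    return (x - mu_t) / sd_t * sd_r + mu_r

Z = embed(X_target)

# --- Refine: fit a task head on the embedded data (ridge regression stands in
# for fine-tuning the pretrained model body and task head).
lam = 1e-3
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y_target)
pred = Z @ w

print("embedded mean gap:", np.abs(Z.mean(axis=0) - mu_r).max())
```

The two stages are deliberately decoupled, mirroring the paper's workflow: alignment only touches the embedder, and refinement then exploits the (here, simulated) pretrained knowledge on distributionally matched inputs.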
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| PDE solving | CodePDE | Advection Error | 0.98 | 72 |
| PDE solving | PDEBench Diff.Reac 1D (test) | nRMSE | 0.0032 | 13 |
| Cross-modal adaptation | NAS-Bench-360 | Darcy (Relative L2) | 0.0075 | 9 |
| Diverse Prediction Tasks | NAS-Bench-360 (test) | Darcy Score | 0.0075 | 9 |
| PDE solving | PDEBench Advection (test) | nRMSE | 0.0098 | 9 |
| PDE solving | PDEBench Diff.Sorp (test) | nRMSE | 0.0018 | 9 |
| PDE solving | PDEBench Darcy (test) | nRMSE | 0.081 | 8 |
| Darcy | PDEBench | nRMSE | 0.081 | 5 |
| Diff.Reac (2D) | PDEBench | nRMSE | 0.82 | 5 |
| Navier-Stokes | PDEBench | nRMSE | 0.066 | 5 |