MURAL: Multimodal, Multitask Retrieval Across Languages
About
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al., PMLR'21), a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-base improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.
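As a rough illustration of the two-task setup described above, the sketch below combines an in-batch bidirectional contrastive loss over two sources of positives: image-caption pairs and translation pairs, with the translation pairs sharing the multilingual text encoder. This is a minimal sketch under stated assumptions: the PyTorch framing, the encoder interfaces (`image_enc`, `text_enc`), the temperature, and the equal task weighting are illustrative stand-ins, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    """In-batch bidirectional softmax contrastive loss.

    a, b: L2-normalized embeddings of shape (batch, dim) for paired
    items; the i-th row of `a` matches the i-th row of `b`.
    """
    logits = a @ b.t() / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetrize over both retrieval directions (a->b and b->a).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def mural_step(image_enc, text_enc, images, captions,
               src_texts, tgt_texts, w_i2t=1.0, w_t2t=1.0):
    """One multitask step: image-text matching + translation-pair matching.

    `image_enc` and `text_enc` are hypothetical stand-ins for the image
    tower and the shared multilingual text tower of a MURAL-style model.
    """
    img = F.normalize(image_enc(images), dim=-1)
    cap = F.normalize(text_enc(captions), dim=-1)
    src = F.normalize(text_enc(src_texts), dim=-1)
    tgt = F.normalize(text_enc(tgt_texts), dim=-1)
    # Weighted sum of the two task losses; the text tower receives
    # gradients from both tasks.
    return w_i2t * contrastive_loss(img, cap) + w_t2t * contrastive_loss(src, tgt)
```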
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Retrieval | Multi30K (test) | Recall (EN) | 93.8 | 35 |
| Image-Text Retrieval | MSCOCO (test) | EN Retrieval Score | 92.3 | 28 |
| Image-Text Retrieval | Flickr30k (test) | -- | -- | 21 |
| Cross-Modal Retrieval | MSCOCO (1K) | Mean Recall (ja) | 91.6 | 16 |
| Cross-Modal Retrieval | MSCOCO (5K) | Mean Recall (ja) | 81.3 | 12 |
| Image-to-Image Retrieval | Crisscrossed Captions (CxC) | R@1 | 50.3 | 10 |
| Semantic Similarity | Crisscrossed Captions (CxC) | Mean Average | 74.1 | 10 |
| Text-to-Text Retrieval | Crisscrossed Captions (CxC) | R@1 | 57.8 | 10 |
| Image-to-Text Retrieval | Crisscrossed Captions (CxC) | R@1 | 46.5 | 10 |
| Text-to-Image Retrieval | XTD (test) | -- | -- | 9 |
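The mean recall figures above, like the "zero-shot mean recall" gains quoted in the abstract, follow the common cross-modal retrieval convention: the average of Recall@1, Recall@5, and Recall@10 over both the image-to-text and text-to-image directions. A minimal sketch of that convention, assuming a square similarity matrix whose diagonal holds the correct pairs; this reflects the usual definition, not necessarily the leaderboard's exact computation:

```python
import numpy as np

def recall_at_k(sim, k):
    """Fraction of queries whose true match (the diagonal entry)
    appears among the top-k ranked candidates.

    sim: (num_queries, num_candidates) similarity matrix where
    sim[i, i] is the score of the correct pair for query i.
    """
    ranks = (-sim).argsort(axis=1)  # candidate indices, best first
    hits = (ranks[:, :k] == np.arange(len(sim))[:, None]).any(axis=1)
    return hits.mean()

def mean_recall(sim):
    """Average R@{1,5,10} over image->text (sim) and text->image (sim.T)."""
    return np.mean([recall_at_k(s, k) for s in (sim, sim.T) for k in (1, 5, 10)])
```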