
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

About

We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80% and 89% of the corresponding Flamingo models' performance. This technical report describes our models, training data, hyperparameters, and evaluation suite. We share our models and code at https://github.com/mlfoundations/open_flamingo.
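The "80% to 89% of Flamingo performance" figure is a relative score. One common way to compute such a number, assumed here rather than taken from the report, is to divide each OpenFlamingo score by the matching Flamingo score on the same dataset and average those ratios. The sketch below shows that arithmetic; the function name and the dataset names and scores in the usage example are hypothetical toy values, not results from the paper.

```python
from statistics import mean


def relative_performance(open_scores: dict[str, float],
                         ref_scores: dict[str, float]) -> float:
    """Return the mean per-dataset ratio open/ref, expressed as a percentage.

    Assumption: relative performance is the unweighted mean of per-dataset
    ratios, computed only over datasets that appear in both dicts.
    """
    shared = sorted(open_scores.keys() & ref_scores.keys())
    if not shared:
        raise ValueError("no datasets in common")
    # Ratio of the open model's score to the reference model's score, per dataset.
    ratios = [open_scores[d] / ref_scores[d] for d in shared]
    return 100.0 * mean(ratios)


# Hypothetical toy scores, for illustration only.
open_model = {"dataset_a": 30.0, "dataset_b": 40.0}
reference = {"dataset_a": 40.0, "dataset_b": 40.0}
print(relative_performance(open_model, reference))
```

Averaging per-dataset ratios (rather than comparing summed raw scores) keeps datasets with large absolute scores from dominating the average; the report may weight or aggregate differently.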

Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt (2023)

Related benchmarks

Task | Dataset | Metric | Result | Rank
Visual Question Answering | VQA v2 | Accuracy | 54.8 | 1165
Visual Question Answering | TextVQA | Accuracy | 54.7 | 1117
Visual Question Answering | VizWiz | Accuracy | 44 | 1043
Visual Question Answering | VQA v2 (test-dev) | Overall Accuracy | 54.8 | 664
Multimodal Understanding | MM-Vet | MM-Vet Score | 21.8 | 418
Multimodal Understanding | MMBench | Accuracy | 6.6 | 367
Visual Question Answering | TextVQA (val) | VQA Score | 2.83e+3 | 309
Visual Question Answering | OKVQA | Top-1 Accuracy | 37.8 | 283
Multimodal Reasoning | MM-Vet | MM-Vet Score | 24.8 | 281
Video Understanding | MVBench | Accuracy | 7.9 | 247
Showing 10 of 90 rows
