Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Large Language Models can Share Images, Too!

About

This paper explores the image-sharing capability of Large Language Models (LLMs), such as GPT-4 and LLaMA 2, in a zero-shot setting. To facilitate a comprehensive evaluation of LLMs, we introduce the PhotoChat++ dataset, which includes enriched annotations (i.e., intent, triggering sentence, image description, and salient information). Furthermore, we present the gradient-free and extensible Decide, Describe, and Retrieve (DribeR) framework. With extensive experiments, we unlock the image-sharing capability of DribeR equipped with LLMs in zero-shot prompting, with ChatGPT achieving the best performance. Our findings also reveal the emergent image-sharing ability in LLMs under zero-shot conditions, validating the effectiveness of DribeR. We use this framework to demonstrate its practicality and effectiveness in two real-world scenarios: (1) human-bot interaction and (2) dataset augmentation. To the best of our knowledge, this is the first study to assess the image-sharing ability of various LLMs in a zero-shot setting. We make our source code and dataset publicly available at https://github.com/passing2961/DribeR.

Young-Jun Lee, Dokyong Lee, Joo Won Sung, Jonghwan Hyeon, Ho-Jin Choi• 2023

Related benchmarks

TaskDatasetResultRank
Multi-modal dialogue retrievalPhotoChat (test)
R@128.3
29
Intent PredictionPhotoChat (test)
F1 Score65.6
26
Showing 2 of 2 rows

Other info

Code

Follow for update