Prompt Refinement with Image Pivot for Text-to-Image Generation

About

For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement model. Inspired by zero-shot machine translation techniques, we introduce Prompt Refinement with Image Pivot (PRIP). PRIP innovatively uses the latent representation of a user-preferred image as an intermediary "pivot" between the user and system languages. It decomposes the refinement process into two data-rich tasks: inferring representations of user-preferred images from user languages and subsequently translating image representations into system languages. Thus, it can leverage abundant data for training. Extensive experiments show that PRIP substantially outperforms a wide range of baselines and effectively transfers to unseen systems in a zero-shot manner.
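The two-stage decomposition described above can be pictured as a composition of two independently trained mappings that meet at an image representation. The following is only a toy sketch of that idea, not the authors' implementation: the class `PivotRefiner`, the functions `toy_prompt_to_latent` and `toy_latent_to_prompt`, and the two-number "latent" are all hypothetical stand-ins for models that would be learned from data.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in for a latent image representation (hypothetical: a short float vector).
ImageLatent = List[float]


@dataclass
class PivotRefiner:
    """Chains two stages through an image-latent pivot (illustrative only)."""

    # Stage 1 (hypothetical): user-language prompt -> latent of a preferred image.
    prompt_to_latent: Callable[[str], ImageLatent]
    # Stage 2 (hypothetical): image latent -> system-language prompt.
    latent_to_prompt: Callable[[ImageLatent], str]

    def refine(self, user_prompt: str) -> str:
        # The latent is the only thing passed between stages, so each stage
        # could be trained on its own data without parallel prompt pairs.
        latent = self.prompt_to_latent(user_prompt)
        return self.latent_to_prompt(latent)


def toy_prompt_to_latent(prompt: str) -> ImageLatent:
    # Toy features: word count and a flag for whether "cat" is mentioned.
    words = prompt.lower().split()
    return [float(len(words)), 1.0 if "cat" in words else 0.0]


def toy_latent_to_prompt(latent: ImageLatent) -> str:
    # Toy decoder: pick a subject from the flag, then append fixed keywords.
    subject = "cat" if latent[1] > 0.5 else "scene"
    return f"{subject}, highly detailed, sharp focus"


refiner = PivotRefiner(toy_prompt_to_latent, toy_latent_to_prompt)
refined = refiner.refine("a cat on a sofa")
print(refined)
```

Because the second stage reads only the latent, swapping in a different `latent_to_prompt` is all it would take to target another system in this sketch, which loosely mirrors the zero-shot transfer claim.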

Jingtao Zhan, Qingyao Ai, Yiqun Liu, Yingwei Pan, Ting Yao, Jiaxin Mao, Shaoping Ma, Tao Mei • 2024

Related benchmarks

Task               Dataset                           Metric               Result  Rank
Prompt Refinement  SD 1.4 (In-distribution)          ImageReward (Anime)  0.346   10
Prompt Refinement  SDXL unseen v1.0 (test)           ImageReward          0.983   10
Prompt Refinement  DeepFloyd IF unseen (test)        ImageReward          74.1    10
Prompt Refinement  SUR-adapter unseen (test)         ImageReward          0.789   10
Prompt Refinement  ReFL unseen (test)                ImageReward          0.64    10
Prompt Refinement  Unseen Systems Aggregated (test)  Relevance            1.68    5
