ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet

About

This paper introduces ViscoNet, a novel one-branch-adapter architecture for concurrent spatial and visual conditioning. Our lightweight model requires trainable parameters and a training dataset multiple orders of magnitude smaller than those of the current state-of-the-art IP-Adapter. Despite this, it preserves the generative power of the frozen text-to-image (T2I) backbone. Notably, it excels at addressing mode collapse, a pervasive issue that has previously been overlooked. The architecture achieves a harmonious balance between visual and textual conditioning, which makes it versatile across a range of human image generation tasks, including pose re-targeting, virtual try-on, stylization, person re-identification, and textile transfer. Demo and code are available from the project page at https://soon-yau.github.io/visconet/.

Soon Yau Cheong, Armin Mustafa, Andrew Gilbert• 2023
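The abstract does not spell out ViscoNet's layer layout, so the following is only a toy sketch of the general ControlNet-style idea it builds on, not the authors' implementation. A single trainable branch receives both a spatial condition (for example a flattened pose map) and a visual condition (for example an image embedding). Its output enters the frozen backbone through a zero-initialised projection, so at the start of training the backbone's behaviour is left unchanged. A `strength` factor scales the branch's contribution, which is one plausible knob for balancing the adapter's conditioning against the backbone's own text conditioning. All names (`ToyAdapterBranch`, `inject`, `strength`), the fusion by element-wise addition, and the use of plain vectors instead of feature maps are assumptions made for illustration.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix w (list of rows) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class ToyAdapterBranch:
    """Hypothetical single adapter branch fed with a spatial and a visual condition."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Trainable encoder weights, randomly initialised (assumption).
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        # Zero-initialised output projection, analogous to ControlNet's
        # "zero convolution": the branch initially contributes nothing.
        self.w_out = [[0.0] * dim for _ in range(dim)]

    def __call__(self, spatial, visual):
        # Fuse both conditions by element-wise addition (an assumption).
        fused = [s + v for s, v in zip(spatial, visual)]
        hidden = [math.tanh(h) for h in matvec(self.w_enc, fused)]
        return matvec(self.w_out, hidden)


def inject(backbone_features, residual, strength):
    """Add the adapter residual to frozen-backbone features, scaled by strength."""
    return [b + strength * r for b, r in zip(backbone_features, residual)]


if __name__ == "__main__":
    backbone = [0.5, -1.0, 2.0]  # toy features from the frozen, text-conditioned backbone
    pose = [1.0, 0.0, 0.0]       # toy spatial condition
    style = [0.0, 1.0, 0.0]      # toy visual condition
    branch = ToyAdapterBranch(dim=3)
    # At initialisation the zero projection leaves the backbone untouched.
    print(inject(backbone, branch(pose, style), strength=1.0))  # prints [0.5, -1.0, 2.0]
```

Setting `strength` to 0 always recovers the backbone's output, and larger values let the branch's spatial and visual conditions dominate more strongly.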

Related benchmarks

Task	Dataset	Metric	Result	Rank
Virtual Try-Off	VITON-HD (test)	SSIM	58.5	11
