Spanning the Visual Analogy Space with a Weight Basis of LoRAs

About

Visual analogy learning enables image editing via demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet $\{\mathbf{a}, \mathbf{a}', \mathbf{b}\}$, the goal is to generate $\mathbf{b}'$ such that $\mathbf{a} : \mathbf{a}' :: \mathbf{b} : \mathbf{b}'$. Recent methods adapt text-to-image models with a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed module constrains generalization. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, which specializes the model for each analogy task in a single inference pass. LoRWeB dynamically composes learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRAs that spans the space of different visual transformations, and (2) a lightweight encoder that dynamically weights these basis LoRAs given the input analogy pair. Comprehensive evaluations demonstrate state-of-the-art performance and significantly improved generalization to unseen transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation tasks. See https://research.nvidia.com/labs/par/lorweb for code.
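
The idea of weighting a basis of LoRAs can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the coefficients are normalized or where the mixing is applied, so the softmax normalization, the mixing of dense products $B_i A_i$, the toy dimensions, and all names below (`mix_lora_basis`, `encoder_scores`, and so on) are assumptions made for illustration. The random `encoder_scores` stand in for the output of the lightweight encoder on the pair $(\mathbf{a}, \mathbf{a}')$.

```python
import math
import random


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix used as a placeholder for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def mix_lora_basis(coeffs, basis):
    """Return sum_i coeffs[i] * (B_i @ A_i) as a dense d_out x d_in update.

    Each basis element is a pair (B_i, A_i) with B_i of shape d_out x rank
    and A_i of shape rank x d_in (the usual LoRA factorization).
    """
    d_out = len(basis[0][0])
    d_in = len(basis[0][1][0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for c, (b_mat, a_mat) in zip(coeffs, basis):
        update = matmul(b_mat, a_mat)
        for r in range(d_out):
            for k in range(d_in):
                delta[r][k] += c * update[r][k]
    return delta


rng = random.Random(0)
d_out, d_in, rank, num_basis = 4, 3, 2, 5  # toy sizes, chosen arbitrarily

# Hypothetical learned basis: one (B_i, A_i) pair per transformation primitive.
basis = [
    (random_matrix(d_out, rank, rng), random_matrix(rank, d_in, rng))
    for _ in range(num_basis)
]

# Placeholder for the encoder output; a real encoder would compute these
# scores from the analogy pair (a, a').
encoder_scores = [rng.gauss(0.0, 1.0) for _ in range(num_basis)]
coeffs = softmax(encoder_scores)  # assumed normalization

# Task-specific weight update that would be added to a frozen base weight.
delta_w = mix_lora_basis(coeffs, basis)
```

With a one-hot coefficient vector the mixture reduces to a single basis LoRA, which is the sense in which the coefficients pick a point in the span of the basis.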

Hila Manor, Rinon Gal, Haggai Maron, Tomer Michaeli, Gal Chechik • 2026

Related benchmarks

Task	Dataset	Result
Exemplar-based Image Editing	Relation-Adapter unseen (val)	CLIP-I 0.945	10
Exemplar-based Image Editing	Relation seen tasks	CLIP-I 0.898	4
Exemplar-based Image Editing	Human Preference Evaluation	Preference Score (Baseline) 10.42	4
