Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation
About
Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading open-source model in this fast-growing field. However, fine-tuning these models poses multiple challenges, from integrating new methodologies to evaluating them systematically. To address these issues, this paper introduces LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. Furthermore, we present a thorough framework for the systematic assessment of varied fine-tuning techniques. This framework employs a diverse suite of metrics and examines multiple facets of fine-tuning, including hyperparameter adjustments and evaluation with different prompt types across various concept categories. Through this comprehensive approach, our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy: 18.5 | 751 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy: 75 | 138 |
| Image Generation | ImageNet-1k (val) | FID: 2.3 | 84 |
| Natural Language Understanding | GLUE (test val) | MRPC Accuracy: 81.41 | 59 |
| Multimodal Question Answering | ScienceQA | -- | 35 |
| Mathematical Reasoning | MathQA (test) | Accuracy: 18.69 | 33 |
| Mathematical Reasoning | MetaMathQA (test) | Accuracy: 21.76 | 26 |
| Human pose-conditioned image generation | Single-person images derived from Stable Diffusion XL (val) | Optimizer Memory (GB): 2.3 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Tasks (ARC-e, OBQA, SIQA, ARC-c, WinoG, PIQA, BoolQ, HellaS) LLaMA3-8B | ARC-e Accuracy: 89.2 | 13 |
| Language Modeling | C4 (train) | PPL: 15.64 | 8 |