
LongCat-Image Technical Report

About

We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility that are prevalent in current leading models. 1) We achieve this through rigorous data curation strategies across the pre-training, mid-training, and SFT stages, complemented by the coordinated use of curated reward models during the RL phase. This strategy establishes the model as a new state-of-the-art (SOTA), delivering superior text-rendering capabilities and remarkable photorealism, and significantly enhancing aesthetic quality. 2) Notably, it sets a new industry standard for Chinese character rendering. By supporting even complex and rare characters, it outperforms both major open-source and commercial solutions in coverage, while also achieving superior accuracy. 3) The model achieves remarkable efficiency through its compact design. With a core diffusion model of only 6B parameters, it is significantly smaller than the nearly 20B or larger Mixture-of-Experts (MoE) architectures common in the field. This keeps VRAM usage low and inference fast, significantly reducing deployment costs. Beyond generation, LongCat-Image also excels in image editing, achieving SOTA results on standard benchmarks with superior editing consistency compared to other open-source works. 4) To fully empower the community, we have established the most comprehensive open-source ecosystem to date. We are releasing not only multiple model versions for text-to-image and image editing, including checkpoints after the mid-training and post-training stages, but also the entire training toolchain. We believe that the openness of LongCat-Image will provide robust support for developers and researchers, pushing the frontiers of visual content creation.
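The size comparison in the abstract can be made concrete with a back-of-the-envelope estimate of weight memory. The sketch below is illustrative only: it assumes weights are stored in bf16 (2 bytes per parameter), which the abstract does not state, and it counts only the diffusion model's weights, ignoring activations, the text encoder, and the VAE. The function name `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate. Weights only: activations, text encoder,
# and VAE are ignored. Storage precision is an assumption, not a fact
# taken from the report.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    sizes = [
        ("6B core diffusion model", 6e9),
        ("20B comparison size", 20e9),
    ]
    for name, params in sizes:
        print(f"{name}: {weight_memory_gib(params):.1f} GiB in bf16")
```

Under these assumptions the 6B model's weights need roughly 30% of the memory of a 20B model's weights, since weight memory scales linearly with parameter count.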

Meituan LongCat Team: Hanghang Ma, Haoxian Tan, Jiale Huang, Junqiang Wu, Jun-Yan He, Lishuai Gao, Songlin Xiao, Xiaoming Wei, Xiaoqi Ma, Xunliang Cai, Yayong Guan, Jie Hu • 2025

Related benchmarks

Task | Dataset | Result | Rank
Text-to-Image Generation | GenEval | GenEval Score: 87 | 277
Image Editing | ImgEdit-Bench | Overall Score: 4.5 | 132
Text-to-Image Generation | DPG | Overall Score: 86.8 | 131
Image Editing | GEdit-Bench English | G_O (Overall Quality): 7.748 | 73
Text-to-Image Generation | GenEval | Overall Score: 87 | 68
Knowledge-grounded reasoning | WISE | Overall Score: 65 | 45
Instruction-based Image Editing | ImgEdit Bench 1.0 (test) | Add Score: 4.44 | 37
Reasoning-based text-to-image generation | WISE | Overall Score: 65 | 33
Text-to-Image Generation | DPGBench | DPGBench Score: 86.8 | 31
Text Rendering | CVTG-2K | NED: 93.61 | 28
Showing 10 of 25 rows
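
The NED figure reported for CVTG-2K is not defined on this page. A minimal sketch follows, assuming NED stands for normalized edit distance reported as a similarity (1 minus the Levenshtein distance divided by the length of the longer string, so higher is better) and scaled to a percentage. The benchmark's actual protocol (OCR model, text normalization, averaging) may differ, and the function names here are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ned_similarity(pred: str, target: str) -> float:
    """Assumed definition: 1 - distance / max length, in [0, 1]."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(pred, target) / longest


if __name__ == "__main__":
    # Compare text recognized in a generated image with the prompt's text.
    print(f"{100 * ned_similarity('LongCot', 'LongCat'):.2f}")
```

Under this assumed definition, a score of 93.61 would mean that the rendered text differs from the target text by about 6% of its characters on average.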
