Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

About

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to a larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan • 2025

Related benchmarks

Task                                  Dataset   Result            Rank
Object Hallucination Evaluation       POPE      Accuracy 87.4     1455
Visual Question Answering             GQA       Accuracy 62       1249
Text-based Visual Question Answering  TextVQA   Accuracy 45.6     807
Multimodal Understanding              MMBench   Accuracy 79.2     637
Multimodal Understanding              MM-Vet    MM-Vet Score 50   531
Text-to-Image Generation              GenEval   Overall Score 80  506
Visual Question Answering             GQA       Accuracy 61.3     505
Multimodal Understanding              MMMU      Accuracy 41       437
Multimodal Reasoning                  MM-Vet    MM-Vet Score 50   431
Text-to-Image Generation              GenEval   Overall Score 80  391
Showing 10 of 229 rows
...

Other info

Code