Chinese Multi-modal Multi-task Understanding on CMMMU
[Chart: Accuracy over time, Mar 8, 2024 to Jan 29, 2026. Top result: GPT-4V, 42.5 Accuracy.]
Updated 4d ago
Evaluation Results
Method            Configuration         Date      Accuracy
GPT-4V            LLM Size=Unk          2024.03   42.5
InternVL3.5       Parameter Scale=8B    2026.01   40
Ovis2.5           Parameter Scale=9B    2026.01   40
Qwen-VL-Plus      LLM Size=Unk          2024.03   39.5
DeepSeek-VL       LLM Size=7B           2024.03   37.9
GLM-4.6V-FlashX   -                     2026.01   36.3
Yi-VL             LLM Size=6B           2024.03   35.8
Ostrakon-VL       Parameter Scale=8B    2026.01   33.2
Qwen3-VL          Parameter Scale=8B    2026.01   33.1
Qwen2.5-VL        Parameter Scale=72B   2026.01   33
DeepSeek-VL       LLM=1.3B              2024.03   27.4
CogVLM            LLM Size=7B           2024.03   24.8
EMU2-Chat         LLM Size=7B           2024.03   23.8