Multimodal Large Language Model Evaluation on MME-RealWorld
Chart: scores over time (series: Reasoning, Perception, Overall); highlighted point: Thyme, Reasoning 43.7, Oct 1, 2025.
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Reasoning | Perception | Overall |
|---|---|---|---|---|---|
| Thyme | QwenVL2.5-7B | 2025.10 | 43.7 | 60.1 | 58.1 |
| UG-Search | QwenVL2.5-7B | 2025.10 | 35.1 | 58.5 | 55.7 |
| ViCrop | QwenVL2.5-7B | 2025.10 | 34.2 | 53.4 | 51.1 |
| TextCoT | QwenVL2.5-7B | 2025.10 | 33.7 | 53.3 | 50.9 |
| QwenVL2.5-7B | | 2025.10 | 33.4 | 51.5 | 49.3 |
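Every method except the last row is built on the QwenVL2.5-7B backbone, so the most informative reading of the table is each method's gain over that plain baseline. The Python sketch below recomputes those gains. The `SCORES` dictionary just transcribes the table; `gains_over_baseline` and `implied_reasoning_weight` are hypothetical helpers written for illustration and are not part of any MME-RealWorld tooling. The second helper assumes, without confirmation from this page, that Overall is a weighted mean of Reasoning and Perception. Under that assumption every row implies a Reasoning weight of roughly 0.12, which is consistent with such a weighting but does not prove it.

```python
# Scores transcribed from the evaluation table (MME-RealWorld).
# Each entry: method -> (Reasoning, Perception, Overall).
SCORES = {
    "Thyme": (43.7, 60.1, 58.1),
    "UG-Search": (35.1, 58.5, 55.7),
    "ViCrop": (34.2, 53.4, 51.1),
    "TextCoT": (33.7, 53.3, 50.9),
    "QwenVL2.5-7B": (33.4, 51.5, 49.3),
}
BASELINE = "QwenVL2.5-7B"


def gains_over_baseline(scores, baseline=BASELINE):
    """Hypothetical helper: per-method differences from the baseline, rounded to 0.1."""
    base = scores[baseline]
    return {
        name: tuple(round(s - b, 1) for s, b in zip(vals, base))
        for name, vals in scores.items()
        if name != baseline
    }


def implied_reasoning_weight(reasoning, perception, overall):
    """Hypothetical helper: solve overall = w * reasoning + (1 - w) * perception for w.

    Only meaningful if Overall really is a weighted mean of the two subscores,
    which is an assumption, not something the leaderboard states.
    """
    return (perception - overall) / (perception - reasoning)


if __name__ == "__main__":
    for name, (dr, dp, do) in gains_over_baseline(SCORES).items():
        print(f"{name:10s} reasoning {dr:+.1f}  perception {dp:+.1f}  overall {do:+.1f}")
    for name, vals in SCORES.items():
        print(f"{name:12s} implied reasoning weight {implied_reasoning_weight(*vals):.3f}")
```

Running it shows, for example, that Thyme improves Reasoning by 10.3 points over the baseline, while UG-Search's advantage comes mostly from Perception (+7.0) with only a small Reasoning gain (+1.7).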