
Multi-modal Evaluation on MME-RW

31.9 Mean Accuracy

TTAug

Updated 2mo ago

Evaluation Results

Method	Links	Mean Accuracy
TTAug 2025.10		31.9	-	-	-
(2) 2025.10		31.4	-	-	-
TTAug 2025.10		31.1	-	-	-
(1) 2025.10		30.9	-	-	-
Baseline 2025.10		27.8	-	-	-
Baseline 2025.10		27.8	-	-	-
Method ③ 2025.10		27.6	-	-	-
Method ④ 2025.10		27.6	-	-	-
Method ② 2025.10		26.4	-	-	-
Method ① 2025.10		26.2	-	-	-
VL-Rethinker 2026.02		-	47.21	-	-
PixelReasoner 2026.02		-	49.7	-	-
DeepEyes 2026.02		-	49.5	-	-
Adaptive-CoF 2026.02		-	50.9	-	-
MIRROR 2026.02		-	51.49	-	-
GPT-4o 2025.09		-	45.2	46.4	42.3
Qwen2.5-VL 7B 2025.09		-	61.4	64.3	40.1
Vicrop 2025.09		-	62.3	65.1	42
HiDe 2025.09		-	63.8	66.7	42.9