Multi-modal Reasoning on MMStar
[Chart: top entry SINKTRACK, 63.78 Accuracy, Apr 11, 2026. Metrics: Accuracy, Macro-F1. Updated 5d ago.]
Evaluation Results
Method     Details                    Date     Accuracy  Macro-F1
---------  -------------------------  -------  --------  --------
SINKTRACK  Base LLM=Qwen2.5-VL-7B...  2026.04  63.78     77.86
SINKTRACK  Base LLM=Gemma3-12B-In...  2026.04  61.6      75.81
CoT        Base LLM=Gemma3-12B-In...  2026.04  60.56     75.09
CoT        Base LLM=Qwen2.5-VL-7B...  2026.04  55.84     71.29
SINKTRACK  Base LLM=Gemma3-4B-Ins...  2026.04  53.04     69.11
CoT        Base LLM=Gemma3-4B-Ins...  2026.04  51.36     67.52
SINKTRACK  Base LLM=Qwen2.5-VL-3B...  2026.04  47.24     61.38
Direct     Base LLM=Gemma3-12B-In...  2026.04  44.58     60.89
Direct     Base LLM=Qwen2.5-VL-7B...  2026.04  37.29     54.24
Direct     Base LLM=Gemma3-4B-Ins...  2026.04  35.47     52.17
CoT        Base LLM=Qwen2.5-VL-3B...  2026.04  35.24     50.68
Direct     Base LLM=Qwen2.5-VL-3B...  2026.04  24.89     39.54