Multi-modal Reasoning on M3CoT
[Chart: SINKTRACK, 66.94 Accuracy, Apr 11, 2026. Metrics: Accuracy, Macro-F1.]
Evaluation Results
| Method | Base LLM | Date | Accuracy | Macro-F1 |
|---|---|---|---|---|
| SINKTRACK | Qwen2.5-VL-7B... | 2026.04 | 66.94 | 79.13 |
| SINKTRACK | Gemma3-12B-In... | 2026.04 | 60.57 | 70.97 |
| CoT | Gemma3-12B-In... | 2026.04 | 60.32 | 72.24 |
| SINKTRACK | Gemma3-4B-Ins... | 2026.04 | 51.01 | 59.18 |
| CoT | Gemma3-4B-Ins... | 2026.04 | 50.96 | 59.71 |
| SINKTRACK | Qwen2.5-VL-3B... | 2026.04 | 48.25 | 58.73 |
| CoT | Qwen2.5-VL-7B... | 2026.04 | 44.11 | 57.96 |
| Direct | Gemma3-12B-In... | 2026.04 | 41.9 | 51.46 |
| Direct | Qwen2.5-VL-7B... | 2026.04 | 39.2 | 51.01 |
| Direct | Qwen2.5-VL-3B... | 2026.04 | 33.43 | 44.2 |
| Direct | Gemma3-4B-Ins... | 2026.04 | 27.71 | 33.81 |
| CoT | Qwen2.5-VL-3B... | 2026.04 | 20.69 | 32.04 |