Complex Reasoning on TOMATO
[Chart: Accuracy. Best result: Qwen3-VL-8B + SynRL, 38.1. Date shown: Mar 18, 2026.]
Updated 1mo ago
Evaluation Results
Method                 Details                     Date      Accuracy
Qwen3-VL-8B + SynRL    Parameters=8B, Trainin...   2026.03   38.1
Qwen3-VL-4B + SynRL    Parameters=4B, Trainin...   2026.03   36.7
Qwen3-VL-8B            Parameters=8B               2026.03   33.2
Qwen3-VL-4B            Parameters=4B               2026.03   32.1
VideoLLaMA3-7B         Parameters=7B               2026.03   30.1
LLaVA-OneVision-7B     Parameters=7B               2026.03   25.5
Video-Jigsaw-7B        Parameters=7B               2026.03   25.3
Video-R1-7B            Parameters=7B               2026.03   25.1
VideoLLaMA-2-7B        Parameters=7B               2026.03   10.1