Task#5 on STAR
[Chart: Score by date. Top result: Qwen3-VL-8B S2(KGRPO), Score 82.66, Oct 22, 2025.]
Updated 1mo ago
Evaluation Results
Method                    Details                    Date     Score
Qwen3-VL-8B S2(KGRPO)     Backbone=Qwen3-VL-8B,...   2025.10  82.66
GPT-4o                    Backbone=GPT-4o            2025.10  82.38
Qwen3-VL-4B S2(KGRPO)     Backbone=Qwen3-VL-4B,...   2025.10  81.82
Qwen3-VL-2B S2(KGRPO)     Backbone=Qwen3-VL-2B,...   2025.10  80.36
KGRPO                     Backbone=Qwen2.5-VL-7B...  2025.10  77.19
Qwen2.5-VL-7B S2(KGRPO)   Backbone=Qwen2.5-VL-7B...  2025.10  77.19
Qwen2.5-VL-32B S2(SimPO)  Backbone=Qwen2.5-VL-32...  2025.10  74.16
Qwen2.5-VL-3B S2(KGRPO)   Backbone=Qwen2.5-VL-3B...  2025.10  71.51
GPT-4o-mini               Backbone=GPT-4o-mini       2025.10  69.13
GPT-4v                    Backbone=GPT-4v            2025.10  59.25
GPT-4o                    Backbone=GPT-4o            2025.10  57.63
QVQ-72B                   Backbone=QVQ-72B           2025.10  50.13