Task#2 on STAR
[Chart: Accuracy and CoT over time. Highlighted point: KGRPO, Accuracy 94.88 (Oct 22, 2025)]
Updated 1mo ago
Evaluation Results
Method        Backbone            Date      Accuracy   CoT
KGRPO         Qwen2.5-VL-7B...    2025.10   94.88      97.75
GPT-4o-mini   GPT-4o-mini         2025.10   72.25      -
S1(Single)    Qwen2.5-VL-3B       2025.10   56.63      74.75
GPT-4o        GPT-4o              2025.10   56.33      -
GPT-4v        GPT-4v              2025.10   41.25      -
Zero-shot     Qwen2.5-VL-3B       2025.10   20.13      -