Visual Search and Reasoning on VStar
[Score chart: top entry Ours-NonMath, score 76.96, Mar 17, 2026]
Evaluation Results
| Method | Settings | Date | Score |
|---|---|---|---|
| Ours-NonMath | Budget=10%, Model=Qwen... | 2026.03 | 76.96 |
| Ours-Mixed | Budget=10%, Model=Qwen... | 2026.03 | 76.96 |
| Ours-Math | Budget=10%, Model=Qwen... | 2026.03 | 74.35 |
| Ours-Math | Budget=25%, Model=Qwen... | 2026.03 | 69.63 |
| Ours-Mixed | Budget=25%, Model=Qwen... | 2026.03 | 69.63 |
| Interlace | Budget=10%, Model=Qwen... | 2026.03 | 68.59 |
| Random | Budget=10%, Model=Qwen... | 2026.03 | 64.4 |
| Ours-NonMath | Budget=25%, Model=Qwen... | 2026.03 | 63.87 |
| CKA | Budget=10%, Model=Qwen... | 2026.03 | 62.83 |
| Random | Budget=25%, Model=Qwen... | 2026.03 | 58.12 |
| Ours-NonMath | Budget=40%, Model=Qwen... | 2026.03 | 54.97 |
| Interlace | Budget=25%, Model=Qwen... | 2026.03 | 50.79 |
| CKA | Budget=25%, Model=Qwen... | 2026.03 | 49.74 |
| Interlace | Budget=40%, Model=Qwen... | 2026.03 | 46.6 |
| Ours-Math | Budget=40%, Model=Qwen... | 2026.03 | 39.79 |
| CKA | Budget=40%, Model=Qwen... | 2026.03 | 39.27 |
| Ours-Mixed | Budget=40%, Model=Qwen... | 2026.03 | 37.17 |
| Random | Budget=40%, Model=Qwen... | 2026.03 | 35.08 |