Perception on Visual Probe
Top result: 45.8 Pass@1 (SFT + AXPO)
[Figure: Pass@1 over time; plotted date: May 27, 2026]
Evaluation Results
| Method | Details | Date | Pass@1 |
|---|---|---|---|
| SFT + AXPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 45.8 |
| SFT + AXPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 43.6 |
| Base | Backbone=Qwen3-VL-Thin... | 2026.05 | 40.3 |
| GRPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 40.1 |
| SFT + GRPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 40.1 |
| SFT | Backbone=Qwen3-VL-Thin... | 2026.05 | 38.4 |
| SFT + GRPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 38 |
| SFT + GRPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 36.1 |
| SFT + AXPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 35.1 |
| SFT | Backbone=Qwen3-VL-Thin... | 2026.05 | 34.7 |
| Base | Backbone=Qwen3-VL-Thin... | 2026.05 | 31.8 |
| GRPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 29.7 |
| Base | Backbone=Qwen3-VL-Thin... | 2026.05 | 24.8 |
| SFT | Backbone=Qwen3-VL-Thin... | 2026.05 | 24.5 |
| GRPO | Backbone=Qwen3-VL-Thin... | 2026.05 | 23.8 |
| Base | Backbone=Qwen3-VL-Thin... | 2026.05 | 5 |