Search on MM Search (Pass@1)
Best Pass@1: 46.1 (Base)
[Figure: Pass@1 chart; date shown: May 27, 2026]
Evaluation Results
Method        Configuration               Date      Pass@1
Base          Backbone=Qwen3-VL-Thin...   2026.05   46.1
GRPO          Backbone=Qwen3-VL-Thin...   2026.05   45.1
SFT + AXPO    Backbone=Qwen3-VL-Thin...   2026.05   45
SFT + GRPO    Backbone=Qwen3-VL-Thin...   2026.05   44
SFT + AXPO    Backbone=Qwen3-VL-Thin...   2026.05   43.3
Base          Backbone=Qwen3-VL-Thin...   2026.05   42.7
SFT + GRPO    Backbone=Qwen3-VL-Thin...   2026.05   42.3
SFT           Backbone=Qwen3-VL-Thin...   2026.05   41.5
GRPO          Backbone=Qwen3-VL-Thin...   2026.05   41.4
SFT + GRPO    Backbone=Qwen3-VL-Thin...   2026.05   40.6
SFT           Backbone=Qwen3-VL-Thin...   2026.05   40.6
SFT + AXPO    Backbone=Qwen3-VL-Thin...   2026.05   40.3
Base          Backbone=Qwen3-VL-Thin...   2026.05   37.8
SFT           Backbone=Qwen3-VL-Thin...   2026.05   35.9
Base          Backbone=Qwen3-VL-Thin...   2026.05   27.7
GRPO          Backbone=Qwen3-VL-Thin...   2026.05   23.8