Search on HR-MM Search
Top result: 25.9 Pass@1, SFT + AXPO (May 27, 2026). [Chart: Pass@1]
Evaluation Results
Method       Details                    Date      Pass@1
SFT + AXPO   Backbone=Qwen3-VL-Thin...  2026.05   25.9
SFT + GRPO   Backbone=Qwen3-VL-Thin...  2026.05   24.4
SFT          Backbone=Qwen3-VL-Thin...  2026.05   23
Base         Backbone=Qwen3-VL-Thin...  2026.05   22.8
Base         Backbone=Qwen3-VL-Thin...  2026.05   21
SFT          Backbone=Qwen3-VL-Thin...  2026.05   20.6
GRPO         Backbone=Qwen3-VL-Thin...  2026.05   20.6
SFT + AXPO   Backbone=Qwen3-VL-Thin...  2026.05   20.1
SFT + GRPO   Backbone=Qwen3-VL-Thin...  2026.05   20
GRPO         Backbone=Qwen3-VL-Thin...  2026.05   18.4
SFT + AXPO   Backbone=Qwen3-VL-Thin...  2026.05   18.1
Base         Backbone=Qwen3-VL-Thin...  2026.05   16.9
SFT + GRPO   Backbone=Qwen3-VL-Thin...  2026.05   15.7
SFT          Backbone=Qwen3-VL-Thin...  2026.05   14.5
GRPO         Backbone=Qwen3-VL-Thin...  2026.05   12.7
Base         Backbone=Qwen3-VL-Thin...  2026.05   8.2