Perception on HRBen 8K
Top result: 78.9 Pass@1 (Base), May 27, 2026
Updated 6d ago
Evaluation Results
Method      Details                    Date     Pass@1
Base        Backbone=Qwen3-VL-Thin...  2026.05  78.9
SFT + AXPO  Backbone=Qwen3-VL-Thin...  2026.05  78.3
SFT + AXPO  Backbone=Qwen3-VL-Thin...  2026.05  77
SFT + GRPO  Backbone=Qwen3-VL-Thin...  2026.05  74.9
SFT         Backbone=Qwen3-VL-Thin...  2026.05  74.4
SFT         Backbone=Qwen3-VL-Thin...  2026.05  74.1
GRPO        Backbone=Qwen3-VL-Thin...  2026.05  73.9
SFT + GRPO  Backbone=Qwen3-VL-Thin...  2026.05  73.8
SFT + AXPO  Backbone=Qwen3-VL-Thin...  2026.05  72.4
GRPO        Backbone=Qwen3-VL-Thin...  2026.05  71.1
SFT + GRPO  Backbone=Qwen3-VL-Thin...  2026.05  70.6
SFT         Backbone=Qwen3-VL-Thin...  2026.05  68.8
Base        Backbone=Qwen3-VL-Thin...  2026.05  67.5
Base        Backbone=Qwen3-VL-Thin...  2026.05  66.1
GRPO        Backbone=Qwen3-VL-Thin...  2026.05  59.4
Base        Backbone=Qwen3-VL-Thin...  2026.05  55.9