Human Preference Prediction on internal benchmark
[Chart: Accuracy by date; highlighted point: Qwen-7B, 82.2 Accuracy, Apr 30, 2026]
Updated 1mo ago
Evaluation Results
Method              Configuration  Date     Accuracy
Qwen-7B             GCPO           2026.04  82.2
Seed-1.5-VL         T+V (Thi...    2026.04  79.3
Seed-1.6-VL         T+V (Thi...    2026.04  77.2
Qwen-7B             T+V (Thi...    2026.04  75.4
Seed-1.5-VL         T (Think)      2026.04  72.2
Qwen-3B             GCPO           2026.04  72
Seed-1.6-VL         T (Think)      2026.04  71.2
Qwen-7B             V (Verify)     2026.04  70.9
Seed-1.6-VL         V (Verify)     2026.04  69.4
Qwen-3B             T+V (Thi...    2026.04  69.3
Qwen-7B             T (Think)      2026.04  68.9
Qwen-7B (VIESCORE)  V (Verify)     2026.04  68.3
Qwen-3B             V (Verify)     2026.04  66.1
Qwen-3B             T (Think)      2026.04  64.1