
In-the-wild model generalization on Human Bench Text-based Demo

23.4 NSE

Qwen3VL-32B

Jan 21, 2026
Updated 1mo ago

Evaluation Results

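| Method | Date | NSE | | | Links |
| --- | --- | --- | --- | --- | --- |
| | 2026.01 | 23.4 | 77.9 | 0.1 | |
| | 2026.01 | 23.8 | 75.3 | 1.9 | |
| | 2026.01 | 25 | 69.3 | 0.1 | |
| | 2026.01 | 25.7 | 68.5 | 0 | |
| | 2026.01 | 25.9 | 59.7 | 7.7 | |
| | 2026.01 | 26.7 | 70.5 | 0.2 | |
| | 2026.01 | 29.3 | 58.7 | 11.4 | |
| | 2026.01 | 29.5 | 55.7 | 2.6 | |
| | 2026.01 | 30 | 53.8 | 0.6 | |
| | 2026.01 | 30.9 | 46 | 11.3 | |
| | 2026.01 | 32.4 | 49.6 | 5.8 | |
| | 2026.01 | 33.6 | 35.3 | 0.9 | |
| | 2026.01 | 52.7 | 17.2 | 23.8 | |
| | 2026.01 | 67.4 | 5.6 | 0 | |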