Long-context Multimodal Understanding on HELMET
[Chart: Accuracy. Top entry: Qwen3 VL, 67.6 (Feb 16, 2026).]
Evaluation Results
Method                       Checkpoint    Date      Accuracy
Qwen3 VL                     235B A22B     2026.02   67.6
Qwen3 VL Plain Distillation  Short Stage   2026.02   65.7
Qwen3 VL                     32B           2026.02   63
LongPO                       Short Stage   2026.02   62.9
Mistral Plain Distillation*  -             2026.02   53.1
Mistral 3.1 Small            24B           2026.02   37