Vision-Language Hallucination Evaluation on HallBench
Top result: Octopus-8B (Ours), 64.2 Accuracy (Feb 9, 2026)
Evaluation Results
Method                      | Setting (truncated)       | Date    | Accuracy
Octopus-8B (Ours)           | Rollout count (rollout... | 2026.02 | 64.2
Qwen3-VL-8B-Instruct + DAPO | Rollout count (rollout... | 2026.02 | 63.7
MiMo-VL-7B-RL               | Learning Protocol=Open... | 2026.02 | 63.5
Qwen3-VL-8B-Instruct + GRPO | Rollout count (rollout... | 2026.02 | 62.8
Qwen3-VL-8B-Thinking        | Learning Protocol=Open... | 2026.02 | 62.7
Qwen3-VL-8B-Instruct + GSPO | Rollout count (rollout... | 2026.02 | 62.5
Qwen3-VL-8B-Instruct + GSPO | Rollout count (rollout... | 2026.02 | 62.3
MiMo-VL-7B-SFT              | Learning Protocol=Open... | 2026.02 | 62.1
Qwen3-VL-8B-Instruct + GRPO | Rollout count (rollout... | 2026.02 | 61.6
Qwen3-VL-8B-Instruct + SRPO | Rollout count (rollout... | 2026.02 | 61.2
Qwen3-VL-8B-Instruct + SRPO | Rollout count (rollout... | 2026.02 | 60.8
Qwen3-VL-8B-Instruct        | Learning Protocol=Base... | 2026.02 | 58.8
GPT-4o                      | Learning Protocol=Clos... | 2026.02 | 56.2
Claude-3.7-Sonnet           | Learning Protocol=Clos... | 2026.02 | 55.4
InternVL3.5-8B-RL           | Learning Protocol=Open... | 2026.02 | 54.5