Hallucination Evaluation on HallB
[Chart: Score by date. Best result: 54.2 (SAIL), Oct 16, 2025]
Evaluation Results
Method             Details                     Date      Score
SAIL               LLM=Mistral-7B, # Data...   2025.10   54.2
Qwen2.5-VL         LLM=Qwen2.5-7B, # Data...   2025.10   52.9
Encoder-Based      LLM=Qwen3-8B, # Data=>...   2025.10   51.4
Qwen2-VL           LLM=Qwen2-7B, # Data=-...   2025.10   50.6
InternVL2.5        LLM=InternLM2.5-7B, #...    2025.10   50.1
InternVL3          LLM=Qwen2.5-7B, # Data...   2025.10   49.9
NEO                LLM=Qwen3-8B, # Data=3...   2025.10   46.4
Qwen2.5-VL         LLM=Qwen2.5-3B, # Data...   2025.10   46.3
Encoder-Based      LLM=Qwen3-1.7B, # Data...   2025.10   44.4
NEO                LLM=Qwen3-1.7B, # Data...   2025.10   43.1
InternVL2.5        LLM=InternLM2.5-1.8B,...    2025.10   42.6
InternVL3          LLM=Qwen2.5-1.5B, # Da...   2025.10   42.5
Qwen2-VL           LLM=Qwen2-1.5B, # Data...   2025.10   41.7
HoVLE              LLM=InternLM2-1.8B, #...    2025.10   38.4
BREEN              LLM=Qwen2.5-7B, # Data...   2025.10   37
Mono-InternVL      LLM=InternLM2-1.8B, #...    2025.10   34.8
Mono-InternVL-1.5  LLM=InternLM2-1.8B, #...    2025.10   32.5
EVE                LLM=Vicuna-7B, # Data=...   2025.10   26.4
Chameleon          LLM=from scratch, # Da...   2025.10   17.1