Hallucination Detection on POPE
Best result: SINKTRACK, 85.47 accuracy (Apr 11, 2026). Chart metrics: Accuracy, Macro-F1.
Evaluation Results
| Method | Base LLM | Date | Accuracy | Macro-F1 |
|---|---|---|---|---|
| SINKTRACK | Qwen2.5-VL-7B... | 2026.04 | 85.47 | 91.67 |
| SINKTRACK | Gemma3-12B-In... | 2026.04 | 85.40 | 92.13 |
| Direct | Gemma3-12B-In... | 2026.04 | 84.59 | 91.65 |
| CoT | Gemma3-12B-In... | 2026.04 | 84.53 | 91.61 |
| SINKTRACK | Gemma3-4B-Ins... | 2026.04 | 84.33 | 91.50 |
| Direct | Gemma3-4B-Ins... | 2026.04 | 83.98 | 91.30 |
| CoT | Qwen2.5-VL-7B... | 2026.04 | 83.65 | 90.63 |
| CoT | Gemma3-4B-Ins... | 2026.04 | 82.01 | 90.04 |
| Direct | Qwen2.5-VL-7B... | 2026.04 | 78.21 | 87.15 |
| SINKTRACK | Qwen2.5-VL-3B... | 2026.04 | 77.69 | 87.44 |
| CoT | Qwen2.5-VL-3B... | 2026.04 | 64.58 | 78.10 |
| Direct | Qwen2.5-VL-3B... | 2026.04 | 53.23 | 69.48 |