Question Answering on VERHallu
[Chart: QA-Causal Score. Top entry: Human, 94.9 (Jan 15, 2026). Available metrics: QA-Causal Score, QA-Temporal Score, QA-Subevent Score.]
Updated 4d ago
Evaluation Results
Method            Date      Parameters  QA-Causal Score  QA-Temporal Score  QA-Subevent Score
Human             2026.01   -           94.9             92.4               93.8
Gemini-3-Pro      2026.01   -           65.7             55                 40
ChatGPT-4o        2026.01   -           46.5             36                 42.5
Qwen-VL-2.5-32B   2026.01   32B         39.6             34.4               31.4
Qwen-VL-2.5-72B   2026.01   72B         28.9             28.7               34.8