Fact Verification on PubHealthTab OOD (test)
Top result: Qwen3-VL-8B-DISCO, 77.14 Accuracy (Feb 3, 2026). Updated 4d ago.
Evaluation Results
| Method | Setting | Links (date) | Accuracy |
|---|---|---|---|
| Qwen3-VL-8B-DISCO | Reasoning Strategy=Tab... | 2026.02 | 77.14 |
| MiniCPM-V-2.6 8B | Reasoning Strategy=HIP... | 2026.02 | 73.32 |
| Qwen3-VL-8B-DISCO | Reasoning Strategy=DA,... | 2026.02 | 63.34 |
| Gemma3n-E4B-DISCO | Reasoning Strategy=Tab... | 2026.02 | 62.41 |
| Gemma3n-E4B | Reasoning Strategy=DA,... | 2026.02 | 56.08 |
| Gemma3n-E4B-DISCO | Reasoning Strategy=DA,... | 2026.02 | 52.47 |
| Table-LLaVA 7B | Reasoning Strategy=SFT... | 2026.02 | 51.03 |
| GPT-4o-mini | Reasoning Strategy=DA,... | 2026.02 | 48.61 |
| Table-LLaVA 13B | Reasoning Strategy=SFT... | 2026.02 | 48.46 |
| Qwen3-VL-8B | Reasoning Strategy=DA,... | 2026.02 | 44.18 |