Multimodal Claim Verification on AVerImaTeC (dev)
[Chart: Q-Eval scores; top entry VILLAIN, 92.3 (Feb 4, 2026). Other metrics available: Evid-Eval, Veracity, Justification.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Q-Eval | Evid-Eval | Veracity | Justification |
|---|---|---|---|---|---|---|
| VILLAIN | Leaderboard Name=HUMAN... | 2026.02 | 92.3 | 58.3 | 64.5 | 54.3 |
| AIC CTU | Conditioned Threshold... | 2026.02 | 82.2 | 34.7 | 37.5 | 30.4 |
| Baseline | Conditioned Threshold... | 2026.02 | 48.8 | 13.4 | 6.6 | 5.8 |
| ADA-AGGR | Conditioned Threshold... | 2026.02 | 35.4 | 38.6 | 45.4 | 37.3 |