Multi-modal Hallucination Evaluation on AMBER
[Chart: Mean Accuracy by date. Highlighted point: 76.9, method (1), Oct 3, 2025.]
Updated 2d ago
Evaluation Results
| Method | Tag | Links | Mean Accuracy |
|---|---|---|---|
| (1) | Adaptation strategy=(1) | 2025.10 | 76.9 |
| TTAug | Adaptation strategy=Te... | 2025.10 | 75.9 |
| TTAug | test-time scaling=Meth... | 2025.10 | 75.4 |
| (2) | Adaptation strategy=Mo... | 2025.10 | 72.8 |
| Method ① | test-time scaling=Othe... | 2025.10 | 70.4 |
| Baseline | test-time scaling=none | 2025.10 | 68.7 |
| Baseline | | 2025.10 | 68.7 |
| Method ④ | test-time scaling=Othe... | 2025.10 | 67.8 |
| Method ② | test-time scaling=Othe... | 2025.10 | 64.5 |
| Method ③ | test-time scaling=Othe... | 2025.10 | 53.5 |