Discriminative Hallucination Evaluation on AMBER Discriminative
[Chart: F1 Score over time. Best result: 90.3, Qwen2.5-VL 3B + NoisyGRPO, Oct 24, 2025.]
Evaluation Results
Method                      Method details              Date      F1 Score
Qwen2.5-VL 3B + NoisyGRPO   Model=Qwen2.5-VL 3B, T...   2025.10   90.3
Qwen2.5-VL 3B               Model=Qwen2.5-VL 3B, T...   2025.10   89.6
Qwen2.5-VL 3B + GRPO        Model=Qwen2.5-VL 3B, T...   2025.10   89.2
Qwen2.5-VL 3B + SFT         Model=Qwen2.5-VL 3B, T...   2025.10   88.8
Qwen2.5-VL 7B + NoisyGRPO   Model=Qwen2.5-VL 7B, T...   2025.10   88.2
Qwen2.5-VL 7B + GRPO        Model=Qwen2.5-VL 7B, T...   2025.10   88
Qwen2.5-VL 7B + SFT         Model=Qwen2.5-VL 7B, T...   2025.10   87.5
Qwen2.5-VL 7B               Model=Qwen2.5-VL 7B, T...   2025.10   87.4
LLaVA-1.5 7B                Model=LLaVA-1.5 7B          2025.10   74.7