Overstatement detection on claim-evidence sets

0.493 CCC

GPT-5-mini (high)

Updated 4mo ago

Evaluation Results

Method	Links	CCC	(unlabeled)	(unlabeled)
GPT-5-mini (high) 2026.01		0.493	0.204	0.587
Ovis2-34B 2026.01		0.493	0.154	0.509
GPT-5-mini (low) 2026.01		0.478	0.209	0.571
Deepseek-R1 2026.01		0.463	0.201	0.544
Qwen3-VL-32B 2026.01		0.456	0.187	0.532
GLM-4.6 2026.01		0.385	0.24	0.49
GLM-4.5V 2026.01		0.358	0.169	0.447
Deepseek-V3.2 2026.01		0.356	0.195	0.392
InternVL3.5-38B 2026.01		0.347	0.161	0.36
Qwen3-VL-8B 2026.01		0.323	0.237	0.428
InternVL3.5-30B-A3B 2026.01		0.133	0.295	0.257
InternVL3.5-8B 2026.01		0.106	0.326	0.158
LLaVA-OV-1.5-8B 2026.01		0.088	0.241	0.116