Overstatement detection on claim-evidence sets

0.493 CCC

GPT-5-mini (high)

Jan 7, 2026
Updated 4d ago

Evaluation Results

Method    Links
2026.01   0.493   0.204   0.587
2026.01   0.493   0.154   0.509
2026.01   0.478   0.209   0.571
2026.01   0.463   0.201   0.544
2026.01   0.456   0.187   0.532
2026.01   0.385   0.24    0.49
2026.01   0.358   0.169   0.447
2026.01   0.356   0.195   0.392
2026.01   0.347   0.161   0.36
2026.01   0.323   0.237   0.428
2026.01   0.133   0.295   0.257
2026.01   0.106   0.326   0.158
2026.01   0.088   0.241   0.116