Claim Verification on LLMAggreFact (test)
Top result: 78.1 Binary Accuracy (ThinknCheck)
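The page reports Binary Accuracy but does not define it. Below is a minimal sketch that assumes it means plain accuracy over binary supported/unsupported claim labels, expressed as a percentage. The function name `binary_accuracy` and the toy labels are hypothetical illustrations, not the benchmark's official scorer.

```python
from typing import Sequence


def binary_accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Percentage of claims whose predicted label (True = supported) matches the gold label.

    Assumption: Binary Accuracy is plain accuracy over binary claim labels.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled claim")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy example: 3 of 4 claims classified correctly.
preds = [True, False, True, True]
gold = [True, False, False, True]
print(binary_accuracy(preds, gold))  # 75.0
```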
[Chart: Binary Accuracy by submission date; plotted results dated Apr 2, 2026]
Evaluation Results
Method               Setting (as listed)        Date     Binary Accuracy
ThinknCheck          Model Size=1B, Precisi...  2026.04  78.1
MiniCheck            Model Size=7B, Precisi...  2026.04  77.4
Claude-Sonnet-3.5    Evaluation Protocol=ze...  2026.04  77.2
GPT-4o               Evaluation Protocol=ze...  2026.04  75.9
GPT-4                Evaluation Protocol=ze...  2026.04  75.3
AlignScore           Model Size=355M, Preci...  2026.04  70.4
ThinknCheck-nothink  Model Size=1B, Precisi...  2026.04  57.5
Gemma3               Model Size=1B, Precisi...  2026.04  55.7
Gemma3 + CoT         Model Size=1B, Precisi...  2026.04  51.4
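The gaps between related entries in the table can be checked with simple arithmetic. This sketch copies the Binary Accuracy values from the table and computes a few pairwise differences. The helper name `delta` is hypothetical. The page does not explain what the `-nothink` suffix means; the name only suggests a variant of ThinknCheck with its reasoning step disabled.

```python
# Binary Accuracy values copied from the leaderboard table above.
RESULTS: dict[str, float] = {
    "ThinknCheck": 78.1,
    "MiniCheck": 77.4,
    "Claude-Sonnet-3.5": 77.2,
    "GPT-4o": 75.9,
    "GPT-4": 75.3,
    "AlignScore": 70.4,
    "ThinknCheck-nothink": 57.5,
    "Gemma3": 55.7,
    "Gemma3 + CoT": 51.4,
}


def delta(a: str, b: str) -> float:
    """Difference RESULTS[a] - RESULTS[b] in accuracy points, rounded to one decimal."""
    return round(RESULTS[a] - RESULTS[b], 1)


# Rank methods from best to worst score.
ranking = sorted(RESULTS, key=lambda m: RESULTS[m], reverse=True)
print(ranking[0])                                   # ThinknCheck
print(delta("ThinknCheck", "ThinknCheck-nothink"))  # 20.6
print(delta("Gemma3 + CoT", "Gemma3"))              # -4.3
print(delta("ThinknCheck", "MiniCheck"))            # 0.7
```

On these numbers, the 1B ThinknCheck scores 0.7 points above the 7B MiniCheck, and the Gemma3 + CoT entry scores 4.3 points below plain Gemma3.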