Hallucination Detection on ChatProtect SC
Best F1 Score: 87 (HalluClean)
[Chart: F1 Score / Accuracy of submitted methods; entries dated Nov 12, 2025]
Evaluation Results
| Method | Configuration | Date | F1 Score (%) | Accuracy (%) |
|---|---|---|---|---|
| HalluClean | Backbone=GPT-3.5-turbo | 2025.11 | 87 | 87 |
| GPT-4o-mini | Strategy=Direct Ask | 2025.11 | 84.2 | 72.7 |
| ChatProtect | Backbone=GPT-3.5-turbo | 2025.11 | 83.8 | 84.7 |
| HalluClean | Backbone=Llama-3-70B | 2025.11 | 80.8 | 83.3 |
| HalluClean | Backbone=DeepSeek-V3 | 2025.11 | 76.1 | 80.3 |
| Step-by-Step | Backbone=GPT-3.5-turbo | 2025.11 | 68.1 | 75 |
| Plan-and-Solve | Backbone=GPT-3.5-turbo | 2025.11 | 66.4 | 73 |
| Llama-3-70B | Strategy=Direct Ask | 2025.11 | 65.8 | 73.6 |
| DeepSeek-V3 | Strategy=Direct Ask | 2025.11 | 52 | 67.3 |
| GPT-3.5-turbo | Strategy=Direct Ask | 2025.11 | 46 | 64 |
| DeepSeek-R1 | Strategy=Direct Ask | 2025.11 | 40 | 62 |
| SelfCheckGPT | Backbone=GPT-3.5-turbo | 2025.11 | 5.7 | 12 |
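The leaderboard reports F1 Score and Accuracy but does not define how they are computed. The sketch below shows the standard binary definitions, assuming each item carries a label where 1 means "hallucinated" (the positive class) and 0 means "not hallucinated", with scores reported as percentages. The actual ChatProtect SC protocol (label granularity, choice of positive class, averaging) is not stated on this page and may differ; the function name `binary_f1_and_accuracy` is hypothetical.

```python
from typing import Sequence


def binary_f1_and_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (F1, accuracy) as percentages.

    Assumption: label 1 = "hallucinated" is the positive class, 0 = "not hallucinated".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hallucinations correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean items wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hallucinations missed
    correct = sum(1 for t, p in pairs if t == p)

    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = 100 * 2 * tp / denom if denom else 0.0
    # Accuracy = fraction of items whose predicted label matches the gold label.
    accuracy = 100 * correct / len(pairs) if pairs else 0.0
    return f1, accuracy


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 0]
    # A detector that flags every item as hallucinated.
    print(binary_f1_and_accuracy(gold, [1, 1, 1, 1, 1]))  # (75.0, 60.0)
```

In this toy example, flagging everything yields F1 75.0 but accuracy only 60.0, because F1 ignores true negatives. This is one reason the two columns can diverge, as they do in several rows of the results above.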