Faithfulness detection on In-domain Step-level Benchmark Knowledge
[Chart: FF1 and UF1 over time. Highlighted point: GeoFaith, 83.4 FF1, May 26, 2026.]
Evaluation Results
Method                   Date      FF1    UF1
GeoFaith                 2026.05   83.4   70.3
GPT-o1                   2026.05   81.2   72.3
DeepSeek-V3              2026.05   79.3   62.5
Qwen2.5-32B-Instruct     2026.05   77     63.7
GPT-4o                   2026.05   76.1   59.8
Llama-3.1-70B-Instruct   2026.05   75.8   52.2
o3-mini                  2026.05   74.8   57.6
LogicReward              2026.05   73.5   53.2
FaithLens                2026.05   69.7   42.3
HHEM2.1                  2026.05   54.2   40.1