
Faithfulness detection on ProcessBench

83.2 F1 Score

GPT-o1

[Chart: F1 score by date; latest point dated May 26, 2026]

Evaluation Results

Method    Date      F1 Score
GPT-o1    2026.05   83.2
-         2026.05   82.9
-         2026.05   73.8
-         2026.05   65.7
-         -         63.5
-         -         53.8
-         -         49.8
-         -         47.3
-         2026.05   23.2
-         2026.05   17.9
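The page does not say how its F1 score is computed. The original ProcessBench paper reports F1 as the harmonic mean of accuracy on erroneous solutions (predicting the earliest wrong step) and accuracy on fully correct solutions. Assuming this leaderboard follows that convention, a minimal sketch of the metric might look like the following. The function names and the label encoding (-1 for "no error") are assumptions for illustration, not this leaderboard's code.

```python
def harmonic_f1(a: float, b: float) -> float:
    """Harmonic mean of two rates in [0, 1]; returns 0.0 if both are zero."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)


def processbench_style_f1(preds: list[int], labels: list[int]) -> float:
    """Hypothetical ProcessBench-style F1.

    preds, labels: predicted and gold index of the earliest erroneous step
    for each solution. Assumed encoding: -1 means the solution has no error.
    """
    pairs = list(zip(preds, labels))
    # Split into solutions that contain an error and solutions that are correct.
    erroneous = [(p, g) for p, g in pairs if g != -1]
    correct = [(p, g) for p, g in pairs if g == -1]
    # A prediction counts only if it matches the gold index exactly.
    acc_err = sum(p == g for p, g in erroneous) / len(erroneous) if erroneous else 0.0
    acc_ok = sum(p == g for p, g in correct) / len(correct) if correct else 0.0
    return harmonic_f1(acc_err, acc_ok)


if __name__ == "__main__":
    preds = [2, -1, 3, -1]
    labels = [2, -1, 1, -1]
    # Accuracy on erroneous samples is 0.5, on correct samples 1.0.
    print(round(100 * processbench_style_f1(preds, labels), 1))  # 66.7
```

Using the harmonic mean penalizes a detector that scores well on only one subset, for example one that always answers "no error".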