CoT faithfulness detection on HLE Bio
[Chart: Accuracy by method over time (also viewable as F1 Score); top result CIE-SCORER, 78 accuracy, May 25, 2026]
Evaluation Results
Method            Paradigm          Date     Accuracy  F1 Score
CIE-SCORER        Circuit-based     2026.05  78        79.7
BiGGen            LLM-as-judge      2026.05  70.2      69.2
Answer Tracing    Logits-based      2026.05  64.3      76.2
Information Gain  Logits-based      2026.05  52.5      9.5
Perplexity        Baselines         2026.05  47.4      52.4
Adding Mistakes   Counterfactua...  2026.05  46.4      51.6
Early Answering   Counterfactua...  2026.05  46.4      48.3
Paraphrasing      Counterfactua...  2026.05  45.7      40
Option Shuffling  Counterfactua...  2026.05  40        14.3
Removing Steps    Counterfactua...  2026.05  39.3      37
Random            Baselines         2026.05  35.7      43.8
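The results table reports two metrics per method, Accuracy and F1 Score, and the two can rank methods differently. The following is a minimal sketch of how these metrics are commonly computed for a binary detection task. It assumes each chain of thought carries a label of faithful (0) or unfaithful (1) and that "unfaithful" is the positive class for F1; the page does not state the label encoding or which class F1 is computed over, so both are assumptions. The helper names `accuracy` and `f1_score` are illustrative and are not taken from the benchmark's code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of examples where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for `positive`: 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.

    Returns 0.0 when there are no true or predicted positives (0/0 case),
    mirroring a common zero-division convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical labels: 1 = unfaithful CoT (assumed positive class), 0 = faithful.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    # A detector that rarely flags unfaithfulness: high agreement on the
    # majority class, but it misses most positives.
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    # Scale by 100 to match the leaderboard's number style.
    print(f"Accuracy: {100 * accuracy(y_true, y_pred):.1f}")  # Accuracy: 70.0
    print(f"F1 Score: {100 * f1_score(y_true, y_pred):.1f}")  # F1 Score: 40.0
```

In this toy example the detector is right on 7 of 10 items, yet it catches only 1 of 4 positives, so F1 (40.0) falls well below accuracy (70.0). This is one general reason the two columns need not move together; it is not a claim about how any specific method in the table behaves.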