CoT faithfulness detection on Logic-QA
[Chart: best Accuracy 69 (CIE-SCORER), May 25, 2026; metrics shown: Accuracy, F1 Score]
Updated 8d ago
Evaluation Results
Method            Paradigm          Date     Accuracy  F1 Score
CIE-SCORER        Circuit-based     2026.05  69        60.8
Adding Mistakes   Counterfactua...  2026.05  57.5      47.9
BiGGen            LLM-as-judge      2026.05  52.9      59.4
Perplexity        Baselines         2026.05  52.3      19.2
Removing Steps    Counterfactua...  2026.05  51.7      27.6
Information Gain  Logits-based      2026.05  51.7      51.2
Option Shuffling  Counterfactua...  2026.05  48.3      52.6
Paraphrasing      Counterfactua...  2026.05  42.5      47.9
Random            Baselines         2026.05  42        35.4
Early Answering   Counterfactua...  2026.05  35.2      48.6
Answer Tracing    Logits-based      2026.05  33        45.9