CoT faithfulness detection on AQuA
[Chart: Accuracy (CoT Faithfulness). Labeled point: CIE-SCORER, 77, May 25, 2026. Metrics shown: Accuracy (CoT Faithfulness), F1 Score (CoT Faithfulness).]
Updated 8d ago
Evaluation Results
| Method | Paradigm | Date | Accuracy (CoT Faithfulness) | F1 Score (CoT Faithfulness) |
|---|---|---|---|---|
| CIE-SCORER | Circuit-based | 2026.05 | 77 | 72.8 |
| BiGGen | LLM-as-judge | 2026.05 | 73.7 | 70.3 |
| Removing Steps | Counterfactua... | 2026.05 | 72.4 | 46.2 |
| Early Answering | Counterfactua... | 2026.05 | 72.4 | 53.3 |
| Adding Mistakes | Counterfactua... | 2026.05 | 71.6 | 66.7 |
| Paraphrasing | Counterfactua... | 2026.05 | 68.4 | 42.9 |
| CRV | Circuit-based | 2026.05 | 63 | 62.8 |
| Option Shuffling | Counterfactua... | 2026.05 | 61 | 16.7 |
| Answer Tracing | Logits-based | 2026.05 | 53.2 | 30.8 |
| Perplexity | Baselines | 2026.05 | 49.4 | 36.1 |
| Random | Baselines | 2026.05 | 43.2 | 37.4 |
| Information Gain | Logits-based | 2026.05 | 41.6 | 20.2 |