Faithfulness Evaluation on ArXiv (test)

Top result: o3, 53.58 SummaC


Evaluation Results

Method	Date	SummaC	Second metric (unlabeled)
o3	2025.12	53.58	85.22
GPT-5	2025.12	45.9	85.55
o1	2025.12	44.24	27.61
Vanilla	2025.12	43.36	26.07
Extract-to-Abstract (E2A)	2025.12	42.96	8.68
Cited Summarization (Cite)	2025.12	41.79	5.99
Self-Consistency (SC)	2025.12	40.68	8
Chain-of-Thought (COT)	2025.12	40.54	7.46
Decomposition (Deco)	2025.12	40.37	17.88
Question-Answer Guided (QAG)	2025.12	40.37	26.94
Iterative Refine (IR)	2025.12	39.87	5.65
Plan-then-Write (Plan)	2025.12	39.09	9.52
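A minimal sketch for reading the SummaC column relative to the Vanilla row. It assumes that the first numeric column is SummaC (it matches the 53.58 headline for o3) and treats Vanilla as the reference point. That second assumption is ours: the page does not say which base model the prompting strategies use. The names `SUMMAC` and `delta_vs_baseline` are hypothetical.

```python
# SummaC scores copied from the results table (first numeric column).
# Assumption: Vanilla is used as the reference row; the page does not state this.
SUMMAC = {
    "o3": 53.58,
    "GPT-5": 45.9,
    "o1": 44.24,
    "Vanilla": 43.36,
    "Extract-to-Abstract (E2A)": 42.96,
    "Cited Summarization (Cite)": 41.79,
    "Self-Consistency (SC)": 40.68,
    "Chain-of-Thought (COT)": 40.54,
    "Decomposition (Deco)": 40.37,
    "Question-Answer Guided (QAG)": 40.37,
    "Iterative Refine (IR)": 39.87,
    "Plan-then-Write (Plan)": 39.09,
}


def delta_vs_baseline(scores: dict[str, float], baseline: str = "Vanilla") -> dict[str, float]:
    """Return each method's score minus the baseline score, rounded to 2 decimals.

    The baseline row itself is excluded from the result.
    """
    base = scores[baseline]
    return {m: round(s - base, 2) for m, s in scores.items() if m != baseline}


if __name__ == "__main__":
    deltas = delta_vs_baseline(SUMMAC)
    # Print methods from largest to smallest gain over the reference row.
    for method, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
        print(f"{method}: {delta:+.2f}")
```

On these numbers, o3, GPT-5 and o1 score above the Vanilla row on SummaC. Every other listed prompting strategy scores below it, ranging from -0.40 (E2A) to -4.27 (Plan).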