Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

EndoCogniAgent: Closed-Loop Agentic Reasoning with Self-Consistency Validation for Endoscopic Diagnosis

About

Endoscopic diagnosis is an iterative process in which clinicians progressively acquire, compare, and verify local visual evidence before reaching a conclusion. Current AI systems do not adequately support this process because fine-grained evidence acquisition and multi-step reasoning remain weakly coupled. This gives rise to two failure modes, hallucinated evidence and uncorrected error accumulation, that undermine diagnostic reliability. We propose EndoCogniAgent, a closed-loop agentic framework that formulates endoscopic diagnosis as a controlled state update process. At each reasoning round, a central planner selects the next evidence acquisition action, specialized expert tools extract the corresponding observation, and a self-consistency validation mechanism examines the observation along two dimensions, knowledge consistency against the input image and temporal consistency with prior validated findings, before updating the diagnostic state. Validated observations are admitted into the evolving state to condition subsequent planning, while insufficiently supported findings are retained with corrective feedback that redirects the planner toward additional verification. We further introduce EndoAgentBench, a workflow-oriented benchmark comprising 6,132 question-answer pairs from 11 endoscopic datasets, designed to evaluate diagnostic agents across a comprehensive diagnostic chain, from fine-grained visual perception to high-level diagnostic reasoning. Experiments show that EndoCogniAgent achieves 85.23\% average accuracy on perception tasks and 71.13\% clinical acceptance rate on reasoning tasks, with ablation analysis confirming that self-consistency validation and episodic state maintenance are individually critical to these gains.

Yi Tang, Kai-Ni Wang, Yang Chen, Xiaopu He, Guangquan Zhou• 2025

Related benchmarks

TaskDatasetResultRank
Fine-Grained PerceptionEndoAgentBench
Localization Classification Accuracy75.29
8
High-level Diagnostic ReasoningEndoAgentBench
CAP (CAR)64
8
Showing 2 of 2 rows

Other info

Follow for update