VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection

About

A standard technique for scaling inference-time reasoning is Self-Consistency, whereby multiple candidate answers are sampled from an LLM and the most common answer is selected. More recently, it has been shown that weighted majority voting (e.g., Confidence-Informed Self-Consistency (CISC)), which assigns a confidence value to each candidate answer and chooses the answer with the largest accumulated score, tends to be more accurate on a wide range of popular benchmarks. In practice, weighted majority voting requires calling a critic LLM on each candidate's reasoning trace to produce the answer's confidence score. This secondary series of LLM calls greatly increases the overhead and cost of weighted majority voting, despite its potential performance benefits. To reduce this expense, we propose VecCISC, a lightweight, adaptive framework that uses a measure of semantic similarity to filter out reasoning traces that are semantically equivalent to others, degenerate, or hallucinated, thus decreasing the number of candidate answers that must be evaluated by the critic. We evaluate VecCISC on five challenging, widely adopted datasets spanning the domains of mathematics, chemistry, biology, commonsense reasoning, and the humanities. Our results demonstrate that VecCISC reduces total token usage by 47% while maintaining or exceeding the accuracy of CISC.
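
The voting schemes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the greedy cosine-similarity filter, and the 0.95 threshold are all assumptions made for illustration, and the embeddings are toy vectors rather than the output of any particular embedding model.

```python
from collections import Counter, defaultdict
import math


def majority_vote(answers):
    """Self-Consistency: return the most frequent candidate answer."""
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(answers, confidences):
    """Weighted majority voting: sum the confidence scores per answer
    and return the answer with the largest accumulated score."""
    totals = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        totals[answer] += conf
    return max(totals, key=totals.get)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_near_duplicates(embeddings, threshold=0.95):
    """Hypothetical filter: keep the index of a reasoning trace only if its
    embedding is below `threshold` similarity to every trace already kept.
    Only the kept traces would then be sent to the critic LLM."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy example: five sampled answers with made-up critic confidences.
answers = ["A", "A", "B", "A", "B"]
confidences = [0.2, 0.3, 0.9, 0.2, 0.8]
plain_choice = majority_vote(answers)                  # "A" wins 3 votes to 2
weighted_choice = weighted_vote(answers, confidences)  # "B" wins 1.7 to 0.7

# Toy embeddings: the second trace is nearly identical to the first.
kept_indices = filter_near_duplicates([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
```

In this toy setting the filter drops one of three traces, so the critic would be called twice instead of three times. How the filtered traces are accounted for in the final vote is not specified in this summary, so the sketch leaves that step out.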

James Petullo, Sonny George, Dylan Cashman, Nianwen Xue • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	AQUA-RAT	Accuracy 87.7	183
Multi-task Language Understanding	MMLU-Pro	Best Accuracy 71.4	25
Expert-Level Question Answering	GPQA	Best Accuracy 61.7	25
Science Question Answering	ARC Challenging	Best Accuracy 96.3	25
Commonsense Reasoning	CommonsenseQA	LLMcritic Calls 15.54	10
Expert-Level Question Answering	GPQA	LLMcritic Calls 17.27	5
Expert-level Science Reasoning	GPQA	LLMcritic Calls 18.81	5
Massive Multitask Language Understanding	MMLU-Pro	LLMcritic Calls 17.47	5
Question Answering	ARC Challenging	LLMcritic Calls 15.65	5
Science Question Answering	ARC Challenging	LLMcritic Calls 15.65	5
