Beyond Majority Voting: Efficient Best-Of-N with Radial Consensus Score

About

Large language models (LLMs) frequently generate multiple candidate responses for a given prompt, yet selecting the most reliable one remains challenging, especially when correctness diverges from surface-level majority agreement. Existing approaches such as self-consistency rely on discrete voting, while probability-based methods often fail to capture relationships among candidate answers or tend to underweight high-quality but less frequent responses; these methods also do not fully leverage the geometric structure of answer representations. To address these limitations, we introduce Radial Consensus Score (RCS), a simple, efficient, and training-free method for best-of-N selection. RCS models semantic consensus by computing a weighted Fréchet mean (the semantic center) of answer embeddings and ranking candidates by their radial distance to this center. Importantly, RCS provides a general framework that supports multiple weighting schemes, including uniform, frequency-based, and probability-based variants, enabling flexible integration of agreement signals and model confidence while remaining fully applicable in black-box settings. Extensive experiments across seven benchmarks covering short-form QA and long-form reasoning tasks, and five open-weight models, demonstrate that RCS variants consistently outperform strong baselines, with gains becoming more pronounced as the sampling budget increases. RCS also serves as an effective drop-in replacement for majority voting in multi-agent debate and exhibits strong robustness in black-box scenarios. Overall, these results highlight geometric consensus as a scalable and broadly applicable principle for reliable answer selection, extending beyond majority voting to more expressive and robust aggregation in LLM inference.
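
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes embeddings live in Euclidean space, where the weighted Fréchet mean reduces to the weighted average, and it assumes that frequency weighting means weighting each candidate by how often its answer string occurs. The function names `weighted_center` and `radial_consensus_select`, the `weighting` and `probs` parameters, and the toy embeddings are all hypothetical; the abstract does not specify the embedding model or the distance.

```python
import math
from collections import Counter


def weighted_center(embeddings, weights):
    """Weighted mean of the embeddings.

    Assumption: squared Euclidean distance, under which the weighted
    Frechet mean is exactly the weighted arithmetic mean.
    """
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * e[d] for w, e in zip(weights, embeddings)) / total
        for d in range(dim)
    ]


def radial_consensus_select(answers, embeddings, weighting="uniform", probs=None):
    """Return (index of the candidate closest to the center, all distances).

    The three weighting names mirror the variants listed in the abstract;
    how each weight is computed here is an illustrative assumption.
    """
    if weighting == "uniform":
        weights = [1.0] * len(answers)
    elif weighting == "frequency":
        # Assumed scheme: weight = number of candidates sharing the answer.
        counts = Counter(answers)
        weights = [float(counts[a]) for a in answers]
    elif weighting == "probability":
        if probs is None:
            raise ValueError("probability weighting needs per-answer probabilities")
        weights = [float(p) for p in probs]
    else:
        raise ValueError(f"unknown weighting: {weighting}")

    center = weighted_center(embeddings, weights)
    # Radial distance of every candidate to the semantic center.
    distances = [math.dist(e, center) for e in embeddings]
    best = min(range(len(answers)), key=distances.__getitem__)
    return best, distances


# Toy data: three nearby candidates and one outlier (made-up 2-D embeddings).
answers = ["A", "A", "A", "B"]
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.0], [5.0, 5.0]]
best, distances = radial_consensus_select(answers, embeddings, weighting="frequency")
print(answers[best], [round(d, 3) for d in distances])
```

In this toy case the outlier ends up farthest from the center, so it ranks last. Only the uniform and frequency variants need nothing beyond the sampled answers, which is consistent with the black-box applicability the abstract claims.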

Manh Nguyen, Sunil Gupta, Hung Le • 2026

Related benchmarks

Task                       | Dataset      | Result               | Rank
Arithmetic Reasoning       | Arithmetics  | Accuracy 98.3        | 106
Logical Reasoning          | Formal Logic | Accuracy 52.9        | 106
Math Word Problem Solving  | GSM8K        | Accuracy 84.1        | 87
Science Question Answering | SciQ         | Accuracy (SciQ) 74.2 | 52