
Parallel Test-Time Scaling with Multi-Sequence Verifiers

About

Parallel test-time scaling, which generates multiple candidate solutions for a single problem, is a powerful technique for improving large language model performance. However, it is hindered by two key bottlenecks: accurately selecting the correct solution from the candidate pool, and the high inference latency of generating many full solutions. We argue that both challenges are fundamentally linked to verifier calibration: a well-calibrated verifier not only improves answer selection but also enables early-stopping strategies that reduce latency. Existing verifiers, however, score each candidate in isolation, overlooking rich contextual information across the set of candidates. To address this, we introduce the Multi-Sequence Verifier (MSV), the first verifier designed to jointly process all candidate solutions and model their interactions. MSV achieves improved calibration, which directly enhances best-of-N selection performance. We further introduce a streaming MSV variant that enables a new early-stopping framework. This framework fully leverages parallel decoding, in contrast to existing multi-sequence early-exit methods, which decode sequences one by one and thus incur significant latency. In this setting, MSV reaches the same target accuracy with around half the latency required by a counterpart that scores each solution in isolation.
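To make the two uses of verifier scores concrete, here is a minimal sketch of best-of-N selection and of a threshold-based early stop. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify MSV's architecture or its stopping rule, so the scores are treated as given inputs, and the names `best_of_n`, `early_stop_step`, and the confidence `threshold` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def best_of_n(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Return the answer of the highest-scoring candidate.

    `scores` are verifier confidences supplied by the caller (assumed input;
    the verifier itself is not modeled here).
    """
    if not answers or len(answers) != len(scores):
        raise ValueError("answers and scores must be non-empty and equal length")
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def early_stop_step(
    score_stream: Sequence[Sequence[float]], threshold: float
) -> Tuple[Optional[int], Optional[int]]:
    """Find the first decoding step at which some candidate is confident enough.

    score_stream[t][i] is a hypothetical streaming score for candidate i at
    step t, with all candidates decoded in parallel. Returns (step, candidate
    index), or (None, None) if the threshold is never reached.
    """
    for step, step_scores in enumerate(score_stream):
        best_index = max(range(len(step_scores)), key=lambda i: step_scores[i])
        if step_scores[best_index] >= threshold:
            return step, best_index
    return None, None


if __name__ == "__main__":
    # Made-up scores for three candidates, purely for illustration.
    print(best_of_n(["12", "15", "12"], [0.41, 0.87, 0.35]))
    stream = [[0.2, 0.3, 0.1], [0.4, 0.6, 0.3], [0.5, 0.93, 0.2]]
    print(early_stop_step(stream, threshold=0.9))
```

A fixed threshold of this kind is only meaningful if the scores are calibrated, since a score of 0.9 must then correspond to roughly a 90% chance of being correct. This is why calibration affects both selection and latency.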

Yegon Kim, Seungyoo Lee, Chaeyun Jang, Hyungi Lee, Juho Lee • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | MATH | Accuracy 37.78 | 882 |
| Mathematical Reasoning | AIME | AIME Accuracy 66.25 | 288 |
| Mathematical Problem Solving | AIME | AIME Score 1.29e+3 | 52 |
| Answer Verification | MATH | AUROC 0.9128 | 43 |
| Mathematical Reasoning | Omni-MATH | ECE 0.0883 | 28 |
| Mathematical Reasoning | MATH (test) | Latency (s) 69.2 | 26 |
| Mathematical Reasoning | AIME | ECE 1.65 | 23 |
| Answer Verification | AMC12 | AUROC 91.05 | 22 |
| Mathematical Reasoning | AMC12 | Expected Calibration Error (ECE) 0.0563 | 22 |
| Mathematical Reasoning | MATH | -- | 20 |
Showing 10 of 50 rows
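
Several rows above report Expected Calibration Error (ECE). As a reference point, the sketch below computes the standard equal-width-bin ECE from verifier confidences and correctness labels. The page does not state which binning scheme or number of bins the paper uses, so `num_bins=10` and the function name are assumptions; the function returns a fraction in [0, 1].

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], num_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * num_bins
    correct_sum = [0.0] * num_bins
    count = [0] * num_bins

    for conf, is_correct in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than out of range.
        index = min(int(conf * num_bins), num_bins - 1)
        conf_sum[index] += conf
        correct_sum[index] += 1.0 if is_correct else 0.0
        count[index] += 1

    total = len(confidences)
    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        mean_conf = conf_sum[b] / count[b]
        accuracy = correct_sum[b] / count[b]
        ece += (count[b] / total) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Made-up example: confident and right, unconfident and wrong.
    print(expected_calibration_error([0.9, 0.9, 0.1, 0.1], [True, True, False, False]))
```

Lower values indicate that the verifier's stated confidence tracks its actual accuracy more closely.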
