Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling

About

Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations, such as positive early exit, reduce latency in favorable cases but are less effective when the search continues without making meaningful progress. We introduce negative early exit, which prunes unproductive MCTS trajectories, and an adaptive boosting mechanism that reallocates the reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.
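The abstract does not specify the exit criteria or the reallocation policy, so the following is only a minimal, hypothetical sketch of how the two ideas could fit together. It assumes that each search reports a scalar "best value so far" per iteration (the tree search itself is abstracted away as a toy value function), that positive early exit fires when that value crosses a threshold, that negative early exit fires after a fixed number of iterations without meaningful improvement, and that "boosting" simply means splitting a fixed pool of parallel rollout slots evenly among the searches still running. All names (`SearchState`, `exit_check`, `allocate_slots`, `run_concurrent`) and all constants are illustrative assumptions, not the authors' method or any vLLM API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical tuning knobs: illustrative assumptions, not values from the paper.
ACCEPT_THRESHOLD = 0.9   # positive early exit: answer judged good enough
PATIENCE = 5             # negative early exit: iterations allowed without progress
MIN_GAIN = 1e-3          # smallest improvement that counts as progress
MAX_ITERS = 64           # hard cap on search iterations per request


@dataclass
class SearchState:
    """Progress record for one (hypothetical) MCTS search serving one request."""
    name: str
    best_value: float = float("-inf")
    stale_iters: int = 0
    iters: int = 0
    exit_reason: Optional[str] = None


def exit_check(state: SearchState, new_value: float) -> Optional[str]:
    """Record this iteration's best value; return an exit reason, or None to continue."""
    state.iters += 1
    if new_value > state.best_value + MIN_GAIN:
        state.best_value, state.stale_iters = new_value, 0
    else:
        state.stale_iters += 1
    if state.best_value >= ACCEPT_THRESHOLD:
        return "positive"   # found a good answer early
    if state.stale_iters >= PATIENCE:
        return "negative"   # no meaningful progress: prune this search
    if state.iters >= MAX_ITERS:
        return "budget"
    return None


def allocate_slots(active: List[SearchState], total_slots: int) -> Dict[str, int]:
    """Split a fixed pool of parallel rollout slots evenly among active searches.

    Slots freed by exited searches flow to the survivors. This is a simple
    stand-in for adaptive boosting, whose actual policy is not described here.
    """
    if not active:
        return {}
    base, extra = divmod(total_slots, len(active))
    return {s.name: base + (1 if i < extra else 0) for i, s in enumerate(active)}


ValueFn = Callable[[int, int], float]  # (iteration, slots) -> best value this step


def run_concurrent(searches: Dict[str, ValueFn], total_slots: int = 8) -> List[SearchState]:
    """Step all searches in lockstep, reallocating slots each step, until all exit."""
    states = [SearchState(name) for name in searches]
    active = list(states)
    while active:
        slots = allocate_slots(active, total_slots)
        survivors = []
        for s in active:
            reason = exit_check(s, searches[s.name](s.iters, slots[s.name]))
            if reason is None:
                survivors.append(s)
            else:
                s.exit_reason = reason
        active = survivors
    return states


if __name__ == "__main__":
    demo = {
        # Toy search whose progress scales with the slots it receives.
        "slow": lambda it, slots: min(1.0, 0.02 * (it + 1) * slots),
        # Toy search stuck on a plateau no matter how much compute it gets.
        "stuck": lambda it, slots: 0.3,
    }
    for st in run_concurrent(demo):
        print(st.name, st.exit_reason, st.iters)
```

In this toy run, "stuck" is pruned by the negative exit after 6 iterations, and the 4 slots it releases go to "slow", which then reaches the acceptance threshold at iteration 7; with a fixed 4-slot share, the same toy search would need 12 iterations. An even split is the simplest possible reallocation rule; a real system would presumably weigh searches by expected benefit and by GPU contention.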

Hongbeen Kim, Juhyun Lee, Sanghyeon Lee, Kwanghoon Choi, Jaehyuk Huh • 2026

Related benchmarks

Task                     Dataset   Metric         Result   Rank
Mathematical Reasoning   MATH500   Accuracy       87.6     50
Mathematical Reasoning   AMC 23    Accuracy (%)   65       8
