Chain of Simulation: A Dual-Mode Reasoning Framework for Large Language Models with Dynamic Problem Routing
About
We present Chain of Simulation (CoS), a novel dual-mode reasoning framework that dynamically routes problems to specialized reasoning strategies in Large Language Models (LLMs). Unlike existing uniform prompting approaches, CoS employs three distinct reasoning modes: (1) computational flow with self-consistency for mathematical problems, (2) symbolic state tracking with JSON representations for spatial reasoning, and (3) hybrid fact-extraction for multi-hop inference. Through comprehensive evaluation on GSM8K, StrategyQA, and bAbI benchmarks using four state-of-the-art models (Gemma-3 27B, LLaMA-3.1 8B, Mistral 7B, and Qwen-2.5 14B), we demonstrate that CoS achieves 71.5% accuracy on GSM8K (1.0% absolute improvement), 90.0% on StrategyQA (2.5% improvement), and 19.0% on bAbI (65.2% relative improvement) compared to the strongest baselines. The analysis reveals that problem-specific mode selection is crucial, with computational mode achieving 81.2% accuracy when correctly applied to mathematical problems, while misrouting leads to 0% accuracy. We provide detailed algorithms for mode selection, state tracking, and answer extraction, establishing CoS as an effective approach for improving LLM reasoning without additional training. The framework provides superior trade-offs between accuracy and efficiency compared to Self-Consistency, achieving comparable performance at 54% lower computational cost.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K (test) | Accuracy84 | 797 | |
| Reasoning | StrategyQA (test) | Factuality Acc96 | 28 | |
| Spatial Reasoning | bAbI (test) | Accuracy22 | 20 | |
| Grade school mathematics reasoning | GSM8K sampled 50 (test) | Accuracy71.5 | 5 | |
| Multi-hop commonsense reasoning | StrategyQA sampled 50 (val) | Accuracy90 | 5 | |
| Spatial Reasoning | bAbI Task 1 qa1 50 samples | Accuracy19 | 5 |