Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning
About
Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement learning technique to learn switching thresholds between two orthogonal navigation policies. Using maze navigation as a case study, this work demonstrates how an agent can dynamically transition between systematic exploration (coverage) and goal-directed pathfinding (convergence) to improve task performance. Unlike fixed-threshold approaches, the agent uses Q-learning to adapt switching behavior based on coverage percentage and distance to goal, requiring only minimal domain knowledge: maze dimensions and target location. The agent does not require prior knowledge of wall positions, optimal threshold values, or hand-crafted heuristics; instead, it discovers effective switching strategies dynamically during each run. The agent discretizes its state space into coverage and distance buckets, then adapts which coverage threshold (20-60\%) to apply based on observed progress signals. Experiments across 240 test configurations (4 maze sizes from 16$\times$16 to 128$\times$128 $\times$ 10 unique mazes $\times$ 6 agent variants) demonstrate that adaptive threshold learning outperforms both single-strategy agents and fixed 40\% threshold baselines. Results show 23-55\% improvements in completion time, 83\% reduction in runtime variance, and 71\% improvement in worst-case scenarios. The learned switching behavior generalizes within each size class to unseen wall configurations. Performance gains scale with problem complexity: 23\% improvement for 16$\times$16 mazes, 34\% for 32$\times$32, and 55\% for 64$\times$64, demonstrating that as the space of possible maze structures grows, the value of adaptive policy selection over fixed heuristics increases proportionally.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Maze Navigation | 64x64 Mazes | Mean Time160.6 | 6 | |
| Maze Navigation | 16x16 Small Mazes pseudo-randomly generated (test) | Mean Completion Time11 | 6 | |
| Maze Navigation | 32x32 Small Mazes (test) | Mean Completion Time39.6 | 6 | |
| Maze Navigation | 128x128 Mazes | Best Time (s)322 | 6 |