Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach
About
We study impartial games under fixed-latency, fixed-scale quantised inference (FSQI). In this fixed-scale, bounded-range regime, we prove that inference is simulable by constant-depth polynomial-size Boolean circuits (AC0). This yields a worst-case representational barrier: single-frame agents in the FSQI/AC0 regime cannot strongly master NIM, because optimal play depends on the global nim-sum (parity). Under our stylised deterministic rollout interface, a single rollout policy head from the structured family analysed here reveals only one fixed linear functional of the invariant, so increasing rollout budget alone does not recover the missing bits. We derive two structural bypasses: (1) a multi-policy-head rollout architecture that recovers the full invariant via distinct rollout channels, and (2) a multi-frame architecture that tracks local nimber differences and supports restoration. Experiments across multiple settings are consistent with these predictions: single-head baselines stay near chance, while two-frame models reach near-perfect restoration accuracy and multi-head FSM-controlled shootouts achieve perfect win/loss position classification. Overall, the empirical results support the view that explicit structural priors (history/differences or multiple rollout channels) are important in the FSQI/AC0 regime.
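The invariant the abstract refers to can be made concrete with a short sketch. It assumes the normal-play convention (the player who takes the last object wins), which the abstract does not state explicitly, and the function names `nim_sum`, `is_p_position`, and `restoring_move` are illustrative rather than taken from the paper. It shows only the classical nim-sum rule that the agents must represent, not the multi-frame or multi-head architectures themselves.

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """Bitwise XOR of all heap sizes: one parity bit per binary digit."""
    return reduce(xor, heaps, 0)


def is_p_position(heaps):
    """Under normal play (assumed here), a position is a P-position
    (previous player wins) exactly when the nim-sum is zero."""
    return nim_sum(heaps) == 0


def restoring_move(heaps):
    """Return (heap_index, new_size) for a move that restores nim-sum 0,
    or None if the position is already a P-position."""
    s = nim_sum(heaps)
    if s == 0:
        return None
    for i, h in enumerate(heaps):
        target = h ^ s
        # A legal move must strictly reduce the chosen heap.
        if target < h:
            return i, target
    return None  # not reached when s != 0: some heap has the top bit of s set


if __name__ == "__main__":
    heaps = [3, 4, 5]
    print(nim_sum(heaps), is_p_position(heaps), restoring_move(heaps))
```

Every bit of the nim-sum is a parity over all heaps, which is why the abstract describes the obstacle for single-frame FSQI/AC0 agents as a global parity computation.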
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| P/N position classification | NIM (N=20, k=4) single-frame | Shootouts (avg): 2.87 | 4 |