A Scalable Approach to Solving Simulation-Based Network Security Games

About

We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per-node structural embeddings to rapidly score and select a small subset of devices (a top-k partition), on which a conventional low-level actor performs a focused beam search guided by a critic agent. Selected candidate actions are evaluated with batched critic forward passes and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than state-of-the-art baselines on large network topologies, without significant scaling issues in memory usage or training time. This contribution provides a practical, theoretically motivated path to efficient hierarchical policy learning for large-scale networked decision problems.

Michael Lanier, Yevgeniy Vorobeychik • 2026
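
The pipeline described in the abstract (score devices, keep a top-k partition, evaluate candidate actions in one batched critic call, cache the results, and invalidate entries within k hops of a change) can be illustrated with a small sketch. This is not the authors' implementation: every name here (`top_k_devices`, `quantize`, `QCache`, `k_hop_neighbors`, `evaluate`), the quantization step of 0.1, the cache key layout, and the toy critic are assumptions made for illustration only.

```python
from collections import OrderedDict
import heapq


def top_k_devices(scores, k):
    """Return the ids of the k highest-scoring devices (a top-k partition)."""
    return heapq.nlargest(k, scores, key=scores.get)


def quantize(projection, step=0.1):
    """Snap each coordinate to a grid so nearby projections share a cache key.

    The step size is an assumed hyperparameter, not a value from the paper.
    """
    return tuple(round(x / step) for x in projection)


class QCache:
    """LRU cache of critic values keyed by (state_key, device, action)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def invalidate(self, devices):
        """Drop every entry whose device is in `devices`; return the count."""
        stale = [key for key in self.entries if key[1] in devices]
        for key in stale:
            del self.entries[key]
        return len(stale)


def k_hop_neighbors(adjacency, source, hops):
    """All devices within `hops` edges of `source`, including `source`."""
    seen = {source}
    frontier = {source}
    for _ in range(hops):
        frontier = {n for d in frontier for n in adjacency.get(d, ())} - seen
        seen |= frontier
    return seen


def evaluate(cache, state_key, candidates, critic_batch):
    """Score (device, action) pairs with one critic call for all cache misses."""
    keys = [(state_key, device, action) for device, action in candidates]
    values = {key: cache.get(key) for key in keys}
    misses = [key for key, value in values.items() if value is None]
    if misses:
        for key, value in zip(misses, critic_batch(misses)):
            cache.put(key, value)
            values[key] = value
    return [values[key] for key in keys]


if __name__ == "__main__":
    # Toy 4-device line graph and made-up device scores.
    adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    scores = {"a": 0.9, "b": 0.1, "c": 0.7, "d": 0.3}

    def toy_critic(keys):
        # Stand-in for a batched critic forward pass.
        return [float(len(action)) for _, _, action in keys]

    cache = QCache(capacity=100)
    state_key = quantize([0.12, 0.48])
    candidates = [(d, "patch") for d in top_k_devices(scores, k=2)]
    print(evaluate(cache, state_key, candidates, toy_critic))
    print(evaluate(cache, state_key, candidates, toy_critic))  # cache hits only
    print(cache.invalidate(k_hop_neighbors(adjacency, "b", hops=1)))
```

In this sketch the second `evaluate` call never reaches the critic, and a change at device `b` conservatively discards cached values for every device within one hop, trading some recomputation for freshness.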

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Cyber Defense | CyGym Volt Typhoon 10 devices | Avg Player Utility per Device | 51.52 | 7 |
| Cyber Defense | CyGym Volt Typhoon 50 devices | Avg Player Utility per Device | 2.97 | 7 |
| Cyber Defense | CyGym Volt Typhoon 100 devices | Average Player Utility per Device | 150 | 7 |
| Cyber Defense | CyGym Volt Typhoon 1000 devices | Avg Player Utility | 14 | 7 |
| Cyber Defense | CyGym Volt Typhoon 10000 devices | Avg Player Utility per Device | 0.01 | 7 |