
Probabilistic Verification of Recurrent Neural Networks for Single and Multi-Agent Reinforcement Learning

About

History-dependent policies induced by recurrent neural networks (RNNs) rely on latent hidden state dynamics, making verification in partially observable reinforcement learning (RL) challenging. Existing RNN verification tools typically rely on restrictive modeling assumptions or coarse over-approximations of the hidden state space, which can lead to overly conservative or inconclusive results. We propose $\textbf{RNN}$ $\textbf{Pro}$babilistic $\textbf{Ve}$rification ($\texttt{RNN-ProVe}$), a probabilistic framework that $\textit{estimates the likelihood}$ of undesired behaviors in RNN-based policies. $\texttt{RNN-ProVe}$ uses policy-driven sampling to approximate the set of hidden states that are feasible under a trained policy, and derives statistical error bounds to produce bounded-error, high-confidence estimates of behavioral violations. Experiments on partially observable single-agent and cooperative multi-agent tasks show that $\texttt{RNN-ProVe}$ yields more quantitative, feasibility-aware probabilistic guarantees than existing tools, while scaling to recurrent and multi-agent settings.
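The general idea of a bounded-error, high-confidence estimate can be illustrated with a generic Monte Carlo sketch. This is not the authors' algorithm: it assumes a standard Hoeffding bound to pick the number of samples, and `toy_rollout_violates` is a hypothetical stand-in for rolling out a trained recurrent policy and checking a safety property on the hidden states it actually reaches.

```python
import math
import random


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Number of i.i.d. samples so that the empirical rate is within
    epsilon of the true rate with probability at least 1 - delta
    (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def toy_rollout_violates(rng: random.Random, steps: int = 5) -> bool:
    """Hypothetical stand-in for one policy-driven rollout.

    A scalar hidden state is updated recurrently from random observations,
    so only hidden states reachable under this toy dynamics are visited.
    The 'violation' condition (h > 0.8) is an arbitrary illustrative property.
    """
    h = 0.0
    for _ in range(steps):
        obs = rng.uniform(-1.0, 1.0)
        h = math.tanh(0.5 * h + obs)
    return h > 0.8


def estimate_violation_rate(sample_is_violation, epsilon=0.05, delta=0.01, seed=0):
    """Return (empirical violation rate, number of samples used)."""
    rng = random.Random(seed)
    n = hoeffding_sample_size(epsilon, delta)
    violations = sum(1 for _ in range(n) if sample_is_violation(rng))
    return violations / n, n


if __name__ == "__main__":
    rate, n = estimate_violation_rate(toy_rollout_violates)
    print(f"estimated violation rate: {rate:.3f} from {n} rollouts "
          f"(error <= 0.05 with confidence >= 0.99)")
```

Sampling hidden states through rollouts, rather than over-approximating the whole hidden state space, is what makes such an estimate feasibility-aware; the tolerance `epsilon` and confidence `1 - delta` here are illustrative choices, not values from the paper.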

Luca Marzari, Enrico Marchesini • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| RNN-based navigation policy verification | Navigation 4x4 environment | Average Violation Rate | 1.42 | 5 |
| RNN-based navigation policy verification | Navigation (Nav) 8x8 environment | Average Violation Rate | 13.04 | 4 |
| RNN-based navigation policy verification | Navigation (Nav) 16x16 environment | Average Violation Rate | 0.64 | 3 |
| RNN-based cooperative multi-agent verification | BoxPushing (BP) 10x10 environment | Average Violation Rate | 1.15 | 2 |
| RNN-based cooperative multi-agent verification | BoxPushing (BP) 20x20 environment | Average Violation Rate | 1.51 | 2 |