A Small Gain Analysis of Single Timescale Actor Critic

About

We consider a version of actor-critic which uses proportional step-sizes and only one critic update with a single sample from the stationary distribution per actor step. We provide an analysis of this method using the small-gain theorem. Specifically, we prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods to $O \left(\mu^{-2} \epsilon^{-2} \right)$ for finding an $\epsilon$-approximate stationary point, where $\mu$ is the condition number associated with the critic.

Alex Olshevsky, Bahman Gharesifard • 2022
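
The update scheme described in the abstract (one critic update per actor step, one sample from the stationary distribution, actor step-size proportional to the critic step-size) can be illustrated with a small sketch. This is not the authors' algorithm as stated in the paper: the tabular softmax actor, the TD(0) critic, the power-iteration approximation of the stationary distribution, and every name and default value (`beta`, `ratio`, `single_timescale_actor_critic`, and so on) are assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def stationary_distribution(P_pi, iters=50):
    """Approximate the stationary distribution of a row-stochastic matrix
    by power iteration (assumption: the chain mixes within `iters` steps)."""
    n = len(P_pi)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[s] * P_pi[s][t] for s in range(n)) for t in range(n)]
    return d


def single_timescale_actor_critic(P, R, gamma=0.9, beta=0.05, ratio=0.1,
                                  steps=3000, seed=0):
    """Illustrative single-timescale actor-critic on a tabular MDP.

    P[s][a][t] is the probability of moving from s to t under action a,
    R[s][a] is the reward. All hyperparameters are hypothetical defaults.
    """
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    theta = [[0.0] * n_a for _ in range(n_s)]  # actor: softmax preferences
    v = [0.0] * n_s                            # critic: tabular value estimates
    alpha = ratio * beta  # actor step-size proportional to critic step-size

    for _ in range(steps):
        pi = [softmax(theta[s]) for s in range(n_s)]
        # State transition matrix induced by the current policy.
        P_pi = [[sum(pi[s][a] * P[s][a][t] for a in range(n_a))
                 for t in range(n_s)] for s in range(n_s)]
        # A single sample (s, a, t), with s drawn from the stationary distribution.
        d = stationary_distribution(P_pi)
        s = rng.choices(range(n_s), weights=d)[0]
        a = rng.choices(range(n_a), weights=pi[s])[0]
        t = rng.choices(range(n_s), weights=P[s][a])[0]

        delta = R[s][a] + gamma * v[t] - v[s]  # TD error from that sample

        # Actor step: TD error times the gradient of log pi(a | s).
        for b in range(n_a):
            grad_log = (1.0 if b == a else 0.0) - pi[s][b]
            theta[s][b] += alpha * delta * grad_log

        # Exactly one critic (TD(0)) update per actor step.
        v[s] += beta * delta

    return theta, v
```

On a toy two-state MDP where action 1 always pays reward 1 and action 0 pays nothing, calling `single_timescale_actor_critic(P, R)` should drive the preference for action 1 above that for action 0 in both states. The point of the sketch is the structure of the loop: both updates share one sample and one TD error, and `alpha` is tied to `beta` by a fixed ratio rather than decaying on a separate, slower schedule.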

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Single-loop Actor-Critic Optimization | Infinite-horizon discounted MDP | Complexity Bound: 2 | 7 |
| Unregularized Reinforcement Learning | Tabular MDP Finite State Action Spaces | Sample Complexity: 1 | 3 |