Reinforcement Learning on InvertedPendulum v2

Top result: TTOpt (Mean Reward 1,000)

Updated 3mo ago

Evaluation Results

Method	Links	Mean Reward	(unlabeled)
TTOpt 2022.04		1,000	0
TTOpt 2022.04		1,000	0
Delay-free SAC 2026.04		964.3	-
D2HPG 2026.04		949.8	-
BPQL 2026.04		945.5	-
D2HPG 2026.04		937.3	-
Augmented SAC 2026.04		935.7	-
Augmented SAC 2026.04		932.2	-
Delayed SAC 2026.04		925.5	-
BPQL 2026.04		919.7	-
GA 2022.04		893	283.1
VDPO 2026.04		764.5	-
CMAES 2022.04		721	335.37
Delayed SAC 2026.04		714.2	-
OPENES 2022.04		651.86	436.37
D2HPG 2026.04		637.3	-
CMAES 2022.04		621	472.81
VDPO 2026.04		597.3	-
BPQL 2026.04		568.1	-
Augmented SAC 2026.04		340.7	-
OPENES 2022.04		224.71	217.51
GA 2022.04		222.86	342.79
Delayed SAC 2026.04		67.7	-
Naive SAC 2026.04		32.1	-
Naive SAC 2026.04		24.3	-
Naive SAC 2026.04		20.7	-
VDPO 2026.04		19.4	-