
Reinforcement Learning on LunarLander v2

Top Final Return: 2,292 (Advantage-weighting)


Evaluation Results

Method	Date	Final Return	Other metric (unlabeled)
Advantage-weighting	2020.12	2,292	518,153
PPO	2026.05	283.11	-
Deep Q Network	2020.12	278.23	518,153
Oblique DT	2020.12	272.14	118.9
Deep Q Network	2020.12	266	121,000,000
Oblique DT	2020.12	262.18	86.9
DiPRL	2026.05	260.21	-
Shallow NN	2020.12	258.8	77.6
DTSemNets	2026.05	257.61	-
π-PRL	2026.05	257.21	-
Actor Critic	2020.12	254.58	4,337.3
Value-difference	2020.12	248.221	632,620.2
πdisc.-PRL	2026.05	239.9	-
πcont.-PRL	2026.05	235.72	-
AWR	2019.10	229	-
Deep Q Network	2020.12	225.79	1,295,307.1
Soft Actor Critic	2020.12	217.92	210,733.2
Soft Q Network	2020.12	217.09	647,691.1
Proximal Policy Optimization	2020.12	201.47	1,673
Deep Q Network	2020.12	201.46	30,878.1
Deep Q Network	2020.12	201.46	30,878.1
Deep Q Network	2020.12	200.65	259,285.8
Deep Q Network	2020.12	200.3	237,079.7
Dueling Deep Q Network	2020.12	200.22	30,878.1
RWR	2019.10	185	-
VIPER (PPO)	2026.05	164.33	-
NLDT*	2020.12	132.83	136.7
PPO	2019.10	121	-
TRPO	2019.10	104	-
Rule List	2020.12	-78.4	89
Oblique DT	2020.12	-	123.3439
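The leaderboard does not define "Final Return". In reinforcement learning the term commonly denotes the sum of rewards collected over an episode, often averaged across several evaluation episodes. The sketch below assumes that reading. The helper names `episode_return` and `average_return` are hypothetical, and the `gamma` discount parameter is an illustrative addition. With `gamma=1.0` (the default) the result is the plain undiscounted sum. The source does not say whether the reported numbers are discounted.

```python
from statistics import mean
from typing import Sequence


def episode_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return of one episode: sum of rewards, discounted by gamma per step.

    gamma=1.0 gives the undiscounted sum (an assumption about how the
    leaderboard's "Final Return" is computed, not confirmed by the source).
    """
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def average_return(episodes: Sequence[Sequence[float]], gamma: float = 1.0) -> float:
    """Mean episodic return over several evaluation episodes (hypothetical helper)."""
    return mean(episode_return(ep, gamma) for ep in episodes)


if __name__ == "__main__":
    # Two toy reward traces standing in for evaluation episodes.
    episodes = [[1.0, 2.0, 3.0], [10.0, -5.0]]
    print(episode_return(episodes[0]))             # 6.0
    print(episode_return(episodes[0], gamma=0.5))  # 2.75
    print(average_return(episodes))                # 5.5
```

Averaging over several episodes matters because individual episodes in an environment like LunarLander can vary widely. A single-episode score would make rows in the table above hard to compare.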