
Reinforcement Learning on LunarLander v2

2,292 Final Return

Advantage-weighting

[Chart: Final Return over time, Oct 1, 2019 to Dec 14, 2020]
Updated 1mo ago

Evaluation Results

Method    Links
2020.12    2,292518,153
2020.12    278.23518,153
2020.12    272.14118.9
2020.12    266121,000,000
2020.12    262.1886.9
2020.12    258.877.6
2020.12    254.584,337.3
2020.12    248.221632,620.2
2019.10    229    -
2020.12    225.791,295,307.1
2020.12    217.92210,733.2
2020.12    217.09647,691.1
2020.12    201.471,673
2020.12    201.4630,878.1
2020.12    201.4630,878.1
2020.12    200.65259,285.8
2020.12    200.3237,079.7
2020.12    200.2230,878.1
2019.10    185    -
2020.12    132.83136.7
2019.10    121    -
2019.10    104    -
2020.12    -78.489
2020.12    -123.3439