Reinforcement Learning on LunarLander v2

2,292 Final Return

Advantage-weighting

[Chart: Final Return over time, Oct 1, 2019 to May 18, 2026]
Updated 15d ago

Evaluation Results

Method    Links
2020.12    2,292    518,153
2026.05    283.11    -
2020.12    278.23518,153
2020.12    272.14118.9
2020.12    266121,000,000
2020.12    262.1886.9
2026.05    260.21    -
2020.12    258.877.6
2026.05    257.61    -
2026.05    257.21    -
2020.12    254.584,337.3
2020.12    248.221632,620.2
2026.05    239.9    -
2026.05    235.72    -
2019.10    229    -
2020.12    225.791,295,307.1
2020.12    217.92210,733.2
2020.12    217.09647,691.1
2020.12    201.471,673
2020.12    201.4630,878.1
2020.12    201.4630,878.1
2020.12    200.65259,285.8
2020.12    200.3237,079.7
2020.12    200.2230,878.1
2019.10    185    -
2026.05    164.33    -
2020.12    132.83136.7
2019.10    121    -
2019.10    104    -
2020.12    -78.489
2020.12    -123.3439