
Reinforcement Learning on Hopper v4

27,721,263 Average Return

pop-SAN

[Chart: Average Return by date, Jan 29, 2026 to May 8, 2026]
Updated 22d ago

Evaluation Results

Date       Average Return
2026.02    27,721,263
2026.02    3,446,131
2026.02    3,410,164
2026.02    3,403,148
2026.02    3,385,157
2026.02    3,098,281
2026.02    356,568
           352,094
2026.01    3,462
2026.01    3,414
2026.01    3,384
2026.01    3,380
2026.04    3,352
2026.01    3,349
2026.03    2,944.3
2026.05    2,736
2026.05    2,719
2026.05    2,507
2026.05    2,468
2026.05    2,458
2026.05    2,384
2026.03    2,329.7
2026.03    2,017
2026.03    1,650.8
2026.05    1,644
2026.05    1,473