Reinforcement Learning on Humanoid v4

5,715 Reward

C-DSAC

Updated 1mo ago

Evaluation Results

Method	Links	Reward
C-DSAC 2026.04		5,715
PPO-Clip 2026.06		660
per-sample PPO-KL 2026.06		660
Adaptive β 2026.06		553
DDPG-AdaGamma 2026.05		457.02
DDPG-Uncertainty 2026.05		454.46
DDPG-CrossValidate 2026.05		356.85
Fixed β 2026.06		342
TRPO-AdaGamma 2026.05		284.49
TRPO-Uncertainty 2026.05		250.65
TRPO-CrossValidate 2026.05		221.1
TRPO-Fixed-γ 2026.05		218.15
DDPG-Fixed-γ 2026.05		165.82