
Reinforcement Learning on Humanoid v5 (delta=[0.8^6, 0.5^6, 0.2^5], kappa=4.0) (test)

Best Return: 5,620 (DD-SRad)

Updated 2mo ago
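The title's delta=[0.8^6, 0.5^6, 0.2^5] appears to use "x^n" to mean "x repeated n times". That reading is an assumption, but it gives 6 + 6 + 5 = 17 entries, which matches the 17-dimensional action space of Gymnasium's Humanoid-v5. This suggests delta is a per-action-dimension vector. The sketch below expands the notation under that assumption. The helper `expand_delta` and the `DELTA_GROUPS` encoding are hypothetical names introduced only for illustration.

```python
def expand_delta(groups):
    """Expand (value, count) pairs into one flat per-dimension list.

    Assumption: the title's "x^n" notation means "x repeated n times".
    """
    out = []
    for value, count in groups:
        out.extend([value] * count)
    return out


# Hypothetical encoding of delta=[0.8^6, 0.5^6, 0.2^5] from the title.
DELTA_GROUPS = [(0.8, 6), (0.5, 6), (0.2, 5)]
delta = expand_delta(DELTA_GROUPS)
print(len(delta))  # 17, one entry per Humanoid-v5 action dimension
```

The meaning of kappa=4.0 is not stated on this page, so the sketch leaves it out.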

Evaluation Results

Method	Return	(unlabeled)	(unlabeled)
DD-SRad 2026.05	5,620	0.038	0.489
DD-SRad 2026.05	5,498	0.037	0.203
D-Tanh 2026.05	5,313	0.064	0.184
BoxPre+ 2026.05	5,291	0.023	0.179
BoxPre+ 2026.05	5,204	0.034	0.159
Post(QP) 2026.05	5,191	0.067	0.249
D-Tanh 2026.05	4,868	0.055	0.662
SRad-QP 2026.05	4,558	0.11	0.053
Post(QP) 2026.05	4,420	0.026	0.465
SRad-QP 2026.05	3,810	0.087	0.015
SRad-Strict 2026.05	2,742	0.493	0.024
SRad-Strict 2026.05	2,270	0.496	0.053
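Each method appears in two rows. The page does not say what distinguishes them, for example seeds or configurations. As a reading aid only, the sketch below groups the Return column by method and reports the best and mean Return of each method's entries. The values are copied from the table. `ROWS` and `summarize` are hypothetical names, and the two unlabeled metric columns are ignored because their meaning is not given.

```python
from collections import defaultdict
from statistics import mean

# (method, Return) for each table row, with the "2026.05" suffix dropped.
ROWS = [
    ("DD-SRad", 5620), ("DD-SRad", 5498),
    ("D-Tanh", 5313), ("BoxPre+", 5291), ("BoxPre+", 5204),
    ("Post(QP)", 5191), ("D-Tanh", 4868), ("SRad-QP", 4558),
    ("Post(QP)", 4420), ("SRad-QP", 3810),
    ("SRad-Strict", 2742), ("SRad-Strict", 2270),
]


def summarize(rows):
    """Return (method, best, mean, count) tuples, sorted by best Return."""
    by_method = defaultdict(list)
    for name, ret in rows:
        by_method[name].append(ret)
    summary = [(name, max(v), mean(v), len(v)) for name, v in by_method.items()]
    # Highest best-Return first, matching the leaderboard's ordering.
    return sorted(summary, key=lambda t: t[1], reverse=True)


for name, best, avg, n in summarize(ROWS):
    print(f"{name:12s} best={best:5d} mean={avg:7.1f} entries={n}")
```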