Reinforcement Learning on Hopper delta=[0.2, 0.5, 0.5], kappa=2.5 v5 (test)

3,312 Return

DD-SRad

Updated 2mo ago

Evaluation Results

Method	Links	Return
DD-SRad 2026.05		3,312	11.6	46.2
BoxPre+ 2026.05		3,256	6	30.3
D-Tanh 2026.05		2,712	8.4	49.6
DD-SRad 2026.05		2,610	7	46.1
SRad-Strict 2026.05		2,394	13.4	16.6
D-Tanh 2026.05		2,334	11.8	48
BoxPre+ 2026.05		2,319	9	26.7
SRad-Strict 2026.05		2,195	33.5	15.5
SRad-QP 2026.05		1,624	20.4	37
Post(QP) 2026.05		1,278	12.9	35.3
SRad-QP 2026.05		1,020	13.6	33.3
Post(QP) 2026.05		849	29.6	48.7