Reinforcement Learning on HalfCheetah delta=[0.2^3, 0.5^3], kappa=2.5 v5 (test)

Return: 4,329

DD-SRad

Updated 2mo ago

Evaluation Results

Method	Links	Return	(unlabeled)	(unlabeled)
DD-SRad 2026.05		4,329	12.2	66.8
DD-SRad 2026.05		4,290	15.9	62.6
D-Tanh 2026.05		4,205	5.5	66.2
D-Tanh 2026.05		4,000	7.9	67.2
BoxPre+ 2026.05		3,796	1.6	60
BoxPre+ 2026.05		3,641	18.7	60.7
SRad-QP 2026.05		3,211	4	58.2
SRad-QP 2026.05		2,482	17.4	55.7
Post(QP) 2026.05		1,798	13.7	76.7
SRad-Strict 2026.05		1,518	12.2	21.1
Post(QP) 2026.05		1,484	70.2	69.5
SRad-Strict 2026.05		1,432	10.3	21.3