Multimodal Policy Fine-Tuning on Gaussian-mixture environment G2

Best result: 100 SR (DPPO)

Updated 2mo ago

Evaluation Results

Method	Links
DPPO 2026.05		100	42	1.67	2
RES[BMD] 2026.05		100	100	4	94
DPPO[BMD] 2026.05		100	75	3	74
RES 2026.05		92	50	2	59
DSRL 2026.05		33	8	0.33	0
DSRL[BMD] 2026.05		33	8	0.33	84
DPPO 2026.05		32	11	0	60