Multimodal Policy Fine-Tuning on Gaussian-mixture environment (G1 landscape)

100 Success Rate (SR)

DSRL

Updated 2mo ago

Evaluation Results

Method	Links
DSRL 2026.05		100	25	0.25	0
DPPO 2026.05		100	58	0.5825	0.4
RES 2026.05		100	100	1	0.99
DPPO 2026.05		100	100	1	0.99
RES 2026.05		98	98	1	1
DPPO [10] 2026.05		66	16	0.0825	0
DSRL 2026.05		33	33	0.3325	0.46