Multimodal Policy Fine-Tuning on Gaussian-mixture environment G2
[Chart: results over time for metric SR, labelled DPPO, dated May 12, 2026; selectable metrics: SR, SRM, mc@80, H]
Evaluation Results
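| Method | Configuration | Date | SR | SRM | mc@80 | H |
|---|---|---|---|---|---|---|
| DPPO | | 2026.05 | 100 | 42 | 1.67 | 2 |
| RES[BMD] | regularization=BMD | 2026.05 | 100 | 100 | 4 | 94 |
| DPPO[BMD] | regularization=BMD | 2026.05 | 100 | 75 | 3 | 74 |
| RES | | 2026.05 | 92 | 50 | 2 | 59 |
| DSRL | | 2026.05 | 33 | 8 | 0.33 | 0 |
| DSRL[BMD] | regularization=BMD | 2026.05 | 33 | 8 | 0.33 | 84 |
| DPPO | variant=[10] | 2026.05 | 32 | 11 | 0 | 60 |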
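Three methods appear both with and without BMD regularization, so the table supports a direct per-metric comparison. The sketch below copies the scores from the table and subtracts each baseline row from its [BMD] counterpart. It assumes that the unconfigured row of each method is the right baseline (the DPPO variant=[10] row is left out). The page does not define SR, SRM, mc@80 or H, nor whether higher is better for each, so the sketch reports raw differences only. The names METRICS, SCORES and bmd_deltas are hypothetical, chosen for illustration.

```python
# Scores copied from the Evaluation Results table (G2 benchmark).
# Assumption: the row of each method with no configuration tag is the
# baseline for its [BMD] counterpart; the DPPO variant=[10] row is excluded.
# METRICS, SCORES and bmd_deltas are hypothetical names for this sketch.
METRICS = ("SR", "SRM", "mc@80", "H")

SCORES = {
    "DPPO": (100, 42, 1.67, 2),
    "DPPO[BMD]": (100, 75, 3, 74),
    "RES": (92, 50, 2, 59),
    "RES[BMD]": (100, 100, 4, 94),
    "DSRL": (33, 8, 0.33, 0),
    "DSRL[BMD]": (33, 8, 0.33, 84),
}


def bmd_deltas(scores):
    """Return {method: {metric: BMD score minus baseline score}}.

    Only methods that have both a baseline row and a "[BMD]" row are
    compared. Differences are rounded to two decimals to hide
    floating-point noise such as 3 - 1.67.
    """
    deltas = {}
    for name, baseline in scores.items():
        if name.endswith("[BMD]"):
            continue  # regularized rows are looked up from their baseline
        regularized = scores.get(f"{name}[BMD]")
        if regularized is None:
            continue  # no BMD counterpart to compare against
        deltas[name] = {
            metric: round(reg - base, 2)
            for metric, base, reg in zip(METRICS, baseline, regularized)
        }
    return deltas


if __name__ == "__main__":
    for method, diff in bmd_deltas(SCORES).items():
        print(method, diff)
```

Keeping the scores as plain tuples in table column order means the zip over METRICS stays aligned with the table header; adding a metric only requires extending both in the same order.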