Policy generalization on MESSENGER-WM NewCombo (test)
[Chart: Avg Sum Score over time. Best result: LED-WM, 1.31 (Nov 28, 2025)]
Evaluation Results
Method     Setting                     Date      Avg Sum Score
LED-WM                                 2025.11   1.31
EMMA-LWM   Training protocol=Filt...   2025.11   1.18
EMMA-LWM   Training protocol=Onli...   2025.11   1.01
EMMA-LWM   Training protocol=Filt...   2025.11   0.98