
Cooperative Multi-Agent Reinforcement Learning on Speaker-Listener (last 2% of training)

Best result: -17.22 mean episodic reward (SACHI)
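The leaderboard reports mean episodic reward over the last 2% of training, but the page does not say exactly how that window is computed. Below is a minimal sketch, assuming one return is logged per episode in chronological order and that the window always contains at least one episode. `mean_reward_last_fraction` is a hypothetical helper, not the site's own code.

```python
import math
from typing import Sequence


def mean_reward_last_fraction(episode_returns: Sequence[float], fraction: float = 0.02) -> float:
    """Average the episodic returns from the final `fraction` of training.

    Assumption: `episode_returns` holds one return per episode, oldest first.
    The tail window is rounded up so it always contains at least one episode.
    """
    if not episode_returns:
        raise ValueError("episode_returns must be non-empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Number of trailing episodes that make up the evaluation window.
    n_tail = max(1, math.ceil(len(episode_returns) * fraction))
    tail = episode_returns[-n_tail:]
    return sum(tail) / len(tail)


# 100 logged episodes: the 2% window is the final 2 episodes only.
returns = [-50.0] * 98 + [-17.0, -18.0]
print(mean_reward_last_fraction(returns))  # -17.5
```

Averaging over a trailing window reports where training ended up rather than the best single episode, so a run that peaks early and then degrades is scored by its degraded tail.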

[Chart: Mean Episodic Reward by date (May 8, 2026)]

Evaluation Results

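| Rank | Date | Mean Episodic Reward |
|------|------|----------------------|
| 1 | 2026.05 | -17.22 |
| 2 | 2026.05 | -17.82 |
| 3 | 2026.05 | -18.45 |
| 4 | 2026.05 | -19.44 |
| 5 | 2026.05 | -20.38 |
| 6 | 2026.05 | -21.19 |
| 7 | 2026.05 | -22.45 |
| 8 | 2026.05 | -25.68 |
| 9 | 2026.05 | -27.7 |
| 10 | 2026.05 | -29.49 |
| 11 | 2026.05 | -34.29 |
| 12 | 2026.05 | -43.79 |
| 13 | 2026.05 | -48.15 |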
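Every score in the table is negative. One common definition of Speaker-Listener, the `simple_speaker_listener` scenario in the Multi-Agent Particle Environments, gives both agents a single shared reward equal to the negative squared distance between the listener and the target landmark. Under that definition, scores closer to zero mean the listener spends the episode nearer its goal. The sketch below assumes this definition, which the page itself does not confirm; `speaker_listener_reward` and `episode_return` are hypothetical names.

```python
from typing import Sequence

import numpy as np


def speaker_listener_reward(listener_pos: Sequence[float], goal_landmark_pos: Sequence[float]) -> float:
    """Shared per-step team reward (assumed MPE-style definition):
    the negative squared Euclidean distance from the listener to the goal landmark."""
    diff = np.asarray(listener_pos, dtype=float) - np.asarray(goal_landmark_pos, dtype=float)
    return -float(np.sum(np.square(diff)))


def episode_return(listener_traj: Sequence[Sequence[float]], goal_landmark_pos: Sequence[float]) -> float:
    """Sum the shared per-step reward over one episode's listener positions."""
    return sum(speaker_listener_reward(p, goal_landmark_pos) for p in listener_traj)


# Listener moves from (1, 0) to the goal at the origin over three steps.
trajectory = [(1.0, 0.0), (0.5, 0.0), (0.0, 0.0)]
print(episode_return(trajectory, (0.0, 0.0)))  # -1.25
```

Because the speaker and listener share one reward, neither agent can improve its score at the other's expense, which is what makes the task cooperative.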