Joint action and signal payoff optimization on L6 warmup (off-diagonal)
[Chart: Payoff (Per Interaction) and Reference Ratio (%) over time. Top result: Reciprocity Gradient, 3.65 Payoff (Per Interaction), May 8, 2026. Updated 22d ago.]
Evaluation Results
| Method | Settings | Date | Payoff (Per Interaction) | Reference Ratio (%) |
|---|---|---|---|---|
| Reciprocity Gradient | | 2026.05 | 3.65 | 81 |
| DDPG | Touter=15, Seeds=3 | 2026.05 | 1.16 | 26 |
| DPG | Touter=15, Seeds=3 | 2026.05 | 0.57 | 13 |
| TD3 | Touter=15, Seeds=3 | 2026.05 | 0.32 | 7 |
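
The page does not say how Reference Ratio (%) is computed. As a rough sanity check on the figures listed above, the sketch below assumes the ratio is 100 × payoff / reference payoff and backs out the reference payoff each row would imply. Both that formula and the helper name `implied_reference` are assumptions made for illustration, not something the benchmark documents.

```python
# Rows copied from the results above:
# (method, payoff per interaction, reference ratio in %).
results = [
    ("Reciprocity Gradient", 3.65, 81),
    ("DDPG", 1.16, 26),
    ("DPG", 0.57, 13),
    ("TD3", 0.32, 7),
]


def implied_reference(payoff, ratio_percent):
    """Reference payoff implied by one row.

    ASSUMPTION: ratio_percent = 100 * payoff / reference.
    The source page does not state this definition.
    """
    return payoff / (ratio_percent / 100.0)


# Map each method to the reference payoff its row would imply.
implied = {
    method: implied_reference(payoff, ratio)
    for method, payoff, ratio in results
}

for method, value in implied.items():
    print(f"{method}: implied reference payoff ~ {value:.2f}")
```

Under that assumption all four rows imply a reference payoff between roughly 4.4 and 4.6. The spread is what you would expect if the ratio column is rounded to whole percentages, so the rows are at least mutually consistent with a single shared reference value.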