Action-component payoff optimization on L3 warmup (off-diagonal)
[Chart: Per-Interaction Payoff and Reference %, plotted by date (May 8, 2026). Best result: Reciprocity Gradient, Per-Interaction Payoff 4.17.]
Updated 22d ago
Evaluation Results
Method                Details             Date     Per-Interaction Payoff  Reference %
Reciprocity Gradient  -                   2026.05  4.17                    93
Reciprocity Gradient  -                   2026.05  4.08                    91
DPG                   Touter=15, Seeds=3  2026.05  1.63                    36
DDPG                  Touter=15, Seeds=3  2026.05  1.58                    35
TD3                   Touter=15, Seeds=3  2026.05  1.48                    33
DPG                   Touter=15, Seeds=3  2026.05  0.55                    12
TD3                   Touter=15, Seeds=3  2026.05  0.32                    7
DDPG                  Touter=15, Seeds=3  2026.05  0.25                    6