Action-component payoff optimization on L6 warmup discriminative cell
[Chart: Per-interaction Payoff by method over time (toggle: % of Reference); best result Reciprocity Gradient, 3.94, May 8, 2026]
Updated 22d ago
Evaluation Results
| Method | Configuration | Date | Per-interaction Payoff | % of Reference |
|---|---|---|---|---|
| Reciprocity Gradient | T_outer=400, Seeds=10 | 2026.05 | 3.94 | 87 |
| DPG | T_outer=125, Seeds=10 | 2026.05 | 2.19 | 49 |
| TD3 | T_outer=125, Seeds=10 | 2026.05 | 2.1 | 47 |
| DDPG | T_outer=125, Seeds=10 | 2026.05 | 2.09 | 46 |
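The page does not state how "% of Reference" is computed or what the reference payoff is. Assuming it is 100 × payoff / reference, the minimal Python sketch below backs out the implied reference payoff from each Evaluation Results row as a consistency check. The formula and the helper name `implied_reference` are assumptions for illustration, not the benchmark's documented method.

```python
# Rows transcribed from the Evaluation Results table:
# (method, per-interaction payoff, % of reference)
ROWS = [
    ("Reciprocity Gradient", 3.94, 87),
    ("DPG", 2.19, 49),
    ("TD3", 2.1, 47),
    ("DDPG", 2.09, 46),
]


def implied_reference(payoff: float, pct_of_reference: float) -> float:
    """Back out the reference payoff.

    ASSUMPTION (hypothetical): pct_of_reference = 100 * payoff / reference.
    """
    return payoff / (pct_of_reference / 100.0)


if __name__ == "__main__":
    for name, payoff, pct in ROWS:
        ref = implied_reference(payoff, pct)
        print(f"{name:22s} payoff={payoff:.2f} implied reference={ref:.2f}")
```

Because the percentages are rounded to whole numbers, the implied references agree only approximately (all land near 4.5), which is consistent with a single shared reference under the stated assumption.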