Continuous Control on InvertedPendulum v5
[Figure: Average Episodic Reward over time, Mar 10, 2026 to Apr 10, 2026; series shown: TD3]
Updated 6d ago
Evaluation Results
Method            Details                    Date      Average Episodic Reward
TD3               NFE=1                      2026.04   1,000
MaxEntDP          NFE=20 × 10                2026.04   1,000
TRFP (ours)       NFE=4 × 4                  2026.04   1,000
TRFP (one-step)   NFE=1 × 4                  2026.04   1,000
SDAC              NFE=20 × 32                2026.04   992.1
PPO               Environment steps=1M,...   2026.03   975.8
PDA               Environment steps=1M,...   2026.03   916
SAC               NFE=1                      2026.04   808.1