Reinforcement Learning on Ant v5

6,633.8 Average Return

QVPO+DBC(*)

[Chart: Average Return over time, Dec 4, 2025 to Mar 12, 2026]
Updated 1mo ago

Evaluation Results

Date       Average Return
2026.02    6,633.8
2026.02    6,501.4
2026.02    6,373.2
2026.02    6,342.6
2026.02    6,121.8
2026.02    5,306
2025.12    4,477.33
2026.02    4,257
2026.03    4,183
2026.03    4,125
2025.12    4,067.61
2026.02    4,000
2026.02    4,000
2026.02    3,850.4
2025.12    3,761.98
2026.02    3,750
2026.02    3,662
2026.03    3,570
2026.02    3,536
2026.02    3,487.4
2026.02    3,474
2026.02    3,389
2026.02    3,190
2026.02    3,169
2026.03    3,157
2026.02    3,093
2026.02    3,084
2026.02    3,082
2026.02    2,963
2026.02    2,830
2026.02    2,818
2026.02    2,792
2026.02    2,781
2026.02    2,663
2026.02    2,650.3
2025.12    2,619.72
2026.02    2,013.2
2026.02    1,225
2026.02    1,221
2026.02    994
2025.12    960.36
2025.12    959.65
2025.12    958.54
2025.12    957.37
2026.02    957
2025.12    954.13
2025.12    953.84
2025.12    948.86
2025.12    934.15
2026.03    610
2026.03    501
2026.03    456
2026.03    376
2025.12    155.64
2025.12    32.47
2025.12    17.34
2025.12    -11.15