Machine Translation on WMT20
[Chart: Reward by date; RLOO at 0.2 on Mar 5, 2025. Metric tabs: Reward, Inference Time, Reward per Time, Peak Memory (GB).]
Updated 1mo ago
Evaluation Results
Method  Configuration              Date     Reward  Inference Time  Reward per Time  Peak Memory (GB)
RLOO    Model=Llama 3.2 1B, Ha...  2025.03  0.2     36              0.005            -
ZOPrO   Model=Llama 3.2 1B, Ha...  2025.03  0.15    6               0.025            -
PPO     Model=Llama 3.2 1B, Ha...  2025.03  0.11    115             0.0015           -
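
The three results listed above rank differently depending on which metric is used, which a short script can make explicit. This is a minimal sketch: the `Entry` class and the `rank` and `recomputed_ratio` helpers are hypothetical names introduced here, the values are copied from the listing, and the unit of Inference Time is not stated on the page. The `recomputed_ratio` helper assumes Reward per Time is simply Reward divided by Inference Time; the page does not confirm that definition, so treat its output only as a cross-check against the listed column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical container, not part of the page)."""
    method: str
    reward: float
    inference_time: float   # unit not stated on the page
    reward_per_time: float  # value as listed on the page


# Values copied from the Evaluation Results listing.
ENTRIES = [
    Entry("RLOO", 0.2, 36, 0.005),
    Entry("ZOPrO", 0.15, 6, 0.025),
    Entry("PPO", 0.11, 115, 0.0015),
]


def rank(entries, key, descending=True):
    """Return method names ordered by the named Entry attribute."""
    ordered = sorted(entries, key=lambda e: getattr(e, key), reverse=descending)
    return [e.method for e in ordered]


def recomputed_ratio(entry):
    """Reward / Inference Time, under the ASSUMED definition of the metric."""
    return entry.reward / entry.inference_time


if __name__ == "__main__":
    print("By reward:", rank(ENTRIES, "reward"))
    print("By reward per time:", rank(ENTRIES, "reward_per_time"))
    print("Fastest first:", rank(ENTRIES, "inference_time", descending=False))
    for e in ENTRIES:
        print(e.method, "listed:", e.reward_per_time,
              "recomputed:", round(recomputed_ratio(e), 5))
```

Ranking by Reward puts RLOO first, while ranking by the listed Reward per Time or by lowest Inference Time puts ZOPrO first, so the choice of metric decides the leader here.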