Machine Translation on WMT20
[Chart: Reward over time. Labeled point: RLOO, 0.2 (Mar 5, 2025). Other selectable metrics: Inference Time, Reward per Time, Peak Memory (GB).]
Evaluation Results
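| Method | Settings | Date | Reward | Inference Time | Reward per Time | Peak Memory (GB) |
|--------|----------|------|--------|----------------|-----------------|------------------|
| RLOO | Model=Llama 3.2 1B, Ha... | 2025.03 | 0.2 | 36 | 0.005 | - |
| ZOPrO | Model=Llama 3.2 1B, Ha... | 2025.03 | 0.15 | 6 | 0.025 | - |
| PPO | Model=Llama 3.2 1B, Ha... | 2025.03 | 0.11 | 115 | 0.0015 | - |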