Summarization on Summarize from Feedback
[Chart: Reward by method over time. Point shown: PPO, Reward 70, Mar 5, 2025. Selectable metrics: Reward, Time (m), Reward / Time, Peak Memory (GB).]
Updated 4d ago
Evaluation Results
Method  Details                    Date     Reward  Time (m)  Reward / Time  Peak Memory (GB)
PPO     Model=Llama 3.2 1B, Ha...  2025.03  70      49        0.014          -
ZOPrO   Model=Llama 3.2 1B, Ha...  2025.03  17      14        0.012          -
RLOO    Model=Llama 3.2 1B, Ha...  2025.03  15      42        0.0035         -
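
The Reward / Time values are about one hundredth of Reward divided by Time (m). That would fit Reward being reported as a percentage, but the page does not say so, so treat it as an assumption. The sketch below recomputes the ratio under that assumption and ranks the methods by it. The function names `reward_per_minute` and `ranked` are hypothetical, and the recomputed values only approximately match the listed ones (RLOO comes out near 0.0036, not 0.0035), presumably because the displayed figures are rounded.

```python
# Rows copied from the table: (method, reward, time in minutes, listed Reward / Time).
ROWS = [
    ("PPO", 70, 49, 0.014),
    ("ZOPrO", 17, 14, 0.012),
    ("RLOO", 15, 42, 0.0035),
]


def reward_per_minute(reward_percent, minutes):
    """Reward gained per minute of training.

    Assumption (not stated on the page): Reward is a percentage,
    so it is rescaled to the 0-1 range before dividing by time.
    """
    return (reward_percent / 100.0) / minutes


def ranked(rows):
    """Return (method, ratio) pairs sorted from most to least efficient."""
    pairs = [(method, reward_per_minute(reward, minutes))
             for method, reward, minutes, _listed in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for method, reward, minutes, listed in ROWS:
        ratio = reward_per_minute(reward, minutes)
        print(f"{method}: recomputed {ratio:.4f}, listed {listed}")
```

On this reading, PPO has both the highest reward and the highest reward per minute, while ZOPrO comes close on efficiency at a much lower reward because its run is the shortest.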