Conversational Assistant on HH-RLHF
[Chart: Reward by date. Labeled point: RLOO, Reward 0.5, Mar 5, 2025. Metric tabs: Reward, Execution Time (s), Reward Rate, Peak Memory (GB).]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Reward | Execution Time (s) | Reward Rate | Peak Memory (GB) |
|---|---|---|---|---|---|---|
| RLOO | Model=Llama 3.2 1B, Ha... | 2025.03 | 0.5 | 5 | 0.0017 | - |
| ZOPrO | Model=Llama 3.2 1B, Ha... | 2025.03 | 0.25 | 120 | 0.0031 | - |
| PPO | Model=Llama 3.2 1B, Ha... | 2025.03 | -0.4 | 530 | -0.0012 | - |