Conversational Assistant on HH-RLHF
[Chart: Reward by date (Mar 5, 2025); top entry RLOO at 0.5. Selectable metrics: Reward, Execution Time (s), Reward Rate, Peak Memory (GB).]
Updated 1mo ago
Evaluation Results
Method  Details                    Date     Reward  Execution Time (s)  Reward Rate  Peak Memory (GB)
RLOO    Model=Llama 3.2 1B, Ha...  2025.03  0.5     5                   0.0017       -
ZOPrO   Model=Llama 3.2 1B, Ha...  2025.03  0.25    120                 0.0031       -
PPO     Model=Llama 3.2 1B, Ha...  2025.03  -0.4    530                 -0.0012      -