Helpfulness Evaluation on Helpfulness
[Chart: Average Win Rate by date. Leading result: Curriculum-RLAIF, 97 (May 26, 2025).]
Evaluation Results
Method                 Configuration                Date      Average Win Rate
Curriculum-RLAIF       Base Model=Qwen2.5-32B...    2025.05   97
Curriculum-RLAIF       Base Model=LLaMA-3-8B,...    2025.05   95
Internal Eval.         Base Model=Qwen2.5-32B...    2025.05   94
Curriculum-RLAIF       Base Model=Gemma-1-2B,...    2025.05   93
Conventional RLAIF     Base Model=Qwen2.5-32B...    2025.05   93
Implicit Eval. (DPO)   Base Model=Qwen2.5-32B...    2025.05   93
RLCD                   Base Model=Qwen2.5-32B...    2025.05   92
Internal Eval.         Base Model=LLaMA-3-8B,...    2025.05   91
External Eval.         Base Model=Qwen2.5-32B...    2025.05   91
Conventional RLAIF     Base Model=LLaMA-3-8B,...    2025.05   90
Implicit Eval. (DPO)   Base Model=LLaMA-3-8B,...    2025.05   90
CAI                    Base Model=Qwen2.5-32B...    2025.05   89
Internal Eval.         Base Model=Gemma-1-2B,...    2025.05   88
RLCD                   Base Model=LLaMA-3-8B,...    2025.05   88
RLCD                   Base Model=Gemma-1-2B,...    2025.05   87
External Eval.         Base Model=Gemma-1-2B,...    2025.05   87
CAI                    Base Model=LLaMA-3-8B,...    2025.05   87
External Eval.         Base Model=LLaMA-3-8B,...    2025.05   87
Conventional RLAIF     Base Model=Gemma-1-2B,...    2025.05   86
CAI                    Base Model=Gemma-1-2B,...    2025.05   85
Implicit Eval. (DPO)   Base Model=Gemma-1-2B,...    2025.05   85