Human Evaluation on LongBench Chat
[Chart: Helpfulness Win Rate. LongReward + DPO: 14 (Oct 28, 2024)]
Updated 4d ago
Evaluation Results
Method: LongReward + DPO (Model=Llama-3.1-8B, Tr...)
Date: 2024.10

Dimension       Win Rate   Tie Rate   Loss Rate   Win-Loss Delta
Helpfulness     14         84         2           12
Logicality      14         86         0           14
Faithfulness    32         64         4           28
Completeness    26         64         10          16
Overall         54         38         8           46
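The Win-Loss Delta column appears to be the win rate minus the loss rate, with all rates in percent. The page does not state this definition, so it is an assumption inferred from the reported numbers. A minimal Python sketch that recomputes the deltas from the listed rates, using the hypothetical names `rates` and `win_loss_delta`:

```python
# Win, tie, and loss rates copied from the LongReward + DPO results row.
# Treating them as percentages is an assumption; each triple sums to 100.
rates = {
    "Helpfulness": (14, 84, 2),
    "Logicality": (14, 86, 0),
    "Faithfulness": (32, 64, 4),
    "Completeness": (26, 64, 10),
    "Overall": (54, 38, 8),
}


def win_loss_delta(win: int, loss: int) -> int:
    """Assumed definition: win rate minus loss rate."""
    return win - loss


deltas = {}
for dimension, (win, tie, loss) in rates.items():
    # Sanity check: the three outcomes should cover every comparison.
    assert win + tie + loss == 100, dimension
    deltas[dimension] = win_loss_delta(win, loss)

for dimension, delta in deltas.items():
    print(f"{dimension}: {delta}")
```

Under this assumption the recomputed values match the reported deltas for all five dimensions.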