Long-context Factuality Evaluation on LongBench (Factuality Subset)
[Chart: Fact Count and FactScore by method. Top Fact Count: 32.86, DPO w/ LongReward, Oct 28, 2024.]
Updated 4d ago
Evaluation Results
Method             Base Model    Date     Fact Count  FactScore
DPO w/ LongReward  Llama-3.1-8B  2024.10  32.86       92.85
DPO w/ LongReward  GLM-4-9B      2024.10  28.05       93.62
SFT                Llama-3.1-8B  2024.10  21.76       91.94
SFT                GLM-4-9B      2024.10  18.41       91.43