Multi-Task evaluation on LongBench Chat
[Chart: Point-wise Rate over time. Best result: 72.6, DPO w/ LongReward, Oct 28, 2024.]
Evaluation Results

| Method | Details | Date | Point-wise Rate |
|---|---|---|---|
| DPO w/ LongReward | Backbone=Llama-3.1-8B,... | 2024.10 | 72.6 |
| DPO w/ Contrast | Backbone=Llama-3.1-8B,... | 2024.10 | 70.6 |
| SFT | Backbone=Llama-3.1-8B,... | 2024.10 | 69.8 |
| DPO w/ LongReward | Backbone=GLM-4-9B, Jud... | 2024.10 | 69.2 |
| officially post-trained | Backbone=GLM-4-9B, Jud... | 2024.10 | 68.6 |
| DPO w/ Contrast | Backbone=GLM-4-9B, Jud... | 2024.10 | 68.2 |
| DPO w/ SRM | Backbone=Llama-3.1-8B,... | 2024.10 | 67.4 |
| DPO w/ SRM | Backbone=GLM-4-9B, Jud... | 2024.10 | 66.6 |
| SFT | Backbone=GLM-4-9B, Jud... | 2024.10 | 64.8 |
| officially post-trained | Backbone=Llama-3.1-8B,... | 2024.10 | 60.2 |