Knowledge Retention on internal e-commerce benchmark large-scale seller 1065 items (Neighbours)
[Chart: Rouge Score and Loss over time. Highlighted entry: ME+GD, Rouge Score 58.2, May 9, 2025.]
Evaluation Results
Method       Configuration              Date     Rouge Score  Loss
ME+GD        Backbone=Llama 3.1 8B      2025.05  58.2         0.54
Baseline     Backbone=Llama 3.1 8B      2025.05  58.1         0.54
Undial+KL    Backbone=Llama 3.1 8B,...  2025.05  55.8         0.59
Unilogit+KL  Backbone=Llama 3.1 8B,...  2025.05  55.6         0.6
GA+KL        Backbone=Llama 3.1 8B      2025.05  52.5         0.62
NPO+KL       Backbone=Llama 3.1 8B      2025.05  51.7         0.7
Undial+KL    Backbone=Llama 3.1 8B,...  2025.05  50.2         0.64
Unilogit+KL  Backbone=Llama 3.1 8B,...  2025.05  46.8         0.69