Knowledge Retention on Internal e-commerce benchmark Neighbours medium-scale seller 387 items
[Chart: Rouge Score (selectable metrics: Rouge Score, Loss). Highlighted point: Baseline, 80.8, May 9, 2025.]
Evaluation Results
Method        Base Model       Date      Rouge Score   Loss
Baseline      Llama 3.1 8B     2025.05   80.8          0.28
ME+GD         Llama 3.1 8B     2025.05   80.7          0.28
UnDIAL+KL     Llama 3.1 8...   2025.05   79            0.28
Unilogit+KL   Llama 3.1 8...   2025.05   78.2          0.29
UnDIAL+KL     Llama 3.1 8...   2025.05   75.9          0.29
NPO+KL        Llama 3.1 8...   2025.05   75            0.31
SimNPO+KL     Llama 3.1 8...   2025.05   72.7          0.31
NPO+KL        Llama 3.1 8...   2025.05   69            0.33
SimNPO+KL     Llama 3.1 8...   2025.05   68.1          0.35
GA+KL         Llama 3.1 8B     2025.05   67.5          0.33
GA            Llama 3.1 8B     2025.05   66.7          0.33
RKLD+KL       Llama 3.1 8B     2025.05   66.1          0.34
NPO           Llama 3.1 8B     2025.05   57.9          0.52
Unilogit+KL   Llama 3.1 8...   2025.05   25.8          1.68