e-Commerce Task on Internal e-commerce benchmark (Task: medium-scale seller, 387 items)
54.3
Performance Score
[Chart: Performance Score, Baseline, May 9, 2025]
Updated 4d ago
Evaluation Results

Method        Base Model       Date      Performance Score
Baseline      Llama 3.1 8B     2025.05   54.3
ME+GD         Llama 3.1 8B     2025.05   54.3
UnDIAL+KL     Llama 3.1 8...   2025.05   54.2
UnDIAL+KL     Llama 3.1 8...   2025.05   54.1
RKLD+KL       Llama 3.1 8B     2025.05   53.8
Unilogit+KL   Llama 3.1 8...   2025.05   53.8
NPO+KL        Llama 3.1 8...   2025.05   53.7
SimNPO+KL     Llama 3.1 8...   2025.05   53.5
GA+KL         Llama 3.1 8B     2025.05   53.4
GA            Llama 3.1 8B     2025.05   53.1
NPO+KL        Llama 3.1 8...   2025.05   53
SimNPO+KL     Llama 3.1 8...   2025.05   52.6
NPO           Llama 3.1 8B     2025.05   51.8
Unilogit+KL   Llama 3.1 8...   2025.05   43.4