e-Commerce Task on Internal e-commerce benchmark: medium-scale seller, 387 items
[Chart: Performance Score. Top entry: Baseline, 54.3, May 9, 2025]
Evaluation Results
Method        Details                      Date      Performance Score
Baseline      Base Model=Llama 3.1 8B      2025.05   54.3
ME+GD         Base Model=Llama 3.1 8B      2025.05   54.3
UnDIAL+KL     Base Model=Llama 3.1 8...    2025.05   54.2
UnDIAL+KL     Base Model=Llama 3.1 8...    2025.05   54.1
RKLD+KL       Base Model=Llama 3.1 8B      2025.05   53.8
Unilogit+KL   Base Model=Llama 3.1 8...    2025.05   53.8
NPO+KL        Base Model=Llama 3.1 8...    2025.05   53.7
SimNPO+KL     Base Model=Llama 3.1 8...    2025.05   53.5
GA+KL         Base Model=Llama 3.1 8B      2025.05   53.4
GA            Base Model=Llama 3.1 8B      2025.05   53.1
NPO+KL        Base Model=Llama 3.1 8...    2025.05   53
SimNPO+KL     Base Model=Llama 3.1 8...    2025.05   52.6
NPO           Base Model=Llama 3.1 8B      2025.05   51.8
Unilogit+KL   Base Model=Llama 3.1 8...    2025.05   43.4