Instruction Following on MT-bench and AlpacaEval
[Chart: Aggregated P by date. Plotted point: NovelSelect, 1.55, Feb 24, 2025.]
Evaluation Results

Method           Configuration              Date     Aggregated P
NovelSelect      Base Model=LLaMA-3-8B,...  2025.02  1.55
K-means          Base Model=LLaMA-3-8B,...  2025.02  1.32
K-Center-Greedy  Base Model=LLaMA-3-8B,...  2025.02  1.31
QDIT             Base Model=LLaMA-3-8B,...  2025.02  1.25
Random           Base Model=LLaMA-3-8B,...  2025.02  1.2
Repr Filter      Base Model=LLaMA-3-8B,...  2025.02  1.05