General Domain

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Instruction following | General Domain AlpacaEval Arena-Hard LLaMA3-8B (10% selection) | AlpacaEval Score: 12.09 | 18 |
| Chinese-to-English speech translation | General-domain (test) | BLEU: 40.77 | 6 |
| Question Answering | General Domain Average | Average EM: 42.35 | 5 |
| Language Modeling | General Domain (holdout test) | L_inf: 0.79 | 4 |