
MixEval

Benchmarks

Task Name                     Dataset Name        SOTA Result      Trend
Knowledge-focused evaluation  MixEval Hard        Accuracy 21.8    8
Knowledge-focused evaluation  MixEval Standard    Accuracy 33      8
Alignment                     MixEval             Score 86.7       5
Alignment                     MixEval v1 (test)   Accuracy 76.5    4