General Domain

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Image-Text Retrieval | General Domain | Retrieval Score: 31.27 | 30 |
| Image Classification | General Domain 31 tasks | CLS Score: 57.97 | 30 |
| Instruction following | General Domain AlpacaEval Arena-Hard LLaMA3-8B (10% selection) | AlpacaEval Score: 12.09 | 18 |
| Chinese-to-English speech translation | General-domain (test) | BLEU: 40.77 | 6 |
| Question Answering | General Domain Average | Average EM: 42.35 | 5 |
| Language Modeling | General Domain (holdout test) | L_inf: 0.79 | 4 |