
WizardLM

Benchmarks

Task Name                         Dataset Name           SOTA Result                 Trend
Instruction Following             WizardLM (test)        Score 6.87                  25
Instruction Tuning                WizardLM               Reasoning Score 75.07       20
Refusal behavior defense          WizardLM (test)        BadNet CACC 90.4            12
Toxic behavior defense            WizardLM (test)        BadNet CACC 0.904           12
Fine-tuning                       WizardLM               Evaluation Loss 0.661       7
Instruction Following             WizardLM low-resource  Win Rate (bn) 62.8          7
Instruction Following Evaluation  WizardLM               Score 72.06                 5
Generation                        WizardLM (test)        LLM-as-a-Judge Score 48.37  2