
MultiChallenge

Benchmarks

Task Name                            Dataset Name    SOTA Result    Trend
Reverse Chain-of-Thought Generation  MultiChallenge  Score 45       20
Instruction Following                MultiChallenge  Score 65.98    10
General-purpose Behavior             MultiChallenge  Score 58.6     7
Medical Instruction Following        MultiChallenge  Pass@1 66.8    4