LLM Benchmark Family

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Model Merging | LLM Benchmark Family (MMLU, TruthfulQA, BBQ, CNN/DailyMail) | MMLU Score: 69.87 | 5