
MergeBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-task Language Modeling | MergeBench | Instruction Score: 39.56 | 21 |
| Model Merging Evaluation | MergeBench | MATH-500 Score: 52.6 | 12 |
| Vision-Language Multi-task Performance | MergeBench (Vision-Language tasks: MMSI-Bench, EmbSpatial, MMMU_Med, PathVQA, OCRBench, CharXiv) | MMSI-Bench: 32.6 | 11 |