Model Merging

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
8 Vision tasks (test)		Accuracy	95.8	77	2mo ago
Average of 8 benchmarks	Pico	Average Accuracy	52.79	72	3mo ago
GLUE CoLA, MRPC, RTE, SST-2	Task arithmetic	Absolute Accuracy	75.9	60	4mo ago
Llama-3.2-1B-Instruct Task Set	K-Merge	Score S(gamma)	0.83	56	23d ago
20-task vision merging scenario (test)		Accuracy	94.7	44	2mo ago
14-task vision merging scenario (test)		Accuracy	94.3	44	2mo ago
Language Benchmarks 5-task		Score	0.63	44	2mo ago
Large-scale tasks	DOGE AM	Average Normalized Accuracy	98.2	36	2mo ago
TED Talks and XLSum 40 sequential tasks (test)	K-Merge	S^(γ)	84	27	23d ago
Sustainability to large-scale tasks	DOGE AM	Average Normalized Accuracy	91.4	24	2mo ago
Sustainability to large-scale tasks 2 tasks	DOGE AM	Average Normalized Accuracy	101.2	24	2mo ago
7 NLP tasks (test)		Accuracy	79.2	22	3mo ago
7-task NLP benchmark	METIS	Avg Performance	1.18	20	1mo ago
8 Vision Tasks (average)	PACT-Iso-C	Average Accuracy	86.3	18	1mo ago
Large-scale tasks 16 tasks merged	DOGE AM	Average Normalized Accuracy	91.5	12	2mo ago
Large-scale tasks 12 tasks merged	DOGE AM	Avg Normalized Acc	94.3	12	2mo ago
Sustainability to large-scale tasks (20 tasks)	RegMean++	Average Normalized Accuracy	82.9	12	2mo ago
Sustainability to large-scale 8 tasks	DOGE AM	Avg Normalized Accuracy	94.8	12	2mo ago
Sustainability to large-scale tasks 4 tasks	DOGE AM	Average Accuracy	98.3	12	2mo ago
LLM Evaluation Suite	KARCHER	Normalized Score	0.401	12	4mo ago
Vision, Language, and Multi-modal tasks	Multiple Models	Parameters	8	11	2mo ago
model merging benchmark Many-shot	METIS	Average	1.015	9	1mo ago
Multi-task Evaluation Suite Instruction, Math, Multilingual, Safety	METIS	Average Score	1.015	9	1mo ago
CIFAR100, Cars196, SUN397, EuroSAT, GTSRB, Pets		Clean Accuracy	83.22	7	1mo ago
Qwen3-4B-Base Transfer 8 benchmarks	Pico	Math Accuracy	32.65	6	3mo ago

Showing 25 of 26 rows