
Downstream Task Evaluation on ARC Challenge, BoolQ, OpenbookQA, GSM8K (Strict), and MMLU

Best ARC Challenge accuracy: 66.72 (Original)


Evaluation Results

Method    Date     ARC Challenge  BoolQ  OpenbookQA  GSM8K (Strict)  MMLU   Average
Original  2025.05  66.72          88.5   41.2        82.79           77.97  71.44
A^3       2025.05  61.18          88.41  38          75.89           73.4   67.38
SVD-LLM   2025.05  57.51          87.03  37.2        61.79           71.34  62.97
A^3       2025.05  52.73          86.45  34.8        60.73           67.15  60.37
SVD-LLM   2025.05  50.34          86.18  32.6        49.13           67.73  57.2

Average is the unweighted mean of the five task scores.
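The last numeric column in each row matches the unweighted mean of the five task scores, in the order ARC Challenge, BoolQ, OpenbookQA, GSM8K (Strict), MMLU. The minimal Python sketch below recomputes that mean from the table values and compares it with the reported figure. It assumes the column order follows the task order in the page title. The helper name `check_averages` and the 0.005 rounding tolerance are illustrative choices, not part of the source.

```python
# Per-task scores copied from the evaluation table, in the order
# ARC Challenge, BoolQ, OpenbookQA, GSM8K (Strict), MMLU, followed by the
# value reported in the table's last column. Duplicate method names are kept
# exactly as the table lists them.
ROWS = [
    ("Original", [66.72, 88.5, 41.2, 82.79, 77.97], 71.44),
    ("A^3", [61.18, 88.41, 38.0, 75.89, 73.4], 67.38),
    ("SVD-LLM", [57.51, 87.03, 37.2, 61.79, 71.34], 62.97),
    ("A^3", [52.73, 86.45, 34.8, 60.73, 67.15], 60.37),
    ("SVD-LLM", [50.34, 86.18, 32.6, 49.13, 67.73], 57.2),
]


def check_averages(rows, tol=0.005):
    """Recompute the unweighted mean per row and compare it with the reported value.

    Reported values carry at most two decimals, so a difference of at most
    half a unit in the last place (0.005) is treated as a match.
    """
    results = []
    for name, scores, reported in rows:
        mean = sum(scores) / len(scores)  # every task weighted equally
        ok = abs(mean - reported) <= tol + 1e-9  # small slack for float error
        results.append((name, mean, reported, ok))
    return results


if __name__ == "__main__":
    for name, mean, reported, ok in check_averages(ROWS):
        print(f"{name:<9} recomputed={mean:.3f} reported={reported} match={ok}")
```

Because this is a plain mean, each benchmark contributes equally to the average regardless of how many test items it contains.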