
Instruction Tuning on MMLU, BBH, GSM, TydiQA, HumanEval, and AlpacaEval Suite

55.7 MMLU

Alpaca-GPT4

Updated 4mo ago

Evaluation Results

Method	Links	MMLU	BBH	GSM	TydiQA	HumanEval	AlpacaEval	Avg	Δ
Alpaca-GPT4 2024.02		55.7	46.6	30.5	48.1	40.8	46.5	44.7	-
SelectIT 2024.02		55.7	48.9	33	54.1	42.2	48.8	47.1	2.4
Q2Q 2024.02		55.3	48.5	32	50.8	41.3	47.3	45.9	1.2
Token-R 2024.02		55.3	47.3	30.5	51.3	39.8	46.2	45.1	0.4
Sentence-R 2024.02		55.2	48.3	31	52.2	42.5	46.3	45.9	1.2
Model-R 2024.02		55.1	47.5	31.5	52.3	40.2	46.1	45.5	0.8
LIMA 2024.02		54.6	45.3	30.5	51.1	34.1	42.6	43	-1.7
AlpaGasus 2024.02		54.1	47.3	31.5	50.6	41.3	46.3	45.2	0.5
Instruction Mining 2024.02		54.1	47.3	32.5	52.6	43.3	48.3	46.3	1.6
SelectIT 2024.02		47.4	40.6	16.8	47.4	29.4	35.7	36.2	2.2
Model-R 2024.02		47.3	37.4	16.1	45.3	28.4	35.8	35.1	1
Instruction Mining 2024.02		47	39.6	16.5	47.1	28.6	34.4	35.5	1.5
Q2Q 2024.02		46.9	39.4	15.3	46.7	28.2	35.7	35.4	1.3
Sentence-R 2024.02		46.9	38.1	16.1	48.4	26.9	35.3	35.3	1.2
Token-R 2024.02		46.8	36.5	14.5	44.6	28.9	35.5	34.5	0.4
Alpaca-GPT4 2024.02		46.5	38.4	15	43.4	26.8	34.2	34.1	-
AlpaGasus 2024.02		45.9	39	14.5	46.4	27.5	35.4	34.8	0.7
LIMA 2024.02		45.4	37.5	14.3	45.1	24.6	33.1	33.3	-0.7
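
The page does not say how the last two numeric columns are derived. Judging from the numbers alone, they appear to be the unweighted mean of the six benchmark scores and the difference between that mean and the Alpaca-GPT4 row of the same group. The sketch below checks that reading against three rows from the first group. The benchmark order (taken from the page title), the names `average` and `summarize`, and the rounding to one decimal are all assumptions made for illustration.

```python
# Six benchmark scores per method, copied from the first group of rows.
# Column order is assumed to follow the page title.
BENCHMARKS = ["MMLU", "BBH", "GSM", "TydiQA", "HumanEval", "AlpacaEval"]

scores = {
    "Alpaca-GPT4": [55.7, 46.6, 30.5, 48.1, 40.8, 46.5],
    "SelectIT": [55.7, 48.9, 33.0, 54.1, 42.2, 48.8],
    "LIMA": [54.6, 45.3, 30.5, 51.1, 34.1, 42.6],
}


def average(values):
    """Unweighted mean of a list of scores."""
    return sum(values) / len(values)


def summarize(scores, baseline="Alpaca-GPT4"):
    """Return {method: (rounded average, rounded delta vs. baseline)}.

    Assumption: the delta is taken on unrounded averages, then rounded.
    """
    base_avg = average(scores[baseline])
    return {
        method: (round(average(vals), 1), round(average(vals) - base_avg, 1))
        for method, vals in scores.items()
    }


summary = summarize(scores)
for method, (avg, delta) in summary.items():
    print(f"{method}: avg={avg}, delta={delta}")
```

Under these assumptions the computed values match the table for the rows shown (for example, 47.1 and 2.4 for SelectIT). Rows whose mean falls almost exactly on a rounding boundary may differ by 0.1 depending on how the original page rounded.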