General Language Modeling on General Benchmarks Llama 3.1 8B

66.5 Generation Quality Score

Baseline

Updated 5mo ago

Evaluation Results

Method	Links	Generation Quality Score
Baseline 2025.05		66.5	43.2	36.4	62.3	70
GA 2025.05		66.2	43	36.7	62.8	69
ME 2025.05		66.2	43.2	35.5	62.4	70
Undial 2025.05		66.1	43.4	35.6	62.9	70
Undial 2025.05		66.1	43.4	35.3	62.7	69
Undial 2025.05		66	41.9	36.1	59.8	69
NPO 2025.05		65.9	43.2	35.3	63.2	67
Unilogit 2025.05		65.7	41.9	35.1	63.3	67
Unilogit 2025.05		65.2	28.1	29.1	57	58
Undial 2025.05		65	41	35.4	48.5	66
RKLD 2025.05		63.1	20.5	35.6	13.9	63