Commonsense reasoning on HellaSwag 1.0 (test)
[Chart: Accuracy over time, Apr 2020 – Feb 2024; state of the art: ALUM-RoBERTa-Large, 85.6 Accuracy. Updated 4d ago.]
Evaluation Results

| Method | Details | Date | Accuracy |
|---|---|---|---|
| ALUM-RoBERTa-Large | Backbone=RoBERTa-Large... | 2020.04 | 85.6 |
| RoBERTa-Large | Backbone=RoBERTa-Large | 2020.04 | 85.2 |
| Mistral-7B + DSIR | Model=Mistral-7B, Sele... | 2024.02 | 63.1 |
| Mistral-7B Base | Model=Mistral-7B, Sele... | 2024.02 | 62.82 |
| Mistral-7B + AutoDS | Model=Mistral-7B, Sele... | 2024.02 | 62.72 |
| Mistral-7B + QuRating | Model=Mistral-7B, Sele... | 2024.02 | 62.64 |
| Mistral-7B + Uniform | Model=Mistral-7B, Sele... | 2024.02 | 62.21 |
| LLaMA2-7B Base | Model=LLaMA2-7B, Selec... | 2024.02 | 58.88 |
| LLaMA2-7B + QuRating | Model=LLaMA2-7B, Selec... | 2024.02 | 58.79 |
| LLaMA2-7B + Uniform | Model=LLaMA2-7B, Selec... | 2024.02 | 58.43 |
| LLaMA2-7B + DSIR | Model=LLaMA2-7B, Selec... | 2024.02 | 58.38 |
| LLaMA2-7B + AutoDS | Model=LLaMA2-7B, Selec... | 2024.02 | 58.28 |
| Gemma-2B + QuRating | Model=Gemma-2B, Select... | 2024.02 | 53.1 |
| Gemma-2B + DSIR | Model=Gemma-2B, Select... | 2024.02 | 52.95 |
| Gemma-2B + Uniform | Model=Gemma-2B, Select... | 2024.02 | 52.91 |
| Gemma-2B + AutoDS | Model=Gemma-2B, Select... | 2024.02 | 52.82 |
| Gemma-2B Base | Model=Gemma-2B, Select... | 2024.02 | 48.3 |
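The Accuracy column above is standard multiple-choice accuracy: on HellaSwag, each item has four candidate endings, and a model scores a point when its chosen ending matches the gold one. A minimal sketch of that computation (the prediction and label values below are illustrative, not taken from this leaderboard):

```python
def accuracy(predictions, gold_labels):
    """Fraction of examples where the predicted ending index matches the gold index."""
    assert len(predictions) == len(gold_labels), "prediction/label count mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(predictions)

# Each HellaSwag item has 4 candidate endings (indices 0-3); a model picks one.
preds = [0, 2, 1, 3, 2]  # hypothetical model choices
golds = [0, 2, 3, 3, 1]  # hypothetical gold endings
print(f"Accuracy: {accuracy(preds, golds):.1%}")  # 3 of 5 correct -> 60.0%
```

In practice, leaderboard numbers are produced by scoring each candidate ending with the model (e.g. by length-normalized log-likelihood) and taking the argmax as the prediction before applying this metric.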