Commonsense reasoning on HellaSwag 1.0 (test)
[Chart: Accuracy over time, Apr 2020 – Feb 2024; state of the art: ALUM-RoBERTa-Large, 85.6 Accuracy. Updated 4d ago.]
Evaluation Results

| Method | Details | Date | Accuracy |
|---|---|---|---|
| ALUM-RoBERTa-Large | Backbone=RoBERTa-Large... | 2020.04 | 85.6 |
| RoBERTa-Large | Backbone=RoBERTa-Large | 2020.04 | 85.2 |
| Mistral-7B + DSIR | Model=Mistral-7B, Sele... | 2024.02 | 63.1 |
| Mistral-7B Base | Model=Mistral-7B, Sele... | 2024.02 | 62.82 |
| Mistral-7B + AutoDS | Model=Mistral-7B, Sele... | 2024.02 | 62.72 |
| Mistral-7B + QuRating | Model=Mistral-7B, Sele... | 2024.02 | 62.64 |
| Mistral-7B + Uniform | Model=Mistral-7B, Sele... | 2024.02 | 62.21 |
| LLaMA2-7B Base | Model=LLaMA2-7B, Selec... | 2024.02 | 58.88 |
| LLaMA2-7B + QuRating | Model=LLaMA2-7B, Selec... | 2024.02 | 58.79 |
| LLaMA2-7B + Uniform | Model=LLaMA2-7B, Selec... | 2024.02 | 58.43 |
| LLaMA2-7B + DSIR | Model=LLaMA2-7B, Selec... | 2024.02 | 58.38 |
| LLaMA2-7B + AutoDS | Model=LLaMA2-7B, Selec... | 2024.02 | 58.28 |
| Gemma-2B + QuRating | Model=Gemma-2B, Select... | 2024.02 | 53.1 |
| Gemma-2B + DSIR | Model=Gemma-2B, Select... | 2024.02 | 52.95 |
| Gemma-2B + Uniform | Model=Gemma-2B, Select... | 2024.02 | 52.91 |
| Gemma-2B + AutoDS | Model=Gemma-2B, Select... | 2024.02 | 52.82 |
| Gemma-2B Base | Model=Gemma-2B, Select... | 2024.02 | 48.3 |
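The Accuracy column above is standard multiple-choice accuracy: on HellaSwag, each item has four candidate endings, and a model scores a point when its chosen ending matches the gold one. A minimal sketch of that computation (the prediction and label values below are illustrative, not taken from this leaderboard):

```python
def accuracy(predictions, gold_labels):
    """Fraction of examples where the predicted ending index matches the gold index."""
    assert len(predictions) == len(gold_labels), "prediction/label count mismatch"
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(predictions)

# Each HellaSwag item has 4 candidate endings (indices 0-3); a model picks one.
preds = [0, 2, 1, 3, 2]  # hypothetical model choices
golds = [0, 2, 3, 3, 1]  # hypothetical gold endings
print(f"Accuracy: {accuracy(preds, golds):.1%}")  # 3 of 5 correct -> 60.0%
```

In practice, leaderboard numbers are produced by scoring each candidate ending with the model (e.g. by length-normalized log-likelihood) and taking the argmax as the prediction before applying this metric.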