| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Large Language Model Evaluation | HuggingFace Open LLM Leaderboard | GSM8K | 55.37 | 49 |
| General language understanding and reasoning | HuggingFace Open LLM Leaderboard | HellaSwag Accuracy | 62 | 20 |
| Large Language Model Evaluation | HuggingFace Open LLM Leaderboard lm-eval-harness default (various) | HellaSwag | 84.34 | 18 |