LLaMA: Open and Efficient Foundation Language Models
About
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample • 2023
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 5.69 | 2839 |
| Language Modeling | WikiText-2 (test) | PPL | 5.68 | 1949 |
| Commonsense Reasoning | HellaSwag | Accuracy | 84.2 | 1891 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 12.63 | 1624 |
| Mathematical Reasoning | GSM8K | Accuracy | 93 | 1362 |
| Commonsense Reasoning | WinoGrande | Accuracy | 73 | 1085 |
| Code Generation | HumanEval | Pass@1 | 45.7 | 1036 |
| Language Modeling | PTB | Perplexity | 8.93 | 1034 |
| Question Answering | ARC Challenge | Accuracy | 52.7 | 906 |
| Mathematical Reasoning | MATH | Accuracy | 50.4 | 882 |
Showing 10 of 514 rows
...
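For context, the language-modeling entries above report perplexity, i.e. the exponentiated average per-token negative log-likelihood on the test split. Below is a minimal sketch of how such a figure could be reproduced, assuming the Hugging Face `transformers` and `datasets` libraries; the checkpoint name `huggyllama/llama-7b` is an illustrative assumption, not something this page specifies, and exact numbers depend on tokenization, context length, and striding.

```python
# Sketch: WikiText-2 test perplexity for a causal LM (assumed checkpoint name below).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumption for illustration; swap in your own model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# Concatenate the test split and tokenize it once.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

# Score the token stream in fixed-size windows and average the per-token losses.
window, nlls, n_tokens = 2048, [], 0
with torch.no_grad():
    for start in range(0, ids.size(1), window):
        chunk = ids[:, start : start + window]
        if chunk.size(1) < 2:
            continue
        out = model(chunk, labels=chunk)      # loss is the mean NLL over chunk_len - 1 tokens
        n = chunk.size(1) - 1
        nlls.append(out.loss * n)
        n_tokens += n

print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
```

Non-overlapping windows are used here for brevity; a strided (overlapping) evaluation generally yields slightly lower perplexity because more tokens are scored with longer context.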