LLaMA: Open and Efficient Foundation Language Models
About
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample • 2023
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText2 | Perplexity | 5.69 | 1875 |
| Language Modeling | WikiText-2 (test) | PPL | 5.68 | 1541 |
| Commonsense Reasoning | HellaSwag | Accuracy | 84.2 | 1460 |
| Mathematical Reasoning | GSM8K | Accuracy | 93 | 983 |
| Code Generation | HumanEval | Pass@1 | 45.7 | 850 |
| Multi-task Language Understanding | MMLU | Accuracy | 66.5 | 842 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 12.63 | 841 |
| Commonsense Reasoning | WinoGrande | Accuracy | 70.1 | 776 |
| Language Understanding | MMLU | Accuracy | 66.2 | 756 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 50.9 | 751 |
Showing 10 of 404 rows
...
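The language-modeling rows report perplexity, the exponential of the average per-token negative log-likelihood on the test set; differences in tokenization, chunking, and context length explain why the WikiText-2 entries above do not agree exactly. Below is a minimal sketch of such an evaluation using the Hugging Face `transformers` and `datasets` APIs; the checkpoint name `huggyllama/llama-7b` (a community re-upload) and the non-overlapping chunking scheme are assumptions for illustration, not part of the original release or any leaderboard's protocol.

```python
# Minimal perplexity sketch: PPL = exp(mean per-token negative log-likelihood).
# The checkpoint id and dataset loading below are assumptions for illustration.
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumed community re-upload of the LLaMA weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def perplexity(text: str, max_len: int = 2048) -> float:
    """Score `text` in non-overlapping chunks and return exp(total NLL / total tokens)."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, ids.size(1) - 1, max_len):
        chunk = ids[:, start:start + max_len]
        if chunk.size(1) < 2:
            break
        with torch.no_grad():
            out = model(chunk, labels=chunk)  # loss = mean NLL over chunk.size(1) - 1 targets
        n = chunk.size(1) - 1
        nll_sum += out.loss.float().item() * n
        n_tokens += n
    return math.exp(nll_sum / n_tokens)

# Concatenate the WikiText-2 test split and score it as one long stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
print(perplexity("\n\n".join(test["text"])))
```

Numbers produced this way depend on the tokenizer, chunk length, and how documents are joined, so they are only comparable to the rows above within a fixed evaluation protocol.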