TeleChat Technical Report
About
In this technical report, we present TeleChat, a collection of large language models (LLMs) with 3 billion, 7 billion, and 12 billion parameters. It includes pretrained language models as well as fine-tuned chat models that are aligned with human preferences. TeleChat is first pretrained on an extensive corpus of diverse English and Chinese texts comprising trillions of tokens. The model is then fine-tuned to align with human preferences, following a detailed methodology that we describe. We evaluate TeleChat on a range of tasks, including language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves performance comparable to other open-source models of similar size across a wide range of public benchmarks. To support future research and applications of LLMs, we release the fine-tuned model checkpoints of TeleChat's 7B and 12B variants, along with code and a portion of our pretraining data, to the public community.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 57.08 | 521 |
| Mathematical Reasoning | MATH 500 | Top-1 Accuracy | 70 | 384 |
| Reasoning | MMLU-Pro | Accuracy | 67.98 | 241 |
| Reasoning | GPQA Diamond | Accuracy | 33.33 | 185 |
| Scientific Question Answering | GPQA Diamond | Accuracy | 33.33 | 123 |
| Instruction Following | IFEval | Accuracy (IFEval) | 82 | 89 |
| Mathematical Problem Solving | MATH500 | Accuracy | 70 | 83 |
| Medical Reasoning | MedMCQA | Accuracy | 57.08 | 58 |
| Mathematical Problem Solving | AIME 2024 | Top-1 Accuracy | 10 | 54 |
| Tabular Question Answering | ReasonTabQA 1.0 (Overall) | Overall Accuracy | 51.13 | 33 |