KoCo: Conditioning Language Model Pre-training on Knowledge Coordinates
About
Standard Large Language Model (LLM) pre-training typically treats corpora as flattened token sequences, overlooking the real-world context that humans naturally rely on to interpret information. To bridge this gap, we introduce Knowledge Coordinate Conditioning (KoCo), a simple method that maps every document to a three-dimensional semantic coordinate. By prepending these coordinates as textual prefixes during pre-training, we equip the model with explicit contextual awareness, allowing it to situate each document within a real-world knowledge structure. Experimental results demonstrate that KoCo significantly improves performance across 10 downstream tasks and accelerates pre-training convergence by approximately 30%. Furthermore, our analysis indicates that explicitly modeling knowledge coordinates helps the model distinguish stable facts from noise, effectively mitigating hallucination in generated outputs.
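The conditioning step itself is lightweight: each document is paired with a 3-D coordinate and the coordinate is serialized into a textual prefix before tokenization. A minimal sketch of that prefixing step is shown below; the prefix format, tag name, and the example coordinate are illustrative assumptions, since the description above only states that coordinates are prepended as text.

```python
def koco_prefix(document: str, coord: tuple[float, float, float]) -> str:
    """Prepend a 3-D knowledge coordinate to a document as a textual prefix.

    The `<coord ...>` tag format is a hypothetical serialization choice;
    any consistent textual encoding of the three values would do.
    """
    x, y, z = coord
    return f"<coord {x:.2f} {y:.2f} {z:.2f}> {document}"

# Hypothetical coordinate, e.g. from a 3-D projection of a document embedding.
doc = "The Eiffel Tower was completed in 1889."
prefixed = koco_prefix(doc, (0.12, -0.45, 0.88))
print(prefixed)
```

At pre-training time, the prefixed string replaces the raw document in the tokenization pipeline, so the model conditions every token on the coordinate context.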
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instruction Following | IFEval | -- | -- | 625 |
| Question Answering | ARC Easy | -- | -- | 597 |
| Physical Interaction Question Answering | PIQA | Accuracy | 74.8 | 333 |
| Common Sense Reasoning | COPA | Accuracy | 83 | 197 |
| Question Answering | ARC Challenge | Accuracy (ARC) | 44.11 | 142 |
| Question Answering | OpenBookQA | Accuracy | 51.2 | 119 |
| Social Interaction Question Answering | SIQA | Accuracy | 53.4 | 109 |
| Commonsense Question Answering | CSQA | Accuracy | 61.83 | 58 |
| Truthfulness | TruthfulQA | Truthfulness Score | 36.61 | 16 |