Self-Knowledge Re-expression: A Fully Local Method for Adapting LLMs to Tasks Using Intrinsic Knowledge
About
While the next-token prediction (NTP) paradigm enables large language models (LLMs) to express their intrinsic knowledge, its sequential nature constrains performance on specialized, non-generative tasks. We attribute this performance bottleneck to the LLMs' knowledge expression mechanism, rather than to deficiencies in knowledge acquisition. To address this, we propose Self-Knowledge Re-expression (SKR), a novel, task-agnostic adaptation method. SKR transforms the LLM's output from generic token generation to highly efficient, task-specific expression. SKR is a fully local method that uses only unannotated data, requiring neither human supervision nor model distillation. Experiments on a large financial document dataset demonstrate substantial improvements: over 40% in Recall@1 for information retrieval tasks, over 76% reduction in object detection latency, and over 33% increase in anomaly detection AUPRC. Our results on the MMDocRAG dataset surpass those of leading retrieval models by at least 12.6%.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Anomaly Detection | TAD (test) | Overall AUC | 99.1 | 23 |
| Text-to-Text Retrieval | MMDocRAG (test) | MRR | 67.4 | 19 |
| Object Detection | T_OD | mIoU | 72.6 | 14 |
| Text-to-Image Retrieval | MMDocRAG (test) | MRR | 69.1 | 13 |
| Information Retrieval | MMDocRAG | Recall@10 | 87.7 | 10 |
| Anomaly Detection | CUB-200 2011 | Accuracy | 0.986 | 6 |
| Object Detection | CUB-200-2011 (test) | IoU | 69.2 | 6 |
| Text-to-Image Retrieval | DocVQA 2020 (test) | MRR | 0.931 | 2 |
| Text-to-Image Retrieval | SciMMIR (test) | MRR | 56.2 | 2 |
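
For readers unfamiliar with the retrieval metrics in the table above, the following is a minimal sketch of how Recall@k and MRR are commonly computed. It is not the paper's evaluation code. It assumes each query has exactly one relevant item, so Recall@k reduces to a hit rate; the function names and toy data are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def recall_at_k(rankings: Sequence[Sequence[Hashable]],
                gold: Sequence[Hashable], k: int) -> float:
    """Fraction of queries whose single gold item appears in the top k results.

    Assumption: one relevant item per query (hit-rate form of Recall@k).
    """
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         gold: Sequence[Hashable]) -> float:
    """Average of 1/rank of the gold item; a query with no hit contributes 0."""
    total = 0.0
    for ranked, g in zip(rankings, gold):
        if g in ranked:
            total += 1.0 / (list(ranked).index(g) + 1)
    return total / len(gold)


# Toy data (hypothetical): three queries, each with a ranked list of document ids.
toy_rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
toy_gold = ["a", "y", "s"]  # the third gold item is never retrieved

print(recall_at_k(toy_rankings, toy_gold, k=1))
print(recall_at_k(toy_rankings, toy_gold, k=2))
print(mean_reciprocal_rank(toy_rankings, toy_gold))
```

Benchmark tables often report these metrics either as fractions in [0, 1] or as percentages, so values such as 0.931 and 67.4 can appear side by side under the same metric name.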