PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference
About
We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor $\mathbf{G}_{LM}$ to replace the deep neural network of power law graph attention (PLGA) that generates the deductive outputs at inference. We demonstrate that a cache for $\mathbf{G}_{LM}$ (G-cache) and a KV-cache can be implemented in a straightforward manner to improve inference time. The invariance and generalizable nature of the deductive outputs holds at very high fidelity: the deductive outputs have the same RMSE and determinant values up to 15 decimal places after caching, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have loss and accuracy characteristics distinct from models pretrained with transferred, randomly initialized, or identity tensors as a constant tensor operator, and that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM where $\mathbf{G}_{LM}$ is predefined as the identity. The observed invariance characteristic introduces a novel asymmetry between the training and inference phases with caching. We outline common characteristics observed in the deductive outputs under the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.
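The G-cache idea above can be illustrated with a minimal single-head sketch. This is not the paper's PLGA implementation: `metric_attention`, the tensor shapes, and the random stand-in for the inferred $\mathbf{G}_{LM}$ are illustrative assumptions. It shows the two claims in miniature: a once-inferred operator can be cached and reused across inference passes, and setting the operator to the identity recovers standard SDPA.

```python
import numpy as np

def metric_attention(Q, K, V, G):
    """Single-head attention with a tensor operator G inserted between
    queries and keys. With G = I this reduces exactly to scaled
    dot-product attention (SDPA). Here a precomputed G stands in for
    the output of the PLGA deep net (hypothetical simplification)."""
    d_k = Q.shape[-1]
    scores = (Q @ G @ K.T) / np.sqrt(d_k)   # G modulates the Q-K interaction
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))    # seq_len = 4, d_k = 8

# "G-cache": infer the operator once (random stand-in values here),
# then reuse the cached tensor on later passes instead of re-running
# the deep net that produced it.
G_cache = rng.standard_normal((8, 8))
out_first = metric_attention(Q, K, V, G_cache)    # first inference pass
out_cached = metric_attention(Q, K, V, G_cache)   # later pass hits the cache

# SDPA as the special case where the operator is predefined as identity.
out_sdpa = metric_attention(Q, K, V, np.eye(8))
```

The cached call returns bit-identical outputs because the same tensor operator is applied; in the paper's setting this holds only because the learned singularity condition makes the inferred $\mathbf{G}_{LM}$ invariant across inputs up to a small perturbation.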
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | 1085 |
| Question Answering | ARC Easy | -- | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy: 62.46 | 572 |
| Question Answering | OpenBookQA | Normalized Accuracy: 26.2 | 102 |
| Social Commonsense Reasoning | SIQA | Accuracy: 42.07 | 89 |
| Question Answering | ARC Challenge | Normalized Accuracy: 23.12 | 86 |
| Question Answering | TruthfulQA | TruthfulQA Score: 45.58 | 61 |
| Commonsense Reasoning | HellaSwag | HS Score: 30.4 | 28 |
| Zero-shot Reasoning | Multiple Reasoning Datasets Combined | Average Score: 41.91 | 11 |