PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference
About
We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor $\mathbf{G}_{LM}$ to replace the deep neural network of power law graph attention (PLGA) that generates the deductive outputs at inference. We demonstrate that a cache for $\mathbf{G}_{LM}$ (G-cache) and a KV-cache can be implemented in a straightforward manner to improve inference time. The invariance and generalizable nature of the deductive outputs holds at very high fidelity: the deductive outputs have the same RMSE and determinant values up to 15 decimal places after caching, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have loss and accuracy characteristics distinct from models pretrained with transferred, randomly initialized, or identity tensors as a constant tensor operator, and that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM where $\mathbf{G}_{LM}$ is predefined as the identity. The observed invariance characteristic introduces a novel asymmetry between the training and inference phases with caching. We outline common characteristics observed in the deductive outputs under the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.
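The G-cache idea above can be illustrated with a minimal single-head sketch. This is not the paper's PLGA implementation: `metric_attention`, the tensor shapes, and the random stand-in for the inferred $\mathbf{G}_{LM}$ are illustrative assumptions. It shows the two claims in miniature: a once-inferred operator can be cached and reused across inference passes, and setting the operator to the identity recovers standard SDPA.

```python
import numpy as np

def metric_attention(Q, K, V, G):
    """Single-head attention with a tensor operator G inserted between
    queries and keys. With G = I this reduces exactly to scaled
    dot-product attention (SDPA). Here a precomputed G stands in for
    the output of the PLGA deep net (hypothetical simplification)."""
    d_k = Q.shape[-1]
    scores = (Q @ G @ K.T) / np.sqrt(d_k)   # G modulates the Q-K interaction
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))    # seq_len = 4, d_k = 8

# "G-cache": infer the operator once (random stand-in values here),
# then reuse the cached tensor on later passes instead of re-running
# the deep net that produced it.
G_cache = rng.standard_normal((8, 8))
out_first = metric_attention(Q, K, V, G_cache)    # first inference pass
out_cached = metric_attention(Q, K, V, G_cache)   # later pass hits the cache

# SDPA as the special case where the operator is predefined as identity.
out_sdpa = metric_attention(Q, K, V, np.eye(8))
```

The cached call returns bit-identical outputs because the same tensor operator is applied; in the paper's setting this holds only because the learned singularity condition makes the inferred $\mathbf{G}_{LM}$ invariant across inputs up to a small perturbation.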
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | 1085 |
| Question Answering | ARC Easy | -- | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy: 62.46 | 572 |
| Question Answering | OpenBookQA | Normalized Accuracy: 26.2 | 102 |
| Social Commonsense Reasoning | SIQA | Accuracy: 42.07 | 89 |
| Question Answering | ARC Challenge | Normalized Accuracy: 23.12 | 86 |
| Question Answering | TruthfulQA | TruthfulQA Score: 45.58 | 61 |
| Commonsense Reasoning | HellaSwag | HS Score: 30.4 | 28 |
| Zero-shot Reasoning | Multiple Reasoning Datasets Combined | Average Score: 41.91 | 11 |