PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference

About

We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor $\mathbf{G}_{LM}$ to replace the deep neural network of power law graph attention (PLGA) that generates the deductive outputs at inference. We demonstrate that a cache for $\mathbf{G}_{LM}$ (G-cache) and a KV-cache can be implemented in a straightforward manner to improve inference time. The invariance and generalizable nature of the deductive outputs hold at very high fidelity: after caching, the deductive outputs have the same RMSE and determinant values up to 15 decimal places, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have loss and accuracy characteristics distinct from those of models pretrained with transferred, randomly initialized, or identity tensors as a constant tensor operator, and that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM in which $\mathbf{G}_{LM}$ is predefined as identity. The observed invariance introduces a novel asymmetry between the training and inference phases with caching. We outline the common characteristics observed in the deductive outputs for the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.

Burc Gokden • 2025
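The abstract's claim that SDPA is the special case where $\mathbf{G}_{LM}$ is identity, and the idea of caching a once-inferred operator, can be illustrated with a small sketch. This is not the paper's implementation. It assumes, purely for illustration, that the operator enters attention as softmax(Q G K^T / sqrt(d_k)) V. The names `attention_with_operator`, `sdpa`, and `GCache` are hypothetical.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax_rows(m):
    # Numerically stable softmax applied to each row.
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def attention_with_operator(q, k, v, g):
    # ASSUMPTION: the operator g sits between queries and keys,
    # i.e. softmax(Q G K^T / sqrt(d_k)) V. The abstract does not give this form.
    d_k = len(q[0])
    scores = matmul(matmul(q, g), transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)

def sdpa(q, k, v):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)

class GCache:
    # Hypothetical cache: run the (expensive) operator inference once,
    # then reuse the stored tensor on every later call.
    def __init__(self, infer_fn):
        self._infer_fn = infer_fn
        self._g = None
        self.calls = 0

    def get(self, x):
        if self._g is None:
            self._g = self._infer_fn(x)
            self.calls += 1
        return self._g

# Toy inputs: 2 tokens, head dimension 2.
q = [[1.0, 0.5], [0.2, 0.3]]
k = [[0.4, 0.1], [0.9, 0.7]]
v = [[1.0, 2.0], [3.0, 4.0]]

out_identity = attention_with_operator(q, k, v, identity(2))
out_sdpa = sdpa(q, k, v)

# Stand-in for the network that would infer the operator; here it returns identity.
cache = GCache(lambda x: identity(len(x[0])))
g_first = cache.get(q)
g_second = cache.get(q)
```

With the identity operator the two attention functions return the same values, and the second `cache.get` call reuses the stored tensor instead of calling the inference function again.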

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | WinoGrande | -- | 1085 |
| Question Answering | ARC Easy | -- | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy 62.46 | 572 |
| Question Answering | OpenBookQA | Normalized Accuracy 26.2 | 102 |
| Social Commonsense Reasoning | SIQA | Accuracy 42.07 | 89 |
| Question Answering | ARC Challenge | Normalized Accuracy 23.12 | 86 |
| Question Answering | TruthfulQA | TruthfulQA Score 45.58 | 61 |
| Commonsense Reasoning | HellaSwag | HS Score 30.4 | 28 |
| Zero-shot Reasoning | Multiple Reasoning Datasets Combined | Average Score 0 41.91 | 11 |
