Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

About

Self-attention in Transformers has a computational cost that grows with sequence length, and its memory consumption makes inference over long streams prohibitive. Constant-memory alternatives such as RNNs and SSMs compress history into fixed-size states and thus lose long-tail dependencies, while methods that memorize context in parameters, such as Test-Time Training (TTT), are prone to overfitting token-level projections and fail to preserve the causal effect of context in pretrained LLMs. We propose Absorber LLM, which formulates long-context retention as self-supervised causal synchronization: after historical context has been absorbed into the parameters, the contextless model should match the original model with full context on future generations. We optimize this objective by synchronizing the internal behaviors of the updated model with those of the original one, which ensures both context absorption and generalization. Experiments on long-context and streaming benchmarks show that Absorber LLM reduces inference memory and improves accuracy over prior parameter-as-memory baselines.
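The synchronization objective described above can be illustrated with a deliberately tiny sketch. It is not the paper's algorithm: the one-hidden-unit model, the additive `context_shift` standing in for the effect of context, the choice to update only the hidden bias, and every function name are assumptions made for illustration. The sketch trains a contextless copy of the model so that both its hidden state (the "internal behavior") and its output match those of the original model that still sees the context.

```python
import math

def forward(x, params, context_shift=0.0):
    """Toy model with one hidden unit; returns (hidden, output).

    context_shift is an assumed stand-in for the effect that prior
    context has on the hidden state of the original model.
    """
    w_in, b_in, w_out = params
    hidden = math.tanh(w_in * x + b_in + context_shift)
    return hidden, w_out * hidden

def sync_loss_and_grad(b_student, params, context_shift, probes):
    """Mean squared mismatch in hidden state and output, plus d(loss)/d(b_student)."""
    w_in, _, w_out = params
    loss, grad = 0.0, 0.0
    for x in probes:
        h_t, y_t = forward(x, params, context_shift)     # original model, with context
        h_s, y_s = forward(x, (w_in, b_student, w_out))  # updated model, no context
        loss += (h_s - h_t) ** 2 + (y_s - y_t) ** 2
        dh_db = 1.0 - h_s ** 2                           # derivative of tanh
        grad += 2.0 * (h_s - h_t) * dh_db + 2.0 * (y_s - y_t) * w_out * dh_db
    n = len(probes)
    return loss / n, grad / n

def absorb_context(params, context_shift, probes, lr=0.1, steps=500):
    """Gradient descent on the hidden bias only, starting from its original value."""
    b_student = params[1]
    for _ in range(steps):
        _, grad = sync_loss_and_grad(b_student, params, context_shift, probes)
        b_student -= lr * grad
    return b_student

params = (0.8, 0.1, 1.5)                      # hypothetical (w_in, b_in, w_out)
context_shift = 0.7                           # hypothetical effect of the context
probes = [-1.0 + 0.25 * i for i in range(9)]  # stand-ins for future inputs

loss_before, _ = sync_loss_and_grad(params[1], params, context_shift, probes)
b_absorbed = absorb_context(params, context_shift, probes)
loss_after, _ = sync_loss_and_grad(b_absorbed, params, context_shift, probes)
```

In this toy setting the mismatch vanishes exactly when the updated bias equals the original bias plus the context shift, so the contextless model reproduces the full-context model on every probe input. A real LLM has no such closed-form target, which is why the objective is optimized rather than solved directly.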

Zhixin Zhang, Shabo Zhang, Chengcan Wu, Zeming Wei, Meng Sun • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Generation | Books3 | Amortized per-token Latency (ms) | 29.73 | 15 |
| Multi-step Reasoning | Musique 12k ~ 16k | Accuracy (macro) | 31.6 | 4 |
| Multi-step Reasoning | Musique 16k ~ 20k | Macro Accuracy | 29.5 | 4 |
| Multi-step Reasoning | Musique 8k ~ 12k | Macro Accuracy | 33.8 | 4 |
| Summarization | Samsum 4k~8k | BLEURT Score | 0.417 | 4 |
| Text Classification | Agnews 1k ~ 2k | Macro Avg Accuracy | 37.9 | 4 |
| Text Classification | Agnews (2k ~ 4k) | Macro Accuracy | 38.2 | 4 |
| Text Classification | Agnews 4k ~ 8k | Macro Avg Accuracy | 35.5 | 4 |
| Summarization | Samsum 2k~4k | BLEURT | 0.423 | 4 |
| Summarization | Samsum 8k~16k | BLEURT Score | 0.408 | 3 |