
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

About

Tokenizer-free language models eliminate the tokenizer step of the language modeling pipeline by operating directly on bytes; patch-based variants further aggregate contiguous byte spans into patches for efficiency. However, the average patch size chosen at the model design stage governs a tight trade-off: larger patches reduce compute and KV-cache footprint, but degrade modeling quality. We trace this trade-off to patch lag: until a patch is fully observed, byte predictions within it must rely on a stale representation from the previous patch to preserve causality; this lag widens as patches grow larger. We introduce Scratchpad Patching (SP), which inserts transient scratchpads inside each patch to aggregate the bytes seen so far and refresh patch-level context for subsequent predictions. SP triggers scratchpads using next-byte prediction entropy, selectively allocating compute to information-dense regions and enabling post-hoc adjustment of inference-time compute. Across experiments on natural language and code, SP improves model quality at the same patch size; for example, even at $16$ bytes per patch, SP-augmented models match or closely approach the byte-level baseline on downstream evaluations while using a $16\times$ smaller KV cache over patches and $3$-$4\times$ less inference compute.

Lin Zheng, Vasilisa Bashlovkina, Timothy Dozat, Dan Garrette, Laura Rimell, Joshua Maynez • 2026
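
The entropy-based triggering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size patches, the single global threshold, the rule of skipping the last byte of a patch, and the names `entropy` and `scratchpad_positions` are all assumptions made for illustration. The sketch only shows how a threshold on next-byte entropy could select where scratchpads go, and how changing that threshold after training changes how many are inserted.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def scratchpad_positions(entropies, patch_size, threshold):
    """Return the byte indices after which a scratchpad would be inserted.

    Hypothetical rule (an assumption, not taken from the paper): trigger
    when the next-byte entropy at position i exceeds `threshold`, except
    at the last byte of a fixed-size patch, where the completed patch
    already refreshes the patch-level context.
    """
    positions = []
    for i, h in enumerate(entropies):
        is_patch_end = (i + 1) % patch_size == 0
        if h > threshold and not is_patch_end:
            positions.append(i)
    return positions


# Toy per-byte entropies for two patches of 4 bytes each (made-up values).
toy_entropies = [0.1, 2.5, 0.3, 3.0, 0.2, 0.1, 2.8, 2.9]

# A lower threshold spends more compute (more scratchpads) ...
dense = scratchpad_positions(toy_entropies, patch_size=4, threshold=2.0)
# ... and a higher threshold spends less, with no retraining involved.
sparse = scratchpad_positions(toy_entropies, patch_size=4, threshold=2.7)

print(dense)   # [1, 6]
print(sparse)  # [6]
```

In this toy setup the threshold is the only knob, which mirrors the abstract's point that inference-time compute can be adjusted post hoc; a real model would obtain the entropies from its own next-byte predictions.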

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Understanding | MMLU | Accuracy 34.7 | 844 |
| Question Answering | OpenBookQA | Accuracy 48 | 305 |
| Question Answering | BoolQ | Accuracy 66.3 | 201 |
| Natural Language Understanding | ARC Easy | Accuracy 71 | 36 |
| Natural Language Understanding | HellaSwag | Accuracy 59.7 | 35 |
| Natural Language Understanding | ARC-C | Accuracy 41.9 | 34 |
| Natural Language Understanding | WinoGrande | Accuracy 59 | 30 |
| Natural Language Understanding | PIQA | Accuracy 73.8 | 16 |
