
Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

About

Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen per query, so each is a direct bottleneck. Existing methods trade speed for quality: fast retrieval often returns redundant sequences, while stronger diversity-aware selection adds prohibitive per-query cost. We introduce HullFT, a geometric approach to TTFT that addresses both bottlenecks. Given a query, HullFT first represents the query embedding as a sparse convex combination of a few training sequences, using efficient projection-free Frank-Wolfe optimization. This yields a support set that is inherently relevant and diverse. We then convert the fractional convex weights into an exact integer multiset for finetuning through a geometric integerization procedure. The resulting multiplicities naturally create repeated examples, which we exploit with Gradient Reuse to amortize forward-backward computation across repeated finetuning steps. Our experiments show that HullFT improves the quality-efficiency tradeoff over current state-of-the-art TTFT methods, achieving lower bits-per-byte at substantially lower total runtime.
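To make the first two steps of the abstract concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes the objective is the squared distance between the query embedding and a convex combination of candidate embeddings, solved with Frank-Wolfe and exact line search. The function names `frank_wolfe_simplex` and `largest_remainder_counts` are hypothetical, and the rounding step uses plain largest-remainder rounding as a stand-in for the paper's geometric integerization procedure, whose details are not given here.

```python
def frank_wolfe_simplex(points, query, steps=100):
    """Approximate `query` as a convex combination of `points`.

    Minimizes 0.5 * ||sum_i w_i * points[i] - query||^2 over the simplex.
    Assumption: this objective is illustrative, not taken from the paper.
    """
    n = len(points)
    # Start at the single point closest to the query (a vertex of the simplex).
    start = min(range(n),
                key=lambda i: sum((p - t) ** 2 for p, t in zip(points[i], query)))
    weights = [0.0] * n
    weights[start] = 1.0
    recon = list(points[start])  # current reconstruction sum_i w_i * points[i]
    for _ in range(steps):
        residual = [r - t for r, t in zip(recon, query)]
        # Linear minimization oracle: the gradient coordinate for point i is
        # <points[i], residual>, so pick the point that minimizes it.
        best = min(range(n),
                   key=lambda i: sum(a * b for a, b in zip(points[i], residual)))
        direction = [p - r for p, r in zip(points[best], recon)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        # Exact line search for a quadratic objective, clipped to [0, 1].
        gamma = -sum(r * d for r, d in zip(residual, direction)) / denom
        gamma = max(0.0, min(1.0, gamma))
        if gamma == 0.0:
            break  # no descent along the Frank-Wolfe direction
        weights = [(1.0 - gamma) * w for w in weights]
        weights[best] += gamma
        recon = [r + gamma * d for r, d in zip(recon, direction)]
    return weights


def largest_remainder_counts(weights, budget):
    """Round convex weights to integer counts that sum exactly to `budget`.

    Stand-in for the paper's integerization step (largest-remainder rounding).
    """
    scaled = [w * budget for w in weights]
    counts = [int(s) for s in scaled]  # floor, since weights are non-negative
    leftover = budget - sum(counts)
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Toy example: the query is the midpoint of the first two candidates.
candidates = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
w = frank_wolfe_simplex(candidates, [1.0, 0.0])
print(w)                               # [0.5, 0.5, 0.0]
print(largest_remainder_counts(w, 4))  # [2, 2, 0]
```

Each Frank-Wolfe iteration adds at most one new candidate to the support, which is why the resulting weights tend to be sparse. The integer counts give a multiset with repeated examples, the setting in which the abstract's Gradient Reuse step applies.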

Alaa Khamis, Alaa Maalouf • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Language Modeling | The Pile (test) | -- | -- | 53 |
| Language Modeling | The Pile ArXiv (test) | BPB | 88.57 | 6 |
| Language Modeling | The Pile FreeLaw (test) | BPB (%) | 71.63 | 6 |
| Language Modeling | The Pile Github (test) | BPB | 40.88 | 6 |
| Language Modeling | The Pile Enron (test) | BPB | 57.32 | 6 |
| Language Modeling | The Pile HackerNews (test) | BPB | 0.8583 | 6 |
| Language Modeling | The Pile NIH (test) | BPB | 66.05 | 6 |
| Language Modeling | The Pile PubMed Abs. (test) | BPB (%) | 87.7 | 6 |
| Language Modeling | The Pile PubMed Cent. (test) | BPB (%) | 84.13 | 6 |
| Language Modeling | The Pile StackEx. (test) | BPB (%) | 80.37 | 6 |
Showing 10 of 13 rows
