
Looped Transformers for Length Generalization

About

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation, a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
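The core idea above can be illustrated with a toy sketch (hypothetical, not the authors' implementation): a single weight-tied "step" is applied an input-dependent number of times, so longer inputs simply get more loop iterations. Here parity plays the role of the iterative task, and `parity_step` is a stand-in for one RASP-L-style operation; in the paper the step would be a fixed-size Transformer block.

```python
# Minimal sketch of a looped model with an adaptive number of steps.
# The names (looped_apply, parity_step) are illustrative, not from the paper.

def looped_apply(step, state, n_steps):
    """Apply the same weight-tied step function n_steps times (the 'loop')."""
    for _ in range(n_steps):
        state = step(state)
    return state

def parity_step(state):
    """One iteration: fold the next bit into a running parity.
    A real looped Transformer would perform this with a shared block."""
    acc, bits = state
    if not bits:
        return acc, bits
    return acc ^ bits[0], bits[1:]

def parity(bits):
    # Adaptive step count: one loop iteration per input token, so the
    # same fixed-size step handles inputs of any (unseen) length.
    acc, _ = looped_apply(parity_step, (0, list(bits)), n_steps=len(bits))
    return acc
```

Because the step is reused rather than unrolled into depth at training time, the same parameters apply to length-20 inputs as to length-5 inputs; only the loop count changes.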

Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee • 2024

Related benchmarks

Task                                                Dataset          Result                     Rank
Mathematical Reasoning                              MathVista        Score 43.27                322
Multimodal Capability Evaluation                    MM-Vet           Score 51.24                282
Massive Multi-discipline Multimodal Understanding   MMMU             --                         88
Multimodal Understanding                            MMB              Score 60.65                30
Multimodal Hallucination Evaluation                 HallusionBench   Hallucination Score 34.61  14
Complex Multimodal Reasoning                        MM-Star          Reasoning Score 42.38      10
OCR Robustness                                      OCR Bench        Score 69.9                 10
