LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models?

About

Longitudinal NLP tasks require reasoning over temporally ordered text to detect persistence and change in human behavior and opinions. However, in-context learning (ICL) with large language models struggles on such tasks, where models must integrate historical context, track evolving interactions, and handle rare change events. We introduce LiFT, a longitudinal instruction fine-tuning framework that unifies diverse longitudinal modeling tasks under a shared instruction schema. LiFT uses a curriculum that progressively increases temporal difficulty while incorporating few-shot structure and temporal conditioning to encourage effective use of past context. We evaluate LiFT across five datasets: models trained on longitudinal tasks with different levels of temporal granularity are tested for generalisability on two separate datasets. Across models of different parameter sizes (OLMo 1B/7B, LLaMA-8B, and Qwen-14B), LiFT consistently outperforms base-model ICL, with strong gains on out-of-distribution data and on minority change events.

Iqra Ali, Talia Tseriotou, Mahmud Elahi Akhter, Yuxiang Zhou, Maria Liakata • 2026
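The abstract describes the shared instruction schema, temporal conditioning, and difficulty curriculum only at a high level. The sketch below is a hypothetical illustration of those ideas, not the authors' implementation: `LongitudinalExample`, `render_prompt`, `curriculum_stages`, the `[t-k]` time-step tags, and the use of history length as a stand-in for "temporal difficulty" are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LongitudinalExample:
    """Hypothetical record: a user's past items plus the current item."""
    history: List[str]  # earlier posts/utterances, oldest first
    target: str         # the item to classify at time t
    label: str          # hypothetical tag, e.g. "change" / "no_change"


def render_prompt(ex: LongitudinalExample, instruction: str) -> str:
    """Render any task into one shared text template (hypothetical format).

    Each past item is tagged with its relative time step, so the model sees
    the temporal order explicitly instead of an unordered block of context.
    """
    n = len(ex.history)
    lines = [f"Instruction: {instruction}"]
    for i, item in enumerate(ex.history):
        lines.append(f"[t-{n - i}] {item}")  # oldest item gets the largest offset
    lines.append(f"[t] {ex.target}")
    lines.append("Label:")
    return "\n".join(lines)


def curriculum_stages(
    examples: List[LongitudinalExample], num_stages: int
) -> List[List[LongitudinalExample]]:
    """Split examples into at most `num_stages` stages of increasing difficulty.

    Difficulty is approximated by history length, an illustrative proxy only;
    training would then proceed stage by stage, easiest first.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    ordered = sorted(examples, key=lambda ex: len(ex.history))  # stable sort
    size = max(1, math.ceil(len(ordered) / num_stages))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


if __name__ == "__main__":
    ex = LongitudinalExample(["a", "b"], "c", "no_change")
    print(render_prompt(ex, "Has the user's mood changed?"))
    examples = [LongitudinalExample(["p"] * k, "now", "no_change") for k in [3, 0, 1, 5, 2, 4]]
    stages = curriculum_stages(examples, num_stages=3)
    print([[len(e.history) for e in s] for s in stages])  # [[0, 1], [2, 3], [4, 5]]
```

Sorting by a single scalar keeps the schedule easy to inspect; a real curriculum could use a different difficulty signal or mix stages, which this page does not specify.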

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled column)
Longitudinal Classification	AnnoMI (80/20)	Macro-F1	52.6	24
Longitudinal Classification	LRS (80/20)	Macro-F1	57.8	24
Longitudinal Classification	TalkLife (80/20)	Macro-F1	34.6	24
Longitudinal Classification	Reddit (test)	Macro-F1	52.1	24
Longitudinal Classification	CMV (test)	Macro-F1	57.7	24
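All results above are macro-F1, which averages per-class F1 with equal weight, so a rare class (such as a minority change event) counts as much as the majority class. A minimal pure-Python sketch of the standard metric follows; the label names are hypothetical, and the page does not state which implementation the authors used (scikit-learn's `f1_score` with `average="macro"` is a common library equivalent).

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """One-vs-rest F1 for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct hits: precision or recall is 0, so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("no labels to score")
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # A majority-class predictor on an imbalanced toy set.
    y_true = ["no_change"] * 8 + ["change"] * 2
    y_pred = ["no_change"] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_true, y_pred):.3f}")
    # accuracy=0.800 macro_f1=0.444
```

The toy predictor never outputs the rare class, so it reaches 80% accuracy but only 4/9 ≈ 0.444 macro-F1 (F1 of 8/9 on the majority class averaged with 0 on the rare one). This is why macro-F1 is sensitive to gains on minority change events.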
