LLM4ES: Learning User Embeddings from Event Sequences via Large Language Models
About
This paper presents LLM4ES, a novel framework that leverages large pre-trained language models (LLMs) to derive user embeddings from event sequences. Event sequences are transformed into a textual representation, which is then used to fine-tune an LLM via next-token prediction; the fine-tuned model produces high-quality embeddings. We introduce a text enrichment technique that improves LLM adaptation to event sequence data, raising representation quality in low-variability domains. Experimental results demonstrate that LLM4ES achieves state-of-the-art performance on user classification tasks across financial and other domains, outperforming existing embedding methods. The resulting user embeddings can be incorporated into a wide range of applications, from user segmentation in finance to patient outcome prediction in healthcare.
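As a rough illustration of this pipeline, the sketch below serializes an event sequence into text and mean-pools the hidden states of a pre-trained causal LM into a user embedding. It assumes a HuggingFace-style `transformers` API; the serialization template, the `gpt2` backbone, and the mean-pooling readout are illustrative assumptions rather than the paper's exact recipe, which additionally fine-tunes the LM with next-token prediction on the serialized (and enriched) sequences before extracting embeddings.

```python
# Minimal sketch of an LLM4ES-style pipeline (illustrative, not the authors' code):
# 1) serialize an event sequence to text, 2) run a pre-trained causal LM,
# 3) mean-pool the final-layer hidden states into a fixed-size user embedding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def event_to_text(events):
    """Serialize events (dicts of attributes) into a single text string.
    The "key: value" template is an assumption; the paper's template may differ."""
    return "; ".join(
        ", ".join(f"{k}: {v}" for k, v in event.items()) for event in events
    )

@torch.no_grad()
def user_embedding(model, tokenizer, events, device="cpu"):
    text = event_to_text(events)
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    outputs = model(**inputs, output_hidden_states=True)
    last_hidden = outputs.hidden_states[-1]   # (1, seq_len, hidden_size)
    return last_hidden.mean(dim=1).squeeze(0) # (hidden_size,)

# Usage: any small causal LM works for the sketch; "gpt2" is a placeholder.
# In the full framework this model would first be fine-tuned with a
# next-token-prediction objective on the serialized event sequences.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
events = [
    {"time": "2021-03-01", "mcc": "5411", "amount": 42.0},
    {"time": "2021-03-02", "mcc": "5812", "amount": 13.5},
]
emb = user_embedding(model, tokenizer, events)
print(emb.shape)  # torch.Size([768]) for gpt2
```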
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Age Prediction | Age | Accuracy | 65.1 | 12 |
| Classification | Rosbank | AUC | 0.849 | 12 |
| Age Classification | Private Dataset | Accuracy | 69.2 | 6 |
| Gender Classification | Private Dataset | AUC | 78.9 | 6 |
| Regression | Private Dataset | MAE | 1.15e+4 | 6 |