Byte-token Enhanced Language Models for Temporal Point Processes Analysis
About
Temporal Point Processes (TPPs) have been widely used for modeling event sequences on the Web, such as user reviews, social media posts, and online transactions. However, traditional TPP models often struggle to effectively incorporate the rich textual descriptions that accompany these events, while Large Language Models (LLMs), despite their remarkable text processing capabilities, lack mechanisms for handling the temporal dynamics inherent in Web-based event sequences. To bridge this gap, we introduce Language-TPP, a unified framework that seamlessly integrates TPPs with LLMs for enhanced Web event sequence modeling. Our key innovation is a novel temporal encoding mechanism that converts continuous time intervals into specialized byte-tokens, enabling direct integration with standard language model architectures without requiring TPP-specific modifications. This approach allows Language-TPP to achieve state-of-the-art performance across multiple TPP benchmarks, including event time prediction and event type prediction, on real-world Web datasets spanning e-commerce reviews, social media, and online Q&A platforms. More importantly, we demonstrate that our unified framework unlocks new capabilities for TPP research: incorporating temporal information improves the quality of generated event descriptions, as evidenced by improved ROUGE-L scores and better-aligned sentiment distributions. Through comprehensive experiments, including qualitative analysis of learned distributions and scalability evaluations on long sequences, we show that Language-TPP effectively captures both temporal dynamics and textual patterns in Web user behavior, with important implications for content generation, user behavior understanding, and Web platform applications. Code is available at https://github.com/qykong/Language-TPP.
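To make the byte-token idea concrete, here is a minimal, illustrative sketch of one way continuous time intervals could be mapped to byte-tokens; the exact scheme used by Language-TPP may differ (see the paper and repository), and `BYTE_TOKEN_BASE` is a hypothetical offset into the LLM vocabulary where 256 dedicated byte-token ids are assumed to live.

```python
import struct

# Hypothetical: the vocabulary id where the 256 byte-tokens start.
BYTE_TOKEN_BASE = 151_000


def time_to_byte_tokens(dt: float, base: int = BYTE_TOKEN_BASE) -> list[int]:
    """Encode a continuous inter-event time as four byte-tokens.

    Illustrative sketch: the big-endian float32 bit pattern of the
    interval is split into 4 bytes, each mapped to one of 256 dedicated
    byte-token ids appended to the language model's vocabulary.
    """
    raw = struct.pack(">f", dt)  # 4 bytes
    return [base + b for b in raw]


def byte_tokens_to_time(tokens: list[int], base: int = BYTE_TOKEN_BASE) -> float:
    """Inverse mapping: recover the time interval from its byte-tokens."""
    raw = bytes(t - base for t in tokens)
    return struct.unpack(">f", raw)[0]
```

With such a mapping, intervals become ordinary tokens in the input sequence, so a standard decoder-only LLM can consume and generate them with no architectural changes; the round trip `byte_tokens_to_time(time_to_byte_tokens(dt))` recovers `dt` up to float32 precision.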
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Event Prediction | StackOverflow | RMSE | 0.516 | 42 |
| Event Forecasting | taxi | RMSE | 0.32 | 23 |
| Event Prediction | Retweet | Accuracy | 59.7 | 18 |
| Event sequence modeling | Chicago Crime | Accuracy | 27.2 | 13 |
| Event sequence modeling | NYC Taxi | Accuracy | 92 | 13 |
| Event sequence modeling | Amazon Review | Accuracy (%) | 69.7 | 13 |
| Event sequence modeling | US Earthquake | Accuracy | 64 | 13 |
| Multimodal Temporal Point Process prediction | DanmakuTPP (test) | RMSE | 5.3845 | 9 |
| Multimodal Temporal Point Process prediction | TAXI-PRO (test) | RMSE | 0.3376 | 9 |
| Event Time Prediction | Retweet | RMSE | 18.1 | 7 |