DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining

About

In financial backtesting, large language models pretrained on internet-scale data risk introducing lookahead bias that undermines their forecasting validity, as they may have already seen the true outcome during training. To address this, we present DatedGPT, a family of twelve 1.3B-parameter language models, each trained from scratch on approximately 100 billion tokens of temporally partitioned data with strict annual cutoffs spanning 2013 to 2024. We further enhance each model with instruction fine-tuning on both general-domain and finance-specific datasets curated to respect the same temporal boundaries. Perplexity-based probing confirms that each model's knowledge is effectively bounded by its data cutoff year, while evaluation on standard benchmarks shows competitive performance with existing models of similar scale. We provide an interactive web demo that allows users to query and compare responses from models across different cutoff years.

Yutong Yan, Raphael Tang, Zhenyu Gao, Wenxi Jiang, Yao Lu • 2026
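
The abstract describes two concrete mechanisms: (1) temporal partitioning, i.e., filtering pretraining documents by timestamp against a strict annual cutoff, and (2) perplexity-based probing, i.e., checking that a trained model assigns markedly lower perplexity to facts from before its cutoff than after it. The sketch below illustrates both with Hugging Face transformers. It is not the authors' code: `gpt2` is only a stand-in for a DatedGPT checkpoint (the page does not link one), the inclusive year-end boundary convention is an assumption, and the probe sentences are toy examples.

```python
# Minimal sketch of the two ideas in the abstract; not the authors' code.
# Assumptions: documents carry reliable timestamps, cutoffs include the
# cutoff year itself, and "gpt2" stands in for a DatedGPT checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def within_cutoff(doc_year: int, cutoff_year: int) -> bool:
    """Temporal partitioning: keep a document only if it was written
    no later than the model's cutoff year (boundary convention assumed)."""
    return doc_year <= cutoff_year


def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of `text` under `model`; much lower perplexity on
    pre-cutoff facts than post-cutoff ones suggests knowledge is bounded."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()


tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Probe a hypothetical 2019-cutoff model with one fact per side of the cutoff.
pre_cutoff = "The 2018 FIFA World Cup was won by France."
post_cutoff = "The 2022 FIFA World Cup was won by Argentina."
print(perplexity(lm, tok, pre_cutoff), perplexity(lm, tok, post_cutoff))
```

In the paper's setup there would be one such checkpoint per cutoff year from 2013 to 2024, with the probe run against year-stamped facts for each.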

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|------|---------|--------|--------|------|
| Instruction Following | IFEval | – | – | 625 |
| Question Answering | ARC Easy | Accuracy | 71.6 | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 71.8 | 572 |
| Question Answering | ARC-E | Accuracy | 52.0 | 416 |
| Multi-task Language Understanding | MMLU | Accuracy | 26.3 | 413 |
| Commonsense Reasoning | HellaSwag | Accuracy | 54.6 | 350 |
| Multi-task Language Understanding | MMLU | Accuracy | 25.3 | 321 |
| Science Question Answering | ARC-C | Accuracy | 34.8 | 193 |
| Question Answering | ARC-C | Accuracy | 35.2 | 192 |
| Science Question Answering | ARC-E | Accuracy | 52.0 | 184 |

(Showing 10 of 15 rows.)
