
Anticipating Innovation Using Large Language Models

About

Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. This signal is not attributable to any single inventor; it emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification (IPC) codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.

Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella • 2026
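
The abstract describes context similarity between code embeddings as a measure of convergence between technologies. The snippet below is a minimal sketch of that idea, not the authors' implementation: it assumes cosine similarity as the similarity function, and the three-dimensional vectors are invented toy values rather than TechToken embeddings. The helper names `cosine_similarity` and `rank_code_pairs` are hypothetical.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_code_pairs(embeddings):
    """Score every unordered pair of codes and sort by decreasing similarity."""
    pairs = itertools.combinations(sorted(embeddings), 2)
    scored = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in pairs
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy vectors keyed by IPC subclass; the numbers are made up for illustration.
toy_embeddings = {
    "G06N": [0.9, 0.1, 0.3],
    "H04L": [0.8, 0.2, 0.4],
    "A61K": [0.1, 0.9, 0.2],
}

ranking = rank_code_pairs(toy_embeddings)
for code_a, code_b, score in ranking:
    print(f"{code_a} - {code_b}: {score:.3f}")
```

Under this reading, pairs of codes that have not yet been combined but sit near the top of such a ranking would be the candidates for a future first combination.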

Related benchmarks

Task                     Dataset                           Result                     Rank
Citation prediction      Paecter citation dataset          RFR 1.26                   7
IPC classification       Patent dataset IPC                Macro F1 48.8              6
Title-abstract matching  Patent dataset Title-Abstract     AUC-ROC 0.994              6
Innovation Prediction    Patents Publication 2024 (test)   AUC-ROC (0.005%) 0.936     5
