
SpecTUS: Spectral Translator for Unknown Structures annotation from EI-MS spectra

About

Compound identification and structure annotation from mass spectra is a well-established task widely applied in drug detection, criminal forensics, small-molecule biomarker discovery, and chemical engineering. We propose SpecTUS: Spectral Translator for Unknown Structures, a deep neural model that addresses the task of structural annotation of small molecules from low-resolution gas chromatography electron ionization mass spectra (GC-EI-MS). Our model analyzes the spectra in a de novo manner: it translates the spectra directly into a 2D structural representation. Our approach is particularly useful for analyzing compounds unavailable in spectral libraries. In a rigorous evaluation of our model on the novel structure annotation task across different libraries, we outperformed standard database search techniques by a wide margin. On a held-out testing set, including 28,267 spectra from the NIST database, we show that our model's single suggestion perfectly reconstructs 43% of the subset's compounds. This single suggestion is strictly better than the candidate of the database hybrid search (a common method among practitioners) in 76% of cases. In a still affordable scenario of 10 suggestions, perfect reconstruction is achieved in 65% of cases, and 84% are better than the hybrid search.
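The two kinds of percentages quoted in the abstract (perfect reconstruction within k suggestions, and being strictly better than the database search candidate) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy data, the use of pre-canonicalized structure strings for exact matching, and the choice to compare the best similarity score among the first k suggestions against the baseline score are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_exact_match_rate(
    truths: Sequence[str],
    candidates: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of spectra whose true structure is among the first k candidates.

    Assumes all structure strings are already canonicalized, so plain
    string equality stands in for structural identity (hypothetical setup).
    """
    hits = sum(1 for truth, cands in zip(truths, candidates) if truth in cands[:k])
    return hits / len(truths)


def win_rate(
    model_sims: Sequence[Sequence[float]],
    baseline_sims: Sequence[float],
    k: int,
) -> float:
    """Fraction of spectra where the best of the first k model suggestions
    has a strictly higher similarity to the truth than the baseline candidate.

    Assumes every spectrum has at least one model suggestion.
    """
    wins = sum(
        1 for sims, base in zip(model_sims, baseline_sims) if max(sims[:k]) > base
    )
    return wins / len(baseline_sims)


# Toy data, invented for illustration only.
truths = ["CCO", "c1ccccc1", "CC(=O)O"]
candidates = [
    ["CCO", "CCN"],
    ["C1CCCCC1", "c1ccccc1"],
    ["CC(=O)N", "CCC(=O)O"],
]
model_sims = [[1.0, 0.6], [0.4, 1.0], [0.7, 0.8]]  # similarity of each suggestion to the truth
baseline_sims = [0.9, 0.5, 0.8]                    # similarity of the database search candidate

print(top_k_exact_match_rate(truths, candidates, k=1))
print(top_k_exact_match_rate(truths, candidates, k=2))
print(win_rate(model_sims, baseline_sims, k=1))
print(win_rate(model_sims, baseline_sims, k=2))
```

In the toy data the third spectrum ties with the baseline at k=2, so it does not count as a win; the strict inequality mirrors the "strictly better" wording in the abstract.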

Adam Hájek, Michal Starý, Elliott Price, Filip Jozefov, Helge Hecht, Aleš Křenek • 2025

Related benchmarks

Task                        Dataset          Result                    Rank
Database Search             NIST (test)      Similarity (Sim_k): 84    10
Database Search             SWGDRUG          Sim_k: 86                 10
Database Search             Cayman           Sim_k: 78                 10
Database Search             MONA             Sim_k: 0.58               10
Database Search             MONA library     Win Rate: 66.8            9
Database Search             Cayman library   Win Rate: 84.2            9
Database search retrieval   NIST (test)      Win Rate: 88.9            9
Database Search             SWGDRUG (test)   Win Rate (vs BDC): 77.5   3
