MS2MetGAN: Latent-space adversarial training for metabolite-spectrum matching in MS/MS database search

About

Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite-spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search-based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite-spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite-spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
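The matching step described above can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the autoencoder architectures, or the GAN, so cosine similarity is an assumed score, the latent vectors are made-up numbers, and `toy_decoy` is a hypothetical stand-in that samples Gaussian noise in place of a trained GAN generator.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(spectrum_z, candidate_zs):
    """Return candidate names sorted from best to worst latent-space match."""
    return sorted(
        candidate_zs,
        key=lambda name: cosine(spectrum_z, candidate_zs[name]),
        reverse=True,
    )

def toy_decoy(dim, rng):
    """Hypothetical stand-in for a GAN generator: a random latent vector."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def build_training_pairs(spectrum_z, true_z, n_decoys, rng):
    """Label the true metabolite-spectrum match 1 and each decoy match 0."""
    pairs = [(spectrum_z, true_z, 1)]
    for _ in range(n_decoys):
        pairs.append((spectrum_z, toy_decoy(len(true_z), rng), 0))
    return pairs

# Made-up latent vectors; in the framework these would come from autoencoders.
rng = random.Random(0)
spectrum_z = [0.9, 0.1, 0.3]
candidates = {
    "metabolite_A": [0.8, 0.2, 0.4],
    "metabolite_B": [-0.5, 0.9, 0.0],
}

ranking = rank_candidates(spectrum_z, candidates)
pairs = build_training_pairs(spectrum_z, candidates["metabolite_A"], n_decoys=3, rng=rng)
labels = [label for _, _, label in pairs]
print(ranking)
print(labels)
```

The labeled pairs are the kind of input a match-scoring model could be trained on; how MS2MetGAN actually scores matches is not stated in the abstract.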

Meng Tsai, Alexzander Dwyer, Estelle Nuckels, Yingfeng Wang • 2026

Related benchmarks

| Task                      | Dataset                 | Result         | Rank |
|---------------------------|-------------------------|----------------|------|
| Metabolite Identification | CASMI FP 2017           | Accuracy 86.3  | 18   |
| Metabolite Identification | GNPS S                  | Accuracy 75.65 | 18   |
| Metabolite Identification | EMBL-MCF                | Accuracy 93.25 | 18   |
| Metabolite Identification | MONA                    | Accuracy 77.19 | 18   |
| Metabolite Identification | CASMI SP 2017           | Accuracy 90.48 | 18   |
| Metabolite Identification | CASMI FP 2016           | Accuracy 87.4  | 18   |
| Metabolite Identification | GNPS-M                  | Accuracy 79.9  | 18   |
| Metabolite Identification | CASMI SP 2016           | Accuracy 86.07 | 18   |
| Metabolite Identification | CASMI 2022P             | Accuracy 37.89 | 18   |
| Database Searching        | MetaCyc Database (test) | MIDAS 66.67    | 1    |
Showing 10 of 11 rows
