
Fast and Low-Cost Genomic Foundation Models via Outlier Removal

About

To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model with strong compression performance and fast adaptability. GERM improves upon models like DNABERT-2 by eliminating outliers that hinder low-rank adaptation and post-training quantization, enhancing both efficiency and robustness. We replace the vanilla attention layer with an outlier-free mechanism inspired by associative memory models. By removing outliers during both pre-training and fine-tuning, this approach accelerates adaptation, reduces computational costs, and enhances quantization robustness within acceptable loss margins. Additionally, we propose GERM-T, a strategy that employs small-step continual learning within the outlier-free framework, leveraging original checkpoints to avoid retraining from scratch. Empirically, GERM improves fine-tuning performance by 37.98% and quantization by 64.34% over the baseline model. It also reduces average kurtosis by 92.14% and maximum infinity norm by 82.77%. Compared to leading methods, GERM consistently delivers superior performance, offering a practical solution for genomic modeling in resource-constrained settings. Code is available at https://github.com/MAGICS-LAB/GERM.
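The abstract quantifies outliers via two statistics: average kurtosis and maximum infinity norm of model tensors. A minimal sketch of how such metrics can be computed with NumPy; the function names and toy tensors here are illustrative and not taken from the GERM codebase:

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis of a flattened tensor: ~0 for Gaussian values,
    large and positive when a few outliers dominate the distribution."""
    x = np.asarray(x, dtype=np.float64).ravel()
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

def outlier_metrics(tensors):
    """Average excess kurtosis and maximum infinity norm (largest
    absolute entry) across a list of weight/activation tensors."""
    kurts = [excess_kurtosis(t) for t in tensors]
    inf_norms = [float(np.max(np.abs(t))) for t in tensors]
    return {"avg_kurtosis": sum(kurts) / len(kurts),
            "max_inf_norm": max(inf_norms)}

rng = np.random.default_rng(0)
smooth = rng.normal(size=(4, 256))  # outlier-free activations
spiky = smooth.copy()
spiky[0, 0] = 50.0                  # a single large outlier entry

print(outlier_metrics([smooth]))
print(outlier_metrics([spiky]))
```

Tensors with even one extreme entry show sharply higher kurtosis and infinity norm, which is why reducing these statistics (by 92.14% and 82.77% in the paper's experiments) makes low-bit quantization far less lossy: the quantization grid no longer has to stretch to cover a few extreme values.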

Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu · 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Genomic Sequence Classification | Nucleotide Transformer Benchmark Human 500M (test) | MCC 0.5653 | 42 |
| Genomic sequence modeling | Nucleotide Transformer (NT) 2.5B multi-species | MCC 57.16 | 39 |
| Genomic classification | GERM | MCC 59.73 | 31 |
| Genome sequence classification | Genome sequence classification (test) | MCC 59.73 | 12 |
| Genomic Sequence Classification | Genomic Benchmark | -- | 5 |
