
Wisteria: A Unified Multi-Scale Feature Learning Framework for DNA Language Model

About

DNA language models aim to decipher the regulatory grammar and semantics of genomes by capturing long-range dependencies in DNA sequences. Existing methods emphasize long-range token interactions but often ignore the interplay between local motifs and global dependencies. In this paper, we propose Wisteria, a genomic language model that integrates multi-scale feature learning within a unified framework for DNA sequences. Specifically, Wisteria augments the Mamba-based architecture with gated dilated convolutions to capture local motifs and regulatory patterns, while gated multilayer perceptrons refine global dependencies. We further introduce a Fourier-based attention mechanism to support frequency-domain modeling, periodic extension, and length generalization. Across four experimental settings with both short- and long-range dependencies, Wisteria demonstrates strong performance on downstream benchmarks against competitive DNA language model baselines. These results indicate that Wisteria effectively unifies local and global dependency modeling for multi-scale genomic sequence analysis.
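To make the "gated dilated convolution" idea concrete, here is a minimal pure-Python sketch of one such layer applied to a one-hot encoded DNA string. It is an illustration only, not the authors' implementation: the function names (`one_hot`, `dilated_conv1d`, `gated_dilated_conv`), the zero padding, and the GLU-style gate (features multiplied by a sigmoid of a second convolution) are all assumptions, since the abstract does not specify the exact gating form or layer layout.

```python
import math


def one_hot(seq):
    """Encode a DNA string as a list of 4-channel one-hot vectors (A, C, G, T)."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [[1.0 if index.get(base) == i else 0.0 for i in range(4)] for base in seq]


def dilated_conv1d(x, weights, bias, dilation):
    """Zero-padded, length-preserving 1D convolution with a dilation factor.

    x:       list of L positions, each a list of C_in channel values
    weights: nested list shaped [C_out][K][C_in]
    bias:    list of C_out values
    """
    length, k = len(x), len(weights[0])
    center = (k - 1) // 2
    out = []
    for t in range(length):
        row = []
        for o, w in enumerate(weights):
            acc = bias[o]
            for j in range(k):
                # Taps are spaced `dilation` positions apart around position t.
                pos = t + (j - center) * dilation
                if 0 <= pos < length:  # positions outside the sequence count as zero
                    acc += sum(wc * xc for wc, xc in zip(w[j], x[pos]))
            row.append(acc)
        out.append(row)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_conv(x, w_feat, b_feat, w_gate, b_gate, dilation):
    """Assumed GLU-style gating: feature branch scaled by a sigmoid gate branch."""
    feat = dilated_conv1d(x, w_feat, b_feat, dilation)
    gate = dilated_conv1d(x, w_gate, b_gate, dilation)
    return [[f * sigmoid(g) for f, g in zip(fr, gr)] for fr, gr in zip(feat, gate)]


if __name__ == "__main__":
    x = one_hot("ACGTAC")
    # One output channel, kernel size 3: counts 'A' two positions to either side.
    w_feat = [[[1.0, 0.0, 0.0, 0.0], [0.0] * 4, [1.0, 0.0, 0.0, 0.0]]]
    w_gate = [[[0.0] * 4, [0.0] * 4, [0.0] * 4]]  # zero gate weights -> gate = 0.5
    y = gated_dilated_conv(x, w_feat, [0.0], w_gate, [0.0], dilation=2)
    print([row[0] for row in y])
```

With a kernel of size 3 and dilation 2, each output position sees bases at offsets -2, 0, and +2, so stacking layers with growing dilation widens the receptive field without adding parameters. That is the usual motivation for using dilated convolutions to pick up motifs at several scales.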

Weihua Wang, Haoji Li, Feilong Bao, Lei Yang, Guanglai Gao • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Genomic sequence modeling | BEND | Gene Finding MCC | 0.67 | 6 |
| Histone mark prediction | Nucleotide Transformer benchmark | H3 Accuracy | 84.47 | 5 |
| Regulatory element prediction | Nucleotide Transformer benchmark | Enhancer Accuracy | 57.95 | 5 |
| Splice site identification | Nucleotide Transformer benchmark | Splice Acceptor Accuracy | 98.13 | 5 |
| Variant Effect Prediction | Human SNP (0–30k distance-to-TSS bin) | AUROC | 0.681 | 5 |
| Variant Effect Prediction | Human SNP (30–100k distance-to-TSS bin) | AUROC | 0.663 | 5 |
| Variant Effect Prediction | Human SNP (100k+ distance-to-TSS bin) | AUROC | 60.4 | 5 |
| Sequence Classification | Genomic Benchmarks Mouse Enhancers (test) | Top-1 Accuracy | 79.5 | 4 |
| Sequence Classification | Genomic Benchmarks Coding vs. Intergenomic (test) | Top-1 Accuracy | 93.5 | 4 |
| Sequence Classification | Genomic Benchmarks Human Enhancer Ensembl (test) | Top-1 Accuracy | 89.8 | 4 |
Showing 10 of 15 rows
