Wisteria: A Unified Multi-Scale Feature Learning Framework for DNA Language Model
About
DNA language models aim to decipher the regulatory grammar and semantics of genomes by capturing long-range dependencies in DNA sequences. Existing methods emphasize long-range token interactions but often ignore the interplay between local motifs and global dependencies. In this paper, we propose Wisteria, a genomic language model that integrates multi-scale feature learning within a unified framework for DNA sequences. Specifically, Wisteria augments a Mamba-based architecture with gated dilated convolutions to capture local motifs and regulatory patterns, while gated multilayer perceptrons refine global dependencies. We further introduce a Fourier-based attention mechanism to support frequency-domain modeling, periodic extension, and length generalization. Across four experimental settings covering both short- and long-range dependencies, Wisteria demonstrates strong performance on downstream benchmarks against competitive DNA language model baselines. These results indicate that Wisteria effectively unifies local and global dependency modeling for multi-scale genomic sequence analysis.
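To make the idea of a gated dilated convolution concrete, the following is a minimal NumPy sketch of one such layer applied to a one-hot encoded DNA sequence. It is an illustration under stated assumptions, not Wisteria's implementation: the abstract does not specify the gating function, kernel size, dilation rates, or padding, so the sigmoid gate, the "same" zero padding, and all names (`one_hot_dna`, `dilated_conv1d`, `gated_dilated_conv1d`) are hypothetical choices.

```python
import numpy as np

def one_hot_dna(seq):
    """Encode a DNA string as a (length, 4) one-hot array over A, C, G, T."""
    alphabet = "ACGT"
    out = np.zeros((len(seq), 4), dtype=np.float64)
    for i, base in enumerate(seq):
        if base in alphabet:  # unknown bases such as N stay all-zero
            out[i, alphabet.index(base)] = 1.0
    return out

def dilated_conv1d(x, w, dilation):
    """Same-length dilated 1D convolution with zero padding.

    x: (L, C_in) input sequence features
    w: (K, C_in, C_out) kernel with K taps spaced `dilation` positions apart
    """
    length = x.shape[0]
    k = w.shape[0]
    total_pad = dilation * (k - 1)
    left = total_pad // 2
    x_pad = np.pad(x, ((left, total_pad - left), (0, 0)))
    y = np.zeros((length, w.shape[2]), dtype=np.float64)
    for j in range(k):
        start = j * dilation
        # Tap j reads the input shifted by j * dilation positions.
        y += x_pad[start:start + length] @ w[j]
    return y

def gated_dilated_conv1d(x, w_value, w_gate, dilation):
    """Value branch modulated elementwise by a sigmoid gate branch (assumed form)."""
    value = dilated_conv1d(x, w_value, dilation)
    gate = 1.0 / (1.0 + np.exp(-dilated_conv1d(x, w_gate, dilation)))
    return value * gate

# Usage: a 12-base sequence, kernel size 3, dilation 4, 8 output channels.
rng = np.random.default_rng(0)
x = one_hot_dna("ACGTTGCAACGT")
w_value = rng.normal(size=(3, 4, 8))
w_gate = rng.normal(size=(3, 4, 8))
features = gated_dilated_conv1d(x, w_value, w_gate, dilation=4)
print(features.shape)  # (12, 8)
```

With kernel size 3 and dilation 4, each output position sees inputs at offsets -4, 0, and +4, so stacking layers with growing dilation widens the receptive field without adding parameters per layer. The gate lets the layer suppress positions where a motif-like pattern is absent.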
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Genomic sequence modeling | BEND | Gene Finding MCC | 0.67 | 6 |
| Histone mark prediction | Nucleotide Transformer benchmark | H3 Accuracy | 84.47 | 5 |
| Regulatory element prediction | Nucleotide Transformer benchmark | Enhancer Accuracy | 57.95 | 5 |
| Splice site identification | Nucleotide Transformer benchmark | Splice Acceptor Accuracy | 98.13 | 5 |
| Variant Effect Prediction | Human SNP 0–30k distance-to-TSS bin | AUROC | 0.681 | 5 |
| Variant Effect Prediction | Human SNP 30–100k distance-to-TSS bin | AUROC | 0.663 | 5 |
| Variant Effect Prediction | Human SNP 100k+ distance-to-TSS bin | AUROC | 60.4 | 5 |
| Sequence Classification | Genomic Benchmarks Mouse Enhancers (test) | Top-1 Accuracy | 79.5 | 4 |
| Sequence Classification | Genomic Benchmarks Coding vs. Intergenomic (test) | Top-1 Accuracy | 93.5 | 4 |
| Sequence Classification | Genomic Benchmarks Human Enhancer Ensembl (test) | Top-1 Accuracy | 89.8 | 4 |
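For readers unfamiliar with the MCC metric listed for BEND gene finding in the table above, the sketch below computes the standard multi-class Matthews correlation coefficient from a confusion matrix. It is offered only as a reference for how the metric is defined; whether the benchmark's own evaluation script computes it in exactly this way is an assumption, and the function name `multiclass_mcc` is hypothetical.

```python
import numpy as np

def multiclass_mcc(y_true, y_pred, num_classes):
    """Multi-class Matthews correlation coefficient from integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # conf[i, j] counts samples with true class i predicted as class j.
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (y_true, y_pred), 1)
    t = conf.sum(axis=1)   # how often each class truly occurs
    p = conf.sum(axis=0)   # how often each class is predicted
    c = np.trace(conf)     # correctly classified samples
    s = conf.sum()         # total samples
    numerator = c * s - np.dot(t, p)
    denominator = np.sqrt(float(s * s - np.dot(p, p)) * float(s * s - np.dot(t, t)))
    # Convention: return 0 when only one class is present or predicted.
    return 0.0 if denominator == 0 else float(numerator / denominator)

# Usage with a toy three-class labeling.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 0, 1, 2, 2, 2]
print(multiclass_mcc(labels, preds, num_classes=3))
```

Unlike accuracy, MCC stays near zero for a classifier that always predicts the majority class, which matters for gene finding where most positions are intergenic.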