
Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

About

We present a parameter-efficient Diffusion Transformer (DiT) for generating 200bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs (60$\times$ fewer) and converges to a 39% lower validation loss, while reducing memorization, measured as the fraction of generated sequences aligning to training data via BLAT, from 5.3% to 1.7%. Ablations show the CNN encoder is essential: without it, validation loss increases 70% regardless of positional embedding choice. We further apply DDPO finetuning using Enformer as a reward model, achieving a 38$\times$ improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that improvements reflect genuine regulatory signal rather than reward model overfitting.
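The core architectural change described above, a transformer denoiser fed by a 2D CNN encoder over one-hot DNA, can be sketched as below. This is an illustrative minimal sketch, not the paper's implementation: the hyperparameters (model width, layer counts, kernel size) and the timestep-embedding scheme are assumptions, and the encoder is simply a single `Conv2d` that collapses the 4-base axis into channels.

```python
import torch
import torch.nn as nn

class DiTDenoiser(nn.Module):
    """Minimal sketch of a transformer denoiser with a 2D CNN input
    encoder for 200bp one-hot DNA of shape (batch, 4, 200).

    All sizes here are illustrative assumptions, not the paper's
    reported hyperparameters.
    """

    def __init__(self, seq_len=200, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        # 2D CNN encoder: treat the (4, seq_len) one-hot as a single-channel
        # image; a (4, 5) kernel collapses the base axis into d_model channels.
        self.encoder = nn.Conv2d(1, d_model, kernel_size=(4, 5), padding=(0, 2))
        # Simple learned timestep embedding (an assumption; DiT variants
        # often use sinusoidal features plus an MLP).
        self.time_emb = nn.Sequential(
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        # Predict per-position noise over the 4 bases.
        self.head = nn.Linear(d_model, 4)

    def forward(self, x, t):
        # x: (B, 4, 200) noisy one-hot sequence; t: (B,) diffusion timestep
        h = self.encoder(x.unsqueeze(1)).squeeze(2)  # (B, d_model, 200)
        h = h.transpose(1, 2)                        # (B, 200, d_model)
        h = h + self.time_emb(t.float().unsqueeze(-1)).unsqueeze(1)
        h = self.transformer(h)
        return self.head(h).transpose(1, 2)          # (B, 4, 200)

model = DiTDenoiser()
x = torch.randn(2, 4, 200)
t = torch.randint(0, 1000, (2,))
out = model(x, t)
```

Without the CNN encoder, per-position tokens would carry only a 4-dimensional one-hot signal into the transformer; the convolution gives each token local sequence context before attention, which is consistent with the ablation result that removing it degrades validation loss substantially.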

Jonathan Liu, Kia Ghods • 2026

Related benchmarks

| Task | Dataset | Result (Median Activity Score) | Rank |
|---|---|---|---|
| Regulatory Activity Prediction | GM12878 | 4.195 | 5 |
| Regulatory Activity Prediction | HepG2 | 4.1142 | 5 |
| Regulatory Activity Prediction | K562 | 4.762 | 5 |
| Regulatory Activity Prediction | hESCT0 DNAse | 1.8609 | 4 |
