$\textit{BlockFormer}$: Transformer-based inference from interaction maps
About
Inference from interaction maps, such as centromere identification from genome-wide chromosome conformation capture techniques (notably Hi-C), can be formulated as a generic inverse problem: infer a set of parameters given a map that summarizes pairwise interactions between entities as blocks varying in number and size. In this work, we introduce a data-driven approach that leverages structure shared between these maps, such as global alignment between localized patterns, while handling the variability in the number and size of entities that arises in real-world data. Our approach relies on a transformer architecture capable of handling such variability and on a custom simulator that generates abundant, computationally cheap synthetic data for training. Applied to centromere localization, the method accurately recovers the genomic positions of centromeres across a wide range of species with varying genome sizes.
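To make the inverse-problem formulation concrete, the following is a minimal sketch of what a block-structured interaction map with hidden parameters might look like. It is a toy model written for illustration, not the simulator used in this work. The function names (`simulate_map`, `sample_example`), the distance-decay law within blocks, and the Gaussian bump placed where two centromeres meet are all assumptions.

```python
import math
import random


def simulate_map(sizes, centromeres, bump=1.0, width=2.0):
    """Toy block-structured interaction map (hypothetical, not the paper's simulator).

    sizes[k]       -- number of bins of entity (chromosome) k
    centromeres[k] -- centromere bin of entity k, relative to its own start
    """
    # Global offset of each block along the map's axes.
    starts = [0]
    for s in sizes[:-1]:
        starts.append(starts[-1] + s)
    total = sum(sizes)

    # For every global bin: the block it belongs to and its local index.
    block_of, local = [], []
    for k, s in enumerate(sizes):
        block_of.extend([k] * s)
        local.extend(range(s))

    grid = [[0.0] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            a, b = block_of[i], block_of[j]
            if a == b:
                # Within a block: signal decays with distance (assumed law).
                grid[i][j] = 1.0 / (1.0 + abs(i - j))
            else:
                # Between blocks: Gaussian bump centred on both centromeres.
                di = local[i] - centromeres[a]
                dj = local[j] - centromeres[b]
                grid[i][j] = bump * math.exp(-(di * di + dj * dj) / (2.0 * width ** 2))
    return grid, starts


def sample_example(rng, max_entities=4, min_size=5, max_size=12):
    """Draw one (map, parameters) pair with a random number and size of blocks."""
    n = rng.randint(2, max_entities)
    sizes = [rng.randint(min_size, max_size) for _ in range(n)]
    centromeres = [rng.randrange(s) for s in sizes]
    grid, _ = simulate_map(sizes, centromeres)
    return grid, sizes, centromeres


rng = random.Random(0)
grid, sizes, centromeres = sample_example(rng)
print(len(grid), sizes, centromeres)
```

In this sketch the forward direction (parameters to map) is cheap to evaluate, so many training pairs can be drawn. The inverse direction, recovering `centromeres` from `grid` when the number and size of blocks change from one example to the next, is the part a learned model would have to solve.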
Related benchmarks
| Task | Dataset | Error rate | Rank |
|---|---|---|---|
| Centromere identification | Saccharomyces cerevisiae (S.C.) | 0.002 | 8 |
| Centromere identification | Lachancea kluyveri (L.K.) | 0.17 | 8 |
| Centromere identification | Lachancea thermotolerans (L.T.) | 0.28 | 4 |
| Centromere identification | Saccharomyces mikatae (S.M.) | 18 | 4 |
| Centromere identification | Kluyveromyces lactis (K.L.) | 0.18 | 4 |
| Centromere identification | Schizosaccharomyces pombe (S.P.) | 0.94 | 4 |
| Centromere identification | Arabidopsis thaliana (A.T.) | 3.7 | 4 |
| Centromere identification | Plasmodium falciparum, rings stage (P.F.r.) | 0.18 | 4 |
| Centromere identification | Plasmodium falciparum, schizonts stage (P.F.s.) | 0.27 | 4 |
| Centromere identification | Plasmodium falciparum, trophozoites stage (P.F.t.) | 0.28 | 4 |