GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies
About
Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial both for accurately inferring species relationships from molecular data and for tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
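To give a sense of how large the space of tree topologies is, the short sketch below counts unrooted bifurcating topologies on n labeled taxa using the standard formula (2n - 5)!!. It is an illustration only and not part of GeoPhy; the function name `num_unrooted_topologies` is hypothetical, and the taxa counts are the ones listed for the benchmark datasets on this page.

```python
def num_unrooted_topologies(n_taxa: int) -> int:
    """Count unrooted bifurcating tree topologies on n labeled taxa.

    Uses the standard double-factorial formula (2n - 5)!! for n >= 3.
    """
    if n_taxa < 3:
        # With fewer than three taxa there is only one possible topology.
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


# Taxa counts taken from the benchmark table (DS1 to DS8).
for n in (27, 29, 36, 41, 50, 59, 64):
    print(f"{n} taxa: {num_unrooted_topologies(n):.3e} unrooted topologies")
```

Even the smallest dataset here has far too many topologies to enumerate, which is why methods that avoid restricting the set of candidate topologies are of interest.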
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Marginal log-likelihood estimation | DS1 (27 Taxa, 1949 Sites) | Marginal log-likelihood (MLL) | -7.11e+3 | 30 |
| Marginal log-likelihood estimation | DS2 (29 Taxa, 2520 Sites) | MLL | -2.64e+4 | 30 |
| Marginal log-likelihood estimation | DS3 (36 Taxa, 1812 Sites) | MLL | -3.37e+4 | 30 |
| Marginal log-likelihood estimation | DS4 (41 Taxa, 1137 Sites) | MLL | -1.33e+4 | 30 |
| Marginal log-likelihood estimation | DS5 (50 Taxa, 378 Sites) | MLL | -8.25e+3 | 30 |
| Marginal log-likelihood estimation | DS6 (50 Taxa, 1133 Sites) | MLL | -6.73e+3 | 30 |
| Marginal log-likelihood estimation | DS8 (64 Taxa, 1008 Sites) | MLL | -8.74e+3 | 29 |
| Marginal log-likelihood estimation | DS7 (59 Taxa, 1824 Sites) | MLL | -3.73e+4 | 27 |
| Marginal log-likelihood estimation | DS1 (test) | MLL | -7.11e+3 | 11 |
| Marginal log-likelihood estimation | DS2 (test) | MLL | -2.64e+4 | 11 |