Variational Combinatorial Sequential Monte Carlo Methods for Bayesian Phylogenetic Inference
About
Bayesian phylogenetic inference is often conducted via local or sequential search over topologies and branch lengths using algorithms such as random-walk Markov chain Monte Carlo (MCMC) or Combinatorial Sequential Monte Carlo (CSMC). However, when MCMC is used for evolutionary parameter learning, convergence requires long runs with inefficient exploration of the state space. We introduce Variational Combinatorial Sequential Monte Carlo (VCSMC), a framework that establishes variational sequential search to learn distributions over intricate combinatorial structures. We then develop nested CSMC, an efficient proposal distribution for CSMC, and prove that nested CSMC is an exact approximation to the (intractable) locally optimal proposal. We use nested CSMC to define a second objective, VNCSMC, which yields tighter lower bounds than VCSMC. We show that VCSMC and VNCSMC are computationally efficient and explore higher-probability spaces than existing methods on a range of tasks.
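To make the "lower bound" language above concrete, the following is a minimal sketch of how a sequential Monte Carlo marginal log-likelihood estimate is usually computed from particle weights. It assumes the standard SMC estimator, in which the estimate is the product over steps of the average particle weight; the abstract does not spell out this form, so treat it as an assumption. The function names and the weight values are hypothetical and chosen only for illustration. Because the estimator is unbiased for the marginal likelihood, the expectation of its logarithm is a lower bound on the marginal log-likelihood by Jensen's inequality.

```python
import math


def log_mean_exp(log_values):
    """Numerically stable log of the mean of exp(log_values)."""
    m = max(log_values)
    if m == -math.inf:
        # All weights are zero, so the mean is zero.
        return -math.inf
    total = sum(math.exp(v - m) for v in log_values)
    return m + math.log(total / len(log_values))


def smc_log_marginal_estimate(log_weights):
    """Sum over steps r of log((1/K) * sum_k w_r^k).

    log_weights is a list of R rows, each holding K particle log-weights.
    """
    return sum(log_mean_exp(row) for row in log_weights)


# Hypothetical log-weights for R = 2 steps and K = 3 particles.
log_weights = [
    [math.log(0.2), math.log(0.4), math.log(0.6)],
    [math.log(0.1), math.log(0.1), math.log(0.1)],
]
estimate = smc_log_marginal_estimate(log_weights)
print(estimate)
```

In a variational setting, a quantity of this kind would be averaged over runs and maximized with respect to the proposal and model parameters; working in log space avoids underflow when weights are very small.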
Related benchmarks
| Task | Dataset | Result (MLL) | Rank |
|---|---|---|---|
| Marginal log-likelihood estimation | DS5 (50 Taxa, 378 Sites) | -9.45e+3 | 30 |
| Marginal log-likelihood estimation | DS2 (29 Taxa, 2520 Sites) | -2.87e+4 | 30 |
| Marginal log-likelihood estimation | DS1 (27 Taxa, 1949 Sites) | -9.18e+3 | 30 |
| Marginal log-likelihood estimation | DS3 (36 Taxa, 1812 Sites) | -3.72e+4 | 30 |
| Marginal log-likelihood estimation | DS4 (41 Taxa, 1137 Sites) | -1.71e+4 | 30 |
| Marginal log-likelihood estimation | DS6 (50 Taxa, 1133 Sites) | -9.30e+3 | 30 |
| Marginal log-likelihood estimation | DS2 (test) | -2.87e+4 | 11 |
| Marginal log-likelihood estimation | DS1 (test) | -9.18e+3 | 11 |
| Marginal log-likelihood estimation | DS3 | -3.72e+4 | 11 |
| Marginal log-likelihood estimation | DS4 | -1.71e+4 | 11 |