16 Jul 2021 » Postdoc position available: Bayesian phylogenetics in the densely sampled regime
The project
Statistical phylogenetic (evolutionary tree) methods have been essential for understanding the SARS-CoV-2 epidemic, whether for understanding origins, global spread, or lineage dynamics of the virus. These methods are extremely mature, with optimized code and software packages implementing complex models. However, these methods were developed with the “classical” sampling regime in mind: a relatively small number of sequences with relatively large divergences between them.
Methods for the classical sampling regime work to integrate out the uncertainty we have in ancestral sequences. Although the Felsenstein algorithm does allow for efficient calculation... (full post)
29 Mar 2021 » Postdoc position available: variational Bayes phylogenetic inference
The project
Bayesian phylogenetic (evolutionary tree) inference is important for genomic epidemiology and for our understanding of evolution. Trees, along with associated information, are complicated objects of inference, with intertwined discrete (tree structure) and continuous (dates, rates) structure. Random-walk Markov Chain Monte Carlo, implemented in packages such as BEAST (~20,000 citations) and MrBayes (>70,000 citations), is currently the only widely-applied inference technique.
We have recently developed a rich means of parameterizing tree distributions with a fixed parameter set. This renders them accessible to more modern inference techniques, such as variational Bayes.... (full post)
19 Oct 2020 » Life changes
Hi everyone. Through a combination of COVID and the arrival of a second child, I haven’t had time to write about our recent work. I’ll be back to posting at some point, but right now I’m focusing on being a dad and supporting my trainees. Thanks for understanding.
23 Jan 2020 » A Bayesian phylogenetic hidden Markov model for B cell receptor sequences
Summary
- antibodies develop within you via an evolutionary process
- understanding these evolutionary patterns is important for understanding how we respond to infection and vaccination
- we have found using Bayesian methods that evolutionary inferences are uncertain in this regime
- our most recent work develops a “Bayesian phylogenetic hidden Markov model,” which takes into account uncertainty in both the V(D)J recombination process and the evolutionary process
- this work reveals substantial amino-acid uncertainty in the inference of the unmutated common ancestor of VRC01, an important and heavily-studied anti-HIV antibody
- our results are described in a... (full post)
24 Aug 2019 » Variational Bayesian phylogenetic inference
In late 2017 we were stuck without a clear way forward for our research on Bayesian phylogenetic inference methods.
We knew that we should be using gradient (i.e. multidimensional derivative) information to aid in finding the posterior, but couldn’t think of a way to find the right gradient. Indeed, we had recently finished our work on a variant of Hamiltonian Monte Carlo (HMC) that used the branch length gradient to guide exploration, along with a probabilistic means of hopping from one tree structure to another when a branch became zero. Although this project was a... (full post)
18 Jun 2019 » Bayesian phylogenetic inference without sampling trees
Most every description of Bayesian phylogenetics I’ve read proceeds as follows:
- “Bayesian phylogenetic analyses are conducted using a simulation technique known as Markov chain Monte Carlo (MCMC).” (Alfaro & Holder, 2006)
- “Posterior probabilities are obtained by exploring tree space using a sampling technique, called Markov chain Monte Carlo (MCMC).” (Lemey et al, The Phylogenetic Handbook)
- “Once the biologist has decided on the data, model and prior, the next step is to obtain a sample from the posterior. This is done by using MCMC…” (Nascimento et al, 2017.)
With statements like these... (full post)
05 Dec 2018 » Generalizing tree probability estimation via Bayesian networks
Posterior probability estimation of phylogenetic tree topologies from an MCMC sample is currently a pretty simple affair. You run your sampler, you get out some tree topologies, you count them up, normalize to get a probability, and done. It doesn’t seem like there’s a lot of room for improvement, right?
Wrong.
Let’s step back a little and think like statisticians. The posterior probability of a tree topology is an unknown quantity. By running an MCMC sampler, we get a histogram, the normalized version of which will converge to the true posterior in the limit of... (full post)
15 May 2018 » Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity
High-throughput sequencing of our adaptive immune repertoires holds great promise for understanding immune state. These sequences implicitly contain a wealth of information on past and present exposures to infectious and autoimmune diseases, to environmental stimuli, and even to tumor-derived antigens. In principle, we should be able to use these sequences of rearranged receptors to infer their eliciting antigens, either individually or collectively.
We’re starting to see neat progress in these areas for T cell receptors (TCRs). Some recent studies compare TCR repertoire between individuals who do or do not have some immune state, such as...
(full post)
Complete list of all posts