07 Apr 2025 » Two new approaches to learn about antibody somatic hypermutation
Somatic hypermutation (SHM) is one of the craziest things I’ve ever heard of: we have enzymes that intentionally damage and mutate our DNA! This process is the foundation of affinity maturation, which enables us to generate high-affinity antibodies against just about any target. Without it, we’d be limited to “just” random recombinants of germline genes.
Our SHM work was motivated by a desire to reconcile statistical analysis of somatic hypermutation with what we know mechanistically from lab experiments. Whenever one can bring reality into modeling there is the potential for a double win: one can learn something about underlying mechanism, and one can get better inferences by using a model that reflects underlying reality.
My interest was piqued by several fascinating papers. On the computational side there were Spisak et al. 2020 and Zhou and Kleinstein 2020, who found evidence for an effect of absolute position along the sequence in SHM rates. The Spisak paper in particular showed a very irregular pattern of per-site effects that seemed surprising. On the lab... (full post)
03 Apr 2025 » Let's blog!
I love reading about others’ work in publications, but also in a more informal way that ties motivation, results, history, and context together. That’s what I’ve tried to do in this blog, and I’m going to start that again now.
This is an intentional return to Web 1.0.
Scientists have been unpaid content creators on social media platforms that may or may not have their best interests at heart. In contrast, email and blogs are powerful tools that use open protocols.
There is a modern argument against long-form blogs: modern attention spans are insufficient to read a long-format blog post. I hear that and loudly rebel against it. If reading 1300 words strains our attention span, let the effort strengthen our attention muscle. This is a hill I’m willing to die on for the thinking world and for my children.
I’ll announce blog posts on social media, but I would most like to connect via email and RSS. If you would like to join our Google Groups group, here’s the link: https://groups.google.com/g/matsengrp.
I’m... (full post)
16 Jul 2021 » Postdoc position available: Bayesian phylogenetics in the densely sampled regime
The project
Statistical phylogenetic (evolutionary tree) methods have been essential for understanding the SARS-CoV-2 epidemic, whether for understanding origins, global spread, or lineage dynamics of the virus. These methods are extremely mature, with optimized code and software packages implementing complex models. However, these methods were developed with the “classical” sampling regime in mind: a relatively small number of sequences with relatively large divergences between them.
Methods for the classical sampling regime work to integrate out the uncertainty we have in ancestral sequences. Although the Felsenstein algorithm does allow for efficient calculation and updating of phylogenetic likelihoods, even this is not enough to handle the massive trees we would like to use for SARS-CoV-2. Furthermore, the Felsenstein algorithm only works for IID models between sites.
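To make "integrating out the uncertainty in ancestral sequences" concrete, here is a minimal sketch of the Felsenstein pruning recursion. It is an illustration under stated assumptions, not code from any of the packages mentioned: it assumes the Jukes-Cantor (JC69) substitution model with a uniform root distribution, and the nested-tuple tree representation and the function names (`jc69_prob`, `partial_likelihoods`, `log_likelihood`) are made up for this example.

```python
import math

STATES = "ACGT"


def jc69_prob(same, t):
    """JC69 probability of ending in a given state after branch length t.

    `same` says whether the start and end states are identical.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, site):
    """Return P(data below node | node is in state s) for each s in STATES.

    A leaf is a sequence string. An internal node is a tuple of
    (child, branch_length) pairs. This layout is hypothetical.
    """
    if isinstance(node, str):
        # Observed leaf: probability 1 for the observed base, 0 otherwise.
        return [1.0 if node[site] == s else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, t in node:
        child_partial = partial_likelihoods(child, site)
        for i in range(len(STATES)):
            # Sum over the unknown child state j: this is the step that
            # integrates out the ancestral sequence.
            result[i] *= sum(
                jc69_prob(i == j, t) * child_partial[j]
                for j in range(len(STATES))
            )
    return result


def log_likelihood(tree, n_sites):
    """Sum per-site log likelihoods; the sum is where IID sites are assumed."""
    total = 0.0
    for site in range(n_sites):
        root = partial_likelihoods(tree, site)
        total += math.log(sum(0.25 * p for p in root))
    return total


# Three-taxon example: an internal node joining two leaves, plus one more leaf.
inner = (("ACGT", 0.1), ("ACGA", 0.1))
tree = ((inner, 0.05), ("ACGT", 0.2))
print(log_likelihood(tree, 4))
```

The cost of each evaluation grows with the number of nodes times the number of sites, and the final sum over sites is only valid because sites are treated as independent and identically distributed, which is the restriction noted above.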
With SARS-CoV-2 we are in a completely different sampling regime, with over 2 million genomes for a virus without very much evolutionary divergence. That means that we frequently sample identical viruses, and we often sequence the direct ancestor of a given virus. This greatly limits the uncertainty... (full post)
29 Mar 2021 » Postdoc position available: variational Bayes phylogenetic inference
The project
Bayesian phylogenetic (evolutionary tree) inference is important for genomic epidemiology and for our understanding of evolution. Trees, along with associated information, are complicated objects of inference, with intertwined discrete (tree structure) and continuous (dates, rates) structure. Random-walk Markov chain Monte Carlo, implemented in packages such as BEAST (~20,000 citations) and MrBayes (>70,000 citations), is currently the only widely applied inference technique.
We have recently developed a rich means of parameterizing tree distributions with a fixed parameter set. This renders them accessible to more modern inference techniques, such as variational Bayes. We have developed a proof-of-concept application of phylogenetic variational Bayes using modern general-purpose gradient estimators. Our collaborative group also has preliminary integrations with both PyTorch and TensorFlow.
To achieve the promise of variational Bayes phylogenetics, we will develop:
- structure learning methods that will infer the discrete aspect of our variational approximation
- fitting methods that leverage the special structure of our variational phylogenetic models
- a modeling framework that integrates with PyTorch, enabling rich models that leverage covariates such as... (full post)
19 Oct 2020 » Life changes
Hi everyone. Through a combination of COVID and the arrival of a second child, I haven’t had time to write about our recent work. I’ll be back to posting at some point, but right now I’m focusing on being a dad and supporting my trainees. Thanks for understanding.
23 Jan 2020 » A Bayesian phylogenetic hidden Markov model for B cell receptor sequences
Summary
- antibodies develop within you via an evolutionary process
- understanding these evolutionary patterns is important for understanding how we respond to infection and vaccination
- we have found using Bayesian methods that evolutionary inferences are uncertain in this regime
- our most recent work develops a “Bayesian phylogenetic hidden Markov model,” which takes into account uncertainty in both the V(D)J recombination process and the evolutionary process
- this work reveals substantial amino-acid uncertainty in the inference of the unmutated common ancestor of VRC01, an important and heavily-studied anti-HIV antibody
- our results are described in a preprint which is now being revised for PLOS Computational Biology
A brief description of antibody affinity maturation
In order to defend against a very large and ever-mutating pool of pathogens, your body randomly generates, and then optimizes, a large collection of antibodies. These antibodies are displayed as so-called B cell receptors on the surface of specialized B cells. The random generation is a process called V(D)J recombination, in which a collection of candidate genes are randomly selected, trimmed... (full post)
24 Aug 2019 » Variational Bayesian phylogenetic inference
In late 2017 we were stuck without a clear way forward for our research on Bayesian phylogenetic inference methods.
We knew that we should be using gradient (i.e. multidimensional derivative) information to aid in finding the posterior, but couldn’t think of a way to find the right gradient. Indeed, we had recently finished our work on a variant of Hamiltonian Monte Carlo (HMC) that used the branch length gradient to guide exploration, along with a probabilistic means of hopping from one tree structure to another when a branch became zero. Although this project was a lot of fun and was an ICML paper, it wasn’t the big advance that we needed: these continuous branch length gradients weren’t contributing enough to the fundamental challenge of keeping the sampler in the good region of phylogenetic tree structures. But it was hard to even imagine a good solution to the central question: how can we take gradients in the discrete space of phylogenetic trees?
Meanwhile, in another line of research we were trying to separate out the... (full post)
18 Jun 2019 » Bayesian phylogenetic inference without sampling trees
Most every description of Bayesian phylogenetics I’ve read proceeds as follows:
- “Bayesian phylogenetic analyses are conducted using a simulation technique known as Markov chain Monte Carlo (MCMC).” (Alfaro & Holder, 2006)
- “Posterior probabilities are obtained by exploring tree space using a sampling technique, called Markov chain Monte Carlo (MCMC).” (Lemey et al, The Phylogenetic Handbook)
- “Once the biologist has decided on the data, model and prior, the next step is to obtain a sample from the posterior. This is done by using MCMC…” (Nascimento et al, 2017.)
With statements like these in popular (and otherwise excellent!) reviews, it’s not surprising that people confuse Bayesian phylogenetics and Markov chain Monte Carlo (MCMC). Well, let’s be clear.
MCMC is one way to approximate a Bayesian phylogenetic posterior distribution. It is not the only way.
In this post I’ll describe two of our recent papers that together give a systematic, rather than random, means of approximating a phylogenetic posterior distribution.
Without a doubt MCMC is the most popular means of approximating the posterior. MCMC... (full post)
Complete list of all posts