Replaying evolution to learn about the fitness landscape of affinity maturation

B cell and T cell diagram

Your adaptive immune system contains a Darwinian evolutionary mechanism through which B cells evolve to have improved affinity, that is, binding ability to antigens. The B cells work to retrieve antigen via their membrane-bound antibody (called a B cell receptor). If they succeed, they are rewarded with increased capacity to reproduce, at least in part via T cell signaling (see diagram borrowed from Victora and Mesin 2014).

But what exactly is this reward function? Implicit inside this evolutionary system, there is some translation of statements like

a B cell is \(x\) fold better at binding compared to a baseline

to

this B cell reproduces at a \(y\)-fold faster rate than the baseline.

Will DeWitt calls the reward function an affinity-fitness response curve. He likens it to a power-band curve for car engines. The power band curve tells you how much power a given engine produces when it is running at a given RPM. It’s not something like the size of a piston that one can measure by dissecting the engine. Instead, it’s an emergent property of the entire system, depending on everything from the air intake system through engine layout to the ignition cycle.

In a collaboration with the Victora lab we have worked for the last five years to learn the affinity-fitness response curve, which is an important step for immunology and evolutionary biology. For immunology, this curve is fundamental to affinity maturation but is not yet quantified. Existing research has made some statements about its overall characteristics, such as that it may plateau at high values of affinity, but a quantitative description has not yet been achieved. Like the power band curve, this quantitative relationship cannot be measured in vitro because it is an emergent property of the entire intact evolving system, including T cell help, antigen availability, and cellular competition.
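To make the idea of a response curve concrete, here is a minimal Python sketch of one possible shape: a saturating (Hill-type) function that maps an x-fold change in binding to a y-fold change in reproduction rate. The functional form, the name `fitness_fold_change`, and the parameters `max_fold` and `steepness` are all illustrative assumptions, not the curve inferred in this project.

```python
def fitness_fold_change(affinity_fold, max_fold=4.0, steepness=1.0):
    """Hypothetical response curve: x-fold affinity -> y-fold birth rate.

    A Hill-type curve normalized so that the baseline (x = 1) maps to y = 1.
    It plateaus at `max_fold` for very high affinity.
    """
    if affinity_fold <= 0:
        raise ValueError("affinity_fold must be positive")
    if max_fold <= 1:
        raise ValueError("max_fold must exceed 1 so the curve can rise above baseline")
    scaled = affinity_fold ** steepness
    return max_fold * scaled / (scaled + max_fold - 1.0)


# Tabulate the assumed curve at a few fold-changes in affinity.
for x in (0.1, 1.0, 10.0, 1000.0):
    print(f"{x:>7}-fold affinity -> {fitness_fold_change(x):.3f}-fold birth rate")
```

The plateau is built in here only because it is one of the qualitative features suggested by existing research; whether and where the real curve saturates is exactly what has to be learned from data.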

For evolutionary biology, understanding how genetic changes translate through antibody-antigen binding affinity to cellular fitness is the mapping of a “fitness landscape.” Although the idea of fitness landscape inference is far from new in evolutionary biology, there are a few ways in which the fitness landscape we infer here is uniquely precise:

  • Fitness is determined by a single, directly measurable phenotype: the ability of the B cell to obtain antigen. This contrasts with systems like Lenski’s long-term evolution experiment, where optimal phenotypes depend on complex interactions among evolving lineages.
  • This phenotype is controlled by two loci (the heavy and light chains of the antibody) that are vertically inherited together in B cell lineages. This contrasts with other evolving systems.
  • The relevant variables can be assayed in great detail and in high throughput. We can learn the mutation patterns accurately, and we can understand B cell binding using biolayer interferometry (BLI) and deep mutational scanning experiments.

There is a lot to gain, so we have put a lot of love into these experiments!


We call our collaboration the “replay project”, as it involves “replaying” evolution again and again. The project emerged from discussions in early 2020 between Tatsuya Araki, when he was in the Victora lab, and Will DeWitt, when he was a PhD student with Kelley Harris and me. After Tatsuya graduated, the work was picked up by Ashni Vora, another PhD student. Will did a postdoc with Yun Song and is now faculty at UW Genome Sciences. The project has been continually active through these career arcs, and we have recently preprinted the first results from the work.

The evolutionary process that improves B cell receptor binding happens in specialized structures called “germinal centers” (GCs): microanatomical sites within lymph nodes where B cells undergo rapid proliferation, somatic hypermutation, and selection. On a conceptual level, the Victora lab engineered mice to have essentially a single progenitor (“naive”) B cell receptor that can bind the immunization antigen reasonably well, initiating the affinity maturation process. In practice, they achieved this experimental setup through B cell transfer.

In this way, the system always evolves with the same original B cell receptor in response to the same antigen. By dissecting out individual germinal centers, Tatsuya and Ashni were able to sequence 119 replicates of the same evolutionary process at 15 and 20 days. This is a landmark dataset with more replicates than other experimental evolutionary systems. Note that each one of these germinal centers required delicate care: this is not a high-throughput assay!

As if this wasn’t enough, the experiment includes:

  • A deep mutational scan giving the amount by which mutations improve binding against the target antigen using a yeast display experiment by Tyler Starr. This is essential because it allows us to predict the affinity of the variants that are sequenced.
  • A passenger allele experiment that gave us a custom probabilistic model of somatic hypermutation specifically for our naive B cell receptor. This allows us to factor out the influence of the mutation process, which as previously described is essential!
  • Antibody-antigen co-crystal structures from the Ward lab, providing structural context for observed mutations.
  • An additional denser time course of immunization experiments using bulk 10X data rather than germinal center extraction (this has the advantage of many more cells, but these cells are a mix between different germinal centers). This was essential for the traveling wave model described below.
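As a rough illustration of how a deep mutational scan can be used to predict the affinity of a sequenced variant, the sketch below sums single-mutation effects over every site where the variant differs from the naive sequence. Additivity of effects is an assumption made here for simplicity, and the sequences, effect values, and function name are made up for the example.

```python
def predict_delta_log_affinity(naive, variant, effects):
    """Sum per-mutation effects for each site where `variant` differs from `naive`.

    `effects` maps (position, new_amino_acid) to a change in log affinity.
    Assumes mutation effects are additive, which is a simplification.
    """
    if len(naive) != len(variant):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for position, (old, new) in enumerate(zip(naive, variant)):
        if old != new:
            total += effects[(position, new)]
    return total


# Made-up toy data: a 5-residue "receptor" and three measured single mutants.
toy_naive = "ACDEF"
toy_effects = {(1, "G"): 0.8, (3, "K"): 0.3, (4, "Y"): -0.5}

print(predict_delta_log_affinity(toy_naive, "AGDKF", toy_effects))
```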

All of this data has to get analyzed computationally. The unsung heroes of this story include Tiago Castro who preprocessed the data and Jared Galloway who orchestrated the many bioinformatics steps in a lovely reproducible pipeline.


Over the years, we developed three distinct analytical approaches, each leading to a separate manuscript. Each approach offered different insights into the affinity-fitness relationship and revealed different limitations of birth-death modeling.

Will and I originally formulated the model in a way that seemed obvious from the perspective of probabilistic modeling: as a multitype birth-death process. Thinking forward in time, the rate with which lineages give birth to daughter lineages is in proportion to their fitness. That fitness is determined by the affinity-fitness response function: a function from the phenotype (the binding affinity extrapolated from Tyler’s yeast display experiment) to the birth rate in the branching process.
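The forward-time picture can be sketched as a small Gillespie simulation in which each cell carries a log-affinity and divides at a rate set by a response function. This is only a toy: the Hill-type `birth_rate`, the constant death rate, the symmetric mutation step applied to one daughter, and the `max_cells` cap are all assumptions chosen to keep the example short.

```python
import math
import random


def birth_rate(log_affinity, base=1.0, max_fold=4.0):
    """Hypothetical response function: birth rate saturates with affinity."""
    x = math.exp(log_affinity)
    return base * max_fold * x / (x + max_fold - 1.0)


def simulate_lineages(t_end, death_rate=0.5, mutation_prob=0.2,
                      step=0.5, max_cells=500, seed=0):
    """Gillespie simulation of a multitype birth-death process.

    Returns the log-affinities of the cells alive when the simulation stops
    (at t_end, at extinction, or when max_cells is reached).
    """
    rng = random.Random(seed)
    cells = [0.0]  # start from a single naive cell with baseline affinity
    t = 0.0
    while cells and len(cells) < max_cells:
        birth_rates = [birth_rate(a) for a in cells]
        total_birth = sum(birth_rates)
        total_rate = total_birth + death_rate * len(cells)
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total_rate < total_birth:
            # Birth: pick the parent in proportion to its birth rate.
            parent = rng.choices(range(len(cells)), weights=birth_rates)[0]
            child = cells[parent]
            if rng.random() < mutation_prob:
                child += rng.choice([-step, step])  # the new daughter mutates
            cells.append(child)
        else:
            # Death: every cell is equally likely to die.
            cells.pop(rng.randrange(len(cells)))
    return cells


final_cells = simulate_lineages(t_end=3.0, seed=1)
print(len(final_cells), "cells alive at the end of the run")
```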

By parameterizing that function, it seemed that we could use existing technology to formulate a likelihood and optimize it. Branching process likelihoods via ODEs have their roots in the original BiSSE models and were detailed for the multitype case in Barido-Sottani et al. Because we have so many samples, we should have plenty of signal, and because we can take gradients through ODEs with packages such as diffrax, we should be able to fit the models.
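For a flavor of what "likelihoods via ODEs" means, here is a sketch of the simplest ingredient: the probability E(t) that a lineage alive t time units before sampling leaves no sampled descendants. In the single-type case with constant birth rate λ, death rate μ, and every surviving cell sampled (all simplifying assumptions made here), it satisfies dE/dt = μ − (λ + μ)E + λE² with E(0) = 0. The code integrates this with a hand-written Runge-Kutta step in plain Python, standing in for a differentiable solver such as diffrax, and checks it against the known closed form.

```python
import math


def extinction_rhs(e, birth, death):
    """Right-hand side of dE/dt = mu - (lambda + mu) E + lambda E^2."""
    return death - (birth + death) * e + birth * e * e


def extinction_prob_ode(t_end, birth, death, n_steps=2000):
    """Integrate the extinction ODE from E(0) = 0 with classical RK4."""
    h = t_end / n_steps
    e = 0.0
    for _ in range(n_steps):
        k1 = extinction_rhs(e, birth, death)
        k2 = extinction_rhs(e + 0.5 * h * k1, birth, death)
        k3 = extinction_rhs(e + 0.5 * h * k2, birth, death)
        k4 = extinction_rhs(e + h * k3, birth, death)
        e += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return e


def extinction_prob_exact(t_end, birth, death):
    """Closed-form extinction probability of a linear birth-death process."""
    r = birth - death
    if abs(r) < 1e-12:  # critical case: birth == death
        return birth * t_end / (1.0 + birth * t_end)
    growth = math.exp(r * t_end)
    return death * (growth - 1.0) / (birth * growth - death)


print(extinction_prob_ode(5.0, birth=1.0, death=0.5))
print(extinction_prob_exact(5.0, birth=1.0, death=0.5))
```

In the multitype setting there is one such equation per type, and the equations are coupled, so no closed form is generally available and numerical integration becomes necessary.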

Seems straightforward, no? Well, it ended up being something of an adventure…

1. “Classical” birth-death inference

The first effort, inferring the response function directly with existing birth-death modeling approaches, found a champion in Thanasi Bakis, a graduate student in Volodymyr Minin’s group at UC Irvine. Thanasi’s work showed that even on simple simulated data, the Barido-Sottani approach consistently underestimated the death rate. The reason was that their approach does not condition on the existence of lineages continuing to the sampling time. Using the unconditioned likelihood is a reasonable approximation if you have a single sample, but when you have many samples it creates problems.
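One way to see why the number of samples matters: conditioning on survival divides each replicate's likelihood by the probability that the founding lineage has descendants at the sampling time, so the log-likelihood gains one correction term per replicate. The sketch below computes that term for a single-type process with constant rates and complete sampling, which is a deliberately simplified stand-in for the multitype model; the function names are hypothetical.

```python
import math


def survival_prob(t, birth, death):
    """Probability that one lineage has at least one descendant after time t."""
    r = birth - death
    if abs(r) < 1e-12:  # critical case: birth == death
        return 1.0 / (1.0 + birth * t)
    growth = math.exp(r * t)
    return r * growth / (birth * growth - death)


def conditioning_correction(n_replicates, t, birth, death):
    """Log-likelihood term added when every replicate is conditioned on survival."""
    return -n_replicates * math.log(survival_prob(t, birth, death))


# The term depends on the parameters, and it scales with the replicate count.
for death in (0.2, 0.5, 0.8):
    one = conditioning_correction(1, 5.0, birth=1.0, death=death)
    many = conditioning_correction(119, 5.0, birth=1.0, death=death)
    print(f"death={death}: 1 replicate {one:.2f}, 119 replicates {many:.2f}")
```

Because the correction varies with the death rate and grows linearly with the number of replicates, dropping it shifts the likelihood surface more and more as replicates are added.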

We also realized that there is a fundamental incompatibility between crucial modeling assumptions and the germinal center biology. Birth-death models have computable likelihoods because each lineage is considered to be evolving independently. This is mostly true for germinal center B cells; however, it conflicts with population size constraints. Namely, the germinal center is of finite size, so some mechanism must limit its size by suppressing birth or increasing death. Ignoring this is a model misspecification that will distort inference.
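A toy simulation shows how a size constraint breaks independence. Here the per-cell death rate is assumed to grow in proportion to the current population size, so that births and deaths balance at a chosen `capacity`. This logistic form is just one hypothetical mechanism; the text above only says that some such mechanism must exist.

```python
import random


def simulate_capped_population(capacity, t_end, birth=1.0, n0=10, seed=0):
    """Birth-death process whose per-cell death rate is birth * n / capacity.

    Because the death rate of each cell depends on the total count n,
    lineages no longer evolve independently of one another.
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    while n > 0:
        total_birth = birth * n
        total_death = birth * n * n / capacity
        t += rng.expovariate(total_birth + total_death)
        if t > t_end:
            break
        if rng.random() * (total_birth + total_death) < total_birth:
            n += 1
        else:
            n -= 1
    return n


print(simulate_capped_population(capacity=200, t_end=20.0, seed=3))
```

The population hovers near the capacity instead of growing exponentially, which is exactly the behavior a model of independent lineages cannot produce.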

The preprint with Thanasi’s work is up as Bayesian inference of antibody evolutionary dynamics using multitype branching processes. Thanasi accurately reports the distortion coming from population size constraints in Figure 6. The paper is full of other interesting things, including an analysis of the approximation to full ODE inference used in the Barido-Sottani paper. Hats off to Thanasi and Volodya for being great collaborators in this work.

These limitations of classical birth-death modeling led us to explore alternative approaches.

2. Simulation-based birth-death inference

As we were thinking about the problems inherent in classical birth-death modeling, inference of phylodynamic models via simulation and neural networks came on the scene. We were inspired by papers such as Voznica, Zhukova et al. and Thompson et al. The concept is straightforward: one draws samples from the forward-time process for a range of parameters. Each sample is turned into some vector, either by encoding the simulated tree or by using summary statistics. A neural network is then trained to take that vector and predict the parameters used to generate the sample.
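The simulate-then-regress recipe can be sketched on a deliberately tiny problem. Below, the "simulator" is a pure-birth (Yule) process with a single unknown rate, the "vector" is one summary statistic (the mean log population size over replicates), and an ordinary least-squares line stands in for the neural network. All names, ranges, and sizes are assumptions made for the example.

```python
import math
import random


def simulate_yule_size(rate, t_end, rng):
    """Forward-simulate a pure-birth process; return the population at t_end."""
    n, t = 1, 0.0
    while True:
        t += rng.expovariate(rate * n)
        if t > t_end:
            return n
        n += 1


def summary_statistic(rate, t_end, n_replicates, rng):
    """Mean log population size across independent replicates."""
    logs = [math.log(simulate_yule_size(rate, t_end, rng))
            for _ in range(n_replicates)]
    return sum(logs) / n_replicates


def train_estimator(n_train=200, t_end=3.0, n_replicates=50, seed=0):
    """Draw rates from a prior range, simulate, and regress rate on the summary."""
    rng = random.Random(seed)
    rates = [rng.uniform(0.5, 1.5) for _ in range(n_train)]
    stats = [summary_statistic(r, t_end, n_replicates, rng) for r in rates]
    mean_s = sum(stats) / n_train
    mean_r = sum(rates) / n_train
    slope = (sum((s - mean_s) * (r - mean_r) for s, r in zip(stats, rates))
             / sum((s - mean_s) ** 2 for s in stats))
    intercept = mean_r - slope * mean_s
    return intercept, slope


if __name__ == "__main__":
    intercept, slope = train_estimator()
    observed = summary_statistic(1.0, 3.0, 50, random.Random(123))
    print("estimated rate:", intercept + slope * observed)
```

The appeal of the approach is that nothing in `train_estimator` needs a likelihood: any process that can be simulated forward in time can be plugged in, including ones with population size constraints.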

This sounded straightforward but in practice was a lengthy endeavor led courageously by Duncan Ralph. The main challenge is that if we want the simulator to halfway-decently mimic real germinal centers, it has to have a lot of parameters, many of which have unknown values. Even parameters that we think we know are often hard to interpret in the context of our simulation. Because the GC is not well mixed, for instance, the population size with which a cell competes is smaller than the experimentally-measured number of cells in the GC. But how much smaller is this “effective” population size than the measured one?

Although the data is rich and neural networks are powerful, it’s difficult to constrain all of these parameters using the core birth-death inference. For this reason, we split the parameter inference into two sub-problems with two different inferential approaches: the parameters about the overall behavior of the GC come from matching summary statistics, while those that concern the affinity-fitness response function come from neural-network inference. The result is up on eLife, containing the following summary for the evolutionary biology community:

Simulation-based methods have enabled inference on this otherwise-intractable, messy biological problem. However, they were not a panacea.

3. Traveling wave modeling

While we were working studiously using the phylogenetics-based approaches, another opportunity appeared via another data source. Where the phylogenetics approaches reconstruct past evolutionary history, instead we can assay the past dynamics just by taking more samples through time. Temporal sampling via the GC extraction described above would be prohibitive; however, bulk lymph node sequencing with 10X technology is tractable. The trade-off is that we cannot reconstruct phylogenetic trees: the trees assume that sequences come from a single GC, but the 10X data mixes sequences across GCs.

Will, working with Armita Nourmohammad, formulated the evolution of the distribution of affinity in these samples through time using a partial-differential-equation-based model. Briefly, they modeled the likelihood of observing the data under a probabilistic model, but that model itself is the solution of a PDE involving the affinity-fitness function. One can differentiate through this PDE solution using the modern magic of diffrax. You can see the result in Will’s notebook. Will also estimates the extinction probability of a cell of a given affinity at a given time. That’s something that should get B cell biologists excited!
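To give a feel for a PDE of this general kind, the sketch below evolves a density u(x, t) over log-affinity x under ∂u/∂t = D ∂²u/∂x² + (f(x) − f̄(t)) u, where f is a response function and f̄ is its population average. This replicator-diffusion form, the Hill-type `fitness`, and every numerical value are assumptions chosen for illustration. They are not the model in Will's notebook, and a plain explicit finite-difference scheme replaces a differentiable solver.

```python
import math


def fitness(x, max_fold=4.0):
    """Hypothetical response function of log-affinity x."""
    ex = math.exp(x)
    return max_fold * ex / (ex + max_fold - 1.0)


def make_initial_density(xs, sd=0.3):
    """Gaussian bump around the naive affinity, normalized to unit mass."""
    dx = xs[1] - xs[0]
    raw = [math.exp(-0.5 * (x / sd) ** 2) for x in xs]
    total = sum(raw) * dx
    return [v / total for v in raw]


def evolve_density(u, xs, t_end, diffusion=0.05, dt=0.01):
    """Explicit Euler steps with reflecting boundaries; total mass is conserved."""
    dx = xs[1] - xs[0]
    if diffusion * dt / dx ** 2 > 0.5:
        raise ValueError("time step too large for a stable explicit scheme")
    f = [fitness(x) for x in xs]
    u = list(u)
    n = len(u)
    for _ in range(round(t_end / dt)):
        mean_f = sum(fi * ui for fi, ui in zip(f, u)) / sum(u)
        new_u = []
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            laplacian = (left - 2.0 * u[i] + right) / dx ** 2
            new_u.append(u[i] + dt * (diffusion * laplacian + (f[i] - mean_f) * u[i]))
        u = new_u
    return u


def mean_affinity(u, xs):
    return sum(x * ui for x, ui in zip(xs, u)) / sum(u)


grid = [-3.0 + 0.1 * i for i in range(61)]
u0 = make_initial_density(grid)
u1 = evolve_density(u0, grid, t_end=2.0)
print(mean_affinity(u0, grid), "->", mean_affinity(u1, grid))
```

In this toy, the density drifts toward higher affinity as a wave because higher-affinity cells grow faster than the population average, while the diffusion term plays the role of mutation.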

Will also demonstrates how this evolutionary system suffers from “push of the past” bias: the tendency for phylogenetic reconstructions to overestimate the pace of early evolution. Indeed, when we reconstruct ancestral affinities from surviving lineages, it appears that affinity spikes up quite early in the experiment. However, this reconstruction only includes lineages that survived to the present, a highly biased subpopulation. We can demonstrate this bias by comparing these ancestral inferences to the 10X time samples and Will’s reconstructed population dynamics. This is a special feature of our system compared to typical evolutionary systems: we can “go back in time” and sample directly!
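The survivorship effect behind this bias is easy to reproduce in a toy setting. The sketch below runs many independent single-type birth-death processes and compares the average population size at an early time across all runs with the same average restricted to runs that survive to the end. The rates and time points are arbitrary assumptions, and population size stands in for affinity to keep the example small.

```python
import random


def simulate_counts(birth, death, t_early, t_end, rng):
    """Return (population at t_early, population at t_end) for one run."""
    n, t, n_early = 1, 0.0, None
    while n > 0:
        t_next = t + rng.expovariate((birth + death) * n)
        if n_early is None and t_next > t_early:
            n_early = n  # no event before t_early changes the count further
        if t_next > t_end:
            break
        t = t_next
        n += 1 if rng.random() < birth / (birth + death) else -1
    if n_early is None:
        n_early = 0  # the run went extinct before t_early
    return n_early, n


def early_size_bias(n_runs=2000, birth=1.0, death=0.8,
                    t_early=2.0, t_end=10.0, seed=0):
    """Mean early size over all runs versus over runs that survive to t_end."""
    rng = random.Random(seed)
    runs = [simulate_counts(birth, death, t_early, t_end, rng)
            for _ in range(n_runs)]
    survivors = [early for early, final in runs if final > 0]
    mean_all = sum(early for early, _ in runs) / n_runs
    mean_survivors = sum(survivors) / len(survivors) if survivors else float("nan")
    return mean_all, mean_survivors, len(survivors)


if __name__ == "__main__":
    print(early_size_bias())
```

Looking only at survivors makes the early phase appear more successful than it was on average, which is the same selection effect that inflates reconstructed early affinities.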

The companion “pull of the present” effect also holds lessons for B cell researchers. Previous work has suggested that germinal center evolution is “permissive” because sequenced GCs often contain low-affinity cells. We demonstrate that this observation is due to those low-affinity cells not yet being purged.

Another central theme in evolutionary biology is that of stochasticity and contingency, going back to Gould or before. We found a striking pattern: despite significant stochasticity in which specific mutations arose, the phenotypic evolution (i.e. the improvement in binding affinity) remained remarkably consistent across replicates. The reason we are able to make such statements is that we have 119 replicates of an evolutionary system, for which we can thank the extraordinary efforts of Tatsuya, Ashni, and the rest of the Victora lab.

This and much more is reported in the core replay manuscript. All of the data is public and, we hope, nicely organized, so that many others can go beyond the analyses that we have performed.


It is difficult for me to describe what the experience of working on this project has been like. It has been awe-inspiring, highly motivating, and a beautiful merging of theory and experiment. It has also been exhausting, but through all of it I have emerged with great respect for everyone involved. Will and Gabriel are both phenomenal scientists who don’t back down on rigor in their own domains. Thanasi and Duncan pushed forward projects despite the difficulties we encountered along the way.

There is more to come! Believe it or not, another even longer-term collaboration with the Victora lab is still several years away… and it’ll be worth the wait.