11 Dec 2018 » Open postdoc position to work on variational Bayesian phylogenetic inference
We are obsessed with finding efficient alternatives to random-walk MCMC for Bayesian phylogenetic inference. We have developed online sequential Monte Carlo theory and algorithms, phylogenetic Hamiltonian Monte Carlo, and inference via direct topology search and efficient marginal likelihood computation.
Come work with us on a strategy that is producing very promising results: variational Bayesian phylogenetic inference based on subsplit Bayesian networks. There are lots of opportunities for projects to flesh out this direction. We would like to find someone who can collaborate with us on methods development and implementation, so both knowledge of Bayesian statistics and programming expertise are needed. Experience with an existing code base for phylogenetics would be a big plus.
We’re stoked but are happy to wait for the right person to fill the position. If you aren’t ready until this summer, no problem!
Apply here or just get in touch.
05 Dec 2018 » Generalizing tree probability estimation via Bayesian networks
Posterior probability estimation of phylogenetic tree topologies from an MCMC sample is currently a pretty simple affair. You run your sampler, you get out some tree topologies, you count them up, normalize to get a probability, and done. It doesn’t seem like there’s a lot of room for improvement, right?
Let’s step back a little and think like statisticians. The posterior probability of a tree topology is an unknown quantity. By running an MCMC sampler, we get a histogram, the normalized version of which will converge to the true posterior in the limit of a large number of samples. We can use that simple histogram estimate, but nothing is stopping us from taking other estimators of the per-topology posterior distribution that may have nicer properties.
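The simple histogram estimate described above can be sketched in a few lines of Python. This is only an illustration: the function name and the toy sample are hypothetical, and the sketch assumes each sampled topology has already been written in a canonical form, so that two samples of the same topology compare equal as strings.

```python
from collections import Counter


def topology_posterior_estimate(sampled_topologies):
    """Histogram (relative frequency) estimate of topology posterior probabilities.

    Assumes each topology is given in a canonical string form, so identical
    topologies are identical strings.
    """
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sampled topology")
    # Normalize the counts so that the estimates sum to one.
    return {topology: count / total for topology, count in counts.items()}


# Hypothetical MCMC output on three taxa (not real data).
samples = ["((A,B),C);", "((A,B),C);", "((A,C),B);", "((A,B),C);"]
estimate = topology_posterior_estimate(samples)
```

Note that any topology that never appears in the sample, such as `((B,C),A);` here, implicitly gets an estimate of zero. That is the kind of property one might hope to improve on with a different estimator.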
For real-valued samples we might use kernel density estimates to smooth noisy sampled distributions, which may reduce error when sampling is sparse. Because the number of phylogenies is huge, MCMC is computationally expensive, and we are naturally... (full post)
15 May 2018 » Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity
High-throughput sequencing of our adaptive immune repertoires holds great promise for understanding immune state. These sequences implicitly contain a wealth of information on past and present exposures to infectious and autoimmune diseases, to environmental stimuli, and even to tumor-derived antigens. In principle, we should be able to use these sequences of rearranged receptors to infer their eliciting antigens, either individually or collectively.
We’re starting to see neat progress in these areas for T cell receptors (TCRs). Some recent studies compare TCR repertoires between individuals who do or do not have some immune state, such as an immunization, an autoimmune disease, or a viral infection, and work to find sequence-level differences between the repertoires. The Walczak-Mora team recently raised the bar by not requiring a control cohort. There has also been interesting progress on predicting epitope specificity from TCR sequence using structurally-informed sequence analysis.
12 May 2018 » The Bayesian optimist's guide to adaptive immune receptor repertoire analysis
Immune receptor sequencing is stochastic through and through. We have cells with random V(D)J rearrangements that are stimulated through some random process of exposures, which lead to some random amount of expansion, and in the B cell case there is some random process of mutation and selection. So why don’t we use methods incorporating that uncertainty into our analysis?
We’ve tried to do this in our work, and have made some progress, but there is so much left to be done. When Sarah Cobey and Patrick Wilson kindly invited me to contribute to their special issue of Immunological Reviews, I knew I wanted to step back and ask:
If computation was no barrier, how would we design an analysis framework that integrated out uncertainty in unknown quantities and took advantage of the hierarchical structure inherent in immune receptor data?
I teamed up with Branden Olson, a Statistics PhD student in the lab, and went to work. It was a fun exercise to think... (full post)
02 May 2018 » Benchmarking tree and ancestral sequence inference for B cell receptor sequences
Phylogenetic tools, in particular for ancestral sequence reconstruction, get used a lot in the B cell receptor (BCR) sequence analysis world. For example, they get used to reconstruct intermediate antibodies that then get synthesized in the lab and tested for binding (Wu et al., 2011). But how well do phylogenetic tools work in this parameter regime? Although there have been countless benchmarking studies for phylogenetics, the case of B cell sequence evolution is different from the usual setting for phylogenetics:
- Sampling and sequencing, especially for direct sequencing of germinal centers, is dense compared to divergence between sequences. Because of the resulting distribution of short branch lengths, zero-length branches and multifurcations representing simultaneous divergence are common.
- The somatic hypermutation (SHM) process in affinity maturation is a highly nucleotide-context-dependent process.
- Repertoire sequencing typically focuses on the coding sequence of antibodies, which are under very strong selective constraint. This contrasts with the neutral evolution assumptions of most phylogenetic algorithms, as well as the simulation... (full post)
19 Apr 2018 » Predicting B cell receptor substitution profiles using public repertoire data
Can we predict how sites of an antibody will tolerate amino acid substitutions? Kristian Davidsen posed this question shortly after he arrived in my group, pointing out that being able to do such prediction would be quite useful. For example, engineered antibodies sometimes aggregate into clumps or have other properties that make them useless for mass production. If we could figure out ways to change the amino acid sequence of an antibody without changing binding properties, that could help us avoid aggregation and make a more useful antibody.
How to start to address this complex and high-dimensional question? Although people have started to do deep mutational scanning on antibodies, this type of data is hard to come by. On the other hand, B cell repertoire (i.e. antibody-coding) sequence data is becoming plentiful. B cells undergo affinity maturation to improve binding in collections of sequences called “clonal families” grouped by naive ancestor sequence (more background here). Although it’s not quite the... (full post)
10 Jan 2018 » Postdoc opening to learn about antibody development during HIV superinfection
Please see https://b-t.cr/t/506 for details.
01 Dec 2017 » Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data
Every B cell receptor sequence in a repertoire came from a V(D)J recombination of germline genes. Each individual has only certain alleles of these genes in their germline, and knowing this set improves the accuracy of all aspects of BCR sequence analysis, from alignment to phylogenetic ancestral sequence reconstruction. This germline allele set can be estimated directly from BCR sequence data, and it’s time to treat such estimation as part of standard BCR sequence analysis pipelines.
This central message is not new, but it’s worth emphasizing because doing germline set inference is not part of most current studies of B cell receptor (BCR) sequences.
Indeed, the most common way to annotate sequences is to align them one by one to the full set of alleles present in the IMGT database, which has hundreds of alleles. Each individual has only a fraction of these alleles in their genome.
Unsurprisingly, aligning sequences one by one to the whole IMGT set can cause problems. Imagine that...
Complete list of all posts