v1.1.alpha10: FastTree CAT model support and big-tree heuristics

28 Oct 2011, by Erick

This rolls out our FastTree CAT model support and a new collection of heuristics for the initial evaluation phase of placement.

To run pplacer using a FastTree tree, build your tree using the -gtr flag and save the log file using the -log option. The log file is used in the same way as the statistics file when building a reference package. If you haven’t built a reference package before, just have a look at the taxtastic quickstart. From there, the reference package is used just like any other. Note that pplacer won’t have to re-infer site categories if you are using the alignment in the reference package, which makes it faster. Placing on a FastTree tree takes about 1/4 the memory of placing on the equivalent tree inferred using GTRGAMMA in RAxML.
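As a rough sketch of that workflow, the snippet below assembles the three command lines involved. The -gtr and -log flags are the ones named above; everything else is an assumption to check against each tool's help output: the -nt flag (nucleotide input), the taxit create options, the -c option for pointing pplacer at a reference package, and all of the file names and the locus name, which are placeholders.

```python
import shlex
from typing import List, Optional


def fasttree_cmd(alignment: str, log_path: str) -> List[str]:
    # -gtr and -log come from the post; -nt assumes a nucleotide alignment.
    # FastTree writes the tree to stdout, so redirect it to a file when running.
    return ["FastTree", "-nt", "-gtr", "-log", log_path, alignment]


def taxit_create_cmd(refpkg: str, locus: str, alignment: str,
                     tree_path: str, log_path: str) -> List[str]:
    # The FastTree log goes where a statistics file would normally go.
    # Option names are assumptions; confirm with `taxit create --help`.
    return ["taxit", "create", "-l", locus, "-P", refpkg,
            "--aln-fasta", alignment,
            "--tree-stats", log_path,
            "--tree-file", tree_path]


def pplacer_cmd(refpkg: str, queries: str,
                fig_cutoff: Optional[float] = None) -> List[str]:
    # The query file is assumed to be aligned to the reference alignment.
    cmd = ["pplacer", "-c", refpkg]
    if fig_cutoff is not None:
        cmd += ["--fig-cutoff", str(fig_cutoff)]
    cmd.append(queries)
    return cmd


if __name__ == "__main__":
    # Placeholder file names, printed rather than executed.
    print(shlex.join(fasttree_cmd("ref.fasta", "fasttree.log")), "> ref.tre")
    print(shlex.join(taxit_create_cmd("my.refpkg", "16s", "ref.fasta",
                                      "ref.tre", "fasttree.log")))
    print(shlex.join(pplacer_cmd("my.refpkg", "queries_aligned.fasta")))
```

The commands are only printed here so that they can be inspected before anything is run.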

This release also contains command line flags that control the new “fig” heuristics. These heuristics greatly accelerate placement when the reference tree is big (e.g. > 10k leaves). In short, the tree gets divided up into subtrees that we call “figs”. These are connected units of the tree such that the distance between any two leaves is less than the value specified with the --fig-cutoff flag on the command line. The initial evaluation of edges for a placement then happens in three phases. First, evaluate each of the figs using representative edges. Then, merge figs that are close to one another in score and sort them. Finally, treat each (potentially merged) fig as a unit in the baseball heuristics; once we have tried all of the edges of one fig, we drop down to the next highest scoring fig and evaluate its edges. We have not seen a noticeable drop in accuracy using --fig-cutoff 0.2, and it’s much faster for a 35K taxon tree. Your mileage may vary.
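To make the three phases concrete, here is a toy sketch of the ordering logic. It is not pplacer's implementation. The function name, the data layout (figs as lists of edge ids, one representative edge per fig), the merge_tol parameter, and the rule for deciding that two figs are "close in score" are all assumptions made for illustration. It also stops at producing the visiting order and does not model the stopping rule of the baseball heuristics.

```python
from typing import Callable, Dict, List, Sequence


def fig_edge_order(
    figs: Dict[str, Sequence[int]],
    rep_edge: Dict[str, int],
    score: Callable[[int], float],
    merge_tol: float,
) -> List[List[int]]:
    """Return groups of edge ids in the order a fig-style search would visit them.

    Higher scores are better (for example, log likelihoods).
    """
    # Phase 1: score each fig through its representative edge only.
    fig_scores = {name: score(rep_edge[name]) for name in figs}

    # Phase 2: sort figs best first, then merge runs of figs whose score is
    # within merge_tol of the best fig in the current group.
    ranked = sorted(figs, key=lambda name: fig_scores[name], reverse=True)
    groups: List[List[int]] = []
    group_best = 0.0
    for name in ranked:
        s = fig_scores[name]
        if groups and group_best - s <= merge_tol:
            groups[-1].extend(figs[name])
        else:
            groups.append(list(figs[name]))
            group_best = s

    # Phase 3: the caller walks the groups in order, exhausting one group
    # before dropping down to the next.
    return groups


# Toy data: three figs, five edges, made-up scores.
toy_figs = {"A": [0, 1], "B": [2, 3], "C": [4]}
toy_reps = {"A": 0, "B": 2, "C": 4}
toy_scores = {0: -10.0, 1: -12.0, 2: -10.5, 3: -9.0, 4: -30.0}

merged = fig_edge_order(toy_figs, toy_reps, toy_scores.__getitem__, merge_tol=1.0)
print(merged)  # [[0, 1, 2, 3], [4]]
```

In this toy run, figs A and B have representative scores within 1.0 of each other, so they are merged and their edges are tried before fig C's. Note that edge 3 is never scored in the first phase, which is where the savings on a big tree would come from.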

Both of these new features are experimental.

I’m afraid that there is a new version of the installation script for those of you who are compiling. Our recent work placing on big trees broke our previous XML library and we’ve had to replace it. We’re hoping this will be the last change for a while.
