v1.1.alpha07: SQL overhaul, multiplicity and fixes

Work continues on tuning up the core code. The biggest change is that guppy now builds actual SQLite databases rather than emitting a collection of SQLite commands, which makes building databases much faster. If you compile the code yourself, you will now need godi-sqlite3.
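Why is writing the database directly faster than generating a script of SQL commands? Rows go straight into the database through parameterized statements inside a single transaction, and no SQL text has to be generated and then parsed again. The Python sketch below uses the standard-library `sqlite3` module to show the idea. The `placements` table, its columns and the `build_db` function are hypothetical. They are not guppy's actual schema or code.

```python
import sqlite3

def build_db(conn, rows):
    """Insert (name, edge, likelihood) rows directly into a SQLite database.

    The schema is hypothetical and only illustrates the approach.
    Returns the number of rows stored in the table.
    """
    with conn:  # one transaction: commits once at the end, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS placements "
            "(name TEXT, edge INTEGER, likelihood REAL)"
        )
        # Parameterized bulk insert: no SQL text is built per row.
        conn.executemany("INSERT INTO placements VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM placements").fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(build_db(conn, [("seq1", 3, -1234.5), ("seq2", 7, -987.0)]))  # prints 2
    conn.close()
```

Committing once at the end avoids a separate transaction for each row. That per-row cost is a common reason a long script of individual INSERT statements runs slowly.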

We have also finished full support for multiplicity of placements (i.e. more than one sequence name per placement), including in the database code. There is also a guppy redup command for re-adding duplicate sequences to placefiles generated from deduplicated sequence files. Deduplication can make your pipeline much faster when your data contain many identical sequences, and it’s easy with seqmagick (the guppy redup documentation has some details).
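The Python sketch below illustrates deduplication and re-duplication. Identical sequences are collapsed to one representative, and the names of the duplicates are recorded. Each placement made for a representative is then expanded into a placement that carries all of those names, which is a placement with multiplicity greater than 1. This is only an illustration under stated assumptions. It is not how guppy or seqmagick implement this. The `dedup` and `redup` functions and the plain-dict data layout are hypothetical.

```python
def dedup(records):
    """Collapse identical sequences (hypothetical helper).

    records: iterable of (name, sequence) pairs.
    Returns (unique, dup_map): unique is a list of (representative, sequence),
    and dup_map maps each representative to every name sharing its sequence.
    """
    by_seq = {}  # sequence -> names, in first-seen order (dicts keep insertion order)
    for name, seq in records:
        by_seq.setdefault(seq, []).append(name)
    unique = [(names[0], seq) for seq, names in by_seq.items()]
    dup_map = {names[0]: names for names in by_seq.values()}
    return unique, dup_map

def redup(placements, dup_map):
    """Re-attach duplicate names to placements made on deduplicated input.

    placements: dict mapping representative name -> placement data.
    Returns (names, placement) pairs; names lists every sequence sharing it.
    """
    return [(dup_map.get(rep, [rep]), p) for rep, p in placements.items()]

if __name__ == "__main__":
    unique, dup_map = dedup([("a", "ACGT"), ("b", "ACGT"), ("c", "TTGA")])
    print(unique)   # prints [('a', 'ACGT'), ('c', 'TTGA')]
    print(dup_map)  # prints {'a': ['a', 'b'], 'c': ['c']}
    print(redup({"a": {"edge": 3}, "c": {"edge": 7}}, dup_map))
    # prints [(['a', 'b'], {'edge': 3}), (['c'], {'edge': 7})]
```

Only the unique sequences need to be placed. The expensive placement step therefore scales with the number of distinct sequences, not with the total number of reads.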

Also:

  • fixed all of the sequence parsers to be tail-recursive, so parsing large files no longer causes segfaults.
  • better consistency of output flags across all guppy commands.
  • renamed the --normal flag for guppy kr to --gaussian to avoid confusion with normalization.
  • shuffling for guppy kr is now much more memory-efficient, and a bug that was throwing off significance estimation has been fixed.
  • guppy pca now defaults to scaling eigenvalues to percent variance.
  • re-added the JTT substitution model, which had been mysteriously dropped.
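Scaling eigenvalues to percent variance expresses each eigenvalue as a share of the total variance. The Python sketch below assumes that the full set of eigenvalues of a covariance matrix is available, so the eigenvalues are non-negative and their sum equals the total variance. If only the leading few eigenvalues are computed, divide by the trace of the covariance matrix instead. The function name is hypothetical, and this is not guppy's code.

```python
def percent_variance(eigenvalues):
    """Return each eigenvalue as a percentage of the sum of all eigenvalues."""
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("eigenvalues sum to zero; percent variance is undefined")
    return [100.0 * ev / total for ev in eigenvalues]

if __name__ == "__main__":
    print(percent_variance([6.0, 3.0, 1.0]))  # prints [60.0, 30.0, 10.0]
```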