v1.1.alpha07: SQL overhaul, multiplicity and fixes

23 May 2011, by Erick

Work continues on tuning the core code. The biggest change is that guppy now builds actual SQLite databases rather than emitting a collection of SQLite commands, which makes database building much faster. If you compile the code yourself, you will now need godi-sqlite3.
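To make the difference concrete, here is a minimal Python sketch of the two output styles: generating a text script of commands versus writing rows straight into a database. It is an illustration only, not guppy's code; the `placements` table, its columns, and the sample rows are hypothetical and do not reflect guppy's actual schema.

```python
import sqlite3

# Hypothetical rows; the table and column names are illustrative only
# and are NOT guppy's actual schema.
rows = [("seq1", 12, 0.91), ("seq2", 12, 0.05), ("seq3", 40, 0.99)]


def as_sql_script(rows):
    """Old style: emit text commands that must be replayed later."""
    lines = ["CREATE TABLE placements (name TEXT, edge INTEGER, weight REAL);"]
    for name, edge, weight in rows:
        lines.append(
            f"INSERT INTO placements VALUES ('{name}', {edge}, {weight});"
        )
    return "\n".join(lines)


def as_database(rows, path=":memory:"):
    """New style: write the rows directly into a SQLite database."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE placements (name TEXT, edge INTEGER, weight REAL)")
    with conn:  # commit all inserts as a single transaction
        conn.executemany("INSERT INTO placements VALUES (?, ?, ?)", rows)
    return conn


script = as_sql_script(rows)
conn = as_database(rows)
count = conn.execute("SELECT COUNT(*) FROM placements").fetchone()[0]
```

With the second style the result is ready to query as soon as it is written, with no separate step to replay a script.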

We have also finished full support for multiplicity of placements (i.e. more than one sequence name per placement), including in the database code. There is also a guppy redup command for re-adding duplicate sequences to placefiles generated from deduplicated sequence files. Deduplication will make your pipeline much faster, and it’s easy with seqmagick (the guppy redup documentation has some details).
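The idea behind deduplicating and then re-adding duplicates can be sketched in a few lines of Python. This is a conceptual illustration under simple assumptions (exact string matches, first name seen becomes the representative), not how seqmagick or guppy redup are implemented; the function names and the stand-in "placement" values are hypothetical.

```python
def deduplicate(records):
    """Collapse identical sequences.

    Returns (unique, groups):
      unique: list of (representative_name, sequence)
      groups: dict mapping representative_name -> all names with that sequence
    """
    by_seq = {}
    for name, seq in records:
        by_seq.setdefault(seq, []).append(name)
    unique = [(names[0], seq) for seq, names in by_seq.items()]
    groups = {names[0]: names for names in by_seq.values()}
    return unique, groups


def reduplicate(placements, groups):
    """Attach every original name to its representative's placement."""
    return {
        rep: {"names": groups[rep], "placement": placement}
        for rep, placement in placements.items()
    }


records = [("a", "ACGT"), ("b", "ACGT"), ("c", "TTGA")]
unique, groups = deduplicate(records)

# Stand-in for the expensive placement step, run once per unique sequence.
placements = {name: f"edge-for-{name}" for name, _ in unique}

restored = reduplicate(placements, groups)
```

Here only two sequences go through the expensive step instead of three, and the placement for "a" ends up carrying both names, which is the multiplicity described above.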

Also

  • fixed all of the sequence parsers to be tail-recursive, so parsing large files no longer causes segfaults.
  • made output flags more consistent across all guppy commands.
  • renamed the --normal flag for guppy kr to --gaussian to avoid confusion with normalization.
  • shuffling for guppy kr is now much more memory efficient, and we fixed a bug that was throwing off significance estimation.
  • guppy pca now defaults to scaling eigenvalues to percent variance.
  • re-added JTT, which had been mysteriously dropped.
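
For the guppy pca item in the list above, scaling eigenvalues to percent variance can be sketched as follows. This assumes the usual convention of dividing each eigenvalue by the sum of all of them and multiplying by 100; the exact normalization guppy uses may differ, and the function name and input values are hypothetical.

```python
def percent_variance(eigenvalues):
    """Scale eigenvalues so that they sum to 100 (percent of total variance).

    Assumes the total is taken over exactly the eigenvalues passed in.
    """
    total = sum(eigenvalues)
    if total == 0:
        raise ValueError("eigenvalues sum to zero; cannot scale")
    return [100.0 * ev / total for ev in eigenvalues]


scaled = percent_variance([6.0, 3.0, 1.0])
# scaled == [60.0, 30.0, 10.0]
```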