Microbial community profiling using marker genes such as the 16S rRNA gene has greatly expanded our knowledge of the diversity of the microbial world (Tringe and Hugenholtz, 2008). Phylogenetic trees are key to our understanding of the microbial world (Pace, 1997) and provide an important view into microbial data (Ludwig et al., 2004). Software such as QIIME (Caporaso et al., 2010) has kept pace with the increasing rate of sequence acquisition to allow statistical analysis of these data, but tools to visualize and manipulate phylogenetic trees and corresponding metadata (Huson et al., 2007; Letunic and Bork, 2011), which worked well on datasets typical of a few years ago, are less suitable for datasets containing millions of sequences.
A key question in microbial ecology is which portions of a phylogenetic reference tree are differentially represented in specific groups of samples. To address this question, users should be able to load trees with thousands of tips, assign taxonomic labels to the tips, and color the branches based on data about each sample.
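To make the idea of coloring branches by sample data concrete, the following Java sketch computes, for any node, the fraction of sequence counts below it that come from samples in one metadata group, and blends a branch color from that fraction. The classes `TreeNode` and `BranchColorer`, the nested-map count table, and the blue-to-red color scale are hypothetical illustrations chosen for this example; they are not TopiaryExplorer's actual data model or coloring scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Minimal tree node for this sketch (hypothetical, not TopiaryExplorer's class). */
class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    /** Appends a child and returns this node, so small trees can be built inline. */
    TreeNode add(TreeNode child) { children.add(child); return this; }

    boolean isTip() { return children.isEmpty(); }
}

/** Colors a branch by how much of the abundance beneath it comes from one sample group. */
class BranchColorer {
    // counts.get(tip).get(sample) = number of sequences of that tip observed in that sample
    private final Map<String, Map<String, Integer>> counts;
    // sampleGroup.get(sample) = a metadata value such as "gut" or "soil"
    private final Map<String, String> sampleGroup;

    BranchColorer(Map<String, Map<String, Integer>> counts, Map<String, String> sampleGroup) {
        this.counts = counts;
        this.sampleGroup = sampleGroup;
    }

    /** Returns {countsInGroup, totalCounts} summed over every tip below node. */
    private long[] tally(TreeNode node, String group) {
        long inGroup = 0, total = 0;
        if (node.isTip()) {
            Map<String, Integer> row = counts.getOrDefault(node.name, Collections.emptyMap());
            for (Map.Entry<String, Integer> e : row.entrySet()) {
                total += e.getValue();
                if (group.equals(sampleGroup.get(e.getKey()))) inGroup += e.getValue();
            }
        } else {
            for (TreeNode child : node.children) {
                long[] t = tally(child, group);
                inGroup += t[0];
                total += t[1];
            }
        }
        return new long[] {inGroup, total};
    }

    /** Fraction of counts under node that come from the group, or NaN if the clade was never observed. */
    double fraction(TreeNode node, String group) {
        long[] t = tally(node, group);
        return t[1] == 0 ? Double.NaN : (double) t[0] / t[1];
    }

    /** Linear blend from blue (fraction 0) to red (fraction 1) as 0xRRGGBB; grey if unobserved. */
    int color(TreeNode node, String group) {
        double f = fraction(node, group);
        if (Double.isNaN(f)) return 0x999999;
        int red = (int) Math.round(255 * f);
        int blue = 255 - red;
        return (red << 16) | blue;
    }
}
```

Recomputing the tally separately for every node repeats work on large trees; a single post-order pass that stores each node's two totals would color the whole tree in time linear in its size.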
Here we present TopiaryExplorer, a software package that facilitates visual exploration of large phylogenetic trees together with information about each sample and each tip. This integration of what is often called ‘sequence metadata’ is crucial to understanding how sequences (and their source organisms) are distributed across environments, and the processes underlying the observed patterns. TopiaryExplorer additionally allows display and revision of the taxonomy (including multiple taxonomies for the same tree, facilitating taxonomic comparisons) and integration with databases that contain sample and tree information (an example database is provided to help users create their own; see below). It also provides several key user interface improvements: the ability to dynamically collapse or expand the whole tree in any of several tree layouts, allowing rapid visual exploration of which lineages are shared among or unique to specific subsets of environments; the ability to spawn new windows for investigating specific subtrees and to view multiple trees at the same time; control over labels and layout features critical for producing publication-quality graphics; the ability to export results in several graphical and machine-readable formats for further analysis; and the ability to handle trees with hundreds of thousands of tips, which can readily be derived from larger sequence datasets by OTU picking with UCLUST (Edgar, 2010) or related tools. TopiaryExplorer reads metadata as tab-separated text and trees as Newick-formatted strings. These standard formats allow data generated with different tools to be imported easily.
TopiaryExplorer is written in Java using Processing, which provides fast OpenGL-based tree rendering and PDF export. The tree layout algorithms in TopiaryExplorer were adapted from PyCogent (Knight et al., 2007). Several strategies were applied to visualize large trees and their associated metadata efficiently, including caching node lookups rather than repeatedly searching for the same node, and using sparse table representations to store and access metadata.
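The two efficiency strategies can be illustrated with a minimal Java sketch, assuming a tree whose nodes are identified by name and a tip-by-sample count table. `Node`, `NodeLookup`, and `SparseTable` are hypothetical names, and this is not TopiaryExplorer's code. `NodeLookup` memoizes name-to-node searches, so once a name has been found it never triggers another traversal; `SparseTable` stores only nonzero cells, which pays off because tip-by-sample tables are typically dominated by zeros.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal tree node for this sketch (hypothetical). */
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
    Node add(Node child) { children.add(child); return this; }
}

/** Caches name -> node so repeated lookups of the same node skip the tree search. */
class NodeLookup {
    private final Node root;
    private final Map<String, Node> cache = new HashMap<>();
    int traversals = 0; // number of full tree searches performed, kept only for illustration

    NodeLookup(Node root) { this.root = root; }

    /** Returns the node with this name, or null; only successful lookups are cached. */
    Node find(String name) {
        return cache.computeIfAbsent(name, n -> {
            traversals++;
            return search(root, n);
        });
    }

    /** Iterative depth-first search; an explicit stack avoids deep recursion on large trees. */
    private static Node search(Node start, String name) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (name.equals(node.name)) return node;
            for (Node child : node.children) stack.push(child);
        }
        return null;
    }
}

/** Sparse tip-by-sample count table: absent cells are implicitly zero. */
class SparseTable {
    private final Map<String, Map<String, Integer>> rows = new HashMap<>();

    void set(String tip, String sample, int count) {
        if (count == 0) {
            // Setting a cell to zero removes it, so the table never stores zeros.
            Map<String, Integer> row = rows.get(tip);
            if (row != null) {
                row.remove(sample);
                if (row.isEmpty()) rows.remove(tip);
            }
        } else {
            rows.computeIfAbsent(tip, k -> new HashMap<>()).put(sample, count);
        }
    }

    int get(String tip, String sample) {
        Map<String, Integer> row = rows.get(tip);
        return row == null ? 0 : row.getOrDefault(sample, 0);
    }

    /** Number of cells actually held in memory (all nonzero). */
    int storedCells() {
        int n = 0;
        for (Map<String, Integer> row : rows.values()) n += row.size();
        return n;
    }
}
```

Because `computeIfAbsent` records no mapping when the search returns null, names that are absent from the tree are searched again on every call; a separate negative cache could be added if misses turned out to be frequent.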