|Home | About | Journals | Submit | Contact Us | Français|
The advent of next-generation DNA sequencing platforms has revolutionized molecular microbial ecology by making the detailed analysis of complex communities over time and space a tractable research pursuit for small research groups. However, the ability to generate 105–108 reads with relative ease brings with it many downstream complications. Beyond the computational resources and skills needed to process and analyze data, it is difficult to compare datasets in an intuitive and interactive manner that leads to hypothesis generation and testing.
We developed the free web service VAMPS (Visualization and Analysis of Microbial Population Structures, http://vamps.mbl.edu) to address these challenges and to facilitate research by individuals or collaborating groups working on projects with large-scale sequencing data. Users can upload marker gene sequences and associated metadata; reads are quality filtered and assigned to both taxonomic structures and to taxonomy-independent clusters. A simple point-and-click interface allows users to select for analysis any combination of their own or their collaborators’ private data and data from public projects, filter these by their choice of taxonomic and/or abundance criteria, and then explore these data using a wide range of analytic methods and visualizations. Each result is extensively hyperlinked to other analysis and visualization options, promoting data exploration and leading to a greater understanding of data relationships.
VAMPS allows researchers using marker gene sequence data to analyze the diversity of microbial communities and the relationships between communities, to explore these analyses in an intuitive visual context, and to download data, results, and images for publication. VAMPS obviates the need for individual research groups to make the considerable investment in computational infrastructure and bioinformatic support otherwise necessary to process, analyze, and interpret massive amounts of next-generation sequence data. Any web-capable device can be used to upload, process, explore, and extract data and results from VAMPS. VAMPS encourages researchers to share sequence and metadata, and fosters collaboration between researchers of disparate biomes who recognize common patterns in shared data.
The investigation of microbial communities has exploded in the past 10 years with the advent of next-generation DNA sequencing, uncovering an incredible diversity of microbes across different environments, from oceans to soils, from plant roots to the human body. The need to analyze marker gene datasets comprising 105–108 sequence reads has spawned a new generation of bioinformatics tools specifically designed for large-scale, sequence-based microbial ecology studies. Most of these tools target either quality filtering and clustering of sequences (AmpliconNoise , USEARCH ) or the assignment of taxonomy or gene function (RDP , SILVA , MG-RAST ). Two commonly used software packages (mothur  and QIIME ) provide a suite of programs for filtering, clustering and assigning taxonomy, with additional tools for downstream analysis. Both packages, however, require installation of software and rely on a command-line interface. Although command-line interfaces are more efficient and can be incorporated into batch processing scripts, they are not as intuitive to many users as a graphical user interface (GUI).
Ecologists and clinicians who design and conduct experiments utilizing next generation sequencing are relying more and more heavily on bioinformaticists and biostatisticians to analyze and interpret avalanches of data. So much so that the analysis of 'Big Data’ is becoming a specialized field distinct from biological interpretation. All too often, this leads to a disconnect between the researchers and their own data, relegating data visualization to the end-product of analysis, rather than an integral part of the analytical process itself . We developed a free web service, Visualization and Analysis of Microbial Population Structures (VAMPS, http://vamps.mbl.edu), to serve as a bridge over this chasm. VAMPS offers a simple point-and-click user interface to a wide-range of visualization and analysis tools for both interactive and iterative exploration of microbial communities through comparison of marker gene data.
There are no operating system, CPU, storage capacity, or memory capacity requirements: users need only a web browser and reasonable Internet connectivity. The VAMPS code is non-proprietary; however, the scale of the site and its use of multiple servers, cluster nodes, and multiple independent software packages make it infeasible for individual users to download and install locally. We welcome all users to take advantage of the computing and database storage capacity available at our website.
The interactive GUI encourages data exploration by enabling extensive control over sample and taxonomic selections and an intuitive path through iterative analyses and visualizations (Figure 1). This non-linear interface allows researchers to leverage their intuition and expertise in observing data patterns, leading to new insights, improved hypotheses, and a more thorough understanding of microbial communities.
VAMPS users generally start their analyses by uploading next-generation marker gene sequences, typically bacterial ribosomal RNA (rRNA) gene sequences, but also archaeal or protist rRNA or fungal ITS gene sequences. After quality filtering the uploaded sequences, VAMPS can assign taxonomic names and cluster the sequence data into OTUs using any of several commonly used algorithms (e.g., oligotyping , reference-based clustering , SLP with average linkage , or UCLUST ), with or without linking to taxonomic identifiers. Alternatively, users can perform their own quality filtering, taxonomy assignments, or OTU clustering, and upload these data as input to the VAMPS analytical tools.
Although the website can be used with a public account (username “guest”), researchers who choose to upload their own data need to establish a free personal account. This account means that access to private datasets is password-protected and not session dependent. Researchers can log in and out of the website freely over the course of their research project.
VAMPS includes most common alpha and beta diversity metrics and a variety of tunable visualization approaches to explore analysis results (Figure 1). These include:
• Heatmaps – color-coded matrices of community similarity that can be reordered to reveal patterns among datasets and can display different beta diversity metrics above and below the diagonal;
• Dendrograms – tree-like diagrams clustering datasets by community similarity using one of several user-selected algorithms;
• Principal Coordinate Analyses – 2- and 3-D graphical representations of the relative similarity of datasets and metadata, when available;
• Bar and Pie Charts – graphs depicting the relative abundance of taxa or OTUs in each dataset;
• Taxonomy and OTU Tables – tables of absolute counts or relative abundances of sequences associated with each taxon or OTU in the selected datasets, with taxonomic names linking to NCBI, Wikipedia, and the Encyclopedia of Life and the graphing of any particular taxon or OTU abundances across the datasets;
• Underlying sequences – links to the sequence distributions underlying the populations, how they were taxonomically classified, and tools to search for the presence of a query sequence in the other datasets.
Users can download the analyses and images they generate on VAMPS for inclusion in publications, or they can import results from VAMPS into other software for downstream analyses. They can designate their VAMPS datasets as public or private (password-protected), and selectively share their private data with specific collaborators. Once published, datasets on VAMPS are generally made public, facilitating the data sharing required by most granting agencies and scientific journals.
Unique to VAMPS is its level of flexibility in taxonomy selection. Users can analyze microbial communities at any taxonomic level (domain, phylum, class, order, etc.), and they can also combine multiple taxonomic levels and select taxa based on abundance thresholds. For example, a user can select only taxa from a particular class, or select all taxa except a particular genus, or they can mix and match, concurrently selecting different taxonomic levels from different parts of the phylogenetic tree. As an example, a user could analyze data at the phylum level for most phyla, but at the class level for Proteobacteria and at the genus level for Bacteroidetes and Firmicutes. Users can also select taxa and OTUs based on relative abundance thresholds. This facilitates the analysis of subtle patterns of diversity obscured by dominant taxa, or conversely, the analysis of dominant or moderately abundant taxa without the potential noise of low abundance or rare taxa (Figure 2).
VAMPS also diverges from other tools by empowering users with access to the underlying sequence distributions for a selected taxon or OTU. Sequence data can be used to design further experiments, cross-check taxonomy, or query external databases. Users can interrogate the internal database for the occurrence of specific sequences in other VAMPS datasets, revealing distribution patterns of individual sequences across projects and environments. When query sequences match private datasets, users are invited to contact the anonymous owners of the private data without other aspects of the dataset being revealed, fostering new collaborations.
To facilitate comparative studies, we have loaded into VAMPS over 2,500 ready-to-use public datasets. These data have already been quality-controlled and assigned taxonomy and automatically appear in the selection window alongside the users’ own private datasets. They include data from the Human Microbiome Project (HMP) [17,18] the International Census of Marine Microbes (ICoMM) , the Microbial Inventory of Aquatic Long Term Ecological Research Stations (MIRADA) , the Census of Deep Life (CoDL) , and the Microbiology of the Built Environment (MoBE) . Each of these projects has an entry portal with information about the projects and links to additional resources. Similar portals for other projects can be integrated into the VAMPS framework. In addition, published data from smaller projects are available for many environments including municipal water supplies, marine waters, ocean sediments, deep-sea hydrothermal vents, salt marshes, sand, and multiple biotic hosts such as humans, mice, chickens, tree leaves, and coral reefs.
As an illustrative vignette, Figure 3 demonstrates a simplified example analysis using water samples taken over the course of a year in Falmouth, Massachusetts, USA. We used VAMPS to explore the distribution pattern of one of the dominant taxa and its effect on the clustering of water quality samples. The example includes the use of bar charts, heatmaps, abundance graphing, dendrograms, and alpha diversity calculations.
VAMPS fills a critical niche by providing ecologists and clinicians with the ability to conduct analyses that they would otherwise rely on bioinformaticians to provide. Researchers can upload and process their own data, which is maintained on the website, and, once processed, is available to use each time they log in. Its taxonomic and abundance level selection capabilities offer advantages over other programs. The underlying database includes thousands of public datasets encompassing a range of environments including the International Census of Marine Microbes and the Human Microbiome Project. These public data are accessible immediately, without download or processing, and can be analyzed separately or together with users’ private data, facilitating comparative analyses and increasing the ability to recognize important diversity patterns.
VAMPS has been instrumental in various research publications. As an example, VAMPS was used to study the microbiota of the ileal pouch of patients undergoing treatment for ulcerative colitis . Analyses showed that the pouch microbiome of healthier patients evolved toward a state similar to patients with a healthy colon while the microbiome of patients prone to recurrent pouchitis tended to evolve in other directions. Additional projects have made use of the breadth of ICoMM datasets to evaluate global distributions of marine microbes [24,25], as well as the diversity of rare taxa in sand and salt marsh environments [26,27].
VAMPS is a simple-to-use website, providing universal access to microbial community marker gene data and to many visualization and analysis tools. VAMPS provides a much-needed interface for ecologists and clinicians to directly and intuitively analyze their microbial community data. The interactive nature of the website lends itself to the iterative exploratory processes so important in gaining insights into natural systems. Even for bioinformaticians well-versed with other common toolsets, the range of analyses and data visualization options and its non-linear approach makes VAMPS a valuable contribution to microbial community research.
Project name: VAMPS
Project home page:http://vamps.mbl.edu
Operating system(s): Platform independent
Other requirements: none
Any restrictions to use by non-academics: none
The authors declare they have no competing interests.
MLS conceived of the project, SMH, DMW, HGM, and MLS designed the project, evaluated software capabilities, and provided feedback on website implementation. AV, AS, and AME implemented the project. SMH, DMW, MLS and HGM wrote the manuscript. All authors read and approved the final manuscript.
Special thanks to both Richard Fox for providing the system administration support at the Bay Paul Center necessary to keep the VAMPS website up and running and to Philip Neal who participated in the early developmental stages of the website.
Funding provided by the National Science Foundation [grant NSF/BDI 0960626 to SMH] and the Sloan Foundation through a collaborative project with the Microbiology of the Built Environment program.