We have developed a computational method to generate in silico amplifications from degenerate primer sets searched against user-defined nucleotide databases. To illustrate the utility of De-MetaST-BLAST, we demonstrate its performance using a novel degenerate primer set designed for use on environmental samples. This primer set targets the bacterial boxB gene, which encodes the oxygenase component of a multi-enzyme epoxidase (EC 1.14.13) that is specific to a benzoate catabolic pathway. Three metagenome libraries representing different environments, library sizes and DNA sequencing methods were searched, and all were found to contain putative boxB amplicons of the expected size (300 bp). Typical De-MetaST-BLAST output for one of those database searches includes, for each in silico amplicon, the top 10 BLASTx hits with their corresponding E-values and GenBank accession numbers.
boxB and 16S rRNA gene in silico amplicons identified in representative metagenomes using De-MetaST-BLAST.
Example of De-MetaST-BLAST output.
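The per-amplicon summary of top BLASTx hits can be sketched in Python as follows. This is a minimal illustration, not De-MetaST-BLAST's own code: it assumes the BLASTx results were written in BLAST+ tabular format with the columns qseqid, sacc and evalue, and the function name top_hits is hypothetical.

```python
from collections import defaultdict


def top_hits(tabular_lines, max_hits=10):
    """Group BLAST tabular rows (qseqid, sacc, evalue) by query amplicon.

    Returns {query_id: [(accession, evalue), ...]} holding at most max_hits
    hits per query, ordered from lowest (most significant) E-value upward.
    """
    hits = defaultdict(list)
    for line in tabular_lines:
        line = line.strip()
        # Skip blank lines and comment lines (e.g. from -outfmt 7)
        if not line or line.startswith("#"):
            continue
        qseqid, sacc, evalue = line.split("\t")[:3]
        hits[qseqid].append((sacc, float(evalue)))
    # Sort each query's hits by E-value and keep only the best max_hits
    return {q: sorted(h, key=lambda x: x[1])[:max_hits] for q, h in hits.items()}


rows = ["amp1\tABC123.1\t2e-30", "amp1\tXYZ9.1\t1e-50", "amp2\tQ1.1\t0.5"]
summary = top_hits(rows)
```

Here summary["amp1"] lists XYZ9.1 before ABC123.1 because its E-value is smaller.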
To retrieve an in silico amplicon, the program requires both primers to match their respective targets within a single sequence read or sequence assembly (contig). Thus, an important consideration when selecting a database to search is the average length of the reads or contigs it contains, relative to the desired amplicon size. This concern may be alleviated as longer-read sequencing technologies are developed and/or as sequence coverage and assembly algorithms improve. Interestingly, our analysis demonstrates that in silico amplicons of ~300 bp and ~190 bp, representing boxB and 16S rRNA gene amplicons, respectively, can be readily recovered from databases dominated by short sequence reads (e.g., AntarcticaAquatic). In fact, the 44 boxB amplicons derived from the AntarcticaAquatic dataset were found in reads ranging from 348 to 541 bp in length. This result suggests that sequence coverage, or depth, also contributes to in silico amplicon recovery: a sufficiently deep dataset is more likely to include reads long enough to span both primer sites. Notably, all of the in silico amplicons recovered in this demonstration run were homologous to the desired target (E-value ≤1e−4).
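The single-read requirement can be illustrated with a short Python sketch. It assumes primers are written 5'→3' with standard IUPAC degeneracy codes, that the reverse primer's binding site appears in the read as its reverse complement, and that only the read's given strand is searched; the helper names are hypothetical, and De-MetaST's actual C++ matching logic may differ.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. R=A/G pairs with Y=C/T)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement of an uppercase IUPAC sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(primer):
    """Turn a degenerate primer into a regex matching any of its variants."""
    return "".join(IUPAC[b] for b in primer.upper())


def in_silico_amplicons(read, fwd, rev):
    """Return amplicons spanning a forward site and a downstream reverse site.

    Both primer sites must occur in this single read; otherwise nothing
    is returned. Only the read's given strand is searched.
    """
    read = read.upper()
    fwd_re = re.compile(to_regex(fwd))
    rev_re = re.compile(to_regex(revcomp(rev.upper())))
    amplicons = []
    for f in fwd_re.finditer(read):
        # Nearest reverse-primer site downstream of this forward site
        r = rev_re.search(read, f.end())
        if r:
            amplicons.append(read[f.start():r.end()])
    return amplicons
```

Because each amplicon is sliced out of a single read, a read shorter than the primer-to-primer span can never yield a match, which is why read or contig length relative to amplicon size matters.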
In terms of data mining, De-MetaST can provide complementary sequence data for gene diversity studies. Because the De-MetaST output covers the same genetic positions as sequences from a companion clone library amplified with the same primers, downstream analyses such as sequence alignment and subsequent phylogenetic analysis are streamlined. Thus, in silico amplicons retrieved from existing sequence datasets can be readily compared with experimentally derived clone library sequences. Furthermore, because the nucleotide sequences targeted by the primers are returned in the De-MetaST output, users can use that information to further refine their primers to a desired level of functional and/or phylogenetic specificity. The program also has utility beyond searches of environmental sequence databases: it can query any nucleotide dataset, including those derived from single organisms. Thus, it can be used to assess the specificity of primers targeting multi-copy or homologous genes within a single organism or group of organisms.
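As one hypothetical way to use the returned primer-binding sequences for refinement, the sketch below counts the distinct site variants observed and derives the least degenerate IUPAC primer that still matches all of them. It assumes the sites are equal-length strings containing only A, C, G and T; the function names are illustrative and not part of De-MetaST.

```python
from collections import Counter

# Bases represented by each IUPAC nucleotide code
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: exact set of bases -> the IUPAC code covering that set
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}


def site_variant_counts(sites):
    """Count how often each distinct primer-site sequence was observed."""
    return Counter(s.upper() for s in sites)


def minimal_degenerate_primer(sites):
    """Least degenerate IUPAC sequence that matches every observed site.

    Assumes all sites have the same length and contain only A, C, G, T.
    """
    sites = [s.upper() for s in sites]
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("need one or more primer sites of equal length")
    # At each position, pick the code for exactly the bases observed there
    return "".join(
        CODE_FOR_BASES[frozenset(s[i] for s in sites)]
        for i in range(len(sites[0]))
    )
```

Such a primer covers only the variants present in the searched dataset, so reducing degeneracy this way trades sensitivity to unseen targets for specificity.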
Benchmarks and System Requirements
De-MetaST-BLAST was developed for the Ubuntu long-term support (LTS) releases 10.04 and 12.04. Although De-MetaST itself does not use multiple processor cores, BLAST can run multithreaded. Benchmarks were performed on a desktop computer with an Intel i7-2600 processor (3.4 GHz, quad-core, 8 threads) using the degenerate boxB primer set developed here against the Waseca Farm Soil metagenome (AAFX01000000). This search took approximately 11.7 s. When the database size was artificially and incrementally increased up to five-fold (772 Mb) by replicating the original dataset, the processing time remained under 1 min. Furthermore, to determine the effect of increased numbers of positive hits on run time, the libraries were seeded with additional sequences containing the target. Doubling the number of targets within the databases had no effect on run time. In contrast to the relatively rapid processing speed of De-MetaST, the BLAST step can add substantial processing time, particularly if a local custom database is used. For example, in the initial benchmark search against the locally installed Farm Soil metagenome, which recovered two hits, the BLASTx step added 39.3 s using two threads. Thus, computational requirements and processing speed are dictated primarily by BLAST. When BLAST is run remotely (the default setting; see below), the return time depends on the availability and processing speed of the NCBI servers.
Runtime duration of De-MetaST.
Both De-MetaST and De-MetaST-BLAST can be run on any operating system with a C++ compiler (e.g., Windows and Mac OS). However, users need to ensure that their BLAST installation is compatible with their processor.
Availability of De-MetaST-BLAST
The De-MetaST package and the De-MetaST-BLAST wrapper are freely available at http://sourceforge.net/p/de-metast-blast/ and are also provided as supplemental information to this publication (File S1 and File S2). Screencast tutorial videos accompanying the program describe how to install the required software and how to run the package on the example dataset provided. The De-MetaST package is self-contained and has no external dependencies other than a C++ compiler such as g++. De-MetaST-BLAST requires a local BLAST+ suite installation that supports direct queries of the NCBI nr protein database on NCBI servers via the -remote option. However, the program can also be configured to query a custom local database. Both approaches are described in the tutorial videos. Installation of the De-MetaST program is estimated to take 5 min and installation of the BLAST+ suite 3 min, excluding download and extraction times, which depend on the user's internet connection speed and processing power.
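A sketch of how the BLASTx step might be invoked from Python in either configuration follows. The flags shown (-query, -db, -remote, -num_threads, -evalue, -max_target_seqs, -outfmt) are standard BLAST+ options, but the exact parameters De-MetaST-BLAST passes are not documented here; the E-value cutoff of 1e-4, the function name blastx_command and the local database name are assumptions.

```python
def blastx_command(query_fasta, db="nr", remote=True, threads=2,
                   evalue=1e-4, max_hits=10):
    """Build a BLAST+ blastx argument list for a FASTA file of amplicons.

    Hypothetical helper: parameter defaults are illustrative assumptions.
    """
    cmd = [
        "blastx",
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_hits),
        # Tabular output: query ID, subject accession, E-value
        "-outfmt", "6 qseqid sacc evalue",
    ]
    if remote:
        # Search on NCBI servers; BLAST+ rejects -num_threads with -remote
        cmd.append("-remote")
    else:
        # Local custom database: multithreading is allowed
        cmd += ["-num_threads", str(threads)]
    return cmd
```

On a machine with BLAST+ on its PATH, the list can be executed with subprocess.run(blastx_command("amplicons.fasta"), check=True, capture_output=True, text=True), where amplicons.fasta is a hypothetical file of in silico amplicons. Thread count is only added for local searches because remote searches run on NCBI hardware.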
It was recently predicted that the growing volume of metagenome sequence data will serve as a valuable resource for evaluating the coverage and specificity of previously developed primer sets. De-MetaST-BLAST provides users with a useful tool for such evaluations. De-MetaST is designed to return in silico amplicons generated by user-defined degenerate primers from a user-defined nucleotide database. When paired with BLAST, the program returns the top-scoring GenBank hits, which are useful in assessing the specificity of degenerate primers. However, the program does not evaluate PCR kinetics or the amplification efficiency of degenerate primers. Thus, users are encouraged to consult appropriate references on the use and design of degenerate primers, including those that discuss the merits of using base analogs (e.g., inosine) that can reduce the overall degeneracy of primers.
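Primer degeneracy is the number of distinct oligonucleotides in the primer mix: the product, over all positions, of the number of bases each IUPAC code represents. The sketch below computes it and shows why substituting inosine (written here as I), which can pair with more than one base yet is a single synthesized species, lowers the count. The function name and example primers are hypothetical.

```python
from math import prod

# Number of nucleotides represented by each IUPAC code.
# Inosine ("I") is one synthesized species, so it contributes a factor of 1.
BASES_PER_CODE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4, "I": 1,
}


def degeneracy(primer):
    """Number of distinct oligonucleotides in a degenerate primer mix."""
    return prod(BASES_PER_CODE[b] for b in primer.upper())
```

For example, degeneracy("ACNGTYR") is 4 × 2 × 2 = 16, whereas replacing the N with inosine gives degeneracy("ACIGTYR") = 2 × 2 = 4.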