|Home | About | Journals | Submit | Contact Us | Français|
Insects are commonly infected with multiple viruses including those that cause sublethal, asymptomatic, and latent infections. Traditional methods for virus isolation typically lack the sensitivity required for detection of such viruses that are present at low abundance. In this respect, next generation sequencing technologies have revolutionized methods for the discovery and identification of new viruses from insects. Here we review both traditional and modern methods for virus discovery, and outline analysis of transcriptome and small RNA data for identification of viral sequences. We will introduce methods for de novo assembly of viral sequences, identification of potential viral sequences from BLAST data, and bioinformatics for generating full-length or near full-length viral genome sequences. We will also discuss implications of the ubiquity of viruses in insects and in insect cell lines. All of the methods described in this article can also apply to the discovery of viruses in other organisms.
Viruses can be found wherever life is present and are likely to be the most abundant and diverse biological entities on earth [1–3]. In addition to increased understanding of their diversity and evolution, the viruses associated with insects are of particular interest from the standpoints of (a) protection of beneficial insects from virus infection (e.g., the honey bee, Apis mellifera L. ; silk moths, Bombyx mori L.), (b) practical use of insect viruses for management of pestiferous insects including invasive species, (e.g., various lepidopteran pests including the codling moth, Cydia pomonella Linnaeus and the velvet bean caterpillar, Anticarsia gemmatalis (Hübner) , (c) identification of insects that vector viruses important to human, animal and plant health , and (d) use of insect viruses as vectors for protein expression or gene silencing, and adaptation of virus-like particles for a variety of purposes [7,8]. Contrary to the typical view of viruses as pathogens, viruses may also have mutualistic or symbiotic relationships with their hosts, which are of fundamental interest . For example, polydnaviruses are required for the survival of parasitoid wasps as they develop in the host insect . A densovirus has been reported to function in wing morph determination of the host aphid . A bacteriophage that infects the aphid facultative endosymbiont, Hamiltonella defensa, protects the pea aphid from attack by the parasitoid Aphidius ervi by killing the developing wasp larva [12,13].
Traditionally, viruses were isolated from insects that displayed an abnormal phenotype as a result of virus infection. While infection with some insect viruses, such as baculoviruses, results in clear symptoms and ultimately death of the host, many virus infections are asymptomatic. In recent years, with the development of the Next Generation Sequencing (NGS) technologies, it has become evident that asymptomatic or covert virus infections are ubiquitous. These viruses may accumulate to relative low titers in the host organism (i.e., in a chronic infection), or become latent, such that virus production ceases altogether. These viruses would not readily be detected by use of traditional protocols for virus discovery.
The use of NGS over the past five years has revolutionized the discovery of microorganisms and viruses. The technology allows for rapid, inexpensive, high throughput and accurate sequencing for identification of viral sequences derived from whole insects or specific tissues, and for viruses present at low titers that do not cause symptoms in the host. NGS has also been used for virus surveillance, for arthropod-borne viruses for example .
The first virus discovered was Tobacco mosaic virus (TMV) discovered by Dmitri Iwanowsk, a Russian botanist in 1882. He showed that extracts from diseased tobacco plants could transmit disease to other plants after passage through ceramic filters that were sufficiently fine to retain bacteria. The first virus particles (TMV) were observed following the invention of the electron microscope in 1931. Although it was known from the 1930s that viruses consisted of a protein shell and nucleic acids, methods for detection of viral protein and viral genomic RNA or DNA were not developed until the late 1970s and early 1980s. Conventional approaches for virus discovery are to collect insects from multiple populations from multiple locations, and use several micrograms of material at minimum for virus purification. Virus purification protocols vary, but material is typically homogenized, centrifuged and filtered under sterile conditions. Further purification may include ultracentrifugation and sucrose or cesium chloride gradient steps. The sample may then be used for visualization of virus particles by electron microscopy, infection of cultured insect cells and observation for cytopathic effects [15–17], and infection of insects by spraying, injection or oral inoculation to fulfill Koch’s postulates. Viruses would then be identified and further characterized by use of serological methods and nucleic acid hybridization (where specific antisera or probes are available), molecular cloning and genomic sequencing . Use of a cell line is beneficial in that it allows for culture and amplification of viruses that cause either acute or covert infections in the host. However, the lack of appropriate insect cell lines for such virus screens is a common limiting factor and it cannot be assumed that all viruses present in an insect would replicate in a given cell line.
Electron microscopy (EM) has been and continues to be one of the most important techniques for virus discovery . Indeed early classification of viruses depended heavily on the morphology of the viral capsid as revealed by EM. One of the primary advantages for the use of EM is that organism or virus specific reagents are not required for virus identification. Although identification of a virus beyond the family level may not be possible, EM provides leads for more detailed characterization of the virus. In addition EM provides important confirmation of the presence of a virus following detection of viral sequences by molecular means. An additional advantage for the use of EM is that samples stored under conditions that would not allow for molecular testing or virus culture can be used for rapid EM visualization of viruses. There are numerous examples of insect-derived viruses depicted by EM in the published literature for which no further characterization has been undertaken . Subsequent studies on characterization of viruses from the same host insect commonly fail to refer back to the electron micrographs in older papers [21,22]. In some instances, initial classification based on morphology was subsequently revised based on molecular information. This was the case for White spot syndrome virus of shrimp, which was initially believed to be a baculovirus  (Figure 1).
Samples used for EM analysis for virus observation range from crudely extracted samples to viruses purified via ultracentrifugation and sucrose or cesium chloride gradients. Thin sectioning of tissues is also common for observing the tissue tropism of insect viruses. For examination of virus particles, specimens are placed on to grids and typically subjected to negative staining . Commonly used negative stains are 0.05 to 2% uranyl acetate, 1 to 2% phosphotungstic acid (PTA), and 0.05 to 5% ammonium molybdate. Immuno-electron microscopy (IEM) , which uses antibody-virus reactions for virus detection by EM was first developed in 1941 . For IEM, viruses in solution are mixed with the viral antiserum to form a virus-antibody complex or immunoaggregate. This antibody coated virus particle can be negatively stained and distinguished by EM.
Immunolocalization is used for observing virus in thin sectioned tissue specimens and for specifically identifying known viruses. The viruses are coated with viral antibodies followed by secondary antibody conjugated with colloidal gold (gold labeling) in the grid. The grid is then negatively stained.
Two serological methods that are commonly used for virus detection are enzyme-linked immunosorbant assay (ELISA) and western blot. Both methods employ antibodies that recognize viral coat or other viral proteins to detect the presence of a given virus either from samples of purified virus or from total proteins extracted from infected insects or tissues. ELISA and western blot are widely used for detection of known viruses and may also be used to assess the serological relationship of a new virus to known viruses within the same family.
Nucleic acid hybridization (Southern blot, northern blot and dot blot) are also useful for identification of viruses, but these methods also rely on prior knowledge of the target virus; Viral specific nucleic acid probes are labeled for hybridization to the target viral DNA or RNA to demonstrate the presence of the virus.
Polymerase chain reaction (PCR) and reverse transcription polymerase chain reaction (RT-PCR) are used to amplify viral DNA or RNA respectively, either full length genomic sequences, or parts thereof. The resulting DNA or cDNA fragments are cloned into vectors and used for sequencing, and in the case of full length sequences may also be used for screening for infectious virus clones . Comparison of sequence data for a given virus to those of known viruses will indicate whether the virus is novel or similar to known viruses, and facilitates virus classification and phylogenetic analyses.
Several insect viruses were discovered through analysis of expressed sequence tag (EST) libraries. EST libraries are produced by isolating total RNA from insects, purifying mRNA, and generating and sequencing a cDNA library. Hence ESTs are sequences (typically 500–800 nt) that represent the sequences of transcribed genes. Purification of mRNA to generate the cDNA library requires selection of RNAs that include polyA tails on a polyT column, thereby limiting sequence representation to viral RNAs with polyA tails. Although relatively little viral sequence with low coverage is provided by ESTs, there may be sufficient sequence for 5′ RACE to acquire the full genomic sequence. Given the low sequence coverage provided by EST libraries, virus detection via EST sequences is likely to be limited to viruses that are present at high titers in the host insect.
Valles et al.  detected six ESTs of putative viral origin in a cDNA library derived from the red imported fire ant, Solenopsis invicta, which causes significant economic damage in the U.S. Three of these ESTs exhibited significant homology to Acute bee paralysis virus (Dicistroviridae) and 5′ RACE was used to delineate the entire genome sequence of the virus, Solenopsis invicta virus 1 (SINV-1).
Hunnicutt et al.  isolated Homalodisca coagulata virus-1 (HoCV-1) following detection of viral sequences in EST libraries from the glassy-winged sharpshooter, H. vitripennis (also known as H. coagulata) . This insect was introduced from the southeastern U.S. to California in the late 1980s and wreaked havoc as a result of its polyphagy and in the absence of natural enemies such as parasitic wasps and entomopathogenic fungi. In addition, H. vitripennis vectors the bacterium Xylella fastidiosa, which negatively impacts numerous plant species. Viruses isolated from this insect may have potential for use in its management. Sequences derived from a phytoreovirus (plant virus) were also detected from an H. vitripennis salivary gland cDNA library .
Oliveria et al.  discovered three novel small RNA viruses (NvitV-1, -2 and -3) with the longest contig (i.e., series of overlapping DNA sequences used to reconstruct the original sequence) of 2789 nt (NvitV-1; not including the polyA) from the parasitoid wasp, Nasonia vitripennis, by data mining of EST libraries.
The use of microarrays was proposed and tested for detection and genotyping of pathogens . The DNA microarray-based platform was designed to include all viruses that had full-length sequences in GenBank and included the most highly conserved 70mer sequences from every fully sequenced reference viral genome. The microarray was used for both genotyping of viruses and for virus discovery. In addition to identifying viruses present in a sample, hybridized viral sequences were isolated from the spot in the microarray, amplified by PCR, cloned and sequenced for identification of novel viruses . The combination of array hybridization followed by direct viral sequence recovery allows for rapid characterization of novel viruses. Microarrays have not been adapted for invertebrate virus discovery, but offer an alternative approach; for example, an Arthropod Pathogen microarray was used for detection (but not discovery) of known viruses in honey bees .
Next Generation Sequencing is a non-Sanger-based and high-throughput methodology  which allows for generation of millions of sequences at once . Multiple high-throughput sequencing technologies have been developed [38–40]. The most common NGS platforms are Roche 454 pyrosequencing (454 Life Science), Illumina (Solexa) sequencing, and SOLiD sequencing (ABI Biosystems) (Table 1).
Viral sequences can be extracted from either total DNA (for DNA viruses only) or RNA isolated from insects . Alternatively, prior to viral DNA or RNA extraction, virus purification can be conducted to eliminate host nucleic acid contamination, followed by extraction of viral DNA or RNA . Insects collected from the field should ideally be processed rapidly with RNA stored in RNAlater (Qiagen) or TRIzol (Invitrogen), and DNA stored in DNAzol (Invitrogen) at −80 °C for later processing. However, viral RNA and DNA can also be stored safely in crushed insects in such stabilizing solutions: While the viral RNA and DNA under these conditions are stable at room temperature, it is recommended that samples be kept cold. Alternatively, insects can be stored directly at −80 °C, although some RNA viruses (e.g., some dicistroviruses) are unstable on freeze-thawing.
Methods used for library preparation vary according to the platform used for sequencing. Reagents, kits and methods for preparing libraries can be obtained from the corresponding companies. In general, there are three types of libraries that are most useful in the context of virus discovery: DNA, RNA (including transcriptome) and small RNA libraries. For transcriptome sequencing, mRNA is extracted from total RNA by polyT treatment or by methods for ribosomal RNA depletion  before being used for library construction. Procedures for library preparation normally include DNA or RNA fragmentation (DNA and transcriptome sequencing), size selection of fragments, addition of adapters, PCR or RT-PCR (for transcriptome and small RNA libraries), and amplification of sequences. Following library construction, sequencing is carried out.
There are no standard methods for analysis of sequences generated by NGS , although numerous bioinformatics methods and pipelines have been developed as dictated by the specific challenges of the datasets generated ; for example, data analysis is greatly simplified by the presence of a reference genome against which to align and compare NGS sequences. In general, the initial raw sequencing data (reads) are treated with programs provided by the manufacturers for base calling, removal of adaptor sequences (adaptors are usually a the 5′-end) and removal of low quality reads. For small RNA sequencing, the 3′-end adaptors are trimmed by either customer-developed programs or programs such as Cutadapt  which are freely available. Different researchers have used different approaches for data mining to find viral sequences: DNA or transcriptome sequence data can be used to conduct BLAST (Basic Local Alignment Search Tool) searches (blastx, tblastx, or blastn  against NCBI non-redundant (nr) databases, or a viral database  before the reads are assembled. The reads that hit viral sequences with given E-values are extracted and used for de novo assembly. Many programs for short-read assembly are available  and can be used for either de novo assembly or mapping the reads to known viral genomes. For small RNA sequencing data, the reads may be assembled de novo, and the contigs then used for BLAST analysis to find homologous viral sequences. The contigs with viral sequence hits may be extracted and reassembled for further characterization. The bioinformatics methods used for virus discovery by NGS data mining are summarized in Table 2.
Following detection of viral sequences by NGS technologies, the presence of viral sequences in the sample must be confirmed by PCR (DNA viruses) or RT-PCR (RNA viruses). Real time PCR/RT-PCR can be used to quantify the amount of virus present, and provide validation for the number of observed reads in the NGS datasets. To confirm whether the identified virus replicates in the host insect, RT-PCR for detection of viral transcripts, or negative strand-specific RT-PCR for ssRNA viruses can be performed. The use of tagged primers for enhanced specificity is recommended . Detection of negative strand RNA is used as an indicator of replication for positive strand RNA viruses. Acquisition of additional supporting evidence to confirm the presence of the virus is strongly recommended (e.g., virus purification, EM analysis, detection of viral coat proteins, isolating genomic DNA or RNA, or showing virus increase over time by using quantitative RT-PCR/PCR); On occasion, the sequences of viruses that do not replicate in the target insect, but are present in the diet of that insect , or have become incorporated into the host genome are detected. Hence, detection of viral sequences in a particular insect is not sufficient evidence for replication of the virus in that host. Ideally, it would be possible to purify the virus and infect other individuals of the same species that lack the virus (i.e., fulfill Koch’s postulates).
Next Generation Sequencing has been widely applied [38,48–50] including for the discovery of novel microbes and viruses from animals and plants [51,52]. To date, there are about a dozen reports of viruses discovered from insects or insect cell cultures by means of NGS.
The first application of NGS technology that demonstrated the potential use of this approach for virus discovery was a metagenomic analysis of the honey bee, Apis mellifera L., conducted to elucidate the causes of colony collapse disorder (CCD) [41,55]. The pathogens of bees, including more than 18 viruses , have been well studied. No new viruses were detected during the course of this analysis. For the metagenomic analysis, total RNA was extracted from bees taken from CCD and non-CCD colonies collected from the US, and Australia and also from royal jelly from China. The RNA libraries were subjected to 454 pyrosequencing, and raw reads were trimmed and assembled into contigs. Contigs were used for BLAST analysis (blastn and blastx)  against the NCBI nr database. Seven viruses were identified in bees derived from CCD colonies (Table 3), compared to five from non-CCD colonies. A wide range of other pathogens were also detected . The presence of the viruses was confirmed by RT-PCR and Sanger sequencing, and the presence of Israeli acute paralysis virus (IAPV) was found to be a significant indicator of CCD. Shortly thereafter, IAPV-like viruses were detected in a fresh water lake, in a metagenomic analysis of the viral community in fresh water .
Analysis of the microbiome of the honey bee over time was used to identify four novel viruses  including two which were the most abundant components of the microbiome at ~1011 viruses per bee. High frequency sampling along with molecular detection methods including a custom arthropod pathogen microarray, qPCR, and deep sequencing were used for episodic viral detection throughout the year. Total nucleic acids from 20 monitor hives (3 μg nucleic acids per hive) were pooled and three sequencing libraries prepared (one DNA, two RNA libraries). The RNA libraries were constructed with various modifications (e.g., with and without purification of mRNA) to optimize the detection of viruses, bacteria, fungi/protists, mites, and nematodes. Sequencing was performed with paired-end 65 nt reads by using an Illumina GAII. To analyze the sequencing data, a database was created that included all of the complete arthropod virus genome sequences available at the time. The entire sequencing dataset was queried against the arthropod virus library by using blastn and tblastx, and hits with a minimum e-value of 1 × 10−3 used for further analysis. Hits were assembled using the Geneious sequence analysis package. Contigs (>250 nt) were queried again against the dataset by tblastx with an e-value ≤ 1 × 10−5. The positive hits were then queried against the nr database with the same parameters to eliminate spurious hits. For the contigs that appeared divergent or that hit non-honey bee associated viruses, extension of the contigs was performed using the entire read dataset with a paired-end contig extension program. From this analysis, it appeared that overall virus incidence was sporadic, although the use of only five bees per sample from each hive and a virus detection limit of 1.9 × 105 virus genome copies may explain the apparent disappearance of some viruses over time. Four new viruses were discovered from the honey bees, including two dicistroviruses (named Aphid lethal paralysis virus strain Brookings, and Big Sioux River virus), and two viruses for which the complete genome sequence was acquired (Lake Sinai virus 1 and Lake Sinai virus 2; Table 3). Replication of LSV1 and LSV2 in the honey bee was confirmed by RT-PCR.
A so-called vector-enabled metagenomics (VEM) approach was used to examine the diversity of DNA viruses present in multiple species of mosquitoes from California, U.S. . In this approach, viruses were purified from mosquito samples by homogenization of the samples, filtration and a cesium chloride step gradient. The presence of viral particles and the absence of bacterial and eukaryotic cells were confirmed prior to further processing. Total DNA was extracted and amplified prior to 454 sequencing on a GS20 or GS FLX pyrosequencing platform. Short reads were removed from the sequencing dataset prior to blastn and tblastx analysis against the GenBank nr database for identification of viral sequences and further assembly and annotation. The presence of some of the viral sequences in the mosquitoes was confirmed by PCR. Remarkably, the sequences of 107 DNA viruses derived from 16 viral families were identified in the three mosquito samples. Viruses detected included viruses of animals, plants, insects and bacteria with the majority being densoviruses. Although novel viruses were detected, few full length genome sequences were acquired. The pooling of multiple species of mosquito also prevented immediate identification of the host species of novel viruses.
The first insect nidovirus, Cavally virus (CAVV), was discovered following virus isolation from mosquito heads, and amplification in the C6/36 mosquito cell line. The virus was titrated on insect cells and cell culture supernatant used as a source of pure virus . Virions were visualized by TEM and RNA extracted from purified virus for high through-put and conventional sequencing. The ssRNA genome is 20 kb in size. CAVV was present in 9% of mosquitoes sampled around the primary forest habitat in Ivory Coast, and virus incidence increased with increasing human habitation.
RNA interference (RNAi) plays a vital role in defense against RNA viruses in a wide range of organisms including insects [63–72]. The enzyme Dicer recognizes double stranded (ds) RNA (produced during the replication of RNA viruses) and cleaves it into small interfering RNAs (siRNAs) of about 22 nt in length . Argonaute, a protein component of the RNA-induced silencing complex (RISC), binds the antisense strand of the siRNA and degrades viral RNA complementary to the siRNA . Hence, sequencing of small RNAs (sRNA: 17–30 nt) and assembly of the virus-derived siRNAs can be used to reveal the sequences of RNA viruses present in an insect (Figure 2).
The first report of the use of sRNA sequencing for virus identification was for analysis of the sweet potato . In this case, the authors inoculated the plants with known RNA viruses, Sweet potato feathery mottle potyvirus (SPFMV) and Sweet potato chlorotic stunt closterovirus (SPCSV). Small RNA was isolated from the inoculated plants and sequenced by using Illumina GAII. The sRNA reads were assembled with three different assembly programs for sequence assembly, and contigs were reassembled to generate longer contigs using the program Contigexpress (Vector NTI, Invitrogen). The contigs were queried by searching the GenBank nr database for viral sequences. SPFMV and SPCSV sequences were successfully recovered from the sRNA, but only one full-length viral RNA sequence was recovered. In addition, ssDNA and dsDNA reverse transcribing viruses were identified from the small RNA sequences.
A similar strategy was used to identify viruses present in a Drosophila cell line, and from published sRNA datasets for mosquitoes and nematodes . Four viruses (two positive strand ssRNA and two dsRNA viruses) were identified from the Drosophila S2-GMR cell line. In addition, two viruses, including one new virus, were identified from the mosquito. However, full length genomes of the viruses could not be assembled from the sRNA datasets. RT-PCR, RACE-PCR and sequencing were used to fill gaps in the viral sequences.
The traditional approach for the sequencing of viral genomes involves PCR or RT-PCR amplification of DNA or cDNA fragments, and cloning prior to Sanger sequencing for DNA or RNA viruses respectively. For the large DNA viruses such as baculoviruses which have genomes of 80–180 kb, fragments of genomic DNA are generated using restriction enzymes rather than PCR, prior to cloning and sequencing. NGS now provides an alternative approach for the sequencing of large DNA viruses and was used successfully for sequencing of the Glossina pallidipes salivary gland hypertrophy virus (GpSGHV). This virus infects tsetse flies, and has been detrimental to laboratory colonies established for use in a sterile male release program for management of tsetse flies, which transmit the agents of both human and animal trypanosomiasis . GpSGHV is a double-stranded circular DNA virus with a genome of 190 kb with 160 non-overlapping ORFs. The genome sequence was assembled by a combination of (a) shotgun 454-pyrosequencing, (b) Sanger sequencing of a partial genomic cloned library of the viral DNA fragments, and (c) sequence gap filling using PCR products, followed by sequence assembly. NGS data also provided information about the genomic variation of this virus.
NGS was also applied to sequencing of the genome of the polydnavirus, Cotesia vestalis bracovirus (CvBV) . Polydnaviruses (PDV) are associated with parasitoid wasps and serve to suppress the immune system of the parasitized host insect. The genome of PDV is 540 kb and is composed of 35 dsDNA segments. CvBV virions and viral DNA were isolated from the ovaries of 400 to 500 C. vestalis. Genome sequencing was performed using 454 GS FLX . Assembled contigs were first compared with the genome of Cotesia plutellae bracovirus (CpBV) using blastn, and then validated by PCR. The relationship of the remaining contigs was determined by multiplex PCR  and the remaining gaps were sequenced by Sanger sequencing. Sequences were finally assembled using the Phred, Phrap and Consed software programs, and low quality regions of the genome were resequenced. Each circular segment was confirmed by multiplex PCR and sequencing.
We sequenced the soybean aphid transcriptome using Illumina GAII. Total RNA was extracted from aphids using TRIzol reagent (Invitrogen) and transcriptome libraries prepared for single read analysis according to Illumina protocols. Three samples were prepared, one of which had a single polyT purification step, as compared to the others that were subjected to two of these steps. The resulting reads of 75 nt were assembled using Velvet  and ABySS . Contigs of >100 nt were screened for viral sequences using blastx and blastn against the NCBI nr database. A novel ssRNA positive-strand virus (named Aphis glycines virus, AGV) that showed homology to tetravirus RdRP (RNA-dependent RNA polymerase) was identified. Two known aphid viruses, Aphid lethal paralysis virus (ALPV)  and Rhopalosiphum padi virus (RhPV)  were also detected. The contigs that included AGV sequence were used to screen the contig sets by blastn to identify additional contigs with AGV sequences. By using this approach, 95% of the AGV sequence was revealed. In contrast, less than 2% of the genomes of ALPV and RhPV were detected from the transcriptome sequence data, suggesting that these viruses were present at relatively low copy number. The presence of all virus sequences detected in the transcriptome was confirmed in the aphid colony by RT-PCR.
We then used small RNA sequencing using Illumina GAII for detection of RNA viruses in apparently healthy laboratory colonies of the pea aphid (Acyrthosiphon pisum), the green peach aphid (Myzus persicae) and the soybean aphid (Aphis glycines). Total RNA was extracted from aphids using TRIzol and the Illumina Small RNA Sample Prep Kit used for production of sRNA libraries. For the small RNA sequencing reads, 3′-adaptors were removed and the reads were then de novo assembled using Velvet . Contigs (>100 nt) were searched for viral sequences by blastx and blastn against the NCBI nr database. Sequences with homology to ALPV were found in the pea aphid and the soybean aphid samples, and a DNA virus, Myzus persicae densovirus (MpDNV) [82,83] was detected in the green peach aphid sample. The pea aphid small RNA reads were also mapped to the full-length Acyrthosiphon pisum virus (APV, unclassified ssRNA positive-strand virus) using a Perl script, but no reads with significant homology to APV were found, suggesting that the aphids did not harbor APV. The soybean aphid small RNA reads were also mapped to AGV, discovered from analysis of the transcriptome sequencing data, and sRNA-derived contigs of AGV were identified.
More than 95% of the ALPV-like genome sequence from A. pisum was assembled from siRNA reads into three contigs. Although more than 70% of the AGV genome sequence was covered by the siRNA sequences, none of the contigs generated were more than 300 nt in length. In the case of ALPV from the soybean aphid, and MpDNV from the green peach aphid, less than 30% of the genomes were covered by the assembled siRNA sequences.
Although NGS has been transformative for virus discovery, there are limitations. One limitation of the use of NGS methods is that it is not possible to identify novel viruses that lack homology to known viruses. An exception to this is when the DNA or RNA sequenced is extracted from purified virus, and hence the viral origin of the sequence has already been established. A second limitation to the use of NGS is that full length genome sequences are unlikely to be acquired unless the virus is present in the host insect at high titers. Further sequencing of the genome will likely be required. In some cases, although most of the sequence is acquired, the 5′ and 3′ end sequences are not found . Hence it is important, where possible, to retain frozen tissues for virus isolation and / or maintain a colony of the insect for virus extraction. A third challenge for NGS methods is the use of non-standardized methods for data analysis. There are no clearly established guidelines for acceptable read quality, parameters for short read data assembly, and significance of BLAST hits, for example. With the increasing use of NGS, there is a real need to develop tools and software to handle bioinformatics analysis for any organism, rather than just model organisms.
Next generation sequencing technologies have fundamentally changed the methodology for discovery of viruses from insects, for diagnostics and epidemiology of viral diseases, and for the study of virus-insect interactions. The sequencing of transcriptomes and small RNAs not only generates viral sequences for assembly into viral genome sequences, but also provides insight into host response to virus infection, the virus-insect interactome. In addition, NGS provides for a quantitative glimpse of both the repertoire of viruses present within an insect, and simultaneous insight into how the host transcriptome responds to virus presence. For example, comparison of the transcriptomes and sRNAs of infected and non-infected populations will indicate how virus infection impacts host gene transcription, and whether the viral RNA is susceptible to degradation by the host RNAi response. The third generation sequencing technology(ies) will likely further improve our ability to discover new insect viruses [84–86].
It is evident from this review that there are multiple approaches for identifying novel insect viruses and for assembly of viral genome sequences (Figure 3). Different methods used for library preparation (total DNA, RNA, small RNA or nucleotides isolated from purified viruses) and the libraries used may result in detection of different viruses.
The examples of novel insect viruses discovered by use of NGS technologies illustrate the ubiquity of viruses in field populations, laboratory colonies of insects and in insect-derived cell lines. These viruses may cause covert or overt infection, or be vectored by the insect to their primary plant or animal hosts. The ubiquity of viruses in laboratory colonies of insects and in cell lines has implications for the use of these tools for analysis of insect-virus interaction. Insects have several anti-viral defense pathways , including RNAi  and apoptosis , and the presence of covert viruses is likely to impact these pathways. Viruses have developed strategies to overcome or suppress host insect anti-viral immunity, by encoding for example, suppressors of RNA silencing [68,89,90] and inhibitors of apoptosis . Hence, as some viruses asymptomatically infect insects, it is possible that RNAi or other insect anti-viral immune pathways are already activated and/or impaired by those viruses. Such a scenario may explain the relative lack of response of the pea aphid to challenge with ALPV  for example.
While there is increasing interest in gene silencing approaches for insect pest management, with successful demonstration of the approach for beetles , results in other insects including the Lepidoptera have been mixed . It remains to be seen whether the presence of RNA viruses in these insects and their effect on the RNAi processing machinery (inhibition of Argonaute 2 for example ), impairs the use of dsRNA for gene silencing for pest management. NGS provides a powerful platform for analysis of pathogens present in test organisms, and the potential interference of covert viruses in experimental outcomes and physiological studies.
The authors thank Lyric Bartholomay for critical reading of the manuscript. This material is based upon work supported by the Iowa State University Plant Sciences Institute, Iowa State University Center for Integrated Animal Genomics, the USDA National Research Initiative grant number USDA 2009-35302-05266, as well as Hatch Act and State of Iowa funds.
Conflict of Interest
The authors declare no conflict of interest.