|Home | About | Journals | Submit | Contact Us | Français|
Viruses in the genus Bracovirus (BV) (Polydnaviridae) are symbionts of parasitoid wasps that specifically replicate in the ovaries of females. Recent analysis of expressed sequence tags from two wasp species, Cotesia congregata and Chelonus inanitus, identified transcripts related to 24 different nudivirus genes. These results together with other data strongly indicate that BVs evolved from a nudivirus ancestor. However, it remains unclear whether BV-carrying wasps contain other nudivirus-like genes and what types of wasp genes may also be required for BV replication. Microplitis demolitor carries Microplitis demolitor bracovirus (MdBV). Here we characterized MdBV replication and performed massively parallel sequencing of M. demolitor ovary transcripts. Our results indicated that MdBV replication begins in stage 2 pupae and continues in adults. Analysis of prereplication- and active-replication-stage ovary RNAs yielded 22 Gb of sequence that assembled into 66,425 transcripts. This breadth of sampling indicated that a large percentage of genes in the M. demolitor genome were sequenced. A total of 41 nudivirus-like transcripts were identified, of which a majority were highly expressed during MdBV replication. Our results also identified a suite of wasp genes that were highly expressed during MdBV replication. Among these products were several transcripts with conserved roles in regulating locus-specific DNA amplification by eukaryotes. Overall, our data set together with prior results likely identify the majority of nudivirus-related genes that are transcriptionally functional during BV replication. Our results also suggest that amplification of proviral DNAs for packaging into BV virions may depend upon the replication machinery of wasps.
Viruses in the genus Bracovirus (BV) (Polydnaviridae) are distinguished by their large, segmented double-stranded DNA (dsDNA) genomes and symbiotic association with parasitoid wasps (Hymenoptera) in the family Braconidae (summarized in references 10, 44, and 55). Each species of wasp carries a genetically unique BV which is transmitted only vertically to offspring as an integrated provirus. Replication occurs only in pupal- and adult-stage females in a region of the ovary called the calyx. Virions containing multiple circular dsDNAs accumulate to high density in the lumen of the calyx to form “calyx fluid.” Adult wasps then inject calyx fluid along with one or more eggs into each host insect they parasitize. BVs do not replicate in host insects, but infection is vital for wasp survival because viral gene products disable immune defenses and/or cause other alterations required for successful development of offspring (44).
The encapsidated genomes of 5 BVs have been fully sequenced (13, 15, 19, 52), and in all cases many genes of likely wasp origin have been identified that encode products involved in altering host physiology (44). In contrast, no homologs of any viral genes with predicted roles in genome replication, transcription, or virion formation are present. These findings fully explain why BVs do not replicate in the hosts that wasps parasitize. The lack of virus replication genes, however, raised the question of whether BVs evolved from a viral ancestor or are structures wasps evolved that resemble viruses but are not (10, 19, 44). Evidence strongly favoring a viral origin for BVs was recently generated by producing pupal ovary cDNA libraries from the BV-carrying wasps Cotesia congregata and Chelonus inanitus and Sanger sequencing approximately 5,000 expressed sequence tags (ESTs) (9). Among these ESTs were transcripts with homology to 24 known genes from nudiviruses; a genus of large dsDNA viruses that is closely related to the Baculoviridae and which often establishes persistent infections in arthropods (26). That several of the nudivirus-like transcripts identified from C. inanitus encode protein products present in C. inanitus bracovirus (CiBV) virions (54) is also consistent with earlier microscopy studies which noted that BV virions morphologically resemble nudiviruses (43). Since all BV-carrying wasps shared a common ancestor 100 million years ago (33, 55), these data compellingly argue that BVs evolved from an ancient nudivirus, which was associated with the ancestor of the ca. 18,000 wasp species that carry BVs today. They also show that several nudivirus-like genes required for virion formation are part of a proviral genome but are not contained within the episomal DNAs that are packaged into virions.
Since sequence data are limited to a relatively small pool of ESTs from only two wasp species, what remains unclear is whether other nudivirus-like genes exist or whether the genes identified to date are present in all BV-carrying wasps. The absence of any C. congregata or C. inanitus ESTs with homology to genes essential for replication of baculovirus/nudivirus DNA (39) is also notable, because if no such genes exist it would argue that BVs have evolved to rely on the replication machinery of wasps to amplify the DNA that is packaged into virions. To address these questions, we performed massively parallel sequencing of ovary transcripts from Microplitis demolitor, which carries MdBV. Our results likely identify the majority of nudivirus-related transcripts in M. demolitor as well as all wasp genes that are highly expressed during MdBV replication. As such, these data represent the first deep sequencing analysis of both viral and wasp genes during replication of a BV.
Like most BV-carrying wasps, M. demolitor parasitizes the larval stage of certain species of Lepidoptera, including the moth Pseudoplusia includens (45). Both species were reared as previously described at 27°C with a 16-h-light:8-h-dark photoperiod (47). The different developmental stages of M. demolitor can be precisely monitored under these conditions using time and morphological characteristics (45). In brief, M. demolitor takes a total of 11 days to develop from an egg to an adult. After parasitism of a host larva, the M. demolitor egg hatches and the wasp larva feeds for 6 days. The last-stage wasp larva then emerges from the host's body on the morning of day 7, spins a silken cocoon around itself in 4 h, and pupates between 9 and 12 h postemergence. Newly pupated wasps (stage 1 pupae) are entirely white with light red compound eyes. On day 8, pupae enter stage 2, which is distinguished by wasps having dark red eyes, 3 or 4 black stripes on the dorsal side of the thorax, and a white abdomen. On day 9, stage 3 pupae have an entirely black thorax, black eyes, and a partially darkened abdomen, while on day 10, stage 4 pupae have a fully black thorax and abdomen with brown accents present on each leg. On day 11, pupae emerge from the cocoon as day 1 adults, which in the current study were thereafter fed ad libitum 10% sucrose in water and maintained at 18°C in continuous dark. Females are easily distinguished from males as pupae and adults, because they have much shorter antennae.
We used quantitative PCR (qPCR) and previously developed methods (6) to determine the abundance of episomal MdBV segment B, integrated (proviral) segment B, and rejoined flanking DNA following segment B excision in ovaries from different wasp stages (see Fig. S1 in the supplemental material). Briefly, ovaries from stage 1 to 4 female pupae and adult female wasps were prepared by homogenizing them in 1× DNase buffer (0.5 mM CaCl2, 2.5 mM MgCl2, 10 mM Tris-HCl, pH 7.5). Following filtration through a 0.45-μm filter, 1 μl of Ambion Turbo DNase was added to some samples for 1 h at 37°C, while others were not DNase treated. After the addition of EDTA (10 mM) to inactivate the DNase, 25 μg of proteinase K (Roche) and 2% sarcosyl were added to samples, followed by incubation at 62°C for 1 h and by phenol-chloroform extraction and ethanol precipitation in the presence of 0.3 M sodium acetate, pH 5.2. DNA pellets were resuspended in 30 μl of H2O and diluted 1:50 with water for use as the template.
After quantifying DNA concentrations, PCRs were run using a Bio-Rad thermocycler and 25-μl reaction volumes containing DNA (10 ng) from day 5 adults as the template, DNA primers (6.25 pmol), and 1.25 units of Hotmaster Taq polymerase (5 Prime) under the following cycling conditions: initial denaturation at 94° for 2 min, followed by 35 cycles of denaturation at 94°C for 20 s, annealing for 20 s at the specified temperature for the primers used, and extension at 65°C for 30 s with a final extension at 65°C for 7 min. Primer sequences can be found in Table S1 in the supplemental material. Products were cloned into pSC-A-amp/kan (Stratagene) and propagated in Escherichia coli. Following overnight culture, plasmid DNA was then isolated using the Fermentas GeneJet plasmid miniprep kit and quantified, and their identity confirmed by sequencing. As a control, primers were also designed for M. demolitor elongation factor 1α (EF1α) (see Table S1 in the supplemental material), which was similarly amplified and cloned, and its corresponding plasmid purified. Each of these plasmids was then used to make absolute standard curves to determine the abundance of integrated genomic DNA B, episomal genomic B, the empty B locus, and EF1α in ovary DNA. Ten-microliter qPCR mixtures containing 1 μl of DNA template (wasp DNA or serially diluted amounts [102 to 107 copies] of each plasmid standard) and specific primers were then run using a Rotor-Gene 3000 cycler (Corbett) and previously described reaction conditions (6). Melting curves of products were checked for amplification specificity, and threshold cycle (CT) values for each sample were fit to the standard curve generated from the plasmid DNA template dilutions. Per-ovary copy numbers were calculated by multiplying the qPCR estimate of copy number by the dilution factor and elution volume and dividing by the number of ovaries used in each sample. Two or 3 independently acquired biological replicates were performed for each wasp developmental stage with each sample internally replicated 4 times. For transmission electron microscopy (TEM), ovaries were collected, processed, and examined as previously described (46).
Total RNA was isolated from 6 ovary samples collected from M. demolitor stage 1 pupae and day 1 adult females. Each ovary sample consisted of 15 to 20 ovary pairs with total RNA extracted using the Roche high pure RNA isolation kit followed by a second DNase treatment using Ambion Turbo DNA-free reagents. Sequencing libraries were prepared by the University of Georgia Genomics Facility using the Illumina TruSeq DNA sample preparation kit and the standard low-throughput protocol. RNA was fragmented thermally by ramping up to 94°C followed by immediate cooling to 4°C. Each biological replicate was labeled with an individual indexed adapter. Libraries were then pooled to equal concentrations using quantitative PCR data and size selected by gel extraction for an average library size of 441 bp. Libraries were clustered on one lane at 6 pM followed by 100 cycles of paired-end sequencing on the Illumina HiSeq system housed at the HudsonAlpha Institute for Biotechnology (Huntsville, AL).
Sequence data were demultiplexed using CASAVA v1.8 software (Illumina). Reads were filtered for quality by removing those that had <90% of bases with >Q10 quality scores (based upon the Illumina TruSeq quality scoring system). High-quality reads were assembled using velvet v1.1.04 and Oases v0.2.21 (60), with k-mer length 51, and parameters cov_cutoff = 3, min_trans_length = 200, ins_length = 441. Sequence reads were mapped to transcripts using the Burrows-Wheeler Aligner bwa-sw algorithm and samtools (27, 28). Use of the longer-read algorithm improved the percentage of mapped reads over short-read algorithms (data not shown). If multiple matches were found for a single read, bwa-sw chose the reported mapping location at random. Counts of aligned reads could therefore be summed for all possible alternative transcripts of a single locus to give the raw number of reads mapping to a locus for each sample. Raw counts were converted to reads per kilobase per million reads mapped (RPKM) to normalize for average transcript length and the total number of reads for a sample (32). Many loci had low RPKM values that did not allow meaningful statistical analyses for differences in expression. Loci with a cumulative RPKM value for all samples of <10 were considered unsuitable for further analyses due to their low transcriptional abundance and were removed from the data set. The difference between RPKM values for high-abundance loci in adult and pupal samples was tested by the t test function in R.
Some transcripts with homology to baculovirus or nudivirus genes (pif-1, p74, lef-8, HzNVorf9-1 and -2, and HzNVorf64) were fragmented in assembly into two or more different contigs, while others had frameshift mutations which suggested that they were inactivated. To assess whether these alterations reflected sequencing or assembly errors, we synthesized gene-specific primers that spanned an assembly gap or potential mutation site (see Table S1 in the supplemental material). Standard 25-μl reaction volumes containing DNA (10 ng) from stage 1 pupae and the appropriate primer pair were then run using a Bio-Rad thermocycler. Products diluted 1:2 in H2O were sequenced by the Sanger method followed by assembly of the resulting sequence reads with DNASTAR SeqMan software. Resequencing showed that each of these genes was intact in the M. demolitor genome. Thus, the fragmented transcript sequences were merged into one locus. Two ultra-high-abundance genes (vp39 and 17a) were fragmented during assembly due to excessive read coverage and were identified by reassembling 8.3% of all reads using the same assembly parameters described above.
Orthologous genes among the M. demolitor transcripts were identified by tblastn (orthologous protein candidates against translated transcripts) followed by reciprocal BLAST, using blastx to identify orthologs of translated transcripts in the NCBI nonredundant database of 3 June 2011 (3). Percent identity was calculated by tblastn alignment of the best query protein to the M. demolitor transcript. Coordinates for alignment boundaries of M. demolitor transcripts to protein orthologs to identify gene duplicates were gleaned from tblastn alignment results. The longest translated open reading frames (ORFs) were then aligned with orthologous protein-coding sequences from the NCBI protein database using MUSCLE (18). Phylogenetic reconstruction of the evolution of these sequences was performed by maximum likelihood analysis using RAxML PROTCATLG (with the LG substitution matrix) and 1,000 bootstrap replicates, with the CIPRES online resource (42).
To identify matches to a specific Hidden Markov model (HMM) in the M. demolitor transcripts, HMM profiles served as queries searched against a database of translated transcript sequences using HMMER 3.0 hmmsearch (http://hmmer.janelia.org). Translations were performed using EMBOSS getorf to find the longest open reading frames of >150 nucleotides in length between stop codons (38). DNA polymerase genes were identified using HMMER 3.0 hmmsearch with viral and eukaryotic DNA polymerase HMM queries (DNAPolymera_Pol, DNAP_B_exo_N, DNA_polA, DNA_pol_B, DNA_pol_B2, DNA_pol_B3, DNA_pol_B_exo1, DNA_pol_B_exo2, DNA_pol_lamd_f, DNA_pol_viral_C, DNA_pol_viral_N) against M. demolitor transcripts (20). To provide further evidence for the presence or absence of homologs to core nudivirus genes in the M. demolitor data set, the transcripts were searched with HMMs built for all nudivirus core genes using the nudivirus orthologs from all currently sequenced nudivirus genomes. Finally, conserved amino acid residues between nudivirus, baculovirus, and eukaryotic DNA polymerases were manually identified within regions of broad conservation (1). One region (III) was highly conserved and was characterized by the motif KXXXXSXYGXXGX[6–11]AXXXTXXGR in nudiviruses and baculovirus polymerases. Using perl programming, pattern matching searches were performed against all translated transcripts and also raw contigs from the velvet assembly that were used to build transcript sequences. Due to the short length of this motif, proteins should be identifiable using this method even if highly fragmented during assembly. To identify any possible conserved protein domains in a particular transcript of interest, the translated transcript sequence was searched against the Pfam-A database of HMMs using HMMER 3.0 hmmscan (20). Only hits with scores above the trusted cutoff were used.
Selected transcripts were analyzed by qPCR in time course studies. Total RNA was isolated per replicate from a single last-instar M. demolitor larva, a female pupa from stage 1 to 4, or an adult female wasp as described above. RNA concentrations were then quantified using a Nanodrop spectrophotometer followed by first-strand cDNA synthesis, using 100 or 500 ng of total RNA, poly(dT) primers, and Superscript III (Invitrogen) as outlined by the manufacturer. The corresponding cDNA for each gene of interest was PCR amplified using specific primers (see Table S1 in the supplemental material) and cloned into pSC-A-amp/kan, and its identity was confirmed by sequencing as outlined above. After propagating and isolating each plasmid from minipreps, standard curves were generated followed by determination of copy numbers for each transcript in ovary and whole-body samples with values standardized per 1 μg of total RNA. Three independently acquired biological replicates were analyzed per stage for each transcript, with each sample internally replicated 4 times.
A total of 66,425 transcripts were deposited in GenBank with accession numbers JO913492 through JO979916 and JR139425 through JR139430 (see below).
BV replication in calyx cells of the wasp ovary consists of three major events: (i) amplification of viral DNA, (ii) virion formation and packaging of DNAs that have been excised and circularized from amplified segments, and (iii) lysis of calyx cells and accumulation of virions in the calyx lumen to form calyx fluid (2, 4, 30, 40, 41, 58). The encapsidated genome of MdBV consists of 15 segments (A to O), which are individually packaged into virions that consist of a single nucleocapsid enveloped by a unit membrane (6, 46, 52). Prior results also report the complete sequence of each packaged segment and describe the wasp-viral boundary sequences for selected proviral segments including B (6, 8). We thus characterized the timing of MdBV proviral DNA amplification, segment excision/circulation, and packaging by conducting qPCR assays using genomic segment B as a marker (see Fig. S1 in the supplemental material). The copy number of each product was determined in ovaries from stage 1 to 4 pupae and adults, while the copy number of EF1α served as a genetically unlinked control gene in the wasp genome.
In stage 1 pupae, we detected approximately equal copy numbers of integrated segment B and EF1α but no circularized segment B or empty B locus (Fig. 1A). We also detected no circularized segment B or empty B locus in stage 2 pupae, but the copy number of integrated segment B was nearly an order of magnitude higher than that of EF1α, indicating specific amplification of the number of DNA copies of the region of the genome containing segment B (Fig. 1A). The copy number of integrated segment B then dramatically increased in stage 3 pupae, which was followed by a progressive decline during subsequent stages of wasp development (Fig. 1A). Reciprocally, the abundance of episomal segment B began to rise in stage 3 pupae but did not reach a maximum until adult wasps were 6 days old (Fig. 1A). We also first detected products for the segment B “empty locus” (Fig. 1A, Rejoined) in stage 3 pupae, but its abundance did not change thereafter (Fig. 1A). Including a DNase step before isolating DNA from ovary homogenates (6) further allowed us to determine the copy number of episomal segment B in virions, which protect packaged DNAs from DNase degradation. Packaged episomal segment B was first detectable in stage 3 pupae and reached a maximum in 6-day-old adult wasps (Fig. 1A, inset). Comparison of these data to the total amount of episomal segment B indicated that less than half was packaged in ovaries from stage 3 and 4 pupae and day 6 adults. However, in day 9 adults, most copies of episomal segment B were packaged into virions.
We linked our qPCR data to the timing of virion formation by conducting TEM studies. These observations showed that no capsids were visible in calyx cell nuclei from stage 1 and 2 pupae (data not shown), whereas calyx cells from stage 3 pupae had enlarged nuclei that contained MdBV virions (Fig. 1B). At higher magnification, MdBV particles at different stages of assembly were readily visible within these calyx cells (Fig. 1C). In stage 4 pupae and day 1 adults, we observed calyx cells where MdBV particles were in the process of being assembled and also calyx cells that were in the process of lysing (Fig. 1D). Calyx cell lysis in turn resulted in the accumulation of a high density of virions in the calyx lumen (Fig. 1E). Overall, these data showed that amplification of DNA containing proviral segment B began in stage 2 pupae, while virion formation and packaging of episomal segment B began in stage 3 pupae. Replication then continued into adulthood.
We used the preceding results to select stage 1 pupae as a source of prereplication ovary RNAs and day 1 adults as a source of ovary RNAs during active MdBV replication. While replication in stage 3 and 4 pupae was higher than that in day 1 adults, we selected the latter as a source of RNAs during active replication because replication remained high at this period and adults are easier to collect than pupae. We produced 6 libraries for each stage and sequenced them in a single reaction using the Illumina HiSeq system. Our results yielded a total 371 million paired 100-bp reads, of which 330 million had an identifiable index tag. The number of reads in each index group ranged from 19 to 41 million. After quality filtering, 94 million read pairs and 33 million single-end reads remained, for a total of 222 million reads representing 22 Gb of sequence. From this data set, 197 million reads assembled into 32,711 loci that were 200 bp in length or greater. Alternative splicing was also predicted for 9,973 loci for a total of 66,425 transcripts. These were deposited in GenBank with accession numbers JO913492 through JO979916 and JR139425 through JR139430. BLASTN identified 66 loci (0.2% of our total) that corresponded to known intergenic regions or ORFs in MdBV genomic DNAs that are packaged into virions, many of which were misassembled (see Table S2 in the supplemental material). Since mean RPKM values for these loci were much higher in our adult than in our pupal libraries, we reasoned most of these reads arose from low-level contamination by MdBV genomic segments that were massively amplified during replication and removed them from further analyses. However, recent results do indicate that some genes in the MdBV encapsidated genome are expressed in M. demolitor adults (11). While we could not distinguish DNA contamination from transcripts in our data set, we indicate in Table S2 in the supplemental material the viral transcripts identified by Bitra et al. (11) that potentially were not contaminants in this data set. In contrast, the low RPKM values for these loci in our stage 1 pupal samples argued that little or no contamination of our libraries by cellular genomic DNA occurred generally.
As the M. demolitor genome is not sequenced, we evaluated the breadth of gene sampling in our transcriptional data set by comparing it to the sequenced genomes of two other hymenopterans: Nasonia vitripennis and Apis mellifera (24, 53). Orthologs between genome/transcriptome pairs were determined using BLAST, and we counted shared genes as those with reciprocal hits between the data sets. Using this method, the N. vitripennis and A. mellifera genomes shared 3531 of 9252 and 10560 genes respectively, while the M. demolitor data set shared 2428 orthologs with N. vitripennis and 2697 with A. mellifera. Assuming each species pairing shares the same number of genes (approximately 3,500), these data suggested that 69–77% of genes shared among species were present in the data set we generated, and that a large percentage of genes in the M. demolitor genome were sequenced. This analysis also suggested that all of the highly expressed genes in ovaries were sampled, because random transcript sequencing results in highly expressed genes being sequenced more often by chance than genes expressed at low levels (32).
Given prior results with C. congregata and C. inanitus (9), our first priority was to screen the M. demolitor transcriptome for genes with similarity to known nudivirus and/or baculovirus genes. We identified a total of 41 such transcripts (Table 1; also see Table S3 in the supplemental material), whose known functions from studies of baculoviruses include RNA transcription, oral infectivity, and virion formation (39). Of particular note was the presence of all 4 genes (p47, lef-4, lef-8, and lef-9) for the subunits of the unique RNA polymerase that baculoviruses and nudiviruses contain, and two genes similar to lef-5 and vlf-1, which regulate the hyperexpression of baculovirus very-late genes. We also identified homologs of all six per os infectivity factors (i.e., pif genes p74, pif-1, pif-2, pif-3, 19K, odv-e56) which are envelope proteins, and odv-e66, which functions as both an envelope protein and hyaluronidase in baculoviruses that infect Lepidoptera (39). Last, we detected three genes similar to vp39, vp91, and 38K, which together with vlf-1 encode capsid proteins conserved between nudiviruses and baculoviruses. Four other nucleocapsid proteins present in all baculovirus genomes (gp41, vp1054, p6.9, and odv-ec27) but unknown in nudiviruses were absent from the M. demolitor ovary transcriptome. However, we did identify homologs of the predicted structural genes HzNVorf9, HzNVorf106, HzNVorf140, and PmV hypothetical protein, identified from Helicoverpa zea nudivirus 1 (Hz-1) (14) and Penaeus monodon baculovirus (MBV), which is in actuality a nudivirus recently renamed P. monodon nudivirus (PmNV) (26, 51). Homologs of these genes were also identified by proteomic analysis of CiBV particles from the wasp Chelonus inanitus (54).
We noted two other important features about our data set in relation to nudiviruses and baculoviruses. First, with the exception of a nudivirus/baculovirus-like helicase and integrase (also known as HzNVorf144), other genes considered essential for DNA replication (39) including a DNA polymerase (dnapol), DNA primases (lef-1, lef-2), DNA ligase, and the single-stranded binding (SSB) protein (lef-3) were absent from the M. demolitor transcriptome. Second, several nudivirus/baculovirus-like homologs (vlf-1, integrase, odv-e56, ac92, odv-e66, and HzNVorf9) were present in more than one contig and derive from duplicated genes (Table 1; also see Table S4 in the supplemental material). In most instances, we identified 2 or 3 distinct gene duplicates, but odv-e66 was represented in many different contigs (see Table S4). Although they were fragmented during assembly, we estimated that 8 or 9 paralogs of odv-e66 exist in the M. demolitor genome by counting the number of unique amino (9 peptide sequences) and carboxy (8 peptides) terminal ends present in our data set. We also were able to assemble 5 full-length transcripts (Table 1; also see Table S4). During our BLAST analyses, we noticed homology between vlf-1 and integrase (also known as vlf-1a and vlf-1b, respectively) and HzNVorf140 (16). To examine the evolution of BV integrase and vlf-1 duplicates in relation to nudivirus/baculovirus homologs, we conducted a phylogenetic analysis of these genes to assess whether the homologs identified in M. demolitor more likely arose prior to or after acquisition by the wasp. Our results strongly suggested that vlf-1 and integrase arose from an older duplication of vlf-1 in the nudivirus ancestor of BVs and that they have duplicated several times subsequently in M. demolitor (see Fig. S2).
Besides the breadth of gene sampling, a second strength of our sequencing approach was that it allowed us to quantify transcript abundance, because transcripts were sequenced in proportion to their representation in the original pool of mRNAs (32). By generating multiple independent samples and uniquely tagging each, we also could statistically compare the abundance of each transcript between our early pupal (prereplication) and adult (replication) treatments. Read mapping for each replicate was successful for 90.2% of reads and 15,135 of 32,817 loci were sufficiently abundant (RPKM ≥ 10) for further analysis. First, we compared the RPKM values in pupal and adult stages for the nudivirus/baculovirus-like transcripts we identified. Our results showed that most viral loci were expressed at significantly higher levels in adult ovaries (Table 1). Many of these loci, including the viral integrase and helicase, all of the pif genes, and all of the other predicted structural genes were expressed at near-zero levels in early pupae, which was fully consistent with their predicted functions in DNA replication, segment packaging, and virion formation. In contrast, lef-8, lef-9, lef-5, and vlf-1 were abundantly expressed in both stage 1 pupae and adults. Although values tended to be higher in adults, among sample variation for transcripts like lef-8 and lef-9 resulted in no statistical difference in expression between early pupae and adults.
We then compared the RPKM values for the nudivirus/baculovirus genes to the RPKM values for all other loci, which we designated as wasp genes (Fig. 2). This analysis showed that a much larger proportion of the viral loci were either more highly (100 < RPKM < 1,000) or much more highly (RPKM > 1,000) expressed in adult than in stage 1 pupal ovaries (X2 = 54.3; P > 0.0001). In contrast, only 15% of wasp loci were highly expressed in adults, which was no different than the proportion of wasp loci that were highly expressed in stage 1 pupae (X2 = 0.9; P > 0.64). This result showed a strong bias in the proportion of nudivirus-like genes that are abundantly expressed during MdBV replication relative to wasp genes.
We selected two viral RNA polymerase subunits (lef-4, lef-9) plus two other nudivirus-like genes (helicase, p74) for analysis by qPCR. This allowed us to broaden the temporal scale of our expression data, while also providing results that could be compared to the RPKM data generated by sequencing. Overall, these results corroborated our sequencing data by showing that transcript abundance for each gene was highest in day 1 adults (Fig. 3). They also showed that while expression of lef-4, lef-9, and the helicase gene began in last-day larvae or stage 1 pupae, expression of p74 did not begin until pupae reached stage 3 (Fig. 3), when virions were first observed (Fig. 1). We noted that transcript abundance for each gene also declined between day 1 and day 5 adults (Fig. 3). Collectively, these data suggested lef-4, lef-9, and the helicase behave as early genes whose products are required for viral DNA replication or transcription of other viral genes, while p74 behaved as an MdBV “late” gene whose expression coincides with particle formation. Reduced transcript abundance of these genes between day 1 and day 5 adults also suggested that MdBV replication declines with wasp age.
Previous studies of C. congregata and C. inanitus BVs provided a list of genes that are of uncertain origin (virus or wasp), but most likely have a virus-related function. The first category of genes in this regard consisted of orthologs for any of the 28 non-nudivirus genes recently identified by proteomic analysis of CiBV virions (54). We identified 7 orthologs in total with transcript abundance for 6 being more abundant in adult ovaries, as expected for predicted structural genes (Table 2). The 58b ortholog we identified did not differ in abundance between pupae and adults, and transcript abundance of 58b also did not differ between prereplication- and active-replication-stage ovaries in C. inanitus (54), suggesting that it is not a virion component. Unlike CiBV, the 35a locus we identified consisted of 3 transcripts, suggesting the duplication of this gene in M. demolitor (Table 2). The second category of genes consists of those located in C. congregata in a region of a wasp chromosome that contains many nudivirus-like genes: a nudivirus cluster (9). We identified 3 orthologs similar to genes located in this cluster in C. congregata, and all are highly upregulated in adult ovaries (Table 2).
The second priority in mining our data set was to identify wasp genes that were differentially expressed in our pupal and adult samples. We assigned all differentially expressed wasp loci to a category of genes with potential roles in BV replication. Excluding all of the nudivirus/baculovirus-like or BV-associated genes discussed above, 960 loci in our data set had mean RPKM values that differed 2-fold or more between our pupal and adult samples (two-sided t test, degrees of freedom [df] = 10, using a Bonferroni-corrected P value of <3.3 × 10−6). A total of 920 were upregulated in adults while 40 were downregulated. We further narrowed this list by focusing on transcripts with mean RPKM values that were ≥50 in adult calyces and that were >50-fold more abundant in adults than in pupae. This yielded 165 loci that we classified as strong candidates for having direct or indirect roles in MdBV replication. As shown in Table 3, 41 of these loci shared significant identity with known genes, 30 shared identity with other hypothetical genes in databases, and 94 were unique hypothetical genes. Among the known loci, 3 genes strongly upregulated in adult ovaries contained BTB/POZ domains characteristic of some zinc finger proteins, which suggested that they are unique transcription factors (61). Four other genes also had domains typical of transcription factors (zinc finger and leucine zipper domains), including locus 2319, which was highly similar to a nuclear receptor transcription factor in ants and other insects, and locus 3605, which contained a nuclear hormone receptor domain similar to tailless from Apis mellifera.
Another set of transcripts that were highly abundant in adult calyces encoded predicted ankyrin repeat proteins. All but one of these were most similar to ankyrin repeat genes from microorganisms, which suggested the possibility of horizontal acquisition by the wasp. Each was also distinct from known IκB-like (i.e., vankyrin) genes in the encapsidated genome of MdBV and other BVs whose functions include the inhibition of host NF-κB transcription factors (49). Ankyrin repeat domains are well known to mediate protein-protein interactions, but the function of the ankyrin repeat genes that these transcripts are most similar to is unknown. Finally, 5 upregulated transcripts encoded predicted proteases. Locus 1134 had high similarity to neprilysin, a metalloprotease found in the virus-like particles of the ichneumonid wasp Venturia canescens (5). The other four predicted proteases were also metalloprotease family members, which together with the ortholog for CiBV 97b (shown in Table 2) suggested a potentially critical role for metalloproteases in the formation of BV virions or virion function in parasitized host insects.
The absence of nearly all nudivirus/baculovirus genes considered essential for replication from our data set clearly suggested that replication depends on the DNA synthesis machinery of the wasp. No genes with clear roles in DNA synthesis met our criteria for inclusion in Table 3. Nonetheless, our data set contained all expected insect DNA polymerases (25), including the family B (replication) DNA polymerases α, δ, and ε and the family A mitochondrial DNA polymerase γ (Table 4). The mean RPKM value for dnapol γ was higher in our adult than in our pupal samples. In contrast, RPKM values for dnapol α were higher in pupae, while no significant differences were detected between treatments for dnapol δ and ε (Table 4). We also identified transcripts containing a DNA polymerase B2 domain, including several whose best BLAST hit was to a DNA polymerase B2-containing ORF from another BV-carrying wasp (Glyptapanteles flavicoxis) (15). Interestingly, the G. flavicoxis ORF is located in a Maverick transposable element-like sequence and also flanks a GiBV proviral segment (15). A tblastn search of the G. flavicoxis homolog against all transcripts identified 52 hits in the M. demolitor data set (see Table S5 in the supplemental material). Most of these were less than full length, suggesting fragmentation during assembly and that fewer copies of this gene actually exist in the M. demolitor genome. The 5 transcripts listed in Table 4 were the most intact because their DNA pol B2 domains were still recognizable by HMM searches. Locus 4897, hereafter named DNA pol B2, was also the only transcript with an intact DNA pol B2 domain whose mean RPKM value was higher in adults than in pupae (Table 4). Because of their higher RPKM values in adults, we selected dnapol δ and DNA pol B2 for further study by comparing their abundance in ovaries versus the wasp body without ovaries. Our results showed that the copy number of dnapol δ was very low in the ovaries and bodies of stage 1 and 3 pupae but that the copy number was significantly higher in ovaries than in the bodies of day 1 adults (Fig. 4). For DNA pol B2, we detected no expression in pupal- or adult-stage bodies, but in ovaries expression was also distinctly higher in day 1 adults than in stage 1 and 3 pupae (Fig. 4).
Prior studies suggest two models for BV replication. The first is that replication occurs by locus-specific amplification of the integrated proviral genome (30) in a manner akin to selective amplification of genes by diverse eukaryotes, including insects (50). The second is that a larger progenitor molecule(s) first excises from the wasp genome and is then amplified by a recombination-dependent or rolling circle mechanism (35, 40, 41, 50). In both scenarios, subsequent excision/circularization of the genomic DNAs that are packaged into virions is thought to occur by site-specific recombination across direct repeat junctions flanking individual segments (6, 8, 40).
Results presented in Fig. 1 clearly showed that amplification of MdBV proviral segment B precedes the formation of episomal B, but these assays do not distinguish between locus-specific amplification of the proviral genome versus excision and amplification of a larger precursor molecule. However, if locus-specific amplification is important, we would expect that M. demolitor contains the corresponding essential genes, because of their conserved roles in organisms that range from Drosophila melanogaster to Tetrahymena (50). Moreover, we would expect these transcripts to be more abundant in our adult than in our pupal samples. Of 16 essential genes in D. melanogaster, we identified 14 in M. demolitor of which 9 were more strongly expressed in adult ovaries (Table 5). In Drosophila, these genes are involved in chorion gene amplification, which is also a possibility in M. demolitor, and further functional experiments are needed to determine the role of locus-specific amplification genes in BV DNA replication. Discerning candidate genes with essential roles in rolling circle or recombination-dependent replication is more difficult. Recombination-dependent replication of lambda phage depends upon several essential genes that include the red α exonuclease and single-strand binding (SSB) protein. In turn, all baculoviruses encode homologs of red α exonuclease called alkaline nuclease (alk-exo), while both baculoviruses and at least some nudiviruses contain the SSB protein homolog lef-3 (34, 39; J. Burand, personal communication). However, we failed to identify homologs of either factor in the M. demolitor ovary transcriptome.
EST sequencing has long been the core method of approach for novel transcript discovery, but sequencing capacity limits the number of genes and gene variants this approach can identify and provides no information about transcript abundance (32). Ultra-high-throughput sequencing of cDNA is a far more comprehensive way to measure transcriptome composition, but its application to the study of viruses has primarily focused on characterizing within host population diversity (12, 17, 48, 56). However, this approach can also be extremely valuable for understanding viral-host transcriptome interactions (36, 59), especially in situations like the study of BVs where high throughput sequencing is likely the only way to determine how much of an ancestral nudivirus genome remains transcriptionally functional or to comprehensively identify wasp genes with potential roles in replication.
While all BVs initiate replication in the wasp pupal stage, the exact timing of key events during this life stage varies among species (2, 43, 57, 58). We therefore had to first establish the timing of major replication events in M. demolitor before initiating transcriptome studies. Our results show that amplification of genomic DNA containing proviral segment B begins in stage 2 pupae while virion formation begins in stage 3. MdBV replication also continues in newly emerged adults although declines in segment B abundance suggests that replication may slow or cease with age. These data justified our selection of day 1 pupae and day 1 adults as sources of ovary RNA before and during MdBV replication. A second consideration in regard to our results is the phylogenetic placement of M. demolitor relative to the two other species of wasps (C. congregata and C. inanitus) for which detailed data on BV replication are available (2, 10, 30, 35, 40, 41). All BV-carrying wasps reside within the microgastroid complex which is divided into six subfamilies (55). Both C. congregata and M. demolitor reside in the largest of these subfamilies, the Microgastrinae, but belong to the two genera that are the most distantly related to one another in this group (33). C. inanitus by contrast resides in the Cheloninae, which is the subfamily that is most distantly related to the Microgastrinae (33). Thus, comparisons between M. demolitor and C. congregata provide insights into BV replication that spans the breadth of the largest subfamily of BV-carrying braconids (Microgastrinae). Comparison of results generated in C. congregata and M. demolitor to C. inanitus in turn provides data that spans the breadth of the microgastroid complex.
With this background in mind, our data together with results from C. congregata and C. inanitus provide sufficient information to propose a conserved gene set that we hypothesize is present in all BV-carrying wasps. We present this gene set in Fig. 5 together with the 30 core genes present in all known baculoviruses and the 20 members of this baculovirus core set that are present in nudiviruses (23, 39, 51). Our list indicates that all of the nudivirus-like transcripts identified from C. congregata and/or C. inanitus (9) are expressed in M. demolitor ovaries with one exception (HzNVorf124). While homologs of vp91, vlf-1, lef-4, pif-1, and pif-2 were identified in C. inanitus but not C. congregata, their presence in M. demolitor suggests these genes are conserved among all BV-carrying braconids. Likewise, while one or more subunits of the nudivirus/baculovirus RNA polymerase complex (lef-4, lef-9 and p47), late transcription factor (vlf-1) and capping enzyme (lef-5) are missing from the C. congregata or C. inanitus data sets, the presence of each subunit in M. demolitor strongly suggests that all BV-carrying wasps likely have the genes necessary to form a complete RNA polymerase complex. Wetterwald et al. (54) identified 28 non-nudivirus-like proteins from CiBV virions, but the presence of orthologs for only 6 of these factors in M. demolitor suggests most of these proteins are not conserved among BVs generally (Fig. 5). We identified only two other nudivirus or baculovirus-like genes in the M. demolitor transcriptome (helicase and ac92) unknown from C. congregata or C. inanitus. The most notable absences from our proposed BV gene set are a baculovirus/nudivirus-like dnapol, primase (lef-1), primase-associated factor (lef-2), or single-stranded binding protein (lef-3), which together with helicase are potentially important in regulating viral DNA replication (39).
In undertaking our study, we did not expect to identify so few additional nudivirus- or baculovirus-like genes relative to previously generated EST data (9). The depth of our data set overall suggests that we identified most nudivirus-like genes transcribed in M. demolitor ovaries. However, our approaches also have some weaknesses that could result in some nudivirus-like genes not being identified. First, our data as well as prior EST studies rely on homology for the identification of nudivirus-like genes, but only four nudivirus genomes are available in the literature (51). Given the low similarity of the nudivirus-like genes identified from M. demolitor, C. congregata, and C. inanitus relative to the corresponding homologs in pathogenic nudiviruses, some genes of nudivirus origin could be overlooked because they are too poorly conserved to recognize. Second, it is possible that BVs contain genes of nudivirus origin that are absent in the genomes of the nudiviruses that have been sequenced to date. For example, the predicted 27b gene likely derives from a nudivirus, because it is located in a nudivirus gene cluster in C. congregata and is found in C. inanitus virions, yet there is no 27b homolog among known nudivirus genomes (9, 15).
Third, while our sequencing approach generated much greater coverage than previous EST studies, misassembly issues that result in failure to identify particular transcripts can occur. Ironically, using our method of assembly, this problem most often arises in the case of very-high-coverage transcripts, which can result in fragmentation during assembly. A case in point was our initial failure to identify a transcript in M. demolitor for vp39, which is the most abundantly expressed baculovirus-like gene in M. demolitor ovaries during MdBV replication. However, this initial failure also led us to carefully reanalyze our entire data set using computational approaches that correct for misassembly of high-coverage transcripts. This resulted in the identification of vp39, but we also identified only one other gene (17a) of unknown origin but potential importance in BV replication. Thus, we conclude that misassembly of high-coverage transcripts was not a pervasive problem in our data set. In contrast, our deep sequencing approach is very sensitive to the detection of transcripts of low and intermediate abundance. For example, eukaryotic DNA polymerases are well known to be transcribed at low levels, but we identified homologs for all of the expected eukaryotic dnapol genes in M. demolitor but no nudivirus-like dnapol. We also manually searched our data set using motifs that are highly conserved in all dnapol genes, but this approach did not identify any nudivirus-like dnapol. While we cannot categorically exclude the possibility that M. demolitor contains a nudivirus-like dnapol, our results overall indicate such a gene is not transcribed.
With these caveats in mind, we conclude the genes presented in Fig. 5 represent most of the nudivirus/baculovirus-related genes that are transcriptionally functional in M. demolitor during BV replication. Wasp genome sequence and data on the position of nudivirus-like genes in clusters will be one way that other nudivirus-like genes can be identified (15). Another could be through additional transcriptome data from wasps in subfamilies of the micrograstroid complex outside of the Microgastrinae and Cheloninae.
The presence of a nudivirus/baculovirus DNA helicase in BV-carrying wasps suggests that a small portion of the ancestral viral DNA replication machinery may remain functional during BV DNA replication, but the absence of a viral dnapol and primase suggests that the amplification of proviral loci depends upon wasp gene products. As previously noted, our data set identifies several wasp dnapol genes, but their expression profiles do not correlate well with our observations of the timing of MdBV DNA replication. This is not unexpected for the polymerase B2 domain-containing transcript we named DNA pol B2, which may not be a functional polymerase. Its location in a TE also suggests that it may not have a stable presence in the M. demolitor genome, as would be expected for a gene that is essential to the persistence of a symbiosis that has existed for 100 million years. In contrast, the differential expression of several genes with known functions in locus-specific amplification suggest that these factors could potentially play roles in regulating BV replication. These locus-specific amplification genes could also be important in regulating the amplification of chorion or other wasp genes that are similarly expressed in ovaries during MdBV replication.
A second important finding of our study is that several more nudivirus/baculovirus homologs are duplicated in the M. demolitor genome than previously found for C. congregata and C. inanitus (Fig. 5). Whether these gene variants have different functions is unknown, but their presence is notable because many genes in the encapsidated genomes of BVs are also duplicated (13, 15, 19, 21). In the case of MdBV, virtually all duplicated genes in the encapsidated genome are expressed in parasitized hosts (11), while experimental studies indicate that some gene family members are functionally distinct (7, 29, 49). Sequence analysis of gene families in other BVs also provide evidence for functional specialization (21, 37). Genes belonging to the vlf-1 family of genes in BVs (vlf-1, integrase, and HzNVorf140) could be functionally specialized to partition the roles attributed to VLF-1 in baculoviruses: viral integrase, transcription factor, structural protein, and packaging of DNA into capsids (39).
Our understanding of how host and symbiont genes interact to maintain eukaryotic-microbial symbioses is in its infancy. While most studies of beneficial symbionts focus on bacteria (22, 31), the symbiotic relationship between BVs and parasitoid wasps provides a powerful alternative system for understanding how pathogenic viruses can evolve to form a beneficial association with a eukaryotic host and how virus and wasp genes interact to developmentally synchronize a function like replication that each entity depends upon for survival. The use of massively parallel sequencing is an essential first step to producing a data set that captures a large percentage of the expressed regions of the M. demolitor genome. Designing experiments that functionally test the role of specific viral and wasp genes is obviously a critical next step, as are experiments to distinguish between whether locus-specific amplification or an excision/amplification model underlies replication by BVs more generally.
The study was supported by grants from the U.S. Department of Agriculture and National Science Foundation to M.R.S.
We also appreciate the feedback of two anonymous reviewers whose suggestions improved the manuscript.
Published ahead of print 11 January 2012
Supplemental material for this article may be found at http://jvi.asm.org/.