Characterization of arboviruses at the interface of pristine habitats and anthropogenic landscapes is crucial to comprehensive emergent disease surveillance and forecasting efforts. In the context of a surveillance campaign in and around a West African rainforest, particles morphologically consistent with rhabdoviruses were identified in cell cultures infected with homogenates of trapped mosquitoes. RNA recovered from these cultures was used to derive the first complete genome sequence of a rhabdovirus isolated from Culex decens mosquitoes in Côte d’Ivoire, tentatively named Moussa virus (MOUV). MOUV shows the classical genome organization of rhabdoviruses, with five open reading frames (ORF) in a linear order. However, sequences show only limited conservation (12–33% identity at the amino acid level), and ORF2 and ORF3 have no significant similarity to sequences deposited in GenBank. Phylogenetic analysis indicates a potential new species with a distant relationship to Tupaia and Tibrogargan viruses.
The family Rhabdoviridae includes six genera whose members infect a wide range of animals and plants (Tordo et al., 2005). Viruses in the genus Lyssavirus, such as rabies virus, infect humans and other mammals, frequently causing fatal encephalopathy (Lyles and Rupprecht, 2007). Major vertebrate pathogens are found in the genera Vesiculovirus and Ephemerovirus (livestock), and Novirhabdovirus (fish). The Cytorhabdovirus and Nucleorhabdovirus genera include plant pathogens. The Rhabdoviridae also include six unassigned serogroups as well as non-classified species (Bourhy et al., 2005; Tordo et al., 2005).
Rhabdoviruses have negative, single-stranded RNA genomes encoding at least five proteins: a nucleoprotein (N), phosphoprotein (P), matrix protein (M), glycoprotein (G) and a large RNA-dependent RNA polymerase (L). Genomes of viruses in the Ephemerovirus, Novirhabdovirus, Cytorhabdovirus and Nucleorhabdovirus genera include some additional short genes (Lyles and Rupprecht, 2007).
Arthropods play an essential role in the transmission of many rhabdoviruses (Tesh et al., 1972). Numerous plant-infecting rhabdoviruses are transmitted by leafhoppers, aphids or lacebugs (Nault, 1997). Arthropods can also transmit animal and human rhabdoviruses; mosquitoes transmit bovine ephemeral fever virus, and sandflies can transmit Chandipura virus (Dhanda et al., 1970; Tordo et al., 2005). Thus, identification of rhabdoviruses that disperse through arthropod vectors and may emerge as vertebrate pathogens (Aitken et al., 1984; Shope, 1982) is vital to comprehensive virus surveillance and provides the foundation for improved disease prediction and forecasting efforts.
In the course of a surveillance campaign to assess the distribution of arboviruses and their vectors in rainforest edge habitats in West Africa, bullet-shaped particles were detected by electron microscopy in cell culture supernatants of cells infected with homogenates of trapped mosquitoes (Junglen et al. EcoHealth submitted). Although particle-associated nucleic acid PCR (Stang et al., 2005) allowed identification of small L sequence fragments consistent with the presence of a rhabdovirus, its genome structure and phylogeny remained elusive. Unbiased high-throughput pyrosequencing (UHTS), in combination with specific amplification using primers designed from the UHTS data enabled determination of the complete genome sequence. Sequence and phylogenetic analyses indicated a new member of the family Rhabdoviridae, tentatively named Moussa virus (MOUV), after the coffee plantation from which the first isolate (D24) originated.
Mosquitoes were collected from February to June 2004 in different habitats in rainforest edge regions of Taï National Park, Côte d’Ivoire. Mosquitoes trapped in primary forest, secondary forest, coffee plantations, two villages and at a camp site in the primary forest were used for cell culture infection (Junglen et al., 2009). Filtered homogenates of female mosquitoes were used to inoculate Aedes albopictus C6/36 cell cultures (Igarashi, 1978) and 200 µL supernatant was passaged onto fresh cells when 70–90% of cells showed cytopathic effect (CPE). After 3 passages, viral stocks were generated, stored at −70°C, and used for further analyses.
Infectious C6/36 culture supernatant was applied to African green monkey (Vero), baby hamster kidney (BHK), porcine stable equine kidney (PSEK), 293, A549, and Hep2 cell lines, as well as to primary chicken embryo fibroblasts (PCF) at 1/10 and 1/100 dilutions. Every seven days, 200 µL supernatant was transferred to fresh cells for a period of 21 days, and cells were observed for an additional 8 days for signs of CPE. Culture supernatants were used for PCR and re-infection of C6/36 cells.
Culture supernatant of one T175 flask was harvested except for 5 mL that were left behind to cover cells for one cycle of freeze-thaw. Virus from the total clarified supernatant was pelleted through 5 mL of 36% sucrose (28,000 rpm, 4 h, 4°C, SW32 rotor; Beckman, Fullerton, CA, USA). Virus pellet was suspended overnight at 4°C in 140 µL phosphate buffered saline (PBS).
Aliquots of purified virus (10 µL) were fixed with 2% paraformaldehyde and processed for direct negative staining electron microscopy (Biel and Gelderblom, 1999; Hayat et al., 2000). 400-mesh copper grids covered with pioloform F and carbon were floated on sample drops, washed twice on drops of double-distilled water and contrasted with 1% uranyl acetate (60 mM, pH 4). Prepared grids were then examined by transmission electron microscopy with an FEI Tecnai G2 (FEI, Hillsboro, OR, USA).
Either 140 µL culture supernatant or 140 µL purified virus were subjected to RNA extraction with a Viral RNA Kit (Qiagen, Hilden, Germany). cDNA was generated using 0.5 µg total RNA, 10 pmol random hexamers and a Superscript cDNA synthesis Kit (Invitrogen, Carlsbad, CA, USA). PCR reactions to characterize multiple isolates were performed in 25 µL reactions containing 2 µL cDNA, 7.5 pmol of each primer (FWD-10143 and REV-10788; Table S1), and 1 unit Platinum Taq polymerase (Invitrogen). Cycle conditions were 1× 95°C/10 min, 35× 95°C/30 sec, 54°C/30 sec, 72°C/5 min, and 1× 72°C/10 min. PCR products were size-fractionated by electrophoresis in agarose gels, extracted with ExoSAP-IT® (USB, Cleveland, OH, USA), and then directly sequenced with ABI PRISM Big Dye Terminator Cycle Sequencing kits on ABI PRISM DNA Analyzers (Applied Biosystems, Foster City, CA, USA).
Purified RNA (0.5 µg) was DNase I-digested (DNA-free; Ambion, Austin, TX, USA) and reverse transcribed using a Superscript II kit (Invitrogen) with random octamer primers linked to an arbitrary, defined 17-mer primer sequence (MWG, Huntsville, AL, USA). cDNA was RNase H-treated prior to random amplification by PCR, applying a 9:1 mixture of a primer corresponding to the defined 17-mer sequence, and the octamer-linked 17-mer sequence primer, respectively (Palacios et al., 2007). Products >70 bp were purified (MinElute, Qiagen) and ligated to linkers for sequencing on a GSL FLX Sequencer (454 Life Sciences, Branford, CT, USA)(Margulies et al., 2005). After trimming primer sequences and eliminating highly repetitive elements, sequences were clustered and assembled into contiguous fragments (contigs) for comparison by the Basic Local Alignment Search Tool (blast; (Altschul et al., 1990)) to the GenBank database at nucleotide (nt) and translated amino acid (aa) level applying blastn and blastx software (Cox-Foster et al., 2007; Palacios et al., 2008). The resulting alignments were further processed using custom software applications written in Perl (BioPerl 5.8.5) and programs accessible through the GreenePortal website (http://184.108.40.206/Tools).
PCR primers for amplification across sequence gaps were designed based on the UHTS data and the draft genome was re-sequenced by overlapping PCR products that covered the entire genome except for terminal sequences (Table S1). RNA was transcribed with Superscript II (Invitrogen; 20 µL assay volume) using random hexamers. PCR primers were applied at 0.2 µM concentration with 1 µL cDNA and HotStar polymerase (Qiagen). Products were purified (QIAquick PCR purification kit; Qiagen) and directly dideoxy-sequenced on both strands.
Genomic termini were characterized with 5’- and 3’-RACE kits (Invitrogen). Virus-specific primers for isolate D24 were: 5’-CCC CAG TTG GTA AGG GAA AT, located 913 nt from the 3’-genomic terminus for reverse transcription, 5’-GCC TTG TCC CTG AGG AAA CT, located 703 nt from the 3’-genomic terminus for first PCR with primer UAP (Invitrogen), and 5’-GTT CGT GCC GTT TTC GTA CT, located 528 nt from the 3’-genomic terminus for second PCR with primer AUAP (Invitrogen). RNA for the second RACE was tailed with poly(A) polymerase (Ambion) and purified using MEGAclear (Ambion). cDNA synthesis was primed with oligo d(T)-adapter primer AP (Invitrogen), and first PCR used primer 5’-TAG TCC ACA GCC TGG CTC AT, located 996 nt from the 3’-antigenomic terminus and primer UAP (Invitrogen), the second PCR used primer 5’-AAC TAC GGG TCC CCT TCA CT, located 738 nt from the 3’-antigenomic terminus and primer AUAP (Invitrogen). All primers were at 0.2 µM final concentration. PCR products were purified with QIAquick PCR purification kits (Qiagen) and directly dideoxy-sequenced in both directions.
Deduced full-length and partial L aa sequence of MOUV was aligned with L-sequences of selected members of the family Rhabdoviridae and other Mononegavirales (see Table S2) using ClustalW (Thompson et al., 2002). Phylogenetic analyses were performed with programs from the MEGA package (http://www.megasoftware.net), applying a neighbor-joining model. Statistical significance was assessed by bootstrap re-sampling of 1000 pseudoreplicate data sets. Pairwise sequence similarity and identity were calculated using the Smith-Waterman algorithm implemented in the European Molecular Biology Open Source Software suite (EMBOSS)(Rice et al., 2000).
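As an illustration of how a pairwise identity value of the kind reported in Table 3 can be derived from an alignment, the following minimal Python sketch counts identical columns over the full alignment length. It assumes that the two sequences have already been aligned (for example by a Smith-Waterman implementation) and that identity is expressed relative to the number of alignment columns, gaps included. The function name `percent_identity` and the toy sequences are hypothetical and were not part of the original analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    # A column counts as identical only if both residues match and
    # neither position is a gap.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy alignment (hypothetical residues, not MOUV data).
    print(round(percent_identity("MKT-LV", "MRTALV"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the denominator should be stated whenever identities from different tools are compared.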
Complete MOUV genome sequence was determined using C6/36 cell culture supernatant from two isolates, C23 and D24, which showed characteristic morphology of a rhabdovirus (Fig. 1A). UHTS yielded approximately 90,000 sequence reads with a mean length of 170 nt. Nine contigs (858 sequence reads, mean length 227 nt), ranging in size from 142 nt to 2,681 nt were assembled that showed homology to L, G and N sequence of rhabdoviruses when aligned to the GenBank database (http://www.ncbi.nlm.nih.gov/Genbank) by blastn and blastx (Table 1, Fig. 1B); except for contig C, all other contigs were identified only by blastx search. The UHTS reads covered 8.8 kilobases (kb), representing 76.3% of a prototypic rhabdoviral genome. PCR amplification using specific primers designed on the UHTS data confirmed the presence of two isolates of MOUV, C23 and D24 (GenBank accession numbers FJ985748 and FJ985749). The 3’- and 5’-termini of the MOUV genome were characterized by RACE. Twenty clones for each isolate were sequenced. Both draft sequences were re-sequenced by overlapping PCR across the whole genome, indicating 96% sequence identity between both isolates at nt level.
The MOUV genome comprises 11,526 nt with a genomic organization characteristic for known members of the genera Lyssavirus and Vesiculovirus (Fig. 1B), including three open reading frames (ORF) with homology to rhabdoviral N, G and L proteins; two ORFs, located in the position of P and M genes in other rhabdoviruses showed no nt or aa homology in blast search to sequences deposited in GenBank. After completion of MOUV sequencing by PCR, retrospective analysis of the UHTS results revealed that a total of 1,572 reads matched the final sequence, covering 93.5 % of the genome including 31 sequence reads in ORF-2 and 352 sequence reads in ORF-3 (Table 2).
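The genome coverage figure above can be understood as the fraction of genome positions overlapped by at least one mapped read. The following Python sketch shows one way such a value could be computed by merging read intervals; it is not the software used in this study. The function name `covered_fraction`, the half-open interval convention and the toy coordinates are assumptions made for illustration.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by at least one read.

    intervals: iterable of (start, end) read placements, 0-based and
    half-open, i.e. a read covers positions start .. end-1 (assumption).
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


if __name__ == "__main__":
    # Toy read placements on a 1,000 nt genome (hypothetical values).
    reads = [(0, 100), (50, 150), (300, 400)]
    print(f"{100 * covered_fraction(reads, 1000):.1f}% covered")
```

Merging before summing ensures that overlapping reads are not counted twice, which matters for deeply sampled regions.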
The N protein of rhabdoviruses binds genomic RNA to form an RNase-resistant ribonucleocapsid that serves as template for both transcription and replication by the large viral RNA-dependent RNA polymerase (Thomas et al., 1985). ORF-1 is predicted to code for an N of 468 aa (51 kDa). Pairwise comparison of deduced protein sequence indicated highest homology to N of vesicular stomatitis Indiana virus (VSIV; 23.5%) and Tupaia rhabdovirus (TUPV; 23%)(Table 3). Pairwise alignment of MOUV N with that of VSV and rabies virus also aligned residues identified as interacting with RNA (Albertini et al., 2006; Green et al., 2006; Luo et al., 2007) (data not shown); the highly conserved motif SPYS (Kouznetzoff et al., 1998) was present as S292PYT in MOUV (Fig. S1). The alignment also indicates conserved aa clusters identified in rhabdoviruses infecting both insects and vertebrates (dimarhabdoviruses; Kuzmin et al., 2006).
ORF-2 of MOUV is composed of 870 nt that encode a polypeptide of 290 aa (32 kDa). The sequence is not similar to others in GenBank by blastn or blastx, or by hidden Markov model alignments to the Pfam database (http://pfam.janelia.org). ORF-2 is located in the genomic position of P, which is an essential cofactor of the active viral transcriptase/replicase complex. Compared to P of other rhabdoviruses, ORF-2 shows highest aa identities to those of Wongabel virus (WONV; 20.9%) and Flanders virus (FLAV; 19.6%)(Table 3). ORF-2 also comprises potential phosphorylation sites, including one tyrosine phosphorylation site and 30 serine/threonine phosphorylation sites. However, MOUV P lacks a casein kinase II motif found in domain I of VSV P (Barik and Banerjee, 1992; Chattopadhyay and Banerjee, 1987). A small ORF overlapping that of P may code for a putative C protein of 56 residues with a predicted pI of 12. Analysis of MOUV P using PROSITE, a database of functional protein domains and sites (http://ca.expasy.org/prosite), identified a C-terminal zinc finger C2H2-domain (C251EKYGCQEPVTPSNWLHHHRSCH). To date, no zinc finger domain has been reported for a rhabdoviral P.
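To make the zinc finger assignment concrete, the sketch below scans a protein sequence with a regular expression modeled on the PROSITE C2H2 zinc finger pattern, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. The pattern is quoted from memory and should be checked against the current PROSITE entry before use; the function name `find_c2h2` is hypothetical, and the code is an illustration rather than the PROSITE scanning software used in this study. Only the 23-residue segment reported above is used as input.

```python
import re

# Assumed transcription of the PROSITE C2H2 pattern into regex syntax:
# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2(protein: str):
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in C2H2.finditer(protein)]


if __name__ == "__main__":
    # Segment reported for MOUV ORF-2, starting at C251 in the full protein.
    segment = "CEKYGCQEPVTPSNWLHHHRSCH"
    for start, hit in find_c2h2(segment):
        print(start, hit)
```

In the reported segment the two cysteines are separated by four residues and the two terminal histidines by three, which places it within the spacing ranges allowed by this pattern.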
ORF-3 is composed of 726 nt, encoding a 242 aa polypeptide (27 kDa). ORF-3 does not have apparent similarity or motifs common to other sequences. No functional domains were identified using PROSITE. ORF-3 is located in the genomic position of M, a gene that in other rhabdoviruses encodes a protein critical to virus morphology, assembly and budding (Desforges et al., 2001; Kopecky et al., 2001; Schmitt and Lamb, 2004). The highest aa identity was found with M of WONV (20.7 %) and FLAV (19%)(Table 3). It has been previously shown that M of rhabdoviruses can be phosphorylated (Kaptur et al., 1992; Tuffereau et al., 1985); ORF-3 of MOUV includes 14 serine, 4 threonine, and 2 tyrosine potential phosphorylation sites. Late domain motifs PPxY, P(T/S)AP, and YxxL (Chen and Lamb, 2008; Demirov and Freed, 2004) identified in the M proteins of rabies virus and VSV (Harty et al., 1999; Irie et al., 2004) and implicated in viral budding, are not present in MOUV M.
The G of rhabdoviruses mediates attachment to, and fusion with, the cell membrane during entry, and particle budding during exit. G is predicted to be coded by the 1,581 nt ORF-4 of MOUV, yielding a protein of 527 aa (60 kDa). Pairwise alignments with corresponding proteins of other rhabdoviruses indicated highest aa identity with G of infectious hematopoietic necrosis virus (IHNV; 23.2%) and VSIV (22.9%)(Table 3). Despite limited primary sequence conservation amongst G proteins of rhabdoviruses, general structural features including glycosylation sites, antigenic domains, and cysteine residues are commonly conserved (Coll, 1995; Walker and Kongsuwan, 1999). Accordingly, the G of MOUV shows features of type I glycoproteins, including an N-terminal 18 aa signal peptide predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP), a COOH-terminal hydrophobic anchor located at aa 463–485 and followed by a short cytoplasmic tail (aa 486–526), six potential glycosylation sites, and fourteen cysteine residues of which twelve comprise the conserved cysteines found in animal rhabdoviruses (CI to CXII; (Walker and Kongsuwan, 1999)).
L is a component of the active ribonucleoprotein and necessary for transcription and replication. The L of MOUV is predicted to derive from the 2,142 aa ORF-5 (245 kDa). Comparison of deduced aa sequence of ORF-5 with L of other rhabdoviruses indicated highest identity with that of VSIV (32.6%) and TUPV (31.4%)(Table 3). RNA-dependent RNA polymerases (RdRp) of non-segmented negative-strand RNA viruses contain conserved residues clustered in six blocks designated I to VI (Poch et al., 1990). The four highly conserved motifs A through D (Poch et al., 1990; Poch et al., 1989) in block III are also conserved in L of MOUV (Table 4). In addition, conserved aa in block V, including the motif GxxT[n]HR involved in capping activity (Li et al., 2008), are present in MOUV L (G1166xxTHR1240). Conserved aa in block VI shown to be involved in S-adenosyl-L-methionine interaction and cap methylation (Bujnicki and Rychlewski, 2002; Li et al., 2006) are also present in MOUV L (G1700xGxGGD1766).
Gene junctions in rhabdoviruses are composed of the polyadenylation signal for the upstream gene, a short intergenic region and the transcription initiation sequence for the downstream gene. In MOUV a putative polyadenylation signal was identified as 3’-GAACUUUUUUU, separated by 2–3 nt from a putative transcription initiation sequence 3’-UUGU(U/G)UG(G/C/A)U (Fig. 2A). The 5’-nontranslated regions of mRNA transcripts are 22 to 60 nt long, while the 3’-nontranslated regions are 10 to 83 nt long (Fig. 2A).
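The junction architecture described above (polyadenylation signal, 2–3 nt intergenic region, transcription initiation sequence) can be expressed as a single regular expression. The following Python sketch searches a genomic-sense RNA string written 3'→5' for this arrangement; the function name `find_gene_junctions` and the toy input are hypothetical, and the code was not part of the original analysis.

```python
import re

# Consensus elements as given in the text, written 3'->5' on the genome:
#   polyadenylation signal  GAACUUUUUUU   (GAAC followed by seven U)
#   intergenic region       2-3 nt of any base
#   initiation sequence     UUGU(U/G)UG(G/C/A)U
JUNCTION = re.compile(
    r"(?P<polya>GAACU{7})"
    r"(?P<intergenic>[ACGU]{2,3})"
    r"(?P<start>UUGU[UG]UG[GCA]U)"
)


def find_gene_junctions(genome_3to5: str):
    """Return (0-based offset, intergenic nt, initiation sequence) per hit."""
    return [
        (m.start(), m.group("intergenic"), m.group("start"))
        for m in JUNCTION.finditer(genome_3to5)
    ]


if __name__ == "__main__":
    # Toy sequence containing one junction (hypothetical flanking bases).
    toy = "AC" + "GAACUUUUUUU" + "GA" + "UUGUUUGGU" + "CC"
    print(find_gene_junctions(toy))
```

A strict pattern of this kind reports only junctions that match the consensus exactly; divergent junctions would require relaxing individual positions.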
A common feature of rhabdovirus genomes is complementarity of the 3’- and 5’-termini of the genome (Whelan et al., 2004). The 3’-leader and the 5’-trailer sequences of MOUV comprise 53 nt and 56 nt, respectively, with seventeen of the twenty-one terminal nt being complementary (Fig. 2B). The MOUV leader sequence contains the first three, and the tenth nt that are conserved in rhabdoviruses known to infect mammals (Fig. 2C).
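Terminal complementarity of the kind shown in Fig. 2B can be quantified by pairing the i-th nucleotide from one genome end with the i-th nucleotide from the other end and counting Watson-Crick pairs. The Python sketch below illustrates this on a toy sequence; the function name `terminal_complementarity` is hypothetical, G-U wobble pairs are deliberately not counted (an assumption), and the input is not MOUV sequence.

```python
# Watson-Crick pairs only; G-U wobble pairing is not counted (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def terminal_complementarity(genome: str, n: int) -> int:
    """Count complementary positions among the n terminal nt of each end.

    Position i counted from one end is paired with position i counted
    from the other end, so the result is independent of whether the
    genome string is written 5'->3' or 3'->5'.
    """
    if n < 0 or 2 * n > len(genome):
        raise ValueError("n must be between 0 and half the genome length")
    return sum(
        1 for i in range(n) if (genome[i], genome[-1 - i]) in PAIRS
    )


if __name__ == "__main__":
    toy = "ACGAAAAAUCGU"  # hypothetical sequence with complementary ends
    print(terminal_complementarity(toy, 4), "of 4 terminal nt pair")
```

Applied to the MOUV termini with n = 21, a count of 17 would correspond to the complementarity reported above.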
Phylogenetic analysis of short sequences can be misleading. Thus, we evaluated the complete 2,142 aa L ORF in comparison to available complete L sequences of other rhabdoviruses. The derived phylogenetic tree, which also includes selected members of other families of the order Mononegavirales, shows TUPV, isolated from tree shrews, as the closest match to MOUV (Fig. 3A). However, complete L sequence is only available for a limited number of rhabdoviruses. Thus, we also examined a partial L sequence dataset comprising 158 aa that represents not only the six recognized genera but also four additional monophyletic groups (Bourhy et al., 2005). Phylogenetic analysis in comparison to these sequences puts MOUV in a close relationship to the Australian Tibrogargan virus (TIBV)(Fig. 3B).
In addition to isolates C23 (from Culex decens mosquitoes of the secondary forest) and D24 (from Culex mosquitoes of the adjacent Moussa coffee plantations), MOUV was also identified in five others of the 97 CPE-positive supernatants obtained from 437 pools tested (4,839 mosquitoes). These included two additional isolates from the secondary forest, two from the primary rainforest, and one from a camp in the primary rainforest (Table 5). All isolates induced comparable CPE with syncytia formation in C6/36 cells 3 to 22 days after inoculation. Sequence comparison of a 600 bp L gene amplification product generated from these isolates indicated a higher degree of conservation within isolates from the forest (98.6–100%), or the camp and the plantation (98.6%), than between those two groups (94.8–96.4%). Although the isolates replicated stably in C6/36 cells, inoculation of vertebrate cell lines did not indicate signs of CPE (Vero, BHK, PSEK, 293, A549, Hep2, and PCF), or virus replication as assessed by PCR (PSEK).
The 11,526 nt genome of MOUV comprises at least five ORFs. Three ORFs correspond to the N, G and L proteins characteristic of rhabdoviruses; two correspond in genome position to P and M, but are so dissimilar to known rhabdovirus sequences that we failed to appreciate their presence during the initial analysis. To our knowledge this is the first description of a complete genome sequence for a rhabdovirus isolated from mosquitoes.
Phylogenetic analyses performed with complete and partial L polymerase sequence indicate that although MOUV is related to the dimarhabdovirus supergroup (Bourhy et al., 2005), it appears to be distinct from recognized genera and species. Complete L polymerase sequence suggests a relationship to TUPV, an as yet unclassified rhabdovirus identified in tree shrews and tentatively assigned to the vesiculovirus genus (Springfeld et al., 2005; Tordo et al., 2005); however, analysis of a more comprehensive set of partial L polymerase sequences suggests a closer relationship to TIBV. TIBV, the sole member of the Tibrogargan group, was initially isolated in Australia from biting midges and cattle (Bourhy et al., 2005). Further genomic characterization of the TIBV genome may provide insights into the phylogenetic relationship between these viruses. The only rhabdovirus previously known to originate from Côte d’Ivoire is Nkolbisson virus (NKOV), a member of the Kern Canyon group that was isolated from Aedes mosquitoes and characterized based on serology and electron microscopy (http://www.pasteur.fr/recherche/banques/CRORA); no sequence data are available.
Differences in genome organization of rhabdoviruses correlate with taxonomy. The simplest organization is found in lyssaviruses with the classical gene order N-P-M-G-L. Other genera may include additional genes (Tordo et al., 2005). C proteins have been identified in members of the vesiculovirus genus that infect vertebrates (Barik, 1992; Kretzschmar et al., 1996; Marriott, 2005; Peluso et al., 1996; Springfeld et al., 2005). C ORFs were also identified in TUPV (Springfeld et al., 2005), in some novirhabdoviruses (Nishizawa et al., 1997; Schutze et al., 1999) and the unclassified Siniperca chuatsi rhabdovirus (SCRV)(Tao et al., 2008). However, in contrast to MOUV, TUPV and novirhabdoviruses possess additional ORFs between their M and G (SH, (Springfeld et al., 2005)), or G and L genes (NV, (Kurath et al., 1997; Schutze et al., 1999)), respectively. Despite divergence in genome organization, dimarhabdoviruses share biological characteristics; they replicate in both invertebrate (dipteran) and vertebrate (mammal) hosts in transmission cycles maintained through hematophagous arthropods (Bourhy et al., 2005). Sequence analysis reveals features consistent with the identification of MOUV as a dimarhabdovirus. L sequence alignment indicates closest relationship to TUPV and TIBV; N shows conservation of aa motifs found in dimarhabdoviruses (Kuzmin et al., 2006); G contains the twelve cysteines conserved in animal rhabdoviruses (Walker and Kongsuwan, 1999); and the 3’-leader sequence conserves the characteristic first three and tenth nt present in rhabdoviruses that infect mammals. Furthermore, MOUV was isolated from Culex decens mosquitoes, a species that is primarily ornithophilic but also feeds on bats (Boreham and Snow, 1973). Although we have not been able to replicate MOUV in mammalian cell lines this may only reflect limitations in our repertoire.
Arthropod transmitted diseases remain a major threat to human health worldwide. Chandipura virus, isolated in the late 1960s and transmitted to humans by sandflies, has become a recognized public health threat since the Indian Chandipura encephalitis outbreak in 2003 (Bhatt and Rodrigues, 1967; Rao et al., 2004). Characterizing the wealth of arthropod-borne agents remains of crucial importance to surveillance efforts and the prediction of disease emergence.
We thank the Ivorian authorities for their long-term support, the Ministry of Environment and Forest, as well as the Ministry of Research, the directorship of the Taï National Park and the Swiss Research Center in Abidjan. The authors are also grateful to the Taï chimpanzee project and Christophe Boesch for logistic support, Stephen Rich for mosquito traps, Germain Gagné for help during field work, Georg Pauli for helpful comments on virus culturing and analyses, Omar Jabado for help with phylogenetic analyses, Craig Street, Raul Rabadan, and Alexander Solovyov for bioinformatic analyses, and Jennifer Decuir for excellent experimental assistance. This work was supported by the Robert Koch-Institute, the Max-Planck-Society, Google.org and National Institutes of Health awards AI051292 and AI57158 (Northeast Biodefense Center - Lipkin).