Bats are reservoirs for emerging zoonotic viruses that can have a profound impact on human and animal health, including lyssaviruses, filoviruses, paramyxoviruses, and severe acute respiratory syndrome coronaviruses (SARS-CoVs). In the course of a project focused on pathogen discovery in contexts where human-bat contact might facilitate more efficient interspecies transmission of viruses, we surveyed gastrointestinal tissue obtained from bats collected in caves in Nigeria that are frequented by humans. Coronavirus consensus PCR and unbiased high-throughput pyrosequencing revealed the presence of coronavirus sequences related to those of SARS-CoV in a Commerson’s leaf-nosed bat (Hipposideros commersoni). Additional genomic sequencing indicated that this virus differs from subgroup 2b CoVs, which include SARS-CoV, in having three overlapping open reading frames between the M and N genes and two conserved stem-loop II motifs. Phylogenetic analyses in conjunction with these features suggest that this virus represents a new subgroup within group 2 CoVs.
Bats (order Chiroptera, suborders Megachiroptera and Microchiroptera) are reservoirs for a wide range of viruses that cause diseases in humans and livestock, including the severe acute respiratory syndrome coronavirus (SARS-CoV), responsible for the global SARS outbreak in 2003. The diversity of viruses harbored by bats is only just beginning to be understood because of expanded wildlife surveillance and the development and application of new tools for pathogen discovery. This paper describes a new coronavirus, one with a distinctive genomic organization that may provide insights into coronavirus evolution and biology.
Coronaviruses (order Nidovirales, family Coronaviridae, subfamily Coronavirinae) infect a wide range of vertebrates and cause respiratory, enteric, or less frequently, neurological diseases (1, 2). Coronaviruses were originally divided into three groups based on their antigenic cross-reactivities and nucleotide sequences (3). They have been recently reclassified by the International Committee on Taxonomy of Viruses into 3 genera, designated Alphacoronavirus (former group 1), Betacoronavirus (former group 2), and Gammacoronavirus (former group 3) (4). Whereas the alphacoronaviruses and betacoronaviruses are associated with diseases of mammals, including humans, the gammacoronaviruses are implicated chiefly in diseases of birds. Interest in coronaviruses was largely focused on their impact on domestic porcine and avian husbandry and their utility in animal models of virus-induced demyelination (5) until the emergence of severe acute respiratory syndrome (SARS) in 2003 (6). Thereafter, with recognition of the causative agent SARS coronavirus (SARS-CoV) (7–10) and of the presence of SARS-CoV-like viruses in Chinese horseshoe bats (Rhinolophus spp.) (11), efforts to explore the genetic diversity of coronaviruses and their host range intensified (12).
Bats are suggested to be important reservoir hosts of many zoonotic viruses with significant impact on human and animal health, including lyssaviruses, henipaviruses, filoviruses, and coronaviruses (13–17). Viruses of bats may be transmitted to humans directly through bites or via exposure to saliva, fecal aerosols, or infected tissues as well as indirectly through contact with infected intermediate hosts, such as swine (18). In the course of a project focused on pathogen discovery in situations where human-bat contact might facilitate more efficient interspecies transmission of emerging viruses, we surveyed bats in Nigeria. Through consensus PCR (cPCR) and unbiased high-throughput pyrosequencing (UHTS) of bat tissue samples, we identified a coronavirus that is most closely related to subgroup 2b of the genus Betacoronavirus, which includes SARS-CoV and SARS-CoV-like viruses. However, the genomic organization of this coronavirus, obtained from a Commerson’s leaf-nosed bat (Hipposideros commersoni), is unique in that it contains three overlapping open reading frames (ORFs) between the M and N genes and two conserved stem-loop II motifs (s2m). Based on these observations and phylogenetic analyses, we propose that this new member of the family Coronaviridae, tentatively named Zaria bat coronavirus (ZBCoV) after the city near which the bat was captured, represents a new subgroup of group 2 CoVs.
Total RNA extracts from gastrointestinal tract (GIT) specimens obtained from 33 bats of 6 different species (Eidolon helvum, Hipposideros commersoni, Pipistrellus sp., Rousettus aegyptiacus, Scotophilus nigrita, and Scotophilus leucogaster) captured at 2 different sites from a roost inside a cave in Nigeria (Fig. 1A) were screened for the presence of coronaviruses by consensus PCRs of a 400-nucleotide (nt) fragment of the RNA-dependent RNA polymerase (RdRp) gene. One specimen obtained from a Commerson’s leaf-nosed bat (Fig. 1B) yielded products that shared no more than 70% nt identity to any known coronavirus. RNA from ZBCoV was submitted for UHTS, resulting in a library comprising 74,133 sequence reads. Alignment of unique singleton and assembled contiguous sequences to the GenBank database (http://www.ncbi.nlm.nih.gov/) using the Basic Local Alignment Search Tool (Blastn and Blastx) (19) indicated coverage of approximately 6,500 nt of sequence distributed along coronavirus genome scaffolds and homology to regions of replicase, spike (S), and nucleocapsid (N) sequences.
The additional genomic sequence of ZBCoV was determined by filling in gaps between UHTS reads, applying consensus PCRs, and performing 3′ and 5′ rapid amplification of cDNA ends (RACE). Overlapping primer sets based on the draft genome were synthesized to facilitate sequence validation by conventional dideoxy sequencing. Due to exhaustion of the sample, we were unable to completely sequence the open reading frame 1ab (ORF 1ab) region (Fig. 2A).
ZBCoV has a genome organization similar to that of other coronaviruses, with the following characteristic gene order: 5′-replicase ORF 1ab-spike (S)-envelope (E)-membrane (M)-nucleocapsid (N)-3′. Both the 5′ and 3′ ends contain short untranslated regions of 297 nt and 363 nt, respectively. The conserved putative transcription regulatory sequence (TRS) motif 5′-ACGAAC-3′ identified in subgroup 2b, 2c, and 2d viruses (2) is present in ZBCoV at the 3′ end of the leader sequence and upstream of potential initiating methionine residues of each ORF except ORF 6 (Table 1).
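As an illustration of how occurrences of the TRS core motif 5′-ACGAAC-3′ could be located in a genome sequence, a minimal Python sketch follows. The function name find_trs and the toy sequence are hypothetical and are not taken from the ZBCoV genome; the annotation reported here also took into account the position of each motif relative to the leader sequence and to the initiating methionine of each ORF, which this sketch does not attempt.

```python
def find_trs(genome: str, motif: str = "ACGAAC") -> list[int]:
    """Return 1-based start positions of every occurrence of the motif.

    RNA input is accepted: U is converted to T before searching.
    Overlapping occurrences are reported.
    """
    genome = genome.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    positions = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = genome.find(motif, start + 1)
    return positions


# Hypothetical toy sequence (not ZBCoV) containing two copies of the motif.
toy_sequence = "GGACGAACTTTATGGCACGAACAAATG"
print(find_trs(toy_sequence))
```

Reporting positions as 1-based coordinates matches the convention generally used for genome annotation tables.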
All domains within replicase polyproteins of coronaviruses that are implicated in viral replication are found in ZBCoV, including the papain-like protease (PLpro), 3C-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), and helicase (Hel) domains (Fig. 2A). ORFs consistent with the S, E, M, and N proteins present in all other coronaviruses are also present in ZBCoV (Table 1; Fig. 2). Pairwise identity (I) and similarity (S) comparisons of a deduced amino acid sequence of ZBCoV to that of representative coronaviruses in other groups showed that the predicted proteins of ZBCoV are more similar to those of subgroup 2b CoVs than to those of other subgroups, with Hel and RdRp having the highest homologies (Hel: I, 80%; S, 90%; RdRp: I, 74%; S, 85%) and the S protein having the lowest (I, 36 to 38%; S, 50 to 53%) (http://cait.cumc.columbia.edu:88/dept/greeneidlab/IdentificationofaSARS-Coronavirus-likevirusinaleaf-nosedbatinNigeria.html).
The putative spike (S) protein of ZBCoV, at 1,299 amino acids (aa) in length, is slightly larger than those of other subgroup 2b CoVs (see Table S8 in the supplemental material). ZBCoV showed the highest amino acid conservation to human and civet SARS-CoV (I, 38%; S, 53%) (http://cait.cumc.columbia.edu:88/dept/greeneidlab/IdentificationofaSARS-Coronavirus-likevirusinaleaf-nosedbatinNigeria.html). Pfam (20) analysis identified a spike receptor binding domain (PF09408) that corresponds to the immunogenic receptor binding domain that binds to angiotensin-converting enzyme 2 (ACE2) and the coronavirus S1 (PF01600) and S2 (PF01601) spike glycoprotein domains. Transmembrane region prediction (TMHMM 2.0) (21) revealed a long ectodomain (aa 1 to 1240), a transmembrane domain near the C-terminal end (aa 1241 to 1263), and a short cytoplasmic tail (aa 1264 to 1298). A predicted signal peptide (SignalP 3.0) (P = 1) (22) was identified with a cleavage site (P = 0.768) between residues A16 and A17. NetNGlyc 1.0 identified 25 putative N-linked glycosylation sites. The S protein of ZBCoV displays major sequence differences compared to that of subgroup 2b CoVs, especially in the S1 domain involved in receptor binding. The critical residues suggested to be important for the cleavage of the SARS-CoV S protein are present in the S protein of ZBCoV (23–25) (see Fig. S1A in the supplemental material). Motifs at the carboxyl terminus of the S protein that are conserved among coronaviruses are also found in the ZBCoV S protein, including the conserved motif Y(X)KWPW(Y/W)(V/I)WL, present as Y1237EKWPWYIWL, and the cysteine-rich cytoplasmic tail (10) (see Fig. S1B in the supplemental material).
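The conserved carboxyl-terminal motif Y(X)KWPW(Y/W)(V/I)WL can be written as a regular expression, as in the following minimal sketch. The function name find_conserved_motif and the short test peptide are hypothetical; only the motif pattern and the ZBCoV instance YEKWPWYIWL come from the text above.

```python
import re

# Y(X)KWPW(Y/W)(V/I)WL: X is any residue, (Y/W) and (V/I) are alternatives.
CONSERVED_MOTIF = re.compile(r"Y.KWPW[YW][VI]WL")


def find_conserved_motif(protein: str):
    """Return (1-based start position, matched residues) or None if absent."""
    match = CONSERVED_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.group()


# Hypothetical peptide that embeds the ZBCoV form of the motif.
toy_peptide = "MFLAYEKWPWYIWLGF"
print(find_conserved_motif(toy_peptide))
```

Applied to the full-length S protein, the same pattern would be expected to report the motif starting at residue 1237, as stated above, but that sequence is not reproduced here.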
In addition to the five genes present in all genomes, coronaviruses also have several group-specific genes between the S gene and the 3′ end of the genome that encode accessory proteins (Fig. 2) (26, 27).
An ORF (ORF 3) encoding a putative 250-aa protein was observed between the S and E proteins of ZBCoV (Table 1). ORF 3 corresponds to the genomic position of ORF 3a in subgroup 2b CoVs. Similar to subgroup 2b CoVs, ORF 3 is the largest accessory gene of ZBCoV and is 75 nt shorter than ORF 3a of subgroup 2b CoVs (see Table S8 in the supplemental material). ORF 3 shows 21 to 23% aa identity and 31 to 35% aa similarity to the ORF 3a protein of subgroup 2b CoVs (see Table S9 in the supplemental material). Pfam analysis showed a relationship with PF11289, a viral family protein of an unknown function; TMHMM analysis predicts the presence of 4 transmembrane regions, spanning residues P43 to L65, A72 to E94, V99 to L121, and Y196 to V218. NetOGlyc 3.1 predicted two potential O glycosylation sites in ZBCoV. ORF 3 contains only a portion of the cysteine-rich domain identified in the ORF 3a protein of SARS-CoV; however, the cysteine potentially involved in ORF 3a protein polymerization (28) is present in ORF 3. No signal peptide, YXXΦ, or diacidic motifs were identified in ORF 3 of ZBCoV (29).
ZBCoV has a set of ORFs located between the M and N genes that are not shared by any of the known coronaviruses. These ORFs, ORF 6, ORF 7, and ORF 8, encode predicted proteins of 49, 79, and 218 aa, respectively (Table 1). A TRS was identified upstream of ORF 7 and ORF 8 but not ORF 6. ORF 6 overlaps with the M gene at the 3′ end by 101 nt, ORF 7 overlaps with ORF 6 by 31 nt, and ORF 8 overlaps with ORF 7 and the N gene by 83 and 35 nt, respectively. Blastx and Pfam analyses of ORF 6, ORF 7, and ORF 8 revealed no significant similarities or functional domains. Pfam analysis of ORF 7 indicated nonsignificant associations to the PRA1 (prenylated Rab acceptor 1) proteins (PF03208) (E value = 0.02) and the 7 transmembrane G-protein-coupled-receptor protein families (PF10323) (E value = 0.025). TMHMM analysis of ORF 7 suggested the presence of a transmembrane region between residues L10 and I32. No signal peptide was predicted.
TMHMM and SignalP analyses of ORF 6 indicated no transmembrane region or signal peptide. TMHMM analysis of ORF 8 predicted 2 transmembrane regions, and a third transmembrane region located downstream was predicted by TMpred (30). SignalP revealed a signal peptide (P = 0.988) with a putative cleaved signal sequence (P = 0.804) between residues G29 and A30.
At only 788 nt, the region in ZBCoV between the M and N genes is significantly shorter than those observed for subgroup 2b CoVs (see Table S8 in the supplemental material). Alignment of the region between the M and N genes of ZBCoV with those of subgroup 2b CoVs indicated large deletions in ZBCoV (see Fig. S2 in the supplemental material).
Another distinctive genomic feature of ZBCoV is the presence downstream from the N gene of two conserved motifs corresponding to the conserved stem-loop II motif (s2m) (31). A unique s2m is observed in coronaviruses from subgroups 2b, 3a, and 3c and in astroviruses and in the picornavirus equine rhinitis B virus (ERBV) (31–33) (see Fig. S3A in the supplemental material). Alignment of the 3′ end of ZBCoV with subgroup 2b CoVs showed deletions in the genome of subgroup 2b CoVs where the second s2m of ZBCoV is identified (see Fig. S3B). The s2m of ZBCoV are almost identical in sequence and are separated by 19 nt (see Fig. S3B). mfold prediction (34) of RNA secondary structure indicated that both s2m fold into RNA stem-loop motifs (see Fig. S3C).
Phylogenetic trees constructed from 3CLpro, RdRp, Hel, S, M, and N amino acid sequences of ZBCoV and representative coronaviruses show that ZBCoV is most closely related to but distinct from the subgroup 2b CoVs, which include SARS-CoV and SARS-CoV-like viruses (Fig. 3). This finding is in accord with results obtained from pairwise amino acid comparisons of ZBCoV and other coronaviruses (http://cait.cumc.columbia.edu:88/dept/greeneidlab/IdentificationofaSARS-Coronavirus-likevirusinaleaf-nosedbatinNigeria.html). To further define the phylogenetic position of ZBCoV, an additional phylogeny was constructed using a conserved 659-nt sequence of RdRp, and the time to the most recent common ancestor (TMRCA) between ZBCoV and related coronaviruses was estimated. Based on the best-fit model (SRD06 with informative rate prior), the results of this analysis indicated that ZBCoV is most closely related to GhanaBt-CoV, a recently identified coronavirus found in bats in Ghana (35) (Fig. 4). Furthermore, ZBCoV and GhanaBt-CoV together form a well-supported clade distinct from that of the subgroup 2b CoVs. The TMRCA between ZBCoV and GhanaBt-CoV was estimated at 1,417 years before present (ybp) (95% highest probability density [HPD] = 267 to 3,061 ybp). The TMRCA between the ZBCoV/GhanaBt-CoV clade and subgroup 2b CoVs was estimated at 3,047 ybp (95% HPD = 714 to 6,205 ybp), whereas the TMRCA between SARS-CoVs and SARS-CoV-like viruses was only 515 ybp (95% HPD = 132 to 1,067 ybp). Estimates of the TMRCAs between subgroup 2b CoVs and the rest of the coronavirus groups are not provided because nucleotide site saturation at deeper phylogenetic levels can produce artificially recent TMRCA estimates.
Whereas the mean pairwise nucleotide similarity of the partial RdRp gene region was 85% (standard deviation [SD] = 9.75) within coronavirus subgroups (excluding ZBCoV/GhanaBt-CoV), the mean pairwise similarity between coronavirus subgroups was 66% (SD = 5.14) (see Fig. S4 in the supplemental material). Based on the results of the Mann-Whitney U test, these distributions are statistically different (P < 0.0001). Additionally, whereas the mean pairwise similarity within the clade ZBCoV/GhanaBt-CoV was 85% (SD = 9.01), the pairwise similarity between the clade ZBCoV/GhanaBt-CoV and subgroup 2b CoVs was only 73% (SD = 0.84). Based on the results of the Mann-Whitney U test, these distributions are statistically different (P = 0.0092). Together, these findings indicate that the clade containing ZBCoV and GhanaBt-CoV should be considered a separate subgroup within group 2 CoVs, distinct from subgroup 2b CoVs (see Fig. S4 in the supplemental material).
Differences in phylogenetic relationships and genomic organization and the low amino acid similarities of ORF 3 and the S protein of ZBCoV compared to the ORF 3a and S proteins of subgroup 2b CoVs suggest that ZBCoV represents a new subgroup of coronaviruses within the group 2 CoVs. Although ZBCoV has features found in subgroup 2b CoVs, including the TRS, a unique PLPro, ORFs between the M and N genes, and the presence of the s2m, ZBCoV forms a unique branch distinct from subgroup 2b CoVs in all phylogenetic trees analyzed. Furthermore, it differs from subgroup 2b CoVs in that ZBCoV contains three (versus four to five) ORFs between the M and N genes and has two (versus one) s2m.
Whereas the S proteins of subgroup 2b CoVs share 78 to 98% aa sequence identity, the S protein of ZBCoV has only 36 to 38% identity in the deduced amino acid sequence with those of subgroup 2b CoVs. Despite limited primary sequence conservation of the spike protein among ZBCoV and subgroup 2b CoVs, particularly in the S1 domain, Pfam analyses indicated the presence of a receptor domain that binds to the receptor ACE2, the cellular receptor for SARS-CoV (36). However, the residues in SARS-CoV that interact with the human ACE2 molecule are not conserved in ZBCoV, suggesting that human ACE2 is not a bona fide receptor for ZBCoV (37).
ORF 3, located between the S and E proteins of ZBCoV, is slightly shorter than the 3a proteins of subgroup 2b CoVs and has at most only 22% aa identity to the 3a proteins of subgroup 2b CoVs. In contrast, the 3a proteins of subgroup 2b CoVs share 81 to 98% aa identity. ORF 3 is predicted to contain four transmembrane domains with extracellular N and C termini. In contrast, ORF 3a of SARS-CoV is predicted to contain three transmembrane domains with extracellular N termini and intracellular C termini (28, 29). Whereas four O glycosylation sites are predicted in the ORF 3a protein of SARS-CoV (38), only two putative O glycosylation sites were identified in the ORF 3 of ZBCoV. The 3a protein of SARS-CoV has a cysteine-rich region important for polymerization and ion channel activity (28), as well as YXXΦ and diacidic motifs suggested to be involved in intracellular trafficking (29). These domains were recently suggested to be important for the proapoptotic function of ORF 3a of SARS-CoV (39). However, ORF 3 of ZBCoV contains only a portion of the cysteine-rich domain and has no YXXΦ or diacidic motifs. In contrast to human and civet SARS-CoV and bat RF1/2004, there is no ORF 3b in ZBCoV. The 3b protein may function as an interferon antagonist (40).
ZBCoV contains a unique set of ORFs located between the M and N genes. In subgroup 2b CoVs, ORF 6, ORF 7, and ORF 8 between the M and N genes do not overlap. In contrast, the three ORFs between the M and N genes overlap in ZBCoV. Alignment with subgroup 2b CoVs indicated deletions in ZBCoV, and as a result, one continuous ORF, ORF 8, is present in ZBCoV in place of ORFs 7a, 7b, 8, 8a, and 8b of subgroup 2b CoVs.
Similar to SARS-CoV, the putative products of ORF 6, ORF 7, and ORF 8 of ZBCoV show no sequence homology to other viral proteins. No TRS upstream of ORF 6 is found, suggesting that if ORF 6 encodes a bona fide protein, that protein is likely expressed by the subgenomic RNA M. There is precedent in SARS-CoV for functional bicistronic RNAs in the expression of ORF 3b, ORF 7b, ORF 8b, and ORF 9b (26, 41). Coronaviruses possess accessory genes, the size and location of which are group specific (2). By analogy to SARS-CoV, ORF 6, ORF 7, and ORF 8 of ZBCoV may encode accessory proteins important for virus-host interactions that may contribute to virulence and pathogenesis (26). Recent studies suggest that the SARS-CoV accessory proteins 6 and 7b are incorporated into virus particles and that 3a, 7a, and 9b are structural components of the virion (26, 41, 42). The SARS-CoV accessory proteins are suggested to have biological functions that include virus release, interferon antagonism, apoptosis induction, and inhibition of cellular protein synthesis (26, 41).
Another unique feature of ZBCoV is the presence of two highly conserved RNA sequences (s2m) downstream of the N gene. A single s2m is identified at the 3′ end of the genomes of members of several RNA virus families, including the Coronaviridae and Astroviridae, as well as the picornavirus ERBV (31–33). Recent data suggest that the SARS-CoV s2m RNA is a functional molecular mimic of the 530 stem-loop region in small-subunit ribosomal RNA, which could facilitate viral hijacking of the host’s protein synthesis machinery (43). The presence of a second s2m in ZBCoV may further increase the efficiency of this process. Interestingly, secondary structures downstream of the N gene, including bulged stem-loop and pseudoknot structures, are also identified in the genomes of subgroup 2a and 2c CoVs (44, 45).
Lagos bat virus (family Rhabdoviridae, genus Lyssavirus) was initially identified in Nigeria in the 1950s. The discovery of ZBCoV in a bat of the genus Hipposideros (family Hipposideridae) is the first identification of a coronavirus in wildlife from Nigeria. Recently, bat coronaviruses closely related to ZBCoV were isolated from roundleaf bats (Hipposideros caffer and Hipposideros ruber) in Ghana, a country that is close to Nigeria (35). Phylogenetic analysis indicates that ZBCoV and GhanaBt-CoV form a unique clade that is distinct from those in subgroup 2b CoVs. However, as the only sequence available for GhanaBt-CoV is a fragment of the RdRp gene, a comparison of the genome organization between ZBCoV and GhanaBt-CoV is not possible. Our findings and recently published data, wherein a SARS-CoV-like virus was found to lack ORF 8, suggest that there is considerable diversity in the genome organization of SARS-CoV-like viruses (46).
SARS-CoV-like viruses have been isolated from various rhinolophid bats (family Rhinolophidae, genus Rhinolophus), common insectivorous bats found in Africa and Eurasia. However, despite extensive studies, no SARS-CoV-like viruses have been reported in Hipposideros sp. bats in China (32). The Rhinolophus species suggested as reservoirs of SARS-CoV-like viruses are not present in Africa. A sequence fragment of a SARS-CoV-like virus was identified in Kenya in bats of the Chaerephon genus (family Molossidae) (47), and antibodies reactive with SARS-CoV antigen have also been detected in the sera of seven different genera of insectivorous and fruit bats sampled in central and southern Africa (48). In concert, these findings suggest that there may be no strict species-specific host restriction of SARS-CoV-like viruses in African bats.
Our phylogenetic analysis indicates that the clade containing ZBCoV and GhanaBt-CoV occupies an ancestral position to the group 2b CoVs, which include SARS-CoV and SARS-CoV-like viruses. Similar to previous estimates, the TMRCA of these two clades was estimated at ~3,047 ybp (although with large 95% HPDs). Although SARS-CoV-like viruses have been identified exclusively in bats in China, a recent sequence fragment (~120 bp) recovered from a Kenyan bat was found to occupy a position just outside subgroup 2b and may represent the ancestral African lineage of all subgroup 2b CoVs (47). Together with the position of the African clade of ZBCoV/GhanaBt-CoV relative to subgroup 2b CoVs, this finding suggests that a migration event from Africa to China within the last 100 to 1,000 years may have resulted in the subgroup 2b lineage of CoVs. Indeed, the geographic distribution and the phylogenetic relationships of bat coronaviruses seen both here (Fig. 4) and in previous work (35) suggest the presence of multiple independent migration events between Africa and Asia throughout the history of bat coronaviruses. Additional sequence data for the bat coronaviruses identified in Kenya along with increased sampling for coronaviruses in Africa as well as central and eastern Asia will likely be necessary to unveil the timing and origin of this diverse group of coronaviruses.
Bats are important reservoir hosts of zoonotic viruses with significant impact on human health, including rabies virus, Nipah virus, Hendra virus, Zaire Ebola virus, Marburg virus, and SARS-CoV. The wide genetic diversity that exists among zoonotic viruses in bats may increase the potential for the emergence of interspecies variants that cause outbreaks of disease in humans and domestic animals. The giant leaf-nosed bat, Hipposideros commersoni, is widespread in sub-Saharan Africa, from Gambia to Ethiopia, Mozambique, and Madagascar, but little is known concerning its ecology, population biology, or vector competence. Clearly, enhancing our knowledge of the diversity and cooccurrence of potential reservoir hosts is essential to better understand emerging pathogen dynamics and their public health relevance as a means to prevent and control future disease outbreaks.
During June 2008, bats were collected with mist netting in caves and around human dwellings or manually from roost locations near Idanre and Zaria, Nigeria. All bats appeared clinically normal. Captured bats were anesthetized by intramuscular inoculation with ketamine hydrochloride (0.05 to 0.1 mg/g of body weight) and euthanized under sedation by intracardiac exsanguination and cervical dislocation. The species of each captured bat was recorded, as well as the sex, forearm and body lengths (in cm), and weight. All samples were initially stored, transported on ice packs, and stored thereafter at −20°C, until shipment on dry ice and final storage at −80°C. No lyssavirus-specific antigens were identified in bat brains by use of direct fluorescent antibody testing.
Coronavirus screening was performed by nested PCR, amplifying a 400-nt fragment of the RdRp genes of coronaviruses using consensus primer sequences 5′-CGTTGGIACWAAYBTVCCWYTICARBTRGG-3′ and 5′-GGTCATKATAGCRTCAVMASWWGCNACNACATG-3′ for the first PCR and consensus primer sequences 5′-GGCWCCWCCHGGNGARCAATT-3′ and 5′-GGWAWCCCCAYTGYTGWAYRTC-3′ for the second PCR. Primers were designed by multiple alignments of the nucleotide sequences of available RdRp genes of known coronaviruses. Reverse transcription was performed using the SuperScript III kit (Invitrogen, San Diego, CA). PCR primers were applied at 0.2-µM concentrations with 1 µl cDNA and HotStar polymerase (Qiagen, Valencia, CA). Cycle conditions used were as follows: 1 cycle at 95°C for 15 min; 15 cycles at 95°C for 30 s, 65°C for 30 s (−1°C/cycle), and 72°C for 45 s; 35 cycles at 94°C for 30 s, 50°C for 30 s, and 72°C for 45 s; and 1 cycle at 72°C for 5 min.
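The consensus primers above are written with IUPAC ambiguity codes, so each primer stands for a pool of distinct oligonucleotides. The following minimal sketch computes the size of that pool and builds a regular expression for checking whether a target site is covered. The helper names degeneracy and primer_regex are hypothetical, and treating inosine (I) as equivalent to N is an assumption made for this sketch, not a statement from the text.

```python
import re
from math import prod

# IUPAC nucleotide ambiguity codes.
# ASSUMPTION: "I" (inosine) is treated as pairing with any base, like N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "I": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def primer_regex(primer: str) -> re.Pattern:
    """Compile a regular expression matching every sequence the primer covers."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


# Second-round (nested) primers quoted in the text.
nested_forward = "GGCWCCWCCHGGNGARCAATT"
nested_reverse = "GGWAWCCCCAYTGYTGWAYRTC"
print(degeneracy(nested_forward), degeneracy(nested_reverse))
```

A sequence with no ambiguity codes can then be tested with primer_regex(nested_forward).fullmatch(site); mismatches tolerated during real annealing are not modeled.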
Total RNA obtained from the gastrointestinal tract specimen positive for coronavirus was extracted for UHTS. Purified RNA (0.5 µg) was DNase I digested (DNA-free; Ambion, Austin, TX) and reverse transcribed using a Superscript II kit (Invitrogen) with random octamer primers linked to an arbitrary, defined 17-mer primer sequence (MWG, Huntsville, AL). cDNA was RNase H treated prior to random amplification by PCR, applying a 9:1 dilution mixture of a primer corresponding to the defined 17-mer sequence and the octamer-linked 17-mer sequence primer, respectively. Products of >70 bp were purified (MinElute; Qiagen) and ligated to linkers for sequencing on a GS FLX sequencer (454 Life Sciences, Branford, CT).
PCR primers for amplification across sequence gaps were designed (available upon request) based on the UHTS data, and the draft genome was sequenced by overlapping PCR products. Products were purified (QIAquick PCR purification kit; Qiagen) and directly dideoxy sequenced in both directions with ABI Prism BigDye Terminator 1.1 cycle sequencing kits (PerkinElmer Applied Biosystems, Foster City, CA). Additional methods applied to obtain the genome sequence included additional consensus PCR and 3′ and 5′ RACE (Invitrogen).
Alignments were constructed using MUSCLE 3.7 (49) and adjusted manually using Se-Al (50). Maximum likelihood (ML) phylogenetic trees containing representative taxa from each coronavirus genus (n = 31) (Fig. 3, legend) were constructed using the subtree pruning and regrafting (SPR) method of branch swapping in PhyML (51). Phylogenies were constructed using amino acid alignments for the complete proteins of 3CL, Hel, M, and N and partial protein alignments for the available RdRp protein sequence and for the S protein after regions with low alignment confidence were removed. In all cases, the Whelan and Goldman model of amino acid replacement was used (52), with a gamma distribution of rate heterogeneity. The value of the shape parameter for gamma (α) was estimated from the data and approximated by six rate categories. The reliability of each branch in all phylogenies was estimated using a bootstrap resampling procedure, with 100 ML replications.
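The bootstrap procedure mentioned above resamples alignment columns with replacement to create pseudoreplicate alignments, each of which is then used to rebuild a tree. The sketch below illustrates only the resampling step; the function name bootstrap_alignment and the toy alignment are hypothetical, and in this study the tree inference on each replicate was handled by PhyML, which is not reproduced here.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap pseudoreplicate).

    The same sampled column indices are applied to every sequence, so each
    resampled column is an intact column of the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical toy amino acid alignment (not real coronavirus data).
toy_alignment = {"taxonA": "MKV-LT", "taxonB": "MRVALT", "taxonC": "MKI-LS"}
rng = random.Random(0)  # fixed seed so that the replicates are reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

Branch support is then the percentage of replicate trees that contain a given branch.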
To estimate the time to the most recent common ancestor (TMRCA) for the taxa contained within subgroup 2b CoVs and including ZBCoV, an additional 659-nt alignment of the RdRp gene was constructed and chosen for homology to the gene region sequenced for the coronaviruses most closely related to ZBCoV (GhanaBt-CoV). All sequences for which time-of-sampling information was available were included (n = 64). TMRCAs were estimated using the Bayesian Markov chain Monte Carlo (MCMC) method with the BEAST package, version 1.5.2 (53), and both the general time-reversible (GTR) model plus Γ distribution and the SRD06 model of nucleotide substitution. A relaxed uncorrelated lognormal molecular clock was used, calibrated by the time-stamped sequences, both with and without an informative rate prior on the molecular clock of 2.0 × 10⁻⁴ ± 0.0009 nt substitutions/site/year (35). This analysis was run until all parameters converged, with 10% of the MCMC chains discarded as burn-in. Statistical confidence in the TMRCA estimates is given by the 95% highest probability density (HPD) interval around the marginal posterior parameter mean.
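For readers unfamiliar with HPD intervals, the sketch below shows one common way to compute such an interval from posterior samples: the narrowest window that contains the requested fraction of the sorted samples. The function name hpd_interval and the sample values are hypothetical; this is not the BEAST implementation, and the approach assumes a unimodal posterior.

```python
import math


def hpd_interval(samples, mass: float = 0.95):
    """Return (lower, upper) of the narrowest interval holding `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Number of samples the interval must contain; the small tolerance
    # guards against floating-point error in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


# Hypothetical, strongly skewed posterior sample (arbitrary units).
toy_samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(hpd_interval(toy_samples, mass=0.9))
```

For a skewed posterior, as is typical of TMRCA estimates, the HPD interval is not symmetric about the mean, which is consistent with the asymmetric ranges reported for the TMRCAs in this study.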
The classification of ZBCoV and GhanaBt-CoV as a putative new subgroup within group 2 CoVs was determined by first calculating the percent pairwise nucleotide similarity of the same 659-nt region of RdRp genes between and within the existing subgroups of coronaviruses and then extending this comparison to include the clade ZBCoV/GhanaBt-CoV. To verify this approach, a nonparametric Mann-Whitney U test was used to assess if the pairwise nucleotide similarity within the currently accepted subgroups is different from that between subgroups. This test was then used to determine if the percent pairwise similarity within the clade ZBCoV/GhanaBt-CoV is statistically different from that of the most closely related subgroup 2b CoVs.
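The comparison of within-group and between-group similarity distributions can be sketched as follows. The function names and the similarity values are hypothetical, and the text does not state which variant of the test was used; this sketch computes the U statistics with midranks for ties and a two-sided P value from the normal approximation without tie or continuity correction, which is only one of several possibilities.

```python
import math


def midranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided P value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u_x - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, p_value


# Hypothetical percent pairwise similarities (not the values from this study).
within_subgroup = [85.0, 90.0, 95.0]
between_subgroup = [60.0, 65.0, 70.0]
print(mann_whitney_u(within_subgroup, between_subgroup))
```

With samples as small as this toy example, an exact test would be preferable to the normal approximation.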
Protein family analysis was performed using Pfam (http://pfam.sanger.ac.uk/). Predictions of signal peptide cleavage sites, glycosylation sites, and transmembrane domains were performed using respective prediction servers available at the Center for Biological Sequence Analysis (http://www.cbs.dtu.dk/services/ and http://www.ch.embnet.org/software/TMPRED_form.html). The percent amino acid sequence identity and similarity were calculated using the Needleman-Wunsch algorithm with an EBLOSUM62 substitution matrix (gap open/extension penalties of 10/0.1 for nucleotide and amino acid alignments; EMBOSS), using a Perl script to iterate the process for all-versus-all comparisons. Prediction of RNA secondary structures was performed with the mfold program (http://mfold.bioinfo.rpi.edu/).
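The all-versus-all identity calculation can be illustrated with a simplified global alignment. The sketch below is not the EMBOSS implementation: it assumes a flat match/mismatch score and a linear gap penalty in place of the EBLOSUM62 matrix and the 10/0.1 gap open/extension penalties, and the function names and toy sequences are hypothetical. Identity is expressed over the alignment length, including gapped columns.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with linear gap penalty; returns the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Percent identical columns over the full alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    matches = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q)
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy sequences; all-versus-all comparison of every pair.
toy = {"seq1": "ACGT", "seq2": "ACT", "seq3": "ACGT"}
for (name_a, seq_a), (name_b, seq_b) in combinations(toy.items(), 2):
    print(name_a, name_b, percent_identity(seq_a, seq_b))
```

Similarity, as opposed to identity, would additionally count columns whose substitution-matrix score is positive, which requires the full matrix and is omitted here.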
The GenBank accession number for the ZBCoV sequence is HQ166910.
We thank J. D. Kirby (U.S. Department of Agriculture); E. Ajoke, S. Wuyah, M. Lawal, and others of the staff of the Department of Veterinary Surgery and Medicine (Ahmadu Bello University [ABU], Zaria, Nigeria); the Vice Chancellor and Management of ABU; the Federal Ministry of Health (Abuja, Nigeria); the King and Chiefs of the Idanre community, Ondo State, Nigeria, for their helpful comments and assistance with logistics; and I. Kuzmin for the photograph of the Commerson’s leaf-nosed bat. We also thank D. Palmer (Rabies Program, Centers for Disease Control and Prevention [CDC], Atlanta, GA); Robert Serge (Center for Infection and Immunity, Columbia University, New York, NY) for statistical assistance; and Charles H. Calisher, Colorado State University, and Eric Brouzes for editorial comments.
This work was supported by National Institutes of Health grants AI051292 and AI57158 (Northeast Biodefense Center; to W. I. Lipkin), a National Institute of Allergy and Infectious Diseases grant (5R01AI079231-02), a U.S. Agency for International Development grant (PREDICT grant GHNA 0009 0001 000), and an award from the U.S. Department of Defense.
Citation: Quan, P.-L., C. Firth, C. Street, J. A. Henriquez, A. Petrosov, et al. 2010. Identification of a severe acute respiratory syndrome coronavirus-like virus in a leaf-nosed bat in Nigeria. mBio 1(4):e00208-10. doi:10.1128/mBio.00208-10.