The genome of CoV-HKU1 is a 29,926-nucleotide, polyadenylated RNA. The G+C content is 32%, the lowest among all known coronaviruses with available genome sequences (Table ). The genome organization is the same as that of other coronaviruses, with the characteristic gene order 5′-replicase, spike (S), envelope (E), membrane (M), nucleocapsid (N)-3′. Both the 5′ and 3′ ends contain short untranslated regions. The 5′ end of the genome consists of a putative 5′ leader sequence (17). A putative transcription regulatory sequence (TRS) motif, 5′-AAUCUAAAC-3′ (as in MHV and BCoV), or alternatively, 5′-UAAAUCUAAAC-3′, was found at the 3′ end of the leader sequence and precedes each translated ORF except ORF5 (Table ). As in SDAV and MHV, ORF5, which encodes the putative E protein, may share the same TRS with ORF4, suggesting that the translation of the E protein is cap independent, possibly via an internal ribosomal entry site (IRES) (34). A stretch of 13 nucleotides, AUUUAUUGUUUGG (similar to the IRES element, UUUUAUUCUUUUU, in MHV), is present upstream of the initiation codon of the E protein in CoV-HKU1 (12). Further experiments are needed to determine whether this sequence acts as an IRES for this ORF and whether 5′-UAAAUCUAAAC-3′ or 5′-AAUCUAAAC-3′ is the real TRS for CoV-HKU1. Of note, 5′-AAUCUAAAC-3′ and 5′-UAAAUCUAAAC-3′ are also observed at nucleotide positions 19528 and 22518 of the genome, respectively, neither of which precedes an ORF of obvious significance. Analysis of more CoV-HKU1 genomes would reveal whether this is a consistent feature and whether it plays a role in recombination of the CoV-HKU1 genome. The 3′ untranslated region contains a predicted bulged stem-loop structure 2 to 66 nucleotides downstream of the N gene (nucleotide position 29647 to 29711). This bulged stem-loop structure is conserved in group 2 coronaviruses (8). Downstream of the bulged stem-loop structure, 63 to 115 nucleotides downstream of the N gene (nucleotide position 29708 to 29760), a pseudoknot structure is present. This pseudoknot structure is conserved among coronaviruses and plays a role in coronavirus RNA replication (42).
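The motif search described above can be illustrated with a minimal Python sketch. It is not the method used in this study; the function names and the toy sequence are hypothetical and serve only to show how overlapping occurrences of the two candidate TRS motifs, and the G+C content, could be computed from an RNA sequence.

```python
import re

# Candidate TRS motifs quoted in the text (RNA alphabet).
TRS_SHORT = "AAUCUAAAC"
TRS_LONG = "UAAAUCUAAAC"


def find_motif(genome: str, motif: str) -> list[int]:
    """Return 1-based start positions of every match, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() + 1 for m in pattern.finditer(genome.upper())]


def gc_content(genome: str) -> float:
    """Fraction of G and C residues in the sequence."""
    seq = genome.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy sequence for illustration only; it is NOT the CoV-HKU1 genome.
toy = "GGUAAAUCUAAACUUUAUGGCAAAUCUAAACAUG"

short_hits = find_motif(toy, TRS_SHORT)  # short motif occurs twice in the toy sequence
long_hits = find_motif(toy, TRS_LONG)    # long motif occurs once
```

Because the long motif contains the short one, every hit for the long motif implies a hit for the short motif two nucleotides downstream, which is why the two candidates cannot be distinguished by position alone.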
Comparison of genomic features of CoV-HKU1 and other coronaviruses and amino acid identities
Coding potential and putative transcription regulatory sequences of the CoV-HKU1 genome sequence
The coding potential of the CoV-HKU1 genome is shown in Fig. and Table , and the phylogenetic analysis of the chymotrypsin-like protease (3CLpro), Pol, helicase, hemagglutinin-esterase (HE), S, E, M, and N is shown in Fig. .
FIG. 1. Genome organization of CoV-HKU1. Overall organization of the 29,926-nucleotide CoV-HKU1 genomic RNA. Predicted ORFs 1a and 1b, encoding the nonstructural polyproteins (p28, p65, and nsp1 to -13) and those encoding the hemagglutinin-esterase, spike, envelope, ...
FIG. 2. Phylogenetic analysis of chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (Pol), helicase, hemagglutinin-esterase (HE), spike (S), envelope (E), membrane (M), and nucleocapsid (N) of CoV-HKU1. The trees were constructed by the neighbor-joining ...
The replicase 1a ORF (nucleotide position 206 to 13600) and replicase 1b ORF (nucleotide position 13600 to 21753) occupy 21.5 kb of the CoV-HKU1 genome. As in other coronaviruses, a frameshift interrupts the protein-coding region and separates ORFs 1a and 1b. Together, these ORFs encode a number of putative proteins, including nsp1 (which contains the putative papain-like proteases), nsp2 (the putative 3CLpro), nsp9 (the putative Pol), nsp10 (the putative helicase), and other proteins of unknown function. These proteins are produced by proteolytic cleavage of the large replicase polyprotein. The arrangement of the resulting putative proteins is the same as that in the MHV genome (Fig. ). This polyprotein is synthesized by a −1 ribosomal frameshift at a conserved site (UUUAAAC) upstream of a pseudoknot structure at the junction of ORF 1a and ORF 1b. This ribosomal frameshift would result in a polyprotein of 7,182 amino acids, which has 75 to 77% amino acid identities with the polyproteins of other group 2 coronaviruses and 43 to 47% amino acid identities with the polyproteins of non-group 2 coronaviruses. The Pol of CoV-HKU1, with 928 amino acids, has 87 to 90% amino acid identities with the Pol of other group 2 coronaviruses and 54 to 65% amino acid identities with the Pol of non-group 2 coronaviruses (Table and Fig. ). The catalytic histidine and cysteine amino acid residues, conserved among the 3CLpro in all coronaviruses, are present in the predicted 3CLpro of CoV-HKU1 (amino acids His3375 of ORF 1a). nsp1, which corresponds to p210 in MHV, contains two papain-like proteases (PLpro). In the N terminus of nsp1 (amino acid residues 945 to 1104 of ORF 1a), there are 14 tandem copies of a 30-base repeat which encodes NDDEDVVTGD, followed by two 30-base regions that encode NNDEEIVTGD and NDDQIVVTGD, located inside the acidic domain upstream of PL1pro (Fig. ). This acidic tandem repeat (ATR) is not observed in other coronaviruses. The presence of this ATR was confirmed by sequencing the corresponding part of the genome from two NPAs collected 1 week apart. The presence of the repeat does not result in a marked change in the isoelectric point of the acidic domain (3.31 in CoV-HKU1 versus 3.92 in MHV) or the predicted secondary structure (random coil in both CoV-HKU1 and MHV). Moreover, the characteristic amino acid residues for proteolytic cleavage by the two PLpro, which were determined by mutagenesis studies and are located at the junctions of p28/p65, p65/nsp1, and nsp1/nsp2 in MHV, are all present in the corresponding positions in CoV-HKU1 (13). Furthermore, the zinc finger domain proposed to possess nonproteolytic activity in other coronaviruses is also present in PL1pro of CoV-HKU1 (10).
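As an illustrative consistency check on the ATR description (a sketch only, not part of the original analysis), the repeat region can be assembled from the three decapeptides quoted in the text and compared with the stated coordinates. The helper name `count_tandem_copies` is hypothetical.

```python
# Decapeptides quoted in the text; each is encoded by one 30-base unit.
UNIT = "NDDEDVVTGD"
VARIANTS = ["NNDEEIVTGD", "NDDQIVVTGD"]


def count_tandem_copies(protein: str, unit: str, start: int = 0) -> int:
    """Count back-to-back copies of `unit` beginning at 0-based index `start`."""
    copies = 0
    while protein.startswith(unit, start + copies * len(unit)):
        copies += 1
    return copies


# Region as described: 14 perfect copies followed by the two variant units.
atr = UNIT * 14 + "".join(VARIANTS)

n_perfect = count_tandem_copies(atr, UNIT)   # number of perfect copies at the start
region_residues = 1104 - 945 + 1             # residues 945 to 1104, inclusive
region_bases = 3 * len(atr)                  # coding length of the whole region
```

Under this reading, the 16 ten-residue units account for all 160 residues between positions 945 and 1104 of ORF 1a, so the stated coordinates and the repeat count agree.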
FIG. 3. Arrangements of proteins in replicase polyprotein in HKU1 compared with those in HCoV-OC43, BCoV, and MHV. Alignment of the AC domains of HCoV-OC43, BCoV, and MHV and the AC domains and ATR (underlined) of CoV-HKU1 in the two patients was generated with ...
ORF 2 (nucleotide position 21773 to 22933) encodes the predicted HE glycoprotein with 386 amino acids. HE is present in group 2 coronaviruses and influenza C virus. The HE of CoV-HKU1 has 50 to 57% amino acid identities with the HE of other group 2 coronaviruses (Table and Fig. ). PFAM and InterProScan analysis of the ORF shows that amino acid residues 1 to 349 of the predicted protein constitute a member of the hemagglutinin esterase family (PFAM accession no. PF03996 and INTERPRO accession no. IPR007142). Furthermore, PFAM and InterProScan analysis shows that amino acid residues 122 to 236 of the predicted protein constitute the hemagglutinin domain of the HE fusion glycoprotein family (PFAM accession no. PF02710 and INTERPRO accession no. IPR003860). SignalP analysis reveals a signal peptide probability of 0.738, with a cleavage site between residues 13 and 14. Although TMpred and TMHMM analysis of the ORF shows four and three transmembrane domains, respectively, PHDhtm analysis shows only one transmembrane domain, at positions 354 to 376. This concurs with the single transmembrane region reported in the C terminus of the HE of BCoV and puffinosis virus (14). PrositeScan analysis of the HE protein of CoV-HKU1 reveals eight potential N-linked glycosylation (six NXS and two NXT) sites. These are located at positions 83 (NYT), 110 (NGS), 145 (NVS), 168 (NYS), 193 (NFS), 286 (NSS), 314 (NVS), and 328 (NFT). The putative active site for neuraminate O-acetyl-esterase activity, FGDS, is located at positions 31 to 34 (39). In BCoV, one study showed that HE is required for viral replication (38), whereas another found that it is not essential for viral infection under some specific experimental conditions (26). In MHV, the expression of HE is heterogeneous, depending on the number of copies of UCUAA in the leader sequence and the presence of an initiation codon, an upstream promoter, and a complete ORF with a C-terminal transmembrane anchor (49), and appears to be related to central nervous system tropism (50). In CoV-HKU1, the initiation codon and a complete ORF are present. Since the HE of CoV-HKU1 is quite distantly related to the HE of MHV and BCoV/HCoV-OC43 (Fig. ), further experiments are needed to determine the essentiality and function of HE in CoV-HKU1.
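The N-linked glycosylation scan can be approximated with a short Python sketch. It is a simplified stand-in for PrositeScan, not the tool itself: it assumes the basic N-X-S/T sequon with X not proline and ignores any further constraints the PROSITE pattern may apply. The function name and the toy peptide are hypothetical; the site triplets are the ones listed in the text.

```python
import re
from collections import Counter

# Simplified sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of Asn, tripeptide) for each candidate sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide (not the HE sequence): NPS is skipped because X is Pro.
toy_hits = find_sequons("MKNYTLLNPSANGSA")

# Sites reported in the text for the HE protein, classified by third residue.
reported = {83: "NYT", 110: "NGS", 145: "NVS", 168: "NYS",
            193: "NFS", 286: "NSS", 314: "NVS", 328: "NFT"}
by_type = Counter("NXS" if tri.endswith("S") else "NXT" for tri in reported.values())
```

Classifying the eight reported tripeptides this way reproduces the stated split of six NXS and two NXT sites.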
ORF 3 (nucleotide position 22942 to 27012) encodes the predicted S glycoprotein (PFAM accession no. PF01601) with 1,356 amino acids. The S protein of CoV-HKU1 has 60 to 61% amino acid identities with the S proteins of other group 2 coronaviruses but less than 35% amino acid identities with the S proteins of non-group 2 coronaviruses (Table and Fig. ). InterProScan analysis predicts it as a type I membrane glycoprotein. Important features of the S protein of CoV-HKU1 are depicted in Fig. . PrositeScan of the S protein of CoV-HKU1 revealed 28 potential N-linked glycosylation (12 NXS and 16 NXT) sites. SignalP analysis revealed a signal peptide probability of 0.909, with a cleavage site between residues 13 and 14. By multiple alignments with the S proteins of other group 2 coronaviruses, a potential cleavage site located after RRKRR, between residues 760 and 761, where S would be cleaved into S1 and S2, was identified. Immediately upstream of RRKRR, there is a series of five serine residues that are not present in any other known coronavirus (Fig. ). Most of the S protein (residues 15 to 1300) is exposed on the outside of the virus, with a transmembrane domain at the C terminus (TMHMM analysis of the ORF shows one transmembrane domain at positions 1301 to 1356), followed by a cytoplasmic tail rich in cysteine residues. Two heptad repeats, located at residues 982 to 1083 (HR1) and 1250 to 1297 (HR2), identified by multiple alignments with other coronaviruses, are present. The receptors for S protein binding in MHV and HCoV-OC43 are CEACAM1 and sialic acid, respectively (15). While the three conserved regions (sites I, II, and III) and amino acid residues (Thr62, and Tyr216) in the N terminus of the MHV S protein important for receptor-binding activity (33) are present in CoV-HKU1 (Fig. ), the amino acid residues on the S protein of HCoV-OC43 that are important for receptor binding are not well defined. Further experiments should be performed to delineate the receptor for CoV-HKU1.
FIG. 4. Spike protein of CoV-HKU1. The spike protein (1,356 amino acids) of CoV-HKU1 is depicted by the horizontal bar. SS, N terminal signal sequence (amino acid residues 1 to 13); HR1, heptad repeat 1 (amino acid residues 982 to 1083); HR2, heptad repeat 2 ...
ORF 4 (nucleotide position 27051 to 27380) encodes a predicted protein with 109 amino acids. This ORF overlaps with the ORF that encodes the E protein. PFAM analysis of the ORF shows that the predicted protein is a member of the coronavirus nonstructural protein NS2 family (PFAM accession no. PF04753). TMpred and TMHMM analysis does not reveal any transmembrane helix. This predicted protein of CoV-HKU1 has 44 to 51% amino acid identities with the corresponding proteins of other group 2 coronaviruses.
ORF 5 (nucleotide position 27373 to 27621) encodes the predicted E protein with 82 amino acids. The E protein of CoV-HKU1 has 54 to 60% amino acid identities with the E proteins of other group 2 coronaviruses but less than 35% amino acid identities with the E proteins of non-group 2 coronaviruses (Table and Fig. ). PFAM and InterProScan analysis of the ORF shows that the predicted E protein is a member of the nonstructural protein NS3/small envelope protein E family (PFAM accession no. PF02723). SignalP analysis predicts the presence of a transmembrane anchor (probability 0.995). TMpred analysis of the ORF shows two transmembrane domains at positions 16 to 34 and 39 to 59, and TMHMM analysis of the ORF shows two transmembrane domains at positions 10 to 32 and 39 to 58, consistent with the anticipated association of the E protein with the viral envelope.
ORF 6 (nucleotide position 27633 to 28304) encodes the predicted M protein with 223 amino acids. The M protein of CoV-HKU1 has 76 to 84% amino acid identities with the M proteins of other group 2 coronaviruses but less than 40% amino acid identities with the M proteins of non-group 2 coronaviruses (Table and Fig. ). PFAM analysis of the ORF shows that the predicted M protein is a member of the coronavirus matrix glycoprotein family (PFAM accession no. PF01635). SignalP analysis predicts the presence of a transmembrane anchor (probability, 0.926). TMpred analysis of the ORF shows three transmembrane domains at positions 21 to 42, 53 to 74, and 77 to 98. TMHMM analysis of the ORF shows three transmembrane domains at positions 20 to 39, 46 to 68, and 78 to 100. The N-terminal 19 to 20 amino acids are located on the outside, and the C-terminal 123- to 125-amino-acid hydrophilic domain is located on the inside of the virus.
ORF 7 (nucleotide position 28320 to 29645) encodes the predicted N protein (PFAM accession no. PF00937) with 441 amino acids. The N protein of CoV-HKU1 has 57 to 68% amino acid identities with the N proteins of other group 2 coronaviruses but less than 40% amino acid identities with the N proteins of non-group 2 coronaviruses (Table and Fig. ).
ORF 8 (nucleotide position 28342 to 28959) encodes a hypothetical protein (N2) of 205 amino acids within the ORF that encodes the predicted N protein. PFAM analysis of the ORF shows that the predicted protein is a member of the coronavirus nucleocapsid I protein family (PFAM accession no. PF03187). This hypothetical N2 protein of CoV-HKU1 has 32 to 39% amino acid identities with the N2 proteins of other group 2 coronaviruses. This protein has been shown to be nonessential for viral replication in MHV (5).
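The nucleotide coordinates and protein lengths reported for ORFs 2 to 8 can be cross-checked with simple arithmetic. The sketch below assumes, as an interpretation not stated in the text, that each coordinate range is inclusive and ends with the stop codon; the function and dictionary names are hypothetical.

```python
# (start, end, reported amino acids) for each ORF, as given in the text.
ORFS = {
    "ORF 2 (HE)": (21773, 22933, 386),
    "ORF 3 (S)": (22942, 27012, 1356),
    "ORF 4": (27051, 27380, 109),
    "ORF 5 (E)": (27373, 27621, 82),
    "ORF 6 (M)": (27633, 28304, 223),
    "ORF 7 (N)": (28320, 29645, 441),
    "ORF 8 (N2)": (28342, 28959, 205),
}


def predicted_length(start: int, end: int) -> int:
    """Amino acids encoded by an inclusive range that ends with a stop codon."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3 - 1  # assumption: the last codon is the stop codon


mismatches = {
    name: (predicted_length(start, end), reported)
    for name, (start, end, reported) in ORFS.items()
    if predicted_length(start, end) != reported
}
```

Under that assumption, every reported protein length matches its coordinates, so `mismatches` is empty. The same coordinates also show the overlaps noted in the text: ORF 5 begins at position 27373, before ORF 4 ends at 27380, and ORF 8 lies entirely within ORF 7.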
ELISA using recombinant N protein of CoV-HKU1.
An ELISA-based antibody test was developed with this recombinant N protein for the detection of specific antibodies against this protein. Box titration was carried out with serial dilutions of recombinant N protein coating antigen (on one axis) and serum (on the other axis) obtained during the fourth week of the patient's illness. The results identified 20 and 80 ng of purified recombinant N protein per well as the ideal amounts for plate coating and 1:1,000 and 1:20 as the optimal serum dilutions for IgG and IgM detection, respectively.
To establish the baseline for the ELISA tests, serum samples (diluted at 1:1,000 and 1:20 for IgG and IgM, respectively) from 100 healthy blood donors were tested. The mean ELISA optical densities at 450 nm for IgG and IgM detection were 0.178 and 0.224, with standard deviations of 0.070 and 0.117, respectively. Absorbance values of 0.387 and 0.576 were selected as the cutoff values (means plus three standard deviations) for IgG and IgM, respectively. Using these cutoffs, the titers for IgG of the patient's sera obtained during the first, second, and fourth weeks of the illness were <1:1,000, 1:2,000, and 1:8,000, respectively, and those for IgM were 1:20, 1:40, and 1:80, respectively (Fig. ).
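The cutoff rule (mean plus three standard deviations) can be written out as a short Python sketch. The function names are hypothetical, and the inputs are the rounded means and standard deviations quoted above, not the underlying donor data.

```python
def elisa_cutoff(mean_od: float, sd_od: float, k: float = 3.0) -> float:
    """Cutoff defined as the donor mean plus k standard deviations."""
    return mean_od + k * sd_od


def is_positive(od: float, cutoff: float) -> bool:
    """A sample is scored positive when its optical density exceeds the cutoff."""
    return od > cutoff


# Rounded summary statistics reported for the 100 healthy blood donors.
igg_cutoff = elisa_cutoff(0.178, 0.070)  # about 0.388
igm_cutoff = elisa_cutoff(0.224, 0.117)  # about 0.575
```

Recomputing from the rounded summary statistics gives approximately 0.388 and 0.575, within 0.001 of the reported cutoffs of 0.387 and 0.576; the small differences presumably reflect rounding of the means and standard deviations before publication.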