Members of the species Human adenovirus C (HAdV-C) are a common cause of respiratory infections and can occasionally produce severe clinical manifestations. A deeper understanding of variation and evolution within species HAdV-C is especially important because these viruses, including HAdV-C6, are used as gene delivery vectors for human gene therapy and in other biotechnological applications. Here, full-genome analyses of the prototype HAdV-C6 and of a recently identified virus provisionally termed HAdV-C57 are reported. Although the genomes of all species HAdV-C members are very similar to each other, the E3 region, hexon, and fiber proteins (ten proteins in total) present a wide range of identity values at the amino acid level. Comparing these viruses with the other three HAdV-C prototypes (HAdV-C1, -C2, and -C5) provides a comprehensive analysis of diversity and conservation within species HAdV-C. HAdV-C6 contains a recombination event within the constant region of the hexon gene. HAdV-C57 is a recombinant virus with a fiber gene nearly identical to that of HAdV-C6 and a unique hexon distinguished by its loop 2 motif.
Human adenovirus species C type 6 (HAdV-C6) was among the first viruses identified and named within a group of respiratory viral pathogens isolated and recognized as HAdVs in the 1950s (33, 34). Within a short period, all of the original species C members (HAdV-C1, -C2, -C5, and -C6) were characterized and grouped together on the basis of their biological similarities and serotyping characteristics. In the intervening years, many HAdV types and numerous circulating species C field strains were isolated and identified by serum neutralization (SN). Despite this, no novel species C type had been observed until recently (23). The present study provides genomic and computational analyses of the HAdV-C6 prototype and of the emergent HAdV-C57.
Although less commonly reported in the literature than other HAdVs, species C members are in fact the most prevalent HAdV species in some samplings (15–17, 23). They are important human pathogens, particularly in immunocompromised patients (20, 21), and are reported primarily as respiratory agents. HAdV-C infections may be mild or asymptomatic (7), or they may cause serious acute respiratory disease (5, 7, 15–17, 32). Species HAdV-C members may also establish latent infections and are capable of long-term persistence, presumably by evading the major histocompatibility complex class I immune response (10). For example, species C members constituted up to 28.5% of the typed isolates in an epidemiologic survey of HAdV acute respiratory infections in children in South America over a 4-year period (17), indicating a significant presence in that population. Another report found that 23% of the pathogens causing acute respiratory disease in 100 Mexican children were HAdVs, all of which were identified as species C (32), including HAdV-C6. Similarly, a recent survey of HAdV respiratory tract infections in Malaysia found that species C members were the most commonly isolated HAdVs among pediatric patients over a 7-year period (1).
HAdV-C strains are used as gene delivery vectors in human gene therapy protocols (18, 36). The less prevalent HAdV-C6 is an alternative that may circumvent preexisting anti-HAdV-C5 immunity (4). Reflecting their importance as pathogens and as biotechnology tools, HAdV-C2 and -C5 were among the first five HAdV genomes sequenced (6, 30), and a resource and reference strain of HAdV-C5 has been established (37). We describe here a genomic and bioinformatic analysis of the remaining member of this group, HAdV-C6. We also analyze the emergent HAdV-C57, an isolate formerly known as “strain 16700.” On the basis of its earlier characterization by SN analysis and limited phylogenetic analysis, this isolate was noted as representing “a novel serotype of AdC” (23). This designation is now supported by additional data and by analysis with a genomics-based method that serves as a new algorithm for understanding, defining, and naming novel HAdVs (31, 40, 41).
HAdV-C6 was obtained from the American Type Culture Collection (Manassas, VA) as stock number VR-1083. It was originally isolated in the early 1950s from “spontaneously degenerating tissue culture of tonsil tissue cultures” and designated Tonsil 99 (34). This prototype was characterized and grouped into species HAdV-C using a variety of characteristics, including SN (13, 33). Growth of HAdV-C6 in A549 cells and DNA production were outsourced to Virapur, LLC (San Diego, CA), using previously described methods (29).
Isolation, growth, and preparation of HAdV-C57 (formerly designated strain 16700) were performed as described previously (23). HAdV-C57 was isolated from the feces of a healthy child as part of an acute flaccid paralysis surveillance program (12 December 2001; Baku, Azerbaijan). Serotyping by SN gave an ambiguous result.
Commonwealth Biotechnologies, Inc. (Richmond, VA), sequenced the HAdV-C6 DNA, using protocols and strategies applied earlier to a series of HAdV genomes (29). In brief, the genome was first PCR amplified. Sanger sequencing reactions using DYEnamic ET terminator cycle sequencing kits (Amersham Biosciences, Piscataway, NJ) then generated sequence ladders. These were resolved on an ABI Prism 377 sequencer (Applied Biosystems, Foster City, CA). This provided an average 5-fold coverage with a minimum 3-fold redundancy, and both strands were sequenced for higher-quality data. Regions yielding unreliable data, such as base-call discrepancies, were resequenced for better resolution. Genome annotation served as an additional quality control step and prompted further PCR-based, primer-driven resequencing to resolve any ambiguities. HAdV-C57 was sequenced using a similar strategy, with the sequencing ladders resolved on an ABI 3130x. Eightfold redundancy, with both strands sequenced, ensured high-quality sequence data. The GenBank accession numbers for the sequences are FJ349096 (HAdV-C6) and HQ003817 (HAdV-C57).
Computational analyses were performed using publicly accessible software tools as described in earlier publications (40). All of the archived HAdVs and some simian adenoviruses were used in the analyses, with accession numbers available from an AdenovirusWiki site (http://www.binf.gmu.edu/wiki/index.php/databases). For detailed comparisons, the following species C genomes were used: HAdV-C1 (FJ349096), HAdV-C2 (FJ349096), and HAdV-C5 (AC_000008).
Whole-genome alignments were performed with MAFFT (Multiple Alignment using Fast Fourier Transform) (19). The program is available online (http://www.ebi.ac.uk/Tools/mafft/), and the default gap parameters were used for all alignments. Genome alignment, comparison, and visualization were performed with the zPicture software (http://zpicture.dcode.org).
Protein and noncoding annotations were completed as described previously (29, 40). Global alignments of the sequences from HAdV-C6 and -C57 were performed using the Needle program of EMBOSS (28). As noted by Madisch et al. (24), a BLOSUM62 matrix was used for the amino acid sequence analysis, and a DNA full matrix was used for nucleotide sequence analysis. For the hexon loop analysis, the primer sequences and protocols were from Madisch et al. (24). The coding sequences of HAdV-C6 and -C57 were compared to homologs found in all other HAdV-C genomes, with the percent identities for the proteins calculated as part of the EMBOSS analysis.
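Needle computes an optimal global (Needleman-Wunsch) alignment. It reports percent identity as the number of identical positions divided by the full alignment length, with gap columns included. The sketch below illustrates that calculation under simplifying assumptions. A linear gap penalty and a plain match/mismatch score stand in for the affine gap model and for the BLOSUM62 and DNA full matrices used here, and the function names are hypothetical. It is not EMBOSS code and should not be expected to reproduce Needle's exact percentages.

```python
# Hypothetical sketch: linear gaps and match/mismatch scoring replace
# Needle's affine gaps and BLOSUM62 / DNA full substitution matrices.


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover the alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap positions divided by alignment length (gap columns included)."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)
```

For example, aligning ACGT with ACT yields ACGT/AC-T, or 3 identical positions out of 4 aligned columns (75%).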
Hexon and fiber genes from the HAdV-C genomes were first aligned with MAFFT. SimPlot (22) was then used to run a Bootscan (22) analysis of the aligned hexon and fiber genes. Default settings were used for the window size (200 nucleotides [nt]), step size (20 nt), number of replicates (n = 100), gap stripping (on), distance model (Kimura), and tree model (neighbor joining).
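With these settings, SimPlot's Bootscan slides a 200-nt window along the alignment in 20-nt steps and estimates Kimura distances within each window. The sketch below shows only the windowing and the Kimura two-parameter (K2P) distance. For each window, it reports the closest reference by distance. This is a simplified, similarity-style readout, not a Bootscan bootstrap value. The following assumptions apply:

* The sequences are already aligned to equal length.
* They contain only A, C, G, T, and gap characters.
* Gap columns are skipped pairwise per window, which only approximates SimPlot's gap stripping.
* The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Columns with a gap in either sequence are skipped. Assumes only A, C, G, T
    and '-'. Raises ValueError (math domain error) for saturated divergence.
    """
    pairs = [(x, y) for x, y in zip(s1.upper(), s2.upper()) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    transitions = transversions = 0
    for x, y in pairs:
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def window_scan(query: str, references: dict, window: int = 200, step: int = 20):
    """Slide a window along the alignment and find the closest reference in each.

    Returns a list of (window_start, closest_reference, distances_by_reference).
    Distance-only readout: no bootstrap resampling or tree building is done.
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        distances = {name: k2p_distance(q, seq[start:start + window])
                     for name, seq in references.items()}
        closest = min(distances, key=distances.get)
        results.append((start, closest, distances))
    return results
```

Bootscan itself goes further. In each window, it resamples alignment columns (100 replicates here) and builds neighbor-joining trees. It then reports the percentage of replicates in which the query clusters with each reference.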
Whole genomes were analyzed similarly: they were first aligned with MAFFT, and recombination was then analyzed with SimPlot. Only the window and step sizes were changed (1,000 and 200 nt, respectively); all other parameters were left at their defaults.
Sequence alignments for phylogenetic trees were constructed using MAFFT. Selected portions of the alignments of hexon and fiber were extracted according to genome regions used by Madisch et al. (24). Bootstrapped, neighbor-joining trees with 1,000 replicates were constructed using MEGA4 software via the maximum-composite-likelihood method (38). All of the other parameters used were set by default.
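MEGA4 first estimates pairwise distances (here with the maximum-composite-likelihood method) and then joins taxa by neighbor joining. Bootstrap support comes from repeating this procedure on 1,000 column-resampled alignments. The sketch below shows only the joining step on a precomputed distance matrix. It is a minimal illustration of the Saitou-Nei algorithm, not MEGA's implementation. Distance estimation and bootstrapping are omitted, and the function name is hypothetical. Ties in the selection criterion are broken by input order. The last two nodes are joined with the whole remaining branch length placed on one side of the unrooted tree.

```python
def neighbor_joining(names, dist):
    """Join taxa by neighbor joining and return an unrooted Newick string.

    names: list of taxon labels; dist: symmetric matrix (list of lists) of
    pairwise distances in the same order. Cluster labels double as Newick text.
    """
    nodes = list(names)
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all remaining nodes
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[(a, b)] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and to b
        la = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[(a, b)] - la
        new = f"({a}:{la:.4g},{b}:{lb:.4g})"
        # Distances from the new node to every other remaining node
        for c in nodes:
            if c not in (a, b):
                d[(new, c)] = d[(c, new)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        d[(new, new)] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[(a, b)]:.4g},{b}:0);"
```

Consider an additive four-taxon matrix with distances A-B 3, A-C 5, A-D 3, B-C 6, B-D 4, and C-D 4. For this input, the function recovers the generating tree, ((A:1,B:2):1,(C:3,D:1):0);.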
The genome lengths of HAdV-C6 and -C57 are 35,758 and 35,818 bp, respectively, with GC contents of 55.35 and 55.25%, respectively. The ca. 50 putative coding regions identified are organized similarly to those in the genomes of other mastadenoviruses (data not shown).
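GC content is the percentage of G and C among the bases of a genome. The minimal sketch below rests on the following assumptions:

* The genome is read from a single-record FASTA file.
* Ambiguous symbols (for example, N) are excluded from the denominator. How ambiguous bases were handled in this study is not stated, so this choice is itself an assumption.
* The helper names are hypothetical.

```python
def read_fasta_sequence(path: str) -> str:
    """Concatenate all sequence lines of a single-record FASTA file."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at)
```

Typical use would be gc_content(read_fasta_sequence("FJ349096.fasta")), where the file name stands for a hypothetical local copy of the HAdV-C6 record.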
Pairwise whole-genome alignments of HAdV-C6 and -C57 against the other members of species HAdV-C were visualized with the zPicture software (Fig. 1). Across the entire genome, HAdV-C6 was most similar to HAdV-C57, with >95% similarity in most genome regions. The major difference between the two genomes lay in the hexon gene, specifically within the SN epitope region. This difference probably accounts for the novel SN properties of HAdV-C57 (23) and may reflect selection pressure, for example, “immune escape.” HAdV-C6 and HAdV-C57 also both displayed elevated similarity to HAdV-C2. The major differences among the three genomes were found in the hexon and fiber genes (Fig. 1).
Comparing the in silico proteomes of species HAdV-C members reveals the amino acid differences between them. The coding regions of the HAdV-C6 and HAdV-C57 genomes are compiled in Table 1. The other species HAdV-C members (HAdV-C1, -C2, and -C5) were also reannotated (data not shown). All contained homologous open reading frames (ORFs) at similar genome locations (data not shown), reflecting the high degree of similarity among these viruses. Identities between HAdV-C6 proteins and those of the other HAdV-C members are shown graphically in Fig. 2. The proteome of HAdV-C6 was most similar to that of HAdV-C2 and only slightly less similar to that of HAdV-C57. This pattern was unexpected because the nucleotide sequence of HAdV-C6 was more similar to that of HAdV-C57. Several proteins encoded by the 5′ portion of the HAdV-C6 genome were identical to their homologs in HAdV-C2 (Fig. 2, top portion of the y axis). The lowest identity value between these two viruses was observed for the fiber protein.
Quantitative relationships between the hexon loop 1 and 2 regions of the viruses in species HAdV-C were computed from alignments of divergent sequences bracketed by conserved regions (primers), as defined by Madisch et al. (24). Those authors compared pairs of closely related hexon loop 2 motif sequences. From the value calculated for HAdV-D39 and -D43, they set the amino acid percent difference that defines a new prototype at ≥1.2%. In the present study, HAdV-C57 differs in amino acid identity across the hexon loop 2 motif from its two closest types, HAdV-C2 and -C6, by 10.9 and 13.3%, respectively. These values clearly establish a new type. The corresponding minimal nucleotide identity difference reported by Madisch et al. (24) for HAdV-D39 and -D43 is 2.5%. The same analysis yields 16.7 and 19.4% for HAdV-C2 and -C6, respectively (each against HAdV-C57). These metrics were calculated with the EMBOSS Needle software, using a BLOSUM62 matrix for amino acid percent identity and a DNA full matrix for nucleotide percent identity. As a control, the data originally reported for the HAdV-D39 and -D43 hexon loop 2 motif (24) were reconfirmed.
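The typing rule applied above reduces to comparing a percent difference (100 minus percent identity) with a fixed cutoff. The sketch below encodes the cutoffs quoted in the text: 1.2% for amino acids and 2.5% for nucleotides, both from the HAdV-D39/-D43 comparison. Treating the 2.5% nucleotide value as a cutoff is an assumption. The sketch also assumes the loop 2 motif sequences have already been extracted and aligned to equal length. Identity is defined as identical positions over alignment length, and the function names are hypothetical.

```python
AA_DIFF_CUTOFF = 1.2  # % amino acid difference, HAdV-D39 vs -D43 loop 2 motif
# Assumption: the 2.5% D39/D43 nucleotide difference is used as an analogous cutoff.
NT_DIFF_CUTOFF = 2.5


def percent_difference(aln_a: str, aln_b: str) -> float:
    """100 minus percent identity; identity = identical non-gap positions / alignment length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 - 100.0 * same / len(aln_a)


def suggests_new_type(diff_pct: float, cutoff: float) -> bool:
    """True when the percent difference meets or exceeds the cutoff."""
    return diff_pct >= cutoff


# Differences reported in this study for the loop 2 motif (each against HAdV-C57)
reported = {
    "HAdV-C2 (aa)": (10.9, AA_DIFF_CUTOFF),
    "HAdV-C6 (aa)": (13.3, AA_DIFF_CUTOFF),
    "HAdV-C2 (nt)": (16.7, NT_DIFF_CUTOFF),
    "HAdV-C6 (nt)": (19.4, NT_DIFF_CUTOFF),
}
for label, (diff, cutoff) in reported.items():
    print(label, diff, suggests_new_type(diff, cutoff))
```

All four reported values exceed the corresponding cutoffs by a wide margin.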
Recombination analysis was performed with SimPlot, which includes the Bootscan software and so provides two different types of analysis. SimPlot analyses can use input sequences aligned by any method; in the present study, MAFFT was used. MAFFT is more discriminatory than Bootscan analysis. Bootscan analyses are based on a phylogeny algorithm (unweighted pair-group method with arithmetic averages). Detection of recombination in both analyses suggests that an event has occurred. Many suggestive recombination events were detectable in the genome analysis. However, the high similarity of HAdV-C members in most genome regions resulted in low phylogenetic signal, which made it difficult to determine conclusively the origin of these regions. Because the recombination data cannot be interpreted with absolute certainty, a conservative criterion was used for reliably defining recombination: the presence of a plateau rather than a series of peaks. Under this more stringent criterion, the HAdV-C6 and -C57 genomes each contained one possible recombination event in the hexon gene (Fig. 3A and B). The whole-genome analysis of HAdV-C6 (Fig. 3A) identified a recombination event between HAdV-C6 and -C2. A higher-resolution analysis (Fig. 3C) supports this event and places it in the third conserved region (C3) of the hexon gene, as does the SimPlot analysis (data not shown). The putative recombination in the conserved region of the HAdV-C57 hexon gene is less convincing (Fig. 3D). The higher-resolution analysis suggests that HAdV-C57 contains a portion of the conserved hexon region similar to that of HAdV-C1. However, the presence of multiple peaks rather than a single plateau renders this result inconclusive. The pattern may reflect the high sequence similarity among species HAdV-C members or multiple recombination events that cannot be resolved. It may also be interpreted as an “ancient” recombination event that has undergone subsequent genetic drift.
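The plateau-versus-peaks criterion described above can be made explicit as a rule over per-window support values. In the sketch below, a "plateau" is a run of consecutive windows whose support for one parent stays at or above a threshold for a minimum number of windows. Isolated supported windows count as "peaks." The 70% threshold, the five-window minimum, and the function names are illustrative assumptions, not parameters from this study. The nucleotide span of a run depends on the window and step sizes used.

```python
# Illustrative assumptions: threshold=70.0 (% bootstrap support) and
# min_windows=5 are NOT values taken from the study.


def supported_runs(support, threshold=70.0):
    """Yield (first_index, last_index) for each maximal run of windows with support >= threshold."""
    start = None
    for i, value in enumerate(support):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            yield (start, i - 1)
            start = None
    if start is not None:
        yield (start, len(support) - 1)


def classify(support, threshold=70.0, min_windows=5):
    """Return 'plateau' if any run is long enough, 'peaks' if only short runs exist, else 'none'."""
    runs = list(supported_runs(support, threshold))
    if any(last - first + 1 >= min_windows for first, last in runs):
        return "plateau"
    return "peaks" if runs else "none"
```

A sustained block such as [10, 20, 80, 85, 90, 95, 88, 30] is classified as a plateau. Alternating values such as [80, 10, 90, 20, 75, 5] are classified as peaks, the pattern treated above as inconclusive.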
The whole-genome analysis also detected a recombination event between HAdV-C6 and -C57 in the fiber gene (Fig. 3A and B). Closer inspection of the fiber gene shows a plateau spanning the entire gene (see Fig. S1 in the supplemental material). If HAdV-C57 is a recently emerging type, then the parent of this fiber sequence is an HAdV-C6-like virus.
The phylogenomic examination of HAdV-C57 is shown in Fig. 4; for brevity, only partial views of the complete phylogenetic trees are presented. A full analysis of all sequenced HAdVs is available at AdenovirusWiki (http://www.binf.gmu.edu/wiki/index.php/databases). The whole-genome phylogenetic analysis clearly demonstrates that the members of species HAdV-C are closely related to each other and form a clade. Within this clade, HAdV-C2, -C6, and -C57 group together as a subclade with high confidence values. This grouping suggests a shared lineage. It also reflects the recombination events that transferred genome fragments between ancestors of these viruses.
Examination of selected individual genes across the genome provides additional support for the uniqueness of HAdV-C57. In the penton base gene phylogenetic tree, HAdV-C57 and -C1 form a subclade supported by a robust bootstrap value of 96. In the fiber knob region (hemagglutinin [HA] epitope) tree, HAdV-C57 groups with HAdV-C6 with a bootstrap value of 100. This grouping confirms the recombination analysis showing that the two viruses share a highly similar fiber gene (see above). Of particular interest are the hexon loop 1 and 2 regions, which contain the SN epitope. These regions are responsible for serotype differentiation by serological tests, which have long been the gold standard for HAdV type identification. Phylogenomic analysis of the species HAdV-C hexon loop 2 is shown in Fig. S2 in the supplemental material. It provides a graphical view of the amino acid and nucleotide percent identities supporting HAdV-C57 as a new type, as defined by the metrics published by Madisch et al. (24) and noted above.
General species-level trends emerge from the analysis of the genomes and the in silico proteomes (Fig. 2). First, most proteins within species HAdV-C were highly similar between viruses of different types. Second, the hexon, E3 region, and fiber proteins (10 proteins in total) showed a wide range of identity values, indicating greater variability at the amino acid level. These proteins are involved in interactions with cellular receptors and the host immune system (3). Immune pressure can explain the high variability among the major immunogenic proteins, the hexon and the fiber. A comparable degree of divergence among the E3 proteins suggests a similar degree of evolutionary pressure. It also implies that selection and conservation in these proteins differ markedly from those in other adenovirus nonstructural proteins. These proteins may be involved in host cell adaptation. If so, they would correspond in function and evolution to virus “security proteins,” which are proposed to form “a distinct class” and are “dedicated specifically to counteracting host defenses” (2).
The in silico analysis presented in Fig. 2 suggests that genome regions beyond those encoding the major coat proteins may provide an additional metric for typing HAdVs and for identifying novel types. Presently, the partial amino acid sequences used for molecular typing derive from two sources:

* loops 1 and 2 of the hexon protein, which are involved in SN
* the fiber knob, which is responsible for hemagglutination

Collectively, these represent ca. 5 to 6% of the genome.

The additional region identified here consists of contiguously encoded genes: the E3-encoded genes (which may vary in number) and the fiber gene. The E3 and fiber sequences may be extracted from the genome as a single fragment, ca. 4,000 to 5,000 nt in length, and used as a typing metric. This fragment offers three advantages. It spans ca. 13% of the genome, it represents the 3′ end of the genome, and it contains variability that is useful for distinguishing types. The hexon, E3, and fiber regions could be amplified as two PCR amplicons and sequenced with 8 to 10 Sanger sequencing reactions. This approach would serve as a cost-effective, preliminary alternative to whole-genome data, as requested by some researchers who do not have access to whole-genome sequencing. Overall, this scheme would provide, at a glance, information on both regions used for molecular typing (hexon and fiber). It would also provide the sequence of the most variable part of the genome, which may carry the most phylogenetic information. Such a preliminary study should nevertheless be confirmed eventually by whole-genome sequencing for a thorough description, because it would not survey possible recombination events elsewhere in the genome.
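The genome fraction quoted for the E3-fiber fragment can be checked against the HAdV-C6 genome length reported above (35,758 bp):

$$\frac{4{,}000}{35{,}758} \approx 0.112 = 11.2\%, \qquad \frac{5{,}000}{35{,}758} \approx 0.140 = 14.0\%$$

A 4,000- to 5,000-nt fragment therefore spans roughly 11 to 14% of the genome, consistent with the ca. 13% cited. The HAdV-C57 length (35,818 bp) gives essentially the same range.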
Genome recombination requires coinfection, which is observed in HAdV infections (8, 26, 39). The community recognizes putative recombinants as different prototypes on the basis of their neutralization epsilon and hemagglutination gamma determinants. Additional field strains characterized by these two markers (SN and hemagglutination inhibition assays) have been described as case studies in the literature (9, 11, 12). Recently, such recombinants have been characterized in detail using high-resolution genomics- and bioinformatics-based methods (31, 40, 41). It has been suggested that recombination is common in species HAdV-C and probably in other HAdV species (23). The newly completed genome sequences of HAdV-C6 and -C57 allow a more detailed analysis of recombination in species HAdV-C.
Both putative hexon recombination events in these two genomes are unique because, unlike other reported HAdV hexon recombination events (40, 41), they do not involve the variable loops of the hexon. Instead, these events occur in the C3 (conserved) region of the hexon. Notably, the recombinant areas in the two genomes differ in length and position. Because this area of the hexon gene is highly conserved among HAdVs, it usually provides too little signal for recombination scans. However, the C3 region shows a relatively high degree of variability among the species HAdV-C members. Recombination events in this area may therefore be common but observable only in species HAdV-C, where sequence variability is sufficient.
The zPicture and genome percent identity data show that the HAdV-C2, -C6, and -C57 sequences are similar throughout most of their genomes. These genomes differ significantly in only two regions, the hexon and fiber genes. This pattern suggests that HAdV-C2, -C6, and -C57 share an ancestor relative to the other HAdV-C types. The pattern is also consistent with evolution of these three viruses along a path of gradual divergence. In contrast, the Bootscan data presented here suggest that recombination was commonly involved in the evolution of HAdV-C2, -C6, and -C57. In this recombinant-history scenario, HAdV-C6 could have arisen through recombination involving an HAdV-C2-like hexon. HAdV-C57 could likewise have arisen through recombination involving an HAdV-C6 fiber.
HAdV-C57 is a new type based on phylogenomics and computational analysis of the hexon loop 2 motif. The novel but proven algorithm for HAdV typing calls for genomics-based analysis and genome metrics to identify, characterize, and establish novel HAdVs. It also calls for evidence of differences in the biology and/or pathogenicity of the virus (14, 31, 40, 41). One key component is phylogenomics: examining several genes that span the genome, including the regions associated with serological properties and other key virus features. These landmarks include the DNA sequences encoding the SN epitopes (hexon loops 1 and 2) and the HA epitope (fiber knob) (24, 25). In the past, differences in SN were used to establish a new serotype. Currently, hexon epitope sequences are commonly substituted for SN. However, this approach is not equivalent to SN and is more appropriately referred to as “imputed serum neutralization” (24, 27, 35, 42). The genomic and computational data presented here for HAdV-C6 and HAdV-C57 may be correlated with published serology data (13, 23). This comparison allows a deeper understanding of how these diverse data and research approaches complement, and potentially conflict with, each other.
Laboratories have recently begun typing HAdVs by sequencing the loop 1 and 2 motifs of the hexon gene and applying qualitative phylogenetic approaches, rather than serological methods. It is unclear what degree of sequence divergence distinguishes a new serotype from a previously known one when typing relies solely on qualitative interpretation of phylogenetic data. Taking a quantitative approach, Madisch et al. (24) examined the relationships between the hexon loop 1 and 2 regions of all of the prototypes. They calculated an amino acid percent identity difference of ≥1.2% as defining a new type. This value was based on the difference between the two most closely related hexon loop 2 motifs, those of HAdV-D39 and -D43. For this report, amino acid percent identity differences of 10.9 and 13.3% were calculated for the loop 2 motifs of HAdV-C2 and -C6, respectively (each against HAdV-C57). These values clearly establish HAdV-C57 as a new type. The corresponding minimal nucleotide identity difference reported by Madisch et al. (24) is 2.5% for HAdV-D39 and -D43. The same analysis yields 16.7 and 19.4% for HAdV-C2 and -C6, respectively (against HAdV-C57).
This study was supported in part by U.S. Public Health Service National Institutes of Health (NIH) grants EY013124 (D.S., M.P.W., M.S.J. and J.C.) and P30EY014104 (J.C.). M.S.J. was also supported in part by the U.S. Air Force Surgeon General (Clinical Investigation no. FDG20040024E). J.C. was also funded by an unrestricted grant to the Department of Ophthalmology, Harvard Medical School, from Research to Prevent Blindness, Inc. The sequencing of the HAdV-C6 genome was undertaken with support from the Department of Defense. During 2002 to 2004, one of the authors (D.S.) was affiliated with the Department of Defense through the USAF Surgeon General Office, Directorate of Modernization (SGR), and the Epidemic Outbreak Surveillance (EOS) Program, Falls Church, VA. During this period, portions of the HAdV-C6 work were funded specifically by a grant from the U.S. Army Medical Research and Materiel Command (USAMRMC; DAMD17-03-2-0089). Additional support was provided through the EOS Project, funded by the HQ USAF Surgeon General Office, Directorate of Modernization (SGR), and the Defense Threat Reduction Agency (DTRA).
D.S. thanks Clark Tibbetts (EOS; 2001 to 2005) for initial discussions and for providing the impetus and opportunity to pursue this line of research in adenovirus genomics. We thank David Schnurr for critical readings and thoughtful discussions on the serotyping.
The views expressed in this material are those of the authors and do not reflect the official policy or position of the U.S. Government, the Department of Defense, or the Department of the Air Force.
‡Supplemental material for this article may be found at http://jcm.asm.org/.
Published ahead of print on 17 August 2011.