General species-scale trends may be noted from the analysis of the genomes and the in vitro proteomes. First, the majority of the proteins within species HAdV-C were highly similar between viruses of different types. Second, the hexon, E3 region, and fiber proteins (a total of 10 proteins) showed a wide range of identity values, indicating a higher degree of variability at the amino acid level. These proteins are involved in interactions with cellular receptors and the host immune system (3). The high variability among the major immunogenic proteins, hexon and fiber, can be explained by immune pressure. A comparable degree of divergence among the E3 proteins suggests a similar degree of evolutionary pressure on these viruses and implies that selection and conservation in these proteins are markedly different from those in other adenovirus nonstructural proteins. It is conceivable that these proteins are involved in host cell adaptation and correspond in function and evolution to virus “security proteins,” which are proposed to form “a distinct class” and are “dedicated specifically to counteracting host defenses” (2).
The in silico analysis presented here suggests that other genome regions, in addition to the ones corresponding to the major coat proteins, may be useful as an additional metric for typing HAdVs and for determining novel types. Presently, the partial amino acid sequences used for molecular typing derive from loops 1 and 2 of the hexon protein (which are involved in SN) and the fiber knob (which is responsible for hemagglutination). Collectively, these represent ca. 5 to 6% of the genome. The additional region identified here encompasses genes that are encoded contiguously and includes the E3-encoded genes (which may vary in number) and the fiber gene. The E3 and fiber sequences may be extracted from the genome as a single sequence fragment, ca. 4,000 to 5,000 nt in length, and used as a metric for typing. The advantage is that this fragment spans ca. 13% of the genome, represents the 3′ end of the genome, and contains variability that is useful for parsing types. The hexon, E3, and fiber regions could be amplified in two PCR amplicons. These amplicons could be sequenced with 8 to 10 Sanger sequencing reactions to serve as a cost-effective, preliminary alternative to whole-genome data, as requested by some researchers who do not have access to whole-genome sequencing. In all, this scheme would provide, at a glance, both the sequences of the two regions desired for molecular typing (hexon and fiber) and the sequence of the most variable part of the genome, which may carry the most phylogenetic information. It is important to note that such a preliminary study should eventually be confirmed with a whole-genome determination for a thorough description, since possible recombination events outside these regions would not be surveyed.
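The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the helper names `region_percent` and `extract_region` are hypothetical, and the genome length of 36,000 nt is an assumed round figure for an HAdV-C genome rather than a value taken from this report.

```python
# Minimal sketch: how much of a genome a contiguous typing fragment spans.
# ASSUMPTION: 36,000 nt is a round, illustrative HAdV-C genome length.
ASSUMED_GENOME_LENGTH_NT = 36_000


def region_percent(region_len_nt: int, genome_len_nt: int) -> float:
    """Return the percentage of the genome spanned by a region."""
    if genome_len_nt <= 0:
        raise ValueError("genome length must be positive")
    if not 0 <= region_len_nt <= genome_len_nt:
        raise ValueError("region length must lie within the genome length")
    return 100.0 * region_len_nt / genome_len_nt


def extract_region(genome: str, start: int, end: int) -> str:
    """Extract a fragment using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= len(genome)")
    return genome[start - 1:end]


if __name__ == "__main__":
    # The text gives ca. 4,000 to 5,000 nt for the E3 plus fiber fragment.
    for length in (4_000, 5_000):
        pct = region_percent(length, ASSUMED_GENOME_LENGTH_NT)
        print(f"{length} nt spans {pct:.1f}% of the assumed genome")
```

Under this assumed genome length, the 4,000 to 5,000 nt fragment corresponds to roughly 11 to 14% of the genome, which is in line with the ca. 13% stated in the text.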
Genome recombination requires coinfection, which is observed in HAdV infections (8). Putative recombinants, based on the neutralization epsilon and hemagglutination gamma determinants, are recognized as different prototypes by the community, with additional field strains described based on these two markers (SN and hemagglutination inhibition assays) as case studies in the literature (9). Recently, these have been characterized in detail using high-resolution genomics-based and bioinformatics-based methods (31). It has been suggested that recombination is common in species HAdV-C and, probably, in other HAdV species (23). The newly completed genome sequences of HAdV-C6 and -C57 allow for a more detailed analysis of recombination in species HAdV-C.
Both putative hexon recombination events in these two genomes are unique because, unlike other reported HAdV hexon recombination events (40), they do not involve the variable loops of the hexon. Instead, these events occur in the C3 (conserved) region of the hexon. It should be noted that the recombinant areas do not have identical lengths and positions. This area of the hexon gene is highly conserved among HAdVs, which usually interferes with recombination scans of the region. However, the C3 regions of the HAdV-C types show a relatively high degree of variability among one another. It is possible that recombination events in this area are common but can be observed only in species HAdV-C, owing to its sufficient sequence variability.
The zPicture and genome percent identity data show that the HAdV-C2, -C6, and -C57 sequences were similar throughout most of their genomes. These genomes differed significantly in only two regions: the hexon and fiber genes. This pattern suggests that HAdV-C2, -C6, and -C57 share a common ancestor relative to the other HAdV-C types. It also raises the possibility that these three viruses evolved through a gradual path of divergence. In contrast, the Bootscan data presented here suggest that recombination was commonly involved in the evolution of HAdV-C2, -C6, and -C57. In this recombinant history scenario, HAdV-C6 could result from a HAdV-C2-like hexon recombination, and HAdV-C57 could result from a HAdV-C6 fiber recombination.
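The idea that genomes can be similar overall yet resemble different relatives in specific regions can be illustrated with a sliding-window identity scan. The sketch below is a deliberately simplified stand-in and is not the Bootscan method, which evaluates bootstrapped phylogenetic trees in each window. It assumes the sequences are already aligned to equal length; the function names, window size, and step size are hypothetical choices for illustration only.

```python
# Simplified sliding-window identity scan (NOT Bootscan; no trees, no bootstrapping).
# ASSUMPTION: all sequences are pre-aligned and have the same length.
from typing import Dict, List, Tuple


def window_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length windows."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def closest_reference_by_window(
    query: str,
    references: Dict[str, str],
    window: int = 200,
    step: int = 50,
) -> List[Tuple[int, str, Dict[str, float]]]:
    """For each window, report the reference with the highest identity to the query.

    Returns a list of (0-based window start, best reference name, all scores).
    Ties are resolved in favor of the first reference in dictionary order.
    """
    if not references:
        raise ValueError("at least one reference is required")
    if any(len(seq) != len(query) for seq in references.values()):
        raise ValueError("query and references must be aligned to equal length")
    if window <= 0 or step <= 0 or window > len(query):
        raise ValueError("invalid window or step size")

    results = []
    for start in range(0, len(query) - window + 1, step):
        scores = {
            name: window_identity(query[start:start + window], seq[start:start + window])
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores))
    return results
```

A change in the closest reference from one window to the next is compatible with recombination but does not demonstrate it; differences in evolutionary rate between regions can produce a similar signal, which is why tree-based methods and whole-genome data are preferred for confirmation.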
HAdV-C57 is a new type based on phylogenomics and computational analysis of the hexon loop 2 motif. The novel but proven algorithm for HAdV typing calls for the use of genomics-based analysis and genome metrics to identify, characterize, and establish novel HAdVs, along with differences in the biology and/or pathogenicity of the virus (14). One key component is phylogenomics: examining several genes spanning the genome, including the genome regions associated with serological properties and other key virus features. These landmarks include DNA sequences representing the SN epitopes (hexon loops 1 and 2) and the HA epitope (fiber knob) (24). In the past, differences in SN were used to establish a new serotype. Currently, the hexon epitopes are sequenced and commonly substituted for SN. However, this sequence-based result is not identical to SN and should more appropriately be referred to as “imputed serum neutralization” (24). The genomic and computational data presented here for HAdV-C6 and HAdV-C57 may be correlated with published serology data (13) and allow a deeper understanding of how these diverse data and research approaches complement and potentially conflict with each other.
Recently, laboratories have been sequencing the loop 1 and 2 motifs of the hexon gene and using qualitative phylogenetic approaches, rather than serological methods, to type a particular HAdV. If typing is based solely on a qualitative interpretation of the phylogenetic data, it is unclear what degree of sequence divergence distinguishes a new serotype from a previously known one. Quantitatively, Madisch et al. (24) explored the relationships between the hexon loop 1 and 2 regions from all of the prototypes. These authors calculated an amino acid sequence percent identity difference of ≥1.2% as defining a new type. This was based on the difference between the two most closely related hexon loop 2 motifs, those of HAdV-D39 and -D43. In this report, amino acid percent identity differences of 10.9 and 13.3% were calculated for the loop 2 motifs of HAdV-C2 and -C6, respectively (each against HAdV-C57), and these values clearly establish HAdV-C57 as a new type. The corresponding minimal nucleotide sequence identity difference is 2.5% for HAdV-D39 and -D43 (reported by Madisch et al. [24]), and the same analysis yields 16.7 and 19.4% for HAdV-C2 and -C6, respectively (against HAdV-C57).
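The threshold comparison described above can be sketched as a short calculation. This is an illustrative sketch only, not the procedure used by Madisch et al. or in this report: it assumes the two loop 2 sequences are already aligned to equal length, it counts a gap opposite a residue as a difference (an assumption; other conventions exist), and the function and constant names are hypothetical.

```python
# Sketch: percent identity difference between two pre-aligned sequences,
# compared against the >=1.2% amino acid threshold cited in the text.
# ASSUMPTIONS: inputs are aligned to equal length; '-' marks a gap;
# a gap opposite a residue counts as a difference; gap-gap columns are skipped.
NEW_TYPE_THRESHOLD_PCT = 1.2  # amino acid level, hexon loop 2, as cited in the text


def percent_identity_difference(seq_a: str, seq_b: str) -> float:
    """Return 100 * (differing columns / compared columns)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differences / compared


def meets_new_type_threshold(
    difference_pct: float, threshold_pct: float = NEW_TYPE_THRESHOLD_PCT
) -> bool:
    """True when the difference is at or above the threshold."""
    return difference_pct >= threshold_pct


if __name__ == "__main__":
    # Values reported in the text for HAdV-C2 and -C6, each against HAdV-C57.
    for label, value in (("HAdV-C2 vs -C57", 10.9), ("HAdV-C6 vs -C57", 13.3)):
        print(label, value, meets_new_type_threshold(value))
```

Both reported amino acid values (10.9 and 13.3%) exceed the 1.2% threshold by a wide margin, so the conclusion does not hinge on the exact gap-handling convention assumed here.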