Because of the indels seen in the genotype 3 HEV PPR (), it was assumed that much of the hypervariability seen in the PPR is due to insertions and deletions (23). The current study shows instead that much of the variability seen in the PPR is due to higher rates of nucleotide substitution at the first and second codon positions in the PPR.
Although the PPR is hypervariable, this hypervariability is not due to a higher substitution rate in the PPR compared to the nPPR. The same substitution rate appears to be operational in both regions (). The difference is that fewer mutations in the first and second codon positions are lethal in the PPR. Most likely this higher promiscuity, seen at the first and second codon positions in the PPR, is due to its intrinsically disordered structure. The lack of a well-defined tertiary protein structure means that substitutions in the first and second codon positions, which are more likely to result in nonsynonymous amino acid switches, are allowed more often than in the nPPR, where a tertiary structure must be maintained constitutively for proper function. However, the PPR does have constraints, as suggested by the higher usage of structure-breaking Pro codons (6). The bias toward transitional substitutions may be because these substitutions are less prone than transversional substitutions to generate stop codons and because transversions lead to more diverse amino acid substitutions and significantly different chemical composition in the resultant peptide (31).
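The claim that transitions are less prone than transversions to generate stop codons can be checked directly against the standard genetic code. The following is a minimal sketch, not part of the analysis reported here: it enumerates every single-nucleotide substitution from each of the 61 sense codons and weights all substitutions equally, which ignores the actual codon usage of HEV. The function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (a in PURINES) == (b in PURINES)


def stop_codon_rates():
    """Count [nonsense, total] single-base substitutions from sense codons.

    Assumption: every substitution is weighted equally, regardless of
    how often the source codon is actually used.
    """
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons as starting points
            continue
        for pos in range(3):
            for new in BASES:
                if new == codon[pos]:
                    continue
                mutant = codon[:pos] + new + codon[pos + 1:]
                kind = "transition" if is_transition(codon[pos], new) else "transversion"
                counts[kind][1] += 1
                if CODON_TABLE[mutant] == "*":
                    counts[kind][0] += 1
    return counts


if __name__ == "__main__":
    for kind, (nonsense, total) in stop_codon_rates().items():
        print(f"{kind}: {nonsense}/{total} = {nonsense / total:.3f}")
```

Under this equal-weighting assumption, 5 of 183 possible transitions (about 2.7%) and 18 of 366 possible transversions (about 4.9%) convert a sense codon into a stop codon, which is consistent with the direction of the argument above.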
Codon usage in the nPPR and the PPR shows there is a shift toward using C at the first and second codon positions in the PPR (). This is due to a shift away from using A and T at these positions and a reduction in the use of G at the second codon position (). Although the usage of G at the first codon position does not change much, the usage of C increases with the decreases in A and T (). The shift at codon position 2 is even more dramatic: from about equal usage of all nucleotides at the second codon position in the nPPR to C occurring at >50% of the second codon positions in the PPR (). This in turn results in a shift toward high usages of Pro, Ala, Ser, and Thr in the PPR, so marked that the most frequently used codon in genotypes 3 and 4 and avian HEV is Pro (). Even in genotype 1 and rubivirus, >22% of all codons in this region are Pro codons (). The decrease in A/T usage leads to a decrease in His, Phe, Trp, and Tyr. These are the patterns of amino acid usage typical of IDRs (7). The decrease in A at the first codon position and A and G at the second codon position of the PPR means that transversional substitutions have occurred; these transversions appear to be more common among genotypes and subgenotypes (, , and ).
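The link between C at the second codon position and the Pro/Ala/Ser/Thr enrichment follows from the genetic code itself, since every NCN codon encodes one of these four residues. The sketch below illustrates this and shows one way per-position nucleotide usage could be tabulated for an in-frame coding sequence. It is an illustration under stated assumptions: the function names and the toy sequence are hypothetical, not data or methods from this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues_with_second_base(base):
    """Amino acids (one-letter codes) encoded by codons with `base` at position 2."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[1] == base and aa != "*"}


def positional_usage(cds):
    """Nucleotide frequencies at codon positions 1, 2, and 3.

    Assumes `cds` is in frame and ungapped; trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [Counter(), Counter(), Counter()]
    for i in range(0, usable, 3):
        for pos, base in enumerate(cds[i:i + 3]):
            counts[pos][base] += 1
    n_codons = usable // 3
    return [{b: counts[pos][b] / n_codons for b in BASES} for pos in range(3)]


# Hypothetical toy sequence (Pro-Ala-Ser-Thr-Pro), not a real HEV sequence.
TOY_CDS = "CCTGCATCCACGCCC"

if __name__ == "__main__":
    print(sorted(residues_with_second_base("C")))
    for pos, freqs in enumerate(positional_usage(TOY_CDS), start=1):
        print(f"codon position {pos}: {freqs}")
```

Applied to real PPR and nPPR alignments, a tabulation of this kind would produce the position-by-position comparison described above; for the toy sequence, every second codon position is C by construction.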
Although the first and second codon positions are more promiscuous in the PPR than the nPPR, alignments of zoonotic HEVs suggest that this promiscuity is greater in the carboxyl half of the PPR than in the amino half (). The carboxyl half of the PPR is also where most of the recognized indel activity occurs in the PPR (). This difference suggests that the carboxyl half of the PPR is more mutable than the amino half, and the carboxyl half of the PPR may be more involved in binding multiple ligands (23).
Evolution is more easily traced in the nPPR because of the tertiary structural constraints required by the nonstructural genes for them to function properly. In contrast, because of the higher promiscuity toward substitutions and the lack of intrinsic structure or active-site amino acids, it is much more difficult to trace evolution in the PPR alone. However, an alignment of zoonotic HEVs shows that there is a similarity in purine/pyrimidine (transitional substitution) banding in the amino half of the PPR, suggesting that these isolates share an ancestor (). This commonality is not seen in the carboxyl half of these PPR sequences, perhaps due to higher mutability in that domain. An alignment of the PPR for the anthroponotic genotypes 1 and 2 does not exhibit an easily recognized similarity of purine/pyrimidine banding, perhaps because only one example of the genotype 2 PPR sequence exists; nonetheless, out-of-frame shifting of the alignment implies a common ancestor ().
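One way to see why purine/pyrimidine patterns can persist when nucleotide identity has eroded is to recode aligned sequences into R (purine) and Y (pyrimidine): positions that differ only by transitions become identical after recoding. The sketch below illustrates that idea only. It is not necessarily how the banding in the alignments was assessed, and the two toy sequences are hypothetical.

```python
# Map each nucleotide to its class: R = purine (A, G), Y = pyrimidine (C, T/U).
# Characters not in the map, such as alignment gaps, are left unchanged.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def to_ry(seq):
    """Recode a nucleotide sequence into purine/pyrimidine classes."""
    return seq.upper().translate(RY_MAP)


def identity(a, b):
    """Fraction of aligned, ungapped columns at which two sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


# Hypothetical aligned toy sequences that differ only by transitions.
SEQ_A = "CCAGCTCCG"
SEQ_B = "CCGGCCTCA"

if __name__ == "__main__":
    print("nucleotide identity:", identity(SEQ_A, SEQ_B))
    print("R/Y identity:", identity(to_ry(SEQ_A), to_ry(SEQ_B)))
```

In this toy pair, four of nine columns differ at the nucleotide level, yet the recoded sequences are identical because every difference is a transition. A region dominated by transversions would not show this recovery.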
The similarity of sequence () and lower nucleotide diversity () seen in genotype 1 suggest that less substitution occurs in genotype 1 than in genotypes 3 or 4. This could be because the zoonotic HEVs have a wider host range, and higher nucleotide diversity is required for adaptation of these strains to their hosts. Another explanation is that modern genotype 1 is actually composed of a subset of subgenotypes from a genotype 1 ancestor. Paleoepidemiological research indicates that epidemic HEV was more common in Australia, North America, and Europe in the 18th and 19th centuries than today (26). An analysis of the evolution of HEV suggests further that genotype 1 went through an evolutionary bottleneck about 80 to 90 years ago (22). Improvements in sanitation in developed countries from the early 20th century could have forced genotype 1 through an evolutionary bottleneck that led to the extinction of genotype 1 in Australia, North America, and Europe, with the only surviving subtypes of genotype 1 being found in developing countries. More isolates of genotypes 1 and 2 are needed to better define the evolution of these genotypes and of the PPR in mammalian HEVs.
The hypervariability seen in the HEV PPR appears to be due to increased rates of substitution in the PPR compared to the nPPR, but the impetus for this hypervariability is increased promiscuity toward substitution at the first and second codon positions in the PPR. In conjunction with this promiscuity is a shift in nucleotide usage toward increased usage of C such that Pro codons are among the most favored in the PPR, and the decreased usage of A and T results in decreased use of His, Phe, Trp, and Tyr codons. This shift leads to a region with a high number of structure-breaking Pro residues and few aromatic residues, thereby accounting for the proline richness seen in IDRs.