We describe here an analysis of sequence variation and evolution of the HEV HVR in acutely infected individuals, amongst epidemiologically or phylogenetically linked isolates, and within and between virus genotypes. Despite the considerable variation observed between HVRs from different genotypes, and even within genotypes, very little variation was observed amongst variants co-circulating within acutely infected individuals. In addition, no bias towards non-synonymous substitutions was observed amongst these datasets. Similar conservation of the HVR has recently been described in a high-density sequencing study of virus from an acutely infected HEV patient (Bouquet et al., 2012), although in this case the number of viral sequences in the population that were actually sampled was unknown and probably lower than the number of sequences characterized. Variation was also restricted amongst sequences within a transmission network and amongst phylogenetic groupings of HVR sequences. In all of these groupings, there was no evidence for an excess of non-synonymous substitutions over synonymous substitutions that would suggest adaptive evolution or immune escape from B- or T-cell responses.
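The distinction between synonymous and non-synonymous substitutions drawn above can be illustrated with a minimal sketch. It assumes two aligned, in-frame, gap-free coding sequences and the standard genetic code; the function name `count_substitutions` and the example sequences are hypothetical and are not taken from the datasets analysed here. The sketch counts changed codons only and is not a substitute for a formal estimate of synonymous and non-synonymous distances.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(ref, alt):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Assumes both sequences are aligned, in frame, free of gaps and
    ambiguity codes, and of equal length (illustrative simplification).
    """
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal in length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        a = ref[i:i + 3].upper()
        b = alt[i:i + 3].upper()
        if a == b:
            continue  # codon unchanged
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1  # same amino acid encoded
        else:
            nonsyn += 1  # amino acid replaced
    return syn, nonsyn


# Hypothetical example: CCT->CCC is Pro->Pro, GCT->ACT is Ala->Thr.
print(count_substitutions("CCTTCAGCT", "CCCTCAACT"))
```

A codon that differs at more than one position is counted once here; methods that estimate distances per synonymous or non-synonymous site treat such cases more carefully.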
There is, however, more consistent evidence of a bias towards non-synonymous substitutions during the passage of HEV. For example, three substitutions, all non-synonymous, were observed in the HVR of a second passage of human type 1 virus in a rhesus monkey (Arankalle et al., 1999). Cell-culture passage of a type 3 virus in A549 cells resulted in no changes in the HVR, whilst virus passaged in PLC/PRF/5 cells accumulated one non-synonymous change and, in a separate experiment, the same non-synonymous change and two synonymous substitutions (Lorenzo et al., 2008). Five non-synonymous and two synonymous substitutions and the insertion of a host sequence distinguish virus obtained from a chronically infected individual from the same virus after passage in cell culture (Shukla et al., 2011), although these substitutions may have been pre-existing in the inoculum. Finally, virus from another chronically infected patient and containing a host-derived insertion in the HVR appeared to be unstable, with a variety of deleted forms appearing after passage (Nguyen et al., 2012).
A recent study used the one-rate fixed effects likelihood method (FEL) in HyPhy (Pond et al., 2005) to provide evidence of positive selection at a total of four, five or ten codons within the HVR in genotypes 1, 3 and 4, respectively (Purdy et al., 2012). However, analysis of the same genotype 1 and genotype 4 datasets using other tests for positive selection within the HyPhy suite (two-rate FEL) failed to identify some or all of these sites (data not shown).
Our comparison of HVR diversity within different virus genotypes is consistent with HVR evolution occurring through the processes of substitution and duplication/deletion within the HVR. No evidence was obtained among the variants characterized in the current study for the acquisition and incorporation of exogenous sequences into the HVR by non-homologous recombination. The HVRs of all genotypes shared the property of being relatively rich in proline and serine residues, giving rise to, rather than being a consequence of, an increased frequency of cytosine in this part of the genome. We have not found consistent evidence of positive selection at any amino acid site within the HVR.
These observations are relevant to the evaluation of the various hypotheses advanced in the literature as to the function of the HVR in virus replication. An early suggestion that the HVR might simply be the result of error-prone replication (Gouvea et al., 1998) seems unlikely, given the consistent biases in amino acid composition and the codon position-specific distortion in nucleotide frequency. In addition, we observed a peak in non-synonymous distances but not in synonymous distances at the HVR in a sliding-window comparison of type 1 ORF1 sequences. These observations imply that the presence of amino acid substitutions in the HVR is not the result of hypermutation in this part of the genome.
Others have interpreted the lack of identity and extensive length variation between HVR sequences from different genotypes as evidence that the HVR is not essential for virus replication (Pudupakam et al., 2009). However, there are several lines of evidence that the HVR does have a biological function. Firstly, a proline- and serine-rich HVR is present at a similar location in all published HEV sequences in both human- and pig-derived isolates and in the more divergent rat and avian HEV genomes. This region is hypervariable in the avian sequences but not the rat sequences, although this may reflect the paucity of sequence data currently available for rat variants. A proline-rich region is also present in the more distantly related cutthroat trout virus (Batts et al., 2011). Secondly, in vitro experiments suggest that complete deletion of the HVR of type 1 or type 3 virus impairs the ability of virus to replicate in transfected cells or intrahepatically infected pigs (Pudupakam et al., 2009).
The simplest potential function for the HVR would be as an inert spacer or hinge between two functional domains (Koonin et al., 1992). Relevant to this possibility is the suggestion that three type 1 HVRs share a common hydropathy profile (Tsarev et al., 1992), although this conclusion is difficult to quantify. More convincingly, a recent report suggests that the HVR is part of a larger region that is intrinsically disordered (Purdy et al., 2012). Such a passive role for the HVR is consistent with the extensive length variation observed between HVRs of different genotypes (and within genotype 3) and with the lack of sequence identity between HVRs from different genotypes. However, it would appear that there is a lower limit to such length variation, as artificial constructs containing a series of deletions of the HVR display luciferase activity approximately in proportion to the length of HVR remaining (Pudupakam et al., 2011). Similar features have been described for the env protein of type C retroviruses, which contains a C-terminal unstructured 40–49 residue HVR with an excess of proline (>30 %) and serine relative to the rest of env. This HVR is preceded by a 15 residue proline-rich region that is well-conserved within Moloney murine leukemia virus (MuLV) isolates. Deletions or insertions within the MuLV env protein HVR do not affect virus growth (Ott et al., 1990; Kayman et al., 1999), and it has been suggested that the overall amino acid composition of this region is important for the processing of env into two subunits and for their subsequent interaction (Wu et al., 1998). Parenthetically, a similar function as an inert spacer might be proposed for the 31 aa insertion in genotype 3 isolates from rabbits, which occurs close to the junction between the ADP-ribose phosphorylase (X or macro domain) and the helicase domains. This insertion also contains an excess of proline residues relative to the rest of ORF1. The more distantly related rat HEV genome contains an insertion in a similar position, containing 24
The possibility that the HVR of HEV is the target of neutralizing immune responses arises by analogy with the HVRs of HCV and HIV. Another example is the proline-rich HVR of the MSA1 and MSA2 proteins of the intracellular parasite Babesia bovis, which may play a role in immune-mediated escape (Berens et al., 2005; LeRoith et al., 2006). The HVR of HEV does appear to be immunogenic, as an antigen derived from the HVR of a genotype 1 virus had a sensitivity of 75 % against a serological panel comprising individuals infected with types 1, 3 and 4 (Osterman et al., 2012). However, we found no consistent evidence for positive selection of HVR variants in our analysis of HVR variation in acutely infected individuals, upon transmission between humans, from animals to humans, or amongst epidemiologically linked sequences. We were unable to study earlier or later samples from the acutely infected patients reported here, and it remains possible that virus diversification occurs during the 1–2 months of viraemia. However, in this case one might expect to see diversity within virus populations, yet the diversity within infected individuals was very restricted (mean distances of 0–0.004 %), apart from one case in which the individual appeared to be multiply infected. Hence, the timescale over which HVR substitution occurs appears to be longer than that of single transmission events, even if these are between different species or during defined outbreaks.
Specific functions for the HVR have also been proposed based on the presence of linear motifs, possibly with a role in the shuttling of virus between different hosts (Purdy et al., 2012). However, our analysis provides no evidence for specific alteration of the HVR following transmission between different host species, and we have been unable to detect any phylogenetic segregation of human and non-human HVR sequences (data not shown). Hence, it does not appear that the HVR contains determinants of host range. PxxP motifs within the HVR might represent SH3-binding domains (Pudupakam et al., 2011), as demonstrated for HEV ORF-3 (Korkaya et al., 2001), the alphavirus nsP3 HVR (Neuvonen et al., 2011) and the P150 replicase protein of rubella (Suppiah et al., 2012). Although these motifs are present in all but one of the currently described HVR sequences, they would be expected to arise at a similar frequency by chance in any similarly proline-rich region. This suggests that the presence of PxxP motifs is a consequence of the proline content of the HVR; whether some or all of these motifs are actually SH3-binding domains requires experimental proof.
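The argument that PxxP motifs arise by chance in a proline-rich region can be made concrete with a short sketch. It assumes, as a simplification, that residues at different positions are independent and that the proline frequency is taken from the sequence itself, so that the expected number of PxxP motifs in a sequence of length n with proline fraction p is (n − 3)p². The function names and the example peptide are hypothetical and do not correspond to any HVR sequence analysed here.

```python
import re


def count_pxxp(seq):
    """Count PxxP motifs, including overlapping ones, in a protein sequence."""
    # A zero-width lookahead lets motifs that share residues be counted separately.
    return len(re.findall(r"(?=P..P)", seq.upper()))


def expected_pxxp(seq):
    """Expected PxxP count if residues were independent (simplifying assumption).

    Each of the (n - 3) possible start positions yields a motif with
    probability p * p, where p is the proline fraction of the sequence.
    """
    n = len(seq)
    if n < 4:
        return 0.0
    p = seq.upper().count("P") / n
    return (n - 3) * p * p


# Hypothetical proline-rich peptide, for illustration only.
peptide = "PAAPSSP"
print(count_pxxp(peptide), round(expected_pxxp(peptide), 3))
```

Comparing the observed count with this expectation, or with counts in shuffled versions of the same sequence, gives a rough indication of whether the motifs are more frequent than the composition alone would predict.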
A variety of linear motifs have been identified associated with the HVR (Purdy et al., 2012), but these all occur in the conserved regions flanking the HVR rather than the HVR itself. A number of very general functions (enzymic activity, ligand, nucleotide binding, catalysis, ion binding) have been proposed for the HVR based upon the predicted secondary structure of a genotype 3 isolate (Purdy et al., 2012). Peptides containing a high proportion of proline are often ligands, as the cyclized side-chain restricts movement of the backbone (Williamson, 1994; Kay et al., 2000), but we have been unable to identify motifs that are conserved amongst the divergent HVR sequences of different genotypes.
Finally, the suggestion has been made that divergent HVR sequences might represent evolved host-derived sequences acquired during chronic infection (Shukla et al., 2011; Nguyen et al., 2012). A BLAST search of GenBank with representatives of types 1, 2, 3 and 4 failed to reveal any significant matches with host sequences (data not shown), although homology with host sequences could have been masked by subsequent adaptive evolution. A more convincing argument against HVR diversity being host-derived comes from our comparison of epidemiologically unrelated sequences of genotypes 1, 3 and 4, which revealed that these sequence groups were related to each other by the processes of substitution, duplication and deletion.
In conclusion, our analysis of HEV HVR evolution and variation suggests that the HVR is important for virus replication and may have a structural rather than a regulatory or enzymic function. More detailed investigation of the processing and structure of ORF1 proteins in in vitro systems will be required in order to define the role of the HVR in HEV replication and pathogenesis. In this regard, the dissection of HVR function may be facilitated by the availability of HVRs from different virus genotypes with dissimilar amino acid sequences, but presumably sharing common functionality. For example, the specificity of host-cell proteins binding to artificially expressed fusion proteins containing the HVR could be assessed by comparison with binding to a divergent HVR from a different genotype.