Influenza virus hemagglutinin is a trimeric glycoprotein that constitutes most of the surface protein of the virus particle. The protein binds to sialic acid residues on host cells and mediates entry of the virus into the cell (reviewed in [1]). Hemagglutinin is the most important protein for the host's protective antibody response. Changes to the hemagglutinin sequence can therefore change the antigenic properties of the virus, allowing it to escape antibodies produced in response to earlier infections or vaccinations ("antigenic drift"). The human H3N2 hemagglutinin appears to be subject to strong selection for antigenic novelty, which leads to rapid evolution at many positions in the sequence.
Certain asparagine residues on the protein are modified by addition of oligosaccharide chains. The number and location of these N-glycosylation sites vary with the strain and substrain. The glycosylation state has been observed directly for only a few substrains. However, likely glycosylation sites can be inferred from the protein sequence on the basis of a simple sequence pattern (N-X-[ST]-X, where X is any residue other than proline).
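As an illustration of how such a pattern can be applied, the following minimal Python sketch scans a protein sequence for candidate sites. It assumes that both X positions exclude proline, as the pattern is written above, and that a fourth residue must be present after the serine or threonine. The function name `find_sequons` and the toy sequence are hypothetical and are not taken from any hemagglutinin record. A match indicates only a potential site, not observed glycosylation.

```python
import re

# Candidate N-glycosylation pattern N-X-[ST]-X, with neither X being proline
# (an assumption that follows the pattern as stated in the text).
# The lookahead makes overlapping candidates (e.g. "NNTS") all reportable.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(seq):
    """Return 1-based positions of the asparagine in each candidate site."""
    seq = seq.upper()
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


# Toy sequence for illustration only (hypothetical, not a real HA1 sequence).
toy = "MKNGTAANPSLNNTSQNATP"
print(find_sequons(toy))  # [3, 12, 13]
```

In the toy sequence, the asparagine at position 8 is skipped because it is followed by proline, and the one at position 17 is skipped because proline occupies the fourth position.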
The glycosylation state of the protein may have a variety of selectively important consequences for the virus. Oligosaccharides may shield the protein from the humoral immune response, providing an advantage even in naive hosts. Changes in glycosylation state can be a source of antigenic novelty. Addition of glycosylation sites to hemagglutinin can reduce or abolish binding by monoclonal antibodies [2], [3] or human antisera [4]. In one case [3], abolition of binding was shown to be due to glycosylation per se rather than the underlying amino acid change. As discussed in greater detail below, loss of a glycosylation site might also create useful antigenic novelty. Even if, as reported by [5], changes in glycosylation state are not associated with transitions between "antigenic clusters", it seems likely that they are often associated with significant antigenic change. We note, in this regard, that the clusters analyzed by [6] span several units of antigenic distance, and many of them contain multiple vaccine strains, indicating that changes in the vaccine were deemed necessary due to within-cluster antigenic drift. Additional effects of glycosylation on the immune response may also be selectively important: glycans can be targets of the innate immune system [7], and variation in the presence and nature of oligosaccharides might provide within-host diversity without sequence variation. Glycosylation can also affect several non-immune-related aspects of hemagglutinin function, such as receptor affinity and the efficiency of the escape of new virus from host cells [4], [8], [9]. Thus, several different selective forces may act on glycosylation. These forces may conflict, and the net fitness effect of gain or loss of a glycosylation site may depend on the ever-changing immune state of the host population.
The HA1 of the human H3N2 virus has experienced a long-term net gain of glycosylation sites since the appearance of H3 in humans in 1968. This gain of sites, and their long-term maintenance, are presumed to be due to a selective advantage of glycosylation, although they might result from a fortuitous excess of gains over losses. Here we examine, using a large phylogenetic tree of human H3 HA1 sequences, the dynamics of glycosylation sites along "side branches" of the tree (offshoots of the long-term line of descent). The pattern along these branches contrasts markedly with the long-term pattern: losses of glycosylation sites are not uncommon, and they far outnumber gains. This observation, though perhaps surprising, is not inconsistent with a long-term advantage of glycosylation. In fact, the details of the pattern of gains and losses provide evidence that selection acts on the state of glycosylation.
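To make the notions of "gain" and "loss" along a branch concrete, the sketch below compares the candidate sites of an ancestral sequence with those of its descendant and tallies the differences over a set of branches. It is an illustration under stated assumptions, not the procedure used in this study: the sequences are assumed to be ungapped and aligned so that positions are comparable, the ancestral sequences are assumed to have been reconstructed already, and the names `sequon_positions`, `branch_changes` and `tally`, as well as the toy sequences, are hypothetical.

```python
import re

# Candidate-site pattern N-X-[ST]-X with neither X being proline (assumption).
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def sequon_positions(seq):
    """Set of 1-based asparagine positions of candidate glycosylation sites."""
    return {m.start() + 1 for m in SEQUON.finditer(seq.upper())}


def branch_changes(parent_seq, child_seq):
    """Return (gains, losses) of candidate sites from parent to child."""
    parent = sequon_positions(parent_seq)
    child = sequon_positions(child_seq)
    return sorted(child - parent), sorted(parent - child)


def tally(branches):
    """Sum gains and losses over an iterable of (parent, child) sequence pairs."""
    total_gains = total_losses = 0
    for parent_seq, child_seq in branches:
        gains, losses = branch_changes(parent_seq, child_seq)
        total_gains += len(gains)
        total_losses += len(losses)
    return total_gains, total_losses


# Toy branches for illustration only (hypothetical sequences).
toy_branches = [
    ("AANGTAAA", "AAKGTAAA"),  # N -> K destroys the site at position 3
    ("AAKGTAAA", "AANGTAAA"),  # K -> N creates it
    ("AANGTAAA", "AANGAAAA"),  # T -> A destroys it
]
print(tally(toy_branches))  # (1, 2)
```

Restricting the input to side branches or to the long-term line of descent would then amount to choosing which parent and child pairs are passed to `tally`.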