|Home | About | Journals | Submit | Contact Us | Français|
Rhinoviruses are the most common causes of the common cold. Their many distinct lineages fall into “major” and “minor” groups that use different cell surface receptors to enter host cells. Minor-group rhinoviruses are more immunogenic in laboratory studies, although their patterns of transmission and their cold symptoms are broadly similar to those of the major group. Here we present evolutionary evidence that minor-group viruses are also more immunogenic in humans. A key finding is that rates of amino acid substitutions at exposed sites in the capsid proteins VP2, VP3, and VP1 tend to be elevated in minor-group relative to major-group viruses, while rates at buried sites show no consistent differences. A reanalysis of historical virus watch data also indicates a higher immunogenicity of minor-group viruses, consistent with our findings about evolutionary rates at amino acid positions most directly exposed to immune surveillance. The increased immunogenicity and speed of evolution in minor-group lineages may contribute to the very large numbers of rhinovirus serotypes that coexist while differing in virulence.
IMPORTANCE Most colds are caused by rhinoviruses (RVs). Those caused by a subset known as the minor-group members of rhinovirus species A (RV-A) are correlated with the inception and aggravation of asthma in at-risk populations. Genetically, minor-group viruses are similar to major-group RV-A, from which they were derived, although they tend to elicit stronger immune responses. Differences in their rates and patterns of molecular evolution should be highly relevant to their epidemiology. All RV-A strains show high rates of amino acid substitutions in the capsid proteins at exposed sites not previously identified as being immunogenic, and this increase is significantly greater in minor-group viruses. These findings will inform future studies of the recently discovered RV-C, which also appears to exacerbate asthma in adults and children. In addition, these findings draw attention to the difficult problem of explaining the long-term coexistence of many serotypes of major- and minor-group RVs.
Cold infections are caused by more than 250 virus serotypes belonging to at least five different families. The most common cold-causing viruses are rhinoviruses (RVs) (10 to 50% of all colds), coronaviruses (10 to 15% of all colds), and influenza viruses (5 to 15% of all colds) (1, 2). Globally, RVs are the primary cause of acute upper respiratory tract infections. Although most such infections are mild, RVs can also replicate in the lower respiratory tract and exacerbate the severity of conditions such as chronic respiratory disease and asthma (3,–7).
Rhinoviruses are nonenveloped, positive-sense, single-stranded RNA viruses belonging to the genus Enterovirus of the family Picornaviridae that infect epithelial cells that line the posterior nasopharynx (5). Like other positive-sense single-stranded RNA viruses, rhinoviruses encode an RNA-dependent RNA polymerase that lacks the ability to proofread and repair mismatches. The mutation rate has been estimated to range from 10−3 to 10−5 mutations per nucleotide per genome replication event (8). At ~7,200 nucleotides long, the rhinovirus genome would be expected to accumulate roughly 1 mutation per replication cycle.
Unlike other respiratory tract viruses such as influenza viruses (Orthomyxoviridae), rhinovirus lineages are relatively stable in the sense that they are not serially replaced over time by newer, more fit serotypes (9). This group has accumulated a large diversity of genetically distinct lineages and now includes more than 165 recognized serotypes. Epidemiological studies have documented as many as 32 serotypes cocirculating in a given community (10,–13).
These serotypes do not appear to turn over in predictable patterns during “the cold season,” and there are no data to suggest that they differ much in long-term average fitness; in particular, the major and minor serotype groups appear to be equally successful (14). In this respect, RVs contrast strikingly with other viruses such as influenza virus that are serially replaced each year (15). Rhinovirus infections can be asymptomatic and typically induce temporary, limited immunity (14), whereas influenza virus infections are often severe and induce long-term immunity (16). Attempts to predict which RV serotypes will become quantitatively more prevalent in future years have not been successful, and the extreme antigenic diversity of rhinoviruses has presented formidable barriers to the development of effective vaccines with broad epitope coverage (17).
Rhinovirus has been classified into three species: RV-A (80 serotypes), RV-B (32 serotypes), and RV-C (55 serotypes). Serotype and species classifications are determined primarily by nucleotide sequence identity (18, 19). The serotypes within RV-A and RV-B have been classified into major and minor groups based on their cell receptor specificity. Serotypes that use intercellular adhesion molecule 1 (ICAM-1) as a receptor (68 of the RV-A serotypes and all RV-B serotypes) belong to the major receptor group. The remaining 12 RV-A serotypes use members of the low-density lipoprotein (LDL) receptor family and belong to the minor receptor group (20,–22). RV-C serotypes bind to CDHR3, which is a member of the cadherin family and is involved in susceptibility to childhood asthma (23, 24).
ICAM-1 is an important cell adhesion molecule that facilitates cellular communication (25). It is a transmembrane glycoprotein in the immunoglobulin superfamily and is expressed on epithelial cells such as those found in the adenoids and respiratory system and on a variety of immune cells (26). The expression of ICAM-1 on macrophages and dendritic cells is upregulated by inflammatory cytokines induced by rhinovirus infection (27). Although rhinoviruses cannot infect white blood cells, virus attachment makes these cells less likely to report the infection to cells required to make antibodies and immune memory (27). ICAM-1 binds to amino acids belonging to the RV capsid proteins VP1, VP2, and VP3, which line the floor of the cleft or “canyon,” which is a prominent feature on the surface of the capsid (20, 28).
Members of the LDL receptor family are involved in binding and transporting ligands with a variety of biological functions and are expressed on a diversity of cell types (29). The LDL receptor used by minor-group rhinoviruses binds around the top of the star-shaped mesa of the 5-fold axis and is thought to directly contact residues on the BC and HI loops of VP1 (22). LDL receptors are structurally and functionally unrelated to ICAM-1 (30) and are not known to interfere with immune responses (27, 31). Previous studies examining antibody responses in naturally infected human populations (11) and T-cell immunogenic responses in murine lymphoid dendritic cells (25) demonstrate that minor-group viruses are more effective than major-group viruses in stimulating an immune response.
The greater immunogenicity of minor-group viruses might be expected to put them at an evolutionary disadvantage. Nonetheless, they have evolved independently three times (32), although recombination could have played a role generating these polyphyletic relationships (33). This suggests that the evolutionary change from major- to minor-group functionality might be relatively easy to accomplish and that the minor-group phenotype, although immunogenic, has a long-term average fitness equal to that of the major-group phenotype. Longitudinal studies of rhinovirus epidemiology in the 1960s, 1970s, and 2000s documented intermediate frequencies of minor-group infections over time and space (Fig. 1) (10,–13, 34, 35; W. M. Lee, personal communication). The fraction of minor-group rhinoviruses in the four United Kingdom samples is significantly larger than in the five North American samples (P < 0.001, t test), but there is no evidence for seasonal trends (12). None of these data shed light on the mechanisms influencing minor-group prevalence or transmission (36).
Minor-group viruses have a strictly conserved lysine residue in the HI loop of the VP1 capsid protein that is essential but not sufficient for attachment to LDL receptors, as several “K-type” major-group viruses also have a lysine in the same position on VP1 but cannot use LDL receptors for cell entry (37). The amino acid residues within the LDL receptor footprint are primarily basic. In major-group viruses, residues located at similar positions tend to be more acidic (37, 38), implying that electrostatic interactions influence receptor-binding kinetics. Previous studies have reported different selection pressures acting on major- versus minor-group viruses (32, 39). Positive stabilizing selection (purifying selection that results in amino acids with similar biochemical properties) accounted for 93% of residues under selection in major-group viruses, while positive destabilizing selection (positive Darwinian selection that results in amino acids with different biochemical properties) dominated the evolution of minor-group viruses (78%) (32). In addition to the cell receptor footprint and antigenic sites, positive selection was inferred in some regions of the viral proteins for which the functional significance remains unknown.
The present study asks whether major- and minor-group viruses show distinct patterns of genetic change. We used previously reported data and the prototypic serotypes for major- and minor-group viruses within RV-A to test the following hypotheses: (i) minor-group serotypes elicit a stronger immune response than do major-group serotypes, and therefore, minor-group lineages should evolve more rapidly at antigenic sites; (ii) if the entire viral capsid surface interacts with the immune system rather than only at discrete antigenic sites, minor-group serotypes should evolve more rapidly at amino acid positions that are exposed on the surface of the intact capsid; and (iii) assuming that VP1 is the immunodominant protein for both major- and minor-group serotypes (40), a larger number of amino acid substitutions should occur in VP1 than in VP2 and VP3, and this difference should be more pronounced in minor-group viruses.
Phylogenetic relationships among sequences (Fig. 2) are similar to those described previously (32, 41,–43). Small differences in topology probably result from the use of maximum likelihood (ML) methods in the present study and neighbor-joining methods in previous studies. The 12 minor-group sequences cluster into three monophyletic clades.
Infections caused by minor-group viruses stimulated an antibody response significantly more often (P < 0.0001) than those caused by major-group viruses (Fig. 3).
More than 9,200 nucleotide substitutions were detected on the included branches of the phylogeny; 1,655 of these substitutions are nonsynonymous, and 7,590 are synonymous. Most of the nonsynonymous substitutions (1,307; 79%) occur in exposed codons, but most of the synonymous substitutions (4,188; 55%) occur in buried codons, as expected, since 57% of the 781 codons are buried (Table 1). The much higher rate of nonsynonymous substitutions at exposed sites is extremely highly significant (χ2 = 634.1; P = 0). Major- and minor-group lineages show this pattern, but it is twice as strong in the minor group (ϕ2 = 0.111) as in the major group (ϕ2 = 0.055) (Table 1). This difference between the groups illustrates our main finding: an interaction between exposure status and cell receptor group, with a relative excess of nonsynonymous substitutions at exposed codons in the minor group (Fig. 4A).
The basic generalized linear model (GLM) analysis shows that this pattern holds up, controlling for variation among the three genes (P = 0.006 for the exposure-group interaction) (Table 2), and that synonymous sites do not show any significant gene or interaction effects (Table 2 and Fig. 4B). In this analysis, branch length effects are captured so efficiently by group (M versus m) that the inclusion of synonymous substitutions (syn) as a covariate adds no significant explanatory power (P = 0.11) (not reported in Table 2). In the random-effects analysis (generalized linear mixed-effects regression [GLMER]), syn has a highly significant effect because group, which partitions branches into just two categories, cannot capture the branch length variation among 19 cladelets. With synonymous substitutions included as a covariate, the interaction between exposure status and group remains significant (Table 2). Analyses with antigenic rather than exposed codons show similar trends but no significant effects owing to the small numbers of antigenic sites (results not shown).
The relative excess of nonsynonymous changes in minor-group lineages could be caused by an increased substitution rate at exposed codons in the minor group (as we predicted) or by a reduced nonsynonymous substitution rate at buried codons in the major group. To distinguish between these possibilities, we plotted the ratio of nonsynonymous to synonymous changes at exposed and buried codons for the major and minor groups (Fig. 4C and andD).D). At exposed codons, minor-group viruses have a significantly higher ratio for all three genes (chi-square tests with Bonferroni correction to reduce the probability of false-positive type I errors) (Table 3). However, major- and minor-group viruses show no significant differences at buried codons (Table 3), implying that the relative excess of nonsynonymous changes in minor-group viruses at exposed codons is not due to a reduced rate of nonsynonymous substitutions at buried codons in major-group viruses.
The identification of exposed amino acids by the rolling model was based on capsid protein structures for a minor-group virus, RV-A2. If some of these codons are not exposed on major-group viruses, this could bias our analyses toward the detection of more changes at exposed codons in minor-group viruses. To test this possibility, the joint characterization divides codons into four categories, for all three capsid genes combined, with the following numbers: 449 codons identified as being buried in both the rolling and structural models, 111 codons identified as being exposed in both the rolling and structural models (i.e., the rolling+structural or R+S model of exposed sites); 213 codons identified as being exposed in the rolling model and buried in the structural (i.e., the rolling-only or R model); and 5 codons identified as being buried in the rolling model and exposed in the structural model.
To test whether the 213 codons identified as being exposed only by the rolling-only (R) model show a disproportionate number of changes in minor-group viruses, we used GLM to model the numbers of substitutions as a function of group, gene, and three classes of sites: buried (449 codons), exposed by the rolling-only model (213 codons), and exposed by both models (111 codons). The five codons identified by only the structural model are not included in this analysis due to the small sample size. Both the rolling-only model and the rolling+structural model show an excess of amino acid changes at exposed sites but with a much larger coefficient for the codons identified by the rolling+structural model (Table 2). The magnitudes and significance of their interactions with the minor group are nearly identical.
Within the three protein subunits, VP3 and VP1 show excess changes at buried codons and codons identified as being exposed by the rolling-only (R) model but not by the rolling+structural (R+S) model. The ratio of nonsynonymous to synonymous changes is consistently higher in minor-group viruses for all three genes and for both types of exposed codons (Fig. 5), with individually significant differences in VP2 and VP1 for exposed codons identified by the R model and in VP1 for exposed codons identified by the R+S model (Table 3). The numbers of synonymous substitutions show small but significant excesses at codons identified as being exposed by both the R model and the R+S model (Table 2 and Fig. 4D).
To address the question of whether these effects occur throughout the major and minor groups or whether they arise from very strong effects in just a few lineages, we included the 19 cladelets (Fig. 2) as fixed effects and then as random effects in the generalized linear models. When analyzed as fixed effects, the minor-group clades (c32, c38, and c43) have the three highest coefficients for interaction with exposure status, indicating a consistently greater increase of the nonsynonymous substitution rate at exposed sites (Table 1). Treating clades as random effects yields similar results, although c38 (which includes only RV-A1A and RV-A1B) regresses toward the mean (Table 2). The significance and effect sizes of the interaction between group membership and exposure status for nonsynonymous substitutions remain nearly identical to those in the original GLM model, although the main effect of group is no longer significant in any model (Table 2). Exposure status explains a small portion of the variance in synonymous substitutions in the mixed-effects analysis, with the effect of group again no longer being significant (Table 2).
Figure 6 summarizes nonsynonymous and synonymous changes along the lengths of the VP2, VP3, and VP1 genes. The overall patterns of substitution are similar for both the major (Fig. 6A) and minor (Fig. 6B) cell receptor groups. The fitted curve shows that the major and minor groups tend to have elevated rates of substitution in the same genomic locations and that the peaks are generally well aligned with the known antigenic sites in the three proteins, although substitutions in VP1 fail to track exposed and buried regions in major-group viruses.
Major-group and minor-group rhinoviruses evolve at different tempos and with other thematic differences that may reflect the paths that they take to cell entry. Minor-group viruses enter cells after binding to an LDL receptor. This method of entering cells does not provide a way to downregulate the immune response, in contrast to the method used by major-group viruses, which bind to ICAM-1 (4). Minor-group viruses are more effective than major-group viruses in stimulating T-cell responses in vitro (25). We hypothesized that as a consequence, minor-group viruses would induce stronger antibody responses and thereby experience higher rates of amino acid substitutions at sites exposed to the immune system.
We considered three ways to characterize sites as being exposed to the immune system: (i) experimentally determined antigenic sites, (ii) sites determined to lie on the surface of the capsid protein of a minor-group rhinovirus based on proximity to the surface (rolling model), and (iii) sites determined to lie at the surface of a major-group rhinovirus based on a previously determined structure (structural model). At the 34 known antigenic sites, we found relatively large numbers of amino acid substitutions in both cell receptor groups, compared to nonantigenic sites, but the interaction between antigenicity and minor-group membership was not significant. The rolling model identified larger numbers of potentially antigenic sites. This approach identified 327 exposed sites, including all 34 sites previously identified as being antigenic. At these exposed sites, the amino acid substitution rate is increased, with a highly significant interaction between exposure status and cell receptor group and a greater increase in the minor group (Table 2 and Fig. 4A). The structural model (based on RV-A16 ) identifies fewer than half as many exposed sites, only five of which are not identified by the rolling model. These sites show an even higher nonsynonymous substitution rate than those identified only by the rolling model, and the interaction with the minor group is of a nearly equal magnitude.
These differences occur consistently throughout the phylogeny and do not result from large differences between just a few major-group and minor-group clades. We separately scored the synonymous and nonsynonymous substitutions in 16 major-group and 3 minor-group clades of roughly equal depths and treated clade identity as a random effect in the generalized linear mixed-model analyses. The interaction of the minor group with exposed sites remains significant and of a nearly identical magnitude (Table 2), and the two larger minor-group clades (c32 and c43) (Fig. 2) show the largest excess of substitutions at exposed sites. The small minor-group clade (c38) shows a large excess when treated as a fixed effect but regresses toward the mean of the full ensemble in the mixed-model analysis. Thus, the interaction between group and site occurs throughout the tree.
The relatively higher rate of nonsynonymous substitutions at exposed sites in minor-group viruses could be caused either by an absolutely higher nonsynonymous rate at exposed sites in minor-group viruses or by a lower nonsynonymous rate at buried sites in major-group viruses. The ratio of nonsynonymous to synonymous changes at these sites argues strongly for the first explanation, the one consistent with immune escape. If the nonsynonymous substitution rates at buried sites were lower in major- than in minor-group lineages, we would expect the ratio of nonsynonymous to synonymous substitutions (dN/dS ratio) at buried sites to also be lower in major-group lineages. However, where major- and minor-group lineages differ, the ratio of nonsynonymous to synonymous changes is higher in the major group.
Analysis of substitution rates at synonymous sites reveals a weaker but significant and consistent excess of changes at exposed codons in both major- and minor-group viruses and in all three capsid proteins. Evidence for weak positive and negative (purifying) selection on synonymous mutations has been found in a variety of studies, including one on vesicular stomatitis virus (45). A number of possible explanations have been proposed, including preferences for translationally “optimal” synonymous codons, effects on mRNA secondary structure and stability (46), and technical inaccuracy in estimating the numbers of synonymous and nonsynonymous substitutions in codons with multiple substitutions on a branch.
The current hypothesis for picornavirus evolution, and rhinovirus evolution in particular, is that most of the genome is subject to purifying selection, with positive selection predominantly being restricted to antigenic sites and to certain regions within the nonstructural proteins 2A protease, 3C protease, and 3D polymerase (32, 42, 47, 48). We found that both major-group and minor-group viruses fix substitutions more frequently at many sites that are exposed on the surface of the capsid and not just those previously identified as being antigenic; this finding supports the hypothesis that the entire capsid surface has the potential to be antigenic (49).
Rhinovirus antigenic sites were identified by using monoclonal antibodies harvested from experimentally infected BALB/c mice that had been exposed to HeLa cell lines infected with RV (50,–52). The highly polymorphic major histocompatibility complex (MHC) determines which peptides will be recognized as being foreign by the immune system (53). Different MHC molecules within an individual, among individuals in a population, and among species will present different regions of a given protein, leading to different immune responses (53). Thus, the antigenic sites identified in a mouse model might plausibly differ from those that would be identified in another species, such as humans. The evolutionary evidence developed here suggests that the huge immunological diversity of the global human population collectively detects a large fraction of the amino acids displayed on the surface of the rhinovirus capsid.
The cell receptor footprints of major- and minor-group viruses differ, but this difference does not appear to explain why minor-group viruses are less well conserved at exposed amino acid positions. The cellular receptor footprint covers only a small region of the capsid surface in both groups. In major-group viruses, ICAM-1 binds to amino acid residues belonging to VP2, VP3, and VP1 that line the interior of the canyon (20). Smith and colleagues demonstrated previously that although antibodies are able to attach to residues within the canyon in the same region where ICAM-1 binds, residues in this region are conserved, and mutations were shown to be deleterious (54). In contrast, minor-group viruses use members of the LDL receptor family to bind to amino acids located on the BC and HI loops in VP1. Successful attachment has been shown to depend on the presence of a conserved amino acid (K224 in RV-A2) that is located in the HI loop and on an overall basic charge on the amino acids within the receptor footprint (22, 38). Members of the LDL receptor family have multiple repeated binding domains, and different viruses can use different repeats to attach to the receptor (55,–57). This diversity in available LDL-binding domains suggests that minor-group viruses may be able to accumulate a greater number of substitutions at exposed residues within the cellular footprint to escape immune detection while maintaining access to their cellular receptor. Thus, reduced constraint in LDL attachment might play a role in elevating the substitution rates in this region, but only a few sites would be affected in this way.
Finally, we hypothesized that a higher rate of amino acid substitutions should occur in the immunodominant capsid protein VP1 (52) and that this increased substitution rate should be accentuated in minor-group viruses. We found significantly more nonsynonymous substitutions per codon in VP1 and VP3 only at buried sites and no interaction between the virus receptor type and capsid protein subunits, providing no support for a specific effect in minor-group viruses (Table 2). The detailed substitution pattern in minor-group VP1 shows a peak in the shared antigenic region from positions 83 to 100, which is missing in the major group (Fig. 6), indicating that differences in patterns of substitution might occur over regions smaller than capsid proteins.
Beginning with the known immunological difference between major- and minor-group rhinoviruses, we found evidence that minor-group viruses elicit a stronger host immune response and evolve more rapidly at exposed amino acid sites. Rapid evolution might help to maintain minor-group viruses in the population, although we suspect that these serotypes have other selective advantages, through either increased transmission or more effective intrahost competition. On the continuum from mild, highly diverse, and weakly immunogenic rhinoviruses to severe, less diverse, and highly immunogenic influenza viruses, minor-group rhinoviruses may be more “influenza-like” than major-group viruses (58). As additional studies characterize the recently discovered RV-C, the patterns described here may provide a way to determine whether rhinoviruses are currently continuing to stably maintain a diversity of serotypes and strategies or whether they are evolving generally increased or decreased virulence.
In the Seattle Watch studies conducted from 1965 to 1969 and from 1975 to 1979, Fox and colleagues (11) reported the proportion of people infected with different RV serotypes who developed an antibody response. We classified these serotypes into major and minor groups within RV-A and used a chi-square test to assess whether antibody production differed significantly between infections caused by major- and minor-group viruses using data reported previously (see Table 5 in reference 11).
Eighty-four complete nucleotide sequences for the RV-A capsid genes VP1, VP2, and VP3 were downloaded from GenBank. To maintain the correct reading frame during alignment, each gene was converted to amino acids by using the AlignmentHelper program (59), which preserves the original nucleotide sequence, and then aligned separately at the amino acid level. Alignments were conducted by using default parameters in MUSCLE (60) and by using the G-INS-I refinement strategy and default parameters in MAFFT (61). The amino acid alignments were then converted back to nucleotides by using AlignmentHelper. MUSCLE and MAFFT generated similar alignments, and the gene tree topologies based on these alignments were identical. The aligned sequences were concatenated in the standard genomic order VP2-VP3-VP1.
The aligned, concatenated nucleotide sequences were used to estimate a consensus phylogenetic tree. Model selection was determined by using the Akaike information criterion (62) as implemented in jModeltest 0.1 (63). The best-fit model of evolution for all genes was the general time-reversible model with an estimated proportion of invariant sites and a gamma distribution of rates (GTR+I+G). Phylogenies were estimated by PhyML (64). Branch support values were estimated by using nonparametric bootstrapping with 1,000 pseudoreplications.
Trees generated from analyses of the full set of 84 RV-A sequences were used to identify 57 sequences that include the three minor-group clades and their closest major-group relatives. This subset of sequences was used in the analyses described below. The major-group serotype RV-A89 was used to root the tree and establish character state polarity because of its sister taxon relationship with the 56 members of the ingroup (Table 4 and Fig. 2).
Nucleotide substitutions on all branches of the 57-sequence genealogy were inferred by maximum likelihood reconstruction of its history. Statistical analyses were restricted to substitutions within the 3 minor-group clades (Fig. 2, thick branches) and 16 major-group clades with similar maximum and average branch depths. The roots of these 19 analyzed “cladelets” are indicated in Fig. 2 by clade names in the form of “cXX,” where XX is a number derived from the numbering of internal nodes in the tree. To be included, a cladelet's ancestral node was required to be no more distant than 0.22 expected substitutions per site from the tip nodes (modern sequences), in an independent estimate of the tree that was derived under molecular-clock assumptions by BEAST (65). This tree was topologically identical to the tree shown in Fig. 2 except for the relationships among a few of the deepest nodes. The limit of 0.22 substitutions/site includes one major-group clade (c6), which is slightly deeper than the two deepest minor-group clades (c32 and c43), and one major-group clade (c55), which is slightly shallower than the shallowest minor-group clade (c38).
By limiting the analysis to substitutions occurring on short, recent branches in both the major and minor groups, we reduced the risks that artifactual differences might arise between the groups and also that “overprinting” might hide significant numbers of nucleotide substitutions. (Synonymous sites begin to approach saturation in the deeper parts of the tree.) Ancestral nucleotide sequences were reconstructed at each internal node by using the dnaml program in PHYLIP 3.695 (66) and then translated by a custom Python script that separately scored synonymous and nonsynonymous (amino-acid-changing) substitutions for each codon along each included branch. Sites with ambiguous nucleotide states (e.g., “Y” for pyrimidine) were dropped from the analysis on branches affected by the ambiguity.
The three-dimensional structure of RV-A2 (Protein Data Bank [PDB] accession number 3DPR) was used as a reference for the general capsid structure. The PyMOL molecular visualization program (version 126.96.36.199; Schrödinger LLC, New York) was used to reconstruct the complete icosahedral capsid using the rotation matrices in the PDB data file. Surface area calculations were limited to the exposed surface of the virus by the following procedure. A single protomer made up of VP2, VP3, and VP1 and the minimal number of additional protein subunits that surround the central protomer were identified and saved as a new molecule. To visualize the exterior and interior surfaces, the accessible surface area was drawn with a 5-Å probe radius, which is approximately the size of an amino acid residue. The “rotate” command was used to rotate the molecule and modify the model coordinates such that the exterior and interior surfaces of the central protomer were perpendicular to the y axis. The exterior and interior surfaces of the virus capsid could then be distinguished by their y coordinates. A cutoff value was determined by inspection of the structure, looking for residues that were about halfway between the inner and outer surfaces. A Python script written by D. P. Goldenberg was used to identify the y coordinate values of all residues in the central protomer that were above the designated cutoff value. The accessible surface areas for the VP2, VP3, and VP1 proteins within the central protomer were calculated by using the algorithm described previously by Lee and Richards (68), as implemented in the ACCESS program by T. J. Richmond, and the group radii described previously by Chothia (69). The accessible surface area was calculated by modeling a molecule with a probe radius of 1.4 Å “rolling” over the surface of the viral capsid. All amino acids that came into contact with the rolling molecule were identified as being exposed, and we refer to this as the rolling model.
We used a three-dimensional structure of major-group RV-A16 (44) to create an alternative characterization of exposed and buried sites, which we call the structural model. The union of sites exposed in either the rolling or structural model defines an inclusive “combined” characterization of exposed sites. The contrasting “joint” characterization disaggregates sites into four categories based on which model(s) identifies them as being exposed (see Results).
The location of amino acid residues within antigenic sites in major- and minor-group viruses was based on descriptions reported previously (50, 51). Major-group rhinoviruses have four primary neutralizing immunogenic sites (NIm-IA, NIm-IB, NIm-II, and NIm-III) that have been mapped to protruding regions on the external capsid proteins VP1, VP2, and VP3 (50). Minor-group rhinoviruses have three neutralizing immunogenic sites, known as site A, site B, and site C, that are located in the vicinity of the NIm sites (51). Due to the relatively small number of sites identified as being antigenic, we base our analyses on a combined characterization that looks at the union of all identified immunogenic sites.
The raw data are counts of synonymous and nonsynonymous (amino-acid-changing) nucleotide substitutions per codon on each branch in the included parts of the tree. Branches are classified as occurring within major- or minor-group clades. Our basic statistical model analyzes the sums of the synonymous and nonsynonymous substitutions at each codon position across all major-group branches and across all minor-group branches. Codons are classified by gene and by surface exposure of the encoded amino acid. We used generalized linear models (glm in R, with the quasipoisson family of link functions) to explain the variation in the summed substitutions using various combinations of the explanatory factors of interest. For example, the most basic model reported here is of the form glm[aax ~ GUE * (gene + group), family = quasipoisson,…]. Here aax is the summed number of nonsynonymous substitutions at each codon position; GUE is the combined characterization of surface exposure, with state E for exposed or state B for buried codons; the gene is VP1, VP2, or VP3; and the group is M for major or m for minor. Synonymous substitutions (syn) were analyzed using the same framework.
We began by testing two-way and three-way interactions and used backward model selection to remove nonsignificant terms, with a P value of <0.05 being the threshold for statistical significance. Our central question was whether the interactions between exposure to the immune system and major-group versus minor-group identity (GUE * group in the model described above) would be significant and large. Owing to their greater immunogenicity, we predicted that minor-group rhinoviruses would show a large increase of amino acid substitution rates at exposed nonsynonymous sites (relative to their rates at buried nonsynonymous sites).
The number of substitutions is expected to be proportional to the branch length, so a significant main effect of group is expected even under the null hypothesis, owing to the much greater total branch length of the major-group clades. One way to remove this “nuisance” effect of phylogeny is to normalize the nonsynonymous substitutions by the synonymous substitutions on the same branches and analyze the resulting dN/dS ratios. This approach requires averaging over many codon positions, and it depends on the assumption that synonymous mutations are selectively neutral. An alternative is to include the number of synonymous substitutions per codon as a covariate in the model, thereby removing the synonymous component of the branch length effect statistically without averaging over codons or taking ratios of the Poisson-distributed primary events (i.e., small numbers of nucleotide substitutions at individual codons). We did this in some analyses, although for the present purposes, it is not necessary because the main effects of group and/or cladelet capture the nonsynonymous component of the branch length effect. To summarize the overall patterns, we constructed 2-by-2 contingency tables of synonymous and nonsynonymous substitutions categorized by group membership, with separate tables for each combination of gene and exposure status, and carried out chi-square tests. In all of our analyses, the effects of central interest are the strengths of the interactions between group and exposure and not the magnitudes of the main effects as such.
We asked whether the increased substitution rate at exposed minor-group nonsynonymous sites could be seen in each of the three minor-group clades or whether it resulted from the influence of one very-fast-evolving lineage or clade. To address this question, branches were categorized by their cladelet memberships, which were added as a fixed effect in the GLM analysis and as a random effect in a mixed-model analysis (using glmer in R with the Poisson family).
This work was supported by a 21st Century Science Initiative grant from the James S. McDonnell Foundation and the Modeling the Dynamics of Life Fund of the University of Utah to F.R.A. and by the NSF-DEB (grant 1051491) to J.S. The funding agencies had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank David P. Goldenberg for valuable advice and for generating the accessible-surface-area calculations and Chae Kyung Yoo for assistance with developing Python scripts. We also thank Duke S. Rogers for reviewing earlier versions of this paper and the sLaM and eKoSystems groups for useful discussion. Cole Porter invented the subtitle in 1944, anticipating recognition of the minor group by 4 decades (70).
N.L.-R. and F.R.A. conceived of the study, N.L.-R. and J.S. carried out the phylogenetic analyses and ancestral sequence reconstructions, F.R.A. and J.S. carried out statistical analyses, and all three authors wrote the manuscript.