The debate over genomic correlations among sequence conservation, protein connectivity, gene essentiality and gene expression has generated a number of new hypotheses that challenge the classical framework of molecular evolution. For instance, the translational selection hypothesis claims that the rate of protein evolution is determined mainly by selection for protein stability, which avoids the toxicity of misfolded proteins. In this short article, we propose that gene pleiotropy, the capacity for affecting multiple phenotypes, may play a vital role in molecular evolution. We discuss several approaches to testing this hypothesis.
This article was reviewed by Dr Eugene Koonin, Dr Arcady Mushegian and Dr Claus Wilke.
One of the main objectives of the molecular evolution and evolutionary systems biology field is to reveal the underlying principles that dictate protein evolutionary rates. Several studies argue that expression abundance is the most critical component in determining the rate of evolution, especially in unicellular organisms. However, the expression breadth also needs to be considered for multicellular organisms.
In the present paper, we analyzed the relationship between the two expression variables and evolutionary rates using two different genome-scale expression datasets, microarrays and ESTs. Kendall's rank correlation tests revealed a significant positive correlation between expression abundance (EA) and expression breadth (EB). A novel random shuffling approach was applied to EA and EB to compare the correlation coefficients obtained from the real data sets with those expected by chance. A novel method called Fixed Group Analysis (FGA) was designed and applied to investigate the correlations between expression variables and rates when one of the two expression variables was evenly fixed.
In conclusion, all of these analyses and tests consistently showed that the breadth rather than the abundance of gene expression is tightly linked with the evolutionary rate in multicellular organisms.
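The shuffling comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy EA and EB values, and the choice of Kendall's tau-a (which ignores tie corrections) are all assumptions made for illustration.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1      # concordant pair
        elif product < 0:
            score -= 1      # discordant pair (ties add nothing)
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def shuffle_test(x, y, n_shuffles=1000, seed=0):
    """Compare the observed tau with taus obtained after shuffling y."""
    rng = random.Random(seed)
    observed = kendall_tau(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)
        if abs(kendall_tau(x, y_perm)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the empirical p-value above zero.
    p_value = (hits + 1) / (n_shuffles + 1)
    return observed, p_value


# Hypothetical toy values: expression abundance and breadth for 8 genes.
ea = [1.0, 2.5, 3.1, 4.8, 6.0, 7.2, 8.9, 9.5]
eb = [1, 2, 2, 4, 5, 7, 8, 9]
tau, p = shuffle_test(ea, eb)
print(f"tau = {tau:.3f}, empirical p = {p:.4f}")
```

Shuffling one variable breaks any real association while preserving both marginal distributions, so the shuffled coefficients approximate what chance alone would produce.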
Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account.
We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It reproduces quantitatively the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted from the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions is larger than 0.70 for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of 0.56 with three fitted parameters. The mutation model that best fits the data takes into account increased mutation rate at CpG dinucleotides, yielding a mean correlation coefficient of 0.90 with five parameters.
The effective selection process that we propose reproduces well amino acid distributions as observed in the protein sequences in the PDB. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies. Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: The bias in the mutation influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
The “oscillation hypothesis” has been proposed as a general explanation for the exceptional diversification of herbivorous insect species. The hypothesis states that speciation rates are elevated through repeated correlated changes – oscillations – in degree of host plant specificity and geographic range. The aim of this study is to test one of the predictions from the oscillation hypothesis: a positive correlation between diet breadth (number of host plants used) and geographic range size, using the globally distributed butterfly subfamily Nymphalinae. Data on diet breadth and global geographic range were collected for 182 Nymphalinae butterfly species, and the size of the geographic range was measured using a GIS. We tested both diet breadth and geographic range size for phylogenetic signal to see if species are independent of each other with respect to these characters. As this test gave inconclusive results, data were analysed both using cross-species comparisons and taking phylogeny into account using generalised estimating equations as applied in the APE package in R. Irrespective of which method was used, we found a significant positive correlation between diet breadth and geographic range size. These results are consistent for two different measures of diet breadth and after removal of outliers. We conclude that the global range sizes of Nymphalinae butterflies are correlated with diet breadth. That is, butterflies that feed on a large number of host plants tend to have larger geographic ranges than do butterflies that feed on fewer plants. These results lend support to an important step in the oscillation hypothesis of plant-driven diversification, in that it can provide the necessary fuel for future population fragmentation and speciation.
Independently evolving lineages mostly accumulate different changes, which leads to their gradual divergence. However, parallel accumulation of identical changes is also common, especially in traits with only a small number of possible states.
We characterize parallelism in evolution of coding sequences in three four-species sets of genomes of mammals, Drosophila, and yeasts. Each such set contains two independent evolutionary paths, which we call paths I and II. An amino acid replacement which occurred along path I also occurs along path II with the probability 50–80% of that expected under selective neutrality. Thus, the per site rate of parallel evolution of proteins is several times higher than their average rate of evolution, but still lower than the rate of evolution of neutral sequences. This deficit may be caused by changes in the fitness landscape, leading to a replacement being possible along path I but not along path II. However, constant, weak selection assumed by the nearly neutral model of evolution appears to be a more likely explanation. Then, the average coefficient of selection associated with an amino acid replacement, in the units of the effective population size, must exceed ~0.4, and the fraction of effectively neutral replacements must be below ~30%. At a majority of evolvable amino acid sites, only a relatively small number of different amino acids is permitted.
High, but below-neutral, rates of parallel amino acid replacements suggest that a majority of amino acid replacements that occur in evolution are subject to weak, but non-trivial, selection, as predicted by Ohta's nearly-neutral theory.
This article was reviewed by John McDonald (nominated by Laura Landweber), Sarah Teichmann and Subhajyoti De, and Chris Adami.
Non-independent evolution of amino acid sites has become a noticeable limitation of most methods aimed at identifying selective constraints at functionally important amino acid sites or protein regions. The need for a generalised framework to account for non-independence of amino acid sites has fuelled the design and development of new mathematical models and computational tools centred on resolving this problem. Molecular coevolution is one of the most active areas of research, with new models and methods being developed at an increasing rate. Both parametric and non-parametric methods have been developed to account for correlated variability of amino acid sites. These methods have been utilised for detecting phylogenetic, functional and structural coevolution as well as for identifying surfaces of amino acid sites involved in protein-protein interactions. Here we briefly describe and discuss these methods, and identify their advantages and limitations.
Molecular coevolution; Mutual Information Content; parametric methods; non-parametric methods; protein-protein interactions
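As a minimal illustration of the mutual-information idea named in the keywords, the sketch below scores the covariation between two alignment columns. It is an assumption-laden toy, not any specific published method: the function names and the four-sequence columns are hypothetical, and raw mutual information makes no correction for shared ancestry or for sampling noise.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols observed in one column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """MI(A; B) = H(A) + H(B) - H(A, B) for two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same sequences")
    joint = list(zip(col_a, col_b))
    return column_entropy(col_a) + column_entropy(col_b) - column_entropy(joint)


# Hypothetical columns from a four-sequence alignment.
site_1 = "AAVV"
site_2 = "LLII"   # changes together with site_1
site_3 = "LILI"   # varies independently of site_1

print(mutual_information(site_1, site_2))
print(mutual_information(site_1, site_3))
```

Columns that always change together receive a high score, whereas columns that vary independently score near zero. A fully conserved column also scores zero, because covariation cannot be observed without variation.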
Whole-genome scans for positive Darwinian selection are widely used to detect evolution of genome novelty. Most approaches are based on evaluation of nonsynonymous to synonymous substitution rate ratio across evolutionary lineages. These methods are sensitive to saturation of synonymous sites and thus cannot be used to study evolution of distantly related organisms. In contrast, indels occur less frequently than amino acid replacements, accumulate more slowly, and can be employed to characterize evolution of diverged organisms. As indels are also subject to the forces of natural selection, they can generate functional changes through positive selection. Here, we present a new computational approach to detect selective constraints on indel substitutions at the whole-genome level for distantly related organisms. Our method is based on ancestral sequence reconstruction, takes into account the varying susceptibility of different types of secondary structure to indels, and according to simulation studies is conservative. We applied this newly developed framework to characterize the evolution of organisms of the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum. The superphylum contains organisms with unique cell biology, physiology, and diverse lifestyles. It includes bacteria with simple cell organization and more complex eukaryote-like compartmentalization. Lifestyles range from free-living organisms to obligate pathogens. In this study, we conducted a whole-genome level analysis of indel substitutions specific to evolutionary lineages of the PVC superphylum and found that indels evolved under positive selection on up to 12% of gene tree branches. We also analyzed possible functional consequences for several case studies of predicted indel events.
selection; indel substitutions; PVC superphylum
Complexity of biological function relies on large networks of interacting molecules. However, the evolutionary properties of these networks are not fully understood. It has been shown that selective pressures depend on the position of genes in the network. We have previously shown that in the Drosophila insulin/target of rapamycin (TOR) signal transduction pathway there is a correlation between the pathway position and the strength of purifying selection, with the downstream genes being most constrained. In this study, we investigated the evolutionary dynamics of this well-characterized pathway in vertebrates. More specifically, we determined the impact of natural selection on the evolution of 72 genes of this pathway. We found that in vertebrates there is a similar gradient of selective constraint in the insulin/TOR pathway to that found in Drosophila. This feature is neither the result of a polarity in the impact of positive selection nor of a series of factors affecting selective constraint levels (gene expression level and breadth, codon bias, protein length, and connectivity). We also found that pathway genes encoding physically interacting proteins tend to evolve under similar selective constraints. The results indicate that the architecture of the vertebrate insulin/TOR pathway constrains the molecular evolution of its components. Therefore, the polarity detected in Drosophila is neither specific to this genus nor incidental. Hence, although the underlying biological mechanisms remain unclear, these may be similar in both vertebrates and Drosophila.
evolutionary divergence; insulin signaling pathway; network topology; selective constraint; network evolution
Recently, a positive correlation between basal leukocyte counts and mating system across primates suggested that sexual promiscuity could be an important determinant of the evolution of the immune system. Motivated by this idea, we examined the patterns of molecular evolution of 15 immune defense genes in primates in relation to promiscuity and other variables expected to affect disease risk. We obtained maximum likelihood estimates of the rate of protein evolution for terminal branches of the primate phylogeny at these genes. Using phylogenetically independent contrasts, we found that immunity genes evolve faster in more promiscuous species, but only for a subset of genes that interact closely with pathogens. We also observed a significantly greater proportion of branches under positive selection in the more promiscuous species. Analyses of independent contrasts also showed a positive effect of group size. However, this effect was not restricted to genes that interact closely with pathogens, and no differences were observed in the proportion of branches under positive selection in species with small and large groups. Together, these results suggest that mating system has influenced the evolution of some immunity genes in primates, possibly due to increased risk of acquiring sexually transmitted diseases in species with higher levels of promiscuity.
Disease risk; immunity genes; mating system; primates; sexual promiscuity
Altering a protein’s backbone through amino acid deletion is a common evolutionary mutational mechanism, but is generally ignored during protein engineering primarily because its effect on the folding-structure-function relationship is difficult to predict. Using directed evolution, enhanced green fluorescent protein (EGFP) was observed to tolerate residue deletion across the breadth of the protein, particularly within short and long loops, helical elements, and at the termini of strands. A variant with G4 removed from a helix (EGFPG4Δ) conferred significantly higher cellular fluorescence. Folding analysis revealed that EGFPG4Δ retained more structure upon unfolding and refolded with almost 100% efficiency but at the expense of thermodynamic stability. The EGFPG4Δ structure revealed that G4 deletion caused a beneficial helical registry shift resulting in a new polar interaction network, which potentially stabilizes a cis proline peptide bond and links secondary structure elements. Thus, deletion mutations and registry shifts can enhance proteins through structural rearrangements not possible by substitution mutations alone.
• Using directed evolution, the impact of amino acid deletion on EGFP is explored
• Loops, helices, and strand termini are especially tolerant to amino acid deletion
• A deletion mutant that enhances cellular production and fluorescence is identified
• Structure reveals that a helical registry shift creates a new polar network
Using directed evolution, Arpino et al. examine the impact of amino acid deletion on EGFP and find that loops, helices, and strand termini are especially tolerant to amino acid deletion. Structural work provides a molecular explanation for this observation.
Recent results from Drosophila suggest that positive selection has a substantial impact on genomic patterns of polymorphism and divergence. However, species with smaller population sizes and/or stronger population structure may not be expected to exhibit Drosophila-like patterns of sequence variation. We test this prediction and identify determinants of levels of polymorphism and rates of protein evolution using genomic data from Arabidopsis thaliana and the recently sequenced Arabidopsis lyrata genome. We find that, in contrast to Drosophila, there is no negative relationship between nonsynonymous divergence and silent polymorphism at any spatial scale examined. Instead, synonymous divergence is a major predictor of silent polymorphism, which suggests variation in mutation rate as the main determinant of silent variation. Variation in rates of protein divergence is mainly correlated with gene expression level and breadth, consistent with results for a broad range of taxa, and map-based estimates of recombination rate are only weakly correlated with nonsynonymous divergence. Variation in mutation rates and the strength of purifying selection seem to be major drivers of patterns of polymorphism and divergence in Arabidopsis. Nevertheless, a model allowing for varying negative and positive selection by functional gene category explains the data better than a homogeneous model, implying the action of positive selection on a subset of genes. Genes involved in disease resistance and abiotic stress display high proportions of adaptive substitution. Our results are important for a general understanding of the determinants of rates of protein evolution and the impact of selection on patterns of polymorphism and divergence.
dN/dS; neutral theory; purifying selection; translational selection; recurrent hitchhiking
A fundamental observation of comparative genomics is that the distribution of evolution rates across the complete sets of orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The most straightforward explanation for the conservation of this distribution appears to be that the relative evolution rates of all genes remain nearly constant, or in other words, that evolutionary rates of different genes are strongly correlated within each evolving genome. This correlation could be explained by a model that we denoted the Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. Alternatively, however, the correlation between the evolutionary rates of genes could be a simple consequence of a molecular clock (MC). We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than the fit for the MC model, with overwhelming statistical significance, although similarly to the MC, the UPM is strongly overdispersed. Thus, the results of this analysis reveal a universal, genome-wide pacemaker of evolution that could have been in operation throughout the history of life.
A central concept of evolution is the Molecular Clock, according to which each gene evolves at a characteristic, near constant rate. Numerous studies support the Molecular Clock hypothesis in principle but also show that the clock is indeed very approximate. Genome-wide comparative analysis of phylogenetic trees described here reveals a distinct, more general feature of genome evolution that we call the Universal Pacemaker. Under this model, when the rate of evolution changes, the change occurs synchronously in many if not all genes in the evolving genome. In other words, the relative rates of gene evolution remain constant across long evolutionary spans: if a gene is slow relative to the rest of the genes in the given lineage, it is always slow, and if it evolves fast, it is always fast. We show here that the Universal Pacemaker model fits the available data much better than the traditional Molecular Clock model. These findings are compatible with the previously observed accelerations and decelerations of evolution in individual lineages, but we show that synchronous, genome-wide change of evolutionary rates is a global feature of genome evolution that appears to pervade the entire history of life.
Genes do not act in isolation but perform their biological functions within genetic pathways that are connected in larger networks. Investigation of nucleotide variation within genetic pathways and networks has shown that topology can affect the rate of protein evolution; however, it remains unclear whether the same pattern of nucleotide variation is expected within functionally similar networks and whether it may be due to similar or different biological mechanisms. We address these questions by investigating nucleotide variation in the context of the structure of the insulin/Tor-signaling pathway in Caenorhabditis, which is well characterized and is functionally conserved across phylogeny. In Drosophila and vertebrates, the rate of protein evolution is negatively correlated with the position of a gene within the insulin/Tor pathway. Similarly, we find that in Caenorhabditis, the rate of amino acid replacement is lower for downstream genes. However, in Caenorhabditis, the rate of synonymous substitution is also strongly affected by the position of a gene in the pathway, and we show that the distribution of selective pressure along the pathway is driven by differential expression level. A full understanding of the effect of pathway structure on selective constraints is therefore likely to require inclusion of specific biological function into more general network models.
network; aging; molecular evolution; gene expression; selection
We explore the ability of optimal foraging theory to explain the observation among marine bacteriophages that host range appears to be negatively correlated with host abundance in the local marine environment. We modified Charnov's classic diet composition model to describe the ecological dynamics of the related generalist and specialist bacteriophages φX174 and G4, and confirmed that specialist phages are ecologically favored only at high host densities. Our modified model accurately predicted the ecological dynamics of phage populations in laboratory microcosms, but had only limited success predicting evolutionary dynamics. We monitored evolution of attachment rate, the phenotype that governs diet breadth, in phage populations adapting to both low and high host density microcosms. Although generalist φX174 populations evolved even broader diets at low host density, they did not show a tendency to evolve the predicted specialist foraging strategy at high host density. Similarly, specialist G4 populations were unable to evolve the predicted generalist foraging strategy at low host density. These results demonstrate that optimal foraging models developed to explain the behaviorally determined diets of predators may have only limited success predicting the genetically determined diets of bacteriophage, and that optimal foraging probably plays a smaller role than genetic constraints in the evolution of host specialization in bacteriophages.
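The logic behind the prediction that specialists are favored only at high host densities can be sketched with the classic two-prey form of the diet composition model. This is the textbook version, not the modified model used in the study, and the parameter values and function names below are hypothetical. The forager meets prey type i at encounter rate lam_i, gains e_i per item, and spends handling time h_i on it.

```python
def intake_rate(diet, lam, gain, handling):
    """Long-term intake rate when only the prey types in `diet` are taken."""
    energy = sum(lam[i] * gain[i] for i in diet)
    time = 1.0 + sum(lam[i] * handling[i] for i in diet)
    return energy / time


def best_diet(lam, gain, handling):
    """Return 'specialist' or 'generalist' for a two-prey system.

    Prey 0 is assumed to be the more profitable type (higher gain/handling).
    Prey 1 is dropped when specializing on prey 0 already yields more than
    the profitability of prey 1.
    """
    specialist_rate = intake_rate([0], lam, gain, handling)
    if specialist_rate > gain[1] / handling[1]:
        return "specialist"
    return "generalist"


# Hypothetical parameters: prey 0 is more profitable than prey 1.
gain = [10.0, 4.0]
handling = [1.0, 1.0]

for density in (0.1, 5.0):
    lam = [density, 1.0]   # encounter rate with prey 0 scales with its density
    print(density, best_diet(lam, gain, handling))
```

Note that the decision depends only on the encounter rate with the more profitable prey, which is why the model predicts a switch from a broad to a narrow diet as the preferred host becomes abundant.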
Motivation: In a nucleotide or amino acid sequence, not all sites evolve at the same rate, due to differing selective constraints at each site. Currently in computational molecular evolution, models incorporating rate heterogeneity always share two assumptions. First, the rate of evolution at each site is assumed to be independent of every other site. Second, the values of these rates are assumed to be drawn from a known prior distribution. Although often assumed to be small, the actual effect of these assumptions has not been previously quantified in the literature.
Results: Herein we describe an algorithm to simultaneously infer the set of n−1 relative rates that parameterize the likelihood of an n-site alignment. Unlike previous work (a) these relative rates are completely identifiable and distinct from the branch-length parameters, and (b) a far more general class of rate priors can be used, and their effects quantified. Although described in a Bayesian framework, we discuss a future maximum likelihood extension.
Conclusions: Using both synthetic data and alignments from the Myc, Max and p53 protein families, we find that inferring relative rather than absolute rates has several advantages. First, both empirical likelihoods and Bayes factors show strong preference for the relative-rate model, with a mean Δ ln P=−0.458 per alignment site. Second, the computed likelihoods and Bayes factors were essentially independent of the relative-rate prior, indicating that good estimates of the posterior rate distribution are not required a priori. Third, a novel finding is that rates can be accurately inferred even when up to ≈4 substitutions per site have occurred. Thus biologically relevant putative hypervariable sites can be identified as easily as conserved sites. Lastly, our model treats rates and tree branch-lengths as completely identifiable, allowing for the first time coherent simultaneous inference of branch-lengths and site-specific evolutionary rates.
Availability: Source code for the utility described is available under a BSD-style license at http://www.fernandes.org/txp/article/9/site-specific-relative-evolutionary-rates.
Supplementary information: Supplementary data is available at Bioinformatics online.
There is growing recognition that the elemental composition of genomes and proteins can be related to resource limitation. We examine the possibility that the elemental composition of nucleic acids and the amino acids (and proteins) they encode are correlated. We report a positive association between the stoichiometric ratio of N/C content of individual amino acids and their codons. Potentially, this is an outcome of chemical interactions between amino acids and anticodons that influenced the evolution of the genetic code. We also find a strong, positive relationship between N/C values of whole genomes and proteomes, across 94 prokaryotic species. This relationship is part of a spectrum in nitrogen versus carbon use across genomes and proteomes, which is correlated with genomic GC content. GC content is correlated positively with average nitrogen use, and negatively with average carbon use, across both genomes and proteomes.
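The link between GC content and nitrogen versus carbon use in genomes can be illustrated with a short calculation. This is a sketch under a stated assumption: only the atoms of the nucleobases are counted (adenine C5N5, guanine C5N5, cytosine C4N3, thymine C5N2), and the sugar-phosphate backbone is ignored. The study may have counted atoms differently, and the function name is hypothetical.

```python
# (carbon atoms, nitrogen atoms) per nucleobase; backbone atoms are ignored.
BASE_ATOMS = {"A": (5, 5), "C": (4, 3), "G": (5, 5), "T": (5, 2)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def n_to_c_ratio(sequence, double_stranded=True):
    """Nitrogen-to-carbon atom ratio of the bases in a DNA sequence."""
    carbon = nitrogen = 0
    for base in sequence.upper():
        bases = [base, COMPLEMENT[base]] if double_stranded else [base]
        for b in bases:
            c, n = BASE_ATOMS[b]
            carbon += c
            nitrogen += n
    if carbon == 0:
        raise ValueError("sequence must contain at least one base")
    return nitrogen / carbon


# An A:T pair holds 7 N and 10 C; a G:C pair holds 8 N and 9 C.
print(n_to_c_ratio("AATT"))   # AT-only duplex
print(n_to_c_ratio("GGCC"))   # GC-only duplex
```

Under this counting, replacing an A:T pair with a G:C pair adds one nitrogen atom and removes one carbon atom, which is consistent with the reported positive correlation of GC content with nitrogen use and negative correlation with carbon use.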
Interacting proteins evolve at correlated rates, possibly as the result of evolutionary pressures shared by functional groups and/or coevolution between interacting proteins. This evolutionary signature can be exploited to learn more about protein networks and to infer functional relationships between proteins on a genome-wide scale. Multiple methods have been introduced that detect correlated evolution using amino acid distances. One assumption made by these methods is that the neutral rate of nucleotide substitution is uniform over time; however, this is unlikely and such rate heterogeneity would adversely affect amino acid distance methods. We explored alternative methods that detect correlated rates using protein-coding nucleotide sequences in order to better estimate the rate of nonsynonymous substitution at each branch (dN) normalized by the underlying synonymous substitution rate (dS). Our novel likelihood method, which was robust to realistic simulation parameters, was tested on Drosophila nuclear pore proteins, which form a complex with well-documented physical interactions. The method revealed significantly correlated evolution between nuclear pore proteins, where members of a stable subcomplex showed stronger correlations compared with those proteins that interact transiently. Furthermore, our likelihood approach was better able to detect correlated evolution among closely related species than previous methods. Hence, these sequence-based methods are a complementary approach for detecting correlated evolution and could be applied genome-wide to provide candidate protein–protein interactions and functional group assignments using just coding sequences.
coevolution; correlated evolution; rate correlation; protein interactions; nuclear pore
Transitions from cross- to self-fertilization are associated with increased genetic drift rendering weakly selected mutations effectively neutral. The effect of drift is predicted to reduce selective constraints on amino acid sequences of proteins and relax biased codon usage. We investigated patterns of nucleotide variation to assess the effect of inbreeding on the accumulation of deleterious mutations in three independently evolved selfing plants. Using high-throughput sequencing, we assembled the floral transcriptomes of four individuals of Eichhornia (Pontederiaceae); these included one outcrosser and two independently derived selfers of E. paniculata, and E. paradoxa, a selfing outgroup. The dataset included ~8000 loci totalling ~3.5 Mb of coding DNA.
Tests of selection were consistent with purifying selection constraining evolution of the transcriptome. However, we found an elevation in the proportion of non-synonymous sites that were potentially deleterious in the E. paniculata selfers relative to the outcrosser. Measurements of codon usage in high versus low expression genes demonstrated reduced bias in both E. paniculata selfers.
Our findings are consistent with a small reduction in the efficacy of selection on protein sequences associated with transitions to selfing, and reduced selection in selfers on synonymous changes that influence codon usage.
It is unknown whether patterns of human immunodeficiency virus (HIV)-specific T-cell responses during acute infection may influence the viral set point and the course of disease. We wished to establish whether the magnitude and breadth of HIV type 1 (HIV-1)-specific T-cell responses at 3 months postinfection were correlated with the viral-load set point at 12 months and hypothesized that the magnitude and breadth of HIV-specific T-cell responses during primary infection would predict the set point. Gamma interferon (IFN-γ) enzyme-linked immunospot (ELISPOT) assay responses across the complete proteome were measured in 47 subtype C HIV-1-infected participants at a median of 12 weeks postinfection. When corrected for amino acid length and individuals responding to each region, the order of recognition was as follows: Nef > Gag > Pol > Rev > Vpr > Env > Vpu > Vif > Tat. Nef responses were significantly (P < 0.05) dominant, targeted six epitopic regions, and were unrelated to the course of viremia. There was no significant difference in the magnitude and breadth of responses for each protein region with disease progression, although there was a trend of increased breadth (mean, four to seven pools) in rapid progressors. Correlation of the magnitude and breadth of IFN-γ responses with the viral set point at 12 months revealed almost zero association for each protein region. Taken together, these data demonstrate that the magnitude and breadth of IFN-γ ELISPOT assay responses at 3 months postinfection are unrelated to the course of disease in the first year of infection and are not associated with, and have low predictive power for, the viral set point at 12 months.
Previous statistical analyses have shown that amino acid sites in a protein evolve in a correlated way instead of independently. Even though they may be distant in the linear sequence, coevolved amino acids can be spatially adjacent in the tertiary structure and constitute specific protein sectors. Moreover, these protein sectors are independent of one another in structure, function, and even evolution. Thus, systematic studies of the sectors inside a protein will help to clarify protein function. In this paper, we propose a new algorithm, BIFANR (Bi-factor Analysis Based on Noise-reduction), for detecting protein sectors in amino acid sequences. After applying BIFANR to the S1A and PDZ families, we carried out an internal correlation test, a statistical independence test, evolutionary rate analysis, evolutionary independence analysis, and function analysis to assess the predictions. The results showed that the amino acids within a given predicted protein sector are closely correlated in structure, function, and evolution, while different protein sectors are nearly statistically independent. The results also indicated that the protein sectors have distinct evolutionary directions. In addition, compared with other algorithms, BIFANR has higher accuracy and robustness under the influence of noise sites.
An understanding of the relationship between the breadth and magnitude of T-cell epitope responses and viral loads is important for the design of effective vaccines. For this study, we screened a cohort of 46 subtype C human immunodeficiency virus type 1 (HIV-1)-infected individuals for T-cell responses against a panel of peptides corresponding to the complete subtype C genome. We used a gamma interferon ELISPOT assay to explore the hypothesis that patterns of T-cell responses across the expressed HIV-1 genome correlate with viral control. The estimated median time from seroconversion to response for the cohort was 13 months, and the order of cumulative T-cell responses against HIV proteins was as follows: Nef > Gag > Pol > Env > Vif > Rev > Vpr > Tat > Vpu. Nef was the most intensely targeted protein, with 97.5% of the epitopes being clustered within 119 amino acids, constituting almost one-third of the responses across the expressed genome. The second most targeted region was p24, comprising 17% of the responses. There was no correlation between viral load and the breadth of responses, but there was a weak positive correlation (r = 0.297; P = 0.034) between viral load and the total magnitude of responses, implying that the magnitude of T-cell recognition did not contribute to viral control. When hierarchical patterns of recognition were correlated with the viral load, preferential targeting of Gag was significantly (r = 0.445; P = 0.0025) associated with viral control. These data suggest that preferential targeting of Gag epitopes, rather than the breadth or magnitude of the response across the genome, may be an important marker of immune efficacy. These data have significance for the design of vaccines and for interpretation of vaccine-induced responses.
A frequent observation in molecular evolution is that amino-acid substitution rates show an index of dispersion (that is, ratio of variance to mean) substantially larger than one. This observation has been termed the overdispersed molecular clock. On the basis of in silico protein-evolution experiments, Bastolla and coworkers recently proposed an explanation for this observation: Proteins drift in neutral space, and can temporarily get trapped in regions of substantially reduced neutrality. In these regions, substitution rates are suppressed, which results in an overall substitution process that is not Poissonian. However, the simulation method of Bastolla et al. is representative only for cases in which the product of mutation rate μ and population size Ne is small. How the substitution process behaves when μNe is large is not known.
Here, I study the behavior of the molecular clock in in silico protein evolution as a function of mutation rate and population size. I find that the index of dispersion decays with increasing μNe, and approaches 1 for large μNe. This observation can be explained by the selective pressure for mutational robustness, which is effective when μNe is large. This pressure keeps the population out of low-neutrality traps, and thus steadies the ticking of the molecular clock.
The molecular clock in neutral protein evolution can fall into two distinct regimes: a strongly overdispersed one for small μNe, and a mostly Poissonian one for large μNe. The former is relevant for the majority of organisms in the plant and animal kingdoms, and the latter may be relevant for RNA viruses.
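The index of dispersion referred to above (the ratio of the variance to the mean of substitution counts) can be illustrated with a small numerical sketch. The code below is not the simulation method of Bastolla et al. or of this study; it is a toy model, assumed here purely for illustration, in which each lineage accumulates substitutions either at one steady rate or at one of two rates that stand in for "trapped" and "untrapped" regions of neutral space. The rates, the lineage count, and all function names are hypothetical.

```python
import math
import random
import statistics


def index_of_dispersion(counts):
    """Ratio of sample variance to mean for a list of substitution counts."""
    return statistics.variance(counts) / statistics.mean(counts)


def poisson_sample(rate, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(rates, n_lineages, rng):
    """Each lineage picks one rate at random, then draws a Poisson count.

    A single rate gives a plain Poisson clock; several rates give a
    rate mixture, a hypothetical stand-in for low-neutrality traps.
    """
    return [poisson_sample(rng.choice(rates), rng) for _ in range(n_lineages)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Steady clock: every lineage substitutes at the same (assumed) rate.
steady = simulate_counts([5.0], 5000, rng)

# Toy "trapped" clock: half the lineages are slow, half are fast (assumed rates).
trapped = simulate_counts([1.0, 9.0], 5000, rng)

print("steady clock index of dispersion: ", round(index_of_dispersion(steady), 2))
print("trapped clock index of dispersion:", round(index_of_dispersion(trapped), 2))
```

Under these assumptions the steady clock yields an index of dispersion close to 1, as expected for a Poisson process, while the two-rate mixture yields a value well above 1, because variation in the underlying rate adds to the Poisson variance without changing the mean.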
A new method for detecting site-specific variation of evolutionary rate (the so-called covarion process) from protein sequence data is proposed. It involves comparing the maximum-likelihood estimates of the replacement rate of an amino acid site in distinct subtrees of a large tree. This approach allows detection of covarion-like evolution at the level of the gene or of the individual amino acid site. The method is applied to mammalian mitochondrial protein sequences. Significant covarion-like evolution is found in the (simian) primate lineage: some amino acid positions are fast-evolving (i.e. unconstrained) in non-primate mammals but slow-evolving (i.e. highly constrained) in primates, and some show the opposite pattern. Our results indicate that the mitochondrial genome of primates reached a new peak of the adaptive landscape through positive selection.
The expansion of amino acid repeats is determined by a high mutation rate and can be increased or limited by selection. It has been suggested that recent expansions could be associated with the potential of adaptation to new environments. In this work, we quantify the strength of this association, as well as the contribution of potential confounding factors.
Mammalian positively selected genes have accumulated more recent amino acid repeats than other mammalian genes. However, we found little support for an accelerated evolutionary rate as the main driver for the expansion of amino acid repeats. The most significant predictors of amino acid repeats are gene function and GC content. There is no correlation with expression level.
Our analyses show that amino acid repeat expansions are causally independent of protein adaptive evolution in mammalian genomes. Neither relaxed purifying selection nor positive selection is associated with more, or more recent, amino acid repeats. Their occurrence is slightly favoured by the sequence context but mainly determined by the molecular function of the gene.
Post-translational modifications of amino acids can be used to generate novel cofactors capable of chemistries inaccessible to conventional amino acid side chains. The biosynthesis of these sites often requires one or more enzyme or protein accessory factors, the functions of which are quite diverse and often difficult to isolate in cases where multiple enzymes are involved. Herein is described the current knowledge of the biosynthesis of urease and nitrile hydratase metal centers, pyrroloquinoline quinone, hypusine, and tryptophan tryptophylquinone cofactors along with the most recent work elucidating the functions of individual accessory factors in these systems. These examples showcase the breadth and diversity of this continually expanding field.