1.  Influenza A H1N1 Pandemic Strain Evolution – Divergence and the Potential for Antigenic Drift Variants 
PLoS ONE  2014;9(4):e93632.
The emergence of a novel A(H1N1) strain in 2009 was the first influenza pandemic of the genomic age, and unprecedented surveillance of the virus provides the opportunity to better understand the evolution of influenza. We examined changes in the nucleotide coding regions and the amino acid sequences of the hemagglutinin (HA), neuraminidase (NA), and nucleoprotein (NP) segments of the A(H1N1)pdm09 strain using publicly available data. We calculated the nucleotide and amino acid Hamming distance from the vaccine strain A/California/07/2009 for each sequence. We also estimated Pepitope, a measure of antigenic diversity based on changes in the epitope regions, for each isolate. Finally, we compared our results to A(H3N2) strains collected over the same period. Our analysis found that the mean Hamming distance for the HA protein of the A(H1N1)pdm09 strain increased from 3.6 (standard deviation [SD]: 1.3) in 2009 to 11.7 (SD: 1.0) in 2013, while the mean Hamming distance in the coding region increased from 7.4 (SD: 2.2) in 2009 to 28.3 (SD: 2.1) in 2013. These trends are broadly similar to the rate of mutation in H3N2 over the same time period. However, in contrast to H3N2 strains, the rate of mutation accumulation has slowed in recent years. Our results are notable because, over the course of the study, mutation rates in H3N2 similar to those seen with A(H1N1)pdm09 led to the emergence of two antigenic drift variants. However, while there has been an H1N1 epidemic in North America this season, evidence to date indicates the vaccine is still effective, suggesting the epidemic is not due to the emergence of an antigenic drift variant. Our results suggest that more research is needed to understand how viral mutations are related to vaccine effectiveness so that future vaccine choices and development can be more predictive.
doi:10.1371/journal.pone.0093632
PMCID: PMC3974778  PMID: 24699432
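The two per-isolate measures named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the sequences and epitope site indices below are made-up placeholders, the sequences are assumed to be pre-aligned to the reference, and Pepitope is assumed to follow its common definition (the fraction of substituted residues in the most-changed epitope). The sample standard deviation is also an assumption, since the abstract does not say which SD was reported.

```python
from statistics import mean, stdev


def hamming_distance(seq, ref):
    """Count positions at which two aligned sequences differ."""
    if len(seq) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq, ref))


def p_epitope(seq, ref, epitopes):
    """Largest fraction of substituted residues over the given epitope site sets.

    `epitopes` maps an epitope label to a list of 0-based site indices.
    """
    return max(
        sum(seq[i] != ref[i] for i in sites) / len(sites)
        for sites in epitopes.values()
    )


# Toy data (hypothetical, for illustration only).
reference = "MKAILVVLLY"
isolates = ["MKAILVVLLY", "MKTILVVLLY", "MKTILAVLFY"]
toy_epitopes = {"A": [2, 3, 4], "B": [5, 8]}

distances = [hamming_distance(s, reference) for s in isolates]
print("Hamming distances:", distances)
print("mean:", mean(distances), "sample SD:", stdev(distances))
print("p_epitope:", [p_epitope(s, reference, toy_epitopes) for s in isolates])
```

Grouping isolates by collection year and taking the mean and SD of `distances` within each group would give yearly summaries of the kind quoted in the abstract.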
2.  Protein Quality Control Acts on Folding Intermediates to Shape the Effects of Mutations on Organismal Fitness 
Molecular cell  2012;49(1):133-144.
Summary
What are the molecular properties of proteins that fall on the radar of protein quality control (PQC)? Here we mutate the E. coli gene encoding dihydrofolate reductase (DHFR), and replace it with bacterial orthologous genes, to determine how components of PQC modulate the fitness effects of these genetic changes. We find that chaperonins GroEL/ES and protease Lon compete for binding to the molten globule intermediate of DHFR, resulting in a peculiar symmetry in their action: over-expression of GroEL/ES and deletion of Lon both restore growth of deleterious DHFR mutants and most of the slow-growing orthologous DHFR strains. Kinetic steady-state modeling predicts, and experimentation verifies, that mutations affect fitness by shifting the flux balance in the cellular milieu between protein production, folding and degradation orchestrated by PQC through the interaction with folding intermediates.
doi:10.1016/j.molcel.2012.11.004
PMCID: PMC3545112  PMID: 23219534
3.  Quantifying Chaperone-Mediated Transitions in the Proteostasis Network of E. coli 
PLoS Computational Biology  2013;9(11):e1003324.
For cells to function, the concentrations of all proteins in the cell must be maintained at the proper levels (proteostasis). This task – complicated by cellular stresses, protein misfolding, aggregation, and degradation – is performed by a collection of chaperones that alter the configurational landscape of a given client protein through the formation of protein-chaperone complexes. The set of all such complexes and the transitions between them form the proteostasis network. Recently, a computational model was introduced (FoldEco) that synthesizes experimental data into a system-wide description of the proteostasis network of E. coli. This model describes the concentrations over time of all the species in the system, which include different conformations of the client protein, as well as protein-chaperone complexes. We apply to this model a recently developed analysis tool to calculate mediation probabilities in complex networks. This allows us to determine the probability that a given chaperone system is used to mediate transitions between client protein conformations, such as folding, or the correction of misfolded conformations. We determine how these probabilities change both across different proteins, as well as with system parameters, such as the synthesis rate, and in each case reveal in detail which factors control the usage of one chaperone system over another. We find that the different chaperone systems do not operate orthogonally and can compensate for each other when one system is disabled or overworked, and that this can complicate the analysis of “knockout” experiments, where the concentration of native protein is compared both with and without the presence of a given chaperone system. This study also gives a general recipe for conducting a transition-path–based analysis on a network of coupled chemical reactions, which can be useful in other types of networks as well.
Author Summary
To maintain proper amounts of folded, functional proteins, cells use systems of chaperones to correct misfolded proteins, disassemble aggregates, and provide sheltered environments in which proteins fold to their native structure. Typically, an individual system is studied in isolation, and its effects on a given protein are studied using “knockouts”, where the amount of native protein is compared with and without the active chaperone system. However, when multiple chaperone systems are operating simultaneously, knockouts can fail to reveal chaperone activity, as different chaperone systems can compensate for one another. We use a previously introduced computational model of chaperone systems in Escherichia coli, in combination with our transition-path analysis methods for networks, to analyze paths of individual proteins through the set of possible chaperone-bound and -unbound states. Our analysis allows us to answer questions that are inaccessible to knockout experiments, such as: How often will a given chaperone system be used to rescue a protein from a misfolded state? This approach provides a clear view of how the different systems of chaperones cooperate and compete under varying conditions.
doi:10.1371/journal.pcbi.1003324
PMCID: PMC3828153  PMID: 24244134
4.  Catalysis of Protein Folding by Chaperones Accelerates Evolutionary Dynamics in Adapting Cell Populations 
PLoS Computational Biology  2013;9(11):e1003269.
Although molecular chaperones are essential components of protein homeostatic machinery, their mechanism of action and impact on adaptation and evolutionary dynamics remain controversial. Here we developed a physics-based ab initio multi-scale model of a living cell for population dynamics simulations to elucidate the effect of chaperones on adaptive evolution. The 6-loci genomes of model cells encode model proteins, whose folding and interactions in cellular milieu can be evaluated exactly from their genome sequences. A genotype-phenotype relationship that is based on a simple yet non-trivially postulated protein-protein interaction (PPI) network determines the cell division rate. Model proteins can exist in native and molten globule states and participate in functional and all possible promiscuous non-functional PPIs. We find that an active chaperone mechanism, whereby chaperones directly catalyze protein folding, has a significant impact on the cellular fitness and the rate of evolutionary dynamics, while passive chaperones, which just maintain misfolded proteins in soluble complexes have a negligible effect on the fitness. We find that by partially releasing the constraint on protein stability, active chaperones promote a deeper exploration of sequence space to strengthen functional PPIs, and diminish the non-functional PPIs. A key experimentally testable prediction emerging from our analysis is that down-regulation of chaperones that catalyze protein folding significantly slows down the adaptation dynamics.
Author Summary
Molecular chaperones or heat-shock proteins are essential components of protein homeostatic machinery in all three domains of life, whose role is not only to prevent protein aggregation but also catalyze the protein folding process by decreasing the energetic barrier for folding. Importantly, chaperones have often been implicated as phenotypic capacitors since they buffer the deleterious effects of mutations, promote genetic diversity, and thus speed up adaptive evolution. Here we explore computationally the consequences of chaperone activity in cytoplasm via long-time evolutionary dynamics simulations. We use a 6-loci multi scale model of cell populations, where the fitness of each cell is determined from its genome, based on statistical mechanical principles of protein folding and protein-protein interactions. We find that by catalyzing protein folding chaperones buffer the deleterious effect of mutations on folding stability and thus open up a sequence space for efficient and simultaneous optimization of multiple molecular traits determining the cellular fitness. As a result, chaperones dramatically accelerate adaptation dynamics.
doi:10.1371/journal.pcbi.1003269
PMCID: PMC3820506  PMID: 24244114
5.  Contribution of Selection for Protein Folding Stability in Shaping the Patterns of Polymorphisms in Coding Regions 
Molecular Biology and Evolution  2013;31(1):165-176.
The patterns of polymorphisms in genomes are imprints of the evolutionary forces at play in nature. In particular, polymorphisms have been extensively used to infer the fitness effects of mutations and their dynamics of fixation. However, the role and contribution of molecular biophysics to these observations remain unclear. Here, we couple robust findings from protein biophysics, enzymatic flux theory, the selection against the cytotoxic effects of protein misfolding, and explicit population dynamics simulations in the polyclonal regime. First, we recapitulate results on the dynamics of clonal interference and on the shape of the DFE, thus providing them with a molecular and mechanistic foundation. Second, we predict that if evolution is indeed under the dynamic equilibrium of mutation–selection balance, the fraction of stabilizing and destabilizing mutations is almost equal among single-nucleotide polymorphisms segregating at high allele frequencies. This prediction is proven true for polymorphisms in the human coding region. Overall, our results show how selection for protein folding stability predominantly shapes the patterns of polymorphisms in coding regions.
doi:10.1093/molbev/mst189
PMCID: PMC3879451  PMID: 24124208
SNPs; polymorphism; protein folding stability; DFE; clonal interference
6.  Assessing the Effect of Loop Mutations in the Folding Space of β2-Microglobulin with Molecular Dynamics Simulations 
We use molecular dynamics simulations of a full atomistic Gō model to explore the impact of selected DE-loop mutations (D59P and W60C) on the folding space of protein human β2-microglobulin (Hβ2m), the causing agent of dialysis-related amyloidosis, a conformational disorder characterized by the deposition of insoluble amyloid fibrils in the osteoarticular system. Our simulations replicate the effect of mutations on the thermal stability that is observed in experiments in vitro. Furthermore, they predict the population of a partially folded state, with 60% of native internal free energy, which is akin to a molten globule. In the intermediate state, the solvent accessible surface area increases up to 40 times relative to the native state in 38% of the hydrophobic core residues, indicating that the identified species has aggregation potential. The intermediate state preserves the disulfide bond established between residue Cys25 and residue Cys80, which helps maintain the integrity of the core region, and is characterized by having two unstructured termini. The movements of the termini dominate the essential modes of the intermediate state, and exhibit the largest displacements in the D59P mutant, which is the most aggregation prone variant. PROPKA predictions of pKa suggest that the population of the intermediate state may be enhanced at acidic pH explaining the larger amyloidogenic potential observed in vitro at low pH for the WT protein and mutant forms.
doi:10.3390/ijms140917256
PMCID: PMC3794727  PMID: 23975166
intermediate states; molten globule; folding pathways; discrete molecular dynamics; principal component analysis; dialysis-related amyloidosis
7.  An all-atom model for stabilization of α-helical structure in peptides by hydrocarbon staples 
Recent work has shown that the incorporation of an all-hydrocarbon “staple” into peptides can greatly increase their α-helix propensity, leading to an improvement in pharmaceutical properties such as proteolytic stability, receptor affinity and cell-permeability. Stapled peptides thus show promise as a new class of drugs capable of accessing intractable targets such as those that engage in intracellular protein-protein interactions. The extent of α-helix stabilization provided by stapling has proven to be substantially context dependent, requiring cumbersome screening to identify the optimal site for staple incorporation. In certain cases, a staple encompassing one turn of the helix (attached at residues i and i+4) furnishes greater helix stabilization than one encompassing two turns (i,i+7 staple), which runs counter to expectation based on polymer theory. These findings highlight the need for a more thorough understanding of the forces that underlie helix stabilization by hydrocarbon staples. Here we report all-atom Monte Carlo folding simulations comparing unmodified peptides derived from RNAse A and BID BH3 with various i,i+4 and i,i+7 stapled versions thereof. The results of these simulations were found to be in quantitative agreement with experimentally determined helix propensities. We also discovered that staples can stabilize quasi-stable decoy conformations, and that the removal of these states plays a major role in determining the helix stability of stapled peptides. Finally, we critically investigate why our method works, exposing the underlying physical forces that stabilize stapled peptides.
doi:10.1021/ja805037p
PMCID: PMC2735086  PMID: 19334772
Stapled Peptides; Monte-Carlo simulations; Drug Discovery; Folding Traps; Entropic Stabilization
8.  The folding mechanics of a knotted protein 
Journal of molecular biology  2007;368(3):884-893.
An increasing number of proteins are being discovered with a remarkable and somewhat surprising feature, a knot in their native structures. How the polypeptide chain is able to “knot” itself during the folding process to form these highly intricate protein topologies is not known. Here we perform a computational study on the 160-amino acid homodimeric protein YibK which, like other proteins in the SpoU family of MTases, contains a deep trefoil knot in its C-terminal region. In this study, we use a coarse-grained Cα-chain representation and Langevin dynamics to study folding kinetics. We find that specific, attractive nonnative interactions are critical for knot formation. In the absence of these interactions, i.e. in an energetics driven entirely by native interactions, knot formation is exceedingly unlikely. Further, we find, in concert with recent experimental data on YibK, two parallel folding pathways which we attribute to an early and a late formation of the trefoil knot, respectively. For both pathways, knot formation occurs before dimerization. A bioinformatics analysis of the SpoU family of proteins reveals further that the critical nonnative interactions may originate from evolutionary conserved hydrophobic segments around the knotted region.
doi:10.1016/j.jmb.2007.02.035
PMCID: PMC2692925  PMID: 17368671
9.  SDR: a database of predicted specificity-determining residues in proteins 
Nucleic Acids Research  2008;37(Database issue):D191-D194.
The specificity-determining residue database (SDR database) presents residue positions where mutations are predicted to have changed protein function in large protein families. Because the database pre-calculates predictions on existing protein sequence alignments, users can quickly find the predictions by selecting the appropriate protein family or searching by protein sequence. Predictions can be used to guide mutagenesis or to gain a better understanding of specificity changes in a protein family. The database is available on the web at http://paradox.harvard.edu/sdr.
doi:10.1093/nar/gkn716
PMCID: PMC2686543  PMID: 18927118
10.  Positively Selected Sites in Cetacean Myoglobins Contribute to Protein Stability 
PLoS Computational Biology  2013;9(3):e1002929.
Since divergence ∼50 Ma ago from their terrestrial ancestors, cetaceans underwent a series of adaptations such as a ∼10–20 fold increase in myoglobin (Mb) concentration in skeletal muscle, critical for increasing oxygen storage capacity and prolonging dive time. Whereas the O2-binding affinity of Mbs is not significantly different among mammals (with typical oxygenation constants of ∼0.8–1.2 µM−1), folding stabilities of cetacean Mbs are ∼2–4 kcal/mol higher than for terrestrial Mbs. Using ancestral sequence reconstruction, maximum likelihood and Bayesian tests to describe the evolution of cetacean Mbs, and experimentally calibrated computation of stability effects of mutations, we observe accelerated evolution in cetaceans and identify seven positively selected sites in Mb. Overall, these sites contribute to Mb stabilization with a conditional probability of 0.8. We observe a correlation between Mb folding stability and protein abundance, suggesting that a selection pressure for stability acts proportionally to higher expression. We also identify a major divergence event leading to the common ancestor of whales, during which major stabilization occurred. Most of the positively selected sites that occur later act against other destabilizing mutations to maintain stability across the clade, except for the shallow divers, where late stability relaxation occurs, probably due to the shorter aerobic dive limits of these species. The three main positively selected sites 66, 5, and 35 undergo changes that favor hydrophobic folding, structural integrity, and intra-helical hydrogen bonds.
Author Summary
In this work, we identify positive selection in cetacean myoglobins and an early, significant divergence event. While O2-binding is nearly unchanged, positive selection acts to introduce and later maintain stability. Stability correlates with abundance across the species, supporting that selection for increased stability concurred with the known 10–20 fold increase in myoglobin abundance of cetaceans relative to terrestrial mammals, which itself resulted from speciation towards longer dive lengths of the animals. We suggest that this selection acted to keep constant the otherwise increasing number of unfolded Mb. Altogether, this work for the first time links protein phenotype (stability and abundance) in a specific, real protein to organism-level evolution and fitness of mammals.
doi:10.1371/journal.pcbi.1002929
PMCID: PMC3591298  PMID: 23505347
11.  Protein Biophysics Explains Why Highly Abundant Proteins Evolve Slowly 
Cell reports  2012;2(2):249-256.
SUMMARY
The consistent observation across all kingdoms of life that highly abundant proteins evolve slowly demonstrates that cellular abundance is a key determinant of protein evolutionary rate. However, other empirical findings, such as the broad distribution of evolutionary rates, suggest that additional variables determine the rate of protein evolution. Here, we report that under the global selection against the cytotoxic effects of misfolded proteins, folding stability (ΔG), simultaneous with abundance, is a causal variable of evolutionary rate. Using both theoretical analysis and multiscale simulations, we demonstrate that the anticorrelation between the pre-mutation ΔG and the arising mutational effect (ΔΔG), purely biophysical in origin, is a necessary requirement for abundance–evolutionary rate covariation. Additionally, we predict and demonstrate in bacteria that the strength of abundance–evolutionary rate correlation depends on the divergence time separating reference genomes. Altogether, these results highlight the intrinsic role of protein biophysics in the emerging universal patterns of molecular evolution.
doi:10.1016/j.celrep.2012.06.022
PMCID: PMC3533372  PMID: 22938865
12.  Improvisation in Evolution of Genes and Genomes: Whose Structure is it Anyway? 
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time new microscopic models were developed that made it possible to analyze evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales – from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How does gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provide first robust answers to these questions.
doi:10.1016/j.sbi.2008.02.007
PMCID: PMC3533375  PMID: 18487041
13.  A Universal Trend among Proteomes Indicates an Oily Last Common Ancestor 
PLoS Computational Biology  2012;8(12):e1002839.
Despite progress in ancestral protein sequence reconstruction, much needs to be unraveled about the nature of the putative last common ancestral proteome that served as the prototype of all extant lifeforms. Here, we present data that indicate a steady decline (oil escape) in proteome hydrophobicity over species evolvedness (node number) evident in 272 diverse proteomes, which indicates a highly hydrophobic (oily) last common ancestor (LCA). This trend, obtained from simple considerations (free from sequence reconstruction methods), was corroborated by regression studies within homologous and orthologous protein clusters as well as phylogenetic estimates of the ancestral oil content. While indicating an inherent irreversibility in molecular evolution, oil escape also serves as a rare and universal reaction-coordinate for evolution (reinforcing Darwin's principle of Common Descent), and may prove important in matters such as (i) explaining the emergence of intrinsically disordered proteins, (ii) developing composition- and speciation-based “global” molecular clocks, and (iii) improving the statistical methods for ancestral sequence reconstruction.
Author Summary
Although of importance to both evolution and protein design, the manner in which the first proteome came to be, and the actual features of the earliest ancestral proteomes are both unknown. Through the analysis of diverse proteomes, we provide glimpses into the composition of the last common ancestor (LUCA) of all lifeforms, which indicate that the earliest/last common ancestor had a proteome that was highly hydrophobic/oily. Notably, the evidence presented (a) indicates that proteomes of all species ranging from bacteria to mammals appear to adhere to the same universal constraint (“oil escape”) set into motion by the last common ancestor more than 3.5 billion years ago, (b) indicates the presence of a previously untapped global (composition-level) molecular clock, and (c) strengthens the non-equilibrium/directional view of amino acid substitutions that challenges central dogmas regarding reversibility in molecular evolution.
doi:10.1371/journal.pcbi.1002839
PMCID: PMC3531291  PMID: 23300421
14.  Thermodynamics and kinetics of the hairpin ribozyme from atomistic folding/unfolding simulations 
Journal of molecular biology  2011;411(5):1128-1144.
We report a set of atomistic folding/unfolding simulations for the hairpin ribozyme using a Monte Carlo algorithm. The hairpin ribozyme folds in solution and catalyzes self-cleavage or ligation via a specific two-domain structure. The minimal active ribozyme has been studied extensively, showing stabilization of the active structure by cations and dynamic motion of the active structure. Here we introduce a simple model of tertiary structure formation that leads to a phase diagram for the RNA as a function of temperature and tertiary structure strength. We then employ this model to capture many folding/unfolding events and to examine the transition state ensemble (TSE) of the RNA during folding to its active “docked” conformation. The TSE is compact but with few tertiary interactions formed, in agreement with single-molecule dynamics experiments. To compare with experimental kinetic parameters we introduce a novel method to benchmark Monte Carlo kinetic parameters to docking/undocking rates collected over many single-molecule trajectories. We find that topology alone, as encoded in a biased potential which discriminates between secondary and tertiary interactions, is sufficient to predict the thermodynamic behavior and kinetic folding pathway of the hairpin ribozyme. This method should be useful in predicting folding transition states for many natural or man-made RNA tertiary structures.
doi:10.1016/j.jmb.2011.06.042
PMCID: PMC3508787  PMID: 21740912
15.  Mutation Induced Extinction in Finite Populations: Lethal Mutagenesis and Lethal Isolation 
PLoS Computational Biology  2012;8(8):e1002609.
Reproduction is inherently risky, in part because genomic replication can introduce new mutations that are usually deleterious toward fitness. This risk is especially severe for organisms whose genomes replicate “semi-conservatively,” e.g. viruses and bacteria, where no master copy of the genome is preserved. Lethal mutagenesis refers to extinction of populations due to an unbearably high mutation rate (U), and is important both theoretically and clinically, where drugs can extinguish pathogens by increasing their mutation rate. Previous theoretical models of lethal mutagenesis assume infinite population size (N). However, in addition to high U, small N can accelerate extinction by strengthening genetic drift and relaxing selection. Here, we examine how the time until extinction depends jointly on N and U. We first analytically compute the mean time until extinction (τ) in a simplistic model where all mutations are either lethal or neutral. The solution motivates the definition of two distinct regimes: a survival phase and an extinction phase, which differ dramatically in both how τ scales with N and in the coefficient of variation in time until extinction. Next, we perform stochastic population-genetics simulations on a realistic fitness landscape that both (i) features an epistatic distribution of fitness effects that agrees with experimental data on viruses and (ii) is based on the biophysics of protein folding. More specifically, we assume that mutations inflict fitness penalties proportional to the extent that they unfold proteins. We find that decreasing N can cause phase transition-like behavior from survival to extinction, which motivates the concept of “lethal isolation.” Furthermore, we find that lethal mutagenesis and lethal isolation interact synergistically, which may have clinical implications for treating infections. Broadly, we conclude that stably folded proteins are only possible in ecological settings that support sufficiently large populations.
Author Summary
Most spontaneous mutations hurt organismal fitness, e.g. by destabilizing proteins. In many species, the normal mutation rate is strikingly high: on the order of one per genome per replication. In the face of these mutations, how can proteins maintain their native structure, and how can populations of organisms avoid extinction? Are there physics-based limits on how large the mutation rate of any species can be before the onslaught of mutations outpaces natural selection and melts-down proteins? Here, we address these questions with a computational model that combines protein folding thermodynamics with individual-based population genetics simulations. We calculate a theoretical “speed limit” equal to a few mutations per genome per replication—near the mutation rate of RNA viruses. Additionally, we find that the speed limit can be much lower in small populations where “random genetic drift” is strong. Thus, we conclude that stably folded proteins are only possible in ecological settings that support sufficiently large populations. These findings may have clinical implications for treating viral infections with drugs that elevate the viral mutation rate.
doi:10.1371/journal.pcbi.1002609
PMCID: PMC3410861  PMID: 22876168
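The dependence of extinction time on population size N and mutation rate U described in the abstract above can be explored with a toy simulation. This is only a sketch under stated assumptions, not the paper's model: every parent is assumed to leave a fixed number of offspring (`fecundity`, a hypothetical parameter), all mutations are treated as lethal with a Poisson-distributed count per replication so that an offspring survives with probability exp(-U), and the population is capped at `n_max`.

```python
import math
import random


def time_to_extinction(n_max, u_lethal, fecundity=2,
                       max_generations=10_000, seed=None):
    """Generations until a capped population dies out in a toy lethal-mutation model.

    Returns None if the population is still alive after `max_generations`.
    """
    rng = random.Random(seed)
    # Probability that an offspring carries zero lethal mutations (Poisson).
    p_survive = math.exp(-u_lethal)
    n = n_max
    for generation in range(1, max_generations + 1):
        # Each parent leaves `fecundity` offspring; each survives independently.
        survivors = sum(
            1 for _ in range(n * fecundity) if rng.random() < p_survive
        )
        # Carrying capacity: at most n_max individuals enter the next generation.
        n = min(survivors, n_max)
        if n == 0:
            return generation
    return None


# Example: small population, lethal mutation rate above ln(fecundity).
print(time_to_extinction(n_max=20, u_lethal=1.5, seed=1))
```

In this toy model the expected number of surviving offspring per parent is `fecundity * exp(-U)`, so the population declines on average once U exceeds ln(`fecundity`). Below that threshold extinction is driven by chance fluctuations, which become more likely as `n_max` shrinks.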
16.  The Ensemble Folding Kinetics of the FBP28 WW Domain Revealed by an All-atom Monte Carlo Simulation in a Knowledge-based Potential 
Proteins  2011;79(6):1704-1714.
In this work, we apply a detailed all-atom model with a transferable knowledge-based potential to study the folding kinetics of Formin-Binding protein, FBP28, which is a canonical three-stranded β-sheet WW domain. Replica exchange Monte Carlo (REMC) simulations starting from random coils find a native-like (Cα RMSD of 2.68 Å) lowest-energy structure. We also study the folding kinetics of the FBP28 WW domain by performing a large number of ab initio Monte Carlo folding simulations. Using these trajectories, we examine the order of formation of the two β-hairpins, the folding mechanism of each individual β-hairpin, and the transition state ensemble (TSE) of the FBP28 WW domain, and compare our results with experimental data and previous computational studies. To obtain detailed structural information on the folding dynamics viewed as an ensemble process, we perform a clustering analysis procedure based on graph theory. Further, a rigorous Pfold analysis is used to obtain representative samples of the TSEs, showing good quantitative agreement between experimental and simulated Φ values. Our analysis shows that the turn structure between the first and second β strands is a partially stable structural motif that forms before entering the TSE in the FBP28 WW domain, and that there exist two major pathways for the folding of the FBP28 WW domain, which differ in the order and mechanism of hairpin formation.
doi:10.1002/prot.22993
PMCID: PMC3092795  PMID: 21365688
transition state ensemble; protein folding; β-strand; β-hairpin; β-sheet; Φ-value analysis; Pfold analysis
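The replica exchange Monte Carlo (REMC) step mentioned in the abstract above relies on a swap move between replicas held at different temperatures. The sketch below shows the standard Metropolis acceptance rule for such a swap. It is a generic illustration, not the authors' code: the function name, the inverse temperatures, and the energies are hypothetical, and energies are taken in units where the inverse temperature is dimensionless.

```python
import math


def swap_acceptance(beta_i, energy_i, beta_j, energy_j):
    """Metropolis probability of exchanging configurations between two replicas.

    Replica i currently has energy_i at inverse temperature beta_i, and
    likewise for replica j. The standard rule is
    min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# The colder replica (larger beta) holds the higher energy: always swap.
print(swap_acceptance(1.0, -5.0, 0.5, -10.0))

# The colder replica already holds the lower energy: swap only occasionally.
print(swap_acceptance(1.0, -10.0, 0.5, -5.0))
```

A swap is carried out by drawing a uniform random number in [0, 1) and exchanging the two configurations when it falls below the returned probability, which lets low-energy structures found at high temperature migrate to the low-temperature replicas.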
17.  Residual Structures, Conformational Fluctuations, and Electrostatic Interactions in the Synergistic Folding of Two Intrinsically Disordered Proteins 
PLoS Computational Biology  2012;8(1):e1002353.
To understand the interplay of residual structures and conformational fluctuations in the interaction of intrinsically disordered proteins (IDPs), we first combined implicit solvent and replica exchange sampling to calculate atomistic disordered ensembles of the nuclear co-activator binding domain (NCBD) of transcription coactivator CBP and the activation domain of the p160 steroid receptor coactivator ACTR. The calculated ensembles are in quantitative agreement with NMR-derived residue helicity and recapitulate the experimental observation that, while free ACTR largely lacks residual secondary structures, free NCBD is a molten globule with a helical content similar to that in the folded complex. Detailed conformational analysis reveals that free NCBD has an inherent ability to substantially sample all the helix configurations that have been previously observed either unbound or in complexes. Intriguingly, further high-temperature unbinding and unfolding simulations in implicit and explicit solvents emphasize the importance of conformational fluctuations in synergistic folding of NCBD with ACTR. A balance between preformed elements and conformational fluctuations appears necessary to allow NCBD to interact with different targets and fold into alternative conformations. Together with previous topology-based modeling and existing experimental data, the current simulations strongly support an “extended conformational selection” synergistic folding mechanism that involves a key intermediate state stabilized by interaction between the C-terminal helices of NCBD and ACTR. In addition, the atomistic simulations reveal the role of long-range as well as short-range electrostatic interactions in cooperating with readily fluctuating residual structures, which might enhance the encounter rate and promote efficient folding upon encounter for facile binding and folding interactions of IDPs. Thus, the current study not only provides a consistent mechanistic understanding of the NCBD/ACTR interaction, but also helps establish a multi-scale molecular modeling framework for understanding the structure, interaction, and regulation of IDPs in general.
Author Summary
Intrinsically disordered proteins (IDPs) are now widely recognized to play fundamental roles in biology and to be frequently associated with human diseases. Although the potential advantages of intrinsic disorder in cellular signaling and regulation have been widely discussed, the physical basis for these proposed phenomena remains sketchy at best. An integration of multi-scale molecular modeling and experimental characterization is necessary to uncover the molecular principles that govern the structure, interaction, and regulation of IDPs. In this work, we characterize the conformational properties of two IDPs involved in transcription regulation at the atomistic level and further examine the roles of these properties in their coupled binding and folding interactions. Our simulations suggest interplay among residual structures, conformational fluctuations, and electrostatic interactions that allows efficient synergistic folding of these two IDPs. In particular, we propose that electrostatic interactions might play an important role in facilitating rapid folding and binding recognition of IDPs, by enhancing the encounter rate and promoting efficient folding upon encounter.
doi:10.1371/journal.pcbi.1002353
PMCID: PMC3257294  PMID: 22253588
18.  Predictability of Evolutionary Trajectories in Fitness Landscapes 
PLoS Computational Biology  2011;7(12):e1002302.
Experimental studies on enzyme evolution show that only a small fraction of all possible mutation trajectories are accessible to evolution. However, these experiments deal with individual enzymes and explore a tiny part of the fitness landscape. We report an exhaustive analysis of fitness landscapes constructed with an off-lattice model of protein folding where fitness is equated with robustness to misfolding. This model mimics the essential features of the interactions between amino acids, is consistent with the key paradigms of protein folding and reproduces the universal distribution of evolutionary rates among orthologous proteins. We introduce mean path divergence as a quantitative measure of the degree to which the starting and ending points determine the path of evolution in fitness landscapes. Global measures of landscape roughness are good predictors of path divergence in all studied landscapes: the mean path divergence is greater in smooth landscapes than in rough ones. The model-derived and experimental landscapes are significantly smoother than random landscapes and resemble additive landscapes perturbed with moderate amounts of noise; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. We suggest that smoothness and the substantial deficit of peaks in the fitness landscapes of protein evolution are fundamental consequences of the physics of protein folding.
Author Summary
Is evolution deterministic, hence predictable, or stochastic, that is unpredictable? What would happen if one could “replay the tape of evolution”: will the outcomes of evolution be completely different or is evolution so constrained that history will be repeated? Arguably, these questions are among the most intriguing and most difficult in evolutionary biology. In other words, the predictability of evolution depends on the fraction of the trajectories on fitness landscapes that are accessible for evolutionary exploration. Because direct experimental investigation of fitness landscapes is technically challenging, the available studies only explore a minuscule portion of the landscape for individual enzymes. We therefore sought to investigate the topography of fitness landscapes within the framework of a previously developed model of protein folding and evolution where fitness is equated with robustness to misfolding. We show that model-derived and experimental landscapes are significantly smoother than random landscapes and resemble moderately perturbed additive landscapes; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. Thus, the smoothness and substantial deficit of peaks in fitness landscapes of protein evolution could be fundamental consequences of the physics of protein folding.
doi:10.1371/journal.pcbi.1002302
PMCID: PMC3240586  PMID: 22194675
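The idea of "accessible" mutational trajectories in entry 18 can be illustrated with a toy calculation. This is only a sketch under stated assumptions: it uses binary genotypes on an additive landscape with Gaussian noise, and it counts a path as accessible when every single-site step strictly increases fitness. It does not reproduce the paper's off-lattice folding model or its mean path divergence measure; the function names and parameter values are hypothetical.

```python
import itertools
import random


def additive_landscape(n_sites, noise_sd, rng):
    """Assign a fitness to every binary genotype: additive site effects plus noise."""
    # Hypothetical positive per-site fitness gains (illustrative values only).
    effects = [rng.uniform(0.5, 1.5) for _ in range(n_sites)]
    fitness = {}
    for genotype in itertools.product((0, 1), repeat=n_sites):
        additive = sum(e for e, g in zip(effects, genotype) if g)
        fitness[genotype] = additive + rng.gauss(0.0, noise_sd)
    return fitness


def accessible_fraction(fitness, n_sites):
    """Fraction of the n! direct paths from 00..0 to 11..1 with strictly rising fitness."""
    accessible = 0
    total = 0
    for order in itertools.permutations(range(n_sites)):
        genotype = [0] * n_sites
        current = fitness[tuple(genotype)]
        rising = True
        for site in order:
            genotype[site] = 1  # introduce one mutation at a time
            step = fitness[tuple(genotype)]
            if step <= current:
                rising = False
                break
            current = step
        total += 1
        accessible += rising
    return accessible / total


rng = random.Random(0)
smooth = accessible_fraction(additive_landscape(5, 0.0, rng), 5)  # purely additive
rough = accessible_fraction(additive_landscape(5, 2.0, rng), 5)   # heavily perturbed
print(smooth, rough)
```

On the noise-free additive landscape every ordering of the mutations is accessible, so many distinct paths connect the same endpoints; adding noise can only remove paths from that set.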
19.  Optimality of Mutation and Selection in Germinal Centers 
PLoS Computational Biology  2010;6(6):e1000800.
The population dynamics theory of B cells in a typical germinal center could play an important role in revealing how affinity maturation is achieved. However, the existing models encountered some conflicts with experiments. To resolve these conflicts, we present a coarse-grained model to calculate the B cell population development in affinity maturation, which allows a comprehensive analysis of its parameter space to look for optimal values of mutation rate, selection strength, and initial antibody-antigen binding level that maximize the affinity improvement. With these optimized parameters, the model is compatible with the experimental observations such as the ∼100-fold affinity improvements, the number of mutations, the hypermutation rate, and the “all or none” phenomenon. Moreover, we study the reasons behind the optimal parameters. The optimal mutation rate, in agreement with the hypermutation rate in vivo, results from a tradeoff between accumulating enough beneficial mutations and avoiding too many deleterious or lethal mutations. The optimal selection strength evolves as a balance between the need for affinity improvement and the requirement to pass the population bottleneck. These findings point to the conclusion that germinal centers have been optimized by evolution to generate strong affinity antibodies effectively and rapidly. In addition, we study the enhancement of affinity improvement due to B cell migration between germinal centers. These results could enhance our understanding of the functions of germinal centers.
Author Summary
The antibodies in our immune system can efficiently improve their ability to recognize new antigens. This is done through the proliferation, mutation and selection of the B cells that carry antibodies, but it has been difficult to develop a quantitative description of this adaptation process that is consistent with the various experimental observations. Based on knowledge from experiments, we present a theoretical model that tracks over time the numbers of B cells with different antigen-recognizing abilities, and we look for the design that improves antigen recognition most efficiently. We find that the best possible design is consistent with the experimental observations, pointing to the conclusion that the immune system has been optimized by evolution. We then study the trade-offs leading to the optimization of the design. The results will not only improve our understanding of the functions of the immune system, but also reveal the design principles behind the details. In addition, the study enhances our understanding of population dynamics in evolution.
doi:10.1371/journal.pcbi.1000800
PMCID: PMC2880589  PMID: 20532164
20.  Interplay between Pleiotropy and Secondary Selection Determines Rise and Fall of Mutators in Stress Response 
PLoS Computational Biology  2010;6(3):e1000710.
Mutators are clones whose mutation rate is about two to three orders of magnitude higher than the rate of wild-type clones, and their role in the adaptive evolution of asexual populations has been controversial. Here we address this problem by using an ab initio microscopic model of living cells, which combines population genetics with a physically realistic presentation of protein stability and protein-protein interactions. The genome of model organisms encodes replication controlling genes (RCGs) and genes modeling the mismatch repair (MMR) complexes. The genotype-phenotype relationship posits that the replication rate of an organism is proportional to protein copy numbers of RCGs in their functional form and there is a production cost penalty for protein overexpression. The mutation rate depends linearly on the concentration of homodimers of MMR proteins. By simulating multiple runs of evolution of populations under various environmental stresses (stationary phase, starvation or temperature-jump), we find that adaptation most often occurs through transient fixation of a mutator phenotype, regardless of the nature of stress. By contrast, the fixation mechanism does depend on the nature of stress. In temperature jump stress, mutators take over the population due to loss of stability of MMR complexes. In starvation and stationary phase stresses, a small number of mutators are supplied to the population via epigenetic stochastic noise in production of MMR proteins (a pleiotropic effect), and their net supply is higher due to reduced genetic drift in slowly growing populations under stressful environments. Subsequently, mutators in stationary phase or starvation hitchhike to fixation with a beneficial mutation in the RCGs (second order selection), and finally a mutation stabilizing the MMR complex arrives, returning the population to a non-mutator phenotype. 
Our results provide microscopic insights into the rise and fall of mutators in adapting finite asexual populations.
Author Summary
The dramatic rise of mutators has been found to accompany adaptation of bacteria in response to many kinds of stress. Two views on the evolutionary origin of this phenomenon emerged: the pleiotropic hypothesis positing that it is a byproduct of environmental stress or other specific stress response mechanisms and the second order selection which states that mutators hitchhike to fixation with unrelated beneficial alleles. Conventional population genetics models could not fully resolve this controversy because they are based on certain assumptions about fitness landscape. Here we address this problem using a microscopic multiscale model, which couples physically realistic molecular descriptions of proteins and their interactions with population genetics of carrier organisms without assuming any a priori mutational effect on fitness landscape. We found that both pleiotropy and second order selection play a crucial role at different stages of adaptation: the supply of mutators is provided through destabilization of error correction complexes or, alternatively, fluctuations of production levels of prototypic mismatch repair proteins (pleiotropic effects), while the rise and fixation of mutators occurs when there is a sufficient supply of beneficial mutations in replication-controlling genes. This general mechanism assures a robust and reliable adaptation of organisms to unforeseen challenges. This study highlights physical principles underlying biological mechanisms of stress response and adaptation.
doi:10.1371/journal.pcbi.1000710
PMCID: PMC2837395  PMID: 20300650
21.  Investigating the Conformational Stability of Prion Strains through a Kinetic Replication Model 
PLoS Computational Biology  2009;5(7):e1000420.
Prion proteins are known to misfold into a range of different aggregated forms, showing different phenotypic and pathological states. Understanding strain specificities is an important problem in the field of prion disease. Little is known about which PrPSc structural properties and molecular mechanisms determine prion replication, disease progression and strain phenotype. The aim of this work is to investigate, through a mathematical model, how the structural stability of different aggregated forms can influence the kinetics of prion replication. The model-based results suggest that prion strains with different conformational stability undergoing in vivo replication are characterizable in primis by means of different rates of breakage. A further role seems to be played by the aggregation rate (i.e. the rate at which a prion fibril grows). The kinetic variability introduced in the model by these two parameters allows us to reproduce the different characteristic features of the various strains (e.g., fibrils' mean length) and is coherent with all experimental observations concerning strain-specific behavior.
Author Summary
Prion diseases are caused by the accumulation of a cellular prion protein with an altered conformation, which acts as a catalyst for the further recruitment and the modification of the normal form of the protein. Protein polymerization appears to have a central role in the progression of the disease, an aspect shared with several other neurodegenerative diseases. The aim of this work is to investigate at the kinetic level the “prion strain phenomenon”, i.e., the ability of prion proteins to misfold into a range of different aggregated forms exhibiting different replication and propagation properties. The dynamics of prion replication is investigated with the help of a mathematical model. We relate a measurement accessible in vitro (prion structural stability) to a mathematical description of the fibrils' kinetics in vivo. The analysis of the model suggests that the replication kinetics of the different prion strains is characterizable by means of two parameters, representing the rates of breakage and aggregation. This result is coherent with various experimental findings concerning strain-specific behavior, such as, for example, the observation of the fibril mean length of the various strains.
doi:10.1371/journal.pcbi.1000420
PMCID: PMC2697384  PMID: 19578427
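The link that entry 21 draws between breakage rate, aggregation rate and fibril mean length can be sketched with a small calculation. This is a minimal sketch and not the paper's model: it assumes a commonly used moment-closure form of a nucleated polymerization model, and all symbols (lam, d, a, beta, b, n) and numeric values are hypothetical choices for illustration.

```python
def npm_rates(x, y, z, lam, d, a, beta, b, n):
    """Right-hand side of the assumed moment equations.

    x: free (normal) monomer, y: number of fibrils, z: monomers held in fibrils.
    lam: monomer production, d: monomer degradation, a: fibril clearance,
    beta: aggregation (elongation) rate, b: breakage rate per bond,
    n: minimum stable fibril size (smaller fragments dissolve into monomers).
    """
    dx = lam - d * x - beta * x * y + n * (n - 1) * b * y
    dy = b * z - a * y - (2 * n - 1) * b * y
    dz = beta * x * y - a * z - n * (n - 1) * b * y
    return dx, dy, dz


def steady_mean_length(a, b, n):
    """Mean fibril length z / y obtained by setting dy/dt = 0 in npm_rates."""
    return a / b + 2 * n - 1


# Two hypothetical strains that differ only in how easily their fibrils break.
stable_strain = steady_mean_length(a=0.5, b=0.125, n=3)   # low breakage rate
fragile_strain = steady_mean_length(a=0.5, b=0.5, n=3)    # high breakage rate
print(stable_strain, fragile_strain)
```

Under these assumptions the steady-state mean length depends on the breakage rate but not on the aggregation rate, which instead governs how quickly monomer is converted; a strain with more fragile fibrils ends up with shorter fibrils on average.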
22.  Influence of Protein Abundance on High-Throughput Protein-Protein Interaction Detection 
PLoS ONE  2009;4(6):e5815.
Experimental protein-protein interaction (PPI) networks are increasingly being exploited in diverse ways for biological discovery. Accordingly, it is vital to discern their underlying natures by identifying and classifying the various types of deterministic (specific) and probabilistic (nonspecific) interactions detected. To this end, we have analyzed PPI networks determined using a range of high-throughput experimental techniques with the aim of systematically quantifying any biases that arise from the varying cellular abundances of the proteins. We confirm that PPI networks determined using affinity purification methods for yeast and Escherichia coli incorporate a correlation between protein degree, or number of interactions, and cellular abundance. The observed correlations are small but statistically significant and occur in both unprocessed (raw) and processed (high-confidence) data sets. In contrast, the yeast two-hybrid system yields networks that contain no such relationship. While this difference was previously noted on the basis of mRNA abundance, our more extensive analysis based on protein abundance confirms a systematic difference between PPI networks determined from the two technologies. We additionally demonstrate that the centrality-lethality rule, which implies that higher-degree proteins are more likely to be essential, may be misleading, as protein abundance measurements identify essential proteins to be more prevalent than nonessential proteins. In fact, we generally find that when there is a degree/abundance correlation, the degree distributions of nonessential and essential proteins are also disparate. Conversely, when there is no degree/abundance correlation, the degree distributions of nonessential and essential proteins are not different. However, we show that essentiality manifests itself as a biological property in all of the yeast PPI networks investigated here via enrichments of interactions between essential proteins. 
These findings provide valuable insights into the underlying natures of the various high-throughput technologies utilized to detect PPIs and should lead to more effective strategies for the inference and analysis of high-quality PPI data sets.
doi:10.1371/journal.pone.0005815
PMCID: PMC2686099  PMID: 19503833
23.  Constraints imposed by non-functional protein–protein interactions on gene expression and proteome size 
Crowded intracellular environments present a challenge for proteins to form functional specific complexes while reducing non-functional interactions with promiscuous non-functional partners. Here we show how the need to minimize the waste of resources on non-functional interactions limits the proteome diversity and the average concentration of co-expressed and co-localized proteins. Using the results of high-throughput Yeast 2-Hybrid experiments, we estimate the characteristic strength of non-functional protein–protein interactions. By combining these data with the strengths of specific interactions, we assess the fraction of time proteins spend tied up in non-functional interactions as a function of their overall concentration. This allows us to sketch the phase diagram for baker's yeast cells using the experimentally measured concentrations and subcellular localization of their proteins. The positions of yeast compartments on the phase diagram are consistent with our hypothesis that the yeast proteome has evolved to operate close to the upper limit of its size, while keeping individual protein concentrations sufficiently low to reduce non-functional interactions. These findings have implications for the conceptual understanding of intracellular compartmentalization, multicellularity and differentiation.
doi:10.1038/msb.2008.48
PMCID: PMC2538908  PMID: 18682700
non-functional interaction; protein–protein interaction; proteome size; yeast cytoplasm
24.  A First-Principles Model of Early Evolution: Emergence of Gene Families, Species, and Preferred Protein Folds 
PLoS Computational Biology  2007;3(7):e139.
In this work we develop a microscopic physical model of early evolution where phenotype—organism life expectancy—is directly related to genotype—the stability of its proteins in their native conformations—which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the “Big Bang” scenario whereby exponential population growth ensues as soon as favorable sequence–structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at timescales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary timescales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species—subpopulations that carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how first-gene families developed in the course of early evolution.
Author Summary
Here, we address the question of how Darwinian evolution of organisms determines molecular evolution of their proteins and genomes. We developed a microscopic ab initio model of early biological evolution where the fitness (essentially lifetime) of an organism is explicitly related to the evolving sequences of its proteins. The main assumption of the model is that the death rate of an organism is determined by the stability of the least stable of their proteins. A lattice model is used to calculate stability of all proteins in a genome from their amino acid sequence. The simulation of the model starts from 100 identical organisms, each carrying the same random gene, and proceeds via random mutations, gene duplication, organism births via replication, and organism deaths. We find that exponential population growth is possible only after the discovery of a very small number of specific advantageous protein structures. The number of genes in the evolving organisms depends on the mutation rate, demonstrating the intricate relationship between the genome sizes and protein stability requirements. Further, the model explains the observed power-law distributions of protein family and superfamily sizes, as well as the scale-free character of protein structural similarity graphs. Together, these results and their analysis suggest a plausible comprehensive scenario of emergence of the protein universe in early biological evolution.
doi:10.1371/journal.pcbi.0030139
PMCID: PMC1914367  PMID: 17630830
