The ability to predict RNA secondary structure is fundamental for understanding and manipulating RNA function. The structural information obtained from selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) experiments greatly improves the accuracy of RNA secondary structure prediction. Recently, Das and colleagues [Kladwang et al., Biochemistry 50:8049 (2011)] proposed a “bootstrapping” approach to estimate the variance and helix-by-helix confidence levels of predicted secondary structures based on resampling (randomizing and summing) the measured SHAPE data. We show that the specific resampling approach described by Kladwang et al. introduces systematic errors and underestimates confidence in secondary structure prediction using SHAPE data. Instead, a leave-data-out jackknife approach better estimates the influence of a given experimental dataset on SHAPE-directed secondary structure modeling. Even when 35% of the data were left out in the jackknife approach, the confidence levels of SHAPE-directed secondary structure prediction were significantly higher than those calculated by Das and colleagues using bootstrapping. Helix confidence levels were thus significantly underestimated in the recent study, and the resampling approach implemented by Kladwang et al. is not an appropriate metric for assigning confidences in SHAPE-directed secondary structure modeling.
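The leave-data-out idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of `None` to mark omitted reactivities, and the `predict` callback (a stand-in for any SHAPE-directed folding program that returns a set of helix labels) are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Set

Reactivities = List[Optional[float]]


def leave_out_replicate(
    reactivities: Sequence[float], fraction: float, rng: random.Random
) -> Reactivities:
    """Return a copy of the data with a random `fraction` of positions set to None."""
    n = len(reactivities)
    n_drop = int(round(fraction * n))
    dropped = set(rng.sample(range(n), n_drop))
    return [None if i in dropped else r for i, r in enumerate(reactivities)]


def helix_confidence(
    reactivities: Sequence[float],
    predict: Callable[[Reactivities], Set[Hashable]],  # hypothetical predictor
    fraction: float = 0.35,
    n_replicates: int = 100,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Fraction of leave-data-out replicates in which each helix is predicted."""
    rng = random.Random(seed)
    counts: Dict[Hashable, int] = {}
    for _ in range(n_replicates):
        replicate = leave_out_replicate(reactivities, fraction, rng)
        for helix in predict(replicate):
            counts[helix] = counts.get(helix, 0) + 1
    return {helix: c / n_replicates for helix, c in counts.items()}
```

Unlike resampling with replacement, each replicate here keeps the measured values unchanged and only withholds some of them, so the confidence reflects how strongly a helix depends on any particular subset of the data.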
Aggregation of Cu, Zn Superoxide Dismutase (SOD1) is often found in Amyotrophic Lateral Sclerosis (ALS) patients. The fibrillar aggregates formed by wildtype and various disease-associated mutants have recently been found to have distinct cores and morphologies. Previous computational and experimental studies of wildtype SOD1 suggest that the apo-monomer, highly aggregation-prone, displays substantial local unfolding dynamics. The residual folded structure of locally unfolded apoSOD1 corresponds to peptide segments forming the aggregation core as identified by a combination of proteolysis and mass spectroscopy. Therefore, we hypothesize that the destabilization of apoSOD1 caused by various mutations leads to distinct local unfolding dynamics. The partially unfolded structure, exposing the hydrophobic core and backbone hydrogen bond donors and acceptors, is prone to aggregate. The peptide segments in the residual folded structures form the “building block” for aggregation, which in turn determines the morphology of the aggregates. To test this hypothesis, we apply a multiscale simulation approach to study the aggregation of three typical SOD1 variants: wildtype, G37R, and I149T. Each of these SOD1 variants has distinct peptide segments forming the core structure and features different aggregate morphologies. We perform atomistic molecular dynamics simulations to study the conformational dynamics of apoSOD1 monomer, and coarse-grained molecular dynamics simulations to study the aggregation of partially unfolded SOD1 monomers. Our computational studies of monomer local unfolding and the aggregation of different SOD1 variants are consistent with experiments, supporting the hypothesis of the formation of aggregation “building blocks” via apo-monomer local unfolding as the mechanism of SOD1 fibrillar aggregation.
SOD1 misfolding and aggregation; fibrillar aggregate; aggregation building block; molecular dynamics; multiscale modeling
Protein-peptide interactions play important roles in many cellular processes, including signal transduction, trafficking, and immune recognition. Protein conformational changes upon binding, an ill-defined peptide binding surface, and the large number of peptide degrees of freedom make the prediction of protein-peptide interactions particularly challenging. To address these challenges, we perform rapid molecular dynamics simulations in order to examine the energetic and dynamic aspects of protein-peptide binding. We find that, in most cases, we recapitulate the native binding sites and native-like poses of protein-peptide complexes. Inclusion of electrostatic interactions in simulations significantly improves the prediction accuracy. Our results also highlight the importance of protein conformational flexibility, especially side-chain movement, which allows the peptide to optimize its conformation. Our findings not only demonstrate the importance of sufficient sampling of the protein and peptide conformations, but also reveal the possible effects of electrostatics and conformational flexibility on peptide recognition.
Opioids that stimulate the μ-opioid receptor (MOR1) are the most frequently prescribed and effective analgesics. Here we present a structural model of MOR1. Molecular dynamics simulations show a ligand-dependent increase in the conformational flexibility of the third intracellular loop that couples with the G-protein complex. These simulations likewise identified residues that form frequent contacts with ligands. We validated the binding residues using site-directed mutagenesis coupled with radioligand binding and functional assays. The model was used to blindly screen a library of ~1.2 million compounds. From the thirty-four compounds predicted to be strong binders, the top three candidates were examined using biochemical assays. One compound showed high efficacy and potency. Post hoc testing revealed this compound to be nalmefene, a potent clinically used antagonist, thus further validating the model. In summary, the MOR1 model provides a tool for elucidating the structural mechanism of ligand-initiated cell signaling and screening for novel analgesics.
Aggregation of Cu, Zn superoxide dismutase (SOD1) is implicated in Amyotrophic Lateral Sclerosis (ALS). Glutathionylation and phosphorylation of SOD1 are omnipresent in the human body, even in healthy individuals, and have been shown to increase SOD1 dimer dissociation, which is the first step on the pathway toward SOD1 aggregation. We find that post-translational modification of SOD1, especially glutathionylation, promotes dimer dissociation. We discover an intermediate state in the pathway to dissociation, a conformational change that involves a “loosening” of the β-barrels and a loss or shift of dimer interface interactions. In modified SOD1, this intermediate state is stabilized as compared to unmodified SOD1. The presence of post-translational modifications could explain the environmental factors involved in the speed of disease progression. Because post-translational modifications such as glutathionylation are often induced by oxidative stress, post-translational modification of SOD1 could be a factor in the occurrence of sporadic cases of ALS, which make up 90% of all cases of the disease.
Molecular modeling of proteins, including homology modeling, structure determination, and knowledge-based protein design, requires tools to evaluate and refine three-dimensional protein structures. Steric clash is one of the artifacts prevalent in low-resolution structures and homology models. Steric clashes arise due to the unnatural overlap of any two non-bonding atoms in a protein structure. Usually, removal of severe steric clashes in some structures is challenging since many existing refinement programs do not accept structures with severe steric clashes. Here, we present a quantitative approach for identifying steric clashes in proteins by defining clashes based on the van der Waals repulsion energy of the clashing atoms. We also define a metric for quantitative estimation of the severity of clashes in proteins by performing statistical analysis of clashes in high-resolution protein structures. We describe a rapid, automated and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models with minimal perturbation in the protein backbone. Benchmark studies highlight the efficiency and robustness of Chiron compared to other widely used methods. We provide Chiron as an automated web server to evaluate and resolve clashes in protein structures that can be further used for more accurate protein design.
Homology modeling; refinement; Chiron; Discrete Molecular Dynamics; Protein Design
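The energy-based clash definition described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not Chiron's implementation: the 12-6 Lennard-Jones form, the well depth `epsilon`, the energy `cutoff`, and the per-atom radii are placeholder choices, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float, float]


def lj_energy(r: float, r_min: float, epsilon: float) -> float:
    """12-6 Lennard-Jones energy with its minimum (-epsilon) at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def find_clashes(
    coords: Sequence[Point],
    radii: Sequence[float],                       # assumed van der Waals radii
    bonded: FrozenSet[Tuple[int, int]] = frozenset(),
    epsilon: float = 0.1,                         # assumed well depth
    cutoff: float = 0.3,                          # assumed repulsion cutoff
) -> List[Tuple[int, int, float]]:
    """Return (i, j, energy) for non-bonded pairs whose repulsion exceeds cutoff."""
    clashes = []
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in bonded or (j, i) in bonded:
            continue  # covalently bonded atoms are not clash candidates
        r = math.dist(coords[i], coords[j])
        # Coincident atoms are treated as an infinitely severe clash.
        energy = math.inf if r == 0.0 else lj_energy(r, radii[i] + radii[j], epsilon)
        if energy > cutoff:
            clashes.append((i, j, energy))
    return clashes
```

Defining a clash by energy rather than by a fixed distance lets the same cutoff apply to atom pairs of different sizes, since the pair-specific `r_min` sets the distance scale.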
Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely-coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We develop an algorithm to build the ligand rotamer library “on-the-fly” during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self-docking (to the co-crystallized state) and cross-docking (to a state co-crystallized with a different ligand), the latter of which mimics the virtual-screening procedure in computational drug discovery. We also perform a virtual-screening test of four flexible kinase targets including cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements of virtual-screening enrichments when compared to rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual-screening benchmarks highlights the importance of modeling both ligand and receptor flexibility simultaneously in computational docking.
Conformational changes of filamin A under stress have been postulated to play crucial roles in signaling pathways of cell responses. Direct observation of conformational changes under stress is beyond the resolution of current experimental techniques. On the other hand, computational studies are mainly limited to either traditional molecular dynamics simulations of short durations and high forces or simulations of simplified models. Here we perform all-atom discrete molecular dynamics (DMD) simulations to study thermally and force-induced unfolding of filamin A. The high conformational sampling efficiency of DMD allows us to observe force-induced unfolding of filamin A Ig domains under physiological forces. The computationally identified critical unfolding forces agree well with experimental measurements. Despite a large heterogeneity in the population of force-induced intermediate states, we find a common initial unfolding intermediate in all the Ig domains of filamin, where the N-terminal strand unfolds. We also study the thermal unfolding of several filamin Ig-like domains. We find that thermally induced unfolding features an early-stage intermediate state similar to the one observed in force-induced unfolding and characterized by the N-terminal strand being unfurled. We propose that the N-terminal strand may act as a conformational switch that unfolds under physiological forces, leading to exposure of cryptic binding sites, removal of native binding sites, and modulation of the quaternary structure of domains.
Over the past three decades the protein folding field has undergone monumental changes. Originally a purely academic question, how a protein folds has now become vital in understanding diseases and our abilities to rationally manipulate cellular life by engineering protein folding pathways. We review and contrast past and recent developments in the protein folding field. Specifically, we discuss the progress in our understanding of protein folding thermodynamics and kinetics, the properties of evasive intermediates, and unfolded states. We also discuss how some abnormalities in protein folding lead to protein aggregation and human diseases.
Understanding the role of biomolecular dynamics in cellular processes leading to human diseases and the ability to rationally manipulate these processes is of fundamental importance in scientific research. The last decade has witnessed significant progress in probing biophysical behavior of proteins. However, we are still limited in understanding how changes in protein dynamics and inter-protein interactions occurring in short length- and time-scales lead to aberrations in their biological function. Bridging this gap in biology probed using computer simulations marks a challenging frontier in computational biology. Here we examine hypothesis-driven simplified protein models in conjunction with discrete molecular dynamics in the study of protein aggregation, implicated in a series of neurodegenerative diseases, such as Alzheimer's and Huntington's diseases. Discrete molecular dynamics simulations of simplified protein models have emerged as a powerful methodology with the ability to bridge the gap in time and length scales from protein dynamics to aggregation, and provide an indispensable tool for probing protein aggregation.
Protein Aggregation; Protein Misfolding; Simplified Modeling; Aggregation Kinetics; Folding Thermodynamics; Discrete Molecular Dynamics; Molecular Dynamics; Computational Biology; Biophysics; MD; DMD; Misfolding; Review
Until now it has been impractical to observe protein folding in silico for proteins larger than 50 residues. Limitations of both force field accuracy and computational efficiency make the folding problem very challenging. Here we employ discrete molecular dynamics (DMD) simulations with an all-atom force field to fold fast-folding proteins. We extend the DMD force field by introducing long-range electrostatic interactions to model salt-bridges and a sequence-dependent semi-empirical potential accounting for natural tendencies of certain amino acid sequences to form specific secondary structures. We enhance the computational performance by parallelizing the DMD algorithm. Using a small number of commodity computers, we achieve sampling quality and folding accuracy comparable to the explicit-solvent simulations performed on high-end hardware. We demonstrate that DMD can be used to observe equilibrium folding of villin headpiece and WW domain, study two-state folding kinetics and sample near-native states in ab initio folding of proteins of ~100 residues.
Conformational dynamics; structure prediction; implicit solvent; parallel event-driven simulation
Molecular modeling guided by experimentally-derived structural information is an attractive approach for three-dimensional structure determination of complex RNAs that are not amenable to study by high-resolution methods. Hydroxyl radical probing (HRP), performed routinely in many laboratories, provides a measure of solvent accessibility at individual nucleotides. HRP measurements have, to date, only been used to evaluate RNA models qualitatively. Here, we report development of a quantitative structure refinement approach using HRP measurements to drive discrete molecular dynamics simulations for RNAs ranging in size from 80 to 230 nucleotides. HRP reactivities were first used to identify RNAs that form extensive helical packing interactions. For these RNAs, we achieved highly significant structure predictions, given inputs of RNA sequence and base pairing. This HRP-directed tertiary structure refinement approach generates robust structural hypotheses useful for guiding explorations of structure-function interrelationships in RNA.
Motivation: Increasing use of structural modeling for understanding structure–function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures.
Results: In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement.
Availability and Implementation: We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures.
Supplementary information: Supplementary data are available at Bioinformatics online.
Therapeutics based on RNA interference (RNAi) have emerged as a potential new class of drugs for treating human disease by silencing the target messenger RNA (mRNA), thereby reducing levels of the corresponding pathogenic protein. The major challenge for RNAi therapeutics is the development of safe delivery vehicles for small interfering RNAs (siRNAs). We previously showed that cholesterol-conjugated siRNAs (chol-siRNA) associate with plasma lipoprotein particles and distribute primarily to the liver after systemic administration to mice. We further demonstrated enhancement of silencing by administration of chol-siRNA pre-associated with isolated high-density lipoprotein (HDL) or low-density lipoprotein (LDL). In this study, we investigated mimetic lipoprotein particles prepared from recombinant apolipoprotein A1 (apoA) and apolipoprotein E3 (apoE) as delivery vehicles for chol-siRNAs. We show that the apoE-containing particle (E-lip) is highly effective in functional delivery of chol-siRNA to mouse liver. E-lip delivery was found to be considerably more potent than delivery by the apoA-containing particle (A-lip). Furthermore, E-lip–mediated delivery was not significantly affected by high endogenous levels of plasma LDL. These results demonstrate that E-lip has substantial potential as a delivery vehicle for lipophilic conjugates of siRNAs.
RNA function is dependent on its structure, yet three-dimensional folds for most biologically important RNAs are unknown. We develop a generic discrete molecular dynamics (DMD)-based modeling system that uses long-range constraints inferred from diverse biochemical or bioinformatic analyses to create statistically significant (p < 0.01) native-like folds for RNAs of known structure ranging from 45 to 158 nucleotides. We then predict the unknown structure of the hepatitis C virus IRES pseudoknot domain. The resulting RNA model rationalizes independent solvent accessibility and cryo-electron microscopy structure information. The pseudoknot positions the AUG start codon near the mRNA channel and is tRNA-like, suggesting the IRES employs molecular mimicry as a functional strategy.
Studies of cellular and tissue dynamics benefit greatly from tools that can control protein activity with specificity and precise timing in living systems. We describe here a new approach to confer allosteric regulation specifically on the catalytic activity of kinases. A highly conserved portion of the kinase catalytic domain is modified with a small protein insert that inactivates catalytic activity, but does not affect other protein interactions. Catalytic activity is restored by addition of rapamycin or non-immunosuppressive analogs (Fig. 1A). We demonstrate the approach by specifically activating focal adhesion kinase (FAK) within minutes in living cells, thereby demonstrating a novel role for FAK in regulation of membrane dynamics. Molecular modeling and mutagenesis indicate that the protein insert reduces activity by increasing the flexibility of the catalytic domain. Drug binding restores activity by increasing rigidity. Successful regulation of Src and p38 suggests that modification of this highly conserved site will be applicable to other kinases.
Polyglutamine (polyQ) expansion in exon1 (XN1) of the huntingtin protein is linked to Huntington's disease. When the number of glutamines exceeds a threshold of approximately 36–40 repeats, XN1 can readily form amyloid aggregates similar to those associated with disease. Many experiments suggest that misfolding of monomeric XN1 plays an important role in the length-dependent aggregation. Elucidating the misfolding of an XN1 monomer can help determine the molecular mechanism of XN1 aggregation and potentially help develop strategies to inhibit XN1 aggregation. The flanking sequences surrounding the polyQ region can play a critical role in determining the structural rearrangement and aggregation mechanism of XN1. Few experiments have studied XN1 in its entirety, with all flanking regions. To obtain structural insights into the misfolding of XN1 toward amyloid aggregation, we perform molecular dynamics simulations on monomeric XN1 with full flanking regions, a variant missing the polyproline regions, which are hypothesized to prevent aggregation, and an isolated polyQ peptide (Qn). For each of these three constructs, we study glutamine repeat lengths of 23, 36, 40 and 47. We find that polyQ peptides have a positive correlation between their probability to form a β-rich misfolded state and their expansion length. We also find that the flanking regions of XN1 affect its probability to form a β-rich state compared to the isolated polyQ. Particularly, the polyproline regions form polyproline type II helices and decrease the probability of the polyQ region to form a β-rich state. Additionally, by lengthening polyQ, the first N-terminal 17 residues are more likely to adopt a β-sheet conformation rather than an α-helix conformation. Therefore, our molecular dynamics study provides structural insight into XN1 misfolding and elucidates the possible role of the flanking sequences in XN1 aggregation.
Huntington's Disease is a neurodegenerative disorder associated with protein aggregation in neurons. The aggregates formed are thought to lead to neurotoxicity and cell death. Understanding the molecular structure of these aggregates may lead to strategies to inhibit aggregation. Exon 1 (XN1) of the huntingtin protein is critical for aggregate formation. This polypeptide has a naturally occurring polyglutamine sequence (polyQ), which is elongated in patients afflicted with the disease. The polyQ region in XN1 has several flanking sequences with distinct physicochemical properties, including the N-terminal 17 residues, two polyproline regions, and C-terminal sequences, that may affect its overall structure and aggregation. What is the overall structure of XN1, and what structural effects do the neighboring sequences have on each other and polyQ? We address these questions by studying computational models of various polypeptides, including XN1 and three mutant forms associated with Huntington's Disease. Certain neighboring sequences are found to inhibit aggregation, while others may be recruited by polyQ to form aggregates. Our results suggest the role that the flanking sequences may play in XN1 aggregation and may subsequently guide future structural models of XN1 aggregation.
The difficulty of analyzing higher order RNA structure, especially for folding intermediates and for RNAs whose functions require domains that are conformationally flexible, emphasizes the need for new approaches for modeling RNA tertiary structure accurately. Here, we report a concise approach that makes use of facile RNA structure probing experiments that are then interpreted using a computational algorithm, carefully tailored to optimize both the resolution and refinement speed for the resulting structures, without requiring user intervention. The RNA secondary structure is first established using SHAPE chemistry. We then use a sequence-directed cleavage agent, that can be placed arbitrarily in many helical motifs, to obtain high quality inter-residue distances. We interpret this in-solution chemical information using a fast, coarse grained, discrete molecular dynamics engine in which each RNA nucleotide is represented by pseudoatoms for the phosphate, ribose and nucleobase groups. By this approach, we refine base paired positions in yeast tRNAAsp to 4 Å RMSD without any preexisting information or assumptions about secondary or tertiary structures. This blended experimental and computational approach has the potential to yield native-like models for the diverse universe of functionally important RNAs whose structures cannot be characterized by conventional structural methods.
Summary: Three-dimensional RNA structure prediction and folding is of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nt) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2–5 Å root mean square deviations (RMSDs) from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, RMSDs from native state, and fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses.
Supplementary information: Supplementary data are available at Bioinformatics online.
Discrete molecular dynamics (DMD) is a rapid sampling method used in protein folding and aggregation studies. Until now, DMD was used to perform simulations of simplified protein models in conjunction with structure-based force fields. Here, we develop an all-atom protein model and a transferable force field featuring packing, solvation, and environment-dependent hydrogen bond interactions. Using the replica exchange method, we perform folding simulations of six small proteins (20–60 residues) with distinct native structures. In all cases, native or near-native states are reached in simulations. For three small proteins, multiple folding transitions are observed and the computationally-characterized thermodynamics are in quantitative agreement with experiments. The predictive power of all-atom DMD highlights the importance of environment-dependent hydrogen bond interactions in modeling protein folding. The developed approach can be used for accurate and rapid sampling of conformational spaces of proteins and protein-protein complexes, and applied to protein engineering and design of protein-protein interactions.
ab initio protein folding; environment-dependent hydrogen bond; replica exchange; free energy landscape; conformational sampling
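The replica exchange method named in the abstract above relies on a Metropolis criterion for swapping configurations between neighboring temperatures. A minimal Python sketch of that standard criterion follows; it is generic rather than specific to the DMD implementation, and the function names and the reduced units (`k_b = 1.0`) are assumptions.

```python
import math
import random
from typing import List


def swap_probability(
    energy_i: float, energy_j: float, temp_i: float, temp_j: float, k_b: float = 1.0
) -> float:
    """Metropolis acceptance: min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Avoid overflow in exp() by short-circuiting the always-accepted case.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(
    energies: List[float], temps: List[float], i: int, j: int, rng: random.Random
) -> bool:
    """Swap the configurations (represented by their energies) of replicas i and j."""
    p = swap_probability(energies[i], energies[j], temps[i], temps[j])
    if rng.random() < p:
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

A swap that moves the lower-energy configuration to the colder replica is always accepted, which is what lets conformations trapped at low temperature escape through the high-temperature replicas.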
Nuclear receptor ligand binding domains (LBDs) convert ligand binding events into changes in gene expression by recruiting transcriptional coregulators to a conserved activation function-2 (AF-2) surface. While most nuclear receptor LBDs form homo- or heterodimers, the human nuclear receptor pregnane X receptor (PXR) forms a unique and essential homodimer and is proposed to assemble into a functional heterotetramer with the retinoid X receptor (RXR). How the homodimer interface, which is located 30 Å from the AF-2, would affect function at this critical surface has remained unclear. By using 20- to 30-ns molecular dynamics simulations on PXR in various oligomerization states, we observed a remarkably high degree of correlated motion in the PXR–RXR heterotetramer, most notably in the four helices that create the AF-2 domain. The function of such correlation may be to create “active-capable” receptor complexes that are ready to bind to transcriptional coactivators. Indeed, we found in additional simulations that active-capable receptor complexes involving other orphan or steroid nuclear receptors also exhibit highly correlated AF-2 domain motions. We further propose a mechanism for the transmission of long-range motions through the nuclear receptor LBD to the AF-2 surface. Taken together, our findings indicate that long-range motions within the LBD scaffold are critical to nuclear receptor function by promoting a mobile AF-2 state ready to bind coactivators.
Long-range motions play essential roles in protein function but are difficult to appreciate from static crystal structures. We sought to understand how macromolecular motion affects the formation of transcriptional complexes central to controlling gene expression. Using 20- to 30-ns molecular dynamics simulations, we examined three nuclear receptors that function as ligand-regulated transcription factors: the pregnane X receptor, the peroxisome proliferator-activated receptor-γ, and estrogen receptor-α. We found that each of these receptors exhibits a high degree of correlated motions within the domain responsible for forming functionally essential protein–protein interactions with transcriptional coactivators. We further found that specific long-range (up to 30 Å) motions play an important role in these dynamics. Our results show that “active-capable” nuclear receptors are prepared for coactivator contacts by maintaining a mobile but preformed protein–protein interaction surface.
Diverse proteins with similar structures are grouped into families of homologs and analogs, if their sequence similarity is higher or lower, respectively, than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge in their distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we stipulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates only 1–2 Å root-mean-square deviation from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of the naturally occurring sequences, thereby validating our computational design methodology.
Studies of known proteins have revealed intriguing co-organization of their sequences and structures. Proteins with sequence similarity higher than 25%–30% usually adopt a similar structure and are called homologs, whereas those with low sequence similarity (<20%) can share the same structure and are referred to as analogs. The origin of such co-organization has been a topic of extensive discussion among the protein folding, design, and evolution research communities, because understanding the emergence of homologs and analogs in the protein universe has broad implications for our ability to rationally manipulate proteins. In this study, the authors developed a molecular modeling and design method, Medusa, to computationally design diversified protein sequences that correspond to similar backbone structures, which determine a protein fold family. Using Medusa, the authors directly demonstrated the formation of distinct protein homologs within a specific fold family when the structure deviates only 1–2 Å away from the original structure. The study suggests that subtle structural changes, which appear due to accumulating mutations in the families of homologs, lead to a distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs.
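The homolog/analog distinction described above can be made concrete with a small sketch that scores a pair of already-aligned sequences and labels the pair by the thresholds quoted in the text. Several things here are assumptions for illustration only: the function names are hypothetical, percent identity over ungapped columns is used as a simple stand-in for "sequence similarity", the default cutoffs (25% and 20%) are picked from the quoted ranges, and the pair is assumed to share the same fold. It is not part of Medusa.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over ungapped aligned columns.

    seq_a, seq_b: pre-aligned sequences of equal length, with `gap`
    marking insertions and deletions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * matches / compared


def classify_pair(identity, homolog_cutoff=25.0, analog_cutoff=20.0):
    """Label a same-fold pair as 'homolog', 'analog', or 'ambiguous'.

    The cutoffs are illustrative defaults taken from the ranges quoted
    in the text; values between them are left unassigned.
    """
    if identity > homolog_cutoff:
        return "homolog"
    if identity < analog_cutoff:
        return "analog"
    return "ambiguous"


# Toy sequences, made up for illustration.
close_pair = percent_identity("ACDE-FGH", "ACDEKFGA")
distant_pair = percent_identity("AAAAAAAAAA", "AGGGGGGGGG")
labels = (classify_pair(close_pair), classify_pair(distant_pair))
```

Leaving the band between the two cutoffs unassigned reflects the fact that the text gives ranges rather than a single sharp boundary.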
Expansion of polyglutamine (polyQ) tracts in proteins results in protein aggregation and is associated with cell death in at least nine neurodegenerative diseases. Disease age of onset is correlated with the polyQ insert length above a critical value of 35–40 glutamines. The aggregation kinetics of isolated polyQ peptides in vitro also shows a similar critical-length dependence. While recent experimental work has provided considerable insights into polyQ aggregation, the molecular mechanism of aggregation is not well understood. Here, using computer simulations of isolated polyQ peptides, we show that a mechanism of aggregation is the conformational transition in a single polyQ peptide chain from random coil to a parallel β-helix. This transition occurs selectively in peptides longer than 37 glutamines. In the β-helices observed in simulations, all residues adopt β-strand backbone dihedral angles, and the polypeptide chain coils around a central helical axis with 18.5 ± 2 residues per turn. We also find that mutant polyQ peptides with proline-glycine inserts show formation of antiparallel β-hairpins in their ground state, in agreement with experiments. The lower stability of mutant β-helices explains their lower aggregation rates compared to wild type. Our results provide a molecular mechanism for polyQ-mediated aggregation.
Nine human diseases, including Huntington's disease, are associated with an expanded trinucleotide sequence CAG in genes. Since CAG codes for the amino acid glutamine, these disorders are collectively known as polyglutamine diseases. Although the genes (and proteins) involved in different polyglutamine diseases have little in common, the disorders they cause follow a strikingly similar course: if the length of the expansion exceeds a critical value of 35–40, the greater the number of glutamine repeats in a protein, the earlier the onset of disease and the more severe the symptoms. This fact suggests that abnormally long glutamine tracts render their host protein toxic to nerve cells, and all polyglutamine diseases are hypothesized to progress via common molecular mechanisms. One possible mechanism of cell death is that the abnormally long sequence of glutamines acquires a shape that prevents the host protein from folding into its proper shape. What is the structure acquired by polyglutamine and what is the molecular basis of the observed threshold repeat length? Using computer models of polyglutamine, the authors show that if, and only if, the length of polyglutamine repeats is longer than the critical value found in disease, it acquires a specific shape called a β-helix. The longer the glutamine tract, the higher the propensity to form β-helices. This length-dependent formation of β-helices by polyglutamine stretches may provide a unified molecular framework for understanding the structural basis of different trinucleotide repeat-associated diseases.