Human noroviruses (HuNoVs) are positive-sense RNA viruses that can cause severe, highly infectious gastroenteritis. HuNoV outbreaks are frequently associated with recombination between circulating strains. Strain genotyping and phylogenetic analyses show that noroviruses often recombine in a highly conserved region near the junction of the viral polyprotein (open reading frame 1 [ORF1]) and capsid (ORF2) genes and occasionally within the RNA-dependent RNA polymerase (RdRP) gene. Although genotyping methods are useful for tracking changes in circulating viral populations, they report only the dominant recombinant strains and do not elucidate the frequency or range of recombination events. Furthermore, the relatively low frequency of recombination in RNA viruses has limited studies to cell culture or in vitro systems, which do not reflect the complexities and selective pressures present in an infected organism. Using two murine norovirus (MNV) strains to model coinfection, we developed a microfluidic platform to amplify, detect, and recover individual recombinants following in vitro and in vivo coinfection. One-step reverse transcriptase PCR (RT-PCR) was performed in picoliter drops with primers that identified the wild-type and recombinant progenies and scanned for recombination breakpoints at ∼1-kb intervals. We detected recombination between MNV strains at multiple loci spanning the viral protease, RdRP, and capsid ORFs and isolated individual recombinant RNA genomes that were present at a frequency of 1/300,000 or higher. This study is the first to examine norovirus recombination following coinfection of an animal and suggests that the exchange of RNA among viral genomes in an infected host occurs in multiple locations and is an important driver of genetic diversity.
IMPORTANCE RNA viruses increase diversity and escape host immune barriers by genomic recombination. Studies using a number of viral systems indicate that recombination occurs via template switching by the virus-encoded RNA-dependent RNA polymerase (RdRP). However, factors that govern the frequency and positions of recombination in an infected organism remain largely unknown. This work leverages advances in the applied physics of drop-based microfluidics to isolate and sequence rare recombinants arising from the coinfection of mice with two distinct strains of murine norovirus. This study is the first to detect and analyze norovirus recombination in an animal model.
Horizontal gene transfer (HGT) plays a central role in bacterial evolution, yet the molecular and cellular constraints on functional integration of the foreign genes are poorly understood. Here we performed inter-species replacement of the chromosomal folA gene, encoding an essential metabolic enzyme dihydrofolate reductase (DHFR), with orthologs from 35 other mesophilic bacteria. The orthologous inter-species replacements caused a marked drop (in the range 10–90%) in bacterial growth rate despite the fact that most orthologous DHFRs are as stable as E.coli DHFR at 37°C and are more catalytically active than E. coli DHFR. Although phylogenetic distance between E. coli and orthologous DHFRs as well as their individual molecular properties correlate poorly with growth rates, the product of the intracellular DHFR abundance and catalytic activity (k
cat/KM), correlates strongly with growth rates, indicating that the drop in DHFR abundance constitutes the major fitness barrier to HGT. Serial propagation of the orthologous strains for ~600 generations dramatically improved growth rates by largely alleviating the fitness barriers. Whole genome sequencing and global proteome quantification revealed that the evolved strains with the largest fitness improvements have accumulated mutations that inactivated the ATP-dependent Lon protease, causing an increase in the intracellular DHFR abundance. In one case DHFR abundance increased further due to mutations accumulated in folA promoter, but only after the lon inactivating mutations were fixed in the population. Thus, by apparently distinguishing between self and non-self proteins, protein homeostasis imposes an immediate and global barrier to the functional integration of foreign genes by decreasing the intracellular abundance of their products. Once this barrier is alleviated, more fine-tuned evolution occurs to adjust the function/expression of the transferred proteins to the constraints imposed by the intracellular environment of the host organism.
Horizontal gene transfer (HGT) is central to bacterial evolution. The outcome of an HGT event (fixation in a population, elimination, or separation as a subdominant clone) depends not only on the availability of a new gene but crucially on the fitness cost or benefit of the genomic incorporation of the foreign gene and its expression in recipient bacteria. Here we studied the fitness landscape for inter-species chromosomal replacement of an essential protein, dihydrofolate reductase (DHFR) encoded by the folA gene, by its orthologs from other mesophilic bacteria. We purified and biochemically characterized 33 out of 35 orthologous DHFRs and found that most of them are stable and more catalytically active than E. coli DHFR. However, the inter-species replacement of DHFR caused significant fitness loss for most transgenic strains due to low abundance of orthologous DHFRs in E. coli cytoplasm. Laboratory evolution resulted in an increase in orthologous DHFR abundance leading to a dramatic fitness improvement. Genomic and proteomic analyses of “naive” and evolved strains suggest a new function of protein homeostasis to discriminate between “self” and “non-self” proteins, thus creating fitness barriers to HGT.
The variation among sequences and structures in nature is both determined by physical laws and by evolutionary history. However, these two factors are traditionally investigated by disciplines with different emphasis and philosophy—molecular biophysics on one hand and evolutionary population genetics in another. Here, we review recent theoretical and computational approaches that address the critical need to integrate these two disciplines. We first articulate the elements of these integrated approaches. Then, we survey their contribution to our mechanistic understanding of molecular evolution, the polymorphisms in coding region, the distribution of fitness effects (DFE) of mutations, the observed folding stability of proteins in nature, and the distribution of protein folds in genomes.
Understanding the relative contributions of various evolutionary processes—purifying selection, neutral drift, and adaptation—is fundamental to evolutionary biology. A common metric to distinguish these processes is the ratio of nonsynonymous to synonymous substitutions (i.e., dN/dS) interpreted from the neutral theory as a null model. However, from biophysical considerations, mutations have non-negligible effects on the biophysical properties of proteins such as folding stability. In this work, we investigated how stability affects the rate of protein evolution in phylogenetic trees by using simulations that combine explicit protein sequences with associated stability changes. We first simulated myoglobin evolution in phylogenetic trees with a biophysically realistic approach that accounts for 3D structural information and estimates of changes in stability upon mutation. We then compared evolutionary rates inferred directly from simulation to those estimated using maximum-likelihood (ML) methods. We found that the dN/dS estimated by ML methods (ωML) is highly predictive of the per gene dN/dS inferred from the simulated phylogenetic trees. This agreement is strong in the regime of high stability where protein evolution is neutral. At low folding stabilities and under mutation-selection balance, we observe deviations from neutrality (per gene dN/dS > 1 and dN/dS < 1). We showed that although per gene dN/dS is robust to these deviations, ML tests for positive selection detect statistically significant per site dN/dS > 1. Altogether, we show how protein biophysics affects the dN/dS estimations and its subsequent interpretation. These results are important for improving the current approaches for detecting positive selection.
dN/dS; molecular evolution; protein evolution; folding stability; positive selection; maximum likelihood
The emergence of a novel A(H1N1) strain in 2009 was the first influenza pandemic of the genomic age, and unprecedented surveillance of the virus provides the opportunity to better understand the evolution of influenza. We examined changes in the nucleotide coding regions and the amino acid sequences of the hemagglutinin (HA), neuraminidase (NA), and nucleoprotein (NP) segments of the A(H1N1)pdm09 strain using publicly available data. We calculated the nucleotide and amino acid hamming distance from the vaccine strain A/California/07/2009 for each sequence. We also estimated Pepitope–a measure of antigenic diversity based on changes in the epitope regions–for each isolate. Finally, we compared our results to A(H3N2) strains collected over the same period. Our analysis found that the mean hamming distance for the HA protein of the A(H1N1)pdm09 strain increased from 3.6 (standard deviation [SD]: 1.3) in 2009 to 11.7 (SD: 1.0) in 2013, while the mean hamming distance in the coding region increased from 7.4 (SD: 2.2) in 2009 to 28.3 (SD: 2.1) in 2013. These trends are broadly similar to the rate of mutation in H3N2 over the same time period. However, in contrast to H3N2 strains, the rate of mutation accumulation has slowed in recent years. Our results are notable because, over the course of the study, mutation rates in H3N2 similar to that seen with A(H1N1)pdm09 led to the emergence of two antigenic drift variants. However, while there has been an H1N1 epidemic in North America this season, evidence to date indicates the vaccine is still effective, suggesting the epidemic is not due to the emergence of an antigenic drift variant. Our results suggest that more research is needed to understand how viral mutations are related to vaccine effectiveness so that future vaccine choices and development can be more predictive.
What are the molecular properties of proteins that fall on the radar of protein quality control (PQC)? Here we mutate the E. coli’s gene encoding dihydrofolate reductase (DHFR), and replace it with bacterial orthologous genes to determine how components of PQC modulate fitness effects of these genetic changes. We find that chaperonins GroEL/ES and protease Lon compete for binding to molten globule intermediate of DHFR, resulting in a peculiar symmetry in their action: Over-expression of GroEL/ES and deletion of Lon both restore growth of deleterious DHFR mutants and most of the slow-growing orthologous DHFR strains. Kinetic steady-state modeling predicts and experimentation verifies that mutations affect fitness by shifting the flux balance in cellular milieu between protein production, folding and degradation orchestrated by PQC through the interaction with folding intermediates.
The patterns of polymorphisms in genomes are imprints of the evolutionary forces at play in nature. In particular, polymorphisms have been extensively used to infer the fitness effects of mutations and their dynamics of fixation. However, the role and contribution of molecular biophysics to these observations remain unclear. Here, we couple robust findings from protein biophysics, enzymatic flux theory, the selection against the cytotoxic effects of protein misfolding, and explicit population dynamics simulations in the polyclonal regime. First, we recapitulate results on the dynamics of clonal interference and on the shape of the DFE, thus providing them with a molecular and mechanistic foundation. Second, we predict that if evolution is indeed under the dynamic equilibrium of mutation–selection balance, the fraction of stabilizing and destabilizing mutations is almost equal among single-nucleotide polymorphisms segregating at high allele frequencies. This prediction is proven true for polymorphisms in the human coding region. Overall, our results show how selection for protein folding stability predominantly shapes the patterns of polymorphisms in coding regions.
SNPs; polymorphism; protein folding stability; DFE; clonal interference
Since divergence ∼50 Ma ago from their terrestrial ancestors, cetaceans underwent a series of adaptations such as a ∼10–20 fold increase in myoglobin (Mb) concentration in skeletal muscle, critical for increasing oxygen storage capacity and prolonging dive time. Whereas the O2-binding affinity of Mbs is not significantly different among mammals (with typical oxygenation constants of ∼0.8–1.2 µM−1), folding stabilities of cetacean Mbs are ∼2–4 kcal/mol higher than for terrestrial Mbs. Using ancestral sequence reconstruction, maximum likelihood and Bayesian tests to describe the evolution of cetacean Mbs, and experimentally calibrated computation of stability effects of mutations, we observe accelerated evolution in cetaceans and identify seven positively selected sites in Mb. Overall, these sites contribute to Mb stabilization with a conditional probability of 0.8. We observe a correlation between Mb folding stability and protein abundance, suggesting that a selection pressure for stability acts proportionally to higher expression. We also identify a major divergence event leading to the common ancestor of whales, during which major stabilization occurred. Most of the positively selected sites that occur later act against other destabilizing mutations to maintain stability across the clade, except for the shallow divers, where late stability relaxation occurs, probably due to the shorter aerobic dive limits of these species. The three main positively selected sites 66, 5, and 35 undergo changes that favor hydrophobic folding, structural integrity, and intra-helical hydrogen bonds.
In this work, we identify positive selection in cetacean myoglobins and an early, significant divergence event. While O2-binding is nearly unchanged, positive selection acts to introduce and later maintain stability. Stability correlates with abundance across the species, supporting that selection for increased stability concurred with the known 10–20 fold increase in myoglobin abundance of cetaceans relative to terrestrial mammals, which itself resulted from speciation towards longer dive lengths of the animals. We suggest that this selection acted to keep constant the otherwise increasing number of unfolded Mb. Altogether, this work for the first time links protein phenotype (stability and abundance) in a specific, real protein to organism-level evolution and fitness of mammals.
Isopentenyl phosphate kinase (IPK) catalyzes the phosphorylation of isopentenyl phosphate to form the isoprenoid precursor isopentenyl diphosphate (IPP) in the archaeal mevalonate pathway. This enzyme is highly homologous to fosfomycin kinase (FomA), an antibiotic resistance enzyme found in a few strains of Streptomyces and Pseudomonas whose mode of action is inactivation by phosphorylation. Superposition of Thermoplasma acidophilum (THA) IPK and FomA structures aligns their respective substrates and catalytic residues, including H50 and K14 in THA IPK, and H58 and K18 in S. wedmorensis FomA. These residues are conserved only in the IPK and FomA members of the phosphate subdivision of the amino acid kinase superfamily. We measured the fosfomycin kinase activity of THA IPK, Km = 15.1 ± 1.0 mM and kcat = (4.0 ± 0.1) × 10−2 s−1, resulting in a catalytic efficiency, kcat/Km = 2.6 M−1s−1, that is five orders of magnitude less than the native reaction. Fosfomycin is a competitive inhibitor of IPK, Ki = 3.6 ± 0.2 mM. Molecular dynamics simulation of the IPK•fosfomycin•MgATP complex identified two binding poses for fosfomycin in the IP binding site, one of which results in a complex analogous to the native IPK•IP•ATP complex that it engages H50 and the lysine triangle formed by K5, K14, and K205. The other binding pose leads to a dead-end complex that engages K204 near the IP binding site to bind fosfomycin. Our findings suggest a mechanism for acquisition of FomA-based antibiotic resistance in fosfomycin producing organisms.
The consistent observation across all kingdoms of life that highly abundant proteins evolve slowly demonstrates that cellular abundance is a key determinant of protein evolutionary rate. However, other empirical findings, such as the broad distribution of evolutionary rates, suggest that additional variables determine the rate of protein evolution. Here, we report that under the global selection against the cytotoxic effects of misfolded proteins, folding stability (ΔG), simultaneous with abundance, is a causal variable of evolutionary rate. Using both theoretical analysis and multiscale simulations, we demonstrate that the anticorrelation between the pre-mutation ΔG and the arising mutational effect (ΔΔG), purely biophysical in origin, is a necessary requirement for abundance–evolutionary rate covariation. Additionally, we predict and demonstrate in bacteria that the strength of abundance–evolutionary rate correlation depends on the divergence time separating reference genomes. Altogether, these results highlight the intrinsic role of protein biophysics in the emerging universal patterns of molecular evolution.
Opioids that stimulate the μ-opioid receptor (MOR1) are the most frequently prescribed and effective analgesics. Here we present a structural model of MOR1. Molecular dynamics simulations show a ligand-dependent increase in the conformational flexibility of the third intracellular loop that couples with the G-protein complex. These simulations likewise identified residues that form frequent contacts with ligands. We validated the binding residues using site-directed mutagenesis coupled with radioligand binding and functional assays. The model was used to blindly screen a library of ~1.2 million compounds. From the thirty-four compounds predicted to be strong binders, the top three candidates were examined using biochemical assays. One compound showed high efficacy and potency. Post hoc testing revealed this compound to be nalmefene, a potent clinically used antagonist, thus further validating the model. In summary, the MOR1 model provides a tool for elucidating the structural mechanism of ligand-initiated cell signaling and screening for novel analgesics.
Cytoplasmic dynein is an important motor that drives all minus-end directed movement along microtubules. Dynein is a complex motor whose processive motion is driven by ATP-hydrolysis. Dynein's run length has been measured to be several millimetres with typical velocities in the order of a few nanometres per second. Therefore, the average time between steps is a fraction of a second. When this time scale is compared with typical time scales for protein side chain and backbone movements (~10−9 s and ~10−5 s, respectively), it becomes clear that a multi-timescale modelling approach is required to understand energy transduction in this protein. Here, we review recent efforts to use computational and mathematical modelling to understand various aspects of dynein's chemomechanical cycle. First, we describe a structural model of dynein's motor unit showing a heptameric organization of the motor subunits. Second, we describe our molecular dynamics simulations of the motor unit that are used to investigate the dynamics of the various motor domains. Third, we present a kinetic model of the coordination between the two dynein heads. Lastly, we investigate the various potential geometries of the dimer during its hydrolytic and stepping cycle.
The Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) is a cAMP dependent chloride channel that is mutated in cystic fibrosis, an inherited disease of high morbidity and mortality. The phosphorylation of its ∼200 amino acid R domain by protein kinase A is obligatory for channel gating under normal conditions. The R domain contains more than ten PKA phosphorylation sites. No individual site is essential but phosphorylation of increasing numbers of sites enables progressively greater channel activity. In spite of numerous studies of the role of the R domain in CFTR regulation, its mechanism of action remains largely unknown. This is because neither its structure nor its interactions with other parts of CFTR have been completely elucidated. Studies have shown that the R domain lacks well-defined secondary structural elements and is an intrinsically disordered region of the channel protein. Here, we have analyzed the disorder pattern and employed computational methods to explore low energy conformations of the R domain. Specific disorder and secondary structure patterns detected suggest the presence of Molecular Recognition Elements (MoREs) that may mediate phosphorylation regulated intra- and inter-domain interactions. Simulations were performed to generate an ensemble of accessible R domain conformations. Although the calculated structures may represent more compact conformers than occur in vivo, their secondary structure propensities are consistent with predictions and published experimental data. Equilibrium simulations of a mimic of a phosphorylated R domain showed that it exhibited an increased radius of gyration. In one possible interpretation of these findings, by changing its size, the globally unstructured R domain may act as an entropic spring to perturb the packing of membrane-spanning sequences that constitute the ion permeability pathway and thereby activate channel gating.
CFTR; R domain; phosphorylation; disordered protein; molecular dynamics
Ryanodine receptors (RyRs) are ion channels that regulate muscle contraction by releasing calcium ions from intracellular stores into the cytoplasm. Mutations in skeletal muscle RyR (RyR1) give rise to congenital diseases such as central core disease. The absence of high-resolution structures of RyR1 has limited our understanding of channel function and disease mechanisms at the molecular level. Here, we report a structural model of the pore-forming region of RyR1. Molecular dynamics simulations show high ion binding to putative pore residues D4899, E4900, D4938, and D4945, which are experimentally known to be critical for channel conductance and selectivity. We also observe preferential localization of Ca2+ over K+ in the selectivity filter of RyR1. Simulations of RyR1-D4899Q mutant show a loss of preference to Ca2+ in the selectivity filter as seen experimentally. Electrophysiological experiments on a central core disease mutant, RyR1-G4898R, show constitutively open channels that conduct K+ but not Ca2+. Our simulations with G4898R likewise show a decrease in the preference of Ca2+ over K+ in the selectivity filter. Together, the computational and experimental results shed light on ion conductance and selectivity of RyR1 at an atomistic level.
Ryanodine receptors (RyRs) are ion channels present in the membranes of an intracellular calcium storage organelle, the sarcoplasmic reticulum. Nerve impulse triggers the opening of RyR channels, thus increasing the cytoplasmic calcium levels, which subsequently leads to muscle contraction. Congenital mutations in a specific type of RyR that is present in skeletal muscles, RyR1, lead to central core disease (CCD), which leads to weakened muscle. RyR1 mutations also render patients to be highly susceptible to malignant hyperthermia, an adverse reaction to general anesthesia. Although it is generally known that CCD mutations abort RyR1 function, the molecular basis of RyR1 dysfunction remains largely unknown because of the lack of atomic-level structure. Here, we present a structural model of the RyR1 pore region, where many of the CCD mutations are located. Molecular dynamics simulations of the pore region confirm the positions of residues experimentally known to be relevant for function. Furthermore, electrophysiological experiments and simulations shed light on the loss of function of CCD mutant channels. The combined theoretical and experimental studies on RyR1 elucidate the ion conduction pathway of RyR1 and a potential molecular origin of muscle diseases.
Over the past three decades the protein folding field has undergone monumental changes. Originally a purely academic question, how a protein folds has now become vital in understanding diseases and our abilities to rationally manipulate cellular life by engineering protein folding pathways. We review and contrast past and recent developments in the protein folding field. Specifically, we discuss the progress in our understanding of protein folding thermodynamics and kinetics, the properties of evasive intermediates, and unfolded states. We also discuss how some abnormalities in protein folding lead to protein aggregation and human diseases.
Cysteine-rich intestinal protein 1 (CRIP1) has been identified as a novel marker for early detection of cancers. Here we report on the use of phage display in combination with molecular modeling to identify a high-affinity ligand for CRIP1. Panning experiments using a circularized C7C phage library yielded several consensus sequences with modest binding affinities to purified CRIP1. Two sequence motifs, A1 and B5, having the highest affinities for CRIP1, were chosen for further study. With peptide structure information and the NMR structure of CRIP1, the higher-affinity A1 peptide was computationally redesigned, yielding a novel peptide, A1M, whose affinity was predicted to be much improved. Synthesis of the peptide and saturation and competitive binding studies demonstrated approximately a 10–28-fold improvement in the affinity of A1M compared to that of either A1 or B5 peptide. These techniques have broad application to the design of novel ligand peptides.
Breast cancer is one of the most frequently diagnosed malignancies in American females and is the second leading cause of cancer deaths in women. Several improvements in diagnostic protocols have enhanced our ability for earlier detection of breast cancer, resulting in improvement of therapeutic outcome and an increased survival rate for breast cancer patients. However, current early screening techniques are neither comprehensive nor infallible. Imaging techniques that improve breast cancer detection, localization, and evaluation of therapy are essential in combating the disease. Cysteine-rich intestinal protein 1 (CRIP1) has been identified as a novel marker for early detection of breast cancers. Here, we report the use of phage display and computational molecular modeling to identify a high-affinity ligand for CRIP1. Phage display panning experiments initially identified consensus peptide sequences with modest binding affinity to purified CRIP1. Using ab initio modeling of binding peptide structures, computational docking, and recently developed free energy estimation protocols, we redesigned the peptides to increase their affinity for CRIP1. Synthesis of the redesigned peptide and binding studies demonstrated approximately a 10–28-fold improvement in the binding affinity. The combination of computational and experimental techniques in this study demonstrates a potentially powerful tool in modulating protein–protein interactions.
The absence of a functional ATP Binding Cassette (ABC) protein called the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) from apical membranes of epithelial cells is responsible for cystic fibrosis (CF). Over 90% of CF patients carry at least one mutant allele with deletion of phenylalanine at position 508 located in the N-terminal nucleotide binding domain (NBD1). Biochemical and cell biological studies show that the ΔF508 mutant exhibits inefficient biosynthetic maturation and susceptibility to degradation probably due to misfolding of NBD1 and the resultant misassembly of other domains. However, little is known about the direct effect of the Phe508 deletion on the NBD1 folding, which is essential for rational design strategies of cystic fibrosis treatment. Here we show that the deletion of Phe508 alters the folding dynamics and kinetics of NBD1, thus possibly affecting the assembly of the complete CFTR. Using molecular dynamics simulations, we find that meta-stable intermediate states appearing on wild type and mutant folding pathways are populated differently and that their kinetic accessibilities are distinct. The structural basis of the increased misfolding propensity of the ΔF508 NBD1 mutant is the perturbation of interactions in residue pairs Q493/P574 and F575/F578 found in loop S7-H6. As a proof-of-principle that the S7-H6 loop conformation can modulate the folding kinetics of NBD1, we virtually design rescue mutations in the identified critical interactions to force the S7-H6 loop into the wild type conformation. Two redesigned NBD1-ΔF508 variants exhibited significantly higher folding probabilities than the original NBD1-ΔF508, thereby partially rescuing folding ability of the NBD1-ΔF508 mutant. We propose that these observed defects in folding kinetics of mutant NBD1 may also be modulated by structures separate from the 508 site. The identified structural determinants of increased misfolding propensity of NBD1-ΔF508 are essential information in correcting this pathogenic mutant.
Deletion of a single residue, phenylalanine at position 508, in the first nucleotide binding domain (NBD1) of the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) is present in approximately 90% of cystic fibrosis (CF) patients. Experiments show that this mutant protein exhibits inefficient biosynthetic maturation and susceptibility to degradation probably due to misfolding of NBD1 and the resultant incorrect interactions of other domains. However, little is known about the direct effect of the Phe508 deletion on NBD1 folding. Here, using molecular dynamics simulations of NBD1-WT, NBD1-F508A, and NBD1-ΔF508, we show that the deletion of Phe508 indeed alters the kinetics of NBD1 folding. We also find that the intermediate states appearing on wild type and mutant folding pathways are populated differently and that their kinetic accessibilities are distinct. Moreover, we identified critical interactions not necessarily localized near position 508, such as Q493/P574 and F575/F587, to be significant structural elements influencing the kinetic difference between wild type and mutant NBD1. We propose that these observed alterations in folding kinetics of mutant NBD1 result in misassembly of the whole multi-domain protein, thereby causing its premature degradation.