Predicting the phenotypes of missense mutations uncovered by large-scale sequencing projects is an important goal in computational biology. High-confidence predictions can be an aid in focusing experimental and association studies on those mutations most likely to be associated with causative relationships between mutation and disease. As an aid in developing these methods further, we have derived a set of random mutations of the enzymatic domains of human cystathionine beta synthase. This enzyme is a dimeric protein that catalyzes the condensation of serine and homocysteine to produce cystathionine. Yeast missing this enzyme cannot grow on medium lacking a source of cysteine, while transfection of functional human CBS into yeast strains missing endogenous enzyme can successfully complement for the missing gene. We used PCR mutagenesis with error-prone Taq polymerase to produce 948 colonies, and compared cell growth in the presence or absence of a cysteine source as a measure of CBS function. We were able to infer the phenotypes of 204 single-site mutants, 79 of them deleterious and 125 neutral. This set was used to test the accuracy of six publicly available prediction methods for phenotype prediction of missense mutations: SIFT, PolyPhen, PMut, SNPs3D, PhD-SNP, and nsSNPAnalyzer. The top methods are PolyPhen, SIFT, and nsSNPAnalyzer, which have similar performance. Using kernel discriminant functions, we found that the difference in position-specific scoring matrix values is more predictive than the wild-type PSSM score alone, and that the relative surface area in the biologically relevant complex is more predictive than that of the monomeric proteins.
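As an illustration of how binary phenotype predictions can be scored against an experimental set such as this one, the following is a minimal sketch. The abstract does not state which accuracy measures were used, so balanced accuracy and the Matthews correlation coefficient are assumptions chosen here because the two classes (79 deleterious, 125 neutral) are unequal in size. The function names and the toy labels are hypothetical.

```python
from math import sqrt

def confusion_counts(truth, predicted):
    """Count TP, TN, FP, FN, treating 'deleterious' as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(truth, predicted):
        if t == "deleterious":
            if p == "deleterious":
                tp += 1
            else:
                fn += 1
        else:
            if p == "neutral":
                tn += 1
            else:
                fp += 1
    return tp, tn, fp, fn

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels (not data from the study): 5 deleterious and 5 neutral mutants.
truth = ["deleterious"] * 5 + ["neutral"] * 5
predicted = (["deleterious"] * 3 + ["neutral"] * 2
             + ["deleterious"] * 1 + ["neutral"] * 4)
counts = confusion_counts(truth, predicted)
print(balanced_accuracy(*counts), matthews_cc(*counts))
```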
mutations; phenotype prediction; cystathionine beta synthase
Agents targeting EGFR and related ErbB family proteins are valuable therapies for the treatment of many cancers. For some tumor types, including squamous cell carcinomas of the head and neck (SCCHN), antibodies targeting EGFR were the first protein-directed agents to show clinical benefit, and remain a standard component of clinical strategies for management of the disease. Nevertheless, many patients display either intrinsic or acquired resistance to these drugs; hence, major research goals are to better understand the underlying causes of resistance, and to develop new therapeutic strategies that boost the impact of EGFR/ErbB inhibitors. In this review, we first summarize current standard use of EGFR inhibitors in the context of SCCHN, and describe new agents targeting EGFR currently moving through pre-clinical and clinical development. We then discuss how changes in other transmembrane receptors, including IGF1R, c-Met, and TGF-β, can confer resistance to EGFR-targeted inhibitors, and discuss new agents targeting these proteins. Moving downstream, we discuss critical EGFR-dependent effectors, including PLC-γ; PI3K and PTEN; SHC, GRB2, and RAS; and the STAT proteins, as factors in resistance to EGFR-directed inhibitors and as alternative targets of therapeutic inhibition. We summarize alternative sources of resistance among cellular changes that target EGFR itself, through regulation of ligand availability, post-translational modification of EGFR, availability of EGFR partners for hetero-dimerization, and control of EGFR intracellular trafficking for recycling versus degradation. Finally, we discuss new strategies to identify effective therapeutic combinations involving EGFR-targeted inhibitors, in the context of new system-level data becoming available for analysis of individual tumors.
PLC-γ; PI3K; PTEN; SHC; GRB2; RAS; STAT; IGFR; c-MET
Rotamer libraries are used in protein structure determination, structure prediction, and design. The backbone-dependent rotamer library consists of rotamer frequencies and their mean dihedral angles and variances as a function of the backbone dihedral angles ϕ and ψ. Previous versions of this rotamer library were not developed with smoothness in mind, although some structure prediction and protein design methods would strongly benefit from smoothing. A new version of the backbone-dependent rotamer library has been developed using adaptive kernel density estimates for the rotamer frequencies and adaptive kernel regression for the mean dihedral angles and variances. The formulation presented allows for evaluation of the rotamer probabilities, mean angles and variances at any ϕ, ψ point, i.e. as a continuous function of ϕ and ψ. Continuous probability density estimates for the non-rotameric degrees of freedom of amides, carboxylates, and aromatic side chains have been modeled as a function of the backbone dihedral angles and rotamers of the remaining degrees of freedom. New backbone-dependent rotamer libraries at varying levels of smoothing are available from http://dunbrack.fccc.edu.
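The idea of evaluating rotamer probabilities at any ϕ, ψ point can be sketched with a simple periodic kernel estimate. This is not the library's actual procedure: it uses a fixed-bandwidth product of von Mises kernels rather than the adaptive kernels described above, and the concentration parameter `kappa`, the function names, and the toy observations are all assumptions made for illustration.

```python
from math import cos, exp, radians

def vm_kernel(phi, psi, phi_i, psi_i, kappa):
    """Unnormalized product of two von Mises kernels (angles in degrees).

    The kernel is periodic in both angles and equals 1 at zero separation.
    """
    return exp(kappa * (cos(radians(phi - phi_i)) - 1.0)
               + kappa * (cos(radians(psi - psi_i)) - 1.0))

def rotamer_probabilities(phi, psi, observations, kappa=20.0):
    """Estimate P(rotamer | phi, psi) from (phi_i, psi_i, rotamer) tuples.

    Kernel weights are summed per rotamer and normalized over all rotamers,
    so the result is defined at any phi, psi, not only at grid points.
    """
    weights = {}
    for phi_i, psi_i, rot in observations:
        w = vm_kernel(phi, psi, phi_i, psi_i, kappa)
        weights[rot] = weights.get(rot, 0.0) + w
    total = sum(weights.values())
    return {rot: w / total for rot, w in weights.items()}

# Toy observations (hypothetical, not taken from the library).
observations = [
    (-60.0, -45.0, "g-"), (-65.0, -40.0, "g-"),
    (-120.0, 130.0, "t"), (-115.0, 135.0, "t"), (-62.0, -43.0, "t"),
]
print(rotamer_probabilities(-60.0, -45.0, observations))
```

Because every kernel shares the same `kappa`, the kernel normalization constants cancel in the ratio, which is why the unnormalized form is sufficient in this sketch.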
Previous analyses of the complementarity determining regions (CDRs) of antibodies have focused on a small number of “canonical” conformations for each loop. This is primarily the result of the work of Chothia and colleagues, most recently in 1997. Because of the widespread utility of antibodies, we have revisited the clustering of conformations of the six CDR loops with the much larger amount of structural information currently available. In this work, we were careful to use a high-quality data set by eliminating low-resolution structures and CDRs with high B-factors or high conformational energies. We used a distance function based on directional statistics and an effective clustering algorithm using affinity propagation. With this data set of over 300 non-redundant antibody structures, we were able to cover 28 CDR-length combinations (e.g., L1 length 11, or “L1-11” in our nomenclature) for L1, L2, L3, H1 and H2. The Chothia analysis covered only 20 CDR-lengths. Only four of these had more than one conformational cluster, of which two could easily be distinguished by gene source (mouse/human; κ/λ) and one purely by the presence and positions of Pro residues (L3-9). Thus using the Chothia analysis does not require the complicated set of “structure-determining residues” that is often assumed. Of our 28 CDR-lengths, 15 of them have multiple conformational clusters including ten for which Chothia had only one canonical class. We have a total of 72 clusters for the non-H3 CDRs; approximately 85% of the non-H3 sequences can be assigned to a conformational cluster based on gene source and/or sequence. We found that earlier predictions of “bulged” vs. “non-bulged” conformations based on the presence or absence of anchor residues Arg/Lys94 and Asp101 of H3 have not held up, since all four combinations lead to a majority of conformations that are bulged. Thus the earlier analyses have been significantly enhanced by the increased data. 
We believe the new classification will lead to improved methods for antibody structure prediction and design.
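A distance between two loop conformations of equal length can be sketched from their backbone dihedral angles. The abstract says only that the distance function is based on directional statistics; the specific form used here, 2(1 − cos Δθ) averaged over all dihedrals, is an assumption, as is the function name. The clustering step by affinity propagation is not shown.

```python
from math import cos, radians

def dihedral_distance(loop_a, loop_b):
    """Mean of 2 * (1 - cos(delta)) over paired dihedral angles in degrees.

    The result ranges from 0 (identical angles) to 4 (all angles differ by
    180 degrees) and is unaffected by the 360-degree periodicity of angles.
    """
    if len(loop_a) != len(loop_b) or not loop_a:
        raise ValueError("loops must be non-empty and of equal length")
    total = sum(2.0 * (1.0 - cos(radians(a - b)))
                for a, b in zip(loop_a, loop_b))
    return total / len(loop_a)

# Hypothetical phi/psi values for two short loops of the same length.
loop_1 = [-60.0, -45.0, -120.0, 130.0]
loop_2 = [-65.0, -40.0, -110.0, 140.0]
print(dihedral_distance(loop_1, loop_2))
```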
antibody structure; canonical loop conformations; affinity propagation
Foldamers present a particularly difficult challenge for accurate computational design compared to conventional peptide and protein design, because there is no large body of structural data to allow parameterization of rotamer libraries and energies. We therefore explored the use of molecular mechanics for constructing rotamer libraries for non-natural foldamer backbones. We first evaluated the accuracy of molecular mechanics (MM) for the prediction of rotamer probability distributions in the crystal structures of proteins. The van der Waals radius, dielectric constant and effective Boltzmann temperature were systematically varied to maximize agreement with experimental data. Boltzmann-weighted probabilities from these molecular mechanics energies compare well with database-derived probabilities for both an idealized α-helix (R = 0.95) and β-strand conformations (R = 0.92). Based on these parameters, de novo rotamer probabilities for secondary structures of peptides built from β-amino acids were determined. To limit computational complexity, it is useful to establish a residue-specific criterion for excluding rare, high-energy rotamers from the library. This is accomplished by including only those rotamers with probability greater than a given threshold (e.g. 10%) of the random value, defined as 1/n where n is the number of potential rotamers for each residue type.
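The two steps described above, Boltzmann weighting of rotamer energies and exclusion of rotamers below a fraction of the random value 1/n, can be sketched as follows. The energy units (kcal/mol), the temperature, the function names, and the toy energies are assumptions for illustration, not values from the study.

```python
from math import exp

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); units are assumed

def boltzmann_probabilities(energies, temperature):
    """Convert rotamer energies into Boltzmann-weighted probabilities."""
    e_min = min(energies.values())  # shift by the minimum for stability
    weights = {rot: exp(-(e - e_min) / (K_B * temperature))
               for rot, e in energies.items()}
    z = sum(weights.values())
    return {rot: w / z for rot, w in weights.items()}

def filter_rotamers(probabilities, n_potential, fraction=0.10):
    """Keep rotamers whose probability exceeds fraction * (1 / n_potential)."""
    cutoff = fraction * (1.0 / n_potential)
    return {rot: p for rot, p in probabilities.items() if p > cutoff}

# Toy energies for a residue type with three potential rotamers.
energies = {"a": 0.0, "b": 1.0, "c": 5.0}
probs = boltzmann_probabilities(energies, temperature=300.0)
kept = filter_rotamers(probs, n_potential=len(energies))
print(sorted(kept))
```

The retained probabilities are not renormalized here; whether to renormalize after filtering is a design choice that the text does not specify.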
Protein intrinsic disorder is becoming increasingly recognized in proteomics research. While lacking structure, many regions of disorder have been associated with biological function. There are many different experimental methods for characterizing intrinsically disordered proteins and regions; nevertheless, the prediction of intrinsic disorder from amino acid sequence remains a useful strategy, especially for large-scale proteomics investigations. Here we introduce a consensus artificial neural network (ANN) prediction method, which was developed by combining the outputs of several individual disorder predictors. By eight-fold cross-validation, this meta-predictor, called PONDR-FIT, was found to improve the prediction accuracy over a range of 3 to 20% with an average of 11% compared to the single predictors, depending on the datasets being used. Analysis of the errors shows that the worst accuracy still occurs for short disordered regions with fewer than ten residues, as well as for the residues close to order/disorder boundaries. Increased understanding of the underlying mechanism by which such meta-predictors give improved predictions will likely promote the further development of protein disorder predictors. PONDR-FIT is available at www.disprot.org.
natively unfolded; intrinsically unstructured; intrinsically disordered; highly flexible; highly dynamic; structurally disordered; predictor; PONDR
Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that with many different distances used to measure the similarity between protein structures, none of them are proper distances when protein structures of different sequences are compared. Statistical approaches based on those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariance can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal the population-specific variations and be helpful in improving structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set.
Protein structure comparison is important for understanding the evolutionary relationships among proteins, predicting protein functions, and predicting protein structures. Despite its importance, there have been no rigorous mathematical or statistical frameworks for protein structure comparison. One notable issue in this field is that with many different similarity measures used in comparing protein structures, none of them are proper distances when protein structures of different sequences are compared. In this study, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. A formal distance, geodesic distance, can be computed for any two protein structures. In this framework, population-specific variations within protein families can be characterized through building probability distributions for structures of protein families. The mean and covariance computed from groups of protein structures can also help to improve the classification of protein structures. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions.
In homology modeling of protein structures, it is typical to find templates through a sequence search against a database of proteins with known structures. In more complicated modeling cases, such as modeling a protein structure in contact with a ligand, sequence information itself may not be enough and more biological information is required for a successful modeling process. SCOP and PFAM are two databases providing protein domain information which can be utilized in complex protein structure modeling. However, due to the manually-curated nature of both databases, they fail to provide timely coverage of protein sequences existing in the Protein Data Bank (PDB). In this paper, we introduce a new relational database, IDOPS, which integrates sequence and biological information extracted from remediated PDB files and protein domain information generated with HMM profiles of PFAM families. With a carefully designed protocol, this database is updated regularly and the coverage rate of PDB entries is guaranteed to be high.
Determination of side-chain conformations is an important step in protein structure prediction and protein design. Many such methods have been presented, although only a small number are in widespread use. SCWRL is one such method, and the SCWRL3 program (2003) has remained popular due to its speed, accuracy, and ease-of-use for the purpose of homology modeling. However, higher accuracy at comparable speed is desirable. This has been achieved through: 1) a new backbone-dependent rotamer library based on kernel density estimates; 2) averaging over samples of conformations about the positions in the rotamer library; 3) a fast anisotropic hydrogen bonding function; 4) a short-range, soft van der Waals atom-atom interaction potential; 5) fast collision detection using k-discrete oriented polytopes; 6) a tree decomposition algorithm to solve the combinatorial problem; and 7) optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. Accuracies as a function of electron density of the side chains demonstrate that side chains with higher electron density are easier to predict than those with low electron density and presumed conformational disorder. For a testing set of 379 proteins, 86% of χ1 angles and 75% of χ1+2 are predicted correctly within 40° of the X-ray positions. Among side chains with higher electron density (25th–100th percentile), these numbers rise to 89% and 80%. The new program maintains its simple command-line interface, designed for homology modeling, and is now available as a dynamic-linked library for incorporation into other software programs.
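The accuracy figures quoted above count a side chain as correct when its predicted χ angles fall within 40° of the X-ray values. A minimal sketch of that bookkeeping follows; the function names and toy angles are hypothetical, and the sketch does not handle symmetric side chains (for example, the equivalent χ2 flips of Phe, Tyr, and Asp), which a real evaluation would need to address.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi_accuracy(predicted, observed, tolerance=40.0):
    """Return (chi1 accuracy, chi1+2 accuracy) as fractions.

    Each residue is a (chi1, chi2) tuple, with chi2 set to None for side
    chains that have no chi2. chi1+2 is scored only over residues that have
    a chi2, and requires both angles to be within the tolerance.
    """
    chi1_ok = chi1_total = chi12_ok = chi12_total = 0
    for (p1, p2), (o1, o2) in zip(predicted, observed):
        ok1 = angle_diff(p1, o1) <= tolerance
        chi1_total += 1
        chi1_ok += ok1
        if p2 is not None and o2 is not None:
            chi12_total += 1
            chi12_ok += ok1 and angle_diff(p2, o2) <= tolerance
    chi1_acc = chi1_ok / chi1_total if chi1_total else 0.0
    chi12_acc = chi12_ok / chi12_total if chi12_total else 0.0
    return chi1_acc, chi12_acc

# Toy angles (not from the 379-protein test set).
predicted = [(-60.0, 180.0), (60.0, None), (175.0, 90.0)]
observed = [(-70.0, -170.0), (180.0, None), (-175.0, -60.0)]
print(chi_accuracy(predicted, observed))
```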
homology modeling; side-chain prediction; protein structure; rotamer library; graph decomposition; SCWRL
The protein common interface database (ProtCID) is a database that contains clusters of similar homodimeric and heterodimeric interfaces observed in multiple crystal forms (CFs). Such interfaces, especially of homologous but non-identical proteins, have been associated with biologically relevant interactions. In ProtCID, protein chains in the Protein Data Bank (PDB) are grouped based on their PFAM domain architectures. For a single PFAM architecture, all the dimers present in each CF are constructed and compared with those in other CFs that contain the same domain architecture. Interfaces occurring in two or more CFs comprise an interface cluster in the database. The same process is used to compare heterodimers of chains with different domain architectures. By examining interfaces that are shared by many homologous proteins in different CFs, we find that the PDB and the Protein Interfaces, Surfaces, and Assemblies (PISA) database are not always consistent in their annotations of biological assemblies in a homologous family. Our data therefore provide an independent check on publicly available annotations of the structures of biological interactions for PDB entries. Common interfaces may also be useful in studies of protein evolution. Coordinates for all interfaces in a cluster are downloadable for further analysis. ProtCID is available at http://dunbrack2.fccc.edu/protcid.
Protein structure determination and predictive modeling have long been guided by the paradigm that the peptide backbone has a single, context-independent ideal geometry. Both quantum-mechanics calculations and empirical analyses have shown this is an incorrect simplification in that backbone covalent geometry actually varies systematically as a function of the Φ and Ψ backbone dihedral angles. Here, we use a nonredundant set of ultrahigh-resolution protein structures to define these conformation-dependent variations. The trends have a rational, structural basis that can be explained by avoidance of atomic clashes or optimization of favorable electrostatic interactions. To facilitate adoption of this new paradigm, we have created a conformation-dependent library of covalent bond lengths and bond angles and shown that it has improved accuracy over existing methods without any additional variables to optimize. Protein structures derived from crystallographic refinement and from predictive modeling both stand to benefit from incorporation of the new paradigm.
Comparison of elastic network model predictions with experimental data has provided important insights on the dominant role of the network of inter-residue contacts in defining the global dynamics of proteins. Most of these studies have focused on interpreting the mean-square fluctuations of residues, or deriving the most collective, or softest, modes of motions that are known to be insensitive to structural and energetic details. However, with increasing structural data, we are in a position to perform a more critical assessment of the structure-dynamics relations in proteins, and gain a deeper understanding of the major determinants of not only the mean-square fluctuations and lowest frequency modes, but also the covariance or the cross-correlations between residue fluctuations and the shapes of higher modes. Here, a large set of NMR-determined proteins is analyzed systematically using a novel method based on entropy maximization to demonstrate that the next level of refinement in the elastic network model description of proteins ought to take into consideration properties such as contact order (or sequential separation between contacting residues) and the secondary structure types of the interacting residues, whereas the types of amino acids do not play a critical role. Most importantly, an optimal description of observed cross-correlations requires the inclusion of destabilizing, as opposed to exclusively stabilizing, interactions, stipulating the functional significance of local frustration in imparting native-like dynamics. This study provides us with a deeper understanding of the structural basis of experimentally observed behavior, and opens the way to the development of more accurate models for exploring protein dynamics.
As more protein structures are solved, we are able to perform a more critical assessment of the relationship between protein structure and dynamics, and to gain a deeper understanding of the major determinants of structural dynamics. Here we perform a systematic study on a set of proteins structurally determined by NMR spectroscopy. The dynamics are analyzed using elastic network models and a novel method based on entropy maximization to demonstrate that properties such as contact order and secondary structure do play a role in defining the experimentally observed covariance data. Most importantly, an optimal description of observed cross-correlations requires the inclusion of destabilizing, as well as stabilizing, interactions, stipulating the functional significance of local frustration in imparting native-like dynamics.
Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
The three-dimensional structure of a protein enables it to perform its specific function, which may be catalysis, DNA binding, cell signaling, maintaining cell shape and structure, or one of many other functions. Predicting the structures of proteins is an important goal of computational biology. One way of doing this is to figure out the rules that determine protein structure from protein sequences by determining how local protein sequence is associated with local protein structure. That is, many (but not all) of the interactions that determine protein structure occur between amino acids that are a short distance away from each other in the sequence. This is particularly true in the irregular parts of protein structure, often called loops. In this work, we have performed a statistical analysis of the structure of the protein backbone in loops as a function of the protein sequence. We have determined how an amino acid bends the local backbone due to its amino acid type and the amino acid types of its neighbors. We used a recently developed statistical method that is particularly suited to this problem. The analysis shows that backbone conformation prediction can be improved using the information in the statistical distributions we have developed.
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at the University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
Many proteins function as homooligomers and are regulated via their oligomeric state. For some proteins, the stoichiometry of homooligomeric states under various conditions has been studied using gel filtration or analytical ultracentrifugation experiments. The interfaces involved in these assemblies may be identified using crosslinking and mass spectrometry, solution-state NMR, and other experiments. But for most proteins, the actual interfaces that are involved in oligomerization are inferred from X-ray crystallographic structures using assumptions about interface surface areas and physical properties. Examination of interfaces across different PDB entries in a protein family reveals several important features. First, similarity of space group, asymmetric unit size, and cell dimensions and angles (within 1%) does not guarantee that two crystals are actually the same crystal form, that is, that they contain similar relative orientations and interactions within the crystal. Conversely, two crystals in different space groups may be quite similar in terms of all of the interfaces within each crystal. Second, NMR structures and an existing benchmark of PDB crystallographic entries consisting of 126 dimers and larger structures and 132 monomers were used to determine whether the existence or absence of common interfaces across multiple crystal forms can be used to predict whether a protein is an oligomer. Monomeric proteins tend to have common interfaces across only a minority of crystal forms, while higher order structures exhibit common interfaces across a majority of available crystal forms. The data can be used to estimate the probability that an interface is biological if two or more crystal forms are available. Finally, the PISA database available from the EBI is more consistent in identifying interfaces observed in many crystal forms than is the PDB or EBI’s Protein Quaternary Server (PQS).
The PDB in particular is missing highly likely biological interfaces in its biological unit files for about 10% of PDB entries.
Cytosolic sulfotransferases catalyze the sulfonation of hormones, metabolites, and xenobiotics. Many of these proteins have been shown to form homo- and heterodimers. An unusually small dimer interface was previously identified by Petrotchenko et al. (FEBS Lett 490, 39-43, 2001) by crosslinking, protease digestion, and mass spectrometry, and verified by site-directed mutagenesis. Analysis of the crystal packing interfaces in all 28 available crystal structures consisting of 17 crystal forms shows that this interface occurs in all of them. With a small number of exceptions, the publicly available databases of biological assemblies contain either monomers or incorrect dimers. Even crystal structures of mouse SULT1E1, which is a monomer in solution, contain the common dimeric interface, although distorted and missing two important salt bridges.
SCWRL and MolIDE are software applications for prediction of protein structures. SCWRL is designed specifically for the task of prediction of side-chain conformations given a fixed backbone usually obtained from an experimental structure determined by X-ray crystallography or NMR. SCWRL is a command-line program that typically runs in a few seconds. MolIDE provides a graphical interface for basic comparative (homology) modeling using SCWRL and other programs. MolIDE takes an input target sequence, and uses PSI-BLAST to identify and align templates for comparative modeling of the target. The sequence alignment to any template can be manually modified within a graphical window of the target-template alignment and visualization of the alignment on the template structure. MolIDE builds the model of the target structure based on the template backbone, predicted side-chain conformations with SCWRL, and a loop-modeling program for insertion-deletion regions with user-selected sequence segments. SCWRL and MolIDE can be obtained at http://dunbrack.fccc.edu/Software.php.
Computational methods; Protein structure prediction; Comparative (homology) modeling
Genotoxic stress triggers a rapid translocation of p53 to the mitochondria, contributing to apoptosis in a transcription-independent manner. Using immuno-purification protocols and mass spectrometry we previously identified the pro-apoptotic protein BAK as a mitochondrial p53-binding protein, and showed that recombinant p53 directly binds to BAK and can induce its oligomerization, leading to cytochrome C release. In the present work we describe a combination of molecular modeling, electrostatic analysis and site-directed mutagenesis to define contact residues between BAK and p53. Our data indicate that three regions within the core DNA binding domain of p53 make contact with BAK: these are the conserved H2 α-helix and the L1 and L3 loops. Notably, point mutations in these regions markedly impair the ability of p53 to oligomerize BAK, and to induce transcription-independent cell death. We present a model whereby positively charged residues within the H2 helix and L1 loop of p53 interact with an electronegative domain on the N-terminal α-helix of BAK; the latter is known to undergo conformational changes upon BAK activation. We show that mutation of acidic residues in the N-terminal helix impairs the ability of BAK to bind to p53. Interestingly, many of the p53 contact residues predicted by our model are also direct DNA contact residues, suggesting that p53 interacts with BAK in a manner analogous to DNA. The combined data point to the H2 helix and L1 and L3 loops of p53 as novel functional domains contributing to transcription-independent apoptosis by this tumor suppressor protein.
PISCES is a database server for producing lists of sequences from the Protein Data Bank (PDB) using a number of entry- and chain-specific criteria and mutual sequence identity. Our goal in culling the PDB is to provide the longest list possible of the highest resolution structures that fulfill the sequence identity and structural quality cut-offs. The new PISCES server uses a combination of PSI-BLAST and structure-based alignments to determine sequence identities. Structure alignment produces more complete alignments and therefore more accurate sequence identities than PSI-BLAST. PISCES now allows a user to cull the PDB by-entry in addition to the standard culling by individual chains. In this scenario, a list will contain only entries that do not have a chain that has a sequence identity to any chain in any other entry in the list over the sequence identity cut-off. PISCES also provides fully annotated sequences including gene name and species. The server allows a user to cull an input list of entries or chains, so that other criteria, such as function, can be used. Results from a search on the re-engineered RCSB's site for the PDB can be entered into the PISCES server by a single click, combining the powerful searching abilities of the PDB with PISCES's utilities for sequence culling. The server's data are updated weekly. The server is available at .
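Culling a list of chains by mutual sequence identity while favoring high-resolution structures can be sketched with a simple greedy pass. This is an illustrative assumption, not the algorithm PISCES uses, which the abstract does not describe, and a greedy pass does not guarantee the longest possible list. The chain identifiers, the identity table, and the function name are hypothetical.

```python
def cull_chains(chains, identity, max_identity=25.0):
    """Greedily select chains, best resolution first.

    chains:   list of (chain_id, resolution_in_angstroms) tuples.
    identity: dict mapping frozenset({id_a, id_b}) to percent identity;
              missing pairs are treated as 0% identical.
    A chain is kept only if its identity to every chain already kept does
    not exceed max_identity.
    """
    kept = []
    for chain_id, _resolution in sorted(chains, key=lambda c: c[1]):
        if all(identity.get(frozenset((chain_id, k)), 0.0) <= max_identity
               for k in kept):
            kept.append(chain_id)
    return kept

# Hypothetical chains and pairwise identities.
chains = [("1abcA", 2.0), ("2xyzA", 1.5), ("3defB", 1.8)]
identity = {
    frozenset(("1abcA", "2xyzA")): 90.0,
    frozenset(("2xyzA", "3defB")): 10.0,
    frozenset(("1abcA", "3defB")): 20.0,
}
print(cull_chains(chains, identity))
```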
Oncogenic hyperactivation of the mitotic kinase Aurora-A (AurA) in cancer is associated with genomic instability. Increasing evidence indicates that AurA also regulates critical processes in normal interphase cells, but the source of such activity has been obscure. We report here that multiple stimuli causing release of Ca2+ from intracellular endoplasmic reticulum stores rapidly and transiently activate AurA, without requirement for second messengers. This activation is mediated by direct Ca2+-dependent calmodulin (CaM) binding to multiple motifs on AurA. On the basis of structure–function analysis and molecular modelling, we map two primary regions of CaM-AurA interaction to unfolded sequences in the AurA N- and C-termini. This unexpected mechanism for AurA activation provides a new context for evaluating the function of AurA and its inhibitors in normal and cancerous cells.
Aurora-A kinase localizes to centrosomes, is involved in the progression through mitosis and is overexpressed in certain cancers. Here, calcium is shown to induce Aurora-A auto-phosphorylation in a calmodulin-dependent manner, suggesting a novel role for Aurora-A in non-mitotic cells.
Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address whether glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with the protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction and modeling of N-glycans of glycoconjugates using structure databases could be effective, and that different modeling approaches would be needed depending on the availability of template structures.
An N-glycan is a carbohydrate chain covalently linked to the side chain of asparagine. Due to the flexibility of carbohydrate chains, it is believed that the N-glycan chains would not have a well-defined structure. However, our survey of N-glycan structures in the PDB shows that the N-glycan structures found on the surfaces of homologous glycoproteins are significantly conserved. This suggests that the interaction between the carbohydrate and the protein structure around the glycan chain plays an important role in determining the N-glycan structure. While the global N-glycan structures found on the surfaces of non-homologous glycoproteins are not conserved, the conformations of the carbohydrate residues that are closer to the protein appear to be more conserved. Our analysis highlights the applicability of template-based approaches used in protein structure prediction to structure prediction and modeling of N-glycans of glycoproteins.
Troponin C (TnC) is implicated in the initiation of myocyte contraction via binding of cytosolic Ca2+ and subsequent recognition of the Troponin I switch peptide. Mutations of the cardiac TnC N-terminal regulatory domain have been shown to alter both calcium binding and myofilament force generation. We have performed molecular dynamics simulations of engineered TnC variants that increase or decrease Ca2+ sensitivity, in order to understand the structural basis of their impact on TnC function. We use the term gain-of-function (GOF) for mutants that are associated with increased Ca2+ affinity and loss-of-function (LOF) for those mutants with reduced affinity. Our studies demonstrate that for GOF mutants V44Q and L48Q, the structure of the physiologically active site II binding site in the Ca2+-free (apo) state closely resembled the Ca2+-bound (holo) state. In contrast, site II is very labile for LOF mutants E40A and V79Q in the apo form and bears little resemblance to the holo conformation. We hypothesize that these phenomena contribute to the increased association rate, k_on, for the GOF mutants relative to LOF. Furthermore, we observe significant positive and negative positional correlations between helices in the GOF holo mutants that are not found in the LOF mutants. We anticipate these correlations may contribute either directly to affinity or indirectly through TnI association. Our observations based on the structure and dynamics of mutant TnC provide a rationale for binding trends observed in GOF and LOF mutants and will guide the development of inotropic drugs that target TnC.
Muscle cells contract using a network of thread-like protein assemblies called myofilaments. Contraction is preceded by a signal that causes calcium to rush into the cell cytosol, where it can freely diffuse to and bind the myofilament proteins. Troponin C, a calcium sensor located on the thin filament, initiates and regulates the cascade of changes resulting in the generation of force by the thin and thick filaments comprising the myofilament lattice. In heart tissue, pathological conditions known as dilated and hypertrophic cardiomyopathies (DCM and HCM, respectively) are in part associated with abnormalities in the ability of the myofilaments to generate force at normal calcium concentrations. Manipulation of Troponin C calcium-binding through protein engineering and pharmaceutical intervention has thus attracted considerable attention as a therapeutic strategy for ameliorating these cardiac defects. In this study, we uncover a molecular basis of altered calcium handling for several engineered Troponin C variants, which provides further insight into tuning its control of myofilament contraction.
Gene expression signatures that are predictive of therapeutic response or prognosis are increasingly useful in clinical care; however, mechanistic (and intuitive) interpretation of expression arrays remains an unmet challenge. Additionally, there is surprisingly little gene overlap among distinct clinically validated expression signatures. These “causality challenges” hinder the adoption of signatures as compared to functionally well-characterized single gene biomarkers. To increase the utility of multi-gene signatures in survival studies, we developed a novel approach to generate “personal mechanism signatures” of molecular pathways and functions from gene expression arrays. FAIME, the Functional Analysis of Individual Microarray Expression, computes mechanism scores using rank-weighted gene expression of an individual sample. By comparing head and neck squamous cell carcinoma (HNSCC) samples with non-tumor control tissues, the precision and recall of deregulated FAIME-derived mechanisms of pathways and molecular functions are comparable to those produced by conventional cohort-wide methods (e.g. GSEA). The overlap of “Oncogenic FAIME Features of HNSCC” (statistically significant and differentially regulated FAIME-derived genesets representing GO functions or KEGG pathways derived from HNSCC tissue) among three distinct HNSCC datasets (pathways: 46%, p < 0.001) is more significant than the gene overlap (genes: 4%). These Oncogenic FAIME Features of HNSCC can accurately discriminate tumors from control tissues in two additional HNSCC datasets (n = 35 and 91, F-accuracy = 100% and 97%, empirical p < 0.001, area under the receiver operating characteristic curves = 99% and 92%), and stratify recurrence-free survival in patients from two independent studies (p = 0.0018 and p = 0.032, log-rank). Previous approaches depending on group assignment of individual samples before selecting features or learning a classifier are limited by design to discrete-class prediction.
In contrast, FAIME calculates mechanism profiles for individual patients without requiring group assignment in validation sets. FAIME is more amenable to clinical deployment since it translates the gene-level measurements of each given sample into pathway and molecular function profiles that can be applied to analyze continuous phenotypes in clinical outcome studies (e.g. survival time, tumor volume).
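To make the idea of a single-sample, rank-weighted mechanism score concrete, the following Python sketch scores one gene set in one sample. The abstract states only that scores use rank-weighted gene expression of an individual sample; the specific weighting here (an exponential decay in rank) and the scoring rule (mean weight inside the gene set minus mean weight outside it) are illustrative assumptions and should not be read as the published FAIME formula. The function names and the toy gene names and values are hypothetical.

```python
import math
from typing import Dict, Set


def rank_weights(expression: Dict[str, float]) -> Dict[str, float]:
    """Assign each gene a weight that depends only on its rank within the sample.

    Assumed weighting for illustration: rank 1 is the most highly expressed gene,
    and the weight decays as exp(-rank / n), where n is the number of genes.
    """
    n = len(expression)
    # Sort by descending expression; break ties by gene name for determinism.
    ordered = sorted(expression, key=lambda g: (-expression[g], g))
    return {gene: math.exp(-rank / n) for rank, gene in enumerate(ordered, start=1)}


def mechanism_score(expression: Dict[str, float], gene_set: Set[str]) -> float:
    """Mean weight of genes in the set minus mean weight of all other genes."""
    weights = rank_weights(expression)
    inside = [w for g, w in weights.items() if g in gene_set]
    outside = [w for g, w in weights.items() if g not in gene_set]
    if not inside or not outside:
        raise ValueError("gene set must split the sample's genes into two non-empty groups")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Hypothetical single-sample expression values.
sample = {"GENE_A": 9.0, "GENE_B": 7.0, "GENE_C": 3.0, "GENE_D": 1.0}

score_up = mechanism_score(sample, {"GENE_A", "GENE_B"})    # set of highly expressed genes
score_down = mechanism_score(sample, {"GENE_C", "GENE_D"})  # set of weakly expressed genes
print(score_up, score_down)
```

Because the score is computed from one sample's own ranking, no cohort or group labels are needed, which is the property the abstract emphasizes. The resulting per-sample scores are continuous and could then be related to outcomes such as survival time.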
Clinical utilization of multi-gene expression signatures that are predictive of therapeutic response has been steadily increasing; however, interpretation of such results remains challenging because multi-gene signatures generated from analyzing different patient cohorts tend to be equally predictive but contain minimal overlap. Whereas pathway-level analyses of expression arrays show promise for generating clinically meaningful mechanistic signatures, current approaches do not permit single-patient-based analyses that are independent of cross-group calculations. To bridge the gap between deterministic biological mechanisms of single-gene biomarkers and the statistical predictive power of multi-gene signatures that are disconnected from mechanisms, we developed FAIME, a novel method that transforms microarray gene expression data into individualized patient profiles of molecular mechanisms. We have validated its capability for predicting clinical outcomes, including cancer patient samples derived from six different clinical trial cohorts of head and neck cancers. This method provides opportunities to harness an untapped resource for personal genomics: clinical evaluation and testing of individually interpretable mechanistic profiles derived from gene expression arrays.
Cholesteryl ester transfer protein (CETP) transports cholesteryl esters, triglycerides, and phospholipids between different lipoprotein fractions in blood plasma. The inhibition of CETP has been shown to be a sound strategy to prevent and treat the development of coronary heart disease. We employed molecular dynamics simulations to unravel the mechanisms associated with the CETP-mediated lipid exchange. To this end we used both atomistic and coarse-grained models, whose results were consistent with each other. We found CETP to bind to the surface of high density lipoprotein (HDL)-like lipid droplets through its charged and tryptophan residues. Upon binding, CETP rapidly (in about 10 ns) induced the formation of a small hydrophobic patch on the phospholipid surface of the droplet, opening a route from the core of the lipid droplet to the binding pocket of CETP. This was followed by a conformational change of helix X of CETP to an open state, in which we found the accessibility of cholesteryl esters to the C-terminal tunnel opening of CETP to increase. Furthermore, in the absence of helix X, cholesteryl esters rapidly diffused into CETP through the C-terminal opening. The results provide compelling evidence that helix X acts as a lid that conducts lipid exchange by alternating between the open and closed states. The findings have potential for the design of novel molecular agents to inhibit the activity of CETP.
Coronary heart disease is a major cause of death in Western societies. One of the most promising interventions to prevent and slow down the progress of coronary heart disease is the elevation of high density lipoprotein (HDL) levels in circulation. Animal models together with early clinical studies have shown that the inhibition of cholesteryl ester transfer protein (CETP) is a promising strategy to achieve higher HDL levels. However, drugs with acceptable side-effects for CETP inhibition do not yet exist, although the next generation CETP inhibitor (anacetrapib) has great potential in this regard. In this study, our objective is to gain more detailed information regarding the interactions of CETP with lipoprotein particles. We show how the CETP-lipoprotein complex is formed and how lipid exchange between CETP and lipoprotein particles takes place. Our findings help to understand in a mechanistic way how CETP-mediated lipid exchange occurs and how it could be exploited in the design of new and more efficient molecular agents against coronary heart disease.