Motivation: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state.
Results: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven ‘recovery’ functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein–protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures.
Availability and implementation: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Supplementary data are available at Bioinformatics online.
A substantial challenge for genomic enzymology is the reliable annotation for proteins of unknown function. Described here is an interrogation of uncharacterized enzymes from the amidohydrolase superfamily using a structure-guided approach that integrates bioinformatics, computational biology and molecular enzymology. Previously, Tm0936 from Thermotoga maritima was shown to catalyze the deamination of S-adenosylhomocysteine (SAH) to S-inosylhomocysteine (SIH). Homologues of Tm0936 were identified, and substrate profiles were proposed by docking metabolites to modeled enzyme structures. These enzymes were predicted to deaminate analogues of adenosine including SAH, 5’-methylthioadenosine (MTA), adenosine (Ado), and 5’-deoxyadenosine (5’-dAdo). Fifteen of these proteins were purified to homogeneity and the three-dimensional structures of three proteins were determined by X-ray diffraction methods. Enzyme assays supported the structure-based predictions and identified subgroups of enzymes with the capacity to deaminate various combinations of the adenosine analogues, including the first enzyme (Dvu1825) capable of deaminating 5’-dAdo. One subgroup of proteins, exemplified by Moth1224 from Moorella thermoacetica, deaminates guanine to xanthine and another subgroup, exemplified by Avi5431 from Agrobacterium vitis S4, deaminates two oxidatively damaged forms of adenine: 2-oxoadenine and 8-oxoadenine. The sequence and structural basis of the observed substrate specificities was proposed and the substrate profiles for 834 protein sequences were provisionally annotated. The results highlight the power of a multidisciplinary approach for annotating enzymes of unknown function.
Proteins of unknown function belonging to cog1816 and cog0402 were characterized. Sav2595 from Streptomyces avermitilis MA-4680, Acel0264 from Acidothermus cellulolyticus 11B, Nis0429 from Nitratiruptor sp. SB155-2 and Dr0824 from Deinococcus radiodurans R1 were cloned, purified, and their substrate profiles determined. These enzymes were previously incorrectly annotated as adenosine deaminases or chlorohydrolases. It was shown here that these enzymes actually deaminate 6-aminodeoxyfutalosine. The deamination of 6-aminodeoxyfutalosine is part of an alternative menaquinone biosynthetic pathway that involves the formation of futalosine. 6-Aminodeoxyfutalosine is deaminated by these enzymes with catalytic efficiencies greater than 10⁵ M⁻¹ s⁻¹, Km values of 0.9 to 6.0 μM and kcat values of 1.2 to 8.6 s⁻¹. Adenosine, 2′-deoxyadenosine, thiomethyladenosine, and S-adenosylhomocysteine are deaminated at least an order of magnitude slower than 6-aminodeoxyfutalosine. The crystal structure of Nis0429 was determined and the substrate, 6-aminodeoxyfutalosine, was positioned in the active site, based on the presence of adventitiously bound benzoic acid. In this model Ser-145 interacts with the carboxylate moiety of the substrate. The structure of Dr0824 was also determined, but a collapsed active site pocket prevented docking of substrates. A computational model of Sav2595 was built based on the crystal structure of adenosine deaminase and substrates were docked. The model predicted a conserved arginine after β-strand 1 to be partially responsible for the substrate specificity of Sav2595.
Adenoid Cystic Carcinoma (ACC) is one of the most common malignancies to arise in human salivary glands, and also arises in glandular tissue of other organ systems. To address the paucity of experimental model systems for this tumor type, we have undertaken a program of transplanting tissue samples of human ACC into immunodeficient nu/nu mice to create xenograft model systems. In 17 of 23 attempts (74%) xenograft tumors were successfully grown. In all cases, the histologic appearance of the donating tumor was recapitulated in the subsequent xenograft. Characterization of a subset of xenograft models by immunohistochemical biomarkers and by RNA transcript microarray analysis showed good fidelity in the recapitulation of gene expression patterns in the xenograft tumors compared to the human donor tumors. Since ACC is known to frequently contain a t(6;9) translocation that fuses the MYB and NFIB genes, fluorescence in situ hybridization (FISH) of twelve ACC xenograft models was performed that assayed MYB locus break-apart and MYB-NFIB locus fusion. 11/12 (92%) xenograft models revealed MYB locus rearrangement and 10/12 (83%) xenograft models showed evidence of fusion of the MYB and NFIB loci. The two related xenograft models (derived from primary and metastatic tumors, respectively, of the same human subject) were karyotyped, showing a t(1;6) translocation, suggesting MYB translocation to a novel fusion partner gene. Overall, our results indicate that ACC is amenable to xenografting and that ACC xenograft models recapitulate the molecular and morphologic characteristics of human tumors, suggesting utility as valid experimental and preclinical model systems for this disease.
Adenoid cystic carcinoma; MYB; NFIB; oncogene fusion; xenograft
A metabonomic approach based on ultra performance liquid chromatography coupled with mass spectrometry (UPLC/MS) was used to study the hepatotoxicity of ricinine in rats. Potential biomarkers of ricinine toxicity and the toxicological mechanism were analyzed by a serum metabonomic method. Principal components analysis (PCA) of the chromatographic data clearly separated the metabolic profiles of the control and treated rats. Significant changes in metabolite biomarkers such as phenylalanine, tryptophan, cholic acid, LPC and PC were detected in the serum. These biochemical changes were related to metabolic disorders in amino acids and phospholipids. This research indicates that UPLC/MS-based metabonomic analysis of serum samples can be used to predict hepatotoxicity and to further understand the toxicological mechanism induced by ricinine. This work shows that metabonomics is a valuable tool for studying drug mechanisms.
Of the over 22 million protein sequences in the nonredundant TrEMBL database, fewer than 1% have experimentally confirmed functions. Structure-based methods have been used to predict enzyme activities from experimentally determined structures; however, for the vast majority of proteins, no such structures are available. Here, homology models of a functionally uncharacterized amidohydrolase from Agrobacterium radiobacter K84 (Arad3529) were computed based on a remote template structure. The protein backbone of two loops near the active site was remodeled, resulting in four distinct active site conformations. Substrates of Arad3529 were predicted by docking of 57672 high-energy intermediate (HEI) forms of 6440 metabolites against these four homology models. Based on docking ranks and geometries, a set of modified pterins was suggested as candidate substrates for Arad3529. The predictions were tested by enzymology experiments, and Arad3529 deaminated many pterin metabolites (substrate, kcat/Km [M⁻¹ s⁻¹]): formylpterin, 5.2 × 10⁶; pterin-6-carboxylate, 4.0 × 10⁶; pterin-7-carboxylate, 3.7 × 10⁶; pterin, 3.3 × 10⁶; hydroxymethylpterin, 1.2 × 10⁶; biopterin, 1.0 × 10⁶; D-(+)-neopterin, 3.1 × 10⁵; isoxanthopterin, 2.8 × 10⁵; sepiapterin, 1.3 × 10⁵; folate, 1.3 × 10⁵; xanthopterin, 1.17 × 10⁵; 7,8-dihydrohydroxymethylpterin, 3.3 × 10⁴. While pterin is a ubiquitous oxidative product of folate degradation, genomic analysis suggests that the first step of an undescribed pterin degradation pathway is catalyzed by Arad3529. Homology model-based virtual screening, especially with modeling of protein backbone flexibility, may be broadly useful for enzyme function annotation and discovering new pathways and drug targets.
ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics (http://salilab.org/allosmod), the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile (http://salilab.org/allosmod-foxs), the FoXSDock server for protein–protein docking filtered by an SAXS profile (http://salilab.org/foxsdock), the SAXS Merge server for automatic merging of SAXS profiles (http://salilab.org/saxsmerge) and the Pose & Rank server for scoring protein–ligand complexes (http://salilab.org/poseandrank). In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
Neuroinflammation has been recognized to play a critical role in the pathogenesis of Alzheimer's disease (AD), which is pathologically characterized by the accumulation of senile plaques containing activated microglia and amyloid β-peptides (Aβ). In the present study, we examined the neuroprotective effects of hydrogen sulfide (H2S) on neuroinflammation in rats with Aβ1-40 hippocampal injection. We found that Aβ-induced rats exhibited a disorder of pyramidal cell layer arrangement, and a decrease of mean pyramidal cell number in the CA1 hippocampal region compared with those in sham operated rats. NaHS (a donor of H2S, 5.6 mg/kg/d, i.p.) treatment for 3 weeks rescued neuronal cell death significantly. Moreover, we found that H2S dramatically suppressed the release of TNF-α, IL-1β and IL-6 in the hippocampus. Consistently, both immunohistochemistry and Western blotting assays showed that H2S inhibited the upregulation of COX-2 and the activation of NF-κB in the hippocampus. In conclusion, our data indicate that H2S suppresses neuroinflammation via inhibition of the NF-κB activation pathway in the Aβ-induced rat model and has potential value for AD therapy.
Alzheimer's disease; hydrogen sulfide; cyclooxygenase-2; nuclear factor-κB (NF-κB); amyloid
The von Hippel-Lindau (VHL) tumor suppressor gene product is the recognition component of an E3 ubiquitin ligase and is inactivated in patients with VHL disease and in most sporadic clear cell renal carcinomas (RCC). pVHL controls oxygen-responsive gene expression at the transcriptional and post-transcriptional levels. The vascular endothelial growth factor A (VEGFA) mRNA contains AU-rich elements (AREs) in the 3' untranslated region, and mRNA stability or decay is determined through ARE-associated RNA binding factors. We show here that levels of the ARE binding factor, AUF1, are regulated by pVHL and by hypoxia. pVHL and AUF1 stably associate with each other in cells and AUF1 is a ubiquitylation target of pVHL. AUF1 and another RNA binding protein, HuR, bind to VEGFA ARE RNA. Ribonucleoprotein (RNP)-immunoprecipitations showed that pVHL associates indirectly with VEGFA mRNA through AUF1 and/or HuR, and this complex is associated with VEGFA mRNA decay under normoxic conditions. Under hypoxic conditions pVHL is downregulated, while AUF1 and HuR binding to VEGF mRNA is maintained, and this complex is associated with stabilized mRNA. These studies suggest that AUF1 and HuR bind to VEGFA ARE RNA under both normoxic and hypoxic conditions, and that a pVHL-RNP complex determines VEGFA mRNA decay. These studies further implicate the ubiquitin-proteasome system in ARE-mediated RNA degradation.
von Hippel-Lindau; hypoxia; VEGF; AUF1; HuR; hnRNP
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site; and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes, but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE, and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp/) and the LigScore web server (http://salilab.org/ligscore/).
statistical potential; reference state; binding pose; ligand enrichment
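The early enrichment score (EF1) mentioned above can be illustrated with a minimal sketch. It assumes one common definition of the enrichment factor (the ligand rate in the top-ranked fraction divided by the ligand rate in the whole library) and assumes that a lower score means a better rank; the abstract does not spell out either detail, and the function name `enrichment_factor` is hypothetical.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_ligand: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor at the given top fraction of a ranked library.

    Assumption: lower score = better rank. fraction=0.01 corresponds to EF1.
    """
    if len(scores) != len(is_ligand) or len(scores) == 0:
        raise ValueError("scores and is_ligand must be non-empty and equal in length")
    n_total = len(scores)
    n_ligands = sum(1 for flag in is_ligand if flag)
    if n_ligands == 0:
        raise ValueError("at least one known ligand is required")

    # Size of the top-ranked slice (at least one molecule).
    n_top = max(1, int(round(fraction * n_total)))

    # Indices sorted from best (lowest) to worst (highest) score.
    order = sorted(range(n_total), key=lambda i: scores[i])
    hits = sum(1 for i in order[:n_top] if is_ligand[i])

    # Ligand rate in the top slice relative to the rate in the whole library.
    return (hits / n_top) / (n_ligands / n_total)
```

With 200 molecules of which 4 are ligands, finding 2 ligands in the top 2 positions gives an enrichment of 50 under this definition, while a random ranking is expected to give about 1.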
An enzyme of unknown function within the amidohydrolase superfamily was discovered to catalyze the hydrolysis of N-6-substituted adenine derivatives, several of which are cytokinins. Cytokinins are a common type of plant hormone and N-6-substituted adenines are also found as modifications to tRNA. Patl2390, from Pseudoalteromonas atlantica T6c, was shown to hydrolytically deaminate N-6-isopentenyladenine to hypoxanthine and isopentenylamine with a kcat/Km of 1.2 × 10⁷ M⁻¹ s⁻¹. Additional substrates include N-6-benzyl adenine, cis- and trans-zeatin, kinetin, O-6-methylguanine, N-6-butyladenine, N-6-methyladenine, N,N-dimethyladenine, 6-methoxypurine, 6-chloropurine, and 6-thiomethylpurine. This enzyme does not catalyze the deamination of adenine or adenosine. A comparative model of Patl2390 was computed using the three-dimensional crystal structure of Pa0148 (PDB code: 3PAO) as a structural template and docking was used to refine the model to accommodate experimentally identified substrates. This is the first identification of an enzyme that will hydrolyze an N-6 substituted side chain larger than methylamine from adenine.
Virtual ligand screening uses computation to discover new ligands of a protein by screening one or more of its structural models against a database of potential ligands. Comparative protein structure modeling extends the applicability of virtual screening beyond the atomic structures determined by X-ray crystallography or NMR spectroscopy. Here, we describe an integrated modeling and docking protocol, combining comparative modeling by MODELLER and virtual ligand screening by DOCK.
comparative modeling; virtual screening; ligand docking
G-Protein coupled receptors (GPCRs) are intensely studied as drug targets and for their role in signaling. With the determination of the first crystal structures, interest in structure-based ligand discovery has increased. Unfortunately, most GPCRs lack experimental structures. The determination of the D3 receptor structure, and a community challenge to predict it, enabled a fully prospective comparison of ligand discovery from a modeled structure versus that of the subsequently released crystal structure. Over 3.3 million molecules were docked against a homology model, and 26 of the highest ranking were tested for binding. Six had affinities from 0.2 to 3.1 μM. Subsequently, the crystal structure was released and the docking screen repeated. Of the 25 compounds selected, five had affinities from 0.3 to 3.0 μM. One of the novel ligands from the homology model screen was optimized for affinity to 81 nM. The feasibility of docking screens against modeled GPCRs more generally is considered.
Adenine deaminase (ADE) catalyzes the conversion of adenine to hypoxanthine and ammonia. The enzyme isolated from Escherichia coli using standard expression conditions had low activity for the deamination of adenine (kcat = 2.0 s⁻¹; kcat/Km = 2.5 × 10³ M⁻¹ s⁻¹). However, when iron was sequestered with a metal chelator and the growth medium was supplemented with Mn²⁺ prior to induction, the purified enzyme was substantially more active for the deamination of adenine with values of kcat and kcat/Km of 200 s⁻¹ and 5 × 10⁵ M⁻¹ s⁻¹, respectively. The apo-enzyme was prepared and reconstituted with Fe²⁺, Zn²⁺, or Mn²⁺. In each case, two enzyme-equivalents of metal were necessary for reconstitution of the deaminase activity. This work provides the first example of any member within the deaminase sub-family of the amidohydrolase superfamily (AHS) to utilize a binuclear metal center for the catalysis of a deamination reaction. [FeII/FeII]-ADE was oxidized to [FeIII/FeIII]-ADE with ferricyanide with inactivation of the deaminase activity. Reducing [FeIII/FeIII]-ADE with dithionite restored the deaminase activity and thus the di-ferrous form of the enzyme is essential for catalytic activity. No evidence for spin-coupling between metal ions was observed by EPR or Mössbauer spectroscopies. The three-dimensional structure of adenine deaminase from Agrobacterium tumefaciens (Atu4426) was determined by X-ray crystallography at 2.2 Å resolution and adenine was modeled into the active site based on homology to other members of the amidohydrolase superfamily. Based on the model of the adenine-ADE complex and subsequent mutagenesis experiments, the roles for each of the highly conserved residues were proposed. Solvent isotope effects, pH rate profiles and solvent viscosity were utilized to propose a chemical reaction mechanism and the identity of the rate limiting steps.
Two enzymes of unknown function from the amidohydrolase superfamily were discovered to catalyze the deamination of N-6-methyladenine to hypoxanthine and methylamine. The methylation of adenine in bacterial DNA is a common modification for the protection of host DNA against restriction endonucleases. The enzyme from Bacillus halodurans, Bh0637, catalyzes the deamination of N-6-methyladenine with a kcat of 185 s⁻¹ and a kcat/Km of 2.5 × 10⁶ M⁻¹ s⁻¹. Bh0637 catalyzes the deamination of N-6-methyladenine two orders of magnitude faster than adenine. A comparative model of Bh0637 was computed using the three-dimensional structure of Atu4426 (PDB code: 3NQB) as a structural template and computational docking was used to rationalize the preferential utilization of N-6-methyladenine over adenine. This is the first identification of an N-6-methyladenine deaminase (6-MAD).
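The two kinetic constants reported for Bh0637 imply a Michaelis constant, which can be back-calculated as a quick consistency check. The value below is derived from the reported kcat and kcat/Km; it is not stated in the abstract itself.

```latex
K_m \;=\; \frac{k_{cat}}{k_{cat}/K_m}
    \;=\; \frac{185\ \mathrm{s^{-1}}}{2.5\times 10^{6}\ \mathrm{M^{-1}\,s^{-1}}}
    \;=\; 7.4\times 10^{-5}\ \mathrm{M}
    \;=\; 74\ \mu\mathrm{M}
```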
Earthquakes, floods, droughts, storms, mudslides, landslides, and forest wildfires are serious threats to human lives and property. The present study aimed to examine the environmental characteristics and pathogenic traits, summarize experiences, and improve the application of medical relief in tropical regions.
An analysis was made of the work and projects of emergency medical rescue, based on information and data collected from 3 emergency medical rescue missions of the China International Search and Rescue Team to overseas earthquake and tsunami aftermaths in tropical disaster regions: Indonesia-Aceh, Indonesia-Yogyakarta, and Haiti-Port au Prince.
Shock, infection and heat stroke were frequently encountered in addition to outbreaks of infectious diseases, skin diseases, and diarrhea during post-disaster emergency medical rescue in tropical regions.
High temperature, high humidity, and proliferation of microorganisms and parasites are the characteristics of tropical climate that impose strict requirements on the preparation of rescue work, including the selection of team members suitable for a particular rescue mission and the provisioning of medical equipment and life support materials. The overseas rescue mission itself needs a scientific, efficient, simple workflow for providing efficient emergency medical assistance. Since shock and infection are major concerns in the post-disaster treatment of severely injured victims in tropical regions, the prevention and diagnosis of hyperthermia, insect-borne infectious diseases, tropical skin diseases, infectious diarrhea, and pest-related harm among disaster victims and rescue team staff should be emphasized during the rescue operations.
Disasters; Tropical regions; Earthquake; Emergency medical rescue
The purpose of this study was to explore correlations among constitution, stress, and discomfort symptoms during the first trimester of pregnancy. We adopted a descriptive and correlational research design and collected data from 261 pregnant women during their first trimester in southern Taiwan using structured questionnaires. Results showed that (1) stress was significantly and positively correlated with Yang-Xu, Yin-Xu, and Tan-Shi-Yu-Zhi constitutions, respectively; (2) Yin-Xu and Tan-Shi-Yu-Zhi constitutions had significant correlations with all symptoms of discomfort, while Yang-Xu had significant correlations with all symptoms of discomfort except for “running nose”; (3) Tan-Shi-Yu-Zhi constitution and stress were two indicators for “fatigue”; Tan-Shi-Yu-Zhi was the indicator for “nausea”; Yang-Xu and Yin-Xu were indicators for “frequent urination.” Our findings also indicate that stress level affects constitutional changes and that stress and constitutional change affect the incidence of discomfort. This research can help healthcare professionals observe these discomforts and provide individualized care for pregnant women, to nurture pregnant women into neutral-type constitution, minimize their levels of discomfort, and promote the health of the fetus and the mother.
Population migrations in Southwest and South China have played an important role in the formation of East Asian populations and led to a high degree of cultural diversity among ethnic minorities living in these areas. To explore the genetic relationships of these ethnic minorities, we systematically surveyed the variation of 10 autosomal STR markers of 1,538 individuals from 30 populations of 25 ethnic minorities, of which the majority were chosen from Southwest China, especially Yunnan Province. With genotyped data of the markers, we constructed phylogenies of these populations with both DA and DC measures and performed a principal component analysis, as well as a clustering analysis by structure. Results showed that we successfully recovered the genetic structure of analyzed populations formed by historical migrations. Aggregation patterns of these populations accord well with their linguistic affiliations, suggesting that deciphering of genetic relationships does in fact offer clues for study of ethnic differentiation.
The human MUC7 gene encodes a low-molecular-weight mucin glycoprotein that functions in lubrication/protection of epithelial surfaces of the oral cavity and respiratory tract. This study was designed to evaluate the effect of cigarette smoke extract (CSE), cigarette smoke (CS), and Pseudomonas aeruginosa lipopolysaccharide (LPS), either alone or in combination, on MUC7 expression in vitro and in vivo.
Materials and Methods:
qRT-PCR was used to determine the levels of mucin gene transcription in the human lung carcinoma cell line NCI-H292 (in vitro) and MUC7 transgenic mouse tissues (in vivo). ELISA was used to assess mucin glycoprotein levels in the cell line, and immunohistochemistry to assess mucins in lung and trachea sections.
In vitro treatment of cells with LPS (10 µg/ml) or CSE (0.5, 1, 2.5 and 5%) alone resulted in a statistically significant increase of MUC7 transcripts only with 1% CSE (3.2-fold). The combined CSE/LPS treatment resulted in a synergistic increase of MUC7 with 0.5% CSE/LPS (4.4-fold). MUC7 glycoprotein levels increased only minimally; the highest increase was seen with the 0.5% CSE/LPS combination treatment (1.3-fold). In vivo exposure of MUC7 transgenic mice to CS, LPS or the CS/LPS combination resulted in a significant increase in MUC7 transcripts only with LPS treatment (in both trachea and lung). Immunohistochemistry indicated a variable increase in MUC7 glycoprotein with CS and LPS treatment, both in the trachea and lungs, but CS/LPS exposure appeared to yield the highest increase.
In vitro, CSE and a combination of CSE/LPS treatment upregulated MUC7 gene transcription. In vivo, LPS upregulated MUC7 transcription, and a combination of CS/LPS appeared to increase MUC7 glycoprotein.
MUC7 expression; cigarette smoke; LPS; NCI-H292 cells; lungs; trachea.
Two orders of magnitude more protein sequences can be modeled by comparative modeling than have been determined by X-ray crystallography and NMR spectroscopy. Investigators have nevertheless been cautious about using comparative models for ligand discovery because of concerns about model errors. We suggest how to exploit comparative models for molecular screens, based on docking against a wide range of crystallographic structures and comparative models with known ligands. To account for the variation in the ligand-binding pocket as it binds different ligands, we calculate “consensus” enrichment by ranking each library compound by its best docking score against all available comparative models and/or modeling templates. For the majority of the targets, the consensus enrichment for multiple models was better than or comparable to that of the holo and apo X-ray structures. Even for single models, the models are significantly more enriching than the template structure if the template is paralogous and shares more than 25% sequence identity with the target.
comparative modeling; docking screens; consensus enrichment
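The consensus ranking described above (each library compound ranked by its best docking score over all available models and templates) can be sketched as follows. This is an illustrative sketch only: it assumes that lower docking scores are better, and the function name `consensus_ranking` and the nested-dictionary input layout are hypothetical, not taken from the paper.

```python
from typing import Dict, List


def consensus_ranking(scores_by_model: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank compounds by their best score across several receptor models.

    scores_by_model maps a model name to {compound_id: docking_score}.
    Assumption: lower score = better. A compound missing from some models
    is ranked using the models in which it was scored.
    """
    best: Dict[str, float] = {}
    for model_scores in scores_by_model.values():
        for compound, score in model_scores.items():
            # Keep the most favorable (lowest) score seen for this compound.
            if compound not in best or score < best[compound]:
                best[compound] = score

    # Sort by best score; break ties by compound id for reproducibility.
    return sorted(best, key=lambda c: (best[c], c))
```

The resulting ordered list can then be passed to any enrichment measure in place of a single-model ranking.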
A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling and the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. It was found that REMD in combination with currently available force fields could sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was lower than the starting value by 0.5 to 1.0 Å were found for 15 out of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with SSE-RMSD of at least 0.2 Å lower than the starting value was found among the 5 best ranked structures in 11 out of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with lowest energy. This suggests that, while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects by which the methodology could be further improved are discussed.
homology modeling; protein structure prediction; replica-exchange molecular dynamics; statistical potential; structure refinement
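The success criterion used above (a structure with SSE-RMSD at least 0.2 Å below the starting value among the 5 best-ranked structures) can be sketched as follows. The sketch assumes that model and reference coordinates are already superposed, that the indices of atoms in secondary structure elements are supplied by the caller, and that a lower score means a better rank; the names `subset_rmsd` and `improved_in_top_n` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def subset_rmsd(
    model: Sequence[Coord], reference: Sequence[Coord], indices: Sequence[int]
) -> float:
    """RMSD over selected atoms only (e.g. backbone atoms in SSEs).

    Assumption: the two coordinate sets are already optimally superposed.
    """
    if not indices:
        raise ValueError("indices must not be empty")
    squared = 0.0
    for i in indices:
        squared += sum((a - b) ** 2 for a, b in zip(model[i], reference[i]))
    return math.sqrt(squared / len(indices))


def improved_in_top_n(
    candidates: Sequence[Tuple[float, float]],
    start_rmsd: float,
    n: int = 5,
    min_gain: float = 0.2,
) -> bool:
    """True if any of the n best-scored candidates improves on start_rmsd.

    candidates holds (score, sse_rmsd) pairs; lower score = better (assumption).
    """
    ranked = sorted(candidates, key=lambda c: c[0])
    return any(start_rmsd - rmsd >= min_gain for _, rmsd in ranked[:n])
```

Separating the geometric measure from the ranking test makes it easy to swap in a different scoring function without touching the RMSD code.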
We report a serious problem associated with a number of current implementations of Andersen and Langevin dynamics algorithms. When long simulations are run in many segments, it is sometimes possible to have a repeating sequence of pseudorandom numbers enter the calculation. We show that, if the sequence repeats rapidly, the resulting artifacts can quickly denature biomolecules and are then easily detectable. However, if the sequence repeats less frequently, the artifacts become subtle and easily overlooked. We derive a formula for the underlying cause of artifacts in the case of the Langevin thermostat, and find it vanishes slowly as the inverse square root of the number of time steps per simulation segment. Numerous examples of simulation artifacts are presented, including dissociation of a tetrameric protein after 110 ns of dynamics, reductions in atomic fluctuations for a small protein in implicit solvent, altered thermodynamic properties of a box of water molecules, and changes in the transition free energies between dihedral angle conformations. Finally, in the case of strong thermocoupling, we link the observed artifacts to previous work in nonlinear dynamics and show that it is possible to drive a 20-residue, implicitly solvated protein into periodic trajectories if the thermostat is not used properly. Our findings should help other investigators re-evaluate simulations that may have been corrupted and obtain more accurate results.
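The way a repeating pseudorandom sequence can enter a segmented simulation is easy to reproduce in a toy setting. The sketch below is not taken from any simulation package: it uses Python's standard `random` module as a stand-in for a thermostat's noise source, and the function names and the per-segment seeding scheme are hypothetical. If every segment restarts the generator from the same seed, the stochastic kicks repeat exactly from segment to segment.

```python
import random
from typing import List


def segment_noise(n_steps: int, seed: int) -> List[float]:
    """Gaussian random kicks for one simulation segment, freshly seeded."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_steps)]


def segmented_noise(
    n_segments: int, n_steps: int, reuse_seed: bool, base_seed: int = 12345
) -> List[float]:
    """Concatenate the noise seen by a run split into restarted segments.

    reuse_seed=True mimics restarting every segment with the same seed,
    which makes the noise periodic with period n_steps.
    reuse_seed=False gives each segment its own seed (illustrative remedy).
    """
    noise: List[float] = []
    for k in range(n_segments):
        seed = base_seed if reuse_seed else base_seed + k
        noise.extend(segment_noise(n_steps, seed))
    return noise
```

Comparing consecutive slices of the returned list shows whether the noise repeats; in a real workflow the same check could be applied to logged seeds before trusting a long segmented run.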
Eukaryotic precursor (pre)-tRNAs are processed at both ends prior to maturation. Pre-tRNAs and other nascent transcripts synthesized by RNA polymerase III are bound at their 3′ ends at the sequence motif UUUOH [3′ oligo(U)] by the La antigen, a conserved phosphoprotein whose role in RNA processing has been associated previously with 3′-end maturation only. We show that in addition to its role in tRNA 3′-end maturation, human La protein can also modulate 5′ processing of pre-tRNAs. Both the La antigen’s N-terminal RNA-binding domain and its C-terminal basic region are required for attenuation of pre-tRNA 5′ processing. RNA binding and nuclease protection assays with a variety of pre-tRNA substrates and mutant La proteins indicate that 5′ protection is a highly selective activity of La. This activity is dependent on 3′ oligo(U) in the pre-tRNA for interaction with the N-terminal RNA binding domain of La and interaction of the C-terminal basic region of La with the 5′ triphosphate end of nascent pre-tRNA. Phosphorylation of La is known to occur on serine 366, adjacent to the C-terminal basic region. We show that this modification interferes with the La antigen’s ability to protect pre-tRNAiMet from 5′ processing either by HeLa extract or purified RNase P but that it does not affect interaction with the 3′ end of pre-tRNA. These findings provide the first evidence to indicate that tRNA 5′-end maturation may be regulated in eukaryotes. Implications of triphosphate recognition are discussed, as is a role for La phosphoprotein in controlling transcriptional and posttranscriptional events in the biogenesis of polymerase III transcripts.