Background
Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation.
Results
We explored a range of features that can be used to predict functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and PageRank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. On an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the 111 annotated catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods.
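The sensitivity, precision and fold enrichment quoted above follow directly from the residue counts. The short Python sketch below recomputes them; the function name `enrichment_metrics` is illustrative only and is not part of the authors' software.

```python
def enrichment_metrics(n_total, n_true, n_selected, n_true_selected):
    """Sensitivity, precision and fold enrichment for a residue ranking.

    n_total: residues in the input set
    n_true: annotated catalytic residues in the input set
    n_selected: residues returned by the predictor
    n_true_selected: catalytic residues among the returned ones
    """
    sensitivity = n_true_selected / n_true      # recall
    precision = n_true_selected / n_selected    # positive predictive value
    background = n_true / n_total               # catalytic fraction expected by chance
    fold_enrichment = precision / background
    return sensitivity, precision, fold_enrichment


sens, prec, fold = enrichment_metrics(9262, 111, 411, 70)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={fold:.1f}x")
# sensitivity=63% precision=17% enrichment=14.2x
```

The background rate (111/9262, about 1.2%) is what makes a 17% precision correspond to roughly a 14-fold enrichment.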
Conclusions
We found that several of the graph-based measures reflect the same underlying feature of protein structures, which can be captured simply and more effectively by the distance to the GCM. This measure also has the added advantage of being easy to implement. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, owing to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
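To illustrate how little code the distance-to-GCM feature needs, here is a minimal Python sketch. It assumes that the GCM can be approximated by the unweighted centroid of one representative atom per residue (for example C-alpha); the original work may weight atoms by mass or use all atoms, and the residue labels and coordinates are hypothetical. In the method described above this raw feature is combined with amino acid type and conservation in a neural network, which is not shown.

```python
import math


def distance_to_gcm(coords):
    """Distance of each residue to the centroid, in the units of the input coordinates.

    coords: dict mapping a residue label to the (x, y, z) position of one
    representative atom, e.g. C-alpha. The unweighted centroid of these points
    stands in for the general center of mass (GCM).
    """
    n = len(coords)
    centroid = tuple(sum(xyz[k] for xyz in coords.values()) / n for k in range(3))
    return {res: math.dist(xyz, centroid) for res, xyz in coords.items()}


# Hypothetical three-residue toy structure; a smaller distance means closer to the core.
dist = distance_to_gcm({"R1": (0.0, 0.0, 0.0), "R2": (4.0, 0.0, 0.0), "R3": (2.0, 0.0, 0.0)})
ranking = sorted(dist, key=dist.get)
print(ranking)  # ['R3', 'R1', 'R2']
```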
doi:10.1186/1471-2105-14-63
PMCID: PMC3598644
PMID: 23433045
Functional site; Catalytic residues; Neural network; Feature selection; Structural genomics
Gene regulatory networks show robustness to perturbations. Previous work identified robustness as an emergent property of gene network evolution, but the underlying molecular mechanisms are poorly understood. We used a multi-tier modeling approach that integrates molecular sequence and structure information with network architecture and population dynamics. Structural models of transcription factor-DNA complexes are used to estimate relative binding specificities. In this model, mutations in the DNA cause changes at two levels: (a) at the sequence level in individual binding sites (modulating binding specificity), and (b) at the network level (creating and destroying binding sites). We used this model to dissect the underlying mechanisms responsible for the evolution of robustness in gene regulatory networks. Results suggest that in sparse architectures (represented by short promoters), a mixture of local-sequence and network-architecture level changes are exploited. At the local-sequence level, robustness evolves by decreasing the probabilities of both the destruction of existing binding sites and the generation of new ones. Meanwhile, in highly interconnected architectures (represented by long promoters), robustness evolves almost entirely via network level changes, deleting and creating binding sites that modify the network architecture.
Author Summary
Development from egg to embryo depends to a large extent on regulatory networks of genes called transcription factors. Previous research has shown these gene regulatory networks to be robust to perturbations at the level of the connections between transcription factors. Here, we investigate the mechanisms underlying the evolution of robustness in gene networks using a modeling approach, which considers three levels: binding of individual transcription factors to DNA, dynamics of gene expression levels, and fitness effects at the population level. In our model the gene regulatory network is determined by transcription factor binding sites within DNA sequences, which undergo mutation. We categorize these mutations along a continuum ranging from silent mutations, which have no effect on regulation and change only the DNA sequence (local-sequence level), to mutations that change connections between genes in the network (network-architecture level). We find that in sparse networks, containing few connections between genes, a balance of local-sequence and network-architecture level mechanisms is responsible for the evolution of robustness, but when the network is densely connected the network-architecture level mechanisms become dominant. We argue that the shift towards the network-architecture level for more densely connected networks offers a potential explanation for the evolution of increased complexity.
doi:10.1371/journal.pcbi.1002865
PMCID: PMC3536627
PMID: 23300434
Differential detergent fractionation (DDF) is frequently used to partition fresh cells and tissues into distinct compartments. We have tested whether DDF can reproducibly extract and fractionate cellular protein components from frozen tissues. Frozen kidneys were sequentially extracted with three different buffer systems. Analysis of the three fractions with LC-MS/MS identified 1,693 proteins, some of which were common to all fractions and others unique to specific fractions. Normalized spectral index values (SIN) obtained from these data were compared in order to evaluate both the reproducibility of the method as well as the efficiency of enrichment. SIN values between replicate fractions demonstrated a high correlation, confirming the reproducibility of the method. Correlation coefficients across the three fractions were significantly lower than those for the replicates, supporting the capability of DDF to differentially fractionate proteins into separate compartments. Subcellular annotation of the proteins identified in each fraction demonstrated a significant enrichment of cytoplasmic, cell membrane and nuclear proteins in the three respective buffer system fractions. We conclude that DDF can be applied to frozen tissue to generate reproducible proteome coverage discriminating subcellular compartments. This demonstrates the feasibility of analyzing cellular compartment specific proteins in archived tissue samples with the simple DDF method.
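One plausible way to quantify replicate agreement is a Pearson correlation of log-transformed SIN values over the proteins detected in both runs. The Python sketch below assumes that procedure; the log transform, the restriction to shared proteins and the protein labels and values are assumptions for illustration, not details taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sin_correlation(rep_a, rep_b):
    """Correlate log10 SIN values for proteins quantified in both replicates."""
    shared = sorted(set(rep_a) & set(rep_b))
    return pearson([math.log10(rep_a[p]) for p in shared],
                   [math.log10(rep_b[p]) for p in shared])


# Hypothetical SIN values for two replicate fractions (prot4 and prot5 are not shared).
rep1 = {"prot1": 2e-6, "prot2": 5e-5, "prot3": 1e-4, "prot4": 3e-7}
rep2 = {"prot1": 2.5e-6, "prot2": 4e-5, "prot3": 1.2e-4, "prot5": 1e-6}
r = sin_correlation(rep1, rep2)
print(round(r, 3))
```

The same function applied to two different fractions (rather than two replicates) would be expected to give a lower coefficient if the fractionation separates compartments.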
doi:10.1016/j.ab.2011.06.045
PMCID: PMC3164751
PMID: 21802400
Differential detergent fractionation; Normalized spectral index; Frozen tissue; Subcellular location
Mass spectrometry analysis of cross-linked peptides can be used to probe protein contact sites in macromolecular complexes. We have developed a photo-cleavable cross-linker that enhances peptide enrichment, improving the signal-to-noise ratio of the cross-linked peptides in mass spectrometry analysis. This cross-linker contains a nitrobenzyl alcohol group that can be cleaved by UV irradiation and is stable during the multiple washing steps used for peptide enrichment. Because the enrichment does not rely on a protein-based retrieval system, it avoids the contamination such systems introduce and thus facilitates the identification of cross-linked peptides. The homodimeric protein pilM from Pseudomonas aeruginosa 2192 was used to test the specificity and experimental conditions. As predicted, the known pair of lysine side chains within 14Å was cross-linked. An unexpected cross-link involving the protein’s amino terminus was also detected. This is consistent with the predicted mobility of the amino terminus, which may bring the amino groups within 19Å of one another in solution. These technical improvements allow this method to be used for investigating protein-protein interactions in complex biological samples.
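Predicting which amine pairs a cross-linker can reach amounts to a distance filter over the structure. The Python sketch below lists primary-amine sites (lysine side chains and the amino terminus) closer than a cutoff; the site labels, coordinates and the choice of measuring between amine nitrogen positions are hypothetical. A static structure cannot capture the terminal mobility discussed above, which is why the N-terminal cross-link was unexpected.

```python
import itertools
import math


def candidate_crosslinks(amines, cutoff=14.0):
    """Pairs of primary-amine sites closer than `cutoff` (angstroms), nearest first.

    amines: dict mapping a site label (e.g. "A:K10") to its (x, y, z) position.
    """
    pairs = []
    for (a, pa), (b, pb) in itertools.combinations(sorted(amines.items()), 2):
        d = math.dist(pa, pb)
        if d <= cutoff:
            pairs.append((a, b, round(d, 1)))
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical homodimer: one lysine per chain plus the chain A amino terminus.
sites = {"A:K10": (0.0, 0.0, 0.0), "B:K10": (10.0, 0.0, 0.0), "A:Nterm": (0.0, 30.0, 0.0)}
print(candidate_crosslinks(sites))  # [('A:K10', 'B:K10', 10.0)]
```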
doi:10.1002/pmic.201100015
PMCID: PMC3465073
PMID: 21834138
cross-link; enrichment; photo-cleavable; transient protein complex
Parthasarathy, Sampathkumar | Lu, Frances | Zhao, Xun | Li, Zhenzhen | Gilmore, Jeremiah | Bain, Kevin | Rutter, Marc E. | Gheyi, Tarun | Schwinn, Kenneth D. | Bonanno, Jeffrey B. | Pieper, Ursula | Fajardo, J. Eduardo | Fiser, Andras | Almo, Steven C. | Swaminathan, Subramanyam | Chance, Mark R. | Baker, David | Atwell, Shane | Thompson, Devon A. | Emtage, J. Spencer | Wasserman, Stephen R. | Sali, Andrej | Sauder, J. Michael | Burley, Stephen K.
The X-ray structure of a putative BenF-like (gene name: PFL1329) protein from Pseudomonas fluorescens Pf-5 (PflBenF) has been determined at 2.6Å resolution. X-ray crystallography revealed a canonical 18-stranded β-barrel fold that forms a central pore with a diameter of ∼4.6Å, which is consistent with the size and physicochemical properties of the presumed aromatic acid substrate, benzoate. Detailed comparisons with the previously determined structure of Pseudomonas aeruginosa OpdK, a vanillate influx channel, revealed an arginine-rich aromatic acid selectivity filter of nearly identical structure composed of seven highly conserved residues Arg∼Asp∼Arg∼Arg∼Ser∼Asp∼Arg (R∼D∼R∼R∼S∼D∼R sequence motif, where ∼ denotes intervening residues) that define the narrowest part of the pore.
doi:10.1002/prot.22829
PMCID: PMC2989796
PMID: 20737437
BenF-like; substrate specific porin; OprD superfamily; OprD subfamily; OpdK subfamily; benzoate; Pseudomonas; integral membrane protein
Reciprocal interactions between glia and neurons are essential for the proper organization and function of the nervous system. Recently, the interaction between ErbB receptors (ErbB2 and ErbB3) on the surface of Schwann cells and neuronal Neuregulin-1 (NRG1) has emerged as the pivotal signal that controls Schwann cell development, association with axons, and myelination. To understand the function of the NRG1-ErbB2/3 signaling axis in adult Schwann cell biology, we are studying the specific role of the ErbB3 receptor tyrosine kinase (RTK), since it is the receptor for NRG1 on the surface of Schwann cells. Here we show that alternative transcription initiation results in the formation of a nuclear variant of ErbB3 (nuc-ErbB3) in rat primary Schwann cells. Nuc-ErbB3 possesses a functional nuclear localization signal sequence and binds to chromatin. Using ChIP-chip arrays, we identified the promoters that associate with nuc-ErbB3 and clustered the active promoters in Schwann cell gene expression. Nuc-ErbB3 regulates the transcriptional activity of the ezrin and HMGB1 promoters, while inhibition of nuc-ErbB3 expression results in reduced myelination and altered distribution of ezrin in the nodes of Ranvier. Finally, we reveal that NRG1 regulates the translation of nuc-ErbB3 in rat Schwann cells. For the first time, to our knowledge, we show that alternative transcription initiation from a gene that encodes an RTK can generate a protein variant of the receptor with a distinct role in molecular and cellular regulation. We propose a new concept for the molecular regulation of myelination through the expression and distinct role of nuc-ErbB3.
doi:10.1523/JNEUROSCI.5635-10.2011
PMCID: PMC3086203
PMID: 21451047
ErbB3; Schwann cells; myelination; nodes; transcription; signaling
Wang, Li | Rubinstein, Rotem | Lines, Janet L. | Wasiuk, Anna | Ahonen, Cory | Guo, Yanxia | Lu, Li-Fan | Gondek, David | Wang, Yan | Fava, Roy A. | Fiser, Andras | Almo, Steve | Noelle, Randolph J.
VISTA suppresses T cell proliferation and cytokine production and can influence autoimmunity and antitumor responses in mice.
The immunoglobulin (Ig) superfamily consists of many critical immune regulators, including the B7 family ligands and receptors. In this study, we identify a novel and structurally distinct Ig superfamily inhibitory ligand, whose extracellular domain bears homology to the B7 family ligand PD-L1. This molecule is designated V-domain Ig suppressor of T cell activation (VISTA). VISTA is primarily expressed on hematopoietic cells, and VISTA expression is highly regulated on myeloid antigen-presenting cells (APCs) and T cells. A soluble VISTA-Ig fusion protein or VISTA expression on APCs inhibits T cell proliferation and cytokine production in vitro. A VISTA-specific monoclonal antibody interferes with VISTA-induced suppression of T cell responses by VISTA-expressing APCs in vitro. Furthermore, anti-VISTA treatment exacerbates the development of the T cell–mediated autoimmune disease experimental autoimmune encephalomyelitis in mice. Finally, VISTA overexpression on tumor cells interferes with protective antitumor immunity in vivo in mice. These findings show that VISTA, a novel immunoregulatory molecule, has functional activities that are nonredundant with other Ig superfamily members and may play a role in the development of autoimmunity and immune surveillance in cancer.
doi:10.1084/jem.20100619
PMCID: PMC3058578
PMID: 21383057
Toxoplasma gondii is an apicomplexan of both medical and veterinary importance which is classified as an NIH Category B priority pathogen. It is best known for its ability to cause congenital infection in immunocompetent hosts and encephalitis in immunocompromised hosts. Its highly stable and specialized microtubule-based cytoskeleton participates in the invasion process. The genome encodes three isoforms each of α- and β-tubulin, and here we show that tubulin is extensively altered by specific post-translational modifications (PTMs). T. gondii tubulin PTMs were analyzed by mass spectrometry and immunolabeling using specific antibodies. The PTMs identified on α-tubulin included acetylation of Lys40, removal of the last C-terminal amino acid residue Tyr453 (detyrosinated tubulin) and truncation of the last five amino acid residues. Polyglutamylation was detected on both α- and β-tubulins. An antibody directed against mammalian α-tubulin lacking the last two C-terminal residues (Δ2-tubulin) labeled the apical region of this parasite. Detyrosinated tubulin was diffusely present in subpellicular microtubules and displayed an apparent accumulation at the basal end. Methylation, a PTM not previously described on tubulin, was also detected. Methylated tubulins were not detected in the host cells, human foreskin fibroblasts, suggesting that this may be a modification specific to the Apicomplexa.
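Mass spectrometric detection of these PTMs rests on characteristic mass shifts of the modified peptides. The Python sketch below applies standard monoisotopic deltas for the modifications named above; the peptide mass is hypothetical, and this is not the search configuration used in the study. C-terminal truncations such as Δ2 or the five-residue loss would instead be handled by subtracting the corresponding residue masses.

```python
# Standard monoisotopic mass shifts (Da) for the modifications reported above.
PTM_DELTAS = {
    "acetylation": 42.010565,       # + C2H2O, e.g. on alpha-tubulin Lys40
    "methylation": 14.015650,       # + CH2 per methyl group
    "glutamylation": 129.042593,    # + one Glu residue (C5H7NO3) per added glutamate
    "detyrosination": -163.063329,  # - C-terminal Tyr residue (C9H9NO2)
}


def modified_mass(unmodified_mass, modifications):
    """Monoisotopic peptide mass after applying (name, count) modifications."""
    return unmodified_mass + sum(PTM_DELTAS[name] * n for name, n in modifications)


# Hypothetical 1000.0 Da peptide carrying one acetyl and two methyl groups.
print(round(modified_mass(1000.0, [("acetylation", 1), ("methylation", 2)]), 6))
# 1070.041865
```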
doi:10.1021/pr900699a
PMCID: PMC2813730
PMID: 19886702
Toxoplasma gondii; cytoskeleton; tubulin; post-translational modification; proteomics; microtubules; conoid
The microtubule cytoskeleton has proven to be an effective target for cancer therapeutics. One class of drugs, known as microtubule stabilizing agents (MSAs), binds to microtubule polymers and stabilizes them against depolymerization. The prototype of this group of drugs, Taxol, is an effective chemotherapeutic agent used extensively in the treatment of human ovarian, breast, and lung carcinomas. Although electron crystallography and photoaffinity labeling experiments determined that the binding site for Taxol is in a hydrophobic pocket in β-tubulin, little was known about the effects of this drug on the conformation of the entire microtubule. A recent study from our laboratory utilizing hydrogen-deuterium exchange (HDX) in concert with various mass spectrometry (MS) techniques has provided new information on the structure of microtubules upon Taxol binding. In the current study we apply this technique to determine the binding mode and the conformational effects on chicken erythrocyte tubulin (CET) of another MSA, discodermolide, whose synthetic analogues may have potential use in the clinic. We confirmed that like Taxol, discodermolide binds to the taxane binding pocket in β-tubulin. However, as opposed to Taxol, which has major interactions with the M-loop, discodermolide orients itself away from this loop and towards the N-terminal H1–S2 loop. Additionally, discodermolide stabilizes microtubules mainly via its effects on interdimer contacts, specifically on the α-tubulin side, and to a lesser extent on interprotofilament contacts between adjacent β-tubulin subunits. Also, our results indicate complementary stabilizing effects of Taxol and discodermolide on the microtubules, which may explain the synergy observed between the two drugs in vivo.
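HDX comparisons of this kind are commonly expressed as relative deuterium uptake per peptide, with drug-induced protection showing up as reduced uptake in the bound state. The Python sketch below uses the usual normalization against undeuterated and fully deuterated controls; the peptide names, masses and the 5-percentage-point threshold are hypothetical and do not reproduce the study's data or analysis pipeline.

```python
def relative_uptake(m_t, m_0, m_full):
    """Percent deuterium incorporation from peptide centroid masses (Da).

    m_t: centroid after labeling; m_0: undeuterated; m_full: fully deuterated control.
    """
    return 100.0 * (m_t - m_0) / (m_full - m_0)


def protected_peptides(apo, bound, m_0, m_full, threshold=5.0):
    """Peptides whose uptake falls by more than `threshold` percentage points on drug binding."""
    protected = {}
    for pep in apo:
        d_apo = relative_uptake(apo[pep], m_0[pep], m_full[pep])
        d_bound = relative_uptake(bound[pep], m_0[pep], m_full[pep])
        if d_apo - d_bound > threshold:
            protected[pep] = round(d_apo - d_bound, 1)
    return protected


m_0 = {"pep1": 1000.0, "pep2": 1000.0}
m_full = {"pep1": 1010.0, "pep2": 1010.0}
apo = {"pep1": 1005.0, "pep2": 1006.0}      # hypothetical microtubules without drug
bound = {"pep1": 1004.8, "pep2": 1003.0}    # hypothetical drug-bound microtubules
print(protected_peptides(apo, bound, m_0, m_full))  # {'pep2': 30.0}
```

Mapping the protected peptides onto tubulin regions (for example the M-loop or the H1–S2 loop) is what allows binding modes of two drugs to be compared.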
doi:10.1021/bi901351q
PMCID: PMC2845443
PMID: 19863156
microtubules; discodermolide; Taxol; mass spectrometry; hydrogen-deuterium exchange
X-linked dyskeratosis congenita (DC) is a rare bone marrow failure syndrome caused mostly by missense mutations in the pseudouridine synthase NAP57 (dyskerin/Cbf5). As part of H/ACA ribonucleoproteins (RNPs), NAP57 is important for the biogenesis of ribosomes, spliceosomal small nuclear RNPs, microRNAs and the telomerase RNP. DC mutations concentrate in the N- and C-termini of NAP57 but not in its central catalytic domain, raising questions as to their impact. We demonstrate that the N- and C-termini together form the binding surface for the H/ACA RNP assembly factor SHQ1 and that DC mutations modulate the interaction between the two proteins. Pinpointing impaired interaction between NAP57 and SHQ1 as a potential molecular basis for X-linked DC has implications for therapeutic approaches, e.g. by targeting the NAP57–SHQ1 interface with small molecules.
doi:10.1093/hmg/ddp416
PMCID: PMC2773269
PMID: 19734544
Dessailly, Benoît H. | Nair, Rajesh | Jaroszewski, Lukasz | Fajardo, J. Eduardo | Kouranov, Andrei | Lee, David | Fiser, Andras | Godzik, Adam | Rost, Burkhard | Orengo, Christine
Summary
One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centres, targets representatives from large, structurally uncharacterised protein domain families, and from structurally uncharacterised subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly over-represented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first three years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.
doi:10.1016/j.str.2009.03.015
PMCID: PMC2920419
PMID: 19523904
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function, and towards developing tools for protein structure modeling and design. We explored the frequency of occurrence, in protein structures, of an exhaustively classified library of supersecondary structural elements (Smotifs) in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs but are rather new combinations of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the 10% most frequent Smotifs have a higher fraction of internal contacts, while some of the rarest motifs are larger and contain a longer loop region.
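The observation that novel folds contain relatively more rare Smotifs suggests a simple per-structure score: the fraction of a structure's Smotifs that are rare in a library of known structures. The Python sketch below is a hypothetical illustration of that idea, not the authors' classification. The Smotif labels are placeholders (real Smotifs are defined by their secondary structure types, loop length and geometry), and the rarity cutoff `min_count` is an assumption.

```python
from collections import Counter


def smotif_counts(known_structures):
    """Number of known structures containing each Smotif type (presence, not multiplicity)."""
    counts = Counter()
    for smotifs in known_structures.values():
        counts.update(set(smotifs))
    return counts


def rare_smotif_fraction(query_smotifs, counts, min_count=2):
    """Share of a query structure's Smotifs seen in fewer than `min_count` known structures."""
    if not query_smotifs:
        return 0.0
    rare = [s for s in query_smotifs if counts[s] < min_count]
    return len(rare) / len(query_smotifs)


# Hypothetical library of three known structures and one query structure.
known = {
    "struct1": ["HH_1", "EH_2"],
    "struct2": ["HH_1", "EE_3"],
    "struct3": ["HH_1", "EH_2"],
}
counts = smotif_counts(known)
print(rare_smotif_fraction(["HH_1", "EE_3", "EE_4"], counts))  # 2 of the 3 Smotifs are rare
```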
Author Summary
Structural genomics efforts aim at exploring the repertoire of three-dimensional structures of protein molecules. While genome-scale sequencing projects have already provided us with all the genes of many organisms, it is the three-dimensional shape of gene-encoded proteins that defines all the interactions among these components. Understanding the versatility and, ultimately, the role of all possible molecular shapes in the cell is a necessary step toward understanding how organisms function. In this work we explored the rules that identify certain shapes as novel compared to all already known structures. The findings of this work provide possible insights into the rules that can be used in future work to identify or design new molecular shapes or to relate folds to each other in a quantitative manner.
doi:10.1371/journal.pcbi.1000750
PMCID: PMC2858679
PMID: 20421995
Toxoplasma gondii is a ubiquitous apicomplexan parasite that, in humans, can cause several clinical syndromes, including encephalitis, chorioretinitis and congenital infection. T. gondii was described a little over 100 years ago in the tissues of the gundi (Ctenodactylus gundi). A large number of experimental techniques are applicable to this pathogen, and it has become a model organism for the study of intracellular pathogens. With the completion of the genomes of type I (GT-1), type II (ME49) and type III (VEG) strains, proteomic studies on this organism have been greatly facilitated. Several subcellular proteomic studies have been completed on this pathogen. These studies have helped elucidate specialized invasion organelles and their composition, as well as proteins associated with the cytoskeleton. Global proteomic studies are leading to improved strategies for genome annotation in this organism and an improved understanding of protein regulation in this pathogen. Web-based resources, such as EPIC-DB and ToxoDB, provide proteomic data and support for studies on T. gondii. This review will summarize the current status of proteomic research on T. gondii.
doi:10.1586/epr.09.16
PMCID: PMC2741161
PMID: 19489701
Apicomplexa; cell biology; genome; proteomic; Toxoplasma gondii
Background
Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment.
Results
The performance of a number of publicly available scoring functions is compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored how accuracy is affected by alternative choices for representing interaction center types and by other features of scoring functions, such as using information on solvent accessibility and torsion angles, and accounting for secondary structure preferences and side chain orientation. Partially based on these observations, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. The atom- and residue-level statistical potentials proposed in this work, together with Linux executables to calculate the energy of a given protein, can be downloaded from http://www.fiserlab.org/potentials.
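The core idea of a shuffled reference state can be sketched in a few lines of Python: a residue pair is scored by how much more often it is observed in contact than expected when residue identities are randomly permuted over the same contact positions. This is a deliberately simplified single-structure illustration, not the authors' potential. The contact list, pseudocount, number of shuffles and kT = 1 are assumptions, and the published potential additionally accounts for side-chain orientation and is derived from many structures. For actual scoring, use the executables distributed at the URL given above.

```python
import math
import random
from collections import Counter


def contact_counts(contacts, residues):
    """Count unordered residue-type pairs over (i, j) contact indices."""
    counts = Counter()
    for i, j in contacts:
        counts[tuple(sorted((residues[i], residues[j])))] += 1
    return counts


def shuffled_pair_potential(contacts, sequence, n_shuffles=200,
                            pseudocount=1.0, seed=0):
    """Pair energies in units of kT: E(a, b) = -ln(N_obs / N_ref).

    N_ref is the mean pair count after randomly permuting residue identities
    over the same contact positions (a shuffled reference state).
    """
    rng = random.Random(seed)
    observed = contact_counts(contacts, sequence)
    reference = Counter()
    residues = list(sequence)
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        reference.update(contact_counts(contacts, residues))
    energies = {}
    for pair in set(observed) | set(reference):
        n_obs = observed[pair] + pseudocount
        n_ref = reference[pair] / n_shuffles + pseudocount
        energies[pair] = -math.log(n_obs / n_ref)
    return energies


# Toy example: both contacts are K-E, which is over-represented relative to the shuffle.
energies = shuffled_pair_potential([(0, 1), (2, 3)], "KEKEAAAA")
print(energies[("E", "K")])  # negative, i.e. favourable
```

Because the reference state keeps the contact positions and composition fixed and only scrambles identities, it removes composition effects without assuming any particular ideal-gas distribution.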
Conclusions
Among the most influential terms, we observed the critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanics potentials were also tested and found to be oversensitive to small local imperfections in a structure, requiring unfeasibly long energy relaxation before energy scores started to correlate with model quality.
doi:10.1186/1471-2105-11-128
PMCID: PMC2853469
PMID: 20226048
Cross-linking analysis of protein complexes and structures by tandem mass spectrometry (MS/MS) has advantages in speed, sensitivity, specificity, and the capability of handling complicated protein assemblies. However, detection and accurate assignment of the cross-linked peptides are often challenging due to their low abundance and complicated fragmentation behavior in collision-induced dissociation (CID). To simplify the MS analysis and improve the signal-to-noise ratio of the cross-linked peptides, we developed a novel peptide enrichment strategy that uses a cross-linker with a cryptic thiol group together with beads modified with a photocleavable cross-linker. The functional cross-linkers were designed to react with the primary amino groups in proteins. Human serum albumin was used as a model protein to detect intra- and intermolecular cross-linkages. Use of this protein-free selective retrieval method eliminates the contamination that can result from avidin–biotin-based retrieval systems and simplifies data analysis. These features may make the method suitable for investigating protein–protein interactions in biological samples.
doi:10.1021/ac900360b
PMCID: PMC2765915
PMID: 19642656
Schwede, Torsten | Sali, Andrej | Honig, Barry | Levitt, Michael | Berman, Helen M. | Jones, David | Brenner, Steven E. | Burley, Stephen K. | Das, Rhiju | Dokholyan, Nikolay V. | Dunbrack, Roland L. | Fidelis, Krzysztof | Fiser, Andras | Godzik, Adam | Huang, Yuanpeng Janet | Humblet, Christine | Jacobson, Matthew P. | Joachimiak, Andrzej | Krystek, Stanley R. | Kortemme, Tanja | Kryshtafovych, Andriy | Montelione, Gaetano T. | Moult, John | Murray, Diana | Sanchez, Roberto | Sosnick, Tobin R. | Standley, Daron M. | Stouch, Terry | Vajda, Sandor | Vasquez, Max | Westbrook, John D. | Wilson, Ian A.
Summary
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at the University of California, San Francisco, on 11 and 12 July 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
doi:10.1016/j.str.2008.12.014
PMCID: PMC2739730
PMID: 19217386
Nair, Rajesh | Liu, Jinfeng | Soong, Ta-Tsen | Acton, Thomas B. | Everett, John K. | Kouranov, Andrei | Fiser, Andras | Godzik, Adam | Jaroszewski, Lukasz | Orengo, Christine | Montelione, Gaetano T. | Rost, Burkhard
The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today’s UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849–851, 2007) has resulted from systematic targeting of large families. PSI’s per-structure contribution to novel leverage was over 4-fold higher than that of non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may take only another ~15 years to cover most sequences in the current UniProt database.
doi:10.1007/s10969-008-9055-6
PMCID: PMC2705706
PMID: 19194785
Protein structure determination; Structural genomics; Evolution; Protein universe
Background
High throughput proteomics experiments are useful for analyzing the protein expression of an organism, identifying the correct gene structure of a genome, or locating possible post-translational modifications within proteins. High throughput methods necessitate publicly accessible and easily queried databases for efficiently and logically storing, displaying, and analyzing the large volume of data.
Description
EPICDB is a publicly accessible, queryable, relational database that organizes and displays experimental, high-throughput proteomics data for Toxoplasma gondii and Cryptosporidium parvum. Along with detailed information on mass spectrometry experiments, the database also provides antibody experimental results and analysis of functional annotations, comparative genomics, and aligned expressed sequence tag (EST) and genomic open reading frame (ORF) sequences. The database contains all available alternative gene datasets for each organism, which together comprise a complete theoretical proteome for the respective organism, and all data are referenced to these sequences. The database is structured around clusters of protein sequences, which allows for the evaluation of redundancy, protein prediction discrepancies, and possible splice variants. The database can be expanded to include genomes of other organisms for which proteome-wide experimental data are available.
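Organizing the data around sequence clusters means that alternative gene models for one locus sit side by side, so peptide evidence can be compared across them. The Python `sqlite3` sketch below is a hypothetical, heavily simplified schema in that spirit; the table names, columns, identifiers and peptide are invented for illustration and are not EPICDB's actual design. The query reports, for each cluster with mass spectrometry support, how many member sequences it has and how many of them carry peptide hits, which is one way such a layout can expose prediction discrepancies.

```python
import sqlite3

# Hypothetical, simplified schema; EPICDB's real tables are not described in the text.
SCHEMA = """
CREATE TABLE cluster (
    id INTEGER PRIMARY KEY,
    organism TEXT NOT NULL
);
CREATE TABLE protein (
    id TEXT PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES cluster(id),
    prediction_set TEXT NOT NULL      -- gene-model dataset the sequence came from
);
CREATE TABLE peptide_hit (
    peptide TEXT NOT NULL,
    protein_id TEXT NOT NULL REFERENCES protein(id)
);
"""

SUPPORTED_CLUSTERS = """
SELECT c.id,
       COUNT(DISTINCT p.id)         AS members,
       COUNT(DISTINCT h.protein_id) AS members_with_hits
FROM cluster c
JOIN protein p ON p.cluster_id = c.id
LEFT JOIN peptide_hit h ON h.protein_id = p.id
GROUP BY c.id
HAVING COUNT(h.peptide) > 0
ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cluster VALUES (?, ?)",
                 [(1, "T. gondii"), (2, "T. gondii")])
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                 [("prot_a", 1, "set_A"), ("prot_b", 1, "set_B"),
                  ("prot_c", 2, "set_A")])
conn.execute("INSERT INTO peptide_hit VALUES (?, ?)", ("PEPTIDEK", "prot_a"))
rows = conn.execute(SUPPORTED_CLUSTERS).fetchall()
print(rows)  # [(1, 2, 1)]: cluster 1 has two gene models, only one supported by a peptide
```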
Conclusion
EPICDB is a comprehensive database of genome-wide T. gondii and C. parvum proteomics data and incorporates many features that allow for the analysis of entire proteomes and/or annotation of specific protein sequences. EPICDB is complementary to other genomics databases of these organisms, offering complete mass spectrometry analysis on a comprehensive set of all available protein sequences.
doi:10.1186/1471-2164-10-38
PMCID: PMC2652494
PMID: 19159464
Dybas, Joseph M. | Madrid-Aliste, Carlos J. | Che, Fa-Yun | Nieves, Edward | Rykunov, Dmitry | Angeletti, Ruth Hogue | Weiss, Louis M. | Kim, Kami | Fiser, Andras | Salzberg, Steven L.
Background
Toxoplasma gondii is an obligate intracellular protozoan that infects 20 to 90% of the population. It can cause both acute and chronic infections, many of which are asymptomatic, and, in immunocompromised hosts, can cause fatal infection due to reactivation from an asymptomatic chronic infection. An essential step towards understanding the molecular mechanisms controlling transitions between the various life stages, and towards identifying candidate drug targets, is to accurately characterize the T. gondii proteome.
Methodology/Principal Findings
We have explored the proteome of T. gondii tachyzoites with high-throughput proteomics experiments and by comparison to publicly available cDNA sequence data. Mass spectrometry analysis validated 2,477 gene coding regions with 6,438 possible alternative gene predictions, approximately one third of the T. gondii proteome. The proteomics survey identified 609 proteins that are unique to Toxoplasma compared with any known species, including other apicomplexans. Computational analysis identified 787 cases of possible gene duplication events and located at least 6,089 gene coding regions. Commonly used gene prediction algorithms produce very disparate sets of protein sequences, with pairwise overlaps ranging from 1.4% to 12%. Through this experimental and computational exercise we benchmarked gene prediction methods and observed false negative rates of 31 to 43%.
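The benchmark above rests on two simple set comparisons: how many experimentally validated coding regions a predictor misses, and how much two predictors' outputs agree. The Python sketch below shows one way to compute both. The exact overlap measure used in the study is not stated here, so the Jaccard index is an assumption, as are the locus and sequence identifiers.

```python
def false_negative_rate(validated, predicted):
    """Fraction of MS-validated coding regions that a predictor does not report."""
    validated = set(validated)
    if not validated:
        return 0.0
    return len(validated - set(predicted)) / len(validated)


def jaccard_overlap(set_a, set_b):
    """Pairwise overlap of two prediction sets as |A & B| / |A | B|."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


validated = {"locus1", "locus2", "locus3", "locus4"}   # hypothetical MS-supported loci
predictor_x = {"locus1", "locus2", "locus9"}           # hypothetical predictor output
print(false_negative_rate(validated, predictor_x))     # 0.5
print(round(jaccard_overlap({"MKVLA", "MKVLS"}, {"MKVLA", "MRTQE"}), 3))  # 0.333
```

Comparing predicted protein sequences by exact identity, as here, is strict; small differences at exon boundaries would count as disagreement, which is one reason overlaps between predictors can be so low.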
Conclusions/Significance
This study not only provides the largest proteomics exploration of the T. gondii proteome but also illustrates how high-throughput proteomics experiments can elucidate correct gene structures in genomes.
doi:10.1371/journal.pone.0003899
PMCID: PMC2587701
PMID: 19065262
The Pentapeptide Repeat Protein (PRP) family has over 500 members in the prokaryotic and eukaryotic kingdoms. These proteins are composed of, or contain domains composed of, tandemly repeated amino acid sequences with a consensus sequence of [S,T,A,V][D,N][L,F]-[S,T,R][G]. The biochemical function of the vast majority of PRP family members is unknown. The first three-dimensional structure of a PRP family member was determined for the fluoroquinolone resistance protein MfpA from Mycobacterium tuberculosis. The structure revealed that the pentapeptide repeats encode the folding of a novel right-handed quadrilateral β-helix. MfpA binds to DNA gyrase and inhibits its activity. The rod-shaped, dimeric protein exhibits remarkable size, shape and electrostatic similarity to DNA.
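The consensus above can be turned directly into a pattern for scanning sequences. The Python sketch below treats each bracket as one residue class and reads the hyphen between the third and fourth positions as a separator rather than a gap (an assumption). It uses a lookahead so that overlapping matches are reported. The test sequence is hypothetical; real pentapeptide repeats are degenerate, so strict consensus matching will miss many of them, and profile-based searches are the usual tool for family assignment.

```python
import re

# One residue class per position: [S,T,A,V][D,N][L,F][S,T,R][G].
# The lookahead makes finditer report overlapping occurrences.
PRP_MOTIF = re.compile(r"(?=([STAV][DN][LF][STR]G))")


def pentapeptide_repeats(sequence):
    """0-based start positions and matched text of consensus pentapeptides."""
    return [(m.start(), m.group(1)) for m in PRP_MOTIF.finditer(sequence.upper())]


print(pentapeptide_repeats("MADLSGTNFRGQ"))  # [(1, 'ADLSG'), (6, 'TNFRG')]
```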
doi:10.1021/bi052130w
PMCID: PMC2566302
PMID: 16388575
Zhan, Chenyang | Fedorov, Elena V. | Shi, Wuxian | Ramagopal, U. A. | Thirumuruhan, R. | Manjasetty, Babu A. | Almo, Steve C. | Fiser, Andras | Chance, Mark R. | Fedorov, Alexander A.
The structure of the ybeY protein from E. coli, with a bound metal ion, is reported at 2.7 Å resolution.
The three-dimensional crystallographic structure of the ybeY protein from Escherichia coli (SwissProt entry P77385) is reported at 2.7 Å resolution. YbeY is a hypothetical protein that belongs to the UPF0054 family. The structure reveals that the protein binds a metal ion in a tetrahedral geometry. Three coordination sites are provided by histidine residues, while the fourth might be a water molecule that is not seen in the diffraction map because of the relatively low resolution of the data. X-ray fluorescence analysis of the purified protein suggests that the metal is a nickel ion. The structure of ybeY and its sequence similarity to a number of predicted metal-dependent hydrolases provide a functional assignment for this protein family. The figures and tables of this paper were prepared using semi-automated tools, termed the Autopublish server, developed by the New York Structural GenomiX Research Consortium, with the goal of facilitating the rapid publication of crystallographic structures that emanate from worldwide Structural Genomics efforts, including the NIH-funded Protein Structure Initiative.
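A "tetrahedral" coordination geometry means the ligand-metal-ligand angles sit near the ideal arccos(-1/3) ≈ 109.47°. The sketch below computes those angles from Cartesian coordinates and applies a simple tolerance test. The coordinates are idealized toy values, not atoms from the ybeY structure, and the 10° tolerance is an arbitrary illustrative choice (it happens to separate tetrahedral angles from the 120° of trigonal planar geometry). With only three observed histidine ligands, as in this structure, only the three His-metal-His angles can be checked.

```python
import math
from itertools import combinations

IDEAL_TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # about 109.47 degrees


def angle_deg(center, a, b):
    """Angle a-center-b in degrees."""
    u = [ai - ci for ai, ci in zip(a, center)]
    v = [bi - ci for bi, ci in zip(b, center)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def ligand_angles(metal, ligands):
    """All ligand-metal-ligand angles for the observed ligands."""
    return [angle_deg(metal, a, b) for a, b in combinations(ligands, 2)]


def is_tetrahedral(metal, ligands, tol=10.0):
    """True if 3 or 4 observed ligands all make angles within tol of the ideal value."""
    if len(ligands) not in (3, 4):
        return False
    return all(abs(t - IDEAL_TETRAHEDRAL) <= tol for t in ligand_angles(metal, ligands))


# Idealized toy geometry: metal at the origin, ligands at alternate cube corners.
metal = (0.0, 0.0, 0.0)
tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
print(is_tetrahedral(metal, tetra))       # True
print(is_tetrahedral(metal, tetra[:3]))   # True: three ligands, fourth site unobserved
```

Measured angles in real structures scatter around the ideal value, and at modest resolution the ligand positions carry appreciable error, so any tolerance should reflect the quality of the model.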
doi:10.1107/S1744309105031131
PMCID: PMC1978141
PMID: 16511207
Protein Structure Initiative; metalloproteins; nickel; UPF0054 family
Pieper, Ursula | Eswar, Narayanan | Braberg, Hannes | Madhusudhan, M. S. | Davis, Fred P. | Stuart, Ashley C. | Mirkovic, Nebojsa | Rossi, Andrea | Marti-Renom, Marc A. | Fiser, Andras | Webb, Ben | Greenblatt, Daniel | Huang, Conrad C. | Ferrin, Thomas E. | Sali, Andrej
MODBASE (http://salilab.org/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on the MODELLER package for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE uses the MySQL relational database management system for flexible querying and CHIMERA for viewing the sequences and structures (http://www.cgl.ucsf.edu/chimera/). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different data sets. The largest data set contains 1 262 629 models for domains in 659 495 out of 1 182 126 unique protein sequences in the complete Swiss-Prot/TrEMBL database (August 25, 2003); only models based on alignments with significant similarity scores and models assessed to have the correct fold despite insignificant alignments are included. Another model data set supports target selection and structure-based annotation by the New York Structural Genomics Research Consortium; e.g. the 53 new structures produced by the consortium allowed us to structurally characterize 24 113 sequences. MODBASE also contains binding site predictions for small ligands and a set of predicted interactions between pairs of modeled sequences from the same genome.
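The counts quoted above translate into a few ratios that make the scale of the database easier to grasp. The short calculation below uses only numbers from the abstract; the variable names are illustrative. Note that these are averages: they say nothing about how models are distributed across sequences or how good individual models are.

```python
# Counts quoted in the MODBASE abstract (Swiss-Prot/TrEMBL, August 25, 2003).
TOTAL_SEQUENCES = 1_182_126
MODELED_SEQUENCES = 659_495    # sequences with at least one domain model
DOMAIN_MODELS = 1_262_629
NEW_STRUCTURES = 53            # consortium structures in the target-selection data set
NEWLY_CHARACTERIZED = 24_113   # sequences those structures allowed to be characterized

coverage = MODELED_SEQUENCES / TOTAL_SEQUENCES
models_per_sequence = DOMAIN_MODELS / MODELED_SEQUENCES
sequences_per_structure = NEWLY_CHARACTERIZED / NEW_STRUCTURES

print(f"sequence coverage: {coverage:.1%}")                        # 55.8%
print(f"domain models per modeled sequence: {models_per_sequence:.2f}")  # 1.91
print(f"sequences per new structure: {sequences_per_structure:.0f}")     # 455
```

The last ratio shows why structural genomics target selection leans on comparative modeling: on average each new experimental structure extended structural coverage to roughly 455 related sequences.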
Our other resources associated with MODBASE include a comprehensive database of multiple protein structure alignments (DBALI, http://salilab.org/dbali) as well as web servers for automated comparative modeling with MODPIPE (MODWEB, http://salilab.org/modweb), modeling of loops in protein structures (MODLOOP, http://salilab.org/modloop) and predicting functional consequences of single nucleotide polymorphisms (SNPWEB, http://salilab.org/snpweb).
doi:10.1093/nar/gkh095
PMCID: PMC308829
PMID: 14681398
Eswar, Narayanan | John, Bino | Mirkovic, Nebojsa | Fiser, Andras | Ilyin, Valentin A. | Pieper, Ursula | Stuart, Ashley C. | Marti-Renom, Marc A. | Madhusudhan, M. S. | Yerkovich, Bozidar | Sali, Andrej
The following resources for comparative protein structure modeling and analysis are described (http://salilab.org): MODELLER, a program for comparative modeling by satisfaction of spatial restraints; MODWEB, a web server for automated comparative modeling that relies on PSI-BLAST, IMPALA and MODELLER; MODLOOP, a web server for automated loop modeling that relies on MODELLER; MOULDER, a CPU-intensive protocol of MODWEB for building comparative models based on distant known structures; MODBASE, a comprehensive database of annotated comparative models for all sequences detectably related to a known structure; MODVIEW, a Netscape plugin for Linux that integrates viewing of multiple sequences and structures; and SNPWEB, a web server for structure-based prediction of the functional impact of a single amino acid substitution.
PMCID: PMC168950
PMID: 12824331