Secreted and cell surface-localized members of the immunoglobulin superfamily (IgSF) play central roles in regulating adaptive and innate immune responses, and are prime targets for the development of protein-based therapeutics. An essential activity of the ectodomains of these proteins is the specific recognition of cognate ligands, which are often other members of the IgSF. In this work we provide functional insight into this important class of proteins through the development of a clustering algorithm that groups together extracellular domains of the IgSF with similar binding preferences. Information from hidden Markov model-based sequence profiles and domain structure is calibrated against manually curated protein interaction data to define functional families of IgSF proteins. The method is able to assign 82% of the 477 extracellular IgSF proteins to a functional family, while the rest are either single proteins with unique functions or proteins that could not be assigned with the current technology. The functional clustering of IgSF proteins generates hypotheses regarding the identification of new cognate receptor:ligand pairs and reduces the pool of possible interacting partners to a manageable level for experimental validation.
Immunoglobulin superfamily; protein-protein interaction; functional prediction
Proper cell functioning depends on the precise spatio-temporal expression of the cell's genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge of where and how TFs bind and associate to regulate gene expression is incomplete. We developed a structure-based computational algorithm (TF2DNA) to identify the binding specificities of TFs. The method constructs homology models of TFs bound to DNA and, after optimization in a molecular mechanics force field, assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and depend primarily on the sequence identity between aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system and connective tissue development and function.
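The step of scoring every possible DNA sequence and summarizing the result as a binding motif can be illustrated with a minimal sketch. It assumes a hypothetical additive per-position energy table in place of TF2DNA's structure-based knowledge-based potential; the function name `pwm_from_energies`, the `kT` scale, and all energy values are illustrative, not part of TF2DNA.

```python
import itertools
import math

BASES = "ACGT"

def pwm_from_energies(energy, kT=1.0):
    """Turn a position-specific energy table into a probability matrix.

    energy: one dict per motif position mapping base -> energy (lower values
    mean more favorable binding). All values used here are hypothetical.
    """
    counts = [dict.fromkeys(BASES, 0.0) for _ in energy]
    total = 0.0
    # Score every possible DNA sequence of the motif length (4**L sequences).
    for seq in itertools.product(BASES, repeat=len(energy)):
        e = sum(energy[i][b] for i, b in enumerate(seq))
        w = math.exp(-e / kT)  # Boltzmann weight as a proxy for relative affinity
        total += w
        for i, b in enumerate(seq):
            counts[i][b] += w
    # Normalize the weighted base counts at each position into probabilities.
    return [{b: col[b] / total for b in BASES} for col in counts]

# Hypothetical two-position energy table (not TF2DNA output).
TOY_ENERGY = [
    {"A": -2.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": -1.5, "T": 0.0},
]
toy_pwm = pwm_from_energies(TOY_ENERGY)
```

For a purely additive energy table the result reduces to an independent Boltzmann distribution at each position; exhaustive enumeration only becomes necessary, and expensive, when the energy function couples positions.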
The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.
Structural genomics; Phosphatase; NYSGXRC; X-ray crystallography
Functional characterization of a protein is often facilitated by its 3D structure. However, experimentally determined 3D structures are currently available for less than 1% of known protein sequences, owing to the inherently time-consuming and complicated nature of structure determination techniques. Computational approaches are employed to bridge the gap between the number of known sequences and the number of known 3D structures. Template-based protein structure modeling techniques rely on evolutionary principles: proteins related by descent tend to retain similar 3D structures, so a solved structure can serve as a template for its relatives. Strategies for template-based structure modeling will be discussed with a focus on comparative modeling, by reviewing techniques available for all the major steps involved in the comparative modeling pipeline.
Homology modeling; Comparative protein structure modeling; Template-based modeling; Loop modeling; Side chain modeling; Sequence-to-structure alignment
A remaining challenge in protein modeling is to predict structures for sequences that do not share recognizable sequence similarity to any experimentally solved structure. This challenge can be addressed by hybrid algorithms that utilize easily obtainable experimental data and carry a limited amount of indirect structural information. Earlier observations showed that the library of protein super-secondary structure motifs (Smotifs) saturated about a decade ago, and that new folds discovered since then are novel combinations of existing Smotifs. This observation suggests that it should be possible to build any structure, of either a known or yet to be discovered fold, from a combination of existing Smotifs derived from already known structures. In the absence of any sequence similarity signal, limited experimental data can be used to relate the backbone conformations of Smotifs between target proteins and known experimental structures. Here we present a modeling algorithm that relies on an exhaustive Smotif library and on NMR chemical shift patterns without any input of primary sequence information. In a test of 102 proteins with unique folds, the algorithm delivered 90 models of homology-model quality, among them 24 high-quality ones, and a topologically correct solution for almost all cases. Detailed analysis of the method’s performance suggests that further improvement can be achieved by improving sampling algorithms and developing more precise tools that predict dihedral angle preferences from chemical shift assignments. The current approach opens an avenue to address the modeling of larger protein structures for which chemical shifts are available.
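One way to relate a target to library fragments without sequence information is to compare backbone dihedral angles estimated from chemical shifts with those of each candidate fragment. A minimal sketch of such a comparison using a circular RMS deviation over (phi, psi) pairs; the predicted angles, the two-entry library, and the helper names are hypothetical, and the scoring actually used by the algorithm described above is not shown here.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def dihedral_rmsd(predicted, candidate):
    """Root-mean-square circular deviation over paired (phi, psi) tuples."""
    if len(predicted) != len(candidate):
        raise ValueError("segments must have the same length")
    sq = []
    for (p_phi, p_psi), (c_phi, c_psi) in zip(predicted, candidate):
        sq.append(angular_diff(p_phi, c_phi) ** 2)
        sq.append(angular_diff(p_psi, c_psi) ** 2)
    return math.sqrt(sum(sq) / len(sq))

def rank_fragments(predicted, library):
    """Library keys ordered from best to worst dihedral agreement."""
    return sorted(library, key=lambda k: dihedral_rmsd(predicted, library[k]))

# Hypothetical shift-derived (phi, psi) estimates and a toy two-entry library.
predicted = [(-60.0, -45.0), (-65.0, -40.0)]
library = {
    "helix_like": [(-57.0, -47.0), (-63.0, -41.0)],
    "strand_like": [(-120.0, 130.0), (-110.0, 135.0)],
}
ranking = rank_fragments(predicted, library)
```

The modular difference matters: angles of +170 and -170 degrees are only 20 degrees apart, which a plain subtraction would miss.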
Ab initio modeling; protein structure modeling; Smotif; NMR chemical shift
Import-Karyopherins, or Importins, bind nuclear localization signals (NLSs) to mediate the import of proteins into the cell nucleus. Karyopherin β2 or Kapβ2, also known as Transportin, is a member of this transporter family responsible for the import of numerous RNA binding proteins. Kapβ2 recognizes a targeting signal termed the PY-NLS that lies within its cargos to target them through the nuclear pore complex. The recognition of PY-NLS by Kapβ2 is conserved throughout eukaryotes. Kap104, the Kapβ2 homolog in Saccharomyces cerevisiae, recognizes PY-NLSs in cargos Nab2, Hrp1, and Tfg2. We have determined the crystal structure of Kapβ2 bound to the PY-NLS of the mRNA processing protein Nab2 at 3.05-Å resolution. A seven-residue segment of the PY-NLS of Nab2 is observed to bind Kapβ2 in an extended conformation and occupies the same PY-NLS binding site observed in other Kapβ2•PY-NLS structures.
Karyopherin; Importin; Nuclear Import; Nuclear Localization Signal; Nucleocytoplasmic Transport; Nuclear pore; Nab2
The members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity, and are prime targets for the treatment of autoimmune diseases, infectious diseases and malignancies. We describe a computational method, termed the Brotherhood Algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies previously unrecognized functional relationships within the IgSF and predicts the existence of new receptor-ligand interactions. As a specific example, we describe new members of the nectin/nectin-like family of cell adhesion and signaling proteins, as well as new receptor-ligand interactions within this family. Guided by the Brotherhood approach, we present the high-resolution structural characterization of a previously undescribed homophilic interaction involving the class-I MHC-restricted T-cell-associated molecule (CRTAM) – a newly defined nectin-like family member. The Brotherhood Algorithm is likely to have significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.
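Intermediate sequence information lets two proteins with no detectable direct similarity be related through a third sequence that is similar to both. A minimal sketch of that transitive grouping as single-linkage clustering with a union-find structure, assuming a precomputed list of significant pairwise hits; the protein labels and the helper name are hypothetical, and the published Brotherhood Algorithm may weigh and filter links differently.

```python
def group_by_intermediates(proteins, significant_pairs):
    """Single-linkage grouping: proteins linked directly or via intermediates.

    significant_pairs: (a, b) tuples for pairwise hits that pass a chosen
    significance threshold (hypothetical input).
    """
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the groups of every significantly similar pair.
    for a, b in significant_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical labels: P1 and P3 share no direct hit but both match P2.
families = group_by_intermediates(["P1", "P2", "P3", "P4"],
                                  [("P1", "P2"), ("P2", "P3")])
```

Unfiltered transitive chaining can merge unrelated families through a single spurious hit, so the significance threshold chosen for each link largely determines how useful the resulting groups are.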
Immune regulatory proteins; Brotherhood algorithm; Class-I MHC-restricted T-cell-associated molecule; CRTAM; Immunoglobulin Superfamily; functional classification
Pax6 is a key regulatory gene for eye, brain, and pancreas development. It acts as a transcriptional activator and repressor. Loss of Pax6 function results in down- and upregulation of a comparable number of genes, although many are secondary targets. Recently, we found a prototype of a Pax6-binding site that acts as a transcriptional repressor. We also identified the Trpm3 gene as a Pax6-direct target containing the miR-204 gene located in intron 6. Thus, there are multiple Pax6-dependent mechanisms of transcriptional repression in the cell. More than 50 Pax6 missense mutations have been identified in humans and mice. Two of these mutations, N50K (Leca4) and R128C (Leca2), were analyzed in depth, revealing different numbers of regulated genes and different ratios of down- and upregulated targets. Thus, additional studies of these mutants are warranted to better understand the molecular mechanisms of the mutants’ action.
Mutations in PAX6 and PAX6(5a), including G18W, R26G, N50K, G64V, R128C, and R242T, were generated with site-directed mutagenesis. A panel of ten luciferase reporters driven by six copies of Pax6-binding sites, representing a spectrum of sites that act as repressors, moderate activators, and strong activators, was used. Two additional reporters, namely the Pax6-regulated enhancer from mouse Trpm3 and six copies of its individual Pax6-binding site, were also tested in P19 cells.
PAX6 (N50K) acted either as a loss-of-function or as a neutral mutation. In contrast, PAX6 (R128C) and PAX6 (R242T) acted as loss-of-function, neutral, or gain-of-function mutations. With three distinct reporters, the PAX6 (N50K) mutation broke the pattern of effects produced by substitutions in the surrounding helices of the N-terminal region of the paired domain. All six mutations tested acted as loss-of-function mutations with the Trpm3 Pax6-binding site.
These studies highlight the complexity of Pax6-dependent transcriptional activation and repression mechanisms, and identify the N50K and R128C substitutions as valuable tools for testing interactions of Pax6, Pax6 (N50K), and Pax6 (R128C) with other regulatory proteins, including chromatin remodelers.
Metabolomics offers a powerful means to investigate human malaria parasite biology and host-parasite interactions at the biochemical level, and to discover novel therapeutic targets and biomarkers of infection. Here, we used an approach based on liquid chromatography and mass spectrometry to perform an untargeted metabolomic analysis of metabolite extracts from Plasmodium falciparum–infected and uninfected patient plasma samples, and from an enriched population of in vitro cultured P. falciparum-infected and uninfected erythrocytes. Statistical modeling robustly segregated infected and uninfected samples based on metabolite species with significantly different abundances. Metabolites of the α-linolenic acid (ALA) pathway, known to exist in plants but not known to exist in P. falciparum until now, were enriched in infected plasma and erythrocyte samples. In vitro labeling with 13C-ALA showed evidence of plant-like ALA pathway intermediates in P. falciparum. Ortholog searches using ALA pathway enzyme sequences from 8 available plant genomes identified several genes in the P. falciparum genome that were predicted to encode the corresponding enzymes in the hitherto unannotated P. falciparum pathway. These data suggest that our approach can be used to discover novel facets of host/malaria parasite biology in a high-throughput manner.
Infection by a human papillomavirus (HPV) may result in a variety of clinical conditions ranging from benign warts to invasive cancer depending on the viral type. The HPV E2 protein represses transcription of the E6 and E7 genes in integrated papillomavirus genomes and together with the E1 protein is required for viral replication. E2 proteins bind with high affinity to palindromic DNA sequences consisting of two highly conserved four-base-pair sequences flanking a variable four-base-pair ‘spacer’. The E2 proteins directly contact the conserved DNA but not the spacer DNA. However, variation in naturally occurring spacer sequences results in differential protein binding affinity. This discrimination depends on the proteins' sensitivity to the unique conformational and/or dynamic properties of the spacer DNA, in a process termed ‘indirect readout’. This article explores the structure of the E2 proteins and their interaction with DNA and other proteins, the effects of ions on affinity and specificity, and the phylogenetic and biophysical nature of this core viral protein. We have analyzed the sequence conservation and electrostatic features of three-dimensional models of the DNA binding domains of 146 papillomavirus types and variants with the goal of identifying characteristics that are associated with the risk of virally caused malignancy. The amino acid sequence, three-dimensional structure, and the electrostatic features of the E2 protein DNA binding domain showed high conservation among all papillomavirus types. This indicates that the specific interactions between the E2 protein and its binding sites on DNA have been conserved throughout PV evolution. Analysis of the E2 protein’s transactivation domain showed that, unlike the DNA binding domain, the transactivation domain does not have extensive surfaces of highly conserved residues. Rather, the regions of high conservation are localized to small surface patches.
The invariance of the E2 DNA binding domain structure, electrostatics and sequence suggests that it may be a suitable target for the development of vaccines effective against a broad spectrum of HPV types.
Papillomavirus; DNA; Protein-DNA interactions; Electrostatics; E2; Review
Proteins can be decomposed into supersecondary structure modules. We used a generic definition of supersecondary structure elements, so-called Smotifs, which are composed of two flanking regular secondary structures connected by a loop, to explore the evolution and current variety of structure building blocks. Here, we discuss recent observations about the saturation of Smotif geometries in protein structures and how it opens new avenues in protein structure modeling and design. As a first application of these observations we describe our loop conformation modeling algorithm, ArchPred, which takes advantage of the Smotif classification. In this application, instead of focusing on specific loop properties, the method narrows down possible template conformations in other, often non-homologous structures by identifying the most likely supersecondary structure environment that cradles the loop. Beyond identifying the correct starting supersecondary structure geometry, it takes into account the fit of anchor residues, steric clashes, the match between predicted and observed dihedral angle preferences, and the local sequence signal.
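To make the Smotif idea concrete, the sketch below computes a simplified geometric descriptor for a secondary structure, loop, secondary structure unit from C-alpha coordinates: the span of the connecting loop and the angle between the axes of the two flanking elements. This is an illustrative sketch, not the published Smotif parameterization; approximating each axis by the vector from the first to the last C-alpha is an assumption made for brevity, and the coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def smotif_descriptors(ss1_ca, ss2_ca):
    """Simplified geometry of an SS1-loop-SS2 motif from C-alpha coordinates.

    Returns (d, angle): d is the distance from the last C-alpha of SS1 to the
    first C-alpha of SS2 (the loop span); angle is the angle in degrees between
    the two end-to-end axis vectors.
    """
    d = _norm(_sub(ss2_ca[0], ss1_ca[-1]))
    axis1 = _sub(ss1_ca[-1], ss1_ca[0])
    axis2 = _sub(ss2_ca[-1], ss2_ca[0])
    cos_t = _dot(axis1, axis2) / (_norm(axis1) * _norm(axis2))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return d, math.degrees(math.acos(cos_t))

# Hypothetical antiparallel hairpin: two elements running in opposite directions.
ss1 = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ss2 = [(10.0, 5.0, 0.0), (5.0, 5.0, 0.0), (0.0, 5.0, 0.0)]
loop_span, axis_angle = smotif_descriptors(ss1, ss2)
```

Descriptors of this kind let loop conformations from unrelated proteins be compared by the geometry of the elements they connect, which is the premise behind searching non-homologous structures for templates.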
Secondary structure; Supersecondary Structure; Smotif; Loop modeling; Protein Structure Evolution; Protein Structure Modeling; Protein Structure Design
Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation.
We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods.
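The reported sensitivity, precision, and enrichment follow directly from the counts given in the text, and the arithmetic can be checked with a few lines (the function name is illustrative):

```python
def enrichment_stats(n_total, n_true_total, n_predicted, n_true_predicted):
    """Sensitivity, precision and fold enrichment of a residue selection."""
    sensitivity = n_true_predicted / n_true_total
    precision = n_true_predicted / n_predicted
    background_rate = n_true_total / n_total  # fraction of catalytic residues in the input
    fold_enrichment = precision / background_rate
    return sensitivity, precision, fold_enrichment

# Counts from the independent test set described above.
sens, prec, fold = enrichment_stats(n_total=9262, n_true_total=111,
                                    n_predicted=411, n_true_predicted=70)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={fold:.1f}x")
# prints: sensitivity=63% precision=17% enrichment=14.2x
```

Fold enrichment is simply precision divided by the background rate of catalytic residues (111 of 9262, about 1.2%), which is why a precision of only 17% still corresponds to a roughly 14-fold gain.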
We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be captured more simply and more effectively with the distance-to-GCM definition. This definition also has the added advantage of simplicity and ease of implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
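A minimal sketch of the two simplest features discussed, distance to the general center of mass and relative-entropy conservation. Assumptions: each residue is represented by a single point and all residues are weighted equally (so the "center of mass" here is a centroid), and the background amino acid frequencies are a hypothetical uniform distribution rather than frequencies from a specific reference database, which the text notes must be recalibrated.

```python
import math
from collections import Counter

def distances_to_gcm(coords):
    """Distance of each residue's representative point to the unweighted centroid."""
    n = len(coords)
    gcm = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [math.dist(c, gcm) for c in coords]

def relative_entropy(column, background):
    """Kullback-Leibler divergence (bits) of an alignment column from a background."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * math.log2((c / n) / background[aa])
               for aa, c in counts.items())

# Toy structure: four residues around one buried residue at the origin.
coords = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 0)]
dists = distances_to_gcm(coords)

# Hypothetical uniform background over the 20 standard amino acids.
uniform = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
conserved = relative_entropy("CCCCCC", uniform)
```

A fully conserved column scores log2(20), about 4.32 bits, against a uniform background; with a realistic background, rare residues such as cysteine score higher when conserved than common ones, which is one reason the choice of reference database matters.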
Functional site; Catalytic residues; Neural network; Feature selection; Structural genomics
Gene regulatory networks show robustness to perturbations. Previous studies identified robustness as an emergent property of gene network evolution, but the underlying molecular mechanisms are poorly understood. We used a multi-tier modeling approach that integrates molecular sequence and structure information with network architecture and population dynamics. Structural models of transcription factor-DNA complexes are used to estimate relative binding specificities. In this model, mutations in the DNA cause changes at two levels: (a) at the sequence level in individual binding sites (modulating binding specificity), and (b) at the network level (creating and destroying binding sites). We used this model to dissect the underlying mechanisms responsible for the evolution of robustness in gene regulatory networks. Results suggest that in sparse architectures (represented by short promoters), a mixture of local-sequence and network-architecture level changes is exploited. At the local-sequence level, robustness evolves by decreasing the probabilities of both the destruction of existing binding sites and the generation of new ones. Meanwhile, in highly interconnected architectures (represented by long promoters), robustness evolves almost entirely via network level changes, deleting and creating binding sites that modify the network architecture.
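The distinction between the two levels can be sketched by checking whether a point mutation changes the set of binding-site positions in a promoter. In this minimal sketch a simple match count to a hypothetical consensus stands in for the structure-based binding specificities used in the model; the consensus, promoter, threshold, and helper names are all illustrative assumptions.

```python
def site_positions(promoter, score, width, threshold):
    """Start positions of windows whose score reaches the binding threshold."""
    return {i for i in range(len(promoter) - width + 1)
            if score(promoter[i:i + width]) >= threshold}

def classify_mutation(promoter, pos, new_base, score, width, threshold):
    """Label a point mutation by whether it changes the set of binding sites."""
    mutant = promoter[:pos] + new_base + promoter[pos + 1:]
    before = site_positions(promoter, score, width, threshold)
    after = site_positions(mutant, score, width, threshold)
    # Created or destroyed sites rewire the network; otherwise the change is local.
    return "network-architecture" if before != after else "local-sequence"

# Hypothetical scoring: count of matches to a toy consensus "TTAG".
CONSENSUS = "TTAG"

def match_score(window):
    return sum(a == b for a, b in zip(window, CONSENSUS))

promoter = "CCTTAGCC"
destroy = classify_mutation(promoter, 3, "G", match_score, 4, 4)
silent = classify_mutation(promoter, 0, "A", match_score, 4, 4)
```

Because the threshold equals the motif width here, any change inside the site destroys it; a graded affinity score, as in the model described above, would also let local-sequence mutations modulate the specificity of a site without removing it.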
Development from egg to embryo depends to a large extent on regulatory networks of genes called transcription factors. Previous research has shown these gene regulatory networks to be robust to perturbations at the level of the connections between transcription factors. Here, we investigate the mechanisms underlying the evolution of robustness in gene networks using a modeling approach that considers three levels: binding of individual transcription factors to DNA, dynamics of gene expression levels, and fitness effects at the population level. In our model the gene regulatory network is determined by transcription factor binding sites within DNA sequences, which undergo mutation. We categorize these mutations in a continuum ranging from silent mutations, which have no effect on regulation and change only the DNA sequence (local-sequence level), to mutations that change connections between genes in the network (network-architecture level). We find that in sparse networks, containing few connections between genes, a balance of local-sequence and network-architecture level mechanisms is responsible for the evolution of robustness, but when the network is densely connected the network-architecture level mechanisms become dominant. We argue that the shift towards the network-architecture level for more densely connected networks offers a potential explanation for the evolution of increased complexity.
Differential detergent fractionation (DDF) is frequently used to partition fresh cells and tissues into distinct compartments. We have tested whether DDF can reproducibly extract and fractionate cellular protein components from frozen tissues. Frozen kidneys were sequentially extracted with three different buffer systems. Analysis of the three fractions with LC-MS/MS identified 1,693 proteins, some of which were common to all fractions and others unique to specific fractions. Normalized spectral index values (SIN) obtained from these data were compared in order to evaluate both the reproducibility of the method as well as the efficiency of enrichment. SIN values between replicate fractions demonstrated a high correlation, confirming the reproducibility of the method. Correlation coefficients across the three fractions were significantly lower than those for the replicates, supporting the capability of DDF to differentially fractionate proteins into separate compartments. Subcellular annotation of the proteins identified in each fraction demonstrated a significant enrichment of cytoplasmic, cell membrane and nuclear proteins in the three respective buffer system fractions. We conclude that DDF can be applied to frozen tissue to generate reproducible proteome coverage discriminating subcellular compartments. This demonstrates the feasibility of analyzing cellular compartment specific proteins in archived tissue samples with the simple DDF method.
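The reproducibility argument compares SIN values protein by protein: replicates of the same fraction should correlate strongly, different fractions less so. A minimal sketch, assuming the SIN of a protein is its spectral index (summed fragment-ion intensity) divided by the run's total spectral index and by the protein length, and assuming correlation on log2 values; the exact normalization and correlation used in the study may differ, and all intensities and lengths below are hypothetical.

```python
import math

def normalized_spectral_index(si, lengths):
    """Assumed SIN form: SI / (total SI in the run * protein length)."""
    total = sum(si.values())
    return {p: si[p] / (total * lengths[p]) for p in si}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fraction_correlation(sin_a, sin_b):
    """Correlate log2 SIN values over proteins identified in both fractions."""
    shared = sorted(set(sin_a) & set(sin_b))
    return pearson([math.log2(sin_a[p]) for p in shared],
                   [math.log2(sin_b[p]) for p in shared])

# Hypothetical spectral indices and protein lengths (residues).
lengths = {"P1": 100, "P2": 200, "P3": 50}
rep1 = normalized_spectral_index({"P1": 1000, "P2": 4000, "P3": 500}, lengths)
rep2 = normalized_spectral_index({"P1": 2000, "P2": 8000, "P3": 1000}, lengths)
other = normalized_spectral_index({"P1": 4000, "P2": 1000, "P3": 500}, lengths)
r_replicate = fraction_correlation(rep1, rep2)
r_fractions = fraction_correlation(rep1, other)
```

Dividing by the total spectral index makes a replicate that was simply loaded at twice the amount (rep2) give identical SIN values, so the replicate correlation is 1, while a fraction with a different protein composition correlates much less.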
Differential detergent fractionation; Normalized spectral index; Frozen tissue; Subcellular location
Mass spectrometry analysis of cross-linked peptides can be used to probe protein contact sites in macromolecular complexes. We have developed a photo-cleavable cross-linker that enhances peptide enrichment, improving the signal-to-noise ratio of the cross-linked peptides in mass spectrometry analysis. This cross-linker utilizes a nitrobenzyl alcohol group that can be cleaved by UV irradiation and is stable during the multiple washing steps used for peptide enrichment. The enrichment method utilizes a cross-linker that aids in eliminating contamination resulting from protein-based retrieval systems and thus facilitates the identification of cross-linked peptides. Homodimeric pilM protein from Pseudomonas aeruginosa 2192 (pilM) was investigated to test the specificity and experimental conditions. As predicted, the known pair of lysine side chains within 14 Å was cross-linked. An unexpected cross-link involving the protein’s amino terminus was also detected. This is consistent with the predicted mobility of the amino terminus, which may bring the amino groups within 19 Å of one another in solution. These technical improvements allow this method to be used for investigating protein-protein interactions in complex biological samples.
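The distance reasoning above, where a cross-link is expected only when two reactive amines lie within the cross-linker's reach, can be sketched as a simple pairwise filter. The coordinates and residue labels below are hypothetical and are not taken from the pilM structure; which atoms to measure between, and the exact reach, are assumptions.

```python
import itertools
import math

def candidate_crosslinks(amines, max_distance):
    """Pairs of reactive amines close enough to be bridged by the cross-linker.

    amines: label -> (x, y, z) coordinates in angstroms (hypothetical values).
    Returns (label1, label2, distance) tuples for pairs within max_distance.
    """
    pairs = []
    for (r1, c1), (r2, c2) in itertools.combinations(sorted(amines.items()), 2):
        d = math.dist(c1, c2)
        if d <= max_distance:
            pairs.append((r1, r2, round(d, 2)))
    return pairs

# Toy homodimer: chains A and B, one lysine and the amino terminus per chain.
amines = {
    "A:K20": (0.0, 0.0, 0.0), "B:K20": (12.0, 0.0, 0.0),
    "A:Nterm": (0.0, 30.0, 0.0), "B:Nterm": (18.0, 30.0, 0.0),
}
within_14 = candidate_crosslinks(amines, 14.0)
within_19 = candidate_crosslinks(amines, 19.0)
```

A static structure only predicts the pairs that pass the filter; a cross-link observed between groups that a single model places farther apart is better read as evidence of flexibility, as with the amino-terminal cross-link described above.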
cross-link; enrichment; photo-cleavable; transient protein complex
The X-ray structure of a putative BenF-like (gene name: PFL1329) protein from Pseudomonas fluorescens Pf-5 (PflBenF) has been determined at 2.6 Å resolution. X-ray crystallography revealed a canonical 18-stranded β-barrel fold that forms a central pore with a diameter of ∼4.6 Å, which is consistent with the size and physicochemical properties of the presumed aromatic acid substrate, benzoate. Detailed comparisons with the previously determined structure of Pseudomonas aeruginosa OpdK, a vanillate influx channel, revealed an arginine-rich aromatic acid selectivity filter of nearly identical structure composed of seven highly conserved residues Arg∼Asp∼Arg∼Arg∼Ser∼Asp∼Arg (R∼D∼R∼R∼S∼D∼R sequence motif, where ∼ denotes intervening residues) that define the narrowest part of the pore.
BenF-like; substrate specific porin; OprD superfamily; OprD subfamily; OpdK subfamily; benzoate; Pseudomonas; integral membrane protein
Reciprocal interactions between glia and neurons are essential for the proper organization and function of the nervous system. Recently, the interaction between ErbB receptors (ErbB2 and ErbB3) on the surface of Schwann cells and neuronal Neuregulin-1 (NRG1) has emerged as the pivotal signal that controls Schwann cell development, association with axons, and myelination. To understand the function of the NRG1-ErbB2/3 signaling axis in adult Schwann cell biology, we are studying the specific role of the ErbB3 receptor tyrosine kinase (RTK), since it is the receptor for NRG1 on the surface of Schwann cells. Here we show that alternative transcription initiation results in the formation of a nuclear variant of ErbB3 (nuc-ErbB3) in rat primary Schwann cells. Nuc-ErbB3 possesses a functional nuclear localization signal sequence and binds to chromatin. Using ChIP-ChIP arrays we identified the promoters that associate with nuc-ErbB3 and clustered the active promoters in Schwann cell gene expression. Nuc-ErbB3 regulates the transcriptional activity of ezrin and HMGB1 promoters, while inhibition of nuc-ErbB3 expression results in reduced myelination and altered distribution of ezrin in the nodes of Ranvier. Finally, we reveal that NRG1 regulates the translation of nuc-ErbB3 in rat Schwann cells. For the first time, to our knowledge, we show that alternative transcription initiation from a gene that encodes an RTK is capable of generating a protein variant of the receptor with a distinct role in molecular and cellular regulation. We propose a new concept for the molecular regulation of myelination through the expression and distinct role of nuc-ErbB3.
ErbB3; Schwann cells; myelination; nodes; transcription; signaling
VISTA suppresses T cell proliferation and cytokine production and can influence autoimmunity and antitumor responses in mice.
The immunoglobulin (Ig) superfamily consists of many critical immune regulators, including the B7 family ligands and receptors. In this study, we identify a novel and structurally distinct Ig superfamily inhibitory ligand, whose extracellular domain bears homology to the B7 family ligand PD-L1. This molecule is designated V-domain Ig suppressor of T cell activation (VISTA). VISTA is primarily expressed on hematopoietic cells, and VISTA expression is highly regulated on myeloid antigen-presenting cells (APCs) and T cells. A soluble VISTA-Ig fusion protein or VISTA expression on APCs inhibits T cell proliferation and cytokine production in vitro. A VISTA-specific monoclonal antibody interferes with VISTA-induced suppression of T cell responses by VISTA-expressing APCs in vitro. Furthermore, anti-VISTA treatment exacerbates the development of the T cell–mediated autoimmune disease experimental autoimmune encephalomyelitis in mice. Finally, VISTA overexpression on tumor cells interferes with protective antitumor immunity in vivo in mice. These findings show that VISTA, a novel immunoregulatory molecule, has functional activities that are nonredundant with other Ig superfamily members and may play a role in the development of autoimmunity and immune surveillance in cancer.
Toxoplasma gondii is an apicomplexan of both medical and veterinary importance which is classified as an NIH Category B priority pathogen. It is best known for its ability to cause congenital infection in immunocompetent hosts and encephalitis in immunocompromised hosts. The highly stable and specialized microtubule-based cytoskeleton participates in the invasion process. The genome encodes three isoforms of both α- and β-tubulin, and in this paper we show that the tubulin is extensively altered by specific post-translational modifications (PTMs). T. gondii tubulin PTMs were analyzed by mass spectrometry and immunolabeling using specific antibodies. The PTMs identified on α-tubulin included acetylation of Lys40, removal of the last C-terminal amino acid residue Tyr453 (detyrosinated tubulin) and truncation of the last five amino acid residues. Polyglutamylation was detected on both α- and β-tubulins. An antibody directed against mammalian α-tubulin lacking the last two C-terminal residues (Δ2-tubulin) labeled the apical region of this parasite. Detyrosinated tubulin was diffusely present in subpellicular microtubules and displayed an apparent accumulation at the basal end. Methylation, a PTM not previously described on tubulin, was also detected. Methylated tubulins were not detected in the host cells, human foreskin fibroblasts, suggesting that this may be a modification specific to the Apicomplexa.
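Mass-spectrometric identification of these PTMs rests on the mass shifts they introduce. A small helper for the expected shifts, using standard monoisotopic values for acetylation, methylation, addition of one glutamate residue, and loss of a C-terminal tyrosine residue; the peptide mass is a hypothetical placeholder, and this is not the search workflow used in the study.

```python
# Monoisotopic mass changes (Da) for the modifications discussed above.
MASS_DELTA = {
    "acetylation": 42.010565,
    "methylation": 14.015650,
    "glutamylation": 129.042593,      # one added glutamate residue
    "detyrosination": -163.063329,    # loss of the C-terminal Tyr residue
}
PROTON_MASS = 1.007276

def modified_mass(unmodified_mass, mods):
    """Neutral monoisotopic mass after applying {modification: count}."""
    return unmodified_mass + sum(MASS_DELTA[m] * n for m, n in mods.items())

def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge

# Hypothetical unmodified peptide mass, for illustration only.
base = 1500.0
dimethyl = modified_mass(base, {"methylation": 2})
dimethyl_2plus = mz(dimethyl, 2)
```

Methylation adds only about 14 Da, so high mass accuracy is needed to distinguish it from other small shifts, whereas detyrosination and each added glutamate produce large, distinctive differences.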
Toxoplasma gondii; cytoskeleton; tubulin; post-translational modification; proteomics; microtubules; conoid
The microtubule cytoskeleton has proven to be an effective target for cancer therapeutics. One class of drugs, known as microtubule stabilizing agents (MSAs), binds to microtubule polymers and stabilizes them against depolymerization. The prototype of this group of drugs, Taxol, is an effective chemotherapeutic agent used extensively in the treatment of human ovarian, breast, and lung carcinomas. Although electron crystallography and photoaffinity labeling experiments determined that the binding site for Taxol is in a hydrophobic pocket in β-tubulin, little was known about the effects of this drug on the conformation of the entire microtubule. A recent study from our laboratory utilizing hydrogen-deuterium exchange (HDX) in concert with various mass spectrometry (MS) techniques has provided new information on the structure of microtubules upon Taxol binding. In the current study we apply this technique to determine the binding mode and the conformational effects on chicken erythrocyte tubulin (CET) of another MSA, discodermolide, whose synthetic analogues may have potential use in the clinic. We confirmed that like Taxol, discodermolide binds to the taxane binding pocket in β-tubulin. However, as opposed to Taxol, which has major interactions with the M-loop, discodermolide orients itself away from this loop and towards the N-terminal H1–S2 loop. Additionally, discodermolide stabilizes microtubules mainly via its effects on interdimer contacts, specifically on the α-tubulin side, and to a lesser extent on interprotofilament contacts between adjacent β-tubulin subunits. Also, our results indicate complementary stabilizing effects of Taxol and discodermolide on the microtubules, which may explain the synergy observed between the two drugs in vivo.
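HDX comparisons of this kind are commonly read as per-peptide differences in deuterium uptake between drug-bound and drug-free states, with reduced uptake taken as protection. A sketch of that comparison, assuming uptake values (in Da, at a single exchange time) are already available for each peptide; the peptide labels, values, and the 0.5 Da threshold are hypothetical.

```python
def protected_peptides(uptake_apo, uptake_bound, threshold=0.5):
    """Peptides whose deuterium uptake drops by at least `threshold` Da on binding.

    uptake_apo / uptake_bound: peptide label -> uptake in Da (hypothetical).
    Returns {peptide: bound minus apo difference}, most negative meaning most protected.
    """
    hits = {}
    for pep in uptake_apo.keys() & uptake_bound.keys():
        delta = uptake_bound[pep] - uptake_apo[pep]
        if delta <= -threshold:
            hits[pep] = round(delta, 3)
    return dict(sorted(hits.items()))

apo = {"pep1": 3.0, "pep2": 2.0, "pep3": 4.0}
bound = {"pep1": 2.2, "pep2": 1.9, "pep3": 4.1}
protection = protected_peptides(apo, bound)
```

Mapping the protected peptides back onto the sequence is what localizes effects to regions such as a binding pocket or an interdimer interface; reduced uptake alone cannot distinguish direct contact from an indirect stabilizing effect.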
microtubules; discodermolide; Taxol; mass spectrometry; hydrogen-deuterium exchange
X-linked dyskeratosis congenita (DC) is a rare bone marrow failure syndrome caused mostly by missense mutations in the pseudouridine synthase NAP57 (dyskerin/Cbf5). As part of H/ACA ribonucleoproteins (RNPs), NAP57 is important for the biogenesis of ribosomes, spliceosomal small nuclear RNPs, microRNAs and the telomerase RNP. DC mutations concentrate in the N- and C-termini of NAP57 but not in its central catalytic domain, raising questions about their impact. We demonstrate that the N- and C-termini together form the binding surface for the H/ACA RNP assembly factor SHQ1 and that DC mutations modulate the interaction between the two proteins. Pinpointing impaired interaction between NAP57 and SHQ1 as a potential molecular basis for X-linked DC has implications for therapeutic approaches, e.g. by targeting the NAP57–SHQ1 interface with small molecules.
One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy was jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centres. It targets representatives from large, structurally uncharacterised protein domain families. It also targets structurally uncharacterised subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly over-represented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first three years, has increased the structural coverage of genomes and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and that it provides useful insights into structural and functional space.
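The two-tier selection rule described above can be sketched as a filter over family annotations. This is a hypothetical illustration, not the PSI-2 pipeline. The `Family` record, the size thresholds (100 and 1,000 members) and the family names are assumptions, and the actual selection procedure is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    """Hypothetical family annotation used only for this sketch."""
    name: str
    size: int             # number of member sequences
    has_structure: bool   # does any member have a solved structure?
    # subfamily name -> whether that subfamily has a solved structure
    subfamilies: dict = field(default_factory=dict)


def select_targets(families, large=100, very_large=1000):
    """Return target labels following the two-tier rule (thresholds assumed)."""
    targets = []
    for fam in families:
        if fam.size >= very_large and fam.subfamilies:
            # Very large, diverse family: target its uncharacterised subfamilies.
            targets += [f"{fam.name}/{sub}"
                        for sub, solved in fam.subfamilies.items() if not solved]
        elif fam.size >= large and not fam.has_structure:
            # Large family with no structure at all: target a representative.
            targets.append(fam.name)
    return targets


families = [
    Family("PF_A", 250, False),                            # large, uncharacterised
    Family("PF_B", 50, False),                             # too small
    Family("PF_C", 5000, True, {"s1": True, "s2": False}), # very large, partial coverage
    Family("PF_D", 300, True),                             # already covered
]
print(select_targets(families))
```

In this toy input, only `PF_A` and the uncovered subfamily `PF_C/s2` are selected. In practice, the size cutoffs would need to be tuned against the available production capacity.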
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function, and towards developing tools for protein structure modeling and design. We explored how frequently the supersecondary structural elements (Smotifs) of an exhaustively classified library occur in protein structures. Our aim was to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs; rather, they are new combinations of existing ones. Novel folds can be typified by the inclusion of a relatively high number of rarely occurring Smotifs in their structures and, to a lesser extent, by novel topological combinations of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the 10% most frequent ones have a higher fraction of internal contacts. In contrast, some of the rarest Smotifs are larger and contain longer loop regions.
Structural genomics efforts aim at exploring the repertoire of three-dimensional structures of protein molecules. While genome-scale sequencing projects have already provided us with all the genes of many organisms, it is the three-dimensional shape of gene-encoded proteins that defines all the interactions among these components. Understanding the versatility and, ultimately, the role of all possible molecular shapes in the cell is a necessary step toward understanding how organisms function. In this work we explored the rules that identify certain shapes as novel compared to all already known structures. The findings of this work provide possible insights into rules that could be used in future work to identify or design new molecular shapes, or to relate folds to each other quantitatively.
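The frequency analysis above can be illustrated with a toy calculation in three steps:

1. Count how many known structures contain each Smotif type.
2. Score a query structure by the fraction of its Smotifs that are rare.
3. Check whether the query's combination of Smotifs has been seen before.

The Smotif identifiers, the rarity cutoff and the helper names are hypothetical. Comparing unordered sets only crudely approximates the topological comparison referred to in the study.

```python
from collections import Counter


def smotif_frequencies(structures):
    """Count in how many known structures each Smotif type occurs."""
    counts = Counter()
    for smotifs in structures.values():
        counts.update(set(smotifs))  # count each type once per structure
    return counts


def rare_fraction(query_smotifs, counts, rare_cutoff=2):
    """Fraction of the query's Smotifs seen in fewer than `rare_cutoff`
    known structures (the cutoff is an assumed value)."""
    if not query_smotifs:
        return 0.0
    rare = sum(1 for s in query_smotifs if counts[s] < rare_cutoff)
    return rare / len(query_smotifs)


def is_new_combination(query_smotifs, structures):
    """True if no known structure has exactly this set of Smotif types.
    (A set ignores order and topology; this is a simplification.)"""
    q = frozenset(query_smotifs)
    return all(frozenset(s) != q for s in structures.values())


# Hypothetical library: structure id -> Smotif type ids it contains.
known = {
    "1abc": ["HH_1", "EH_3"],
    "2def": ["HH_1", "EE_7"],
    "3ghi": ["HH_1", "EH_3", "EE_7"],
}
counts = smotif_frequencies(known)
query = ["HH_1", "EE_9", "EH_4"]
print(rare_fraction(query, counts), is_new_combination(query, known))
```

A query built mostly from rare Smotif types scores high and is flagged as a new combination. A query that reuses a known set of common Smotifs scores low.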
Toxoplasma gondii is a ubiquitous apicomplexan parasite that, in humans, can cause several clinical syndromes, including encephalitis, chorioretinitis and congenital infection. T. gondii was described a little over 100 years ago in the tissues of the gundi (Ctenodactylus gundi). A large number of experimental techniques are applicable to this pathogen, and it has become a model organism for the study of intracellular pathogens. The completion of the genomes of type I (GT-1), type II (ME49) and type III (VEG) strains has greatly facilitated proteomic studies on this organism. Several subcellular proteomic studies of this pathogen have been completed. These studies have helped elucidate specialized invasion organelles and their composition, as well as proteins associated with the cytoskeleton. Global proteomic studies are leading to improved strategies for genome annotation in this organism and to a better understanding of protein regulation in this pathogen. Web-based resources, such as EPIC-DB and ToxoDB, provide proteomic data and support for studies on T. gondii. This review summarizes the current status of proteomic research on T. gondii.
Apicomplexa; cell biology; genome; proteomic; Toxoplasma gondii
Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment.
The performance of a number of publicly available scoring functions is compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored how accuracy is affected by alternative choices for representing interaction center types. We also explored other features of scoring functions, such as the use of solvent accessibility and torsion angle information, and accounting for secondary structure preferences and side chain orientation. Based partly on these observations, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. The atom- and residue-level statistical potentials proposed in this work, together with Linux executables to calculate the energy of a given protein, can be downloaded from http://www.fiserlab.org/potentials.
Among the most influential terms, we observed the critical role of a proper reference state definition and the benefit of including information about the microenvironment of interaction centers. Molecular mechanics potentials were also tested. They were found to be over-sensitive to small local imperfections in a structure, requiring unfeasibly long energy relaxation before their energy scores started to correlate with model quality.
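A knowledge-based pair potential is usually derived by inverse Boltzmann statistics:

E(a, b, r) = -kT ln[N_obs(a, b, r) / N_ref(a, b, r)]

The reference counts N_ref describe what would be expected without specific interactions. One simple way to realize a shuffled reference state, assumed here for illustration, is to shuffle residue identities over fixed residue positions (e.g., Cα atoms). This preserves composition and geometry while destroying sequence-specific pairing.

The sketch is not the published potential. Its assumptions are:

- It omits the side-chain orientation term.
- The 1 Å bin width, 10 Å cutoff, shuffle count and pseudocount are arbitrary.
- Energies are in units of kT.
- All helper names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def pair_counts(seq, coords, bin_width=1.0, max_dist=10.0):
    """Count residue pairs by (type_a, type_b, distance_bin) within max_dist."""
    counts = Counter()
    for i, j in combinations(range(len(seq)), 2):
        d = math.dist(coords[i], coords[j])
        if d < max_dist:
            a, b = sorted((seq[i], seq[j]))  # unordered residue-type pair
            counts[(a, b, int(d // bin_width))] += 1
    return counts


def potential(proteins, n_shuffles=20, seed=0, pseudo=1e-3):
    """Inverse-Boltzmann table E = -ln(N_obs / N_ref) in kT units, with
    N_ref estimated by shuffling residue types over fixed positions."""
    rng = random.Random(seed)
    obs, ref = Counter(), Counter()
    for seq, coords in proteins:
        obs.update(pair_counts(seq, coords))
        residues = list(seq)
        for _ in range(n_shuffles):
            rng.shuffle(residues)  # same composition and geometry, new pairing
            ref.update(pair_counts(residues, coords))
    keys = set(obs) | set(ref)
    return {k: -math.log((obs[k] + pseudo) / (ref[k] / n_shuffles + pseudo))
            for k in keys}


def energy(seq, coords, table):
    """Score a model as the sum of pair energies over its contacts."""
    return sum(n * table.get(k, 0.0) for k, n in pair_counts(seq, coords).items())


# Toy "training set": two chains sharing the same straight-line geometry.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
table = potential([("ACDA", coords), ("CADC", coords)])
print(energy("ACDA", coords, table))
```

Negative table entries mark residue-pair/distance combinations observed more often than the shuffled expectation. A chain made of a single residue type gives zero energies everywhere, because shuffling cannot change its pair statistics.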