1.  S-Nitrosylation-Mediated Redox Transcriptional Switch Modulates Neurogenesis and Neuronal Cell Death 
Cell reports  2014;8(1):217-228.
Redox-mediated posttranslational modifications represent a molecular switch that controls major mechanisms of cell function. Nitric oxide (NO) can mediate redox reactions via S-nitrosylation, the transfer of an NO group to a critical protein thiol. NO is known to modulate neurogenesis and neuronal survival in various brain regions in disparate neurodegenerative conditions. However, a unifying molecular mechanism linking these phenomena remains unknown. Here we report that S-nitrosylation of myocyte enhancer factor 2 (MEF2) transcription factors acts as a redox switch to inhibit both neurogenesis and neuronal survival. Structure-based analysis reveals that MEF2 dimerization creates a pocket, facilitating S-nitrosylation at an evolutionarily conserved cysteine residue in the DNA binding domain. S-Nitrosylation disrupts MEF2-DNA binding and transcriptional activity, leading to impaired neurogenesis and survival in vitro and in vivo. Our data define a novel molecular switch whereby redox-mediated posttranslational modification controls both neurogenesis and neurodegeneration via a single transcriptional signaling cascade.
PMCID: PMC4114155  PMID: 25001280
2.  Crystal structure of a putative quorum sensing-regulated protein (PA3611) from the Pseudomonas-specific DUF4146 family 
Proteins  2013;82(6):1086-1092.
Pseudomonas aeruginosa is an opportunistic pathogen commonly found in humans and other organisms and is an important cause of infection, especially in patients with compromised immune defense mechanisms. The PA3611 gene of P. aeruginosa PAO1 encodes a secreted protein of unknown function, which has been recently classified into a small Pseudomonas-specific protein family called DUF4146. As part of our effort to extend structural coverage of novel protein space and provide a structure-based functional insight into new protein families, we report the crystal structure of PA3611, the first structural representative of the DUF4146 protein family.
PMCID: PMC4006323  PMID: 24174223
Pseudomonas-specific protein family; DUF4146; Pfam PF13652; virulence factor; quorum-sensing; JCSG; structural genomics
3.  Evolution of the Animal Apoptosis Network 
The number of available eukaryotic genomes has expanded to the point where we can evaluate the complete evolutionary history of many cellular processes. Such analyses of the apoptosis regulatory networks suggest that this network already existed in the ancestor of the entire animal kingdom (Metazoa) in a form more complex than in some popular animal model organisms. This supports the growing realization that regulatory networks do not necessarily evolve from simple to complex and that the relative simplicity of these networks in nematodes and insects does not represent an ancestral state, but is the result of secondary simplifications. Network evolution is not a process of monotonic increase in complexity, but a dynamic process that includes lineage-specific gene losses and expansions, protein domain reshuffling, and emergence/reemergence of similar protein architectures by parallel evolution. Studying the evolution of such networks is a challenging yet rewarding research subject, and such studies of the apoptosis networks offer interesting hints about how these networks, critical in so many human diseases, have developed.
Regulatory networks do not necessarily evolve from simple to complex. The apoptosis machinery may have been more complex in ancestral organisms than it is in modern model organisms (e.g., C. elegans).
PMCID: PMC3578353  PMID: 23457257
4.  FFAS-3D: improving fold recognition by including optimized structural features and template re-ranking 
Bioinformatics  2013;30(5):660-667.
Motivation: Homology detection enables grouping proteins into families and prediction of their structure and function. The range of application of homology-based predictions can be significantly extended by using sequence profiles and incorporating local structural features. However, the way such features are incorporated varies considerably between existing methods; this, together with many examples of distant relationships missed even by the best methods, suggests that further improvements are still possible.
Results: Here we describe recent improvements to the fold and function assignment system (FFAS) method, including the addition of optimized structural features (experimental or predicted), a ‘symmetrical’ Z-score calculation and re-ranking of templates with a neural network. The alignment accuracy of the new FFAS-3D is 11% higher than that of the original and comparable with the most accurate template-based structure prediction algorithms. At the same time, FFAS-3D has a high success rate at the Structural Classification of Proteins (SCOP) family, superfamily and fold levels. Importantly, FFAS-3D results are not highly correlated with those of other programs, suggesting that it may significantly improve meta-predictions. FFAS-3D does not require 3D structures of the templates, because using predicted features instead of structure-derived ones does not decrease accuracy. Consequently, FFAS-3D can be used with databases other than the Protein Data Bank (PDB), such as the Protein families database or Clusters of orthologous groups, thus extending its applications to functional annotation of genomes and protein families.
Availability and implementation: FFAS-3D is available at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3933871  PMID: 24130308
5.  Improving the chances of successful protein structure determination with a random forest classifier 
Using an extended set of protein features calculated separately for protein surface and interior, a new version of XtalPred based on a random forest classifier achieves a significant improvement in predicting the success of structure determination from the primary amino-acid sequence.
Obtaining diffraction quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five ‘crystallization classes’ based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface are tested. The new XtalPred-RF (random forest) achieves significant improvement of the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted targets.
PMCID: PMC3949519  PMID: 24598732
structural genomics; target selection; machine-learning methods; XtalPred
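Hydrophobicity is one of the sequence-derived features listed in the XtalPred-RF abstract. As an illustration of how such a feature can be computed from the primary sequence, the sketch below calculates the GRAVY score (grand average of hydropathy) using the Kyte-Doolittle scale. Whether XtalPred-RF uses exactly this measure, or computes it over the whole protein rather than the predicted surface, is an assumption here; the random forest classifier itself is not reproduced.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Non-standard letters (e.g. X, B, Z) are skipped rather than guessed.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

Positive values indicate an overall hydrophobic sequence and negative values a hydrophilic one; in a feature-based classifier this would be one column among many.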
6.  Structure and sequence analyses of Bacteroides proteins BVU_4064 and BF1687 reveal presence of two novel predominantly-beta domains, predicted to be involved in lipid and cell surface interactions 
BMC Bioinformatics  2015;16(1):7.
N-terminal domains of BVU_4064 and BF1687 proteins from Bacteroides vulgatus and Bacteroides fragilis respectively are members of the Pfam family PF12985 (DUF3869). Proteins containing a domain from this family can be found in most Bacteroides species and, in large numbers, in all human gut microbiome samples. Both BVU_4064 and BF1687 proteins have a consensus lipobox motif implying they are anchored to the membrane, but their functions are otherwise unknown. The C-terminal half of BVU_4064 is assigned to protein family PF12986 (DUF3870); the equivalent part of BF1687 was unclassified.
The crystal structures of BVU_4064 and BF1687, solved at the JCSG center, are strikingly similar. The main difference between the two is that the two domains in the BVU_4064 protein are connected by a short linker, as opposed to a longer insertion made of 4 helices placed linearly along with a strand that is added to the C-terminal domain in the BF1687 protein. The N-terminal domain in both proteins, corresponding to the PF12985 (DUF3869) domain, is a β-sandwich with a pre-albumin-like fold, found in many proteins belonging to the Transthyretin clan of Pfam. The structures of the C-terminal domains of both proteins, corresponding to the PF12986 (DUF3870) domain in the BVU_4064 protein and an unclassified domain in the BF1687 protein, show significant structural similarity to bacterial pore-forming toxins. A helix in this domain is in an analogous position to a loop connecting the second and third strands in the toxin structures, where this loop is implicated in toxin insertion into the host cell membrane. The same helix also points to the groove between the N- and C-terminal domains, which are loosely held together by hydrophobic and hydrogen bond interactions. The presence of several conserved residues in this region, together with these structural determinants, suggests that it could be a functionally important region in these proteins.
Structural analysis of BVU_4064 and BF1687 points to possible roles in mediating multiple interactions on the cell surface/extracellular matrix. In particular, the N-terminal domain could be involved in adhesive interactions, and the C-terminal domain and the inter-domain groove in lipid or carbohydrate interactions.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0434-7) contains supplementary material, which is available to authorized users.
PMCID: PMC4387736  PMID: 25592227
DUF3869; DUF3870; Domain of unknown function; Protein structure; Beta-sandwich; Membrane-associated protein; Transthyretin superfamily; Bacterial pore-forming toxins
7.  Structures of a bifunctional cell-wall hydrolase CwlT containing a novel bacterial lysozyme and an NlpC/P60 dl-endopeptidase 
Journal of molecular biology  2013;426(1):10.1016/j.jmb.2013.09.011.
Tn916-like conjugative transposons carrying antibiotic resistance genes are found in a diverse range of bacteria. Orf14 within the conjugation module encodes a bifunctional cell-wall hydrolase CwlT that consists of an N-terminal bacterial lysozyme domain (N-acetylmuramidase, bLysG) and a C-terminal NlpC/P60 domain (γ-d-glutamyl-l-diamino acid endopeptidase) and is expected to play an important role in the spread of the transposons. We determined the crystal structures of two CwlT proteins, from the pathogens Staphylococcus aureus Mu50 (SaCwlT) and Clostridium difficile 630 (CdCwlT). These structures reveal that the NlpC/P60 and LysG domains are compact and conserved modules, connected by a short flexible linker. The LysG domain represents a novel family of widely distributed bacterial lysozymes. The overall structure and the active site of bLysG bear significant similarity to other members of glycoside hydrolase family 23 (GH23), such as the g-type lysozyme (LysG) and Escherichia coli lytic transglycosylase MltE. The active site of bLysG contains a unique structural and sequence signature (DxxQSSES+S) that is important for coordinating a catalytic water. Molecular modeling suggests that the bLysG domain may recognize glycans in a similar manner to MltE. The C-terminal NlpC/P60 domain contains a conserved active site (Cys-His-His-Tyr) that appears to be specific for tetrapeptides. Access to the active site is likely regulated by isomerism of a side chain atop the catalytic cysteine, which allows substrate entry or product release, or closes the site during catalysis.
PMCID: PMC3872209  PMID: 24051416
bifunctional cell-wall lysin; bacterial lysozyme; muramidase; NlpC/P60 endopeptidase; Tn916 family conjugative transposons
8.  Analysis of Individual Protein Regions Provides Novel Insights on Cancer Pharmacogenomics 
PLoS Computational Biology  2015;11(1):e1004024.
The promise of personalized cancer medicine cannot be fulfilled until we gain a better understanding of the connections between the genomic makeup of a patient's tumor and its response to anticancer drugs. Several datasets that include both pharmacologic profiles of cancer cell lines and their genomic alterations have recently been developed and extensively analyzed. However, most analyses of these datasets assume that mutations in a gene will have the same consequences regardless of their location. While this assumption might be correct in some cases, such analyses may miss subtler, yet still relevant, effects mediated by mutations in specific protein regions. Here we study such perturbations by separating the effects of mutations in different protein functional regions (PFRs), including protein domains and intrinsically disordered regions. Using this approach, we have been able to identify 171 novel associations between mutations in specific PFRs and changes in the activity of 24 drugs that could not be recovered by traditional gene-centric analyses. Our results demonstrate how focusing on individual protein regions can provide novel insights into the mechanisms underlying the drug sensitivity of cancer cell lines. Moreover, although these new correlations were identified using only data from cancer cell lines, we have been able to validate some of our predictions using data from actual cancer patients. Our findings highlight how gene-centric experiments (such as systematic knock-out or silencing of individual genes) miss relevant effects mediated by perturbations of specific protein regions. All the associations described here are available from
Author Summary
There is increasing evidence that altering different functional regions within the same protein can lead to dramatically distinct phenotypes. Here we show how, by focusing on individual regions instead of whole proteins, we are able to identify novel correlations that predict the activity of anticancer drugs. We have also used proteomic data from both cancer cell lines and actual cancer patients to explore the molecular mechanisms underlying some of these region-drug associations. Finally, we show how associations between protein regions and drugs, found using only data from cancer cell lines, can predict the survival of cancer patients.
PMCID: PMC4287345  PMID: 25568936
9.  Crystal structure of a member of a novel family of dioxygenases (PF10014) reveals a conserved cupin fold and active site 
Proteins  2013;82(1):164-170.
PF10014 is a novel family of 2-oxoglutarate-Fe2+-dependent dioxygenases that are involved in biosynthesis of antibiotics and regulation of biofilm formation, likely by catalyzing hydroxylation of free amino acids or other related ligands. The crystal structure of a PF10014 member from Methylibium petroleiphilum at 1.9 Å resolution shows strong structural similarity to cupin dioxygenases in overall fold and active site, despite very remote homology. However, one of the β-strands of the cupin catalytic core is replaced by a loop that displays conformational isomerism that likely regulates the active site.
PMCID: PMC3920835  PMID: 23852666
PF10014/BsmA; cupin dioxygenase; free amino acids; 2-oxoglutarate; ferrous iron
10.  Cancer3D: understanding cancer mutations through protein structures 
Nucleic Acids Research  2014;43(Database issue):D968-D973.
The new era of cancer genomics is providing us with extensive knowledge of mutations and other alterations in cancer. The Cancer3D database at gives an open and user-friendly way to analyze cancer missense mutations in the context of structures of proteins in which they are found. The database also helps users analyze the distribution patterns of the mutations as well as their relationship to changes in drug activity through two algorithms: e-Driver and e-Drug. These algorithms use knowledge of modular structure of genes and proteins to separately study each region. This approach allows users to find novel candidate driver regions or drug biomarkers that cannot be found when similar analyses are done on the whole-gene level. The Cancer3D database provides access to the results of such analyses based on data from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE). In addition, it displays mutations from over 14 700 proteins mapped to more than 24 300 structures from PDB. This helps users visualize the distribution of mutations and identify novel three-dimensional patterns in their distribution.
PMCID: PMC4383948  PMID: 25392415
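The idea of analyzing each protein region separately, rather than the whole gene, can be illustrated with a simple enrichment test. The sketch below assumes, as a null model, that missense mutations fall uniformly along the protein, so a region of length L in a protein of length N receives each mutation with probability L/N; it then computes the one-sided binomial probability of observing at least the given number of mutations in that region. The function name and the uniform null model are illustrative assumptions, not the published e-Driver code.

```python
from math import comb


def region_enrichment_p(region_mutations, total_mutations,
                        region_length, protein_length):
    """One-sided binomial tail P(X >= region_mutations).

    Null hypothesis (assumed): each of the total_mutations lands in the
    region independently with probability region_length / protein_length.
    """
    if not 0 < region_length <= protein_length:
        raise ValueError("region must lie within the protein")
    p = region_length / protein_length
    return sum(
        comb(total_mutations, k) * p ** k * (1 - p) ** (total_mutations - k)
        for k in range(region_mutations, total_mutations + 1)
    )
```

For example, if all 10 observed mutations fall in a region covering half of the protein, the tail probability is 0.5^10, about 0.001, which would flag the region as a candidate driver region under this simple model. A whole-gene test on the same data would not distinguish this region from the rest of the protein.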
11.  Phylogenomic analysis of glycogen branching and debranching enzymatic duo 
Branched polymers of glucose are universally used for energy storage in cells, taking the form of glycogen in animals, fungi, Bacteria, and Archaea, and of amylopectin in plants. Some enzymes involved in glycogen and amylopectin metabolism are similarly conserved in all forms of life, but some, interestingly, are not. In this paper we focus on the phylogeny of glycogen branching and debranching enzymes, which respectively introduce and remove the α(1–6) bonds that give glucose polymers their unique branched structure.
We performed a large-scale phylogenomic analysis of branching and debranching enzymes in over 400 completely sequenced genomes, including more than 200 from eukaryotes. We show that branching and debranching enzymes can be found in all kingdoms of life, including all major groups of eukaryotes, and thus were likely present in the last universal common ancestor (LUCA), but have been lost in a seemingly random fashion in numerous single-celled eukaryotes. We also show how animal branching and debranching enzymes evolved from their LUCA ancestors by acquiring additional domains. Furthermore, we show that enzymes commonly perceived as orthologous, such as human branching enzyme GBE1 and E. coli branching enzyme GlgB, are in fact related by a gene duplication and are consequently paralogous.
Although usually associated with animal liver glycogen and plant starch, energy storage in the form of branched glucose polymers is clearly an ancient process that was probably present in the last universal common ancestor of all present life. The evolution of the enzymes enabling this form of energy storage is more complex than previously thought and illustrates the need for explicit phylogenomic analysis in the study of even seemingly “simple” metabolic enzymes. Patterns of conservation in the evolution of the glycogen/starch branching and debranching enzymes hint at some as yet unknown mechanisms, as mutations disrupting these patterns lead to a variety of genetic diseases in humans and other mammals.
PMCID: PMC4236520  PMID: 25148856
Glycogen; Starch; Branching; Debranching; Glycogen storage disease; AGL; GBE1; GlgB; GlgX; TreX
12.  PubServer: literature searches by homology 
Nucleic Acids Research  2014;42(Web Server issue):W430-W435.
PubServer, available at, is a tool to automatically collect, filter and analyze publications associated with groups of homologous proteins. Protein entries in databases such as the Entrez Protein database at NCBI contain information about publications associated with a given protein. The scope of these publications varies widely: they include studies focused on the biochemical functions of individual proteins, but also reports from genome sequencing projects that introduce tens of thousands of proteins. Collecting and analyzing publications related to sets of homologous proteins helps in the functional annotation of novel protein families and in improving annotations of well-studied protein families or individual genes. However, performing such collection and analysis manually is tedious and time-consuming. PubServer automatically collects identifiers of homologous proteins using PSI-Blast, retrieves literature references from the corresponding database entries and filters out publications unlikely to contain useful information about individual proteins. It also prepares simple vocabulary statistics from titles, abstracts and MeSH terms to identify the most frequently occurring keywords, which may help to quickly identify common themes in these publications. The filtering criteria applied to collected publications are user-adjustable. The results are presented as an interactive page that allows re-filtering and different presentations of the output.
PMCID: PMC4086066  PMID: 24957597
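The "simple vocabulary statistics" step described for PubServer can be sketched as a word-frequency count over publication titles. The version below is a minimal sketch under stated assumptions: it covers titles only (not abstracts or MeSH terms), and the tokenizer, the tiny stop-word list and the function name top_keywords are hypothetical choices for illustration, not PubServer's actual filtering rules.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real tool would use a larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "from",
              "to", "with", "by", "on", "is", "are"}

# Lowercase alphanumeric words, allowing internal hyphens (e.g. "cell-wall").
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def top_keywords(titles, n=5):
    """Return the n most frequent non-stop-word tokens with their counts."""
    counts = Counter(
        word
        for title in titles
        for word in TOKEN.findall(title.lower())
        if word not in STOP_WORDS
    )
    return counts.most_common(n)
```

For instance, for the titles "Crystal structure of a glycoside hydrolase", "Glycoside hydrolase family analysis" and "Structure of a novel hydrolase", the top keyword is "hydrolase" (3 occurrences), which hints at the common theme of the set.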
13.  Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of Glycoside Hydrolase 
BMC Bioinformatics  2014;15:196.
Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain.
Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333.
We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.
PMCID: PMC4071793  PMID: 24938123
Carbohydrate metabolism; Glycoside hydrolase; Polysaccharide Utilization Locus; PUL; Protein function prediction; JCSG; 3D structure; Protein family; Gut microbiota
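The family name GxGYxYP is a sequence pattern in which, following the usual motif convention, each x stands for any amino acid. As an illustration of how such a signature can be located in a protein sequence, the sketch below translates the pattern into a regular expression and reports every match, including overlapping ones. The function names are hypothetical, positions are 0-based (add 1 for residue numbering), and the test sequence is made up for the example.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'GxGYxYP' into a regex; 'x' matches any residue."""
    return "".join("." if ch == "x" else re.escape(ch) for ch in motif)


def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) match.

    A zero-width lookahead is used so that overlapping occurrences are
    all reported instead of only non-overlapping ones.
    """
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For example, find_motif("MAGAGYKYPLL", "GxGYxYP") finds a single match starting at position 2 (residue 3 in 1-based numbering).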
14.  Structure and Function of a Novel ld-Carboxypeptidase A Involved in Peptidoglycan Recycling 
Journal of Bacteriology  2013;195(24):5555-5566.
Approximately 50% of cell wall peptidoglycan in Gram-negative bacteria is recycled with each generation. The primary substrates used for peptidoglycan biosynthesis and recycling in the cytoplasm are GlcNAc-MurNAc(anhydro)-tetrapeptide and its degradation product, the free tetrapeptide. This complex process involves ∼15 proteins, among which the cytoplasmic enzyme ld-carboxypeptidase A (LdcA) cleaves the bond between the last two l- and d-amino acid residues in the tetrapeptide to form the tripeptide, which is then utilized as a substrate by murein peptide ligase (Mpl). LdcA has been proposed as an antibacterial target. The crystal structure of Novosphingobium aromaticivorans DSM 12444 LdcA (NaLdcA) was determined at 1.89-Å resolution. The enzyme was biochemically characterized and its interactions with the substrate modeled, identifying residues potentially involved in substrate binding. Unexplained electron density at the dimer interface in the crystal suggested a potential site for disrupting protein-protein interactions, should a dimer be required for its function in bacteria. Our analysis extends the identification of functional residues to several other homologs, which include enzymes from bacteria that are involved in hydrocarbon degradation and destruction of coral reefs. The NaLdcA crystal structure provides an alternate system for investigating the structure-function relationships of LdcA and increases the structural coverage of the protagonists in bacterial cell wall recycling.
PMCID: PMC3889619  PMID: 24123814
15.  POSA: a user-driven, interactive multiple protein structure alignment server 
Nucleic Acids Research  2014;42(Web Server issue):W240-W245.
POSA (Partial Order Structure Alignment), available at, is a server for multiple protein structure alignment introduced in 2005 (Ye,Y. and Godzik,A. (2005) Multiple flexible structure alignment using partial order graphs. Bioinformatics, 21, 2362–2369). It is free and open to all users, and there is no login requirement, although there is an option to register and store results in individual, password-protected directories. In the updated POSA server described here, we introduce two significant improvements. The first is an interface allowing the user to provide additional information by defining segments that anchor the alignment in one or more input structures. This interface allows users to take advantage of their intuition and biological insights to improve the alignment and guide it toward a biologically relevant solution. The second improvement is an interactive visualization with options that allow the user to view all superposed structures in one window (a typical solution for visualizing results of multiple structure alignments) or view them individually in a series of synchronized windows with extensive, user-controlled visualization options. The user can rotate structure(s) in any of the windows and study similarities or differences between structures, which are clearly visible in individual windows.
PMCID: PMC4086100  PMID: 24838569
16.  AIDA: ab initio domain assembly server 
Nucleic Acids Research  2014;42(Web Server issue):W308-W313.
AIDA: ab initio domain assembly server, available at is a tool that can identify domains in multi-domain proteins and then predict their 3D structures and relative spatial arrangements. The server is free and open to all users, and users may optionally provide an e-mail address to receive a link to the results page. Domains are evolutionarily conserved and often functionally independent units in proteins. Most proteins, especially eukaryotic ones, consist of multiple domains, while at the same time most experimentally determined protein structures contain only one or two domains. As a result, the structures of individual domains in multi-domain proteins can often be accurately predicted, but the mutual arrangement of different domains remains unknown. To address this issue we have developed the AIDA program, which combines steps of identifying individual domains, predicting (separately) their structures and assembling them into multiple domain complexes using an ab initio folding potential to describe domain–domain interactions. The AIDA server not only supports the assembly of a large number of continuous domains, but also allows the assembly of domains inserted into other domains. Users can also provide distance restraints to guide the AIDA energy minimization.
PMCID: PMC4086082  PMID: 24831546
17.  Divergent evolution of protein conformational dynamics in dihydrofolate reductase 
Nature structural & molecular biology  2013;20(11):10.1038/nsmb.2676.
Molecular evolution is driven by mutations, which may affect the fitness of an organism and are then subject to natural selection or genetic drift. Analysis of primary protein sequences and tertiary structures has yielded valuable insights into the evolution of protein function, but little is known about the evolution of functional mechanisms, protein dynamics and conformational plasticity essential for activity. We characterized the atomic-level motions across divergent members of the dihydrofolate reductase (DHFR) family. Despite structural similarity, E. coli and human DHFRs use different dynamic mechanisms to perform the same function, and human DHFR cannot complement DHFR-deficient E. coli cells. Identification of the primary sequence determinants of flexibility in DHFRs from several species allowed us to propose a likely scenario for the evolution of functionally important DHFR dynamics, following a pattern of divergent evolution that is tuned by the cellular environment.
PMCID: PMC3823643  PMID: 24077226
18.  ConSole: using modularity of Contact maps to locate Solenoid domains in protein structures 
BMC Bioinformatics  2014;15:119.
Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably on the structural level. Yet, few algorithms have been developed for the detection and analysis of structures of periodic proteins.
ConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy than Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families.
The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as a web-based, interactive server and is available for download at
PMCID: PMC4021314  PMID: 24766872
Protein repeat detection; Solenoid structure; Contact map; Template matching; Machine learning
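ConSole works on protein contact maps. As background for that description, the sketch below builds a symmetric boolean contact map from per-residue 3D coordinates: residues i and j are in contact if their distance is at most a cutoff. Using C-alpha coordinates and an 8 Å cutoff are common conventions assumed here, not parameters taken from the ConSole paper, and the modularity and template-matching steps of ConSole are not reproduced.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean residue-residue contact map.

    coords: sequence of (x, y, z) tuples, one per residue (e.g. C-alpha atoms;
    an assumption for this sketch).
    cutoff: contact distance threshold in angstroms (8.0 is an assumed default).
    The diagonal (a residue with itself) is left False.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap
```

In a solenoid structure, contacts between consecutive repeats typically appear as regularly spaced patterns off the main diagonal of such a map, which is the kind of modularity a repeat detector can look for.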
19.  Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase 
BMC Bioinformatics  2014;15:112.
Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism.
BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one N-terminal and one C-terminal. A PSI-BLAST search found over 150 full-length and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases; the C-terminal domain has a beta-sandwich fold typically found in the C-terminal domains of other glycosyl hydrolases, where such domains are usually involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and possible functional implications.
Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.
PMCID: PMC4032388  PMID: 24742328
Glycoside hydrolase; Carbohydrate metabolism; 3D structure; Protein family; Protein function prediction; Domain of unknown function; DUF
20.  Structural systems biology: from bacterial to cancer networks 
BMC Genomics  2014;15(Suppl 2):O14.
PMCID: PMC4075449
21.  Structure and computational analysis of a novel protein with metallopeptidase-like and circularly permuted winged-helix-turn-helix domains reveals a possible role in modified polysaccharide biosynthesis 
BMC Bioinformatics  2014;15:75.
CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contained a metallopeptidase-related domain. There are over 200 homologs of similar size in large sequence databases such as UniProt, with pairwise sequence identities in the range of ~40-60%. CA_C2195 was chosen for crystal structure determination for structure-based function annotation of novel protein sequence space.
The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an α+β domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain.
Based on our sequence and structural analyses of the CA_C2195 crystal structure, we provide insight into the possible functions of the protein. On the basis of contextual information from gene-neighborhood analysis, we propose that, rather than being a peptidase, CA_C2195 and its homologs might play a role in the biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results provide the groundwork for experimental verification of the function.
PMCID: PMC4000134  PMID: 24646163
CA_C2195; Peptidase; DUF4910; DUF2172; HTH_47; Structural genomics
22.  Polysaccharides utilization in human gut bacterium Bacteroides thetaiotaomicron: comparative genomics reconstruction of metabolic and regulatory networks 
BMC Genomics  2013;14:873.
Bacteroides thetaiotaomicron, a predominant member of the human gut microbiota, is characterized by its ability to utilize a wide variety of polysaccharides using the extensive saccharolytic machinery that is controlled by an expanded repertoire of transcription factors (TFs). The availability of genomic sequences for multiple Bacteroides species opens an opportunity for their comparative analysis to enable characterization of their metabolic and regulatory networks.
A comparative genomics approach was applied for the reconstruction and functional annotation of the carbohydrate utilization regulatory networks in 11 Bacteroides genomes. Bioinformatic analysis of promoter regions revealed putative DNA-binding motifs and regulons for 31 orthologous TFs in the Bacteroides. The analyzed TFs comprise 4 SusR-like regulators, 16 AraC-like hybrid two-component systems (HTCSs), and 11 regulators from other families. Novel DNA motifs of HTCSs and SusR-like regulators in the Bacteroides share a common structure of direct repeats with a long spacer between two conserved sites.
The inferred regulatory network in B. thetaiotaomicron contains 308 genes encoding polysaccharide and sugar catabolic enzymes, carbohydrate-binding and transport systems, and TFs. The analyzed TFs control pathways for utilization of host and dietary glycans to monosaccharides and their further interconversions to intermediates of the central metabolism. The reconstructed regulatory network allowed us to suggest and refine specific functional assignments for sugar catabolic enzymes and transporters, providing a substantial improvement to the existing metabolic models for B. thetaiotaomicron. The obtained collection of reconstructed TF regulons is available in the RegPrecise database (
PMCID: PMC3878776  PMID: 24330590
Regulatory network; Regulon; Transcription factor; Bacteroides; Carbohydrate utilization
23.  Structural systems biology evaluation of metabolic thermotolerance in Escherichia coli 
Science (New York, N.Y.)  2013;340(6137):1220-1223.
Genome-scale network reconstruction has enabled predictive modeling of metabolism for many systems. Traditionally, protein structural information has not been represented in such reconstructions. Expanding a genome-scale model of Escherichia coli metabolism by including experimental and predicted protein structures enabled the analysis of protein thermostability in a network context, allowing prediction of the protein activities that limit network function at super-optimal temperature and mechanistic interpretation of mutations found in strains adapted to heat. Predicted growth-limiting factors for thermotolerance were validated through nutrient supplementation experiments and defined metabolic sensitivities to heat stress, providing evidence that metabolic enzyme thermostability is rate limiting at super-optimal temperature. Inclusion of structural information expanded the content and predictive capability of genome-scale metabolic networks, enabling structural systems biology of metabolism.
PMCID: PMC3777776  PMID: 23744946
24.  LUD, a new protein domain associated with lactate utilization 
BMC Bioinformatics  2013;14:341.
A novel, highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins, LutB and LutC. Both proteins are encoded by the highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence and structure, and on recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family.
JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: the LutC protein of Deinococcus radiodurans, encoded by ORF DR_1909. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome.
We propose a model for substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation resistance of Deinococcus radiodurans, are discussed.
PMCID: PMC3924224  PMID: 24274019
LUD; DUF162; LutB; LutC; Domain of unknown function; Deinococcus radiodurans
25.  bNAber: database of broadly neutralizing HIV antibodies 
Nucleic Acids Research  2013;42(Database issue):D1133-D1139.
The discovery of broadly neutralizing antibodies (bNAbs) has provided an enormous impetus to HIV vaccine research and to immunology as a whole. The bNAber database at provides open, user-friendly access to detailed data on the rapidly growing list of HIV bNAbs, including neutralization profiles, sequences and three-dimensional structures (when available). It also provides an extensive set of visualization and analysis tools, such as heatmaps for analysing neutralization data, and structure and sequence viewers for correlating bNAb properties with the structural and sequence features of individual antibodies. The goal of the bNAber database is to enable researchers in this field to easily compare and analyse available information on bNAbs, thereby supporting efforts to design an effective vaccine for HIV/AIDS. The bNAber database not only provides easy access to data that are currently scattered across the Supplementary Materials sections of individual papers, but also contributes to the development of general standards for the data that should be presented with the discovery of new bNAbs and of a universal mechanism for sharing such data.
PMCID: PMC3964981  PMID: 24214957