
Results 1-25 (27)


1.  The promise and peril of chemical probes 
Nature chemical biology  2015;11(8):536-541.
Chemical probes are powerful reagents with increasing impacts on biomedical research. However, probes of poor quality or that are used incorrectly generate misleading results. To help address these shortcomings, we will create a community-driven wiki resource to improve quality and convey current best practice.
PMCID: PMC4706458  PMID: 26196764
2.  Optimizing Production of Antigens and Fabs in the Context of Generating Recombinant Antibodies to Human Proteins 
PLoS ONE  2015;10(10):e0139695.
We developed and optimized a high-throughput project workflow to generate renewable recombinant antibodies to human proteins involved in epigenetic signalling. Three different strategies to produce phage display compatible protein antigens in bacterial systems were compared, and we found that in vivo biotinylation through the use of an Avi tag was the most productive method. Phage display selections were performed on 265 in vivo biotinylated antigen domains, and high-affinity Fabs (<20 nM) were obtained for 196 of them. We constructed and optimized a new expression vector to produce in vivo biotinylated Fabs in E. coli. This increased yields up to 10-fold, giving an average yield of 4 mg/L. For 118 antigens, we identified Fabs that could immunoprecipitate their full-length endogenous targets from mammalian cell lysates. One Fab for each antigen was converted to a recombinant IgG and produced in mammalian cells, with an average yield of 15 mg/L. In summary, we have optimized each step of the pipeline to produce recombinant antibodies, significantly increasing both efficiency and yield, and we also showed that these Fabs and IgGs can be generally useful for chromatin immunoprecipitation (ChIP) protocols.
PMCID: PMC4593582  PMID: 26437229
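The stage counts in the antibody-pipeline abstract above can be turned into success rates with a few lines of arithmetic. This is a minimal sketch that uses only the numbers stated in the abstract; it assumes the 118 antigens with immunoprecipitating Fabs are a subset of the 196 with high-affinity Fabs, which the abstract implies but does not state explicitly. The names `STAGES`, `percent` and `funnel` are illustrative.

```python
# Stage counts as reported in the abstract (item 2).
# Assumption: each stage is a subset of the previous one.
STAGES = [
    ("antigens entering phage display selection", 265),
    ("antigens with high-affinity Fabs (<20 nM)", 196),
    ("antigens with immunoprecipitating Fabs", 118),
]


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def funnel(stages):
    """Return (label, count, % of first stage, % of previous stage) for each stage."""
    first = stages[0][1]
    prev = first
    rows = []
    for label, count in stages:
        rows.append((label, count, percent(count, first), percent(count, prev)))
        prev = count
    return rows


if __name__ == "__main__":
    for label, count, of_total, of_prev in funnel(STAGES):
        print(f"{label}: {count} ({of_total}% of total, {of_prev}% of previous stage)")
```

Under these assumptions, about 74% of the selected antigens yielded high-affinity Fabs, and about 60% of those yielded Fabs that worked in immunoprecipitation.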
3.  Biochemical characterization of FIKK8 – A unique protein kinase from the malaria parasite Plasmodium falciparum and other apicomplexans 
• We studied FIKK kinases from Plasmodium falciparum and Cryptosporidium parvum.
• Soluble and active samples of PfFIKK8 and CpFIKK contain an N-terminal extension.
• Both FIKK samples preferentially phosphorylated serines with flanking arginines.
FIKKs are protein kinases with distinctive sequence motifs found exclusively in Apicomplexa. Here, we report on the biochemical characterization of Plasmodium falciparum FIKK8 (PfFIKK8) and its Cryptosporidium parvum orthologue (CpFIKK) – the only member of the family predicted to be cytosolic and conserved amongst non-Plasmodium parasites. Recombinant protein samples of both were catalytically active. We characterized their phosphorylation ability using an enzymatic assay and substrate specificities using an arrayed positional scanning peptide library. Our results show that FIKK8 targets serine, preferably with arginine in the +3 and −3 positions. Furthermore, the soluble and active FIKK constructs in our experiments contained an N-terminal extension (NTE) conserved in FIKK8 orthologues from other apicomplexan species. Based on our results, we propose that this NTE is an integral feature of the FIKK subfamily.
PMCID: PMC4576209  PMID: 26112892
Apicomplexa; FIKK kinase; Kinase; Cryptosporidium
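The FIKK8 preference reported above (serine with arginine at the −3 and +3 positions) can be used to flag candidate sites in a sequence. This is a minimal sketch: a plain string scan that assumes a one-letter amino-acid sequence, not a scoring matrix derived from the positional scanning peptide library. The function name `candidate_sites` and its `require_both` flag are hypothetical.

```python
def candidate_sites(seq, require_both=True):
    """Return 1-based positions of Ser residues flanked by Arg at -3 and/or +3.

    Hypothetical helper: with require_both=True, both flanking Arg residues
    are required; otherwise either one is enough.
    """
    sites = []
    for i, aa in enumerate(seq):
        if aa != "S":
            continue
        minus3 = i - 3 >= 0 and seq[i - 3] == "R"
        plus3 = i + 3 < len(seq) and seq[i + 3] == "R"
        hit = (minus3 and plus3) if require_both else (minus3 or plus3)
        if hit:
            sites.append(i + 1)
    return sites


if __name__ == "__main__":
    print(candidate_sites("RAASAAR"))  # Ser at position 4 with Arg at both -3 and +3
```

A real substrate predictor would weight every position around the serine. This scan only encodes the two positions highlighted in the abstract.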
5.  Tail tip proteins related to bacteriophage λ gpL coordinate an iron-sulphur cluster 
Journal of molecular biology  2013;425(14):2450-2462.
The assembly of long non-contractile phage tails begins with the formation of the tail tip complex. Tail tip complexes are multi-functional protein structures that mediate host cell adsorption and genome injection. The tail tip complex of phage λ is assembled from multiple copies of eight different proteins, including gpL. Purified preparations of gpL and several homologues all displayed a distinct reddish colour, suggesting the binding of iron by these proteins. Further characterization of the gpL homologue from phage N15, which was most amenable to in vitro analyses, showed that it contains two domains. The C-terminal domain was demonstrated to coordinate an iron-sulphur cluster, providing the first example of a viral structural protein binding to this type of metal group. We characterized the iron-sulphur cluster using inductively coupled plasma-atomic emission spectroscopy, absorbance spectroscopy, and electron paramagnetic resonance spectroscopy and found that it is an oxygen-sensitive [4Fe-4S]2+ cluster. Four highly conserved cysteine residues were shown to be required for coordinating the iron-sulphur cluster, and substitution of any of these Cys residues with Ser or Ala within the context of λ gpL abolished biological activity. These data imply that the intact iron-sulphur cluster is required for function. The presence of four conserved Cys residues in the C-terminal regions of very diverse gpL homologues suggests that utilization of an iron-sulphur cluster is a widespread feature of non-contractile tailed phages that infect Gram-negative bacteria. In addition, this is the first example of a viral structural protein that binds an iron-sulphur cluster.
PMCID: PMC4061613  PMID: 23542343
6.  Structural and Functional Studies of gpX of Escherichia coli Phage P2 Reveal a Widespread Role for LysM Domains in the Baseplates of Contractile-Tailed Phages 
Journal of Bacteriology  2013;195(24):5461-5468.
A variety of bacterial pathogenicity determinants, including the type VI secretion system and the virulence cassettes from Photorhabdus and Serratia, share an evolutionary origin with contractile-tailed myophages. The well-characterized Escherichia coli phage P2 provides an excellent system for studies related to these systems, as its protein composition appears to represent the “minimal” myophage tail. In this study, we used nuclear magnetic resonance (NMR) spectroscopy to determine the solution structure of gpX, a 68-residue tail baseplate protein. Although the sequence and structure of gpX are similar to those of LysM domains, which are a large family associated with peptidoglycan binding, we did not detect a peptidoglycan-binding activity for gpX. However, bioinformatic analysis revealed that half of all myophages, including all that possess phage T4-like baseplates, encode a tail protein with a LysM-like domain, emphasizing a widespread role for this domain in baseplate function. While phage P2 gpX comprises only a single LysM domain, many myophages display LysM domain fusions with other tail proteins, such as the DNA circulation protein found in Mu-like phages and gp53 of T4-like phages. Electron microscopy of P2 phage particles with an incorporated gpX-maltose binding protein fusion revealed that gpX is located at the top of the baseplate, near the junction of the baseplate and tail tube. gpW, the orthologue of phage T4 gp25, was also found to localize to this region. A general colocalization of LysM-like domains and gpW homologues in diverse phages is supported by our bioinformatic analysis.
PMCID: PMC3889624  PMID: 24097944
7.  Crystal structure of the CorA Mg2+ transporter 
Nature  2006;440(7085). doi:10.1038/nature04642.
The magnesium ion, Mg2+, is essential for myriad biochemical processes and is the only major biological ion whose transport mechanisms remain unknown. The CorA family of magnesium transporters is the primary Mg2+ uptake system of most prokaryotes [1–3] and a functional homologue of the eukaryotic mitochondrial magnesium transporter [4]. Here we determine crystal structures of the full-length Thermotoga maritima CorA in an apparent closed state and of its isolated cytoplasmic domain, at 3.9 Å and 1.85 Å resolution, respectively. The transporter is a funnel-shaped homopentamer with two transmembrane helices per monomer. The channel is formed by an inner group of five helices and putatively gated by bulky hydrophobic residues. The large cytoplasmic domain forms a funnel whose wide mouth points into the cell and whose walls are formed by five long helices that are extensions of the transmembrane helices. The cytoplasmic neck of the pore is surrounded, on the outside of the funnel, by a ring of highly conserved positively charged residues. Two negatively charged helices in the cytoplasmic domain extend back towards the membrane on the outside of the funnel and abut the ring of positive charge. An apparent Mg2+ ion was bound between monomers at a conserved site in the cytoplasmic domain, suggesting a mechanism that links gating of the pore to the intracellular concentration of Mg2+.
PMCID: PMC3836678  PMID: 16598263
9.  The Bacteriophage HK97 gp15 Moron Element Encodes a Novel Superinfection Exclusion Protein 
Journal of Bacteriology  2012;194(18):5012-5019.
A phage moron is a DNA element inserted between a pair of genes in one phage genome that are adjacent in other related phage genomes. Phage morons are commonly found within phage genomes, and in a number of cases, they have been shown to mediate phenotypic changes in the bacterial host. The temperate phage HK97 encodes a moron element, gp15, within its tail morphogenesis region that is absent in most closely related phages. We show that gp15 is actively expressed from the HK97 prophage and is responsible for providing the host cell with resistance to infection by phages HK97 and HK75, independent of repressor immunity. To identify the target(s) of this gp15-mediated resistance, we created a hybrid of HK97 and the related phage HK022. This hybrid phage revealed that the tail tube or tape measure proteins likely mediate the susceptibility of HK97 to inhibition by gp15. The N terminus of gp15 is predicted with high probability to contain a single membrane-spanning helix by several transmembrane prediction programs. Consistent with this putative membrane localization, gp15 acts to prevent the entry of phage DNA into the cytoplasm, acting in a manner reminiscent of those of several previously characterized superinfection exclusion proteins. The N terminus of gp15 and its phage homologues bear sequence similarity to YebO proteins, a family of proteins of unknown function found ubiquitously in enterobacteria. The divergence of their C termini suggests that phages have co-opted this bacterial protein and subverted its activity to their advantage.
PMCID: PMC3430355  PMID: 22797755
10.  A generalizable pre-clinical research approach for orphan disease therapy 
With the advent of next-generation DNA sequencing, the pace of inherited orphan disease gene identification has increased dramatically, a situation that will continue for at least the next several years. At present, the number of such identified disease genes significantly outstrips the number of laboratories available to investigate a given disorder, an asymmetry that will only increase over time. The hope for any genetic disorder is, where possible and in addition to accurate diagnostic test formulation, the development of therapeutic approaches. To this end, we propose here the development of a strategic toolbox and preclinical research pathway for inherited orphan disease. Taking much of what has been learned from rare genetic disease research over the past two decades, we propose generalizable methods utilizing transcriptomic, system-wide chemical biology datasets combined with chemical informatics and, where possible, repurposing of FDA-approved drugs for pre-clinical orphan disease therapies. It is hoped that this approach may be of utility for the broader orphan disease research community and provide funding organizations and patient advocacy groups with suggestions for the optimal path forward. In addition to enabling academic pre-clinical research, strategies such as this may also aid in seeding startup companies, as well as further engaging the pharmaceutical industry in the treatment of rare genetic disease.
PMCID: PMC3458970  PMID: 22704758
Orphan disease therapy; Preclinical drug development; Generalizable screening methods; Translational toolbox
11.  In situ proteolysis for protein crystallization and structure determination 
Nature Methods  2007;4(12):1019-1021.
We tested the general applicability of in situ proteolysis to form protein crystals suitable for structure determination by adding a protease (chymotrypsin or trypsin) digestion step to crystallization trials of 55 bacterial and 14 human proteins that had proven recalcitrant to our best efforts at crystallization or structure determination. This is a work in progress; so far we have determined structures of 9 bacterial proteins and the human aminoimidazole ribonucleotide synthetase (AIRS) domain.
PMCID: PMC3366506  PMID: 17982461
13.  A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair 
Molecular microbiology  2010;79(2):484-502.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and the associated proteins (Cas) comprise a system of adaptive immunity against viruses and plasmids in prokaryotes. Cas1 is a CRISPR-associated protein that is common to all CRISPR-containing prokaryotes but its function remains obscure. Here we show that the purified Cas1 protein of Escherichia coli (YgbT) exhibits nuclease activity against single-stranded and branched DNAs including Holliday junctions, replication forks, and 5′-flaps. The crystal structure of YgbT and site-directed mutagenesis have revealed the potential active site. Genome-wide screens show that YgbT physically and genetically interacts with key components of DNA repair systems, including recB, recC and ruvB. Consistent with these findings, the ygbT deletion strain showed increased sensitivity to DNA damage and impaired chromosomal segregation. Similar phenotypes were observed in strains with deletion of CRISPR clusters, suggesting that the function of YgbT in repair involves interaction with the CRISPRs. These results show that YgbT belongs to a novel, structurally distinct family of nucleases acting on branched DNAs and suggest that, in addition to antiviral immunity, at least some components of the CRISPR-Cas system have a function in DNA repair.
PMCID: PMC3071548  PMID: 21219465
Cas1; CRISPR; DNA recombination; DNA repair; nuclease; YgbT
15.  Crystal structure and molecular modeling study of N-carbamoylsarcosine amidase Ta0454 from Thermoplasma acidophilum 
Journal of structural biology  2009;169(3):304-311.
A crystal structure of the putative N-carbamoylsarcosine amidase (CSHase) Ta0454 from Thermoplasma acidophilum was solved by single-wavelength anomalous diffraction and refined at a resolution of 2.35 Å. CSHases are involved in the degradation of creatinine. Ta0454 shares a similar fold and a highly conserved C-D-K catalytic triad (Cys123, Asp9, and Lys90) with the structures of three cysteine hydrolases (PDB codes 1NBA, 1IM5, and 2H0R). Molecular dynamics (MD) simulations of Ta0454/N-carbamoylsarcosine and Ta0454/pyrazinamide complexes were performed to determine the structural basis of the substrate binding pattern for each ligand. Based on the MD simulated trajectories, the MM/PBSA method predicts binding free energies of −24.5 and −17.1 kcal/mol for the two systems, respectively. The predicted binding free energies suggest that Ta0454 is selective for N-carbamoylsarcosine over pyrazinamide, and that zinc ions play an important role in the favorable substrate-bound states.
PMCID: PMC2830209  PMID: 19932181
N-carbamoylsarcosine amidase; C-D-K catalytic triad; creatinine degradation; crystal structure; MM/PBSA; molecular dynamics simulations
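For readers unfamiliar with MM/PBSA, the generic form of the estimate and the difference between the two values reported above are written out below. This is the standard textbook decomposition with averages taken over MD snapshots, not necessarily the exact set of terms used in this study; for example, the abstract does not say whether the entropy term −TS was included.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{protein}} \rangle - \langle G_{\mathrm{ligand}} \rangle,\\
G &= E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS,\\
\Delta\Delta G_{\mathrm{bind}} &= \Delta G_{\mathrm{bind}}^{\text{N-carbamoylsarcosine}} - \Delta G_{\mathrm{bind}}^{\text{pyrazinamide}} = (-24.5) - (-17.1) = -7.4\ \mathrm{kcal/mol}.
\end{aligned}
```

The negative ΔΔG underlies the stated selectivity for N-carbamoylsarcosine. MM/PBSA estimates are often more trustworthy for ranking ligands than as absolute binding free energies.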
16.  Structure- and Function-based Characterization of a New Phosphoglycolate Phosphatase from Thermoplasma acidophilum 
The Journal of biological chemistry  2003;279(1):517-526.
The protein TA0175 has a large number of sequence homologues, most of which are annotated as unknown and a few as members of the haloacid dehalogenase superfamily, but it has no known biological function. Using a combination of amino acid sequence analysis, three-dimensional crystal structure information, and kinetic analysis, we have characterized TA0175 as a phosphoglycolate phosphatase from Thermoplasma acidophilum. The crystal structure of TA0175 revealed two distinct domains, a larger core domain and a smaller cap domain. The large domain is composed of a centrally located five-stranded parallel β-sheet with strand order S10, S9, S8, S1, S2 and a small β-hairpin, strands S3 and S4. This central sheet is flanked by a set of three α-helices on one side and two helices on the other. The smaller domain is composed of an open-faced β-sandwich represented by three antiparallel β-strands, S5, S6, and S7, flanked by two oppositely oriented α-helices, H3 and H4. The topology of the large domain is conserved; however, structural variation is observed in the smaller domain among the different functional classes of the haloacid dehalogenase superfamily. Enzymatic assays revealed that TA0175 catalyzes the dephosphorylation of phosphoglycolate in vitro, with kinetic properties similar to those of eukaryotic phosphoglycolate phosphatase. Activation by divalent cations, especially Mg2+, and competitive inhibition by Cl− ions are also similar between TA0175 and phosphoglycolate phosphatase. Taken together, the experimental evidence indicates that TA0175 is a phosphoglycolate phosphatase.
PMCID: PMC2795321  PMID: 14555659
17.  Crystal structure of the hypothetical protein TA1238 from Thermoplasma acidophilum: a new type of helical super-bundle 
The crystal structure of the hypothetical protein TA1238 from Thermoplasma acidophilum was solved with multiple-wavelength anomalous diffraction and refined at 2.0 Å resolution. The molecule consists of a typical four-helix antiparallel bundle with an overhand connection. However, its oligomerization into a trimer leads to a coiled ‘super-helix’, which is novel for such bundles. Its central feature, a six-stranded coiled coil, is also novel for proteins. TA1238 has no significant sequence relatives in databases but has strong structural homologues in the Protein Data Bank. Its function could not be inferred from the sequence, but the structure, with some rearrangement, bears some resemblance to the active-site region of cobalamin adenosyltransferase (TA1434). Specifically, TA1238 retains Arg104, which is structurally equivalent to the functionally critical Arg119 of TA1434. Such a conformational change might require the overhand connection of TA1238 to take part in a gating mechanism, possibly modulated by ligands and/or by interactions with its physiological partners. This led us to hypothesize that TA1238 could be involved in cobalamin biosynthesis.
PMCID: PMC2792032  PMID: 15704011
cobalamin biosynthesis; crystal structure; four-helix bundle; gating mechanism; MAD phasing; overhand connection; six-stranded coiled coil
18.  Structure of Escherichia coli Ribose-5-Phosphate Isomerase: A Ubiquitous Enzyme of the Pentose Phosphate Pathway and the Calvin Cycle 
Ribose-5-phosphate isomerase A (RpiA; EC …) interconverts ribose-5-phosphate and ribulose-5-phosphate. This enzyme plays essential roles in carbohydrate anabolism and catabolism; it is ubiquitous and highly conserved. The structure of RpiA from Escherichia coli was solved by multiwavelength anomalous diffraction (MAD) phasing and refined to 1.5 Å resolution (R factor 22.4%, Rfree 23.7%). RpiA exhibits an α/β/(α/β)/β/α fold, some portions of which are similar to proteins of the alcohol dehydrogenase family. The two subunits of the dimer in the asymmetric unit have different conformations, representing the opening/closing of a cleft. Active-site residues were identified in the cleft using sequence conservation, as well as the structure of a complex with the inhibitor arabinose-5-phosphate at 1.25 Å resolution. A mechanism for acid-base catalysis is proposed.
PMCID: PMC2792023  PMID: 12517338
ribose-5-phosphate isomerase; MAD; X-ray crystallography; pentose phosphate pathway; Calvin cycle; arabinose-5-phosphate
20.  The 2.2 Å Resolution Structure of RpiB/AlsB from Escherichia coli Illustrates a New Approach to the Ribose-5-phosphate Isomerase Reaction 
Journal of molecular biology  2003;332(5):1083-1094.
Ribose-5-phosphate isomerases (EC …) interconvert ribose 5-phosphate and ribulose 5-phosphate. This reaction permits the synthesis of ribose from other sugars, as well as the recycling of sugars from nucleotide breakdown. Two unrelated types of enzyme can catalyze the reaction. The most common, RpiA, is present in almost all organisms (including Escherichia coli), and is highly conserved. The second type, RpiB, is present in some bacterial and eukaryotic species and is well conserved. In E. coli, RpiB is sometimes referred to as AlsB, because it can take part in the metabolism of the rare sugar allose, as well as the much more common ribose sugars. We report here the structure of RpiB/AlsB from E. coli, solved by multi-wavelength anomalous diffraction (MAD) phasing and refined to 2.2 Å resolution. RpiB is the first structure to be solved from pfam02502 (the RpiB/LacAB family). It exhibits a Rossmann-type αβα-sandwich fold that is common to many nucleotide-binding proteins, as well as other proteins with different functions. This structure is quite distinct from that of the previously solved RpiA; although both are, to some extent, based on the Rossmann fold, their tertiary and quaternary structures are very different. The four molecules in the RpiB asymmetric unit represent a dimer of dimers. Active-site residues were identified at the interface between the subunits, such that each active site has contributions from both subunits. Kinetic studies indicate that RpiB is nearly as efficient as RpiA, despite its completely different catalytic machinery. The sequence and structural results further suggest that the two homologous components of LacAB (galactose-6-phosphate isomerase) form a bifunctional enzyme whose second activity is unknown.
PMCID: PMC2792017  PMID: 14499611
ribose-5-phosphate isomerase; pentose phosphate pathway; galactose-6-phosphate isomerase; MAD; X-ray crystallography
21.  Integrating Structure, Bioinformatics, and Enzymology to Discover Function 
The Journal of biological chemistry  2003;278(28):26039-26045.
Structural proteomics projects are generating three-dimensional structures of novel, uncharacterized proteins at an increasing rate. However, structure alone is often insufficient to deduce the specific biochemical function of a protein. Here we determined the function of a protein using a strategy that integrates structural and bioinformatics data with parallel experimental screening for enzymatic activity. BioH is involved in biotin biosynthesis in Escherichia coli and had no previously known biochemical function. The crystal structure of BioH was determined at 1.7 Å resolution. An automated procedure was used to compare the structure of BioH with structural templates from a variety of different enzyme active sites. This screen identified a catalytic triad (Ser82, His235, and Asp207) with a configuration similar to that of the catalytic triad of hydrolases. Analysis of BioH with a panel of hydrolase assays revealed a carboxylesterase activity with a preference for short acyl chain substrates. The combined use of structural bioinformatics with experimental screens for detecting enzyme activity could greatly enhance the rate at which function is determined from structure.
PMCID: PMC2792009  PMID: 12732651
22.  Biochemical and structural characterization of a novel family of cystathionine beta-synthase domain proteins fused to a Zn ribbon-like domain 
Journal of molecular biology  2007;375(1):301-315.
We have identified a novel family of proteins in which an N-terminal Cystathionine Beta-Synthase (CBS) domain is fused to a C-terminal Zn ribbon domain. Four proteins were over-expressed in E. coli and purified: TA0289 from Thermoplasma acidophilum, TV1335 from Thermoplasma vulcanum, PF1953 from Pyrococcus furiosus, and PH0267 from Pyrococcus horikoshii. The purified proteins were red/purple in solution and had an absorption spectrum typical of rubredoxins. Metal analysis of purified proteins revealed the presence of several metals, with iron and zinc being the most abundant (2 to 67% of iron and 12 to 74% of zinc). Crystal structures of both mercury- and iron-bound TA0289 (1.5–2.0 Å resolution) revealed a dimeric protein whose inter-subunit contacts are formed exclusively by the α helices of two CBS sub-domains, whereas the C-terminal domain has a classical Zn-ribbon planar architecture. All proteins were reversibly reduced by chemical reductants (ascorbate or dithionite) or by the general rubredoxin reductase NorW from E. coli in the presence of NADH. Reduced TA0289 was able to transfer electrons to horse heart cytochrome c. Likewise, the purified Zn ribbon protein KTI11 from Saccharomyces cerevisiae was purple in solution, had a rubredoxin-like absorption spectrum, contained both iron and zinc, and was reduced by the rubredoxin reductase NorW from E. coli. Thus, recombinant Zn ribbon domains from archaea and yeast demonstrate a rubredoxin-like electron carrier activity in vitro. We suggest that in vivo some Zn ribbon domains might also bind iron and therefore possess electron carrier activity, which would add another physiological role to this large family of important proteins.
PMCID: PMC2613313  PMID: 18021800
23.  MAID: An effect size based model for microarray data integration across laboratories and platforms 
BMC Bioinformatics  2008;9:305.
Gene expression profiling has the potential to unravel molecular mechanisms behind gene regulation and identify gene targets for therapeutic interventions. As microarray technology matures, the number of microarray studies has increased, resulting in many different datasets available for any given disease. The sensitivity and reliability of measured gene expression changes can be improved through a systematic integration of different microarray datasets that address the same or similar biological questions.
Traditional effect size models cannot be used to integrate array data that directly compare treatment with control samples, expressed as log ratios of gene expression. Here we extend the traditional effect size model to integrate as many array datasets as possible. The extended effect size model (MAID) can integrate any array datatype generated with either single- or two-channel arrays, using either direct or indirect designs, across different laboratories and platforms. The model uses two standardized indices: the standard effect size score for experiments with two groups of data, and a new standardized index that measures the difference in gene expression between treatment and control groups for one-sample data with replicate arrays. The statistical significance of the treatment effect across studies for each gene is determined by appropriate permutation methods, depending on the type of data integrated. We apply our method to three different expression datasets from two different laboratories, generated using three different array platforms and two different experimental designs. Our results indicate that the proposed integration model increases statistical power for identifying differentially expressed genes, both relative to single experiments and compared with other integration models. We also show that genes found to be significant using our data integration method are of direct biological relevance to the three experiments integrated.
High-throughput genomics data provide a rich and complex source of information that could play a key role in deciphering intricate molecular networks behind disease. Here we propose an extension of the traditional effect size model to allow the integration of as many array experiments as possible with the aim of increasing the statistical power for identifying differentially expressed genes.
PMCID: PMC2483727  PMID: 18616827
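The abstract does not give MAID's formulas. As a rough sketch of the traditional two-group effect size model that MAID extends, the code below computes, for one gene, Hedges' bias-corrected standardized mean difference g in each study. It then combines the studies with inverse-variance (fixed-effect) weights and assesses the combined effect by shuffling treatment/control labels within each study. The weighting scheme, the label-permutation test and every function name are assumptions for illustration. MAID's new one-sample index for log-ratio data is not reproduced here.

```python
import math
import random
from statistics import mean, variance


def hedges_g(treated, control):
    """Bias-corrected standardized mean difference (Hedges' g) for one gene in one study."""
    n_t, n_c = len(treated), len(control)
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * variance(treated) + (n_c - 1) * variance(control)) / df)
    d = (mean(treated) - mean(control)) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    return correction * d


def g_variance(g, n_t, n_c):
    """Approximate sampling variance of g."""
    return (n_t + n_c) / (n_t * n_c) + g * g / (2 * (n_t + n_c))


def combined_effect(studies):
    """Inverse-variance weighted mean of per-study g values (assumed fixed-effect model)."""
    num = den = 0.0
    for treated, control in studies:
        g = hedges_g(treated, control)
        w = 1 / g_variance(g, len(treated), len(control))
        num += w * g
        den += w
    return num / den


def permutation_p_value(studies, n_perm=1000, seed=0):
    """Two-sided p-value from shuffling group labels within each study (assumed scheme)."""
    rng = random.Random(seed)
    observed = abs(combined_effect(studies))
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for treated, control in studies:
            pooled = list(treated) + list(control)
            rng.shuffle(pooled)
            shuffled.append((pooled[: len(treated)], pooled[len(treated):]))
        if abs(combined_effect(shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene in two studies.
    study1 = ([2.1, 2.4, 2.2, 2.6], [1.0, 1.3, 0.9, 1.1])
    study2 = ([1.8, 2.0, 2.3], [0.8, 1.2, 1.0])
    print(combined_effect([study1, study2]), permutation_p_value([study1, study2]))
```

In a genome-wide setting this would be repeated for every gene, followed by a multiple-testing correction, which this sketch omits.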
24.  Structural and Chemical Profiling of the Human Cytosolic Sulfotransferases  
PLoS Biology  2007;5(5):e97.
The human cytosolic sulfotransferases (hSULTs) comprise a family of 12 phase II enzymes involved in the metabolism of drugs and hormones, the bioactivation of carcinogens, and the detoxification of xenobiotics. Knowledge of the structural and mechanistic basis of substrate specificity and activity is crucial for understanding steroid and hormone metabolism, drug sensitivity, pharmacogenomics, and response to environmental toxins. We have determined the crystal structures of five hSULTs for which structural information was lacking, and screened nine of the 12 hSULTs for binding and activity toward a panel of potential substrates and inhibitors, revealing unique “chemical fingerprints” for each protein. The family-wide analysis of the screening and structural data provides a comprehensive, high-level view of the determinants of substrate binding, the mechanisms of inhibition by substrates and environmental toxins, and the functions of the orphan family members SULT1C3 and SULT4A1. Evidence is provided for structural “priming” of the enzyme active site by cofactor binding, which influences the spectrum of small molecules that can bind to each enzyme. The data help explain substrate promiscuity in this family and, at the same time, reveal new similarities between hSULT family members that were previously unrecognized by sequence or structure comparison alone.
Author Summary
We metabolize many hormones, drugs, and bioactive chemicals and toxins from the environment. One family of enzymes that participate in the metabolic process consists of the cytosolic sulfotransferases, or SULTs. SULTs have a variety of mechanisms of action: sometimes they inactivate the biological activity of the chemical (e.g., in the case of estrogen). At other times, the enzymes make the chemical more toxic (e.g., for certain carcinogens). Humans have 12 distinct SULT enzymes. Determining how each of these human enzymes recognizes and distinguishes between the thousands of chemicals we confront each day is essential for understanding hormone regulation, assessing environmental risk, and eventually developing better and more effective drugs. We have studied the human SULT family of enzymes to profile which small molecules are recognized by each enzyme. We also visualized and compared the detailed structural features that determine which enzyme interacts with which molecule. By studying the entire family, we discovered new ways in which chemicals interact with each enzyme. Furthermore, we identified new inhibitors and inhibitory mechanisms. Finally, we discovered functions for many of the human enzymes that were previously uncharacterized.
Structural genomics and substrate screening provide "chemical fingerprints" and insights into substrate promiscuity for the human family of drug- and hormone-metabolizing cytosolic sulfotransferase enzymes.
PMCID: PMC1847840  PMID: 17425406
25.  SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics 
Nucleic Acids Research  2001;29(13):2884-2898.
High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at or, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein’s solubility and propensity to crystallize based on sequence features. 
We are able to extract a number of key rules from our trees, in particular that soluble proteins tend to have significantly more acidic residues and fewer hydrophobic stretches than insoluble ones. One of the characteristics of proteomics data sets, currently and in the foreseeable future, is their intermediate size (∼500–5000 data points). This creates a number of issues in relation to error estimation. Initially we estimate the overall error in our trees based on standard cross-validation. However, this leaves out a significant fraction of the data in model construction and does not give error estimates on individual rules. Therefore, we present alternative methods to estimate the error in particular rules.
PMCID: PMC55760  PMID: 11433035
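The SPINE abstract names two sequence features behind its solubility rules (acidic residue content and hydrophobic stretches) and describes estimating tree error by cross-validation. The sketch below illustrates both ideas at the smallest scale: it computes the two features, fits a depth-one decision tree (a "decision stump") on a single feature, and estimates its error by k-fold cross-validation. This is not SPINE's method. SPINE builds full trees over many features, and the hydrophobic residue set, the minimum stretch length of 7 and all function names here are assumptions.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set; SPINE's definition is not given
ACIDIC = set("DE")


def acidic_fraction(seq):
    """Fraction of Asp/Glu residues in a one-letter sequence."""
    return sum(aa in ACIDIC for aa in seq) / len(seq)


def hydrophobic_stretches(seq, min_len=7):
    """Count maximal runs of at least min_len consecutive hydrophobic residues."""
    count = run = 0
    for aa in seq:
        if aa in HYDROPHOBIC:
            run += 1
            continue
        if run >= min_len:
            count += 1
        run = 0
    if run >= min_len:  # a run that reaches the C terminus
        count += 1
    return count


def fit_stump(xs, ys):
    """Depth-one tree: pick threshold t and polarity so that the rule
    'label is True iff (x >= t) == positive' maximizes training accuracy.
    Candidate thresholds are midpoints between consecutive distinct values."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in candidates:
        for positive in (True, False):
            correct = sum(((x >= t) == positive) == y for x, y in zip(xs, ys))
            if best is None or correct > best[0]:
                best = (correct, t, positive)
    return best[1], best[2]


def predict(stump, x):
    t, positive = stump
    return (x >= t) == positive


def cv_error(xs, ys, k=5):
    """k-fold cross-validation error of the stump, with interleaved fold assignment."""
    n = len(xs)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test_idx]
        stump = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        errors += sum(predict(stump, xs[i]) != ys[i] for i in test_idx)
    return errors / n
```

As a usage example, `xs` could hold `acidic_fraction(seq)` for each construct and `ys` whether it was soluble. The abstract's rule then corresponds to a stump that predicts "soluble" above some acidic-residue threshold. As the abstract notes, cross-validation gives an overall error but not a per-rule error estimate.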
