With these considerations in mind, we proposed formation of the Enzyme Function Initiative (EFI), in which computation-based prediction of substrate specificity is the centerpiece of a multidisciplinary strategy for functional assignment of unknown enzymes (2). The strategy includes bioinformatics, experimental structural biology, structural modeling and docking, and experimental enzymology to assign in vitro substrate specificities and enzymatic functions, as well as microbiology (phenotypic analyses, genetics, and transcriptomics); it also includes metabolomics to validate (or disprove) the predicted and experimentally confirmed in vitro enzymatic function as the authentic in vivo function.
The goal of the EFI is to develop a multidisciplinary, high-throughput strategy for functional assignment of unknown enzymes.
The EFI started in May 2010 with the support of a Large Scale Collaborative Project (U54GM093342) from the National Institute of General Medical Sciences (NIGMS). The EFI is a five-year cooperative agreement among NIGMS, the host institution (University of Illinois, Urbana-Champaign), and the subcontracting institutions (refer to the author list for details). A cooperative agreement is a support mechanism in which NIGMS provides substantial scientific and programmatic involvement, i.e., program staff assist, guide, coordinate, and/or participate in project activities. The EFI is reviewed by NIGMS on a continuing basis, with formal reviews after 18 and 36 months. This modus operandi differs from investigator-initiated research grants (R01) and program project grants (P01) where the scientific direction and progress usually are not subject to active oversight by NIGMS staff during the project period. Peter Preusch, chief of the Biophysics Branch in the NIGMS Division of Cell Biology and Biophysics, is the Scientific Officer and a member of the EFI’s internal Steering Committee. Warren C. Jones, chief of the Biochemistry and Biorelated Chemistry Branch in the NIGMS Division of Pharmacology, Physiology, and Biological Chemistry, is the Program Officer who oversees the budgetary and administrative aspects of the EFI within NIGMS. An external Scientific Advisory Committee meets annually with the EFI to assess progress and provide guidance for programmatic direction; the members include Helen Berman, Rutgers University and Director of the Protein Data Bank (PDB); Benjamin Cravatt, The Scripps Research Institute; Barry Honig, Columbia University Medical Center; Eaton Lattman, Hauptman-Woodward Medical Research Institute, University at Buffalo; and Rowena Matthews, University of Michigan.
The EFI’s strategy for functional assignment can be summarized by the “funnel” depicted in the figure below. With the available resources, the initial computational prediction of substrate specificity can be performed with relatively high throughput (tens of enzymes per month); the subsequent experimental enzymology that tests the computational predictions can be performed with modest throughput (several enzymes per month); and in vivo studies of the in vitro assigned functions are labor- and time-intensive and, therefore, low throughput (one or two per month), limiting the number of in vivo functions that can be evaluated. However, without reliable computational prediction, experimental evaluation would be a random walk through substrate space, preventing efficient functional assignment. Furthermore, without in vivo “testing”, the in vitro assigned functions may be uninformative about the in vivo function (vide infra), or enzymes with promiscuous in vitro substrate specificities could have uncertain physiological importance.
The “funnel” for functional assignment, showing the roles and relative throughputs of the computational and experimental stages in functional assignment.
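The narrowing of the funnel can be illustrated with a little arithmetic. The following is a minimal sketch, not a description of the EFI's actual scheduling: the monthly rates (30, 5, and 2) are assumed stand-ins for "tens", "several", and "one or two" enzymes per month, and the names `Stage` and `simulate_funnel` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    """One stage of the funnel with an ASSUMED monthly throughput."""
    name: str
    per_month: int


def simulate_funnel(stages: List[Stage], months: int) -> Dict[str, int]:
    """Count the targets completed by each stage after `months` months.

    The first stage is assumed to have an unlimited supply of targets;
    every later stage can only work on targets the previous stage has
    finished but that it has not yet processed itself.
    """
    completed = [0] * len(stages)
    for _ in range(months):
        for i, stage in enumerate(stages):
            if i == 0:
                available = stage.per_month
            else:
                available = completed[i - 1] - completed[i]
            completed[i] += min(stage.per_month, available)
    return {stage.name: done for stage, done in zip(stages, completed)}


# Illustrative rates only; the text gives ranges, not exact numbers.
funnel = [
    Stage("computational prediction", 30),
    Stage("in vitro enzymology", 5),
    Stage("in vivo studies", 2),
]
result = simulate_funnel(funnel, months=12)
print(result)
```

Under these assumed rates, one year yields 360 computational predictions, 60 enzymes tested in vitro, and 24 functions examined in vivo, so the last stage sets the pace for fully validated assignments.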
The protein “targets” selected to develop the strategy for functional assignment are members of functionally diverse enzyme superfamilies (conserved partial reactions or chemical capability but divergent overall function) so that assignment of function is not trivial, i.e., homology inferred from simple sequence comparisons alone does not allow assignment of function (3). For example, the members of the functionally diverse enolase superfamily catalyze different reactions that always are initiated by Mg²⁺-assisted enolization of carboxylate anions and include β-elimination (dehydration, deamination, and cycloisomerization) and 1,1-proton transfer (racemization and epimerization) reactions (5). In another example, members of the functionally diverse amidohydrolase superfamily catalyze metal-assisted hydrolysis of C-O, C-N, and P-O bonds in diverse substrates (7).
Briefly, our approach (the “pipeline” in the figure below) is to 1) use sequence relationships to identify putative isofunctional families within functionally diverse superfamilies from which targets are selected to develop, test, and improve the strategy; 2) for bacterial enzymes, analyze the genome/operon contexts within the families to identify other enzymes that are part of the same metabolic pathway to provide additional functional clues; 3) when possible, purify and structurally characterize the targets and, when appropriate, other enzymes in the metabolic pathway; 4) if structures cannot be determined experimentally, use homology modeling to obtain reliable models; 5) perform in silico ligand docking to generate rank-ordered lists of predicted substrates; 6) experimentally screen predicted substrates for activity, as well as synthesize and screen novel compounds suggested by docking, to determine the in vitro activity; 7) determine structures of liganded complexes so that the predicted and experimental binding “poses” of the substrate (or analog/product) can be compared, both to evaluate and to improve the computational procedures for homology modeling and/or ligand docking; 8) when possible, elucidate the in vivo function by a combination of focused genetics (knockouts and overexpression), transcriptomics, and metabolomics; and 9) when possible to do so with high confidence, transfer annotations from the proteins for which the EFI has established reliable functions to other unknowns (1). Elements of this strategy had been demonstrated by some of the authors (J.A.G., S.C.A., P.C.B., M.P.J., F.M.R., A.S., and B.K.S.) for the functionally diverse amidohydrolase and enolase superfamilies (vide infra); with the support of the EFI, those efforts are being expanded to include dedicated protein production and structure determination for targets from additional functionally diverse superfamilies as well as microbiology and metabolomics.
The pipeline for functional assignment adopted by the EFI.
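Several of the nine steps are conditional ("for bacterial enzymes", "when possible", "if structures cannot be determined experimentally"), and the branching can be made explicit in a short sketch. The `Target` fields and the `plan_steps` function below are hypothetical names introduced for illustration only; they are not part of any EFI software. Treating steps 3 and 4 as strict alternatives is also a simplifying assumption, since in practice an experimental structure would be attempted first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    """Hypothetical description of one enzyme target (illustrative fields)."""
    name: str
    is_bacterial: bool                # step 2 applies only to bacterial enzymes
    has_experimental_structure: bool  # decides between steps 3 and 4
    in_vivo_feasible: bool            # step 8 is done "when possible"
    high_confidence: bool             # step 9 requires high confidence


def plan_steps(target: Target) -> List[int]:
    """Return the numbers of the pipeline steps that apply to `target`."""
    steps = [1]                       # 1) family assignment from sequence relationships
    if target.is_bacterial:
        steps.append(2)               # 2) genome/operon context analysis
    if target.has_experimental_structure:
        steps.append(3)               # 3) experimental structure determination
    else:
        steps.append(4)               # 4) homology modeling as the fallback
    steps += [5, 6, 7]                # 5) docking, 6) screening, 7) liganded structures
    if target.in_vivo_feasible:
        steps.append(8)               # 8) genetics, transcriptomics, metabolomics
    if target.high_confidence:
        steps.append(9)               # 9) transfer annotation to other unknowns
    return steps


# A made-up bacterial target with no experimental structure.
example = Target("example-target", True, False, False, False)
print(plan_steps(example))
```

For this made-up target the plan is steps 1, 2, 4, 5, 6, and 7: homology modeling replaces the experimental structure, and the in vivo and annotation-transfer steps are deferred.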
The EFI’s efforts are not organized according to Specific Aims that are integral to traditional research grants, e.g., NIH R01 and P01 funding mechanisms. Instead, the EFI focuses on deliverables that will benefit the biomedical community. These deliverables include:
- Development of a multidisciplinary sequence/structure-based strategy for predicting the functions of unknown enzymes discovered in genome sequencing projects.
- Dissemination of the strategy to the community by publications, web-based interfaces, workshops, symposia, and collaboration of external investigators with the bioinformatics and computational components of the EFI.
- Development of computational and bioinformatic tools for utilizing the strategy.
- The genes encoding all targets are made available to the community via the PSI-MR (http://psimr.asu.edu/). To the extent possible, compounds used for experimental studies of enzymatic activity will be disseminated; if these are not available in sufficient quantities to allow distribution, the procedures for their synthesis will be made available. Protocols for protein expression and functional assays also will be available via PepcDB (pepcdb.sbkb.org) and the EFI’s website (enzymefunction.org), respectively.
- Dissemination of both computational predictions and experimental data via the EFI’s website.
In the following sections, we describe the organization of the EFI as well as its internal collaborative interactions and operations.