Metabolites comprise the molar majority of chemical substances in living cells, and metabolite-protein interactions are expected to be quite common. Many interactions have already been identified and have been shown to be involved in the regulation of different types of cellular processes including signaling events, enzyme activities, protein localizations and interactions. Recent technological advances have greatly facilitated the detection of metabolite-protein interactions at high sensitivity and some of these have been applied on a large scale. In this manuscript, we review the available in vitro, in silico and in vivo technologies for mapping small-molecule-protein interactions. Although some of these were developed for drug-protein interactions they can be applied for mapping metabolite-protein interactions. Information gained from the use of these approaches can be applied to the manipulation of cellular processes and therapeutic applications.
Natural small metabolites comprise the numerical majority of cellular molecules. Their intracellular concentrations vary across a broad range, and they participate in a wide variety of biochemical processes. Small metabolites interact with various proteins enzymatically as substrates and products, or allosterically as cofactors or ligands. These interactions are known to control material and energy flux in biochemical reactions and can regulate biological processes through signaling cascades.
Of particular interest in the study of metabolite-protein interactions are those that serve as key regulators of protein functions and biological processes. Many metabolite modulators of protein or enzyme activities have been documented in biochemical studies over the past few decades, and some of the key findings have revolutionized our view of molecular regulation. Two classic examples of metabolite regulation of protein function involve the lac repressor and pyruvate kinase. Lactose binds the lac repressor; in its presence the bound repressor can no longer bind DNA and transcription ensues. For bacterial pyruvate kinase, in vitro enzyme activity is regulated through allosteric activation by several different compounds (AMP, ribose-5-phosphate and glucose-6-phosphate) and allosteric inhibition by ATP and orthophosphate. Because the enzyme responds to the concentrations of several metabolites at once, its activity can be tuned differently under different conditions.
Although researchers have already discovered a significant number of metabolite-protein interactions, only a small portion of interaction networks has been identified. Documenting and interpreting interactions between metabolites and proteins in the biological context is expected to be important for human health and medicine, by helping researchers understand the molecular basis of healthy and diseased states. In particular, metabolite regulators of disease-related proteins can provide novel strategies for potential therapeutic interventions.
Recently, a number of highly sensitive methods have been developed for the analysis of small-molecule-protein interactions. Although some of these have been developed for drug-protein interactions, they can be applied to the analysis of metabolite-protein interactions. This review intends to provide an overview of various in vitro, in silico and in vivo techniques to investigate interactions between proteins and a variety of small molecules, which include metabolites, drugs, metabolized drugs and other low molecular weight molecules. (For the purposes of this review, the term “metabolite” will refer to endogenous molecules.) The pros and cons for those techniques are summarized (Table 1). Although enzyme-substrate interactions are an important part of interaction networks, this review will primarily focus on regulatory interactions.
Although there are many methods for studying small-molecule-protein interactions, in vitro assays are the most popular. A wide variety of techniques have been developed and many of these are quite sensitive. These assays generally involve two consecutive steps: 1) bind a single metabolite or a mixture of metabolites to a protein or lysate, and then separate the bound complexes from the free molecules in the interaction mixture, and 2) detect the interacting molecules. Technological advances such as those involving mass spectrometry and microarrays have greatly facilitated these steps. We first review the various methods and advances in detection, because the choice of binding and separation methods is largely dependent upon this step.
Detection of small molecules is challenging because a) there is no molecular amplification to boost detection sensitivity and b) they are chemically diverse. A variety of common and well-developed detection schemes have been used for in vitro binding experiments including radiometric analysis, fluorimetric analysis, surface plasmon resonance, calorimetry, nuclear magnetic resonance and crystallography . These methods measure different types of signals including absorption, emission, scattering, heat release, resonance and reflection; the signals usually originate from various types of materials such as nuclei, atoms, molecules and crystals. These methods are well suited for studying binding affinities and mechanisms and some of them can be used to identify and characterize binding sites, elucidate the structures of metabolite-protein complexes and study the kinetics of binding events. In addition, in recent years mass spectrometry has become quite popular because it is highly sensitive and can both identify unknown molecules and detect multiple molecules simultaneously. Each of these different methods will be elaborated upon in the following sections.
Radio-isotopic labeling is widely applied to either small molecules or proteins [4, 5]. Typically, labeled small molecules are incubated with proteins of interest and the bound compound is measured using a scintillation counter. Enzymatic activity can also be measured using radioactive substrates. Schuster et al. screened 400 potential inhibitors of CYP24 by radioactively labeling the substrate and analyzing metabolite profiles; they found more than 50 inhibitors using this approach. Commonly used isotopic labels include 3H, 14C, 32P and 125I. The labeling does not affect the structure or reactivity of the molecules, and the detection has good sensitivity and specificity. However, its disadvantages include disposal of radioactive waste, relatively long read times (radioisotopic detection requires minutes while fluorescence requires only milliseconds), high costs due to the extra protection and equipment required for handling radioisotopes, and short shelf lives of some radioisotopes (half-lives for 32P and 125I are 14.3 days and 59 days, respectively). An additional concern is that both the radioisotopes and the methods for synthesizing isotopically labeled compounds need to be available. The commonly used radioisotopes listed above meet this criterion, since they have relatively simple chemical synthesis methods.
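The shelf-life concern can be made concrete with the standard exponential decay law. The sketch below uses only the two half-lives quoted above; the function name and the 30-day storage period are illustrative assumptions, not values from any cited study.

```python
# Half-lives quoted in the text, in days.
HALF_LIFE_DAYS = {"32P": 14.3, "125I": 59.0}


def fraction_remaining(isotope: str, elapsed_days: float) -> float:
    """Fraction of the initial activity left after elapsed_days of storage."""
    half_life = HALF_LIFE_DAYS[isotope]
    # Standard decay law: N(t) / N(0) = (1/2) ** (t / half-life).
    return 0.5 ** (elapsed_days / half_life)


if __name__ == "__main__":
    # Hypothetical storage period of 30 days.
    for isotope in HALF_LIFE_DAYS:
        print(isotope, round(fraction_remaining(isotope, 30.0), 3))
```

Under these assumptions a 32P-labeled compound loses most of its activity within a month, whereas a 125I-labeled compound retains the majority of it.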
Fluorescent labeling has long been used for studying protein-metabolite interactions. Sudlow et al. detected effects of stearic acid binding to human serum albumin on the conformational changes of two binding sites by measuring fluorescence intensity as an indication of the displacement of two fluorescent probes specifically located on the two sites . Recent development of fluorophores has provided superior tools for fluorescence detection . For example, some fluorophores based on oxobenzopyran and fluorescein have high quantum yields and photostability, and near-infrared fluorophores derived from squaraine, cyanine, thiazine and oxazine have low background fluorescence and allow deep penetration of radiation into samples due to low light scattering. Zhu et al. printed 5800 yeast proteins on proteome microarray chips and screened for their ability to interact with biotinylated phospholipids, which can be detected by Cy3-streptavidin. They found 150 proteins interacted significantly with phospholipids . One disadvantage of fluorescent labeling is that the attachment of a relatively bulky fluorophore can introduce steric hindrance and alter the protein recognition for the metabolite. Background autofluorescence can also interfere with the detection. Some related fluorescence assays include those based on fluorescence resonance energy transfer (FRET), fluorescence polarization (FP) and other techniques .
The different radioactivity and fluorescence methods listed above require labeling either the small molecule or the protein, thus limiting their use in untargeted interaction screenings.
Surface plasmon resonance (SPR) is another widely used spectroscopic method. When light strikes the interface between media that have different refractive indices, both reflection and refraction occur. Above a critical incidence angle, total internal reflection is observed and no light is refracted. The electromagnetic field component of a polarized light with total internal reflection can penetrate a short distance into the medium with a lower refractive index in the form of a wave, and resonance energy will be transferred between the wave and the surface plasmon. A linear relationship is observed between the resonance energy and the mass of the immobilized protein or small molecule on the surface.
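The optical conditions described here can be illustrated with Snell's law. The following is a minimal sketch for a bare interface between two transparent media; it ignores the metal film used in real SPR sensors, and the refractive indices, wavelength and incidence angle are hypothetical values chosen only for illustration.

```python
import math


def critical_angle_deg(n_dense: float, n_rare: float) -> float:
    """Incidence angle above which total internal reflection occurs."""
    if n_rare >= n_dense:
        raise ValueError("total internal reflection requires n_rare < n_dense")
    return math.degrees(math.asin(n_rare / n_dense))


def evanescent_decay_length_nm(wavelength_nm: float, n_dense: float,
                               n_rare: float, angle_deg: float) -> float:
    """1/e decay length of the evanescent field amplitude in the rarer medium."""
    if angle_deg <= critical_angle_deg(n_dense, n_rare):
        raise ValueError("angle must exceed the critical angle")
    s = n_dense * math.sin(math.radians(angle_deg))
    return wavelength_nm / (2.0 * math.pi * math.sqrt(s * s - n_rare * n_rare))


# Hypothetical values: glass prism against an aqueous buffer, red light.
N_GLASS = 1.515
N_BUFFER = 1.33
theta_c = critical_angle_deg(N_GLASS, N_BUFFER)
depth_nm = evanescent_decay_length_nm(760.0, N_GLASS, N_BUFFER, 70.0)

if __name__ == "__main__":
    print(f"critical angle: {theta_c:.1f} degrees")
    print(f"field decay length at 70 degrees: {depth_nm:.0f} nm")
```

With these assumed values the field reaches only a few hundred nanometers into the buffer, which is why the signal responds to material bound at the surface rather than to the bulk solution.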
The SPR technique can be used to observe analyte association and dissociation in real time and to provide kinetic information about binding events. Although it does not require labeling, it requires immobilization of the receptor protein or the small-molecule ligand on the sensor chip surface. Another issue with SPR is the choice of which molecule to immobilize. SPR is dependent on changes in mass, so it is beneficial to attach the molecule with the lower molecular weight to the surface and measure the binding of the heavier binding partner, which makes the mass change larger. However, because it is difficult to immobilize small molecules without loss of binding affinity, many researchers choose to attach the protein to the surface, thus sacrificing signal intensity for binding events involving low molecular weight molecules. By using highly sensitive SPR instruments, Frostell-Karlsson et al. were able to detect immobilized human serum albumin binding to molecules with molecular weight less than 140 Da. Gestwicki et al. monitored the binding of small molecules to immobilized maltose-binding protein and tissue transglutaminase by SPR. Because alterations in the conformation of immobilized proteins can generate a detectable SPR signal change, this study was able to screen for small molecules binding to the target protein by detecting conformational changes of the immobilized protein triggered by binding. This approach measured conformation-induced SPR signals rather than mass-induced signals, thereby overcoming the limitation of SPR in which ligands of high molecular weight are necessary for sensitive detection of binding events.
Nuclear magnetic resonance (NMR) is widely used for elucidating the structure of biomolecules at the atomic level. Isotopes that contain an odd number of protons or neutrons have nonzero spin values and are able to generate NMR signals, when they are placed in a magnetic field and subjected to an appropriate electromagnetic frequency. Therefore, specific isotopes, from either natural occurrence or direct labeling on target biomolecules, have been utilized in NMR studies. In particular, the 2H, 13C and 15N protein labeling techniques connect spin systems over the peptide bonds, thus enabling NMR to resolve protein structures . In addition, labeling helps correlate relaxation time with atomic motions within a molecule, thereby enabling studying the dynamics of residue interactions by NMR. As a result, NMR has been an important tool in understanding the binding affinity and binding site structure, as well as characterizing the dynamics of conformational changes during protein-metabolite binding processes [15, 16]. Apart from the studies of the effect of a ligand on a single protein, the effect on protein-protein interactions can also be investigated by NMR. D’Silva et al. monitored the influence of ligands on protein-protein interactions by NMR spectra broadening upon adding the antagonist .
Small molecules, as well as proteins, can be labeled. Small-molecule labeling techniques can either be traditional stable isotopic labeling using 2H and 13C, or spin labeling. Spin labels like the NO radical have an unpaired electron; therefore, they are paramagnetic and their effect on nearby nuclei can be detected by NMR. Jahnke et al. used a spin-labeled adenine analogue to identify allosteric interactions with kinases using NMR. Spin labeling permitted the identification of a second ligand that bound simultaneously and within a distance of 15–20 Å of the first ligand. This method is suitable for identifying allosteric interactions when the binding site of a known ligand is close to an allosteric site.
The limitations of NMR spectroscopy are low sensitivity (sample concentration in micromolar to millimolar range is required for NMR acquisitions) and low throughput (protein NMR can take hours or even days in order to obtain adequate signal-to-noise ratios).
Crystallography is a technique that analyzes the arrangement of atoms in proteins by generating diffraction patterns from a beam of radiation, most commonly X-rays. X-ray crystallography is therefore one way to analyze the structure of protein-metabolite complexes. In order to obtain the crystal structure of an enzyme-substrate complex, one may need extra steps to interrupt the enzyme’s catalytic activity and stabilize the transition state. This can mainly be done in two ways: forming transition state analog adducts or product complexes, or conducting site-directed mutagenesis or chemical modification on the enzyme active site. The limitations of crystallography are that many complexes are difficult to crystallize and that the crystallization process is time-consuming and tedious, so it cannot be used for high-throughput analysis.
Isothermal titration calorimetry (ITC) is the most commonly used calorimetric approach. Heat is absorbed or given off during a binding event determined by the enthalpy of the binding reaction. The obtained thermodynamic data can be used to deduce interaction mechanisms. Therefore, one is able to measure heat change upon binding and evaluate the binding affinity by ITC . ITC experiments do not require labeling or immobilization on a support. However, heat measurement is not very sensitive or stable, and impurities can generate artifactual signals. Hence at least milligram levels of highly purified proteins (e.g. 2.5 mg for a 50 kDa protein) are required for accurate measurements. In addition, the measurements take a long time to complete (a full titration experiment takes at least 2.5 h).
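The quantities mentioned above are linked by standard thermodynamic relations. The sketch below converts the quoted sample requirement into moles and relates a dissociation constant to the binding free energy via ΔG = RT ln(Kd) and ΔG = ΔH − TΔS. The function names, the 1 μM dissociation constant, the enthalpy value and the temperature are illustrative assumptions.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)


def protein_nanomoles(mass_mg: float, mw_kda: float) -> float:
    """Amount of protein in nmol from its mass (mg) and molecular weight (kDa)."""
    grams = mass_mg * 1e-3
    grams_per_mole = mw_kda * 1e3
    return grams / grams_per_mole * 1e9


def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol: dG = R T ln(Kd), with Kd in mol/L."""
    return GAS_CONSTANT * temp_k * math.log(kd_molar) / 1000.0


def entropic_term_kj(delta_g_kj: float, delta_h_kj: float) -> float:
    """-T*dS in kJ/mol, rearranged from dG = dH - T*dS."""
    return delta_g_kj - delta_h_kj


if __name__ == "__main__":
    # Sample requirement quoted in the text: 2.5 mg of a 50 kDa protein.
    print(f"{protein_nanomoles(2.5, 50.0):.0f} nmol of protein")
    # Hypothetical ligand with Kd = 1 micromolar and a measured dH of -50 kJ/mol.
    dg = binding_free_energy_kj(1e-6)
    print(f"dG = {dg:.1f} kJ/mol, -T*dS = {entropic_term_kj(dg, -50.0):.1f} kJ/mol")
```

The enthalpy comes directly from the measured heat, while the affinity is obtained from the shape of the titration curve; the entropic contribution then follows by difference.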
Mass spectrometry (MS) directly measures the mass-to-charge ratios of ionized molecules. It has the ability to identify and quantify thousands of molecules in a single sample without labeling as long as molecules differ in mass-to-charge values by more than two parts per million. A disadvantage of MS is low tolerance to non-volatile salts that are commonly used in biological experiments. Moreover, it is difficult to control the ionization efficiency among different analytes in a complex sample. As a result, MS is often placed after a separation technique, mostly chromatography, to reduce the number of metabolites simultaneously detected. Liquid chromatography (LC)  and gas chromatography (GC)  are commonly used. Recent advances in capillary electrophoresis (CE)  also make it an appealing choice complementing LC/GC.
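The two-parts-per-million criterion can be expressed as a short calculation. This is a minimal sketch in which the m/z values are hypothetical and the 2 ppm threshold is taken from the text as a default; actual resolving power depends on the instrument and acquisition settings.

```python
def ppm_difference(mz_reference: float, mz_other: float) -> float:
    """Relative difference between two m/z values in parts per million."""
    return abs(mz_other - mz_reference) / mz_reference * 1e6


def distinguishable(mz_a: float, mz_b: float, threshold_ppm: float = 2.0) -> bool:
    """True if the two m/z values differ by more than the given ppm threshold."""
    return ppm_difference(mz_a, mz_b) > threshold_ppm


if __name__ == "__main__":
    # Hypothetical pair of ions near m/z 300.
    print(ppm_difference(300.0000, 300.0009))
    print(distinguishable(300.0000, 300.0009))
    print(distinguishable(300.0000, 300.0003))
```

Because the threshold is relative, the same absolute mass difference is easier to resolve at low m/z than at high m/z.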
Mass spectrometry is particularly useful for detecting unknown metabolites and monitoring enzymatic conversion. Various mass spectrometry based techniques have been developed and reviewed [27, 28]. As an example, Yu et al. developed a library of synthetic carbohydrate substrates tagged with different alcohol linkers, which are effective differentiators of carbohydrates with identical molecular weight. This library was then used to identify glycosidases by determining the substrates that were cleaved using a quadrupole MS instrument. Fischbach et al. characterized the kinetics of enzyme-substrate interactions using quadrupole MS and MALDI-TOF MS. Morozov et al. coupled protein microarrays with triple quadrupole MS to allow rapid parallel assays of specific proteins interacting with multiple metabolites. In addition, recent developments in MS instrumentation have further strengthened this technique. For example, MS with an Orbitrap analyzer can now achieve sub-picogram sensitivity for small molecules, 5 ppm mass accuracy and a resolution of 100,000. Mass spectrometry is gaining popularity in analyzing small-molecule-protein interactions, owing to its increasing resolution and accuracy. Furuya et al. characterized monooxygenase activities using a Fourier transform ion cyclotron resonance mass spectrometer (FTICR-MS), which also has high mass resolution and accuracy. Clarke et al. developed an automated quench-flow microreactor able to operate at millisecond temporal resolution interfaced with FTICR-MS, and successfully monitored the kinetics of hydrolysis of nitrophenyl acetate catalyzed by chymotrypsin. In that work, a transient enzyme-bound intermediate was observed and the deduced rate constants were in agreement with previous publications.
If the protein of interest has some specific enzymatic activity involving the metabolite, the enzymatic activity can be used to characterize the interaction. For example, Ferreira et al. used ATPase assay to monitor the effects of DDE, the major metabolite of DDT, on mitochondrial ATPase by measuring pH changes in association with ATP hydrolysis .
If metabolite binding affects protein-protein interactions, the binding event can be measured by changes in protein-protein interaction patterns. Krey et al. devised a method called co-activator-dependent receptor ligand assay (CARLA) , which was based on ligand-induced binding of transcriptional mediators to nuclear receptors. The application of this method is limited to those specific interactions that can be affected by metabolites. Only metabolite activators can be detected, not disruptors. On the other hand, some techniques have been especially designed for identifying metabolite disruptors of protein-protein interactions. Lemmens et al. reported a technique called reverse mammalian protein-protein interaction trap (reverse MAPPIT) . Unlike the traditional forward mode, reverse MAPPIT employs prey proteins and bait receptors with compromised signaling competence when the bait and prey come together. Upon binding to a peptide or organic compound that disrupts the interaction, the receptor restores signaling function, resulting in a positive signal.
Lomenick et al. developed an untargeted and universally applicable method called drug affinity responsive target stability (DARTS), which measures binding indirectly. Given that a protein might be less susceptible to proteolysis when drug-bound, DARTS monitors the reduction in the protease susceptibility of the target protein upon ligand binding. DARTS requires no modification of the ligand or protein and is independent of the mechanism of interaction, so it can be useful in identifying drug targets and in mapping protein-metabolite interaction networks. One limitation of this method is that proteins vary dramatically in their sensitivity to proteolysis. If a large fraction of background proteins is refractory to digestion, those proteins remain in the sample and bias the proteomic identification towards high-abundance proteins. Consequently, finding a cocktail of proteases that breaks down all the background proteins but leaves the target proteins intact might be a challenge for high-throughput analysis.
In order to detect small-molecule-protein interactions in vitro, molecules are first bound to proteins and the free molecules are separated from bound ones traditionally by filtration, centrifugation or dialysis . In recent years, other more powerful techniques such as chromatography and immobilization have been employed. Immobilization methods such as microarrays have enabled the analysis of thousands of interactions simultaneously.
Equilibrium dialysis (ED) separates free ligands from protein-ligand complexes by semipermeable membranes . A semipermeable membrane only allows molecules smaller than a certain size to pass through, and is normally designed to let small molecules permeate but not proteins or protein-ligand conjugates. After an appropriate incubation time, equilibrium is reached and the resulting free small molecule concentration can be measured in a compartment free of proteins. In the study of lac repressor , radioisotope-labeled small molecules were incubated with fractionated proteins in the dialysis sac. The enrichment of labeled small molecules inside versus outside the sac was used to determine the binding affinity. The binding constants of a few metabolites to the lac repressor ranged between 0.5 mM for galactose to 0.5 μM for IPTG. The same technique was later used to determine the binding constant of tryptophan to trp aporepressor as 16 μM . In a recent paper, Orsak et al. developed a method called MIDAS, which uses ED coupled with LC/GC-MS to identify low affinity protein metabolite interactions . One disadvantage of ED is that it needs a long equilibration time (typically 5–48 h) and therefore cannot be used in a high-throughput way. Another problem associated with ED is the volume shift caused by the interactions between the semipermeable membrane and proteins. The change can be up to 30% . Nonspecific adsorption of metabolites or proteins onto the membrane is also difficult to avoid.
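The way a binding constant is derived from an equilibrium dialysis measurement can be sketched as follows, assuming a single binding site per protein, a membrane that is fully permeable to the ligand, and no volume shift or membrane adsorption. The function names and the concentrations in the example are hypothetical; they were chosen so that the result equals the 16 μM value quoted above.

```python
def kd_from_dialysis(ligand_inside: float, ligand_outside: float,
                     protein_total: float) -> float:
    """Dissociation constant from equilibrium dialysis concentrations.

    All concentrations share one unit (for example micromolar). At equilibrium
    the free ligand concentration is the same on both sides of the membrane,
    so the excess ligand inside the sac is the protein-bound fraction.
    """
    free_ligand = ligand_outside
    bound = ligand_inside - ligand_outside
    if bound <= 0 or bound >= protein_total:
        raise ValueError("bound ligand must lie between 0 and total protein")
    free_protein = protein_total - bound
    return free_protein * free_ligand / bound


def fraction_bound(free_ligand: float, kd: float) -> float:
    """Fractional occupancy of a single site at a given free ligand level."""
    return free_ligand / (free_ligand + kd)


if __name__ == "__main__":
    # Hypothetical measurement: 21 uM ligand inside, 16 uM outside, 10 uM protein.
    kd = kd_from_dialysis(21.0, 16.0, 10.0)
    print(f"Kd = {kd:.1f} uM, occupancy at 16 uM free ligand = "
          f"{fraction_bound(16.0, kd):.2f}")
```

The volume shift and nonspecific adsorption discussed above would both distort the inside and outside concentrations, and therefore the derived constant.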
Ultrafiltration is a faster alternative to ED that applies pressure to drive the solution through the membrane. Comess et al. developed a technique called affinity selection mass spectrometry (ASMS), which combines equilibrium binding with ultrafiltration and ligand identification by mass spectrometry. The pressure applied during ultrafiltration can, however, disturb the binding equilibrium. Both ED and ultrafiltration suffer from similar issues such as non-specific binding of compounds to the membrane. In contrast, ultracentrifugation forces the proteins to settle at the bottom of the tube and generates a protein concentration gradient based on the protein density. The top protein-free fraction can be carefully collected to measure small molecule concentrations as an estimate of unbound levels. In order to achieve maximum protein separation, a sample is usually subjected to ultracentrifugation through a density gradient, which contains heavier media at the bottom and lighter media at the top. After the sample is placed at the top of the appropriate gradient media and centrifuged, the particles in the sample migrate to the position where the surrounding density matches the particle density, so they can be effectively separated. As an example, Kobayashi et al. successfully separated different forms of phosphoprotein phosphatases by sucrose density gradient ultracentrifugation. Ultracentrifugation avoids the membrane effects; however, this method is low-throughput and the error due to sedimentation can be very large (up to 40%).
Size-exclusion chromatography (SEC) uses a column packed with porous materials to separate molecules based on their sizes in the solution. Free proteins and protein complexes are too large to be retained in the pores and are eluted first, while small molecules are eluted later. Although column efficiency can sometimes be a problem, SEC offers additional protein stabilization  via the molecular crowding effect . Molecular crowding means concentrated macromolecules reduce the solvent volume for other molecules and elevate their effective concentrations. This effect has been found to increase thermal stability of cellular proteins and to place steric restrictions on the unfolding of proteins . This protein-stabilizing phenomenon makes SEC a powerful choice for studying protein-ligand interactions. Muckenschnabel et al. used SEC to remove unbound small molecules after incubation with protein and they analyzed the protein fraction by LC-MS for bound ligands .
Capillary electrophoresis (CE) separates species based on their charge-to-size ratio inside a narrow capillary filled with an electrolyte. Hoffmann et al. used CE to separate chymotrypsin-chymostatin complexes from unbound chymotrypsin and impurities. They then selectively dissociated these enzyme-inhibitor complexes using collision-induced dissociation (CID) in MS/MS to analyze the binding strength of inhibitors. CE has high efficiency and low sample consumption, but carries a risk of protein adsorption on the capillary wall.
Many assays immobilize proteins or small molecules on a solid phase support to purify and enrich protein-metabolite interactions . This technique has great potential for development of high-throughput assays but researchers should be aware that some drawbacks may affect the precision of the results. Immobilized receptor proteins on a solid support may be subjected to denaturation and improper orientation that may lead to loss of native structure and function of proteins. Immobilization of metabolite ligands requires the metabolite to have a functional group for derivatization, and introduces steric hindrance for protein binding. Both methods may give rise to non-specific binding to the matrix and thus interfere with the determination of both the affinity and specificity of true interactions. Moreover, the immobilization process requires individual optimization and thus limits its throughput.
One popular immobilization platform is affinity chromatography. The capturing proteins are cross-linked on the column, and a solution containing binding molecules flows through it. Small molecules with high affinity with the capturing proteins will stay on the column until they are washed off by another solution. Affinity chromatography can be used to determine receptor-ligand binding affinities when coupled with different detection techniques . Since human serum albumin is a major transporter for a variety of drugs, hormones and other solutes, and studying the interactions between albumin and drugs or metabolized drugs is an integral part of drug pharmacokinetics characterization, albumin is one of the most studied proteins using affinity chromatography . This method is effective in enriching and screening multiple small molecules in a single run, and the column can be reused. However, molecules with strong binding affinity can be quite challenging because they are difficult to elute off the column.
A number of other platforms are also used. In one explorative report, Tagore et al. immobilized GST fusion proteins on a solid support and profiled interacting metabolites by LC-MS after incubation with a metabolite mixture . Conversely, Kalisiak et al. immobilized metabolites on agarose beads and analyzed binding proteins by LC-MS .
Northen et al. developed a nanostructure-initiator mass spectrometry enzymatic (Nimzyme) assay, which retains fluorous tagged enzyme substrates on the fluorous-phase surface . A “fluorous tag” is a highly fluorinated alkane that has high affinity towards fluorous-derivatized solid phases due to the fact that fluorine-rich compounds dissolve preferentially in fluorine-rich solvents, so the tag is often incorporated into an organic molecule for attachment and purification purposes. “Fluorous phase” is a state formed by highly fluorinated compounds, and is immiscible with either aqueous or organic phase. In this study, the metabolite substrates were noncovalently attached to the surface coated with fluorinated siloxane “initiator” through a fluorous tag. A five-carbon linker was incorporated on the substrates to reduce steric hindrance for protein binding and arginine was added for effective ionization. After the attached substrates were reacted with enzymes, laser irradiation was applied to heat the surface rapidly and caused a violent expansion of the initiator, which in turn triggered the vaporization of the adsorbed analyte molecules and generation of intact analyte ions. The ions could then be analyzed by mass spectrometry. In this way, the method not only spatially confines metabolites by attachment, but also allows efficient desorption/ionization upon vaporization and better conformational flexibility than traditional rigid covalent immobilization techniques.
Another popular immobilization platform is microarrays, which consist of ordered arrays that contain small amounts of known small molecules/proteins spotted at high density. The advantage of this technique lies in its ability to measure multiple binding interactions at the same time (multiplexing), which is very suitable for systematic and high-throughput analysis. Some studies have used protein arrays to screen for binding metabolites, such as carbohydrates and lipids , and other studies have used small-molecule arrays to detect binding proteins [53–56]. Microarrays can be coupled to a wide range of detection methods, the most popular of which are fluorescence and mass spectrometry.
Some immobilization studies even immobilize both proteins and metabolites. Roelofs et al. developed differential radial capillary action of ligand assay (DRaCALA), which separates free ligands from bound protein-ligand complexes by dry nitrocellulose through capillary action . The researchers deposited a mixture of proteins and radiolabeled small molecules onto a nitrocellulose membrane. The free small molecules moved with liquid through capillary action, while bound small molecules remained immobilized with proteins. In a few seconds, the capillary action was completed, and the amount of ligands that bound to proteins could be quantified. This method does not require a wash step after binding, which thus increases the throughput. In addition, DRaCALA assay allows near-equilibrium kinetic measurements. As a result of its simplicity and generality, DRaCALA has the potential for universal application in studying various interactions involving mobile ligands, including interactions between small molecules and proteins, DNA and proteins, as well as small molecules and DNA . However, ligands of interest need to be radioactively labeled in this assay, which limits its use for untargeted screenings.
In recent years, with the advances in computer science, in silico tools are increasingly popular for analyzing protein-ligand interactions. Various in silico methods are widely used, some of which are based on molecular or quantum mechanics, such as docking .
Docking is a powerful tool to allow accurate structural modeling and prediction of activities between small molecules and the binding sites of target proteins through proper search algorithms and scoring functions, as long as the structure of the binding site is known . This method is rapid and easily applicable to high-throughput screening. In addition, most available docking programs are able to predict with an accuracy of 1.5 to 2 Å and success rates of approximately 70%–80% . However, docking also has some weaknesses. For example, simplifications made in the scoring functions limit the further improvement of accuracy; the solvation effects and protein flexibility are often not accurately addressed .
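Accuracy figures of this kind are commonly reported as the root-mean-square deviation (RMSD) between the predicted ligand pose and the experimentally determined one; that this is the metric behind the numbers quoted above is an assumption here. The sketch below computes RMSD for two poses with matching atom order, using made-up coordinates and a hypothetical 2 Å success cutoff.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def rmsd(pose_a: Sequence[Coordinate], pose_b: Sequence[Coordinate]) -> float:
    """Root-mean-square deviation (same unit as the input) between two poses.

    Atoms must be listed in the same order in both poses; no superposition
    is performed, since both poses share the frame of the protein.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def is_successful(pose_a: Sequence[Coordinate], pose_b: Sequence[Coordinate],
                  cutoff: float = 2.0) -> bool:
    """Hypothetical success criterion: RMSD at or below the cutoff (in angstroms)."""
    return rmsd(pose_a, pose_b) <= cutoff


if __name__ == "__main__":
    # Made-up three-atom ligand; the docked pose is shifted by 1 angstrom along x.
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
    print(rmsd(crystal, docked), is_successful(crystal, docked))
```

A success rate is then the fraction of test complexes whose best-scored pose passes such a cutoff.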
Zhou et al. proposed a method called MetaSite to predict metabolic sites of proteins . This methodology compresses the protein-ligand interaction profile to a distance-based descriptor using GRID molecular interaction fields (GRID-MIFs). GRID-MIFs determine the energy of binding sites based on the force fields derived from X-ray protein-ligand complexes. In this study, MetaSite prediction had a success rate of 78%, an improvement over molecular docking (69% success rate), owing to more flexibility in use of protein structures allowed in MetaSite than in molecular docking.
If the three-dimensional structure of the target is unavailable, a homology model can be created. Homology modeling constructs a structural model of the target protein from its amino acid sequence and the available structures of homologous template proteins. Kim et al. obtained the conformations of potential ligand binding sites for H+/K+ ATPase by homology modeling and ligand docking. Mandava et al. used a method similar to homology modeling called combinatorial peptide sequence analysis. This method compares the sequences of a peptide population with the sequences of known structures containing a small-molecule binding site, and selects peptides that mimic a binding site. These small-molecule affinity-selected peptide sequences can then be used, on the basis of sequence similarity, to predict ligand-binding proteins whose structures are unknown. The authors created a database containing over 5000 peptide sequences selected for affinity to drugs and to metabolites such as ATP, GTP and glucose, providing a powerful resource for studying protein-metabolite interactions.
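The idea of using affinity-selected peptides to flag candidate binding proteins by sequence similarity can be illustrated with a deliberately simple sketch. It slides a peptide along a protein sequence and reports the window with the highest fraction of identical residues. This ungapped identity score is a simplification chosen for illustration, not the scoring used by Mandava et al., and the sequences below are made up.

```python
def best_window_identity(peptide, protein):
    """Return (identity, start) for the best ungapped match of peptide in protein.

    identity is the fraction of identical residues in the best window and
    start is the 0-based position of that window (first one wins on ties).
    """
    k = len(peptide)
    if k == 0 or len(protein) < k:
        raise ValueError("peptide must be non-empty and no longer than the protein")
    best_identity, best_start = -1.0, -1
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(1 for p, w in zip(peptide, window) if p == w)
        identity = matches / k
        if identity > best_identity:
            best_identity, best_start = identity, start
    return best_identity, best_start


# Hypothetical peptide and protein fragment, for illustration only.
print(best_window_identity("GKT", "AAGKTAA"))
```

A practical implementation would more likely score windows with a substitution matrix and assess significance against a background of unrelated sequences.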
Machine learning methods can predict new protein-ligand binding events based on existing interactions in the training data that involve similar proteins and/or ligands. The main concern with machine learning methods is their bias towards known interactions. Faulon et al. used a supervised learning method to predict enzyme-metabolite binding based on metabolic reaction and protein sequence information. This method used a graph-based representation of molecules known as “signature” to measure the similarity of two molecules. “Signature” represents a molecule by decomposing its molecular graph into subgraphs, and two molecules are compared through the subgraphs they contain. This novel study used “signature” as a common representation for both proteins and chemicals, which enabled training directly on protein-chemical pairs.
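The subgraph-based comparison can be illustrated with a minimal sketch. Here each atom is described by its element and the sorted elements of its direct neighbors (a height-1 atom-centered subgraph), a molecule is the multiset of these descriptors, and two molecules are compared with a generalized Tanimoto coefficient. The restriction to height 1, the string encoding and the choice of Tanimoto similarity are simplifying assumptions made for this example and do not reproduce the implementation of Faulon et al.

```python
from collections import Counter


def molecular_signature(atoms, bonds):
    """Multiset of height-1 atom descriptors for a molecular graph.

    atoms: dict mapping atom index to element symbol (heavy atoms only here).
    bonds: iterable of (index, index) pairs; bond orders are ignored.
    """
    neighbors = {index: [] for index in atoms}
    for a, b in bonds:
        neighbors[a].append(atoms[b])
        neighbors[b].append(atoms[a])
    # Each descriptor is the central element followed by its sorted neighbors.
    return Counter(
        f"{atoms[index]}({''.join(sorted(neighbors[index]))})" for index in atoms
    )


def signature_similarity(sig_a, sig_b):
    """Generalized Tanimoto coefficient between two descriptor multisets."""
    keys = set(sig_a) | set(sig_b)
    shared = sum(min(sig_a[k], sig_b[k]) for k in keys)
    total = sum(max(sig_a[k], sig_b[k]) for k in keys)
    return shared / total if total else 1.0


# Heavy-atom graphs of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = molecular_signature({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
propanol = molecular_signature({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
print(signature_similarity(ethanol, propanol))
```

Because the same kind of descriptor can in principle be computed for any labeled graph, a shared representation of this type is what allows proteins and chemicals to be handled within one learning framework.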
A new trend in small-molecule-protein interaction research is to combine in vitro binding results with computational methods such as molecular docking to obtain detailed binding information. Using this approach, the throughput of in vitro binding assays can be improved, and computational results can be validated [67, 68]. Ahlstrom et al. used MetaSite and docking to characterize the interactions between proteins and their inhibitors and substrates, with subsequent validation by LC-MS/MS. Bertini et al. identified binding sites using NMR and used docking to optimize the structure of the binding adducts. This combined approach overcame the speed limitation of NMR while providing stronger evidence for the docking results. It is more suitable when the protein does not change structure dramatically upon binding, since the calculations are based on the 3D structure of the free protein and any large structural change upon interaction would reduce the accuracy of the prediction.
Although various in vitro and in silico methods have been developed and have revealed numerous interactions, their results cannot be easily translated into in vivo knowledge. The living cell is a highly complex system in which biomolecules do not function alone. Moreover, our knowledge of cellular metabolites is limited, so in vitro and in silico methods are not ideal for discovering novel metabolites. Finally, many proteins are subject to post-translational regulation in vivo, which changes their interactions with metabolites. Therefore, studying in vivo interactions is critical for understanding what actually happens in a living cell. However, in contrast to the abundance of in vitro and in silico studies, few in vivo studies have been carried out, largely because few in vivo methods have been developed.
Direct administration of drugs to humans, followed by measurement of the drugs or their metabolized forms in urine or blood, is a widely used in vivo method for studying drug pharmacokinetics, especially drug-drug interactions. The reaction kinetics of drugs with drug-metabolizing proteins such as the cytochrome P450 family are of particular interest. However, the information revealed by such studies is very limited. The substrate (drug) and its metabolized form must be known before the study, so the method cannot be used in an untargeted approach. In addition, metabolism within the body is complicated, so the observed effects cannot simply be ascribed to direct interaction between the drug and the protein under investigation. Consequently, these studies do not provide straightforward answers on their own and must be combined with functional studies such as gene inactivation.
Inactivating the gene of a target protein is a relatively straightforward way to demonstrate protein-metabolite interactions in vivo. If the protein-metabolite interaction is functional, the concentration or distribution of the metabolite will be altered after the protein-coding gene is inactivated. Saghatelian et al. knocked out fatty acid amide hydrolase (FAAH) in mice and used mass spectrometry to compare the metabolite profiles of knockout and wild-type mice. The study identified both known and novel brain lipids regulated by FAAH in vivo. The lipid metabolites identified in vivo by the knockout experiments differed significantly from FAAH’s substrates in vitro, suggesting the presence of competing metabolic pathways in vivo. Non-substrate metabolites associated with FAAH can also be uncovered by this method. However, the gene knockout method shares a flaw with direct administration of drugs: the change in metabolite levels may result from secondary effects of the deletion, such as downstream regulatory interactions. Moreover, its low throughput limits its popularity.
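The comparison of knockout and wild-type metabolite profiles can be sketched as a fold-change screen. The example below assumes that each metabolite has replicate intensity measurements in both groups and flags metabolites whose mean level changes at least twofold in either direction. The metabolite names, intensities and threshold are hypothetical, and a real analysis would add a statistical test across replicates.

```python
import math
from statistics import mean


def log2_fold_changes(knockout, wild_type):
    """Per-metabolite log2(mean knockout level / mean wild-type level).

    knockout, wild_type: dicts mapping metabolite name to a list of
    replicate intensities. Metabolites missing from either group, or with
    a non-positive mean, are skipped.
    """
    changes = {}
    for metabolite in knockout.keys() & wild_type.keys():
        ko_level = mean(knockout[metabolite])
        wt_level = mean(wild_type[metabolite])
        if ko_level > 0 and wt_level > 0:
            changes[metabolite] = math.log2(ko_level / wt_level)
    return changes


def altered_metabolites(changes, min_abs_log2=1.0):
    """Names of metabolites whose absolute log2 fold change meets the threshold."""
    return sorted(name for name, value in changes.items() if abs(value) >= min_abs_log2)


# Hypothetical intensities (arbitrary units) for two placeholder lipids.
knockout = {"lipid_A": [800.0, 1200.0], "lipid_B": [100.0, 100.0]}
wild_type = {"lipid_A": [100.0, 100.0], "lipid_B": [90.0, 110.0]}
print(altered_metabolites(log2_fold_changes(knockout, wild_type)))
```

A metabolite flagged in this way is only a candidate, since the change may reflect a secondary effect of the deletion rather than a direct interaction with the protein.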
Li et al. designed an in vivo protein-metabolite interaction profiling technology based on affinity purification. The authors overexpressed proteins tagged with an IgG-binding domain in yeast cells and used the tag to pull down the proteins together with the metabolites bound to them in vivo. The bound metabolites were then extracted and analyzed by LC-MS. This method can directly identify in vivo protein-metabolite interactions and can be multiplexed during both sample preparation and metabolite identification, which provides high throughput and facilitates systematic study. Because hydrophilic molecules may be lost during purification, the study was limited to hydrophobic molecules, whose binding to proteins is quite stable during purification. Transient interactions could not be identified for the same reason.
Previous studies have employed a wide variety of in vitro and in silico techniques for investigating small-molecule-protein interactions. These techniques are readily applicable to metabolite-protein interactions, especially in confirmatory assays. However, to avoid the artifactual interactions that often occur under non-physiological conditions, new techniques that detect in vivo interactions are increasingly being used.
With growing awareness of the significance of both systems biology and personalized medicine, untargeted, high-throughput and systematic methodologies are increasingly valuable for revealing unexpected interactions and expanding current knowledge of metabolite-protein interactions. These approaches can help researchers assemble and understand enzymatic and regulatory networks and connect biological pathways. Unraveling interaction networks through systematic studies can also help advance medicine, for example by preventing potential off-target effects in drug design. A metabolite that regulates a disease-related protein could be used as the blueprint for a drug. In addition, other proteins whose functions are modulated by the same metabolite could be closely monitored for possible side effects at early stages of drug development, avoiding greater financial loss at later stages. In extreme cases, metabolite administration and dietary restriction might be used to treat patients, since the toxicity of natural metabolites is likely to be lower than that of synthetic compounds.
Highly desired features of new techniques include the ability to detect interactions in vivo, increased sensitivity, improved reliability, and greater capacity for systematic and high-throughput investigation. In particular, for studying regulatory interactions, untargeted metabolite profiling followed by confirmatory binding assays at the system scale will provide more robust quantitative information than either approach alone. Integrating novel metabolite-protein interactions into existing global networks will require new bioinformatics tools that extract overall information describing how biological components operate at the system level [75–77]. Ultimately, temporal, spatial, and binding kinetic factors will have to be incorporated to build more accurate mathematical models, which may lead to the discovery of novel natural metabolites that act as hub regulators. With a detailed understanding of global regulatory networks at the molecular level, personalized diagnostics and medicine will become more realistic.
We thank Dominic Wang for valuable comments on this manuscript. This work was supported by grants from the NIH.