Functional protein microarrays are emerging as a promising new tool for large-scale and high-throughput studies. In this article, we will review their applications in basic proteomics research, where various types of assays have been developed to probe binding activities to other biomolecules, such as proteins, DNA, RNA, small molecules, and glycans. We will also report recent progress in using functional protein microarrays to profile protein posttranslational modifications, including phosphorylation, ubiquitylation, acetylation, and nitrosylation. Finally, we will discuss the potential of functional protein microarrays in biomarker identification and clinical diagnostics. We strongly believe that functional protein microarrays will soon become an indispensable and invaluable tool in proteomics research and systems biology.
The fundamental principle of microarray technology was first put forward by Ekins et al over 20 years ago. Their ambient analyte theory stated that a tiny spot of purified antibody (or any other macromolecule) provides substantially better sensitivity than the same reagent used in conventional immunoassay formats. Fueled by large-scale genome sequencing projects, DNA microarray technology became the first application of this theory and has been widely used in gene expression profiling [2-6]. However, biological functions are carried out primarily by proteins rather than nucleic acids. Furthermore, RNA expression levels do not always correlate with protein expression levels, and it is almost impossible to predict the biochemical properties of a protein encoded by a given gene simply from its expression profile [7,8]. Therefore, by focusing on studies of protein structures, functionalities, and protein-protein interactions, one can more directly characterize the biological function of a given gene.
Large-scale protein-centered analyses of gene function, however, have not been generally as fruitful as their DNA-centered counterparts for several reasons. First, the biochemical properties of proteins are far more diverse and complex than those of nucleic acids. Second, there is no ready method to amplify proteins for analysis, unlike PCR-based amplification of nucleic acids. Third, many proteins are prone to denature or degrade in standard buffer conditions and at ambient temperature, making them substantially more challenging to study. Therefore, technology for systematically assaying protein function that is both high-throughput and highly flexible is urgently needed. The past success of the DNA microarray technology highlights the power of a highly parallel, high-throughput platform that allows profiling of thousands of molecular targets in a single experiment. By the same token, protein microarray technology is now emerging as a promising new tool that can push proteomic studies to a new level. In the past decade, many methodologies based on the protein microarray technology have been successfully developed and applied to proteomic studies, including protein identification, quantification, and functional analysis of signaling pathways and networks, as well as clinical diagnostics and antibody characterizations.
A protein microarray, also known as a protein chip, is a solid surface (typically glass) on which thousands of different proteins (e.g., antigens, antibodies, enzymes, substrates, etc) are immobilized in discrete spatial locations, forming a high-density protein dot matrix. Depending on their applications, protein microarrays can be classified into two types: analytical and functional protein microarrays. Analytical protein microarrays are usually composed of well-characterized biomolecules with specific binding activities, such as antibodies, and are used to analyze the components of complex biological samples (e.g., serum and cell lysates) or to determine whether a sample contains a specific protein of interest. They have been used for protein expression profiling, biomarker identification, cell surface marker/glycosylation profiling, clinical diagnosis, and environmental/food safety analysis. Functional protein microarrays, on the other hand, are constructed by printing a large number of individually purified proteins, and are mainly used to comprehensively query the biochemical properties and activities of those immobilized proteins. In principle, it is feasible to print arrays composed of virtually all annotated proteins of a given organism, effectively forming a whole-proteome microarray. Functional protein microarrays have been successfully applied to identify protein-protein, protein-lipid, protein-antibody, protein-small molecule, protein-DNA, protein-RNA, lectin-glycan, and lectin-cell interactions, to identify substrates or enzymes in phosphorylation, ubiquitylation, acetylation, and nitrosylation, and to profile immune responses. In this review, we will mainly focus on the fabrication and application of functional protein microarrays.
Capture molecules with high specificity and selectivity are the essential prerequisite for the design and fabrication of both protein and DNA microarrays. Regardless of their sequences, the biochemical properties of DNA molecules are essentially the same, which allows the same chemistries to be applied either to immobilize DNA strands on a solid surface or to synthesize them there in situ. Therefore, the design and construction of oligonucleotide DNA microarrays are relatively straightforward. However, the protein world is much more complicated than that of DNA because of the vast differences in the structure, charge, and hydrophobicity of individual proteins. This implies that the fabrication and analysis of protein microarrays are substantially less straightforward and standardizable than those of DNA microarrays. Unlike DNA or even RNA molecules, full-length proteins cannot be directly synthesized in vitro at high efficiency because of their complex chemistry. Although in vitro synthesis of peptides has been feasible for decades, it still suffers from low yield, high cost, and an effective limitation to short sequences. Moreover, the vast majority of proteins must be correctly folded and modified after translation to be functional, which may require a complex molecular machinery of chaperones and other accessory molecules that cannot be fully recapitulated in vitro. Thus far, the proteins used to construct high-content proteome microarrays have all been individually expressed in, and purified from, live cells (see below).
Because proteins must fold correctly in order to be active, proteins are prone to inactivation due to loss of their native conformations when directly immobilized on a solid surface. Proteins vary greatly in categories and properties, while a carrier surface can only be modified by one or two kinds of chemical or biological group. All these factors pose challenges for optimizing the physical attachment of proteins used in microarray construction to the slide surface.
Choosing a proper surface for protein immobilization is crucial to the success of any assay performed using protein microarrays. An ideal surface should retain protein functionality with relatively high signal-to-noise ratios, and possess both high protein-binding capacity and long shelf-life. Glass slides covered with polyvinylidene fluoride (PVDF), nitrocellulose membrane, or polystyrene were popular for protein microarray fabrication in the early days of the technology [10-12]. However, PVDF and polystyrene are relatively soft, allowing lateral spread of printed proteins and hence limiting the density at which proteins can be printed. Nitrocellulose membranes, in addition, tend to generate high background and a low signal-to-noise ratio in most applications.
To bypass these shortcomings, researchers developed three-dimensional matrix arrays, in which glass slides are coated with polyacrylamide or agarose to form a porous hydrophilic matrix. Proteins or antibodies are trapped within the pores and lateral diffusion is restricted, which reduces the size of printed protein spots and thus increases the maximal complexity of the array [13,14]. Protein activity is generally well preserved in such matrix arrays, and their protein-binding capacity is relatively high. For instance, Zhu et al utilized soft lithography to generate nanowells on a polydimethylsiloxane (PDMS) sheet placed on top of microscope slides. These nanowell chips were used to immobilize substrate proteins to profile the phosphorylation specificity of 119 kinases encoded by budding yeast. The open structure of nanowells provides physical barriers and allows for sequential addition of different buffers, which is critical for multi-step experiments. The main disadvantage of this method is that specialized equipment is needed to load nanowells at high density.
Other researchers printed proteins, antigens, or antibodies directly onto plain glass slides, which are usually coated with a bifunctional cross-linker with two functional groups, one reacting with the glass surface and the other with the desired proteins. For example, Schweitzer et al demonstrated that protein microarrays fabricated on glass surfaces possess high sensitivity, a wide dynamic range, and decent spot-to-spot reproducibility. MacBeath and Schreiber demonstrated with three proteins that thousands of protein spots could be immobilized on aldehyde-activated plain glass surfaces to form a high-density protein microarray that was suitable for a range of different classes of assays.
Although the technologies for arraying proteins on various types of surfaces at high density were starting to mature by the end of the last millennium, the main hurdle to their more general use remained the difficulty of producing the large number of different proteins needed to construct a truly high-content array. A readily usable high-throughput protocol for parallel production of thousands of different proteins is the key to overcoming this hurdle.
An early attempt led by the Lehrach group was to express human proteins in bacteria using a library consisting of random cDNAs. Individual cDNA clones of this library were robotically arrayed onto a PVDF membrane laid on top of agar medium and allowed to grow to full size. These cells were then lysed in situ to extract proteins, and the membranes were incubated with a labeled test protein to identify interacting partners. Strictly speaking, the human proteins bound to the membranes were not purified: the vast majority of the proteins in every spot were bacterial proteins. Furthermore, the proteins were neither unique nor in native conformation, given the redundancy of the library and the denaturing conditions used to break the cells open. Though powerful as a screening technique in the early days, this particular experimental strategy had limited general application.
To overcome these hurdles, the Snyder group created a high-throughput protein purification protocol in the budding yeast. Using a homologous recombination-based strategy, more than 5,800 full-length yeast open reading frames (ORFs) were cloned into a yeast expression vector that, upon galactose induction, produces glutathione-S-transferase (GST)-tagged N-terminal fusion proteins. The purification protocol took advantage of both a 96-well format and immobilized affinity chromatography. This strategy allowed parallel purification of unprecedented numbers of proteins, up to 1,152 per day. The success of this approach is built upon several unique aspects. First, it utilizes a eukaryotic expression system that both generates high levels of recombinant proteins and tends to produce a high fraction of soluble proteins. Compared with bacterial expression systems, in which a large fraction of recombinant proteins end up in inclusion bodies, this is a huge advantage when a large number of eukaryotic proteins are being generated. Second, the expression of recombinant proteins is induced for only about two cell cycles in total, which greatly reduces toxicity and cell death. Third, a foreign eukaryotic protein purified from yeast is more likely to be active because post-translational modifications (PTMs) necessary for function are more likely to occur correctly than in either bacteria or a cell-free system. Fourth, the use of an N-terminal GST tag helps proteins fold correctly and therefore improves their stability and solubility. Other commonly used tags include the so-called TAP-tag, MBP, and 6xHis, to name a few. In fact, the same group later went on to build a TAP-tagged yeast ORF collection and purified >5,000 yeast proteins.
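The throughput figure can be put in perspective with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes one protein per well and sustained peak throughput with no repeated purifications, neither of which is stated in the original report, and the function names are hypothetical.

```python
import math

WELLS_PER_PLATE = 96      # plate format named in the text
PROTEINS_PER_DAY = 1152   # peak daily throughput quoted in the text
ORF_COUNT = 5800          # approximate size of the yeast ORF collection


def plates_per_day(proteins_per_day=PROTEINS_PER_DAY,
                   wells_per_plate=WELLS_PER_PLATE):
    """Number of 96-well plates processed per day, assuming one protein per well."""
    return proteins_per_day / wells_per_plate


def days_for_collection(orf_count=ORF_COUNT,
                        proteins_per_day=PROTEINS_PER_DAY):
    """Whole days needed to purify the collection once at peak throughput."""
    return math.ceil(orf_count / proteins_per_day)


if __name__ == "__main__":
    print("plates per day:", plates_per_day())
    print("days for one pass through the collection:", days_for_collection())
```

Under these assumptions, the quoted throughput corresponds to twelve 96-well plates per day, so a single pass through the whole ORF collection fits within roughly a working week.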
Another commonly used expression system is Escherichia coli. Procedures for automated high-throughput protein expression and purification using the 6xHis tag have been developed. The subsequent protein purification takes advantage of immobilized Ni-NTA affinity chromatography. The 6xHis tag usually does not alter the properties of the fusion proteins, and it adds less than 1 kDa to the molecular weight. Furthermore, its binding to the Ni-NTA resin is selective and stable even under severe denaturing conditions [22,23]. Our group has recently reported a high-throughput protein purification protocol for 6xHis-tagged proteins in E. coli.
Although high-throughput protein production in both prokaryotes and eukaryotes is now increasingly feasible, these protocols are still both labor-intensive and costly. Aside from the cost of protein production, fabrication of a proteome microarray requires construction of an expressible collection of full-length ORFs, which can be both challenging and expensive when dealing with higher eukaryotes with a large number of genes, such as humans. To explore alternative approaches, several groups have tested in vitro transcription/translation systems, such as the E. coli, wheat germ, and rabbit reticulocyte systems. In these systems, proteins can be expressed directly from cDNA templates, which can be obtained through PCR amplification without the lengthy and costly process of subcloning. For example, the E. coli cell-free protein expression system has been used to synthesize proteins in a 96-well format, and the improved wheat germ cell-free protein synthesis system has been applied to the in vitro expression of 13,364 human proteins. Furthermore, these systems can significantly decrease the reaction volume required to generate recombinant proteins, which is an advantage given the rather high cost of in vitro expression reagents.
Such systems can also be applied to directly synthesize proteins on glass slides to fabricate so-called “in situ protein microarrays.” In the PISA (Protein In Situ Array) method, proteins are expressed directly from DNA in vitro and become attached to the array surface through recognition of a sequence that serves as an affinity tag. Similarly, in the NAPPA (Nucleic Acid Programmable Protein Array) technology, biotinylated cDNA plasmids encoding proteins as GST fusions are printed onto avidin-coated slides, together with anti-GST antibodies as the capture molecules. The cDNA array is then incubated with rabbit reticulocyte lysate to express the proteins, which become trapped by the antibodies adjacent to each DNA spot. Recently, NAPPA has been successfully expanded to high-density arrays of 1,000 different proteins. In addition, Tao et al developed a different method in which ribosomes are stalled at the end of an RNA template to allow the nascent polypeptides to be captured by a puromycin moiety grafted at one end of an oligonucleotide immobilized on a solid surface.
Another similar method is called DAPA (DNA Array to Protein Array), in which proteins are synthesized between two glass slides, one of which is arrayed with DNA while the other carries a specific affinity reagent to capture the proteins. In this approach, tagged proteins are synthesized in parallel from the DNA array, diffuse across the gap between the two slides, and then bind to the tag-capturing reagent on the other slide to form a protein array. Unlike the NAPPA method, in which proteins are present together with DNA and the DNA array can be used only once, DAPA generates multiple copies of ‘pure’ protein arrays on a separate surface from the same DNA template, with at least 20 copies capable of being produced from a single template.
With regard to spotting proteins, the two major mechanisms are contact and non-contact printing. Adapted from DNA microarray fabrication, the robotic contact printing tool is the most suitable for producing high-content protein microarrays because of the need to array large numbers of different proteins. Metal pins with solid or quill tips are used in contact printers to deliver sub-nanoliter volumes of protein sample to the slide surface. Quill pins, which have a larger sample capacity, can print hundreds of spots continuously after each sample loading. The printed spots are typically circular, and their size depends largely on the pin tip dimension, the surface chemistry, and the printing buffer. A significant advantage of this type of microarrayer is its speed and throughput: up to 48 pins can be loaded and more than 200 slides printed at a time. However, the pins are very fragile and expensive, and the pin tips may damage the slide surface, especially complex 3D substrates (e.g., nitrocellulose-coated slides). Furthermore, some proteins stick to metal and the general washing steps may not remove them completely from the pins, giving rise to cross-contamination of protein samples and carry-over problems.
To address these issues, non-contact dispensing techniques have been developed for printing protein microarrays, by which a small droplet of protein sample is delivered to the slide surface without touching it. Droplets can be generated by conventional ink-jet, piezoelectric pulsing, or electrospray deposition [34-36]. Unlike contact printing, the amount of liquid deposited by non-contact printers does not depend on the surface properties of the slide, and significantly better spot morphology has been observed on hydrophobic surfaces using non-contact printing compared to contact printing. In addition to standard glass slides, non-contact printers can also print on membranes. However, such instruments usually suffer from longer printing times and fewer pins, which is a significant drawback when printing a large number of protein samples. Moreover, non-contact printers can sometimes misplace spots and/or generate satellite spots, resulting in a high failure rate. An additional disadvantage of non-contact printers is that they usually require a larger sample volume, which is challenging and expensive for high-throughput protein production.
The physical and chemical properties of different proteins vary greatly, and protein activities are closely related to their structures. Therefore, the development of a stable, universal immobilization method that does not change protein structures is one of the difficulties of protein microarray fabrication. So far several different methods have been used for protein immobilization on solid carrier surfaces, such as noncovalent adsorption, covalent binding, and affinity capture.
Noncovalent adsorption provides both high protein capacity and low impact on protein structure, but offers no control over the amount and orientation of immobilized proteins. Thus the reaction efficiency, accuracy, and reproducibility of arrays produced in this manner are variable. Covalent binding, on the other hand, chemically cross-links proteins via reactive residues (e.g., lysine and cysteine) to surface-grafted groups, such as aldehyde, epoxy, reactive ester, etc [17,39,40]. Lee et al developed novel calixcrown derivatives as a ProLinker that permits efficient immobilization of captured proteins on solid matrixes, and the immobilized proteins showed both consistent directionality and functionality. Covalent binding is suitable for immobilizing a wide range of proteins with strong attachment to the carrier surface. However, the chemical modification can sometimes alter both the activities of target proteins and their binding to specific ligands.
Affinity capture is an attractive way to immobilize proteins that avoids many of the shortcomings of the previously detailed approaches. For example, biotinylated proteins have been immobilized on streptavidin-coated slides. The use of genetically encoded affinity tags, which can be fused to target proteins and bind to a specific slide surface, is an analogous approach. For example, 6xHis-tags have been utilized to immobilize proteins on nickel-NTA coated glass slides. Presumably, affinity-based immobilization attaches proteins in a relatively uniform orientation with minimal disruption of protein structure, and thus may be the best approach for preserving the structure and function of printed proteins. One important caveat to bear in mind, however, is that the incorporation of affinity tags may itself alter protein structure.
One way to deal with this challenge was demonstrated by Zhang et al., who developed a flexible polypeptide scaffold consisting of a surface immobilization domain and a protein capture domain, which allows much greater flexibility in the immobilization of proteins on a microarray. Wacker et al compared the DNA-directed immobilization (DDI) method with both direct spotting and biotin-streptavidin affinity immobilization for antibodies. DDI is based on the self-assembly of semisynthetic DNA-streptavidin conjugates that convert a DNA oligomer array into an antibody array. DDI and direct spotting showed the highest fluorescence intensities. DDI also performed best in spot homogeneity and in intra- and inter-experimental reproducibility. Moreover, DDI required the lowest amount of antibody, at least 100-fold less than direct spotting. The drawback of DDI is that proteins have to be linked to DNA prior to immobilization, which increases the workload involved in generating microarrays.
The orientation of immobilized proteins may influence both their activity and their affinity for specific ligands. Peluso et al compared randomly versus specifically oriented capture agents based on both full-sized antibodies and Fab' fragments. Specific orientation of the capture agents consistently increased the analyte-binding capacity of the surfaces, by up to 10-fold relative to surfaces with randomly oriented capture agents. When specifically oriented, Fab' fragments formed a dense monolayer and 90% of them were active, whereas randomly attached Fab' fragments both packed at lower density and had lower specific activity.
In addition to optimized surface modification and reaction conditions, the detection sensitivity for samples bound on microarrays is another key parameter in the design of protein microarray assays. There are two basic detection approaches: label-dependent and label-free detection.
Radioisotopes and fluorescent dyes are the two most common labels for signal detection in protein microarray assays. Fluorescent dyes, such as Cy3/Cy5 and their equivalents, are a popular choice. Because most good dyes have relatively narrow excitation and emission spectra, multi-color schemes can be readily implemented for simultaneous detection and direct comparison of different samples, both reducing cost and avoiding chip-to-chip variation. Semiconductor quantum dots, which are brighter and more stable than organic dyes, have also been applied as labels on protein microarrays [46,47].
In addition to fluorescent labeling, Huang et al detected multiple cytokines on an antibody array with enhanced chemiluminescence (ECL), providing an alternative detection method. Enzymatic signal amplification is also a valuable labeling method. Rolling circle amplification (RCA) has been developed for protein microarray assays. For low-abundance protein samples, the sensitivity of traditional fluorescence or chemiluminescence detection is relatively low, whereas RCA can detect captured proteins at the fmol level and promises to improve the sensitivity of fluorescent detection [16,49-52]. Tyramide signal amplification (TSA) is another way to amplify signals enzymatically: horseradish peroxidase conjugated to secondary antibodies converts a labeled substrate (tyramide) into short-lived, extremely reactive intermediates, which then very rapidly react with and covalently bind to adjacent proteins.
For some biochemical assays, especially enzymatic reactions, use of radioisotopes is the only detection method available (see below for more details). They still offer the most sensitive and reliable detection of PTM events when there is a lack of high quality and high affinity detection reagents, such as antibodies. We and others have successfully applied 32P-, 33P-, and 14C-labeled substrates to detect protein phosphorylation and acetylation events [54,55].
One obvious disadvantage of label-dependent detection is that it requires either chemically modifying the probe or using a specific antibody. It is also not amenable to real-time detection, which can provide important information when analyzing reaction dynamics. Therefore, label-free detection methods have also been investigated for protein microarrays. Surface plasmon resonance (SPR) is a label-free technology for analyzing biomolecular interactions in real time, and has been adapted for protein microarray signal detection [56,57]. SPR relies on the principle that, under total internal reflection, incident light can resonate with surface plasmons on a metal film; the resonance signal changes when analytes bind to (and dissociate from) ligands on the array surface. Binding events can thus be monitored and the kinetic parameters calculated in real time. Mass spectrometry has also been used to detect ligands bound to individual proteins printed on protein microarrays, with such approaches as MALDI-MS, SELDI-TOF-MS, and MALDI-TOF-MS used for this purpose [58-60]. The analysis is rapid and simple, requires only a small amount of sample, and can be used for direct detection of analytes bound from complex samples, such as urine, serum, plasma, and cell lysates. Atomic force microscopy (AFM) uses changes in surface topology to identify analytes bound on the array [61,62]. More specifically, AFM detects the increase in height of the proteins/antibodies on the array, and is thus able to measure binding interactions.
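As an illustration of how kinetic parameters relate to a real-time SPR trace, the following sketch simulates a sensorgram under the simple 1:1 (Langmuir) binding model. The choice of model is an assumption made here for illustration; the studies cited above may have used other models, and all rate constants, concentrations, and function names below are hypothetical.

```python
import math


def association_response(t, conc, ka, kd, rmax):
    """Response at time t (s) after analyte injection onto an empty surface.

    1:1 Langmuir model: R(t) = Req * (1 - exp(-(ka*C + kd) * t)),
    with Req = Rmax * ka*C / (ka*C + kd).
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, kd):
    """Response t seconds after switching to buffer, starting from response r0."""
    return r0 * math.exp(-kd * t)


def equilibrium_constant(ka, kd):
    """Equilibrium dissociation constant KD = kd / ka (in M)."""
    return kd / ka


if __name__ == "__main__":
    ka, kd, rmax = 1e5, 1e-3, 100.0   # hypothetical: M^-1 s^-1, s^-1, response units
    conc = 1e-8                        # hypothetical analyte concentration (M)
    for t in (0, 300, 600, 1200):
        print(t, round(association_response(t, conc, ka, kd, rmax), 2))
    print("KD (M):", equilibrium_constant(ka, kd))
```

In this model, the association phase depends on both rate constants and the analyte concentration, while the dissociation phase depends on the off-rate alone, which is why recording both phases allows the two constants to be separated.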
Unlike the DNA/oligo microarray or analytical protein microarrays, functional protein microarrays provide a flexible platform that allows development and detection of a wide range of protein biochemical properties. To date, well-developed assays include detection of various types of protein-ligand interactions, such as protein-protein, protein-DNA, protein-RNA, protein-lipid, protein-drug, and protein-glycan interactions [17,18,24,63-69], and identification of substrates of various classes of enzymes, such as protein kinase, ubiquitin E3 ligase, and acetyltransferase, to name a few [15,54,55,70,71]. Application of these assays has had a profound impact on a wide range of research areas. This is especially true when they are used in large-scale, high-throughput projects, exemplified in both network construction and biomarker identification (see below and Table 1).
Among the first applications of protein microarrays was the analysis of protein-protein and protein-lipid interactions, where test ligands were directly or indirectly labeled with fluorescent dyes. For example, Zhu et al developed the first proteome microarray, composed of ~5800 recombinant yeast proteins (>85% of the yeast proteome), and identified binding partners of calmodulin and phosphatidylinositides (PIPs). They first incubated the microarrays with biotinylated bovine calmodulin and discovered 39 new calmodulin binding partners. In addition, using liposomes as a carrier for various PIPs, they identified more than 150 binding proteins, >50% of which were known membrane-associated proteins. Popescu et al developed a protein microarray containing 1,133 Arabidopsis thaliana proteins and used it to globally identify proteins that bind to calmodulins or calmodulin-like proteins in Arabidopsis. A large number of previously known and novel targets were identified, including transcription factors (TFs), receptor and intracellular protein kinases, F-box proteins, RNA-binding proteins, and proteins of unknown function. Alternative approaches to identifying protein-protein interactions, such as the yeast two-hybrid system and protein complex purification coupled with mass spectrometry analysis, are well established, however, and are used as standard high-throughput methods to detect protein-protein interactions in higher eukaryotes [72,73]. Thus, while protein microarrays provide a rapid way to characterize protein-protein interactions, they have much competition in this arena.
MacBeath and colleagues fabricated protein domain microarrays to investigate, in a semi-quantitative fashion, protein-peptide interactions that might play an important role in signaling. They constructed an array by printing 159 human Src homology 2 (SH2) and phosphotyrosine binding (PTB) domains on aldehyde-modified glass substrates, and incubated the arrays with 61 peptides representing tyrosine phosphorylation sites on the four ErbB receptors. Eight concentrations of each peptide (10 nM to 5 mM) were tested in the assay, allowing quantitative measurement of the binding affinity of each peptide to its protein ligand.
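To show how a titration series of this kind can yield an affinity estimate, the sketch below fits a single-site saturation model, F = Fmax · c / (Kd + c), to signals measured at eight concentrations. The binding model, the log-spaced concentration series, and the synthetic noise-free data are all assumptions for illustration and are not taken from the study; the function names are hypothetical. For each trial Kd the best Fmax has a closed form, so a one-dimensional grid search over Kd is enough for a sketch.

```python
import math


def log_spaced(lo, hi, n):
    """Return n concentrations spaced evenly on a log scale from lo to hi."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]


def fit_single_site(concs, signals, kd_min=1e-9, kd_max=1e-2, steps=2000):
    """Least-squares fit of F = Fmax * c / (Kd + c); returns (Kd, Fmax).

    Kd is scanned on a log-spaced grid; for each trial Kd, Fmax is the
    closed-form linear least-squares solution.
    """
    best = None
    log_lo, log_hi = math.log10(kd_min), math.log10(kd_max)
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        x = [c / (kd + c) for c in concs]  # fraction bound at each concentration
        fmax = sum(s * xi for s, xi in zip(signals, x)) / sum(xi * xi for xi in x)
        sse = sum((s - fmax * xi) ** 2 for s, xi in zip(signals, x))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]


# Synthetic, noise-free titration (hypothetical values, not data from the study).
TRUE_KD, TRUE_FMAX = 5e-7, 1000.0
concs = log_spaced(1e-8, 5e-3, 8)  # eight concentrations in M
signals = [TRUE_FMAX * c / (TRUE_KD + c) for c in concs]
kd_fit, fmax_fit = fit_single_site(concs, signals)
print(f"fitted Kd = {kd_fit:.2e} M, fitted Fmax = {fmax_fit:.0f}")
```

With real, noisy data a nonlinear least-squares routine such as `scipy.optimize.curve_fit` would be the more usual choice; the grid search is used here only to keep the example free of dependencies.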
Protein microarrays have also been applied extensively and productively to characterize protein-DNA interactions (PDIs). In an earlier study, Snyder and colleagues screened for novel DNA-binding proteins by probing the yeast proteome microarrays with fluorescently labeled yeast genomic DNA. Of the ~200 positive proteins, half were not previously known to bind DNA. Focusing on the product of a single yeast gene, ARG5,6, which encodes two enzymes involved in arginine biosynthesis, they discovered that the protein bound to a specific DNA motif and associated with specific nuclear and mitochondrial loci in vivo.
In a later report, the Snyder and Johnston groups constructed a protein microarray with 282 known and predicted yeast TFs to identify their interactions with 75 evolutionarily conserved DNA motifs. Over 200 specific PDIs were identified, >60% of which were previously unknown. The binding site of a previously uncharacterized DNA-binding protein, Yjl103p, was defined and a number of its target genes were identified, many of which are involved in stress response and oxidative phosphorylation.
Our team developed a bacterial proteome microarray composed of 4,256 proteins encoded by the Escherichia coli K12 strain (~99% coverage of the proteome) using a bacterial high-throughput protein purification protocol. To demonstrate its usefulness, end-labeled, double-stranded (ds) DNA probes carrying abasic or mismatched base pairs were used to identify proteins involved in DNA damage recognition. A small number of proteins specifically recognized each type of probe with high affinity. Two of them, YbaZ and YbcN, were further shown by biochemical assays to possess base-flipping activity.
Recently, our group also undertook a large-scale analysis of human PDIs using a protein microarray composed of 4,191 unique full-length human proteins, including ~90% of the annotated TFs and a wide range of other protein categories, such as RNA-binding proteins, chromatin-associated proteins, nucleotide-binding proteins, transcription co-regulators, mitochondrial proteins, and protein kinases. The protein microarrays were probed with 400 predicted and 60 known DNA motifs, and a total of 17,718 PDIs were identified. Many known PDIs and a large number of new PDIs for both well-characterized and predicted TFs were recovered, and new consensus sites for over 200 TFs were determined, which doubled the number of previously reported consensus sites for human TFs [66,75]. Surprisingly, over 300 proteins that were previously unknown to specifically interact with DNA showed sequence-specific PDIs, suggesting that many human proteins may bind specific DNA sequences as a moonlighting function. To investigate whether the DNA-binding activities of these unconventional DNA-binding proteins (uDBPs) were physiologically relevant, we carried out an in-depth analysis of a well-studied protein kinase, Erk2, to determine the potential mechanism behind its DNA-binding activity. Using a series of in vitro and in vivo approaches, such as EMSA, luciferase assays, mutagenesis, and ChIP, we demonstrated that the DNA-binding activity of Erk2 is independent of its protein kinase activity and that it acts as a transcriptional repressor of transcripts induced by interferon gamma signaling. Our study suggests that moonlighting functions of uDBPs based on their sequence-specific DNA-binding activity may be a widespread phenomenon in humans.
Discovering new drug molecules and drug targets is another field in which protein microarrays have shown their potential. For example, Huang et al incubated biotinylated small-molecule inhibitors of rapamycin (SMIRs) on yeast proteome microarrays and obtained the binding profiles of the SMIRs across the entire yeast proteome. They identified candidate target proteins of the SMIRs, including Tep1p, a homolog of the mammalian PTEN tumor suppressor, and Ybr077cp (Nir1p), a protein of previously unknown function. Both were validated to associate with PI(3,4)P2, suggesting a novel mechanism by which phosphatidylinositides might modulate the TOR (target of rapamycin) pathway.
The yeast proteome microarray has also been used to identify specific RNA-binding proteins with antiviral activity. In these experiments, arrays were incubated with a fluorescently tagged small RNA hairpin containing a clamped adenine motif (CAM), which is required for the replication of Brome Mosaic Virus (BMV), a plant-infecting RNA virus that can also replicate in budding yeast. Two of the candidate proteins, Pseudouridine Synthase 4 (Pus4) and Actin Patch Protein 1 (App1), were further characterized in Nicotiana benthamiana. Both modestly reduced the accumulation of BMV genomic plus-strand RNA and dramatically inhibited the spread of BMV in plants.
Protein glycosylation, a general PTM of proteins involved in cell membrane formation, is crucial for dictating the proper conformation of many membrane proteins and maintaining the stability of some secreted glycoproteins, and it plays a role in cell-cell adhesion. To further understand the roles of protein glycosylation in yeast, the Zhu and Snyder teams reasoned that, because proteins on the yeast proteome microarrays are expressed in their original host and should therefore retain most of their PTMs, these arrays can be used to profile glycosylation with fluorescently labeled lectins, such as Concanavalin A (ConA) and Wheat-Germ Agglutinin (WGA). A total of 534 proteins were identified, 406 of which were not previously known to be glycosylated. Many proteins in the secretory pathway were identified, as well as other functional classes of proteins, including TFs and mitochondrial proteins. Upon treatment with tunicamycin, an inhibitor of N-linked protein glycosylation, two of the four mitochondrial proteins identified showed partial redistribution to the cytosol and reduced localization to the mitochondria, suggesting a new role for protein glycosylation in mitochondrial protein function and localization.
Protein phosphorylation plays a central role in almost all, if not all, aspects of cellular processes. The application of protein microarray technology to protein phosphorylation was first demonstrated by Zhu et al. They immobilized 17 different substrates on a nanowell protein microarray and then performed individual kinase assays with almost all of the yeast kinases (119 of 122). This approach allowed them to determine the substrate specificity of the yeast kinome and to identify new tyrosine phosphorylation activity.
In a later report, Snyder's group accomplished a large-scale “Phosphorylome Project” using the yeast proteome microarrays. Eighty-seven purified yeast kinases or kinase complexes were individually incubated on the yeast proteome arrays in a kinase buffer in the presence of 33P-γ-ATP, and a total of 1,325 distinct protein substrates were identified, representing a total of 4,129 phosphorylation events. These results provided a global network that connects kinases to their potential substrates and offered a new opportunity to identify new signaling pathways or cross-talk between pathways.
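The distinction between distinct substrates and phosphorylation events can be made concrete by treating the results as a bipartite network in which each kinase-substrate hit is one edge. The following is a minimal sketch of that bookkeeping, not the analysis used in the original study; the kinase and substrate names and the helper functions (`build_network`, `summarize`, `shared_substrates`) are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical (kinase, substrate) hits; placeholder names, not data from the study.
hits = [
    ("KinaseA", "Sub1"),
    ("KinaseA", "Sub2"),
    ("KinaseB", "Sub2"),
    ("KinaseB", "Sub3"),
    ("KinaseC", "Sub2"),
]


def build_network(pairs):
    """Return a kinase -> set-of-substrates mapping, ignoring duplicate hits."""
    network = defaultdict(set)
    for kinase, substrate in pairs:
        network[kinase].add(substrate)
    return dict(network)


def summarize(network):
    """Count phosphorylation events (edges) and distinct substrates (nodes)."""
    events = sum(len(subs) for subs in network.values())
    substrates = set().union(*network.values()) if network else set()
    return events, len(substrates)


def shared_substrates(network):
    """List substrates hit by more than one kinase (candidate cross-talk points)."""
    counts = defaultdict(int)
    for subs in network.values():
        for substrate in subs:
            counts[substrate] += 1
    return sorted(s for s, n in counts.items() if n > 1)


network = build_network(hits)
events, n_substrates = summarize(network)
print(events, n_substrates, shared_substrates(network))
```

In this toy input, five events map onto three distinct substrates, which mirrors how 4,129 events can involve only 1,325 substrates: a substrate phosphorylated by several kinases contributes one node but several edges.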
Several smaller-scale studies of kinase-substrate interactions have also been reported. For instance, Popescu et al tested 10 Arabidopsis mitogen-activated protein kinases (MPKs) on protein microarrays containing 2,158 Arabidopsis proteins and identified 570 putative MPK phosphorylation targets, which were enriched in transcription factors involved in the regulation of development, defense, and stress responses. A commercially available human protein microarray composed of approximately 3,000 individual proteins was used to identify substrates of cyclin-dependent kinase 5 (Cdk5), a serine/threonine kinase that plays an important role during CNS development.
Ubiquitylation is one of the most prevalent PTMs and controls almost all types of cellular events in eukaryotes. To establish a protein microarray-based approach for identifying substrates of ubiquitin E3 ligases, Lu et al developed an assay for yeast proteome microarrays that uses a HECT-domain E3 ligase, Rsp5, in combination with the E1 and E2 enzymes. More than 90 new substrates were identified, eight of which were validated as in vivo substrates of Rsp5. Further in vivo characterization of two substrates, Sla1 and Rnr2, demonstrated that Rsp5-dependent ubiquitylation affects either the posttranslational processing of the substrate or its subcellular localization.
Histone acetylation and deacetylation, which are catalyzed by histone acetyltransferases (HATs) and histone deacetylases (HDACs), respectively, are emerging as critical regulators of chromatin structure and transcription. However, it has been hypothesized that many HATs and HDACs might also modify non-histone substrates. For example, Esa1, the core enzyme of the essential nucleosome acetyltransferase of H4 (NuA4) complex, is the only essential HAT in yeast, which strongly suggests that it may target additional non-histone proteins that are crucial for cell survival. To identify non-histone substrates of the NuA4 complex, Lin et al established and performed acetylation reactions on the yeast proteome microarrays using the NuA4 complex in the presence of [14C]-Acetyl-CoA as a donor. Surprisingly, 91 proteins were found to be readily acetylated by the NuA4 complex on the array. To validate these in vitro results, 20 of the 91 proteins were randomly chosen, and 13 of these showed Esa1-dependent acetylation in cells. One of them, phosphoenolpyruvate carboxykinase (Pck1p), was further characterized to explore the possible link between acetylation and metabolism. Mass spectrometry revealed Lys19 and Lys514 as the acetylation sites of Pck1p, and mutagenesis analyses demonstrated that acetylation of Lys514 is critical for enhancing Pck1p's enzymatic activity and results in a longer life span for yeast cells growing under starvation. This study offers a molecular link between the HDAC Sir2 and yeast longevity.
S-nitrosylation, although independent of enzyme catalysis, is an important PTM that affects a wide range of proteins involved in many cellular processes. Recently, Foster et al developed a protein microarray-based approach to detect proteins reactive to S-nitrosothiol (SNO), the donor of NO+ in S-nitrosylation, and to investigate determinants of S-nitrosylation. S-nitrosocysteine (CysNO), a highly reactive SNO, was added to the yeast proteome microarray, and the nitrosylated proteins were then detected using a modified biotin switch technique. The 300 proteins with the highest relative signal intensity were further analyzed, and the results revealed that proteins with active-site Cys thiols residing at the N termini of alpha-helices or within catalytic loops were particularly prominent. However, substantial variations in S-nitrosylation were observed even within these protein families, indicating that secondary structure or intrinsic nucleophilicity of Cys thiols was not sufficient to explain the specificity of S-nitrosylation. Further analyses revealed that NO-donor stereochemistry and structure had a significant impact on S-nitrosylation efficiency.
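Selecting the top-ranked proteins by relative signal intensity is a simple ranking step. The sketch below illustrates one way to do it, under the assumption that "relative" means the raw spot signal divided by the array-wide median; the original study may have normalized differently, and the protein names, signal values, and the function `top_n_by_relative_signal` are hypothetical.

```python
from statistics import median


def top_n_by_relative_signal(signals, n):
    """Rank proteins by signal relative to the array median and keep the top n.

    Assumption: relative signal = raw signal / median raw signal on the array.
    """
    med = median(signals.values())
    if med <= 0:
        raise ValueError("median signal must be positive")
    relative = {name: value / med for name, value in signals.items()}
    ranked = sorted(relative.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical raw spot intensities (arbitrary units).
signals = {"P1": 1200.0, "P2": 300.0, "P3": 4500.0, "P4": 150.0, "P5": 900.0}

top = top_n_by_relative_signal(signals, 2)
for name, rel in top:
    print(f"{name}\t{rel:.2f}")
```

With a real array, `n` would be 300 and the input would hold one entry per spotted protein; a fixed-rank cutoff of this kind is a pragmatic choice and carries no statistical significance by itself.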
Although the applications described above are most useful in basic research, functional protein microarrays may also have an enormous impact on clinical diagnosis and prognosis. When the proteins on a functional protein microarray are viewed as potential antigens that may or may not be associated with a particular disease, the array becomes a powerful tool for biomarker identification. The principle is straightforward: when an autoantibody present in the serum of a patient with a given disease (e.g., an autoimmune disease) recognizes a human protein spotted on the array, it can be readily detected with fluorescently labeled anti-human immunoglobulin antibodies (e.g., anti-IgG). A profile of autoantibodies associated with the disease is thus created, providing a rapid approach to identifying potential disease biomarkers. For example, Robinson et al reported the first application of protein microarray technology to profile sera from multiple human diseases. They constructed a microarray with 196 biomolecules shown to be autoantigens in eight human autoimmune diseases, including proteins, peptides, enzyme complexes, ribonucleoprotein complexes, DNA, and post-translationally modified antigens. The arrays were incubated with patient sera to study the specificity and pathogenesis of autoantibody responses, and were used to identify and define relevant autoantigens in human autoimmune diseases, including systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA).
Hu et al reported a new approach for high-throughput characterization of monoclonal antibody (mAb) target specificity using a protein microarray composed of 1,058 unique human liver proteins. They immunized mice with live cells from human livers, isolated 54 hybridomas with binding activity to human cells, and identified the corresponding antigens for 5 mAbs by screening on the protein microarray. The expression profiles of these 5 antigens were characterized using tissue microarrays, and one of the antigens, eIF1A, was found to be expressed in normal human liver but not in hepatocellular carcinoma. Other applications include biomarker identification for ovarian cancer, inflammatory bowel disease, alopecia areata, and autoimmune hepatitis.
Protein microarrays can also be used for the detection of infectious diseases. Zhu et al developed a coronavirus protein microarray for the diagnosis of severe acute respiratory syndrome (SARS), which included all the SARS-CoV proteins as well as proteins from five additional coronaviruses that can infect humans (HCoV-229E and HCoV-OC43), cows (BCV), cats (FIPV), and mice (MHVA59). These microarrays could quickly classify patient serum samples as SARS-positive or SARS-negative based on the presence of human IgG and IgM antibodies against SARS-CoV proteins, with 94% accuracy compared with standard diagnostic methods. Patients carrying antibodies against other coronavirus proteins were also identified. The advantages of this microarray-based assay over standard ELISA-based diagnostic methods include at least 100-fold higher sensitivity and the need for substantially less sample for analysis.
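The accuracy figure quoted above is the fraction of sera for which the array-based call agrees with the reference diagnosis. As a minimal sketch of that calculation, assume a serum is called positive when either its anti-SARS-CoV IgG or IgM signal reaches a fixed cutoff; the cutoff, the signal values, and the function names below are hypothetical and are not taken from the original study.

```python
def call_sample(igg, igm, cutoff):
    """Call a serum positive if either the IgG or the IgM signal reaches the cutoff."""
    return igg >= cutoff or igm >= cutoff


def accuracy(calls, reference):
    """Fraction of samples where the array call matches the reference diagnosis."""
    if not calls or len(calls) != len(reference):
        raise ValueError("calls and reference must be non-empty and of equal length")
    matches = sum(c == r for c, r in zip(calls, reference))
    return matches / len(calls)


# Hypothetical sera: (IgG signal, IgM signal, reference diagnosis).
samples = [
    (5200, 800, True),
    (300, 2600, True),
    (250, 310, False),
    (400, 150, False),
    (900, 700, True),  # a true positive that this cutoff misses
]
CUTOFF = 1000  # hypothetical signal threshold in arbitrary units

calls = [call_sample(igg, igm, CUTOFF) for igg, igm, _ in samples]
reference = [ref for _, _, ref in samples]
print(accuracy(calls, reference))
```

In practice the cutoff would be derived from control sera rather than fixed by hand, and sensitivity and specificity would usually be reported alongside overall accuracy.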
Recent years have witnessed rapid growth in the use of functional protein microarrays for basic research. Although the technology is still at a relatively early stage of development, it has become clear that the protein microarray platform can and will act as a versatile tool for large-scale, high-throughput biology, especially in profiling PTMs and in analyzing signal transduction networks and pathways [54,66]. In mass spectrometry, another crucial proteomics technology, recent progress has allowed global profiling of PTMs using a shotgun approach. For example, the Zhao, Mann, and Guan groups recently identified numerous acetylated lysine residues in metabolic enzymes in mouse and human cells without knowing the upstream HATs [86-88]. In parallel, our team identified many yeast metabolic enzymes as substrates of the NuA4 acetylation complex without knowing the actual modified sites. We therefore envision that combining the two technologies will have enormous potential both to identify critical regulatory PTMs at the resolution of individual modified amino acids and to identify the enzymes that mediate these modifications. Another emerging direction lies in understanding the molecular mechanisms of pathogen-host interactions. In the same manner in which we identified host proteins that recognized the SLD loop of BMV, functional protein microarrays (e.g., a human protein microarray) can be used to discover host proteins targeted by pathogens (e.g., HIV, HCV, and SARS-CoV). Identifying the host targets of a virus should lead to alternative therapeutics that cannot be rapidly evaded through mutation of the viral genome. In conclusion, the potential of functional protein microarrays is only now starting to reveal itself. We expect that they will become an indispensable and invaluable tool in proteomics and systems biology research.
We thank the NIH for funding support.