Although many methods exist to study the recognition and signaling properties of proteins in isolation, it remains a challenge to perform these investigations on a system-wide or proteome-wide scale and within the context of biological complexity. Protein microarray technology provides a powerful tool to assess the selectivity of protein–protein interactions at high throughput and to quantify the abundances and post-translational modification states of many different proteins in complex mixtures. Here, we provide an overview of the various applications of protein microarray technology and compare the strengths and technical challenges associated with each approach. Overall, we conclude that if this technology is to have a substantial impact on our understanding of cell biology and physiology, increased emphasis must be placed on obtaining rigorously controlled, quantitative data from protein function microarrays and on assessing the selectivity of reagents used in conjunction with protein-detecting microarrays.
Over the past two decades, new tools have been developed that allow researchers to study the genome at high throughput and with high fidelity. Fewer tools exist, however, to study the proteome. This discrepancy stems from fundamental differences between nucleic acids and proteins. Nucleic acids are relatively uniform in their physicochemical properties and can be recognized with exquisite selectivity through complementary base pairing. This mechanism forms the basis for all current techniques in genome analysis, including DNA sequencing and expression profiling. Proteins, on the other hand, are biochemically diverse, and their functions are inextricably linked to their three-dimensional conformations. At present, we lack the ability to predict the recognition properties of proteins from primary sequence alone, or to design protein-detecting reagents that recognize individual proteins in complex mixtures. As neither DNA sequence nor mRNA abundance fully informs protein function or activity, our understanding of the proteome lags far behind our understanding of the genome. In this review, we outline how protein microarray technology is currently being used to bridge this gap and what challenges must be overcome before it is established as a routine tool to dissect the proteome.
Protein microarrays can be divided into two general categories: ‘protein function microarrays’ and ‘protein-detecting microarrays’ (Fig. 1). Protein function microarrays comprise purified proteins, protein domains, or peptides, and are generally used to study molecular recognition or to screen for putative interaction partners. Protein-detecting microarrays, on the other hand, rely on reagents that recognize proteins in a selective fashion (e.g., antibodies) and are used to quantify the abundances and post-translational modification states of proteins in complex mixtures (e.g., cell lysates, tumor biopsies, and serum). The following discussion will address each of these categories separately.
One of the primary goals of functional proteomics is to understand molecular recognition within the context of the proteome. Protein function microarrays provide a powerful way to assess binding selectivity across entire families of related proteins and, in the limit, across entire proteomes. In 2000, MacBeath and Schreiber showed that purified, recombinant proteins could be microarrayed on chemically derivatized glass substrates in a way that preserves their function. Since then, variations of this technology have been used to study large collections of recombinant proteins. One approach that has been particularly successful is to focus on families of interaction domains.
Protein interaction domains mediate the formation of multi-protein complexes that confine signaling proteins to appropriate subcellular locations and help determine the selectivity of enzyme–substrate interactions. In 2002, Espejo et al. showed that a variety of interaction domain families (WW, SH3, SH2, 14-3-3, FHA, PDZ, PH, and FF) retain their ability to mediate selective interactions when abstracted from their full-length parent proteins and printed in a microarray format. Shortly afterwards, Newman and Keating used microarrays of coiled-coil domains to study the selectivity of pairing between human basic region leucine zipper (bZIP) transcription factors. This latter study provided an unbiased view of protein–protein interactions and revealed a surprisingly high level of binding-partner selectivity. More recently, Keating and coworkers have used their arrays to assess the selectivity of computationally designed peptide ligands. In a similar fashion, protein microarrays have been used to generate interaction maps for WW domains in yeast; to investigate the association of the Kaposi's sarcoma-associated herpesvirus protein K15 with the human endocytic machinery; and to assess interactions between myelin basic protein and SH3 domains. In each case, the information gained from screening a panel of potential binding partners was used to guide more in-depth investigations into protein function.
While these studies all uncovered valuable information regarding biophysical interactions, the data they generated were binary in nature: proteins were reported either to ‘interact’ or ‘not interact’. A complete understanding of molecular recognition, however, requires quantitative information. Additionally, by measuring binding affinities, researchers can gauge the physiological relevance of observed interactions and gain insight into the interplay of proteins in the complex environment of a living cell, where proteins often compete for the same targets. Quantitative protein microarrays were first described in 2004, when Boutell et al. prepared microarrays comprising 50 allelic variants of p53 and probed the arrays with different concentrations of labeled GADD45 DNA. From the resulting saturation binding curves, they were able to obtain equilibrium dissociation constants (KD values) for each binding event. In a similar fashion, Jones et al. prepared microarrays of human SH2 and PTB domains and probed the arrays with different concentrations of phosphotyrosine-containing peptides derived from the four human ErbB receptors (Fig. 2). The resulting quantitative interaction maps revealed a previously unrecognized property of receptor tyrosine kinases (RTKs): they differ in the extent to which they become more promiscuous when overexpressed. Intriguingly, this property correlates with oncogenicity, suggesting that some RTKs may promote cancer by activating signaling pathways that are not turned on under normal conditions.
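The saturation-binding analysis described above amounts to fitting a one-site model, F = Fmax·[L]/(KD + [L]), to spot intensities measured at several ligand concentrations. The sketch below is illustrative only: the function name, the grid search over KD, and the noise-free data are assumptions, not the fitting procedure used by Boutell et al. or Jones et al., and a real analysis would also need to handle background, replicate spots, and possible ligand depletion.

```python
def fit_one_site(conc, signal, kd_grid=None):
    """Least-squares fit of signal = fmax * conc / (kd + conc).

    For a fixed kd the model is linear in fmax, so the best fmax has a
    closed form; kd is found by scanning a log-spaced grid.
    """
    if kd_grid is None:
        # 601 log-spaced candidates from 1e-3 to 1e3 (1 nM to 1 mM if conc is in µM)
        kd_grid = [10 ** (e / 100) for e in range(-300, 301)]
    best = None
    for kd in kd_grid:
        g = [c / (kd + c) for c in conc]  # fractional occupancy at this kd
        fmax = sum(y * gi for y, gi in zip(signal, g)) / sum(gi * gi for gi in g)
        sse = sum((y - fmax * gi) ** 2 for y, gi in zip(signal, g))
        if best is None or sse < best[2]:
            best = (kd, fmax, sse)
    return best[0], best[1]


# Hypothetical, noise-free titration of one arrayed domain (kd = 0.5 µM, fmax = 1000)
conc = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]  # labeled ligand, µM
signal = [1000 * c / (0.5 + c) for c in conc]   # background-subtracted spot intensity
kd, fmax = fit_one_site(conc, signal)
```

Because the model is linear in Fmax once KD is fixed, only KD needs a search; a general nonlinear least-squares routine would serve equally well.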
Protein microarrays have also been used in a quantitative fashion to characterize enzymes. For example, Funeriu et al. prepared microarrays of six members of the cathepsin family of cysteine proteases. Their work demonstrated three basic applications for enzyme microarrays: they can be used to determine enzyme kinetics, to screen for enzyme inhibitors at high throughput, and to obtain dose-response curves for selected compounds.
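For the dose-response application, a rough IC50 can be read off a dilution series by interpolating where residual activity crosses 50% on a logarithmic concentration axis. This is a minimal sketch under stated assumptions (activities already normalized to an uninhibited control, a single monotonic crossing, hypothetical data); it is not the analysis reported by Funeriu et al., and fitting a four-parameter logistic curve is the more usual choice when the full curve is available.

```python
import math


def ic50_by_interpolation(doses, activity, half=0.5):
    """Estimate IC50 by linear interpolation on log10(dose).

    doses: inhibitor concentrations, strictly increasing and > 0.
    activity: residual enzyme activity normalized to the uninhibited control (1.0).
    Returns the dose where activity first falls through `half`, or None if it never does.
    """
    pairs = list(zip(doses, activity))
    for (d1, a1), (d2, a2) in zip(pairs, pairs[1:]):
        if a1 >= half >= a2 and a1 != a2:
            # fractional position of the crossing between the two bracketing doses
            frac = (a1 - half) / (a1 - a2)
            log_ic50 = math.log10(d1) + frac * (math.log10(d2) - math.log10(d1))
            return 10 ** log_ic50
    return None


# Hypothetical half-log dilution series for one inhibitor against one arrayed protease
doses = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]       # µM
activity = [1 / (1 + d / 2.0) for d in doses]  # synthetic data with a true IC50 of 2 µM
ic50 = ic50_by_interpolation(doses, activity)
```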
In some cases, protein domains mediate interactions that are too weak to quantify directly using protein microarrays. For example, most PDZ domains bind their peptide ligands with dissociation constants between 2 µM and 50 µM. Stiffler et al. showed, however, that microarrays can be used as a high-fidelity, high-throughput screening tool to identify putative domain–peptide interactions, which can then be retested and quantified in a more targeted fashion. By choosing an appropriate intensity threshold, they showed that protein microarrays highlight biophysical interactions with a false positive rate of 19% and a false negative rate of 6%.
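Choosing an intensity threshold trades false positives against false negatives. The sketch below scores array calls against independent retest results at several thresholds; the function, the data, and the rate definitions (FPR = FP/(FP + TN), FNR = FN/(FN + TP)) are assumptions for illustration, and Stiffler et al. may have defined their 19% and 6% figures differently, for example relative to the number of array hits.

```python
def error_rates(intensities, confirmed, threshold):
    """Compare threshold-based array calls with independently confirmed binding.

    intensities: microarray signal for each domain-peptide pair.
    confirmed: True if the pair bound in an independent retest assay.
    Returns (false positive rate, false negative rate) as defined in the lead-in.
    """
    tp = fp = tn = fn = 0
    for signal, positive in zip(intensities, confirmed):
        called = signal >= threshold
        if called and positive:
            tp += 1
        elif called:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Hypothetical array signals and retest outcomes for eight domain-peptide pairs
intensities = [900, 850, 400, 300, 120, 80, 60, 40]
confirmed = [True, True, False, True, False, False, True, False]
sweep = {t: error_rates(intensities, confirmed, t) for t in (100, 250, 500)}
```

Raising the threshold lowers the false positive rate at the cost of more missed interactions, which is why the threshold is a design choice rather than a fixed property of the assay.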
One of the key advantages of protein domain microarrays is that they provide a way to assess binding selectivity in an unbiased fashion and on a family-wide scale. When performed in a quantitative fashion, these data can be used to train predictive models of molecular recognition. MacBeath and coworkers recently prepared microarrays of 157 mouse PDZ domains and used these arrays to survey interactions with over 200 peptides derived from the C-termini of mouse proteins. The resulting interaction dataset was used to construct statistical models that capture the binding preferences not only of mouse PDZ domains, but of PDZ domains derived from other organisms as well [13,14]. Interestingly, their models showed that PDZ domains do not fall into discrete classes, as had previously been assumed. Instead, they lie on a functional continuum, and it appears that their binding selectivity has been optimized across the proteome in order to minimize cross-reactivity.
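As a generic illustration of training a predictive model from peptide-binding data, the sketch below builds a position-specific log-odds scoring matrix from peptides that bound one hypothetical PDZ domain and uses it to rank new C-terminal sequences. This is deliberately simpler than the models described above: it uses only positive examples, assumes a uniform amino-acid background and independent positions, and ignores domain sequence, so it should not be read as the method used by MacBeath and coworkers.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pssm(binders, pseudocount=1.0):
    """Position-specific log2-odds matrix from equal-length binding peptides.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    n_aa = len(AMINO_ACIDS)
    total = len(binders) + pseudocount * n_aa
    matrix = []
    for pos in range(len(binders[0])):
        counts = Counter(peptide[pos] for peptide in binders)
        matrix.append({aa: math.log2((counts[aa] + pseudocount) / total * n_aa)
                       for aa in AMINO_ACIDS})
    return matrix


def score(matrix, peptide):
    """Additive score: sum of per-position log-odds (higher = more binder-like)."""
    return sum(column[aa] for column, aa in zip(matrix, peptide))


# Hypothetical C-terminal 5-mers (last residue = C-terminus) that bound one PDZ domain
binders = ["QESDV", "RETDV", "KESEV", "AETTL"]
pssm = train_pssm(binders)
ranking = sorted(["GGGGG", "PETDV", "AAAAV"], key=lambda p: score(pssm, p), reverse=True)
```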
Many of the above studies focused on interactions between immobilized domains and solution-phase peptides. As an alternative approach, microarrays of short peptides can be prepared and interrogated with solution-phase proteins (Fig. 1b). Standardized methods now exist to synthesize peptides in situ in a microarray format (‘SPOT’ synthesis) [15,16]. Peptide microarrays are particularly useful when the objective is to screen one or a few proteins against a much larger number of potential binding partners. In one application of SPOT synthesis, Katz et al. prepared a microarray of 59 overlapping peptides spanning the length of three anti-apoptotic Bcl-2 family proteins. The array was then probed with a fragment of the pro-apoptotic protein ASPP2. The authors identified two homologous binding sites on the Bcl-2 proteins, which were found to be associated with pro- and anti-apoptotic functions, respectively. In an interesting variation of SPOT synthesis, Boisguerin et al. developed an efficient way to prepare microarrays of inverted peptides displaying their C-termini. They then used this method to study interactions mediated by the PDZ domains of α-1-syntrophin and Erbin.
Peptide microarrays also provide a convenient way to study the recognition properties of enzymes. For example, Hilhorst et al. used commercially available microarrays of 140 peptides to characterize the substrate selectivity of protein kinase A (PKA). By measuring the levels of phosphorylated peptides at different times of incubation with PKA, they were able to determine reaction kinetics for each peptide in a multiplex fashion.
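One simple way to extract kinetics from time-resolved peptide-array readouts is to estimate each peptide's initial rate from the early, approximately linear part of its phosphorylation time course. The sketch assumes that spot signal is proportional to the amount of phosphorylated peptide and that the first three time points lie in the linear phase; the peptide names and values are hypothetical, and the exact kinetic analysis used by Hilhorst et al. is not specified here.

```python
def initial_rate(times, signals, n_early=3):
    """Least-squares slope over the first n_early points (signal units per time unit)."""
    t, y = times[:n_early], signals[:n_early]
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    numerator = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    denominator = sum((ti - t_mean) ** 2 for ti in t)
    return numerator / denominator


# Hypothetical phosphorylation time courses, one per arrayed peptide
time_min = [0, 5, 10, 20, 40]
courses = {
    "peptide_A": [0, 50, 100, 180, 260],
    "peptide_B": [0, 10, 20, 40, 75],
}
rates = {name: initial_rate(time_min, signal) for name, signal in courses.items()}
ranked = sorted(rates, key=rates.get, reverse=True)  # fastest substrate first
```

Because every peptide on the array shares the same time points, the same function can be applied to all spots at once, which is what makes the readout multiplexed.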
Microarrays featuring more than 1,100 peptide substrates of human kinases are now available and have been used to study glucocorticoid-mediated effects on T-cell receptor-initiated signal transduction; to identify kinases involved in angiotensin II–dependent hypertensive renal fibrosis; to characterize the kinome in Barrett’s esophagus; and to study the effects of COX-2 inhibition on kinase signaling pathways in colorectal cancer cells. In each case, the arrays were used as a screening tool to identify active kinases. Traditional biochemical techniques were then used to corroborate the microarray findings and to investigate the individual roles of the kinases. Overall, these studies show how peptide microarray technology can be used as a discovery tool to generate new biological hypotheses.
The ultimate extension of protein microarray technology is to prepare arrays comprising entire proteomes. In a groundbreaking study in 2001, Snyder and coworkers expressed, purified, and arrayed 5,800 yeast proteins. They then probed the resulting microarrays with six different phospholipids to reveal lipid-binding proteins. In a subsequent study, they used their yeast proteome microarrays to highlight in vitro substrates of 87 protein kinases. By integrating their kinase–substrate data with large-scale protein–protein interaction data and transcription factor binding data, they assembled a global regulatory network for yeast that uncovered several frequently used regulatory modules. The insight they gained into the logic of protein regulation could only have been obtained using this proteome-wide approach. In a similar fashion, Popescu et al. constructed microarrays comprising 2,158 unique Arabidopsis thaliana proteins and used these arrays to identify 570 phosphorylation substrates of mitogen-activated protein kinases. The majority of substrates highlighted in their screens were not previously known, and, overall, the list of substrates was enriched in transcription factors involved in the regulation of development, defense, and stress response. Several substrates were validated by subsequent reconstitution experiments, showing that whole proteome arrays provide a powerful way to identify biologically relevant interactions.
Proteome-wide microarrays also enable surprising biological discoveries. Lin et al. recently incubated yeast proteome microarrays with the essential nucleosome acetyltransferase of H4 (NuA4) complex. Unexpectedly, they identified many nonchromatin substrates. Most notably, they found that the metabolic enzyme phosphoenolpyruvate carboxykinase is a physiological substrate of NuA4 and that acetylation of this enzyme plays a role in regulating the chronological life span of yeast.
Investigations of this nature are now much more accessible, as human proteome arrays are commercially available. These arrays are particularly useful for identifying biomarkers of disease. For example, human proteome arrays were used to detect autoimmune response markers in ovarian and breast cancer, and Escherichia coli proteome arrays were used to identify markers of inflammatory bowel disease. Whole proteome arrays also serve as a convenient discovery tool that can be interfaced with sophisticated biochemistry. Merbl et al. recently incubated human proteome microarrays with cell extracts that replicate the mitotic checkpoint and anaphase release. By developing their arrays with antibodies that recognize polyubiquitination, they were able to identify novel substrates of the anaphase-promoting complex. While proteome chips have not yet been used in a quantitative fashion to characterize and model binding selectivity, they are proving extremely useful as unbiased and normalized tools for biological and clinical discovery.
As detailed above, protein function microarrays enable broad and unbiased investigations of molecular recognition. Information gained from these studies can be used to map out biophysical and biochemical connections between proteins. Determining how information flows through these networks in a dynamic fashion, however, requires methods to measure the abundances and post-translational modification states of many different proteins in biological samples at high throughput. Protein-detecting microarrays provide this capability (Fig. 1c,d). Since a large number of selective antibodies are commercially available, most studies in this area have relied on antibodies, in conjunction with microarray technology, to profile cellular lysates, tumor biopsies, and human serum.
Perhaps the most frequently used strategy is to prepare microarrays of immobilized antibodies on solid supports, where they function as ‘capture’ reagents (Fig. 1c). Antibody microarrays enable multiplexed analyses of non-denatured proteins with very low reagent and sample consumption. To detect and quantify the captured analytes, a second antibody is often used that recognizes a different epitope on the captured protein. This procedure mimics a sandwich ELISA, but in a multiplexed microarray format. This strategy has been used to track the phosphorylation status of receptor tyrosine kinases following growth factor stimulation; to assess changes in the tyrosine phosphorylation state of selected proteins following drug treatment; to discover cancer-associated glycan variations on specific proteins in the serum of pancreatic cancer patients; and to identify prognostic markers of early mortality in the serum of patients with end-stage renal disease.
One limitation of the sandwich assay is that it is often difficult to identify matched pairs of capture and detection antibodies. In addition, the detection step requires a cocktail of labeled antibodies. As more antibodies are included in this cocktail, the level of background binding increases, as does the risk of cross-reactive binding. To circumvent these problems, the proteins in the biological samples can be directly labeled using one or more fluorescent dyes or small molecule haptens (Fig. 1c) [36,37]. An advantage of this strategy is that pairs of samples can be compared in a ratiometric fashion using spectrally distinct fluorophores. This approach has been used to discover biomarkers of pancreatic cancer, bladder cancer, and mantle-cell lymphoma. One caveat of this method, however, is that it relies on the uniform labeling of proteins across different samples, which cannot be assured. In addition, direct chemical labeling can modify antigenic epitopes, thereby blocking antibody recognition.
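In a two-color ratiometric experiment, each spot yields one intensity per dye, and the comparison between samples is usually expressed as a log ratio. The sketch below computes background-subtracted, median-centered log2 ratios; the background values, the floor, and the median-centering step are assumptions, and centering only partly compensates for the non-uniform labeling noted above (protein-specific labeling differences, for instance, are not corrected).

```python
import math
import statistics


def centered_log_ratios(sample, reference, bg_sample=0.0, bg_reference=0.0, floor=1.0):
    """Per-spot log2(sample / reference) after background subtraction, median-centered.

    sample, reference: spot intensities in the two spectrally distinct dye channels.
    floor: lower bound on background-subtracted signal, so the logarithm stays defined.
    Median-centering assumes most captured proteins do not change between samples.
    """
    raw = [math.log2(max(s - bg_sample, floor) / max(r - bg_reference, floor))
           for s, r in zip(sample, reference)]
    center = statistics.median(raw)
    return [value - center for value in raw]


# Hypothetical intensities for five antibody spots; background of 100 in both channels
sample = [1100, 2100, 500, 8100, 700]
reference = [1100, 1100, 500, 1100, 700]
ratios = centered_log_ratios(sample, reference, bg_sample=100, bg_reference=100)
```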
In principle, one can avoid these problems using label-free detection. To date, however, label-free methods are too insensitive for most biological investigations. For example, proof-of-concept experiments using surface plasmon resonance [41,42] and optical interferometry reported limits of detection in the range of 20–30 ng/mL of target protein. In contrast, fluorescence-based detection methods enable quantification down to 1 pg/mL [35,44]. Future development of label-free detection methods must focus on increasing assay sensitivity.
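Expressed as a ratio, and using only the detection limits quoted above (1 ng = 10^3 pg), the sensitivity gap is roughly four orders of magnitude:

```latex
\frac{20\ \mathrm{ng/mL}}{1\ \mathrm{pg/mL}} = \frac{2\times 10^{4}\ \mathrm{pg/mL}}{1\ \mathrm{pg/mL}} = 2\times 10^{4},
\qquad
\frac{30\ \mathrm{ng/mL}}{1\ \mathrm{pg/mL}} = 3\times 10^{4}
```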
Although antibody microarrays offer great potential as a tool for quantitative proteomics, proteome-wide analyses are still out of reach. Currently, only a small number of signaling events can be followed reliably using well-validated antibodies. One solution is to combine a variety of experimental approaches. Gaudet et al. used antibody microarrays, in conjunction with kinase activity assays, western blotting, and flow cytometry, to assemble a compendium of ~10,000 measurements of signals and responses in HT-29 cells treated with different combinations of TNFα, EGF, and insulin. This disparate dataset was then used to derive a predictive model of apoptotic signaling. Their integrated approach shows how the high-throughput technology of antibody microarrays can be used when suitable detection reagents are available, and how additional measurements can be added using lower-throughput methods.
As an alternative to preparing microarrays of antibodies, the biological samples themselves can be spotted on nitrocellulose-coated glass substrates and the immobilized proteins detected and quantified using solution-phase antibodies (Fig. 1d). This strategy was initially described by Paweletz et al. in 2001 and is referred to either as a ‘reverse phase protein array’ or as a ‘lysate microarray’. Each microarray contains the entire set of biological samples and reports on the relative levels of one or two proteins. By preparing multiple copies of the array, however, it is possible to follow the abundances and post-translational modification states of many different proteins. One of the advantages of this approach is that the proteins being quantified are denatured and so do not have to remain folded and functional on the solid support. Additionally, protein complexes are disrupted and so do not complicate subsequent analyses.
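Because each lysate array is probed with a single antibody (or two, on two-color arrays), its readout is relative: spot intensities can be compared across samples on the same array, but not across arrays probed with different antibodies. A minimal sketch of assembling several array copies into one table, assuming one antibody per array and expressing each sample as a fold change relative to a chosen reference sample (the antibody labels, sample names, and intensities are hypothetical):

```python
def fold_change_table(arrays, reference):
    """Combine lysate-array copies (one antibody each) into per-antibody fold changes.

    arrays: {antibody: {sample: spot_intensity}}, with the same samples on every copy.
    Values are divided by the reference sample's spot on the same array, so
    comparisons never cross arrays probed with different antibodies.
    """
    return {
        antibody: {sample: value / spots[reference] for sample, value in spots.items()}
        for antibody, spots in arrays.items()
    }


# Hypothetical readouts from two copies of the same lysate array
arrays = {
    "ab_phospho_target": {"untreated": 200, "stim_5min": 1000, "stim_30min": 500},
    "ab_total_target": {"untreated": 800, "stim_5min": 800, "stim_30min": 400},
}
table = fold_change_table(arrays, reference="untreated")
```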
To date, this method has been used in a clinical setting to follow pro-survival checkpoint proteins as a function of cancer progression and to discover and validate specific biomarkers for disease diagnosis and patient stratification [48,49]. In addition, lysate microarrays have been used to study the kinetics of intracellular signaling in a system-wide fashion. By tracking 62 phosphorylation sites in stimulated Jurkat T-cells, Chan et al. uncovered a previously unrecognized link between T-cell receptor activation and Raf-1 activity.
Lysate microarray technology has also been adapted to a high-throughput screening format. By treating cells grown in microtiter plates with cell-permeable small molecules, and by arraying the resulting lysates, Sevecka and MacBeath showed that compounds could be screened on the basis of how they perturb the ‘state’ of the cell. Interestingly, they showed that compounds with the same target induced similar states, suggesting that this technology could be used to help identify the targets of bioactive compounds with unknown mechanisms of action.
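The idea that compounds with the same target induce similar cellular states can be illustrated by treating each compound's lysate-array readout as a vector and ranking reference compounds by their correlation with an uncharacterized compound. The Pearson correlation, the nearest-neighbor ranking, and the compound names and values below are illustrative assumptions, not the analysis reported by Sevecka and MacBeath.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length state profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_similar(query, profiles):
    """Rank reference compounds by correlation of their state profile with the query."""
    return sorted(profiles, key=lambda name: pearson(query, profiles[name]), reverse=True)


# Hypothetical log fold-changes of four readouts after treatment with each compound
reference_profiles = {
    "inhibitor_target1": [-2.0, -1.5, 0.1, 0.0],
    "inhibitor_target2": [0.0, 0.2, -1.8, -1.2],
}
unknown = [-1.8, -1.2, 0.0, 0.2]
ranking = most_similar(unknown, reference_profiles)
```

A high correlation with a reference compound suggests, but does not establish, a shared target; it is a hypothesis to be tested biochemically.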
Although lysate microarray technology holds great promise for quantitative proteomics, its use remains limited due to a lack of well-validated antibodies. In our experience, the majority of commercially available antibodies exhibit a substantial amount of cross-reactivity when used in this assay, even when the antibody produces a single discernible band of the correct molecular weight on a western blot. Other groups have found that the precise blocking and detection protocol, as well as the composition of the lysis buffer, substantially affects antibody performance. Further development is therefore necessary before this approach is widely adopted.
Protein microarray technology offers a powerful way to dissect the complexity of the proteome. The diversity of applications and the economy of scale make this method ideally suited to the investigation of molecular recognition and to the multiplexed quantification of proteins in complex mixtures. Whereas technical problems concerning protein function microarrays have largely been solved, accurately quantifying proteins in complex mixtures remains a challenge. Antibodies still enjoy their status as the detection reagents of choice, but suffer from cross-reactive binding and therefore require extensive validation.
Overall, the quality of future systems-level investigations will hinge on carefully validated experimental protocols and reagents. Currently, there are no standardized methods for preparing protein-detecting microarrays or widely accepted techniques for analyzing the resulting data. Ensuring stringent standards for protein microarray experiments, and rendering data comparable across different studies, will require a coordinated effort by the scientific community. By far the most pressing need, however, is for a renewable and distributable source of detection reagents. Despite considerable efforts in this area over many years, this remains an unsolved problem. In our opinion, generating this resource will require robust and scalable methods for directed evolution and a well-funded, organized effort by the scientific community. A concerted endeavor of this nature would not only enable microarray-based investigations, but other targeted analyses of proteins in biological contexts as well.
This work was supported by awards from the Arnold and Mabel Beckman Foundation, the W.M. Keck Foundation, and the Camille and Henry Dreyfus Foundation and by grants from the NIH (1 R01 GM072872, 1 R33 CA128726, and 1 R21 CA126720).
Papers of particular interest published within the period of review have been highlighted as:
• of special interest
•• of outstanding interest