A cross-linker is, by definition, a chemical reagent with two or more reactive groups connected by a spacer or linker region. The choice of reactive groups depends on the target molecules to be cross-linked. For labeling proteins, the reactive ends can be amine-reactive, sulfhydryl-reactive, or photoreactive. The length of the spacer chain is often used as a “ruler” for estimating the distance between the two linked residues, providing approximate topological information on proteins or protein complexes. Many structural variants of the basic, conventional cross-linker (two reactive groups and a simple spacer chain) are commercially available. Pierce Biotechnology (Rockford, IL) has been a leading source in this area, providing a wide variety of cross-linkers along with technical publications that offer a highly useful summary of, and guidance on, cross-linker selection [32,33]. Although in principle many reactive groups, such as maleimides, diazoacetate esters, or azides, can be incorporated into cross-linkers, N-hydroxysuccinimide (NHS) esters have been the most frequently used reactive group [27] for several reasons. First, NHS esters form stable amide linkages with primary amines in proteins under physiological conditions (pH 7.0–7.5), which is important for in vivo labeling. Second, primary amines are the most common targets for labeling proteins, because the high frequency of lysine residues means they are present in the large majority of proteins. Third, NHS esters react very rapidly with primary amines [34]. Our studies [35] showed that a 5 min reaction time was sufficient to complete the labeling of intact cells at pH near 7. This fast reaction rate is critical for in vivo experiments, since it can quickly “snapshot” or “freeze” protein complexes with stable covalent bonds before living cells are extensively perturbed. Finally, the expected modification of lysine residues or the N-terminus can be used as an additional search constraint in software tools. In that regard, however, it is worth noting that several research groups have recently observed that NHS esters do not react exclusively with primary amines: serine, tyrosine, and threonine residues also show significant reactivity [36–39]. The Sinz research group reported that about 12.5% of all cross-linker-containing products resulted from serine labeling, 4.3% from tyrosine, and 3% from threonine when BS³ and BS²G were used as cross-linkers [38]. Therefore, cross-link assignment with software tools should also take into account possible modifications of Ser, Tyr, and Thr residues.
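The site logic above can be sketched as the kind of candidate-site filter a search tool might apply before scoring. This is a minimal illustration under stated assumptions, not the behavior of any particular software: the function name `candidate_sites` and the residue sets are hypothetical, and the helper reports positions only, leaving mass calculations to the search engine. Because labeling happens on intact proteins before digestion, only the protein N-terminus (not every peptide N-terminus) carries a free α-amine during the reaction, so that site is enabled explicitly.

```python
# Residues assumed (for illustration) to react with NHS esters, as one-letter
# codes. Lysine is the canonical primary-amine target; Ser, Thr, and Tyr are
# the side reactions discussed in the text.
PRIMARY_AMINE_RESIDUES = frozenset({"K"})
SIDE_REACTION_RESIDUES = frozenset({"S", "T", "Y"})


def candidate_sites(peptide: str,
                    is_protein_n_term: bool = False,
                    include_side_reactions: bool = False) -> list[tuple[int, str]]:
    """Return (0-based position, site label) pairs an NHS ester might modify.

    Hypothetical helper. The label is the residue letter, or "N-term" for the
    free alpha-amine, which exists during labeling only when the peptide is
    the protein's N-terminal peptide.
    """
    reactive = (PRIMARY_AMINE_RESIDUES | SIDE_REACTION_RESIDUES
                if include_side_reactions else PRIMARY_AMINE_RESIDUES)
    sites: list[tuple[int, str]] = []
    if is_protein_n_term:
        sites.append((0, "N-term"))
    # Scan the sequence for side chains assumed to be reactive.
    sites.extend((i, aa) for i, aa in enumerate(peptide.upper()) if aa in reactive)
    return sites
```

For a peptide such as `SAYKTLK`, the lysine-only setting yields two candidate sites, whereas enabling `include_side_reactions` yields five. That growth is the trade-off implied above: accounting for Ser, Tyr, and Thr labeling improves assignment coverage at the cost of a larger search space.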
Because the span of the cross-linker has been used to provide topological measurements in low-resolution studies of protein three-dimensional structure, small or short cross-linkers are often considered most useful for obtaining structural constraints. However, our previous studies using a PIR cross-linker with a calculated maximum length of 43 Å identified a cross-link between two lysine residues separated by 14 Å in ribonuclease S [40]. Furthermore, this same cross-link was later reported by others using smaller cross-linkers [41,42]. Our recent studies [43] using PIR cross-linkers to label several protein standards identified many intra-protein cross-links that had been reported by others using DSS, a small cross-linker with an 11 Å spacer chain [43,44]. These results suggest that, in solution, cross-linkers are conformationally constrained so that the distance between the two reactive groups is shorter than the fully extended length, in agreement with the simulated estimates of realistic in-solution lengths for 32 commercial cross-linkers reported by Green et al. [45]. Finally, it is also worth considering that, within the cellular environment, the potential labeling sites that would allow identification of interactions span a wide range of distances. Therefore, flexibility in the cross-linker structure, allowing sites to be labeled over a wider range of distances, is critical for identifying larger numbers of interactions. This last point highlights a significant difference between the overall goals of cross-linker designs that target protein interactions and those of more conventional cross-linking efforts, which aim to provide actual distance constraints on purified proteins and complexes. However, once an interaction has been shown to exist in cells, the complex can be studied for more detailed structural information using a wide variety of conventional molecular biology methods [46,47], analytical techniques such as co-crystallization, and computational methods [48,49]. For example, Wells employed site-directed mutagenesis to map critical structural features of selected interactions in vivo and to identify “hot spots” for these interactions [50]. Thus, chemical cross-linking can potentially fulfill two separate and distinct needs in proteomics research: identification of protein interaction networks and structural measurement of proteins and complexes. Both are important, and both represent unique opportunities to provide critical information on biological systems.
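The “ruler” reasoning can be made concrete with a simple upper-bound check on the Cα–Cα distance between two linked lysines. The sketch below assumes that each lysine side chain adds about 6.5 Å of reach beyond the spacer. That value, the optional tolerance, and the helper names are illustrative assumptions, not figures from the studies cited above.

```python
import math

# Assumed approximate reach of an extended lysine side chain from C-alpha to
# N-zeta, in angstroms. Illustrative value only.
LYS_SIDE_CHAIN_REACH = 6.5

Point = tuple[float, float, float]


def max_ca_span(spacer_length: float, tolerance: float = 0.0) -> float:
    """Upper bound (Å) on the C-alpha distance two lysines bridged by the spacer can have."""
    return spacer_length + 2 * LYS_SIDE_CHAIN_REACH + tolerance


def is_consistent(ca1: Point, ca2: Point, spacer_length: float,
                  tolerance: float = 0.0) -> bool:
    """True if the observed C-alpha distance does not exceed the assumed bound."""
    return math.dist(ca1, ca2) <= max_ca_span(spacer_length, tolerance)
```

Under these assumptions, two lysines whose Cα atoms are placed 14 Å apart satisfy both an 11 Å (DSS-like) bound and a 43 Å (PIR-like) bound. This shows why a long cross-linker can still report a short distance: the spacer length sets only a maximum, not the actual separation.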
For cross-linking applications in protein interaction studies, this concept can be further divided into two distinct areas (), since the associated challenges and potential utility of each vary greatly. The first involves identifying the interaction partners of a specific target protein. The second involves the large-scale determination of interacting proteins. Chemical cross-linking coupled with the IP approach has been used consistently for several years to study targeted interactions. The appearance of higher-molecular-weight bands on an SDS gel after cross-linking and IP often indicates that an interaction has been captured. To identify these interactions, detection of two or more unique tryptic peptides is generally sufficient for protein identification [51,52], as in other approaches such as co-IP or TAP tagging. In fact, a significant advantage of the cross-linking approach over other techniques is that interactions can be stabilized with covalent bonds in cells, or under conditions more likely to preserve native, weak, and transient interactions, prior to cell lysis; the identified interactions may therefore be more physiologically relevant. This approach is especially useful for membrane protein interactions, since effective solubilization of membrane proteins normally requires conditions that are unlikely to preserve protein interactions. In principle, cross-linking methods can be applied on a large scale without the need for IP, since detection of the cross-link itself is sufficient to indicate an interaction. Importantly, the identification of cross-linked peptides not only suggests protein interactions but also provides information on how proteins interact in vivo, i.e., it provides critical data on which sites of X and Y are in close proximity. In this regard, large-scale protein interaction studies based on identifying cross-linked peptide pairs, which result from reactive sites that are in close proximity during the cross-linking reaction (either within a single protein or between two proteins that form a complex), are highly specific. In contrast, the IP approach is susceptible to detecting non-specific binding of abundant proteins to the target protein, the antibody, and the solid support. As such, cross-linking offers a complementary approach to conventional co-IP studies and may yield greater specificity for the detected interactions. Furthermore, highly abundant proteins that are labeled during the cross-linking reaction, and that account for the majority of cross-linked peptides detected in a single analysis, can be removed by affinity selection of the cross-linked samples, allowing detection of less abundant cross-linked peptides and a greater dynamic range. Finally, structural models can use these data to improve understanding of identified interactions and to identify factors that cause binding or increase binding specificity. For interactions that give rise to undesirable functional consequences (e.g., release of inflammatory cytokines upon ligand/receptor binding) and thus might be targeted for disruption or blocking, identification of the interacting regions could lead to more rapid development of molecules that inhibit binding with the desired specificity. In this regard, the actual interfacial region cannot be retrieved with any other large-scale protein interaction technology, yet it is perhaps the most critical piece of information related to protein interactions in biological systems.
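To illustrate how identified cross-linked peptide pairs yield both interaction and interface information, the following sketch groups them into inter-protein links (candidate interface sites between proteins such as X and Y) and intra-protein links. The `CrossLink` record and the `interaction_map` function are hypothetical illustrations. A real workflow would also need scoring and error-rate control before links are grouped.

```python
from collections import defaultdict
from typing import NamedTuple


class CrossLink(NamedTuple):
    """Hypothetical record for one identified cross-linked peptide pair."""
    protein_a: str
    site_a: int  # residue number of the linked site in protein_a
    protein_b: str
    site_b: int  # residue number of the linked site in protein_b


def interaction_map(crosslinks):
    """Split cross-links into inter-protein and intra-protein groups.

    Returns (inter, intra). `inter` maps a sorted (protein, protein) key to the
    set of linked site pairs, i.e. candidate interface contacts. `intra` maps a
    protein to its sorted intra-protein site pairs.
    """
    inter = defaultdict(set)
    intra = defaultdict(set)
    for xl in crosslinks:
        if xl.protein_a == xl.protein_b:
            intra[xl.protein_a].add(tuple(sorted((xl.site_a, xl.site_b))))
        else:
            # Sort the two ends so X-Y and Y-X links fall into the same bucket.
            (p1, s1), (p2, s2) = sorted([(xl.protein_a, xl.site_a),
                                         (xl.protein_b, xl.site_b)])
            inter[(p1, p2)].add((s1, s2))
    return dict(inter), dict(intra)
```

Here a link with the same protein on both ends is treated as intra-protein. Such a link could also arise between two copies of a protein in a homo-oligomer, and this sketch does not attempt to distinguish the two cases.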
The challenges associated with identifying cross-linked peptides have prevented large-scale application. Efforts to improve cross-link identification have followed two primary strategies: reducing sample complexity by enriching cross-linked peptides, and reducing data complexity by introducing signature patterns into cross-link-containing spectra. For both purposes, much work has centered on modifying the spacer chain of the cross-linker, including the incorporation of an affinity tag [53–58], a differential isotope label [41,59–62], chemically cleavable bonds [61,63–65], and mass spectrometry-cleavable bonds [40,42,66–69]. Isotope patterns have also been introduced into the data by incorporating isotope labels into peptides via proteolytic digestion in ¹⁸O water [56,70,71] or into proteins via ¹⁵N labeling [72]. Recently, the Brodbelt group incorporated an IR chromophore into the spacer chain and demonstrated that cross-linked peptides were more prone to IRMPD fragmentation, thus allowing peptide identification from unmodified product ions [73]. Additionally, several groups have begun developing informatics software that identifies cross-linked peptides using modified database search strategies [41,74,75].
Despite significant advances in cross-linking technology, further proteome-wide in vivo application of cross-linking strategies for studying protein interactions with topology information has remained challenging, owing to the complexity of the search database and the multiple possible cross-link product types. For example, a digest mixture with n peptides can give rise to roughly n² possible cross-linked products, and the desired inter-cross-linked peptides make up less than 0.1% of the total theoretical peptide combinations [41]. At the proteome scale, this becomes intractable for most biologically interesting organisms in the absence of additional information that can help decipher the reaction product type and protein identity [12]. Furthermore, a living organism is a dynamically changing system, in which many protein–protein interactions are transient in both time and space. Consequently, mapping protein–protein interactions and their interaction topology in the native cellular milieu will provide the most relevant and crucial information for understanding biological function. Therefore, a method well suited to the challenging task of large-scale determination of protein interaction networks will likely benefit from some type of reporter molecule that can infiltrate living systems and provide analytically useful feedback that allows these networks to be visualized.
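The quadratic scaling of the search space can be checked with a few lines of arithmetic. The sketch counts unordered peptide pairs under a deliberately simplified model: one cross-linker, every peptide able to pair with every other peptide (and optionally with another copy of itself), and no dead-end or loop products or precursor-mass filtering. The function name is hypothetical. The counts illustrate how the search space grows and do not reproduce the figures cited above.

```python
def peptide_pair_count(n_peptides: int, include_self_pairs: bool = True) -> int:
    """Count unordered peptide pairs a cross-link search would need to consider.

    Simplified model (assumption): any peptide may be linked to any other
    peptide, and optionally to a second copy of itself. Other product types
    and mass-based filtering are ignored.
    """
    if n_peptides < 0:
        raise ValueError("n_peptides must be non-negative")
    n = n_peptides
    if include_self_pairs:
        return n * (n + 1) // 2  # n choose 2 distinct pairs, plus n self pairs
    return n * (n - 1) // 2      # n choose 2 distinct pairs only


if __name__ == "__main__":
    for n in (1_000, 2_000, 100_000):
        print(f"{n:>7} peptides -> {peptide_pair_count(n):,} pairs")
```

For 1,000 peptides the model gives 500,500 pairs, and for 2,000 peptides it gives 2,001,000, roughly four times as many. This quadratic growth is what makes proteome-wide searches intractable without additional constraints.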