Cloning of the bait cDNAs and construction of entry clones
Full-length cDNAs encoding the genes for the respective protein baits were either purchased from Invitrogen (www.invitrogen.com) and the Kazusa project (www.kazusa.or.jp) or cloned in-house. Established polymerase chain reaction (PCR) methodologies were used to amplify the bait cDNAs from the corresponding parent plasmid DNAs. The oligonucleotide primers used for PCR (four required for each unique bait gene: two 5′-terminal primers, with or without a Kozak sequence, and two 3′-terminal primers, with or without a stop codon) were designed to be complementary to the 5′ and 3′ ends of the bait coding region and to introduce an additional nucleotide sequence (29 bp), corresponding to Gateway attB recombination sites (Invitrogen), onto the ends of the PCR product. To create Gateway entry vectors, a portion of the purified PCR reaction product was added to the BP Reaction mixture, which contains a donor vector (encoding attP sites) and the BP CLONASE mix of recombination proteins. The recombination results in the oriented integration of the attB-flanked PCR product into the attP sites of the donor vector, generating the Entry Clone in which the bait gene coding region is now flanked by attL sites (required for the LR Reaction, see below). A portion of the BP Reaction was used to transform competent Escherichia coli DH5α cells and the Entry Clone plasmid DNA was purified from selected transformants (antibiotic selection) using routine plasmid miniprep protocols (Sigma-Aldrich, www.sigmaaldrich.com). The integrity of each Entry Clone was verified by PCR amplification using gene-specific primers and DNA sequencing.
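The four-primer scheme described above can be illustrated with a minimal Python sketch. It is not the primer-design procedure actually used; the attB1 and attB2 adapter sequences are the commonly cited 29-nt Gateway adapters and should be checked against the manufacturer's manual, and the Kozak element ("ACC"), the annealing length and the function name design_bait_primers are assumptions made for illustration. Frame-maintaining spacer nucleotides between the adapter and the gene-specific region are deliberately omitted.

```python
# Assumed 29-nt Gateway adapters (verify against the manufacturer's manual).
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"  # added to 5'-terminal (forward) primers
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGT"  # added to 3'-terminal (reverse) primers
KOZAK = "ACC"  # assumed minimal Kozak element placed just upstream of the ATG
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_bait_primers(cds, anneal_len=20):
    """Return the four primers (5'->3') for one bait coding sequence.

    Hypothetical helper: spacer bases needed to keep a fusion in frame
    are not modelled here.
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must start with ATG")
    has_stop = cds[-3:] in STOP_CODONS
    body = cds[:-3] if has_stop else cds            # coding region, no stop
    with_stop = body + (cds[-3:] if has_stop else "TAA")
    five_prime = cds[:anneal_len]                   # gene-specific 5' end
    return {
        "fwd_kozak": ATTB1 + KOZAK + five_prime,
        "fwd_plain": ATTB1 + five_prime,
        "rev_stop": ATTB2 + reverse_complement(with_stop[-anneal_len:]),
        "rev_nostop": ATTB2 + reverse_complement(body[-anneal_len:]),
    }


if __name__ == "__main__":
    toy_cds = "ATG" + "GCT" * 10 + "TAA"  # toy sequence, not a real bait
    for name, primer in design_bait_primers(toy_cds, anneal_len=12).items():
        print(name, primer)
```

The two forward and two reverse primers can be combined to give constructs with or without a Kozak element and with or without a stop codon.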
Construction of destination vectors
Two Destination Vectors, DV1 and DV2, were constructed based on a vector backbone using standard recombinant DNA methodologies. The Entry Clone and Destination Vector were subjected to the GATEWAY LR Reaction, which contains the LR CLONASE mix of recombination proteins. The LR Reaction results in the directional transfer of the bait gene coding region, flanked by the attL sites in the Entry Clone, to the Destination Vector (DV1 or DV2) through recombination with the attR flanked GATERC, generating the Expression Clone. A portion of the LR Reaction was used to transform competent DH5α cells and the Expression Clone plasmids were purified from selected transformants (antibiotic selection) using routine plasmid miniprep protocols. Following confirmation by PCR with gene-specific primers, milligram quantities of purified Expression Clones were prepared by standard protocols (Maxiprep; Sigma-Aldrich).
Anchorage-dependent human embryonic kidney 293 (HEK293) cells were maintained in Dulbecco's modified Eagle's medium (DMEM) containing 10% fetal bovine serum and supplemented with 2 mM L-glutamine and 0.1 mM nonessential amino acids. Cells were grown in 10-cm-diameter or 24.5 × 24.5 cm² tissue culture plates at 37°C in a 5% CO₂ atmosphere. Cells were routinely tested for mycoplasma presence. A detailed protocol for the maintenance and passaging of cells is provided in Supplementary Information.
A seed culture of HEK293 cells (at 70–80% confluence) was split and plated with fresh media the day before transfection and then grown to 30–40% confluency. Before transfection, each cell plate was inspected by microscopy. In particular, we verified that cells were healthy (no large vacuoles, no long extensions, not rounded up), that no contamination (mould, yeast or bacteria) was present and that fewer than 5% of cells were dead. We also confirmed that the plates were approximately 40% confluent. Any plates that did not meet the above criteria were discarded. Typically, approximately 1 × 10⁷ cells were transiently transfected with 5 μg of DNA construct using a calcium phosphate/DNA coprecipitation protocol. Briefly, a solution of calcium chloride and maxiprep Expression Clone plasmid DNA was diluted with an inorganic phosphate-containing buffer. After a brief period to allow the calcium phosphate/DNA precipitate to develop, the mixture was overlaid on the cells. Cells were incubated at 37°C with the calcium phosphate/DNA mixture for 12–16 h, the culture medium was replenished and the cells were cultured for a further 24 h to ~90% confluence before harvest. A similar procedure was used to culture HEK293 cells that were transiently transfected with the Destination Vector (no bait gene) in order to provide a negative-control sample. A detailed protocol for the transfection is provided in Supplementary Information.
Cell harvest and extract preparation
All steps of the harvest procedure were performed at 4°C. Following the culture period described above (for each experimental and control culture), the media were removed from the plates by aspiration and the adherent HEK293 cells were washed thoroughly with Tris-buffered saline. Cells were then overlaid with a predetermined volume of detergent-containing lysis buffer (supplemented with a cocktail of protease inhibitors) and then scraped to concurrently dislodge and lyse the cells. Typically, cells were lysed by the addition of 1 ml of lysis buffer (20 mM Tris–HCl (pH 7.5), 150 mM NaCl, 1 mM EDTA, 1% NP-40, 0.5% sodium deoxycholate, 10 μg/ml aprotinin, 0.2 mM AEBSF (Calbiochem)). The cell lysate was collected and then clarified by preparative centrifugation for 30 min at 20 000 g to yield a crude extract. In all cases, portions of the soluble and insoluble fractions from the centrifugation were separated by SDS–PAGE and immunoblotted with an anti-FLAG® (M2) monoclonal antibody (see below) to verify the bait's presence in the soluble extract fraction.
Immunoprecipitation of bait and bait-specific interacting proteins
The Flag-tagged bait proteins and their interacting partners were isolated from cell extracts by immunoprecipitation using M2-Agarose resin (Sigma-Aldrich). The M2-Agarose comprises the monoclonal anti-Flag M2 antibody immobilized onto an agarose resin and reacts specifically with fusion proteins possessing the Flag epitope at the N- or C-terminus. Briefly, the crude lysate was first incubated with 5 μg of agarose beads for 60 min at 4°C to remove nonspecific binders. The supernatant was then subjected to immunoprecipitation by adding 5 μg of anti-Flag monoclonal antibody covalently attached to crosslinked agarose beads (M2, Sigma). The mixture was gently agitated by inversion for 60 min at 4°C. Immunocomplexes associated with the insoluble fraction were recovered by centrifugation (1000 g for 2 min) and washed by three cycles of resuspension in lysis buffer followed by centrifugation as described above. Immunocomplexes were eluted from the beads by resuspension in 250 μl of 50 mM ammonium bicarbonate (prepared just before use) containing 400 μM Flag peptide. Following a 30 min incubation, beads were removed by centrifugation and the supernatant, containing the Flag peptide as well as the eluted proteins, was lyophilized.
Gel-based protein analysis
The dried immunopurified proteins were solubilized in a minimal volume of protein-loading buffer and subjected to SDS–PAGE. The immunopurified proteins were then separated by gel electrophoresis and detected by colloidal Coomassie staining. All gels were subjected to a visual appraisal before further processing; gel lanes that contained anomalies such as significant background across the entire lane or a large number of protein bands arising from nonspecific protein precipitation were rejected (approximately 40% of the gels were rejected based on these criteria). Band excision was automatically performed by a robotic system developed in-house and gel bands automatically transferred to a 96-well plate. Post-excision steps were carried out using commercially available automated robotic workstations (ProGest, Genomic Solutions). The proteins contained in the excised gel bands were treated with dithiothreitol (DTT) and the free sulfhydryl groups were alkylated using iodoacetamide. Proteins were then digested with trypsin and the resulting peptides were extracted from the gel slice using a series of wash steps. The extracted peptides were concentrated and analyzed directly by mass spectrometry.
LC-ESI-MS/MS identification of proteins was performed as described previously (Figeys et al, 2001) using an automated network of mass spectrometers. Tryptic peptides recovered from individual gel bands were separated by reverse-phase chromatography on C18 resin and directly injected into a mass spectrometer. Ion trap mass spectrometers (LCQ Deca, Thermo Finnigan), operated in a data-dependent mode, which produces tandem MS spectra of all peptide species present above a programmed threshold, were used for these experiments.
Additional detailed experimental protocols for cell transfection and passaging of cells are provided in Supplementary Information.
Laboratory data were managed using an in-house developed LIMS system that tracks all steps of immunoprecipitation, gel band excision as well as mass spectrometry acquisition names, annotated SDS–PAGE images and QC data. Mass spectrometry acquisition files were stored on a centralized network file system and processed using an automated analysis pipeline, including a cluster of Mascot nodes for peptide and protein identification.
Peptide and protein identification
All spectra were analyzed using Mascot version 1.9 (Matrix Science, www.matrixscience.com) searches against a non-redundant human protein sequence database (122 989 entries), constructed from all major sources of human protein sequences (GenBank, TrEMBL, SwissProt, IPI and Ensembl). Mascot was run in MS/MS Ion search mode with the following parameter settings: fixed modification (carbamidomethyl on cysteine), variable modification (oxidation on methionine), peptide mass tolerance 2 Da, fragment mass tolerance 0.4 Da, maximum missed cleavages two and enzyme trypsin. Peptide and protein identifications were included for further analysis according to the following criteria: for single peptide hit proteins, Mascot ionscore ≥40; for proteins with multiple peptide hits, each Mascot peptide ionscore ≥20. (The average Mascot recommended (P<0.05) ionscore for our data is ~40.) Further assessment of the peptide and protein identification false-positive rates was made by searching a subset (500 gel bands; ~3% of the data) against a randomized (each entry randomly shuffled) sequence database. Using the Mascot ionscore thresholds above, we estimate a protein false-positive rate of <7.5%. Mascot result files were parsed, proteins clustered and all data stored in a relational database. An in-house protein sequence index and annotation system was used both to provide the non-redundant sequence search database and to interpret and analyze the resulting protein hits. Spotfire (www.spotfire.com), Cytoscape (www.cytoscape.org) and Pathway Studio (Ariadne Genomics) were used extensively for data analysis and interaction map visualization. The PLS regression analysis and generation of interaction confidence scores were implemented in custom code using Python (www.python.org).
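The ionscore acceptance criteria given above can be written as a small filter. The following Python sketch is one possible reading of those criteria, not the in-house pipeline: it assumes the thresholds are inclusive, that a "multiple peptide hit" protein is one with at least two distinct peptides each scoring at or above 20, and that the input has already been parsed from Mascot result files into (protein, peptide, ionscore) tuples. The function name accepted_proteins and the input layout are hypothetical.

```python
from collections import defaultdict

SINGLE_PEPTIDE_MIN = 40.0  # threshold for proteins supported by one peptide
MULTI_PEPTIDE_MIN = 20.0   # per-peptide threshold when several peptides match


def accepted_proteins(peptide_hits):
    """Return the set of protein IDs passing the ionscore criteria.

    peptide_hits: iterable of (protein_id, peptide_sequence, ionscore).
    Repeated observations of a peptide are collapsed to their best score.
    """
    best = defaultdict(dict)
    for protein, peptide, score in peptide_hits:
        previous = best[protein].get(peptide, float("-inf"))
        best[protein][peptide] = max(previous, score)

    accepted = set()
    for protein, peptides in best.items():
        scores = list(peptides.values())
        n_supporting = sum(s >= MULTI_PEPTIDE_MIN for s in scores)
        # Accept on two or more qualifying peptides, or one strong peptide.
        if n_supporting >= 2 or max(scores) >= SINGLE_PEPTIDE_MIN:
            accepted.add(protein)
    return accepted


if __name__ == "__main__":
    hits = [
        ("P1", "AAAK", 45.0),                       # single strong peptide
        ("P2", "BBBK", 25.0), ("P2", "CCCR", 22.0),  # two moderate peptides
        ("P3", "DDDK", 30.0),                       # single weak peptide
    ]
    print(sorted(accepted_proteins(hits)))
```

Running the same filter on hits from the randomized database search would give the decoy counts from which a false-positive rate can be estimated.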
Comparisons to other data sets
Comparisons were made in general by cross-referencing NCBI Gene IDs where possible, or official HUGO gene symbols. For comparison to other protein interaction data sets, computation of statistical significance was carried out by repeatedly randomizing (1000 iterations) the IP-HTMS bait–prey associations and recalculating the interactions in common between the set of randomized interactions and the data set being compared. Minimum, mean and maximum counts of the interactions in common were then calculated from the 1000 trials. Cross-referencing to the InParanoid database (O'Brien et al, 2005) was performed by downloading all orthologous pairs for Homo sapiens and then forming paralogous groups of human genes in a simple, single-link fashion. Integration of the gene co-expression compendium (Lee et al, 2004) was performed by cross-referencing gene symbols.
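The randomization used for the data set comparisons can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not the original implementation: it assumes that randomization means shuffling the prey labels across the observed bait–prey pairs (which preserves how often each bait and each prey occurs) and that interactions are compared as unordered pairs. The function name shared_interaction_stats is hypothetical.

```python
import random


def shared_interaction_stats(ip_pairs, reference_pairs, n_iter=1000, seed=0):
    """Compare observed overlap with overlap after randomizing bait-prey pairs.

    ip_pairs: list of (bait, prey) gene identifiers from the IP-HTMS data.
    reference_pairs: iterable of (gene_a, gene_b) from the other data set.
    Returns the observed count plus min/mean/max counts over n_iter trials.
    """
    rng = random.Random(seed)
    reference = {frozenset(pair) for pair in reference_pairs}
    baits = [bait for bait, _ in ip_pairs]
    preys = [prey for _, prey in ip_pairs]

    observed = len({frozenset(pair) for pair in ip_pairs} & reference)

    counts = []
    for _ in range(n_iter):
        shuffled = preys[:]
        rng.shuffle(shuffled)  # assumed scheme: permute prey labels
        randomized = {frozenset(pair) for pair in zip(baits, shuffled)}
        counts.append(len(randomized & reference))

    return {
        "observed": observed,
        "min": min(counts),
        "mean": sum(counts) / len(counts),
        "max": max(counts),
    }


if __name__ == "__main__":
    toy_ip = [("A", "x"), ("B", "y"), ("C", "z"), ("D", "w")]  # toy data
    toy_ref = [("x", "A"), ("B", "y")]
    print(shared_interaction_stats(toy_ip, toy_ref, n_iter=1000))
```

An observed overlap well above the maximum seen in the randomized trials indicates that the agreement with the reference data set is unlikely to arise by chance.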
GO-Slim versions of the Gene Ontology (www.geneontology.org/GO.slims.shtml) were used to map baits and preys to biological process and cellular component categories (courtesy of Suparna Mundodi and Amelia Ireland, and MGI, www.spatial.maine.edu/~mdolan/MGI_GO_Slim.html), respectively. In addition, certain baits were 'up-propagated' to parent categories where representation was low. Eighty percent of proteins in the interaction network were assigned biological process categories and 77% cellular component categories (55% of interactions were assigned biological process categories for both bait and prey, 33% of interactions were assigned cellular component categories for both bait and prey). Each combination of bait GO category and prey GO category was tested for association by constructing a 2 × 2 contingency table and using the Fisher exact test. Distributions of P-values from randomly permuted bait–prey categories were characterized as follows. Random permutations of bait–prey category associations (1000 trials) were performed, contingency tables for each bait–prey category combination constructed and the Fisher exact test P-value calculated. These distributions of 1000 P-values for each bait–prey category combination were then used to calculate the frequency with which a P-value less than or equal to the observed non-random P-value is seen by chance.
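The category association test and its permutation-based calibration can be sketched in Python using only the standard library. This is an illustrative sketch, not the original code: it assumes a two-sided Fisher exact test, one category per bait and per prey in each interaction, and permutation by shuffling prey categories across interactions. The function names and the toy data are hypothetical.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def category_table(pairs, bait_cat, prey_cat):
    """Count interactions by membership of bait_cat and prey_cat."""
    a = b = c = d = 0
    for bc, pc in pairs:
        if bc == bait_cat and pc == prey_cat:
            a += 1
        elif bc == bait_cat:
            b += 1
        elif pc == prey_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def empirical_frequency(pairs, bait_cat, prey_cat, n_trials=1000, seed=0):
    """Return (observed P, fraction of permuted P-values <= observed P)."""
    rng = random.Random(seed)
    observed_p = fisher_exact_two_sided(*category_table(pairs, bait_cat, prey_cat))
    bait_cats = [bc for bc, _ in pairs]
    prey_cats = [pc for _, pc in pairs]
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(prey_cats)  # assumed permutation scheme
        permuted = list(zip(bait_cats, prey_cats))
        p = fisher_exact_two_sided(*category_table(permuted, bait_cat, prey_cat))
        if p <= observed_p * (1 + 1e-9):
            hits += 1
    return observed_p, hits / n_trials


if __name__ == "__main__":
    toy = [("nucleus", "nucleus")] * 8 + [("cytoplasm", "cytoplasm")] * 8
    print(empirical_frequency(toy, "nucleus", "nucleus", n_trials=200))
```

The second returned value corresponds to the frequency with which a P-value at least as small as the observed one arises from permuted category assignments.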