Cell culture and transfection
293T cells were maintained at 37°C in a humidified atmosphere of 5% CO₂ in air, in Iscove's modified Dulbecco's medium supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin. Transient transfections were performed by standard calcium phosphate precipitation. The plasmid used for expression of S-Tax-GFP has been described previously [18]. For expression of S-Tax and S-GFP, the tax open reading frame was inserted into the SmaI site of pTriEx4-Neo (Novagen, Madison, WI). Cells were plated in 150-mm plates at 4 × 10⁶ cells per plate. The following day, 20 μg of plasmid DNA in 2 M CaCl₂ and 2X HBS was added dropwise to the cells in fresh medium. Cells were incubated at 37°C for 5 h, after which fresh medium was added. The cells were harvested 48 h later.
Purification of S-fusion proteins
Cells expressing S-Tax-GFP, S-Tax, or S-GFP were washed once with 1X PBS and lysed in 500 μl M-PER Mammalian Protein Extraction Reagent (Pierce, Rockford, IL) supplemented with protease inhibitor cocktail (Roche, Palo Alto, CA), and the lysates were immediately frozen at -80°C. The cell lysate (2.5 mL) was incubated with a 200 μl bed volume of S-protein™ agarose (Novagen, Madison, WI) for 30 min at room temperature according to the manufacturer's instructions. The bound S-tagged protein was then washed 3 times with 1 mL Bind/Wash Buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% Triton X-100).
Isolation of Tax-complexes
Freshly prepared S-Tax-GFP or S-GFP beads were washed 3× in incubation buffer (25 mM HEPES, pH 7.5, 150 mM NaCl, 1% NP-40, 10 mM MgCl₂, 1 mM EDTA, 1% glycerol) and placed on ice. A working stock of Jurkat nuclear lysate (Active Motif, Carlsbad, CA) was prepared by diluting 25 μg of lysate to a total volume of 75 μL in incubation buffer. The lysate was pre-cleared by adding 30 μL of S-bead slurry and incubating on ice for 30 min with occasional mixing. The pre-cleared slurry was centrifuged at 2000 × g for 3 min, and the lysate (70 μL) was transferred to a fresh 0.5 mL tube containing 10 μL of S-Tax-GFP or S-GFP protein bound to beads. This slurry was incubated at 4°C for 60 min on a shaker. The beads were centrifuged at 2000 × g for 3 min, the lysate was removed, and the beads were washed once with 250 μL incubation buffer, followed by 4 washes with 250 μL ice-cold PBS.
Isolation of endogenous DNA-PK-Tax protein complex
In some cases, S-Tax or S-GFP expression plasmids were transfected into 293T cells, and protein complexes were isolated from a single T75 flask as described above. No nuclear extract was added in these experiments. The protein lysates were purified on S-beads, and 50 μL of sample loading buffer (Bio-Rad, Hercules, CA) containing β-mercaptoethanol was added to the S-bead pellet, which was then boiled for 10 min. The entire S-bead-bound protein sample was separated by 4–12% SDS-PAGE and analyzed by Western blot as described below.
LC-MS/MS of protein complexes
S-Tax-GFP or S-GFP beads were washed 3X with ice-cold 50 mM ammonium bicarbonate, pH 8, and subsequently resuspended in 50 μL of 50 mM ammonium bicarbonate, 10% acetonitrile containing 3.12 ng/μL sequencing grade modified trypsin (Promega Corp., Madison, WI). The digest was incubated for 6 h at 37°C with occasional mixing, transferred to a 0.2 μm centrifuge tube filter, and spun at 5000 rpm for 3 min. The flow-through was recovered and the peptides were dried in a SpeedVac. Digests were resuspended in 20 μl Buffer A (5% acetonitrile, 0.1% formic acid, 0.005% heptafluorobutyric acid [HFBA]), and 10 μl were loaded onto a 12 cm × 0.075 mm fused silica capillary column packed with 5 μm diameter C18 beads (The Nest Group, Southborough, MA) using an N₂ pressure vessel at 1100 psi. Peptides were eluted over 300 min by applying a 0–80% linear gradient of Buffer B (95% acetonitrile, 0.1% formic acid, 0.005% HFBA) at a flow rate of 150 μl/min; a pre-column flow splitter reduced this to a final flow rate of ~200 nl/min directly into the source.

An LTQ™ Linear Ion Trap (ThermoFinnigan, San Jose, CA) was run in automated collection mode with an instrument method composed of a single segment and 5 scan events: a full MS scan followed by 4 data-dependent MS/MS scans of the most intense ions. Normalized collision energy was set at 28% and activation Q at 0.250, with a minimum full scan signal intensity of 1 × 10⁵ and no minimum MS2 intensity specified. Dynamic exclusion was enabled with a repeat count of 2 within a 3 min window and a mass width of 1.0 m/z. Protein searches were performed with Mascot version 2.2.0 (Matrix Science, London, UK) against the SwissProt database, version 51.3. The parent ion mass tolerance was set at 1.5 Da and the MS/MS tolerance at 0.5 Da.
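The digestion and LC parameters above imply a few derived quantities that are easy to check by hand. The short Python sketch below only does arithmetic on numbers stated in the text (no instrument or vendor software is involved); the variable names are illustrative. It yields about 156 ng of trypsin per digest, half of each digest injected, a flow split of roughly 750:1, and a gradient slope of about 0.27% Buffer B per minute.

```python
# Derived quantities computed only from parameters stated in the text.

digest_volume_ul = 50.0        # on-bead digestion volume
trypsin_ng_per_ul = 3.12       # trypsin concentration in the digest
trypsin_ng = digest_volume_ul * trypsin_ng_per_ul  # total trypsin per digest

resuspension_ul = 20.0         # Buffer A used to redissolve the dried digest
loaded_ul = 10.0               # volume loaded onto the column
fraction_loaded = loaded_ul / resuspension_ul

pump_flow_nl_min = 150.0 * 1000.0  # 150 uL/min converted to nL/min
source_flow_nl_min = 200.0         # ~200 nL/min after the pre-column splitter
split_ratio = pump_flow_nl_min / source_flow_nl_min

gradient_slope = (80.0 - 0.0) / 300.0  # %B per minute for a 0-80% ramp over 300 min

print(f"trypsin per digest: {trypsin_ng:.0f} ng")
print(f"fraction of digest injected: {fraction_loaded:.0%}")
print(f"approximate split ratio: {split_ratio:.0f}:1")
print(f"gradient slope: {gradient_slope:.3f} %B/min")
```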
Western blot analysis
Total protein concentrations were determined using the Bio-Rad Protein Assay (Bio-Rad, Hercules, CA). An equal volume of sample loading buffer (Bio-Rad) containing β-mercaptoethanol was added to the lysate, and the samples were boiled for 5 min. Samples were normalized to total protein and separated on a 10% SDS-polyacrylamide gel. The proteins were transferred onto an Immobilon-P membrane (Millipore, Billerica, MA) using a Trans-Blot SD semi-dry transfer cell (Bio-Rad) at 400 mA for 50 min. Following blocking in 5% non-fat milk in PBS/0.1% Tween-20, blots were incubated in primary antibody overnight, followed by a 1 h incubation in horseradish peroxidase-conjugated anti-mouse or anti-rabbit secondary antibody (Bio-Rad). Immunoreactivity was detected with Immun-Star enhanced chemiluminescence detection (Bio-Rad). The following primary antibodies were used: mouse monoclonal anti-DNA-PKcs (Upstate), 1:1000; rabbit polyclonal anti-Tax, 1:5000; and mouse monoclonal anti-GFP (Santa Cruz), 1:2000.
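The text gives only dilution ratios for the primary antibodies. As a small worked example, the sketch below converts them to stock volumes, assuming a hypothetical 10 mL of antibody solution per blot and reading "1:X" as one part stock in X parts total; both the working volume and that convention are assumptions, not details from the protocol.

```python
# Hypothetical working volume; the text gives only the dilution ratios.
WORKING_VOLUME_ML = 10.0

# Primary antibody dilutions as listed in the text.
dilutions = {
    "anti-DNA-PKcs (mouse monoclonal)": 1000,
    "anti-Tax (rabbit polyclonal)": 5000,
    "anti-GFP (mouse monoclonal)": 2000,
}

def stock_volume_ul(dilution_factor: int, working_volume_ml: float) -> float:
    """Antibody stock (uL) for a 1:dilution_factor solution, reading 1:X as 1 part in X total."""
    return working_volume_ml * 1000.0 / dilution_factor

for antibody, factor in dilutions.items():
    volume = stock_volume_ul(factor, WORKING_VOLUME_ML)
    print(f"{antibody}: {volume:.1f} uL in {WORKING_VOLUME_ML:g} mL")
```

Under these assumptions, 10 mL would take 10 μL of anti-DNA-PKcs, 2 μL of anti-Tax, and 5 μL of anti-GFP.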
Sources of data for in silico analysis
Interaction data were gathered from three types of sources: manual extraction from PubMed, laboratory-derived physical interactions, and protein interaction databases. For the first source, we manually searched the PubMed literature to obtain a list of known Tax-binding proteins; the criterion for acceptance was physical verification of binding in the referenced publication. For the second source, all physical interactions were derived from the experimental work described elsewhere in this article. For the final source, we queried a human protein interaction database, the Human Protein Reference Database (HPRD) [34]. HPRD (http://www.hprd.org) contains interactions among proteins of the human proteome, manually extracted from the literature by expert biologists who read, interpret and analyze the published data.
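Combining the three sources amounts to taking the union of their edge lists while treating each interaction as an unordered protein pair. The Python sketch below is a hypothetical illustration of that step, not the procedure used in this study; it records which source(s) support each edge, drops self-interactions, and uses placeholder protein identifiers (`PROT_A`, `PROT_B`, and so on) rather than real data.

```python
from collections import defaultdict

Edge = tuple[str, str]

def normalize(a: str, b: str) -> Edge:
    """Order an unordered protein pair so (A, B) and (B, A) map to the same edge."""
    return (a, b) if a <= b else (b, a)

def merge_sources(sources: dict[str, list[Edge]]) -> dict[Edge, set[str]]:
    """Union all edge lists, recording which source(s) support each interaction."""
    evidence: dict[Edge, set[str]] = defaultdict(set)
    for source_name, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue  # self-interactions are dropped in this sketch
            evidence[normalize(a, b)].add(source_name)
    return dict(evidence)

# Placeholder identifiers only; these are not interactions reported in this study.
sources = {
    "pubmed_curated": [("TAX", "PROT_A"), ("PROT_A", "PROT_B")],
    "laboratory": [("PROT_A", "TAX")],
    "hprd": [("PROT_B", "PROT_C"), ("PROT_C", "PROT_C")],
}
network = merge_sources(sources)
for edge, support in sorted(network.items()):
    print(edge, sorted(support))
```

Keeping the per-edge source set makes it straightforward to ask later how many interactions are supported by more than one source.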
Terms and definitions for in silico analysis
For our topological studies of interaction networks, we utilized a novel overlapping clustering approach [23] that exposes the modular structure of the network. We define bridges as proteins that belong to multiple clusters due to the overlap among them. We also employed the network centrality measures known as betweenness and closeness. To define these measures, we first need some network concepts. The distance of a protein v from another protein w is the number of edges in a shortest path between them. The diameter of a network is the maximum distance between any pair of vertices. The average path length of a network is the average distance over all pairs of vertices. The closeness centrality of a protein v is the reciprocal of the sum of the distances from v to all other proteins in the network.
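In an unweighted interaction network, each of these definitions reduces to breadth-first search. The Python sketch below is illustrative rather than the authors' code. The adjacency-set representation and the helper names (`make_graph`, `bfs_distances`, `diameter_and_average_path_length`, `closeness`) are assumptions. It also assumes a connected network, since the distance between proteins in different components is undefined.

```python
from collections import deque
from itertools import combinations

Graph = dict[str, set[str]]

def make_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g

def bfs_distances(g: Graph, source: str) -> dict[str, int]:
    """Distance (edges on a shortest path) from source to each reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter_and_average_path_length(g: Graph) -> tuple[int, float]:
    """Max and mean distance over unordered pairs of distinct proteins (connected graph assumed)."""
    dist = {v: bfs_distances(g, v) for v in g}
    pair_distances = [dist[s][t] for s, t in combinations(g, 2)]
    return max(pair_distances), sum(pair_distances) / len(pair_distances)

def closeness(g: Graph, v: str) -> float:
    """Reciprocal of the sum of distances from v to all other proteins (connected graph assumed)."""
    return 1.0 / sum(bfs_distances(g, v).values())

g = make_graph([("A", "B"), ("B", "C"), ("C", "D")])  # a four-protein path
print(diameter_and_average_path_length(g))
print(closeness(g, "B"), closeness(g, "A"))
```

For the path A-B-C-D, the diameter is 3 and the average path length is 10/6 (about 1.67). The interior protein B (closeness 1/4) scores higher than the end protein A (1/6).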
The dependence of a protein s on a protein v is the sum, over all proteins t in the network, of the ratio of the number of distinct shortest paths between s and t that include v as an intermediate vertex to the number of distinct shortest paths between s and t. The betweenness value of a protein v is the sum of the dependence values of all proteins s on v. This is equivalent to the following equation for betweenness:

$$C_B(v) = \sum_{s \in V,\, s \neq v} \; \sum_{t \in V,\, t \neq s,\, t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

Here V is the set of proteins in the network. The numerator $\sigma_{st}(v)$ is the number of distinct shortest paths joining s and t on which v is an intermediate vertex; the denominator $\sigma_{st}$ is the number of distinct shortest paths joining s and t. Further details on centrality measures are available in [35].
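The betweenness values in this study were computed with an algorithm provided by Yu et al. [37]. For illustration only, the Python sketch below swaps in Brandes' standard algorithm for unweighted graphs instead. For each source s, it counts shortest paths with a breadth-first search and then accumulates the dependence of s on every vertex. Because it sums over ordered pairs (s, t), as the definition above does, each unordered pair contributes twice on an undirected graph. Some software packages halve or normalize this value.

```python
from collections import deque

Graph = dict[str, set[str]]

def betweenness(g: Graph) -> dict[str, float]:
    """Brandes-style betweenness for an unweighted graph, summed over ordered pairs (s, t)."""
    cb = {v: 0.0 for v in g}
    for s in g:
        # Breadth-first search from s, counting shortest paths sigma[w] and predecessors.
        order: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in g}
        sigma = {v: 0 for v in g}
        dist = {v: -1 for v in g}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate the dependence of s on each vertex, farthest vertices first.
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb

diamond = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
print(betweenness(diamond))
```

In the four-protein cycle, each protein lies on one of the two shortest paths between its two non-adjacent neighbours. Those paths are counted in both directions, so every protein scores 1.0.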
As in earlier work [36], we define hubs as all proteins that are ranked in the top 20% with respect to degree (the number of interactions a protein is involved in). Similarly, bottlenecks are all proteins that are ranked in the top 20% of betweenness values. To calculate betweenness values for proteins, we used an algorithm provided by Yu et al. [37].
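Selecting hubs and bottlenecks is a top-fraction ranking on degree and on betweenness, respectively. The text does not say how the 20% cutoff is rounded or how ties are handled. The Python sketch below therefore assumes that the count is rounded up and that every protein tied with the cutoff score is kept. The helper names `top_fraction` and `hubs` are hypothetical.

```python
import math

def top_fraction(scores: dict[str, float], fraction: float = 0.20) -> set[str]:
    """Proteins ranked in the top `fraction` by score.

    Assumptions (not specified in the text): the number kept is rounded up,
    and proteins tied with the score at the cutoff are all included.
    """
    if not scores:
        return set()
    ranked = sorted(scores.values(), reverse=True)
    keep = max(1, math.ceil(round(fraction * len(ranked), 9)))  # round() guards float noise
    cutoff = ranked[keep - 1]
    return {protein for protein, score in scores.items() if score >= cutoff}

def hubs(g: dict[str, set[str]], fraction: float = 0.20) -> set[str]:
    """Top proteins by degree (number of interaction partners)."""
    return top_fraction({v: len(nbrs) for v, nbrs in g.items()}, fraction)

star = {"H": {"L1", "L2", "L3", "L4"}, "L1": {"H"}, "L2": {"H"}, "L3": {"H"}, "L4": {"H"}}
print(hubs(star))
```

Bottlenecks would be obtained the same way, by passing a dictionary of betweenness values (from any betweenness computation) to `top_fraction`.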
In the clustering approach described next, we use the concept of the k-core of a graph. The k-core of a graph is obtained by repeatedly deleting all vertices that are joined to the remaining vertices by fewer than k edges. The procedure begins by deleting all vertices whose degree is less than k. Deleting these vertices can decrease the degrees of the remaining vertices; any that now have degree less than k are deleted as well. This process is repeated until every vertex in the remaining subgraph has degree at least k; this subgraph is the k-core of the graph. All the deleted vertices belong to the (k-1)-shell or lower shells. Computing the k-core helps denoise the interaction network by removing many false positives, and it also reduces the initial size of the network to be clustered. The deleted vertices are added to the clustering obtained in a subsequent step.
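The peeling procedure above can be sketched as a worklist algorithm. The Python below tracks current degrees, removes every vertex whose degree is below k, and re-queues neighbours whose degree drops below k as a result. It returns both the k-core and the peeled vertices, which correspond to the low-shell vertices that are reattached later. The function name and graph representation are assumptions for illustration.

```python
Graph = dict[str, set[str]]

def k_core(g: Graph, k: int) -> tuple[set[str], set[str]]:
    """Return (core, removed): vertices of the k-core and the vertices peeled away."""
    degree = {v: len(nbrs) for v, nbrs in g.items()}
    alive = set(g)
    stack = [v for v in g if degree[v] < k]  # initial vertices with degree below k
    removed: set[str] = set()
    while stack:
        v = stack.pop()
        if v in removed:
            continue  # a vertex may be queued more than once
        removed.add(v)
        alive.discard(v)
        for w in g[v]:
            if w in alive:
                degree[w] -= 1  # deleting v lowers the degree of its surviving neighbours
                if degree[w] < k:
                    stack.append(w)
    return alive, removed

# A triangle A-B-C with a tail A-D-E attached.
g = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A", "E"}, "E": {"D"}}
print(k_core(g, 2))
```

Here E is removed first. Its removal drops D to degree 1, so D is removed next, leaving the triangle as the 2-core.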
Spectral clustering and module identification
We now summarize the technique we used for clustering the protein interaction networks [23]. The protein interaction network is represented by a graph G = (V, E), in which the proteins constitute the vertex set V and the interactions constitute the edge set E. We obtain clusters by identifying subgraphs of G that have a relatively large number of edges joining vertices within each subgraph and fewer edges to vertices outside it. We permit these clusters to overlap (have some vertices in common), since proteins have multiple functions and can be involved in more than one biological process.
The details of the clustering algorithm will be described elsewhere; here we provide an overview. Clusters are obtained by dividing a subgraph at each step into two subgraphs, guided by the ratio of the number of edges that join vertices within the subgraph to the total number of edges, a measure called the cohesion of the subgraph. Starting from the initial graph G, we recursively split subgraphs until the cohesion of a subgraph is above a threshold value or the subgraph has fewer vertices than a threshold size. Each subgraph is divided into two by a spectral algorithm that uses the components of an eigenvector of the Laplacian matrix of the graph. Once the eigenvector is computed (its components correspond to the vertices of the graph), the vertices whose component values are below a chosen split value are placed in one subgraph and the remaining vertices in the other. The split value is chosen by computing the cohesion of the candidate subgraphs.
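Since the full algorithm is described elsewhere, the Python sketch below is only a hedged reading of the overview. Several choices in it are assumptions:

- Cohesion is taken as internal edges divided by internal plus boundary edges.
- The eigenvector is the Fiedler vector, for the second-smallest Laplacian eigenvalue. It is approximated by power iteration on (c·I − L), with the constant vector projected out, to avoid external dependencies.
- The split value is the cut that maximizes the smaller of the two cohesions.
- The initial graph is always split once, because the whole graph trivially has cohesion 1 under this definition.

All function names are hypothetical.

```python
import math

Graph = dict[str, set[str]]

def cohesion(s: set[str], g: Graph) -> float:
    """ASSUMED definition: internal edges / (internal + boundary edges) for vertex set s."""
    internal = sum(1 for v in s for w in g[v] if w in s) // 2
    boundary = sum(1 for v in s for w in g[v] if w not in s)
    total = internal + boundary
    return internal / total if total else 0.0

def fiedler_vector(s: set[str], g: Graph, iters: int = 500) -> dict[str, float]:
    """Approximate the Laplacian eigenvector for the second-smallest eigenvalue on subgraph s.

    Power iteration on (c*I - L) with the constant vector projected out each step;
    c exceeds the largest Laplacian eigenvalue (at most twice the maximum degree).
    """
    vs = sorted(s)
    idx = {v: i for i, v in enumerate(vs)}
    nbrs = [[idx[w] for w in g[v] if w in idx] for v in vs]
    deg = [len(nb) for nb in nbrs]
    c = 2.0 * max(deg) + 1.0
    n = len(vs)
    x = [((i * 7919) % 101) / 101.0 for i in range(n)]  # deterministic non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the component along the constant eigenvector
        y = [c * x[i] - (deg[i] * x[i] - sum(x[j] for j in nbrs[i])) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y)) or 1.0
        x = [yi / norm for yi in y]
    return {v: x[idx[v]] for v in vs}

def spectral_split(s: set[str], g: Graph) -> tuple[set[str], set[str]]:
    """Sort by eigenvector component; cut where the weaker side's cohesion is highest (assumed rule)."""
    f = fiedler_vector(s, g)
    order = sorted(s, key=lambda v: f[v])
    best = None
    for cut in range(1, len(order)):
        left, right = set(order[:cut]), set(order[cut:])
        score = min(cohesion(left, g), cohesion(right, g))
        if best is None or score > best[0]:
            best = (score, left, right)
    return best[1], best[2]

def recursive_bisection(g: Graph, threshold: float = 0.7, min_size: int = 3) -> list[set[str]]:
    """Split G, then keep splitting each part until cohesion > threshold or size < min_size."""
    assert min_size >= 2, "a part must have at least two vertices to be split"
    clusters: list[set[str]] = []
    pending = list(spectral_split(set(g), g))
    while pending:
        part = pending.pop()
        if len(part) < min_size or cohesion(part, g) > threshold:
            clusters.append(part)
        else:
            pending.extend(spectral_split(part, g))
    return clusters

# Two triangles joined by a single edge a3-b1.
g = {"a1": {"a2", "a3"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2", "b1"},
     "b1": {"b2", "b3", "a3"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"}}
print(recursive_bisection(g))
```

In the example, the eigenvector components have opposite signs on the two triangles. Cutting between them gives each side cohesion 3/4, so both triangles are kept as clusters. This sketch produces only non-overlapping parts; overlaps arise in the later attachment steps.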
We have found that the overall clustering approach described above needed to be adapted to protein interaction networks, which are small-world and modified power-law networks. We first decompose the vertices of the network into three sets: hubs, or high-degree vertices (those in the top 20% of degrees); low-shell vertices (vertices not in the 3-core of the network); and the residual subnetwork, which is the 3-core of the network with the hubs removed. We call the last subnetwork the local network. We have found it advantageous to cluster the local and hub subnetworks separately using the spectral clustering method described above. Clusters from the two subnetworks are then merged if a large number of edges join them. We then check whether nodes that belong to a cluster are significantly connected to other clusters, and if so, include them in those clusters as well. The statistical significance of the connections is assessed with a p-value based on the hypergeometric distribution. Finally, the low-shell nodes are added to clusters; each such node may be added to none, one, or more than one cluster, depending on whether it has a statistically significant number of connections to the clusters that have been found. If a node belongs to three or more clusters, we call it a bridge node.
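The text states only that significance is assessed with a hypergeometric p-value. The Python sketch below therefore assumes one common null model. A node's degree is treated as the number of draws from all other proteins in the network, and the cluster's members, excluding the node, are the "successes". The upper-tail probability of seeing at least the observed number of links is compared with a significance level `alpha`, whose default of 0.01 is a placeholder. Bridge nodes are then those that appear in three or more clusters. All function names are hypothetical.

```python
from collections import Counter
from math import comb

Graph = dict[str, set[str]]

def hypergeom_sf(m: int, population: int, successes: int, draws: int) -> float:
    """P(X >= m) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(m, upper + 1))
    return hits / comb(population, draws)

def attach_nodes(g: Graph, clusters: list[set[str]], nodes: list[str],
                 alpha: float = 0.01) -> dict[str, list[int]]:
    """Indices of clusters to which each node has a significant number of links.

    Assumed model: the node's degree is the number of draws from all other proteins in g,
    and the cluster's members (excluding the node) are the 'successes'.
    """
    population = len(g) - 1
    memberships: dict[str, list[int]] = {}
    for v in nodes:
        degree = len(g[v])
        memberships[v] = []
        for ci, cluster in enumerate(clusters):
            members = cluster - {v}
            links = len(g[v] & members)
            if links and hypergeom_sf(links, population, len(members), degree) < alpha:
                memberships[v].append(ci)
    return memberships

def bridge_nodes(clusters: list[set[str]]) -> set[str]:
    """Nodes that belong to three or more clusters."""
    counts = Counter(v for cluster in clusters for v in cluster)
    return {v for v, k in counts.items() if k >= 3}

# Two triangle clusters plus a low-shell node x linked to every member of the first.
g = {"a1": {"a2", "a3", "x"}, "a2": {"a1", "a3", "x"}, "a3": {"a1", "a2", "x"},
     "b1": {"b2", "b3"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"}, "x": {"a1", "a2", "a3"}}
clusters = [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}]
print(attach_nodes(g, clusters, ["x"], alpha=0.1))
```

In this toy network, x's three links to the first cluster have p = 1/20 = 0.05. The node therefore joins that cluster only under the permissive `alpha=0.1`, which shows how small networks limit the attainable significance.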