Estrogen receptor-α (ER) is an important target both for therapeutic compounds and endocrine disrupting chemicals (EDCs); however, the mechanisms by which chemicals modulate ER transcriptional activity are inadequately understood. Here, we report the development of a high content analysis-based assay to describe ER activity that uniquely exploits a microscopically visible multicopy integration of an ER-regulated promoter. Through automated single-cell analyses we simultaneously quantified promoter occupancy, recruitment of transcriptional cofactors and large-scale chromatin changes in response to a panel of ER ligands and EDCs. Image-derived multi-parametric data were used to classify a panel of ligand responses at high resolution. We propose this system as a novel technology providing new mechanistic insights into EDC activities in a manner useful for both basic mechanistic studies and drug testing.
Estrogen receptor-α (ER) is a member of the superfamily of nuclear receptor (NR) transcription factors (Nilsson and Gustafsson, 2000). ER elicits its effects via direct binding to specific sequences within DNA, termed estrogen response elements (EREs) (Klein-Hitpass et al., 1988), and also indirectly via interaction with certain transcription factors (e.g., AP1) (Kushner et al., 2000). Found in both reproductive and non-reproductive tissues, ER participates in the gene regulatory programs of several important physiological processes, including sexual maturation, maintenance of bone density and CNS functions, and is known to be a critical regulatory factor in certain diseases, namely breast cancer and osteoporosis (reviewed in Woolley, 1999; Nilsson and Gustafsson, 2000; Sommer and Fuqua, 2001). For these reasons, ER has become a target for the development of many therapeutic compounds, including estrogen mimetics (e.g., diethylstilbestrol (DES)) and selective ER modulators (SERMs; e.g., 4-hydroxy-tamoxifen (4HT) and raloxifene (RAL)). Like the cognate ligand 17β-estradiol (E2), these compounds bind ER within the binding pocket of the ligand binding domain (LBD) (Brzozowski et al., 1997; Shiau et al., 1998). X-ray crystallography has revealed that either an E2- or DES-bound LBD results in presentation of a coregulator binding surface on helix-12 of the LBD, whereas binding of either 4HT or RAL does not. These configurations are proposed to regulate the agonist or antagonist properties of the receptor (Brzozowski et al., 1997; Shiau et al., 1998). In addition to therapeutic compounds, it is becoming increasingly apparent that numerous chemicals, both natural and synthetic, can impinge upon ER (and other NR) signaling through similar mechanisms.
These endocrine-disrupting chemicals (EDCs), including the widely used plasticizer bisphenol-A (BPA), can modulate ER activity at varying exposure levels with potentially wide-reaching medical and environmental consequences (Bromer et al.; Richter et al., 2007). A key challenge for high-throughput drug screening and EDC testing has been to access the complex mechanisms associated with NR functions that are usually studied with a wide range of biochemical assays on large populations of cells. Historically this has involved the use of numerous and diverse assays, each providing single point readouts representing population-based response averages. Although informative, these assays have proved limiting in the quest for time- and cost-efficient assessment of drug or EDC effects. To address these issues, we have developed a high throughput, single cell-based assay specifically designed to simultaneously quantify multiple biological features of ER activity using a novel, high content analysis (HCA) approach. This automated systems-level inquiry was used to profile dose-dependent effects of a panel of ER ligands, including SERMs and EDCs. Compounds were assigned activities based on simultaneously collected quantitative measurements of their ability to affect transcriptional regulation of a microscopically visible, integrated reporter gene locus (Sharp et al., 2006; Berno et al., 2008). The primary measurements include: 1) ER protein levels and localization, 2) ER targeting to the ERE-rich reporter gene locus, 3) large-scale chromatin remodeling, and 4) ER-based recruitment of RNA polymerase II. Quantitative automated image analyses provided the means to classify compounds based upon their collective effects upon ER functions.
Several distinct events occur during ligand-activation of ER-dependent gene transcription. These include nuclear translocation, binding to DNA response elements within chromatin, recruitment or loss of various cofactors associated with transcriptional activation or repression, and receptor degradation. Each of these events can be quantified by individual biochemical assays based on cell population averages but the ability to simultaneously perform analyses within a cellular context is currently only available through the use of multi-copy promoter array systems combined with high resolution microscopy. We previously described the generation of a mammalian promoter chromosomal array system based on the stable, multi-copy integration of the ER-responsive prolactin promoter-enhancer (PRL-HeLa (Sharp et al., 2006)). Low-throughput semi-automated image analysis of transiently-expressed ER demonstrated the physiological responsiveness of the array to ER ligands E2 and 4HT (Berno et al., 2008). However, transient expression systems are not ideally suited to high-throughput imaging technologies due to inconsistent transfection efficiencies and extremely variable protein expression levels that require extensive post-acquisition filtering to focus analysis on physiologically-relevant cells. It was therefore critical to generate a clonal derivative of the PRL-HeLa cell line with constitutive expression of relatively low levels of GFP-tagged ER. To achieve this goal we used a lentiviral expression system and clonal expansion in the presence of antibiotic selection and 4HT [1nM], which was necessary due to negative effects of ER expression on HeLa cell growth. The clone used in this study (ER:PRL-HeLa) was typically greater than 95% positive for GFP-ER, which was predominantly localized to the nucleus and expressed at 4.3 fold higher levels when compared to endogenous ER in MCF-7 breast cancer cells, as determined by immunofluorescence (Fig. 1a). 
Similar to our previous studies with transiently transfected ER in PRL-HeLa (Sharp et al., 2006; Berno et al., 2008), we tested the response of the ER:PRL-HeLa cells to compounds known to regulate ER levels and activity. In agreement with published studies (Wijayaratne et al., 1999; Preisler-Mashek et al., 2002) treatment with either E2 or ICI 182780 (ICI) substantially decreased ER expression while 4HT increased ER levels (Fig. 1a). Treatment with 10nM E2 produced rapid recruitment of the constitutively expressed GFP-ER to the PRL-array followed by large-scale chromatin decondensation that is known to be associated with transcriptional activity (Muller et al., 2001; Janicki et al., 2004; Berno et al., 2008) (Supplementary Video 1). During vehicle treatment, the PRL-array did not accumulate ER above nucleoplasmic levels (not shown). Immunolabeling with antibody against RNA polymerase II (Pol II) demonstrated marked recruitment of Pol II to the PRL-array after treatment with E2 [10nM; 30min]; conversely treatment with 4HT [10nM; 30 min] produced condensed arrays associated with transcriptional repression (Berno et al., 2008) that did not recruit Pol II (Fig. 1b).
Automated multiwell plate handling combined with automated image acquisition tools were used to perform cell plating, compound dilution and transfer, and immunofluorescence protocols. These procedures are described in detail within sections 4.3 and 4.4. Figure 2a shows images collected in 4 fluorescent channels and their associated image masks that were created using customized routines in Pipeline Pilot (Basic and Advanced Imaging Collection, Accelrys, San Diego, CA). Image features pertaining to the mean, sum and distribution of the pixel intensities under each of the defined masks (nucleus, cytoplasm and PRL-array) were selected to describe the cellular response to potential estrogenic/anti-estrogenic compounds. A full list of the extracted features is shown in Supplementary Table I. Intra-plate Z′ values were calculated for some of the features envisaged to be important in determining mechanisms of compound activity, and in discerning agonist from antagonist-like responses (Fig. 2b). Z′ values >0.5 are considered preferable for cell-based assays (Zhang et al., 1999). Indicative of the data quality from the PRL-array model system, intra-plate Z′ values for 4 of these selected features were >0.6. Array Occupancy (proportion of cells with a detectable GFP-ER loaded array) and GFP-ER CV (coefficient of variation for GFP-ER pixel intensities in the nucleus) were both considered effective measures of ER-DNA binding, giving Z′ values of 0.91 and 0.71 when comparing vehicle control against E2 or 4HT, respectively. Using customized array segmentation routines, described in detail in section 4.6.1, we were able to measure the area of the ER-occupied arrays with sufficient accuracy and consistency to reliably distinguish the larger decondensed arrays resulting from agonist (E2) treatment from the smaller, brighter, condensed array structures resulting from antagonist (4HT) treatment (Z′ value = 0.61).
Pol II loading at the array was similarly effective at discriminating between agonist (E2) and antagonist (4HT) controls with a Z′ value of 0.78. The feature ‘percent nuclear GFP-ER’ failed to give an acceptable Z′ value due to the small magnitude of change in ER nuclear localization following a 30 min exposure to either E2 or 4HT.
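The Z′ comparisons above can be reproduced from well-level feature values. The following is a minimal sketch of the Z′ factor as defined by Zhang et al. (1999); the function name, the use of the sample standard deviation and the example well values are illustrative assumptions, not data from this study.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg| (Zhang et al., 1999).

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical array-occupancy fractions for E2-treated and vehicle wells.
e2_wells = [0.90, 0.92, 0.88, 0.91]
vehicle_wells = [0.05, 0.07, 0.04, 0.06]
z = z_prime(e2_wells, vehicle_wells)
print(f"Z' = {z:.2f}")
```

A feature whose Z′ exceeds 0.5 under this calculation would meet the threshold cited in the text for cell-based assays.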
Previous single cell-based studies using both the PRL and MMTV array systems have reported time-dependent changes in array recruitment and chromatin condensation state in response to ligand treatment (Muller et al., 2001; Berno et al., 2008; Stavreva et al., 2009). Also, in previous biochemical studies using bulk cell populations and chromatin immunoprecipitation (ChIP), averaged cellular results from E2-treated cells indicated cyclical recruitment and loss of both ER and Pol II at the endogenous pS2 promoter (Metivier et al., 2003). Taken together, these data suggest complex and highly dynamic ER-promoter interactions (Sharp et al., 2006). Thus, we hypothesized that ligand-dependent effects on ER interaction with the PRL-array would be time-sensitive. To test this hypothesis, we employed the high throughput automated assay described above to study the kinetics of ER and Pol II recruitment to the PRL array in response to treatment with low (0.1nM), intermediate (1nM) and high (10nM) doses of E2 and 4HT (Fig. 3). Cells were treated for 5 – 30 min at 5 min intervals and for 60 – 360 min at 60 min intervals. Maximal array occupancy was not reached at low dose for either ligand (Fig 3a). Treatment with an intermediate dose of E2 resulted in maximal array occupancy within 30 min of treatment; however, the same concentration of 4HT produced only ~10% occupancy at 30 min. High dose E2 gave maximal occupancy within ~10 min whereas the 4HT response was again delayed reaching maximal occupancy at 30–60 min post-treatment. Once a maximal level of array occupancy was established for either ligand, this was maintained for up to 6 hours.
We next investigated the effects of either ligand on the kinetics of chromatin decondensation (Fig. 3b), and Pol II loading at the PRL-array (Fig. 3c). Due to the low array occupancy in cells treated with low doses of ligand we analyzed only intermediate and high doses for these features. As predicted from the live imaging studies (Supplementary Video), quantitative measurement of fixed cells showed that E2 treatment produced a gradual decondensation of arrays reaching a peak at ~30 min. From 30 to 60 min, the arrays underwent a partial recondensation resulting in a plateau of ~60% of the maximum response 2 hours post-treatment (Fig. 3b). In cells treated with 4HT, arrays remained small and condensed up to 6 hours after treatment, with no peak at 30 min. Consistent with early studies using the glucocorticoid receptor (GR)-activated MMTV promoter array system, which indicated that maintenance of decondensed chromatin required the presence of an elongating transcription factor (Muller et al., 2001), we observed that recruitment of RNA polymerase II to the PRL array followed similar kinetics and ligand dependence to chromatin condensation (Fig. 3c). With intermediate and high doses of E2, peak recruitment of Pol II occurred at ~30 min post-treatment. As expected, neither intermediate nor high dose 4HT could induce loading of Pol II to the PRL array, consistent with suppressed reporter gene mRNA accumulation (Berno et al., 2008) and the inability to induce chromatin decondensation. We were able to conclude from these experiments that the magnitude of a response, as opposed to the kinetic profile, was most affected by compound dose and, most importantly for this study, that a critical time window exists at ~30 min post-treatment for retrieval of the most informative ligand-specific activities. This time window coincides with the previous assignment of maximal transcriptional activity observed in response to E2 treatment (Berno et al., 2008).
Dose-response data were generated for a panel of compounds, received from the Interagency Coordinating Committee on the Validation of Alternative Methods, that were considered potential ER ligands. The identities of some compounds were known and others (potential EDCs) remained blinded during testing. Of the 15 compounds tested, 8 induced array occupancy (Table I). From this group of 8, 4 were known: E2, BPA, 4HT, and raloxifene (RAL); and 4 were blinded, indicated by a * in all figures and later revealed to be 17α-estradiol (17αE2), BPA, Bisphenol-B (BPB) and diethylstilbestrol (DES). We next calculated EC50 values for the 8 compounds based on non-linear curve-fitting of a 10 point dose-response series of array occupancy measurements, n=3 independent experiments (Fig. 4a&b, and Table I). The average calculated EC50 for E2-dependent PRL-array occupancy was 7.7×10^−10 M. BPA and BPB had the lowest affinity to induce ER occupancy of the PRL-array, DES had an intermediate affinity and the affinities of 17α-E2, RAL and 4HT were not statistically different from E2 (Fig. 4b).
We next investigated the effects of the active compounds on selected image-derived features. The means ± standard error (SE) were calculated from quadruplicate wells treated with the maximum dose used of each compound for the following parameters: i) array occupancy, ii) percent nuclear GFP-ER, iii) GFP-ER array loading, iv) array area and v) Pol II array loading (Fig. 4c–g). This collection of mechanism-oriented results enabled us to compare the effects of blinded EDC compounds to known ER ligands. All compounds induced a similar level of maximal array occupancy (Fig. 4c) and a similar increase in nuclear localization of ER versus vehicle (Fig. 4d). RAL, BPA and BPB, like 4HT, produced small, bright arrays with less Pol II recruitment than E2, as indicated by significantly higher GFP-ER loading (Fig. 4e), smaller array area (Fig. 4f) and significantly lower Pol II recruitment (Fig. 4g). Responses to 17α-E2 and DES were not significantly different from those to E2.
We applied statistical learning methods to develop an automated analysis platform for high content screens. We sought to classify compounds as having varying levels of agonist or antagonist responses by training a classifier to recognize responses in a control set. The control set consisted of E2 and 4HT responses to non-saturating (low), saturating (medium), and supersaturating (high) doses across 14 plates. We extracted cell-level features from the control data and averaged these across wells. We then applied stepwise discriminant analysis (SDA) (Jennrich, 1977) to identify features useful in distinguishing between different control groups. To make the feature selection process more robust across plates, we performed 14-fold cross-validation (Kohavi, 1995) around the SDA, splitting data by plate. We then selected the SDA features that appeared in a majority of the folds, indicating their cross-plate robustness. These four features were: 1) Array_GFP-ER_PI-CV, 2) Array_GFP-ER_PI-Variance, 3) Nucleus_GFP-ER_PI-Maximum, and 4) Array_area. Using these four features, we trained a classifier on the control set and applied this classifier to categorize the 8 active compounds at their maximum dose. For a given plate (each with a set of controls), a support vector machine classifier (Cortes and Vapnik, 1995) was trained. RAL, BPA and BPB were classified into either the 4HT-high or -med class, while DES and 17α-E2 were classified into the E2-high or -med class (Table II). The classifier additionally outputs a probability that its decision is correct (Wu et al., 2004), providing a confidence level for each compound classification. Using 14-fold cross-validation on the 14-plate control dataset, we assessed the performance of this framework (Table III).
While there was excellent discrimination between E2 and 4HT compound classes, there was some confusion between medium and high doses, indicating that saturating and supersaturating doses produced the greatest similarity in cellular responses. Importantly, the four significant features were selected in each round of cross-validation, indicating that our feature selection method is robust to plate variability.
In order to show that the features listed in Supplementary Table I can define meaningful and unique fingerprints of ligand responses, we performed principal component analysis and hierarchical clustering on maximum-dose compound responses from a single plate (Fig. 5). After defining various treatment response datasets using sampling without replacement, we produced an ensemble of cluster trees that were used in the optimization and evaluation of our clustering approach. We defined a simple measure of stability, m, for a reference tree (generated using the complete set of wells to define treatments) as the ratio of the number of ensemble trees that have the same linkage as the reference to the number of ensemble trees. Trees with higher m are more stable than trees with lower m. We found that selecting PCA features that capture the top 80% of variance, coupled with Ward’s method (compared to centroid, complete, median, and single linking approaches), produced the reference tree (Fig. 5, green lines) having the highest robustness (m=0.46). We also used the ensemble of trees to determine the conditional probability of pairings of different compound groups (Fig. 5, blue text). Hierarchical clustering robustly grouped the treatments into two major response families: one containing the known antagonist (4HT) and the other containing the known agonist (E2). BPA fell into the grouping with 4HT while exhibiting a signature distinct from RAL and/or 4HT. BPB typically clustered with RAL and 4HT, but also grouped notably with BPA (not shown). DES and 17α-E2 clustered together within the agonist response group.
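The stability measure m can be illustrated with a short sketch. It assumes that "same linkage" means two trees form exactly the same set of leaf groupings, regardless of merge order; that interpretation, the function names and the example trees are hypothetical and are not taken from the analysis code used in this study.

```python
def tree_signature(merges):
    """Represent a cluster tree by the set of leaf groups formed at its merges."""
    return frozenset(frozenset(group) for group in merges)


def stability(reference, ensemble):
    """m = (ensemble trees with the same groupings as the reference) / (ensemble size)."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one tree")
    ref = tree_signature(reference)
    matches = sum(1 for tree in ensemble if tree_signature(tree) == ref)
    return matches / len(ensemble)


# Hypothetical trees, each listed as the groups of treatments joined at every merge.
reference_tree = [("E2", "DES"), ("4HT", "RAL"), ("E2", "DES", "4HT", "RAL")]
resampled_same = [("RAL", "4HT"), ("DES", "E2"), ("4HT", "RAL", "E2", "DES")]
resampled_diff = [("E2", "RAL"), ("4HT", "DES"), ("E2", "DES", "4HT", "RAL")]

m = stability(reference_tree, [resampled_same, resampled_diff, resampled_same, resampled_diff])
print(m)
```

Under this definition, m ranges from 0 (no resampled tree reproduces the reference) to 1 (every resampled tree does).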
The overall goal of creating a single cell-based model to investigate ER function at a systems level has led to the development of the current in vitro screening system that generates mechanism-rich data using a platform compatible with large-scale screening applications. In this study, we describe the generation and characterization of the ER:PRL-HeLa cell line and the development of associated high content imaging and analysis capabilities. Optimal conditions for comparison of ligand responses were determined to be 30 min post-treatment based on agonist (E2) and antagonist (4HT) control responses. This time point is consistent with maximal transcriptional activity in response to E2 as determined in a previous, less automated study (Berno et al., 2008). From a semi-blinded ER- and EDC-specific compound panel, we identified 8 compounds that were able to induce rapid ER binding to the PRL-HeLa array (within 30 min): E2, 17αE2, 4HT, RAL, DES, BPB and BPA (the latter tested both as a known and as a blinded sample). We determined their EC50 for inducing binding of ER to the PRL-array and characterized their effects on ER nuclear localization, ER-induced chromatin remodeling, ER promoter loading and recruitment of Pol II to the promoter array. Further, we used approximately 100 image-based features in both a robust SDA-based classification schema and a PCA-based hierarchical clustering approach to group similar responses, creating a classification framework for high content screening. Using compound classification based on HCA we were able to clearly distinguish agonist (E2) from antagonist (4HT) responses. The classifier assigned 4HT and RAL to the 4HT-like antagonist group with high confidence and these compounds also robustly clustered together. E2, 17α-E2 and DES were classified as having E2-like responses and these compounds consistently clustered together and separate from RAL and 4HT.
The classifier assigned both BPA and BPB to the antagonist class with high confidence; however, hierarchical clustering carried out on the PCA-derived data was able to clearly distinguish the BPA and BPB responses from 4HT and RAL. Data presented in this study is consistent with the previously proposed hypothesis that xenoestrogens can affect ER function via multiple mechanisms (Safe et al., 2001) and that BPA can behave antagonistically in certain cell systems (Gould et al., 1998; Yoon et al., 2000). Lack of induction of ER promoter occupancy in response to either vinclozolin or flavone is consistent with previous studies indicating vinclozolin does not activate ERα-dependent gene transcription in some systems (Sonneveld et al., 2005) (Kojima et al., 2004) and that the estrogenic/anti-estrogenic effect of flavone can be attributed to indirect mechanisms (Collins-Burow et al., 2000) (Frigo et al., 2002).
A screening system for ER ligands and EDCs that offers important and early insights into potential mechanisms of action has obvious potential for scientific and logistical/economic benefits. This is particularly critical in the case of ER because it is a common EDC target and also an important druggable target for the treatment of breast cancer and conditions prevalent in post-menopausal women, including osteoporosis (Tice, 1978; Dix and Jordan, 1979; Delmas et al., 1997). Significant advances in ER ligand-screening technologies would therefore have potentially far-reaching health benefits. While we (Szafran et al., 2008; Szafran et al., 2009; Hartig et al., 2010) and others (Perlman et al., 2004; Young et al., 2008; reviewed in Feng et al., 2009) have applied HCA to compound testing of NR and other biologies, our unique exploitation of the ER-dependent integrated promoter array contributes considerable new functional data to this field. Utilization of a large panel of antibodies to nuclear receptor coregulators and other transcription-associated factors is currently in progress to improve the mechanistic profiling of ER functions at the single cell level.
Chemicals used in assay development/testing were obtained from the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) via G. Bittner (Plastipure, Austin, TX). These chemicals were received in solid form and 1μM stock solutions were prepared in 100% ethanol. Other chemicals were obtained from Sigma-Aldrich (St. Louis, MO) unless stated otherwise.
The ER:PRL-HeLa cell line was generated by stable introduction of GFP-tagged ERα into the previously described PRL-HeLa cell line (Sharp et al., 2006). Briefly, HeLa cells were cotransfected with p52X-PRL-DS-Red2-SKL and pTKHygro (Clontech), and selection was carried out in 400μg/ml hygromycin. Transient cotransfection with GFP-ER and GFP-Pit-1 resulted in the presence of GFP signal in the nucleus with either 1 or 2 bright intranuclear foci of fluorescence that indicated the integration of the p52X-PRL-dsRED2-SKL plasmid (confirmed by DNA FISH; Sharp et al., 2006). The clone HeLa/52X-DM66-Red2-PTS-23(#19) was used to generate the ER:PRL-HeLa cells. In contrast to the previously described HeLa/52X-DM66-Red2-PTS-23 (Sharp et al., 2006), clone #19 exhibits a single focus in the presence of GFP-ER and has lower basal activity in the absence of hormone (Amazit et al., 2007). GFP-hERα was amplified by PCR from the hERα-C1-EGFP plasmid (Berno et al., 2008) using the following forward and reverse primers: 3′-C ACC ATG GTG AGC AAG GGC-5′ and 3′-CTA GAC TGT GGC AGG GAA ACC CTC-5′. The resulting PCR product was cloned into pLenti6/V5 using the directional TOPO cloning system (Invitrogen). The full insert was sequenced in pLenti6/V5 and shown to encode EGFP in frame with full-length hERα. Virus production was performed according to the manufacturer’s guidelines (Invitrogen) using 293FT cells. Transduced clones were selected using 0.8μg/ml blasticidin and single cell cloned using flow-assisted cell sorting based on GFP fluorescence. As initial GFP-ER-positive clones were growth-limited, single cell clones were expanded in phenol red-free DMEM with 10% FBS, 0.8μg/ml blasticidin and 1nM 4HT. ER:PRL-HeLa cells were subsequently maintained in phenol red-free DMEM with L-glutamine and Na+ pyruvate, supplemented with 10% FBS (Gemini Bio-Products), 200μg/ml hygromycin, 0.8μg/ml blasticidin, and 1nM 4-hydroxytamoxifen (Sigma).
MCF-7 breast cancer cells were routinely maintained in IMEM supplemented with 10% FBS and penicillin/streptomycin. ER:PRL-HeLa cells were co-cultured with MCF-7 cells on poly-D-lysine coated glass coverslips in phenol red-free DMEM supplemented with 5% charcoal-stripped and dialyzed FBS for 24 hours before fixation and immunolabeling with antibody against ERα (rabbit monoclonal ER60C, Millipore 04–820). Cells were plated directly onto poly-D-lysine coated glass coverslips for high-resolution deconvolution imaging. Multiwell plate preparation is described below.
ER:PRL-HeLa cells (passage 6–12) grown in T75 cell culture flasks were trypsinized, spun down and resuspended in phenol red-free DMEM with L-glutamine and Na+ pyruvate supplemented with 5% charcoal stripped, dialyzed (SD)-FBS (Experiment Media) to a density of 1.2×10^5 cells per ml. A TiterTek Multidrop 384 fluid dispensing unit was used to dispense ~3600 cells per well into 384-well plates (384 IQ-EB black/clear, Aurora Biotechnologies). Cells were grown in the absence of 4HT for 48 hours prior to compound treatments. Serial dilution of compounds and addition to the multi-well plates was performed using a Beckman Biomek NX robotic platform. Antibody labeling was performed as described previously (Stenoien et al., 2000) using 4% formaldehyde fixation (20 minutes, room temperature) and indirect labeling with anti-mouse/rabbit Alexa-546 or anti-mouse/rabbit Alexa-647 conjugated secondary antibodies (Molecular Probes). Standard protocol details can be found in section 4.4. All liquid handling steps were carried out using a Beckman Biomek NX robotic platform.
Cells were washed once in PBS (with Ca2+, Mg2+) and fixed for 20 min at room temperature in 4% formaldehyde prepared in PEM buffer (80mM potassium PIPES, pH 6.8, 5mM EGTA, 2mM MgCl2). This was followed by a 10 min quench in 0.1M ammonium chloride and a 30 min permeabilization using 0.5% Triton-X. Blocking was performed in Blotto (5% milk prepared in 1X TBS-Tween-20) for 15 min and the cells were incubated overnight at 4°C in primary antibody (mouse anti-RNA polymerase II, AbCam ab5408) diluted in Blotto. The following day the cells were washed three times in Blotto for 10 min each and incubated with secondary antibody for 1 hour at room temperature. The cells were then washed an additional 3 times in PEM and incubated for 10 min in a solution of 1μg/ml DAPI and 1μg/ml of CellMask-FarRED (Molecular Probes) in PEM. In multiwell plates this solution was replaced with PBS + 0.02% sodium azide for imaging. Coverslips were mounted in SlowFade Gold (Molecular Probes).
Automated imaging was carried out using either the Cell Lab IC-100 Image Cytometer (IC100; Beckman Coulter) or the DeltaVision core system (Applied Precision) with automated stage (DV Live). The IC-100 system consisted of a Nikon Eclipse TE2000-U Inverted Microscope (Nikon; Melville, NY) with Chroma 82000 triple band filter set (Chroma; Brattleboro, VT), a Hamamatsu ORCA-ER Digital CCD camera (Hamamatsu; Bridgewater, NJ) and a Photonics COHU Progressive scan focusing camera (Photonics; Oxford, MA). This was equipped with a Nikon S Fluor 40×/0.90NA objective and the imaging camera was set to capture 8 bit images at 1×1 binning (1344×1024 pixels; 6.5 μm pixel size). The DV Live system consisted of an Olympus IX71 microscope with a 250W xenon light, Photometrics Coolsnap HQ2 camera, standard filter set with multiband dichroic beamsplitter and individual excitation and emission filters (DAPI ex 360/40nm, em 457/40nm; FITC ex 490/20nm, em 526/38nm; TRITC ex 555/28nm, em 617/63nm). This was equipped with a 40X Plan Apo/0.95 air gap objective with correction collar. In either case we collected 8–12 fields per well in 4 channels: blue, green, red and far red. For GFP-ER and Pol II a stack of 6 focal planes was collected at 1 μm intervals. High-resolution fluorescence deconvolution microscopy was performed with a DeltaVision Restoration Microscopy System (Applied Precision Inc.). Cells were imaged using a 60X objective lens (1.42 NA). A Z-series of focal planes (~30 at 0.2 μm) was digitally imaged and deconvolved with the DeltaVision constrained iterative algorithm to generate high-resolution images.
Unless stated otherwise images were analyzed using Pipeline Pilot Version 7.5 (Accelrys) analysis software. Maximum intensity projections were created for Ch01 (GFP) and Ch02 (antibody) and all images were corrected to remove background. Nuclei were identified using Ch00 (DAPI) images to create masks by applying adaptive thresholding followed by marker-based watershed. Total cell area was determined using the nucleus mask regions as markers to apply a watershed on Ch03 (CellMask). Cell cytoplasm was determined by subtracting the nuclear masked region from the whole cell mask.
In order to accurately segment the PRL-array, a linear filter and a Top Hat operator were applied to the Ch01 image to enhance only the dim arrays and all arrays, respectively. Images exhibiting only dim arrays or only bright arrays were used to train 3 k-means classifiers. The first classifier (DimFilt) was trained on the linear filtered Ch01 images of only dim arrays (E2-treated). The second and third classifiers (DimTH and BrightTH) were trained on the Top Hat processed images of dim (E2-treated) and bright (4HT-treated) arrays, respectively. Once training is completed, all three classifiers are always applied. Small regions of less than 8 pixels in area are then removed. The DimFilt classifier is sensitive and accurately estimates the area of an array but is prone to false positives. The DimTH classifier has a low false positive rate, but can underestimate the area of the array. Therefore, these two classifiers are combined using Morphological Reconstruction, whereby arrays detected by DimTH are used as markers for reconstruction of arrays detected by DimFilt. In this way, the area of detected arrays is accurately determined, while the false positive arrays are removed. BrightTH accurately detects bright arrays but misses most of the dim arrays; however, the halo of out-of-focus light often found around bright arrays is frequently picked up by the DimFilt classifier. Therefore, a final step is applied to remove this halo using Morphological Reconstruction of dim arrays using bright arrays as markers and removing those reconstructed dim arrays from the final array segmentation mask.
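The way the three classifier outputs are combined can be sketched on small binary masks. This is an illustrative reimplementation in plain Python, not the Pipeline Pilot routine used in the study: the function names are hypothetical, 8-connectivity is assumed, the removal of regions smaller than 8 pixels is omitted, and adding the BrightTH pixels back into the final mask is an assumption.

```python
from collections import deque


def reconstruct(mask, markers):
    """Binary morphological reconstruction: keep the connected regions of
    `mask` that contain at least one `markers` pixel (8-connectivity assumed)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and markers[r][c]:
                out[r][c] = 1
                queue.append((r, c))
    while queue:  # flood fill outwards from the marker pixels, staying inside the mask
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not out[nr][nc]:
                    out[nr][nc] = 1
                    queue.append((nr, nc))
    return out


def segment_arrays(dim_filt, dim_th, bright_th):
    """Combine DimFilt, DimTH and BrightTH masks as described in the text."""
    dim = reconstruct(dim_filt, dim_th)   # DimFilt area, restricted to DimTH-confirmed arrays
    halo = reconstruct(dim, bright_th)    # dim regions that surround a bright array
    rows, cols = len(dim), len(dim[0])
    # Assumption: bright arrays themselves are retained in the final mask.
    return [[1 if (dim[r][c] and not halo[r][c]) or bright_th[r][c] else 0
             for c in range(cols)] for r in range(rows)]


# Hypothetical masks: a true dim array (left), a DimFilt false positive (middle)
# and a bright array with a surrounding halo (right).
dim_filt = [[1, 1, 0, 0, 1, 0, 1, 1, 1],
            [1, 1, 0, 0, 0, 0, 1, 1, 1]]
dim_th = [[1, 0, 0, 0, 0, 0, 0, 1, 0],
          [0, 0, 0, 0, 0, 0, 0, 0, 0]]
bright_th = [[0, 0, 0, 0, 0, 0, 0, 1, 0],
             [0, 0, 0, 0, 0, 0, 0, 1, 0]]
final_mask = segment_arrays(dim_filt, dim_th, bright_th)
```

In this toy example the false-positive region is dropped because no DimTH marker falls inside it, and the halo is removed while the bright array pixels remain.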
Cell populations were filtered to achieve a uniform population of cells without cell aggregates, mitotic cells, apoptotic cells, and cellular debris. Applied gates were based upon nuclear area, nuclear circularity and the ratio of nuclear to cell area (cell size ratio). Outlier filtering (99% acceptance based on a Gaussian distribution) was also performed based on the mean nuclear Ch01 (GFP) and Ch02 (antibody staining) signal per cell. Typically, 20–40 cells per field were kept for analysis after filtering. Cell-level features were then averaged across wells, producing well-level features that were used in subsequent high content analysis.
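The filtering and aggregation steps can be illustrated with a short Python sketch. The dictionary keys, gate limits, and function names below are hypothetical. The 99% Gaussian acceptance is implemented here as a two-sided z-score cutoff of 2.576 using the sample standard deviation, which is our reading of the description rather than a confirmed detail.

```python
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% acceptance region of a Gaussian


def gate(cells, limits):
    """Keep cells whose gated features all fall inside [low, high]."""
    return [
        cell for cell in cells
        if all(low <= cell[key] <= high for key, (low, high) in limits.items())
    ]


def drop_outliers(cells, key, z=Z_99):
    """Remove cells whose `key` value lies more than z SDs from the mean."""
    values = [cell[key] for cell in cells]
    if len(values) < 2:
        return list(cells)
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return list(cells)
    return [cell for cell in cells if abs(cell[key] - mu) <= z * sd]


def well_means(cells, features):
    """Average cell-level features within each well."""
    wells = {}
    for cell in cells:
        wells.setdefault(cell["well"], []).append(cell)
    return {
        well: {f: mean(cell[f] for cell in group) for f in features}
        for well, group in wells.items()
    }
```

A typical call order would be `gate`, then `drop_outliers` once per intensity channel, then `well_means` on whatever remains.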
Where indicated, the mean fluorescence intensity per nucleus was obtained using CyteSeer automated cell image analysis software (Vala Sciences, San Diego), which provides algorithms for automated analysis of protein expression.
Assay quality was established using the Z′ factor, a dimensionless measure determined using the following equation:
$$Z' = 1 - \frac{3(\sigma_{p} + \sigma_{n})}{|\mu_{p} - \mu_{n}|}$$
where σ and μ represent the standard deviation and the mean, respectively, of the positive (p) and negative (n) control populations (Zhang et al., 1999). One-way ANOVA followed by post-hoc Dunnett's comparison to the positive control (E2) was used to ascertain significant differences between compound responses. EC50 calculations were carried out in GraphPad Prism using the variable slope model, where Y=Bottom + (Top-Bottom)/(1+10^((LogEC50-X)*HillSlope)). The bottom and top values were constrained to >0 and <1, respectively.
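As a worked illustration, the two quantities above can be computed as follows. The Z′ factor follows the definition of Zhang et al. (1999), Z′ = 1 − 3(σp + σn)/|μp − μn|, and the dose-response function is the variable slope model quoted in the text. The function names are ours, and the use of the sample (n − 1) standard deviation is an assumption. Curve fitting itself was done in GraphPad Prism and is not reproduced here.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z' factor from positive- and negative-control well values."""
    spread = 3.0 * (stdev(positive) + stdev(negative))
    window = abs(mean(positive) - mean(negative))
    return 1.0 - spread / window


def variable_slope(x, bottom, top, log_ec50, hill_slope):
    """Four-parameter logistic response at log10 concentration x."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - x) * hill_slope))
```

At x equal to LogEC50 the model returns the midpoint between Bottom and Top, which is a quick sanity check on any fitted parameter set.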
A glossary of mathematical terms used in this section is provided in section 4.8.4.
We used N-fold cross-validation¹ on the control data to identify a set of features useful for distinguishing between low-, medium- and high-dose E2 and 4HT treatments. Each plate in the control set was assigned to a fold. In cross-validation, a classifier is trained on (N − 1) plates (training set) and evaluated on the remaining plate (testing set). This is repeated until each of the N plates has been tested once. For each round of cross-validation, we scaled the features by subtracting the mean of the training set features and dividing by the standard deviation of the training set features, then normalized each sample by dividing it by its L2-norm. We then performed stepwise discriminant analysis² (SDA) on the training set to remove less informative features (Jennrich, 1977). From the N sets of SDA-selected features (one per cross-validation run), we retained the features that appeared in a majority of the runs. This was implemented in Python 2.6 using a port of the SLIC toolbox (http://pslid.cbi.cmu.edu/release/).
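The bookkeeping around this procedure (plate-wise folds, scaling with training-set statistics only, L2 normalization, and majority voting over the per-fold feature sets) can be sketched as below. This is a minimal illustration in current Python, not the SLIC port used in the study. All function names are hypothetical, and the SDA step itself is not implemented: the sketch assumes each fold has already produced a list of selected feature names.

```python
import math
from collections import Counter
from statistics import mean, stdev


def leave_one_plate_out(plates):
    """Yield (held_out_plate, training_rows, testing_rows) for each plate."""
    for held_out in sorted(plates):
        train = [row for p in sorted(plates) if p != held_out for row in plates[p]]
        yield held_out, train, plates[held_out]


def fit_scaler(train_rows):
    """Per-feature mean and SD, computed on the training set only."""
    columns = list(zip(*train_rows))
    # A constant feature has SD 0; fall back to 1.0 to avoid division by zero.
    return [mean(c) for c in columns], [stdev(c) or 1.0 for c in columns]


def transform(rows, mu, sd):
    """Z-score each feature, then divide each sample by its L2-norm."""
    out = []
    for row in rows:
        z = [(v - m) / s for v, m, s in zip(row, mu, sd)]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0
        out.append([v / norm for v in z])
    return out


def majority_features(selected_per_fold):
    """Features selected in more than half of the cross-validation runs."""
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    n_folds = len(selected_per_fold)
    return sorted(f for f, k in counts.items() if k > n_folds / 2)
```

Fitting the scaler on the training plates and reusing it for the held-out plate keeps information from the test plate out of the feature selection.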
A radial basis function-kernel³ support vector machine (SVM) classifier⁴ was trained on control data using the selected features (Cortes and Vapnik, 1995). Its parameters, C (slack penalty) and g (kernel parameter), were tuned with a grid search. Once trained, the classifier was applied to the testing data, yielding for each sample the probability of belonging to each of the classes the classifier was trained to recognize (these probabilities are determined by the distance between the sample and the classifier’s decision boundaries) (Wu et al., 2004). After all samples were tested, these probabilities were used to assess classification accuracy. This was implemented in Python 2.6 using the LIBSVM 2.9 toolbox (http://www.csie.ntu.edu.tw/cjlin/libsvm/).
Features were scaled and samples normalized using the method described above. Principal component analysis⁵ (PCA) was applied to these features, and the number of components retained was determined by the percentage of dataset variance they captured. Data were clustered using the Euclidean distance⁶ with various linkage algorithms (centroid, complete, median, single, and Ward’s methods). A resampling approach was employed to find the retained variance and linkage algorithm that produced the best clustering. Since each treatment was run in quadruplicate, we randomly sampled (without replacement) three wells per treatment. Each treatment in this subset was represented by the median of its triplicate wells, and we then performed feature standardization and normalization, PCA, and clustering. This was repeated 2000 times to produce an ensemble of trees. From this ensemble we computed conditional probability tables that describe the probability of two groups of compounds (including single compounds) clustering together given all other existing groups.
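The resampling loop can be sketched as follows. This is a simplified illustration only. The `cluster_fn` argument is a hypothetical stand-in for the full standardization, PCA, and hierarchical clustering step and must return a cluster label per treatment. The sketch also reports plain pairwise co-clustering frequencies, which are simpler than the conditional probability tables computed in the study.

```python
import random
from collections import Counter
from statistics import median


def resample_profiles(wells_by_treatment, rng, k=3):
    """Draw k wells per treatment without replacement; take feature medians."""
    profiles = {}
    for treatment, wells in wells_by_treatment.items():
        chosen = rng.sample(wells, k)
        profiles[treatment] = [median(column) for column in zip(*chosen)]
    return profiles


def co_clustering_frequency(wells_by_treatment, cluster_fn, n_iter=2000, seed=0):
    """Fraction of resampling runs in which each treatment pair co-clusters."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        profiles = resample_profiles(wells_by_treatment, rng)
        labels = cluster_fn(profiles)  # hypothetical: treatment -> cluster id
        names = sorted(labels)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if labels[a] == labels[b]:
                    counts[(a, b)] += 1
    # Pairs that never co-cluster are absent from the result.
    return {pair: n / n_iter for pair, n in counts.items()}
```

Pairs with a frequency near 1 cluster together regardless of which three replicate wells are drawn, which is the kind of stability the resampling is meant to expose.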
¹ N-fold cross-validation. A technique used to evaluate classifier performance and tune classifier parameters. Validation data are split into N folds. (N − 1) of these folds define a training set, while the Nth fold is used for testing. The process is run a total of N times so that each fold is used once for testing.
² Stepwise discriminant analysis. An iterative approach using feature removal and replacement to select the features that are most informative in discriminating between given classes. For more, see (Jennrich, 1977; Huang et al., 2003).
³ Radial basis function kernel. A transformation that projects features from a non-linear to a linear space so that they are suitable for support vector machine classification.
⁴ Support vector machine classifier. A supervised learning approach used to define a decision boundary between classes of data. For two classes of data, with samples represented by N features, an (N − 1)-dimensional hyperplane is defined such that it maximizes the margin (distance) between the plane and the nearest samples from both classes. For M classes of data, an ensemble of M*(M−1)/2 pair-wise classifiers can be produced, and a voting method can be applied across this ensemble to produce a classification label for a test sample. One property of SVM classification is that it can handle noisy data by allowing misclassifications during training. The parameter controlling this is the slack penalty. Another property of SVMs is that they define linear hyperplanes. To deal with non-linear data, features can be projected into different spaces (to linearize them) using various transformations, one of which is a radial basis function (Cortes and Vapnik, 1995).
⁵ Principal component analysis. A feature reduction method in which features are projected onto orthogonal components. The first component captures the highest variance, and each subsequent component captures less. Many of the lower ranked components can be discarded under the presumption that they contain little useful information.
⁶ Euclidean distance. The geometric distance between two samples. For two samples, S1 and S2, with N features, this is
$$d(S_1, S_2) = \sqrt{\sum_{i=1}^{N} \left(S_{1,i} - S_{2,i}\right)^{2}}$$
where S_{1,i} and S_{2,i} denote the i-th feature of each sample.
ER:PRL-HeLa cells were lysed in Cell Extraction Buffer (Biosource, Invitrogen) + complete protease inhibitor cocktail (Roche) for 10 min, and the debris was cleared by centrifugation at 13,400 × g for 15 min at 4°C. The samples were resolved by SDS-PAGE and transferred to nitrocellulose membranes (Bio-Rad). Primary antibodies (ER, Millipore 04–820 and actin, Affinity MA1-744) were diluted in TBS-T buffer (5% non-fat dry milk, 50 mM Tris-HCl, 150 mM NaCl [pH 7.5], 0.1% Tween 20) and added to the membranes overnight at 4°C, followed by incubation with the appropriate horseradish peroxidase-conjugated secondary antibody for 1 hour at room temperature. All proteins were detected with ECL Plus Detection Reagents (Amersham) and visualized by chemiluminescence.
For live cell imaging, ER:PRL-HeLa cells were plated onto 35-mm Delta T dishes (Bioptechs) pre-coated with poly-D-lysine. Imaging was performed with a Zeiss LSM 510 confocal microscope using a 63x objective (NA=1.4). The existing growth media was replaced with HEPES-buffered media previously gassed in a 5% CO2 incubator. The Delta T dishes were secured to a stage adapter for temperature control at 37°C (±0.1°C). A Bioptechs objective-heating collar was also used (also at 37°C). Hormone was applied to the cells and the Delta T dish was covered with a black plastic lid to minimize evaporation.
Excellent technical support was provided by MG Mancini, and generous image analysis and data workflow support was provided by TJ Moran (Accelrys, Inc). This work was funded by NIH 5R01DK055622 (MAM), The Susan Komen Foundation KG091198 (FJA), Department of Defense (MAM), NIH K12-DK0830140-02, DJL P.I. (JYN), Keck Foundation (EDJ) and pilot grant and equipment support from the John S. Dunn Gulf Coast Consortium for Chemical Genomics (MAM). The authors' imaging resources were supported by SCCPR U54 HD-007495 (BW O’Malley), P30 DK-56338 (MK Estes), P30 CA-125123 (CK Osborne), and the Dan L. Duncan Cancer Center of Baylor College of Medicine.