A previous clinical trial showed that selenium supplementation significantly reduced the incidence of prostate cancer. We report here a bioinformatics approach to gain new insights into selenium molecular targets that might be relevant to prostate cancer chemoprevention.
We first performed data mining analysis to identify genes which are consistently dysregulated in prostate cancer using published datasets from gene expression profiling of clinical prostate specimens. We then devised a method to systematically analyze three selenium microarray datasets from the LNCaP human prostate cancer cells, and to match the analysis to the cohort of genes implicated in prostate carcinogenesis. Moreover, we compared the selenium datasets with two datasets obtained from expression profiling of androgen-stimulated LNCaP cells.
We found that selenium reverses the expression of genes implicated in prostate carcinogenesis. In addition, we found that selenium could counteract the effect of androgen on the expression of a subset of androgen-regulated genes.
The above information provides us with a wealth of new clues to investigate the mechanism of selenium chemoprevention of prostate cancer. Furthermore, these selenium target genes could also serve as biomarkers in future clinical trials to gauge the efficacy of selenium intervention.
Supplementation with a nutritional dose of selenium was found to reduce prostate cancer incidence by 50% in a randomized, placebo-controlled cancer prevention trial (1–3). Prostate cancer was actually a secondary endpoint in this study, which was designed originally to evaluate the effect of selenium on non-melanoma skin cancer. We reported previously that a selenium metabolite, in the form of methylseleninic acid or MSA, suppressed the growth of both the androgen-responsive LNCaP and the androgen-refractory PC-3 human prostate cancer cells (4,5). Growth inhibition by MSA was time- and dose-dependent, with an IC50 of ~10 μM at 48 hours of treatment. In order to identify the molecular alterations that might be responsible for the growth inhibitory effect of selenium, we profiled gene expression changes in PC-3 cells using the Affymetrix 12K-gene oligonucleotide chip (4,5). Several working hypotheses have been generated from this dataset regarding the mechanisms of selenium action (4,5). In the present study, we completed a similar selenium array analysis in the androgen-responsive LNCaP cells using a 3K custom cDNA array. The smaller array is expected to improve the sensitivity of the assay, although this advantage is offset by the reduced size of the dataset. Recently, Zhao et al. also performed microarray analysis in MSA-treated LNCaP cells using a high-density cDNA array (6). Our goal was to make use of these three selenium datasets and develop a global data mining strategy to earmark putative prostate cancer genes which are sensitive to selenium intervention.
Our approach was to compare the selenium datasets to three recently published prostate cancer microarray datasets generated from human tumor specimens. The first was an Affymetrix oligonucleotide array study of 50 normal prostates and 52 prostate cancers reported by Singh et al. (7). The second, described by Welsh et al. (8), was similar to the first with the exception that fewer samples were examined (9 normal prostates and 25 prostate cancers). The third was an analysis of 41 normal prostates and 62 prostate cancers by Lapointe et al. (9) using a 26K-gene cDNA microarray. These three prostate cancer datasets offer a rich source of information on dysregulated genes implicated in prostate carcinogenesis.
Androgen receptor (AR) signaling is known to play an important role in promoting prostate cancer progression (10). Consequently, disruption of AR signaling is an effective means of prostate cancer management. We recently reported that selenium is capable of decreasing the expression and transactivating activity of AR (4). This novel finding justifies applying microarray analysis to investigate whether the expression of AR-regulated genes might be counteracted by selenium. Recent publications have made this query possible. In separate studies by DePrimo et al. (11) and Nelson et al. (12), LNCaP cells were treated with a synthetic androgen and microarray analyses were then performed to identify genes responsive to androgen stimulation. These two androgen datasets are well suited to serve as a tool to mine the selenium datasets for additional clues. Collectively, the timely publication of a number of prostate cancer and androgen microarrays in the past two years provides an opportunity and sets the stage for the present effort to advance our understanding of selenium chemoprevention of prostate cancer.
The culture conditions of LNCaP cells have been described in detail previously (4). After exposure to 10 μM MSA for 3, 6, 12, 24, 36, or 48 h, total RNA and protein were isolated using TRIzol (Invitrogen, Carlsbad, CA, USA). The RNA collected from three independent experiments was pooled and subjected to microarray analysis using a 3K human cDNA microarray printed at the Microarray and Genomics Core Facility at Roswell Park Cancer Institute. This custom cDNA array was constructed based on the genes which were found to be modulated by selenium in PC-3 cells in our previous study (5). Each gene on this array was spotted in triplicate. Probe generation and array hybridization were conducted according to a protocol developed by the Core Facility (http://microarrays.roswellpark.org/Protocols). The hybridization signals were captured using an Affymetrix 428 array scanner (Affymetrix, Santa Clara, CA, USA), and analyzed using the ImaGene software (BioDiscovery, Inc., Marina del Rey, CA, USA). Poor quality spots, along with spots with signal levels indistinguishable from background, were discarded. The extracted image data were then processed by a series of steps including background subtraction, data normalization, ratio calculation, and statistical analysis of replicate spots. Data processing was done with ImaGene (BioDiscovery, Inc.), the GeneTraffic software (Iobion Informatics LLC, La Jolla, CA, USA), the statistical package R, and in-house PERL programs. In order to control for the bias introduced by the fluorescent dyes, Cy3 and Cy5, each array experiment was performed twice with the labeling dyes reversed, and the signal ratios from these two experiments were averaged. A log2-transformed treatment-to-control signal ratio of ≥1 or ≤−1 was chosen as the criterion for induction or repression, respectively. These threshold values are commonly used in the literature for microarray expression analysis (13,14).
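The dye-swap averaging and the 2-fold decision threshold described above can be illustrated with a minimal Python sketch. It is not the in-house PERL code used in this study. The function names are hypothetical, and two assumptions are made that the text does not state: that the dye-reversed ratio has already been re-expressed as treatment/control, and that the two ratios are averaged in log2 space.

```python
import math


def averaged_log2_ratio(forward_ratio, swapped_ratio):
    """Average the log2(treatment/control) ratios of a dye-swap pair.

    Assumption: both inputs are treatment/control ratios (the dye-reversed
    measurement has already been inverted), and averaging is done in log2
    space. The source only says that the two ratios were averaged.
    """
    return (math.log2(forward_ratio) + math.log2(swapped_ratio)) / 2.0


def call_change(log2_ratio, threshold=1.0):
    """Apply the >=1 / <=-1 log2 criterion for induction or repression."""
    if log2_ratio >= threshold:
        return "induced"
    if log2_ratio <= -threshold:
        return "repressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical spot: 4-fold in one labeling, 1-fold in the dye swap.
    ratio = averaged_log2_ratio(4.0, 1.0)
    print(ratio, call_change(ratio))
```

Averaging in log space treats a 2-fold increase and a 2-fold decrease symmetrically, which is why it is assumed here.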
Hierarchical clustering analysis was performed using the Hierarchical Clustering Explorer software from the University of Maryland, USA.
The datasets from the six published gene expression profiling studies (6–9,11,12) were downloaded from the authors’ respective websites. Our own selenium PC-3 dataset (5) and selenium LNCaP dataset are available at the Roswell Park website (http://falcon.roswellpark.org/publication/CIp/dataMining). Because the eight microarray datasets originated from different sources, different identifiers, including cDNA clone IDs, probe set IDs, and GenBank accession numbers, were used to label the genes. In order to facilitate data comparison, these identifiers were mapped to the UniGene database (Build 136) at the National Center for Biotechnology Information (NCBI). The UniGene Cluster IDs were used to cross-reference genes in different datasets.
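The cross-referencing step can be pictured with the following Python sketch. It is an illustration only: the identifier-to-cluster table, the example IDs, and the function names are hypothetical, and averaging several probes that map to the same UniGene cluster is an assumption that the text does not specify.

```python
def to_unigene(dataset, id_to_cluster):
    """Re-key a dataset from platform identifiers to UniGene cluster IDs.

    dataset: {platform_id: log2_ratio}
    id_to_cluster: {platform_id: unigene_cluster_id} (hypothetical lookup)
    Identifiers without a cluster are dropped; multiple identifiers in the
    same cluster are averaged (an assumption, not stated in the source).
    """
    grouped = {}
    for platform_id, value in dataset.items():
        cluster = id_to_cluster.get(platform_id)
        if cluster is None:
            continue
        grouped.setdefault(cluster, []).append(value)
    return {cluster: sum(vals) / len(vals) for cluster, vals in grouped.items()}


def shared_clusters(*datasets):
    """Return the cluster IDs present in every re-keyed dataset."""
    return set.intersection(*(set(d) for d in datasets))


if __name__ == "__main__":
    # All identifiers below are made up for illustration.
    lookup = {"probe_1": "Hs.0001", "clone_9": "Hs.0001", "acc_7": "Hs.0002"}
    array_a = to_unigene({"probe_1": 1.0, "acc_7": -2.0}, lookup)
    array_b = to_unigene({"clone_9": 1.5, "unmapped": 0.3}, lookup)
    print(shared_clusters(array_a, array_b))
```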
For the three prostate cancer datasets (7–9), only samples classified as primary prostate cancer or normal prostate were included in the analysis; all other sample types were excluded from the original datasets. In order to identify genes that are differentially expressed between normal and cancer tissues, permutation t-test analysis was performed individually with each dataset. The t-statistic of a gene was calculated by the following formula:
$$t = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
where $\mu_i$ is the mean expression value of a given gene in the ith group, $\sigma_i^2$ is the variance of that gene, and $n_i$ is the sample size of the ith group. The permutation procedure was carried out on a gene-by-gene basis by randomly assigning each data point to either the normal or cancer group, while maintaining the total sample size of each group. This process was repeated 10,000 times and the p-value was defined as the fraction of t-statistics generated from randomization that was greater than or equal to the t-statistic generated from the actual data points. This method of analysis makes allowance for missing data points; however, a gene with fewer than 5 data points is generally not expected to have sufficient statistical power and was therefore excluded from the analysis. A list of dysregulated genes was compiled based on the following criteria: p-values less than 0.001, and consistent changes in at least two out of the three datasets. The false discovery rate (q) was calculated as follows: $q = p \times n / i$, where p is the p-value, n is the total number of genes, and i is the number of genes with a p-value less than p. The above analyses were performed with in-house PERL programs.
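The permutation procedure and the false discovery rate can be sketched in Python as follows. This is an illustration under stated assumptions, not the in-house PERL implementation: the t-statistic is taken in its unequal-variance form with group-specific sample variances, the comparison of permuted and observed statistics is done on absolute values (two-sided), and all function names are hypothetical.

```python
import math
import random


def t_statistic(group1, group2):
    """Two-sample t-statistic using each group's own sample variance (assumed form)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    denom = math.sqrt(v1 / n1 + v2 / n2)
    if denom == 0.0:
        # Degenerate case: no variation within either group.
        return 0.0 if m1 == m2 else math.copysign(math.inf, m1 - m2)
    return (m1 - m2) / denom


def permutation_p_value(normal, cancer, n_perm=10000, seed=0):
    """Fraction of label permutations whose |t| is >= the observed |t|.

    Group sizes are preserved in every permutation, as in the text.
    The two-sided comparison is an assumption.
    """
    rng = random.Random(seed)
    observed = abs(t_statistic(normal, cancer))
    pooled = list(normal) + list(cancer)
    n1 = len(normal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_statistic(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    return hits / n_perm


def false_discovery_rate(p_threshold, p_values):
    """q = p * n / i, with i the number of genes whose p-value is below p."""
    n = len(p_values)
    i = sum(1 for p in p_values if p < p_threshold)
    if i == 0:
        raise ValueError("no gene passes the threshold")
    return p_threshold * n / i
```

As a consistency check of the q formula, a threshold of p = 0.001 applied to an array of roughly 26,000 genes with 5,306 genes selected gives q of about 0.005.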
Our selenium LNCaP dataset and Zhao’s selenium LNCaP dataset (6) were generated by an essentially identical protocol. The merging of these two datasets or, for that matter, other compatible datasets, would greatly increase the power and precision of the analysis provided that certain key parameters are properly safeguarded. Since the above two array experiments were conducted at multiple time points, it is necessary to devise a method for categorizing the pattern of expression changes across all time points. The data were filtered first to admit only those changes (induction or repression) that were over the 2-fold threshold (i.e. log2-transformed ratio ≥1 or ≤−1). A decision call of induction or repression was made for each gene only if ≥70% of the filtered data points showed the same direction of change. A consolidated LNCaP dataset was generated by merging the two LNCaP datasets and discarding genes with conflicting decision calls. The two androgen datasets of DePrimo et al. (11) and Nelson et al. (12) were merged in a similar manner. The above analyses were performed with in-house PERL programs.
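The decision-call rule and the merging step can be sketched in Python as below. It is a minimal illustration rather than the in-house PERL programs. The function names are hypothetical, missing time points are represented as `None`, and keeping genes that receive a call in only one of the two datasets is an assumption, since the text only states that genes with conflicting calls were discarded.

```python
def decision_call(log2_ratios, threshold=1.0, min_agreement=0.7):
    """Summarize a gene's time course as 'induced', 'repressed', or None.

    Only time points beyond the 2-fold threshold are counted; a call is
    made when at least 70% of those points change in the same direction.
    """
    filtered = [r for r in log2_ratios if r is not None and abs(r) >= threshold]
    if not filtered:
        return None
    up_fraction = sum(1 for r in filtered if r > 0) / len(filtered)
    if up_fraction >= min_agreement:
        return "induced"
    if 1.0 - up_fraction >= min_agreement:
        return "repressed"
    return None


def merge_calls(calls_a, calls_b):
    """Merge two {gene: call} tables, dropping genes with conflicting calls.

    Assumption: a gene called in only one dataset is retained.
    """
    merged = {}
    for gene in set(calls_a) | set(calls_b):
        a, b = calls_a.get(gene), calls_b.get(gene)
        if a and b and a != b:
            continue  # conflicting decision calls
        call = a or b
        if call:
            merged[gene] = call
    return merged
```

For example, a time course of 1.2, 1.5, −1.1, 0.3, 2.0, 1.0 has five points beyond the threshold, four of which are positive (80%), so the gene would be called induced.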
Once a gene has been identified to be a target of selenium intervention, we assign it to a functional category for informational purposes. Functional annotation of transcripts was performed by using the Gene Ontology (GO) database and literature review. The UniGene cluster IDs of these genes were used to query the LocusLink database at NCBI (http://www.ncbi.nlm.nih.gov/LocusLink/) in order to extract the GO terms associated with these genes.
The three prostate cancer datasets (7–9) were chosen for our investigation because they represent the largest gene expression profiling studies comparing normal and cancerous prostate tissues. No statistical analysis, however, was performed in these three studies to identify putative prostate cancer genes. Since each of these datasets has independent measurements of gene expression in the normal and tumor groups, we undertook a systematic statistical evaluation of their results. Permutation t-test was carried out on each dataset, and genes with p-values <0.001 were selected as differentially expressed between the normal and cancer groups. Based on this criterion, 5,306 genes were selected from the Lapointe study (9), 672 from the Singh study (7), and 1,527 from the Welsh study (8). Our selection method has false discovery rates of 0.005, 0.019, and 0.008, respectively. For cross-validation, we reduced the number of genes to those with the same expression pattern in at least two out of three datasets. This procedure narrowed the list down to 1,067 genes with aberrant expression in prostate cancer. Among these, 497 or 46.6% are up-regulated, and 570 or 53.4% are down-regulated. The top 50 up- or down-regulated genes that appear in all three datasets, ranked by the average ratio, are listed in Tables IA and IB. The complete list can be accessed at our website.
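The cross-validation step, keeping only genes with the same direction of change in at least two of the three datasets, can be sketched in Python as follows. The function name and data layout are hypothetical, and the sketch assumes that a dissenting third dataset does not veto a gene supported by the other two, which the text does not spell out.

```python
from collections import Counter


def consensus_genes(dataset_calls, min_datasets=2):
    """Keep genes whose direction of change agrees in >= min_datasets datasets.

    dataset_calls: list of {gene: "up" or "down"} tables, one per dataset,
    each already restricted to genes passing the p-value cutoff.
    """
    votes = {}
    for calls in dataset_calls:
        for gene, direction in calls.items():
            votes.setdefault(gene, Counter())[direction] += 1
    consensus = {}
    for gene, counter in votes.items():
        direction, count = counter.most_common(1)[0]
        if count >= min_datasets:
            consensus[gene] = direction
    return consensus


if __name__ == "__main__":
    # Gene names are placeholders, not results from the study.
    tables = [
        {"G1": "up", "G2": "down"},
        {"G1": "up", "G2": "up", "G3": "down"},
        {"G2": "down", "G3": "down"},
    ]
    print(consensus_genes(tables))
```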
A hierarchical clustering algorithm was applied to group genes according to their expression pattern across six time points following treatment with MSA. The clustering analysis of 762 selenium-responsive genes is shown in Figure 1. The branch points in the dendrogram correspond to each gene, and the length of the branches reflects the degree of relatedness. Red and green squares represent up-regulation and down-regulation, respectively, relative to the control values. Black squares indicate no change, and gray squares signify data of insufficient quality. The genes identified and the raw array data are available at our website. Four distinct clusters emerge from this analysis. Clusters A and C are composed of genes with a gradual or a rapid increase in expression level, respectively. Clusters B and D represent the group of genes with a rapid or gradual reduction in expression level, respectively.
The cellular responses of the androgen-responsive LNCaP cells and the androgen-nonresponsive PC-3 cells to selenium are very similar. These two cell models represent different stages of prostate cancer progression. In order to identify relevant molecular targets underlying selenium chemopreventive action in incident prostate cancer or late stage relapse, we matched the prostate cancer datasets to the selenium LNCaP and PC-3 datasets. The goal was to identify dysregulated prostate cancer genes which could be reversed or restored to normal by selenium in both LNCaP and PC-3 cells. In this analysis, we compared 1,067 genes that are consistently dysregulated in prostate cancer and 427 genes that are sensitive to selenium modulation in both LNCaP and PC-3 cells. We found that there are a total of 71 genes common to both datasets. Among these, 25 are regulated in the same direction, 42 are regulated reciprocally, and 4 are regulated spuriously. Theoretically, when comparing a random list of 1,067 genes with another random list of 427 genes from the human genome (estimated to contain a total of ~30,000 genes), the number of overlapping genes one would expect to obtain is $1{,}067 \times 427 / 30{,}000 \approx 15$ genes. Assuming there is a 50% chance of these 15 genes being modulated reciprocally (i.e. a random distribution), the number of genes in this category would be reduced by half to 7.5. This number is far less than the 42 reciprocally regulated genes we have identified. Therefore, it is very unlikely that the outcome of our data mining method is due only to chance. These 42 genes are listed in Table II. A negative value denotes down-regulation, while a positive value indicates up-regulation. The reversal between the PCa (prostate cancer) column and the two Se columns is evident. Three genes, UMPK, SERPINB5, and FOXA1, are also present in Tables IA or IB. It should be noted that the genes in these two tables are only subsets of the cohort of prostate cancer genes used in this analysis.
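The chance-overlap estimate and the sorting of shared genes into same-direction and reciprocal groups can be sketched in Python as below. The function names and the "up"/"down" encoding are hypothetical, and the example gene labels are placeholders rather than entries from Table II.

```python
def expected_overlap(size_a, size_b, genome_size):
    """Expected number of shared genes between two random gene lists."""
    return size_a * size_b / genome_size


def classify_overlap(cancer_calls, selenium_calls):
    """Split shared genes into same-direction and reciprocal groups.

    Both arguments are {gene: "up" or "down"} tables.
    """
    same, reciprocal = [], []
    for gene in sorted(set(cancer_calls) & set(selenium_calls)):
        if cancer_calls[gene] == selenium_calls[gene]:
            same.append(gene)
        else:
            reciprocal.append(gene)
    return same, reciprocal


if __name__ == "__main__":
    chance = expected_overlap(1067, 427, 30000)
    # About 15 genes by chance, about half of them reciprocal.
    print(round(chance, 1), round(chance / 2, 1))
    print(classify_overlap({"A": "up", "B": "down"}, {"A": "down", "B": "down"}))
```

With the list sizes reported above, the expected chance overlap is about 15 genes, against 71 observed, and the expected number of reciprocal genes is about 7.5, against 42 observed.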
The genes in Table II are further classified into a number of functional categories. Because of space limitation, it is not possible to elaborate on the function of each of these genes. Suffice it to note that a significant number of them are involved in controlling cell cycle progression and/or cell death, including AHR (15), CHC1 (16), CDKN1C (17), ATF5 (18), and FOXO1A (19–21). Selenium modulates their expression in a way that is consistent with cell growth inhibition, cell cycle block, and apoptosis stimulation. Table II also shows that selenium is able to up-regulate the expression of four genes with tumor suppressing activities. SERPINB5, also known as maspin, is a serine proteinase inhibitor capable of suppressing tumor invasion and angiogenesis and of promoting apoptosis (22–24). It has been reported that the expression of SERPINB5 decreases with increasing prostate cancer malignancy (25). Gelsolin is under-expressed in several cancer types, including prostate (26–29). CYLD is a deubiquitinating enzyme which negatively regulates the activation of NFκB, an anti-apoptotic factor (30). Restoring the lost expression of CYLD in prostate cancer cells could conceivably sensitize them to apoptosis induction. SSBP2 is a translocation target in a leukemia cell line and is classified as a tumor suppressor candidate gene (31). It is intriguing that the expression of two oncogenes, FYN and RAB31, is down-regulated in prostate cancer. FYN is a member of the protein-tyrosine kinase oncogene family (32), and RAB31 belongs to the RAS oncogene family (33). The roles of these genes in prostate carcinogenesis are not clear; nonetheless, selenium is found to elevate the expression of both genes.
As a reminder, Table II is produced to highlight the putative prostate cancer genes sensitive to reversal of expression by selenium in both LNCaP and PC-3 cells. For the sake of thoroughness, we also present the analyses of two additional sets of prostate cancer genes which are uniquely modulated by selenium in either LNCaP (Table III) or PC-3 cells (Table IV). Due to the size of these tables, it would be tiresome to go through the data in any comprehensive fashion. Depending on future interests and evolving knowledge, this kind of information has value in seeking out clues and generating hypotheses.
In an attempt to identify the androgen-regulated genes whose expression is opposed by selenium, we compared the list of androgen-regulated genes (422 genes) to the list of selenium-responsive genes in LNCaP (1,031 genes). A partial summary of our analysis is shown in Table V. The AR (androgen-regulated) column shows the genes which are sensitive to androgen. A positive sign means up-regulation, while a minus sign means down-regulation. A total of 92 genes were found to be present in both datasets. As a control, a list of 1,031 genes was selected randomly from the selenium LNCaP dataset and compared with the list of androgen-regulated genes to identify genes in common. This process was repeated 10 times, and the number of overlapping genes was 30.4±1.6 (mean±SEM), which is significantly less than the actual number of 92 genes common to the androgen and selenium datasets (p<0.0005). Out of these 92 genes, only 38 genes (~41%) are reciprocally modulated by androgen and selenium; these are the genes presented in Table V. In the Discussion, we offer additional explanation of why only a fraction of AR targets are oppositely modulated by androgen and selenium, even though selenium is a potent inhibitor of androgen signaling.
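The random-sampling control can be sketched in Python as follows. This is a hedged illustration: the function name, the seed, and the toy gene universe are hypothetical, and the sketch only reproduces the mean±SEM summary of the overlap counts, not the significance test reported in the text.

```python
import math
import random
import statistics


def random_overlap_control(all_genes, target_genes, sample_size,
                           n_repeats=10, seed=0):
    """Overlap between random gene samples and a target gene list.

    all_genes: every gene on the array (the pool that samples are drawn from).
    target_genes: the reference list (here, androgen-regulated genes).
    Returns (mean overlap, SEM, list of individual overlap counts).
    """
    rng = random.Random(seed)
    population = sorted(set(all_genes))  # rng.sample needs a sequence
    target = set(target_genes)
    overlaps = [
        len(target.intersection(rng.sample(population, sample_size)))
        for _ in range(n_repeats)
    ]
    mean = statistics.mean(overlaps)
    sem = statistics.stdev(overlaps) / math.sqrt(n_repeats)
    return mean, sem, overlaps


if __name__ == "__main__":
    # Toy universe of 1,000 placeholder genes, 100 of them in the target list.
    universe = [f"gene_{i}" for i in range(1000)]
    mean, sem, counts = random_overlap_control(universe, universe[:100], 200)
    print(mean, sem, counts)
```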
Using prostate cancer chemoprevention as a research problem, Williams and Brooks (34) recently made a poignant commentary that microarray analysis holds great promise in unraveling the mechanisms of anticancer agents. Here we report, for the first time, a data mining approach to gain insight into the mechanisms of selenium utilizing published microarray datasets. The paradigm combines laboratory- and bioinformatics-based research to identify molecular targets or biomarkers of prostate cancer intervention by selenium. We recognize that this approach is only a first step in the discovery process. Nonetheless, the information extracted from this kind of analysis has significant potential in generating new leads to guide future research endeavors.
Rhodes et al. recently reported a meta-analysis of four datasets from prostate cancer gene expression profiling studies (35). Our study differs from the Rhodes study in a number of ways. First, two of the largest available datasets by Lapointe et al. (9) and Singh et al. (7) were not included in their analysis. Second, the Rhodes study compared localized prostate cancer to benign prostate tissue. The latter was inclusive of both normal prostate and benign prostatic hyperplasia (BPH). It has been reported that normal prostate and BPH have distinct gene expression patterns (36,37). Therefore, combining normal prostate and BPH into one single group could obscure some of the differences between normal and cancerous prostate. Third, instead of using a meta-analysis, we performed permutation t-test on each of the three datasets because they are large enough to generate independent and statistically verifiable information on their own. As a validation of our approach, the majority (~80%) of the top 40 over- and underexpressed genes of the Rhodes study are also present in our analysis (See our website). Additionally, our analysis picked up a few more genes (not found in the Rhodes paper) that are well known to be deregulated in prostate cancer, such as KLK2, KLK3 (PSA) (see our website), GSTP1, and SERPINB5 (Table IB).
We have identified 42 genes which are dysregulated in prostate cancer and are counter-regulated by selenium in both LNCaP and PC-3 cells (Table II). In order to assess the significance of this analysis, we compared the functions of these genes with those of the 25 genes which are similarly regulated in prostate cancer and by selenium and found two major differences. First, there is no tumor suppressor gene modulated in the same direction in prostate cancer and by selenium. In contrast, there are four tumor suppressor genes which are down-regulated in prostate cancer, but are found to be up-regulated by selenium (Table II). Second, there is only one cell cycle regulatory gene modulated in the same direction in prostate cancer and by selenium. In contrast, there are five cell cycle regulatory genes (ATF5, AHR, CDKN1C, EXT1, and CHC1) which are modulated in opposite directions in prostate cancer and by selenium (Table II). More interestingly, selenium alters the expression of these genes in a manner that is consistent with growth inhibition.
In androgen-responsive prostate cancer, AR signaling is such a dominant pathway that shutting it down is likely to be sufficient for growth inhibition. Our previous publication showed that selenium markedly down-regulates AR signaling in LNCaP cells (5). Furthermore, we were able to confirm that overexpression of AR diminishes the sensitivity to selenium (unpublished data), suggesting that disruption of AR signaling by selenium is biologically relevant. Additionally, selenium is known to modulate a diverse array of cell cycle and apoptosis regulatory molecules, as well as survival signaling molecules, in different cell types regardless of the presence or absence of AR. Different cell types may present both common and unique targets to selenium intervention. Thus, it is apparent that selenium has many targets and there is no one key mechanism to account for the anticancer effect of selenium. The multitude of genes in Table II lends support to common mechanisms for the anticancer activity of selenium in both the androgen-responsive LNCaP cells and the androgen-unresponsive PC-3 cells. However, despite the overall similarity of their cellular responses to selenium, subtle differences exist between the two cell types. For example, selenium slows down cell cycle progression at multiple transition points in PC-3 cells (5), whereas it acts mostly through G1 arrest in LNCaP cells (unpublished data). Genes distinctly targeted by selenium in these cells, as presented in Tables III and IV, could be attributable to the above disparities. They might also reflect the difference in genetic background such as response to androgen. Indeed, a noticeable distinction between Table III and Table IV is the presence of androgen-regulated genes in Table III.
Our analysis has identified 92 genes that are regulated by both selenium and androgen. However, only a modest proportion (38 out of 92) of these genes are modulated in reciprocal directions by selenium and androgen. A possible explanation for this is that genes have multiple regulatory elements, both positive and negative, in their promoter regions. Selenium is known to alter the expression of many transcription factors, co-activators and co-repressors (5). AR regulates the expression of its targets through both direct and indirect mechanisms. In other words, many other transcription factors and co-regulators are likely to be involved by virtue of the rippling effect initiated through AR signaling. Thus, it is to be expected that selenium could counteract the expression of some, but not all, androgen-regulated genes. The litmus test in the future is to study which AR-regulated genes sensitive to selenium reversal are important for modulation of prostate cancer risk.
The induction of forkhead O1A (FOXO1A) by selenium in both LNCaP and PC-3 cells is of special interest to us. FOXO1A is a member of the FOXO family of transcription factors, which induce the expression of pro-apoptotic genes including Fas ligand (38–40), bcl-2 family proteins (19,40,41), and TRAIL (20). Furthermore, FOXO1A is involved in cell cycle arrest (21). FOXO1A is phosphorylated and suppressed by AKT (42,43), which is an important survival molecule for prostate cancer (44). Androgen receptor (AR) also interacts with FOXO1A and inhibits its activation of Fas ligand expression (45). Selenium down-regulates both AKT and AR signaling (4,46). As shown in Figure 2, the stimulatory effect of selenium on FOXO1A signaling could be due to a direct induction of FOXO1A transcription coupled to an indirect activation of FOXO1A by alleviating the inhibitory modulation through AR and/or AKT.
Three key components of the transforming growth factor β (TGFβ) signaling pathway are consistently repressed in a large set of primary prostate tumors. These genes are TGFβ2, TGFβ receptor type II, and TGFβ receptor type III. Type I and II receptors have serine/threonine protein kinase domains and are directly involved in TGFβ signaling (47). Type III receptor does not have an intrinsic signaling domain; however, it facilitates the binding of all TGFβs, and especially TGFβ2, to the type II receptor (47). TGFβ is a pleiotropic cytokine, but is mainly a growth inhibitor of epithelial cancer, particularly at the early stage of development (48). It has been shown that the type I and II receptors are down-regulated in prostate cancer (49,50) and that loss of expression of the receptors is associated with poor prognosis (51). Therefore, from a prevention standpoint, stimulating TGFβ signaling is likely to produce a desirable outcome. In this study, we found that the expression of TGFβ2, TGFβ type II and III receptors is concertedly up-regulated by selenium. It is also worth mentioning that the expression of TGFβ2 is known to be induced by a forkhead transcription factor closely related to FOXO1A (52) and that both AR and AKT repress TGFβ signaling (53). Thus, the effect of selenium could be amplified by the crosstalk of TGFβ, AKT, and AR signals as illustrated in Figure 2. Our future research efforts will be directed towards elucidating the contribution of these pathways in selenium chemoprevention of prostate cancer.
This work was supported by a Department of Defense Postdoctoral Fellowship Award and an AACR-Cancer Research Foundation of America Fellowship in Prevention Research Award, grant 62-2141 from the Roswell Park Alliance Foundation, grant CA91990 from the National Cancer Institute, and also in part by Cancer Center Support Grant P30 CA16056 awarded to RPCI from the National Cancer Institute, U.S.A. H. Zhao is supported by a Postdoctoral Traineeship Award from the United States Army MRMC Prostate Cancer Research Program (Award Number W81XWH-04-1-0080). The authors are grateful to Dorothy Donovan and Cathy Russin for their excellent technical assistance.