Annotation of prostate cancer genomes provides a foundation for discoveries that can impact disease understanding and treatment. Concordant assessment of DNA copy number, mRNA expression, and focused exon resequencing in 218 prostate cancer tumors identified the nuclear receptor coactivator NCOA2 as an oncogene in ~11 percent of tumors. Additionally, the androgen-driven TMPRSS2-ERG fusion was associated with a previously unrecognized, prostate-specific deletion at chromosome 3p14 that implicates FOXP1, RYBP and SHQ1 as potential cooperative tumor suppressors. DNA copy-number data from primary tumors revealed that copy-number alterations robustly define clusters of low- and high-risk disease beyond that achieved by Gleason score. The genomic and clinical outcome data from these patients are now made available as a public resource.
Prostate cancer is the most common malignancy in males with ~190,000 new cases diagnosed per year in the United States and ~27,000 deaths. Prostate tumors show tremendous biological heterogeneity, with some patients dying of metastatic disease within 2–3 years of diagnosis whereas others can live for 10–20 years with organ-confined disease, likely a reflection of underlying genomic diversity. Large-scale cancer genome characterization projects studying glioblastoma, lung, colon, pancreas and breast cancers have provided critical new insights into the molecular classification of cancers and have the potential to identify new therapeutic targets (Cancer Genome Atlas Research Network, 2008; Ding et al., 2008; Jones et al., 2008; Parsons et al., 2008; Sjoblom et al., 2006; Weir et al., 2007; Wood et al., 2007). Prostate cancer presents special challenges for such large-scale multicenter genomics projects because of the relatively small tumor size and admixture with stroma that requires careful pathologist-guided dissection.
A number of groups have reported analyses of transcriptomes and copy-number alterations (CNAs) in prostate cancer, but rarely from the same samples and typically from modest numbers of tumors (~50–100 samples) or with lower resolution platforms (Kim et al., 2007; Lapointe et al., 2007; Lapointe et al., 2004; Lieberfarb et al., 2003; Perner et al., 2006; Singh et al., 2002). Consistent and common findings from these reports include the TMPRSS2-ERG fusion in ~50 percent, 8p loss in ~30–50 percent, and 8q gain in ~20–40 percent of cases. The data implicating ERG as a prostate cancer gene are clear (Tomlins et al., 2005), but there has been less progress in defining specific genes targeted by various common amplifications and deletions, in part due to limited availability of complementary transcriptome and exon resequencing data on sufficient patients to narrow the focus to a small list of candidate genes. Numerous transcriptome studies have defined general prostate cancer signatures but, unlike breast cancer (Paik et al., 2004; van de Vijver et al., 2002), these analyses have not identified robust subtypes of prostate cancer with different prognoses (Febbo and Sellers, 2003; Lapointe et al., 2004; Singh et al., 2002).
Here we adopted a comprehensive approach to define transcriptomes and CNAs in 218 prostate tumors (181 primaries, 37 metastases) and 12 prostate cancer cell lines and xenografts, as well as complete exon resequencing and/or focused mutation detection for 157 high interest genes in 80 tumors and 11 cell/xenograft lines (Table 1). After generating a map of CNAs across the dataset, we used matching mRNA and microRNA transcriptome and exon resequencing data to define the frequency of alterations in several common signal transduction pathways, explore various candidate genes within a few selected regions of copy-number gain and loss and correlate genomic alterations to clinical outcome. These data serve as a valuable resource for the cancer genomics community, prostate cancer scientists and clinicians and are readily and freely available through a user-friendly web-based portal (http://cbio.mskcc.org/prostate-portal/).
We applied rigorous criteria for selecting tumors for genomic analysis that have become the standard in large genomic studies (Cancer Genome Atlas Research Network, 2008) but adapted them to address unique challenges posed by prostate cancers. All 218 samples had at least 70% tumor content (Table 1, Figure S1, Table S1). Transcriptome (mRNA and microRNA) and CNA profiling were conducted without amplification; only exon resequencing required whole-genome amplification. Because we did not impose a stringent tumor size requirement, the small size of some tumors precluded concurrent analysis across all four platforms (Table 2, Table S2).
Analysis of known prostate cancer alterations in our dataset indicates successful tumor selection criteria (Figure S1). For example, the frequency of ERG alteration was 52 percent (see Experimental Procedures), consistent with other studies, and chromosome 8p loss and 8q gain were easily detected (Figure 1A). Overt CNAs were observed in 89 percent of tumors, also indicative of high tumor content. Additional histologic and molecular analysis of those tumors without CNA confirmed high tumor content (e.g., detection of TMPRSS2-ERG translocations). To address the possibility that our stringent tumor selection criteria might bias our dataset toward larger, more aggressive prostate cancers, we compared the clinical outcome of the 181 primary tumors (Table S1) in this dataset to 3437 consecutive men with prostate cancer treated by prostatectomy at MSKCC from 2000–2006. The time to biochemical relapse, defined as a rise in serum prostate specific antigen, was somewhat shorter in this study cohort (Table 1, Figure S1A). While this indicates that genomic findings from these samples may be biased toward larger, more aggressive prostate cancers (selected to ensure sufficient nucleic acid yield), this cohort nevertheless includes patients with favorable long-term clinical outcome (24% of patients have greater than 5-year recurrence-free survival).
Global analysis of CNA data from 194 tumors and 12 cell lines/xenografts revealed broad diversity in alteration levels. Metastatic tumors and cell/xenograft lines harbored the greatest number of whole chromosome, chromosome arm, and focal amplifications and deletions, but primary tumors also displayed a wide range of alteration levels, from tumors appearing metastatic-like in profile to those with fundamentally diploid genomes (Figure S1B). Regions of recurrent CNA were identified using the statistical method RAE (Taylor et al., 2008), revealing 30 focal amplifications and 36 focal deletions as well as recurrent gains and losses of seven chromosome arms (Figure 1A, Tables S3 and S4). The most frequent alteration in the prostate oncogenome was loss of chromosome 8p, a common abnormality in many epithelial tumors that harbors NKX3.1 (He et al., 1997). Interestingly, NKX3.1 mRNA expression did not correlate with copy-number loss, suggesting the possibility of alternative tumor suppressors in this region. Consistent with prior studies, we also found peaks of deletion targeting PTEN on 10q23.31, RB1 on 13q14.2, TP53 on 17p13.1 and the interstitial 21q22.2-3 deletion spanning ERG and TMPRSS2. Other broader deletions included 12p13.31-p12.3, which spans ETV6 and DUSP16 in addition to CDKN1B, the previously reported target of this genomic deletion (Lapointe et al., 2007). The most common amplified loci included MYC on 8q24.21 and a previously unreported NCOA2 amplification on 8q13.3 (discussed further below). Focal amplifications of AR (Xq12) were also common but restricted to metastatic tumors. Other gains span discontinuous regions of 7q, including genes such as BRAF and EZH2, for which we were unable to localize individual target genes. We observed less frequent gains of 5p13.3-p13.1 spanning AMACR, RICTOR, SKP2 as well as 47 other genes and 2 microRNAs.
Eighty tumors were examined for somatic mutations in 138 genes by exon sequencing (Figure 1A, Tables S5 and S6). These and an additional 76 tumors were also profiled for well-known oncogenic mutations in 22 genes by mass spectrometry using the iPLEX Sequenom assay (Table S5). In total, 84 confirmed somatic mutations were detected in 57 different genes (Table S6). Thirty-seven percent of the missense mutations we detected are predicted to affect protein function (Table S6) based on an algorithm that uses a combination of evolutionary information from protein-family sequence alignments and residue placement in known or homology-deduced three-dimensional protein and complex structures (B. Reva, et al., http://www.mutationassessor.org/). Among all mutated genes, including those bearing previously known mutations, the most commonly mutated gene was the androgen receptor (AR), with 4 samples, all metastases. Mutations in 21 other genes were detected in 2 or more samples, but no single gene other than AR had mutations in more than 3 samples. We also confirmed prior data suggesting that common, broadly mutated oncogenes such as PIK3CA, KRAS and BRAF are not commonly mutated in prostate cancer (2 tumors had H1047R and E545K PIK3CA mutations, 2 had G12V and Q61H/L KRAS mutations and one tumor had a BRAF V600E mutation). Mutations in other more recently identified oncogenes such as IDH1 and IDH2 were similarly rare, with only one tumor bearing an IDH2 R172K mutation. Curiously, one tumor with a mutation in the mismatch repair gene MSH6 (V250A) had 11 confirmed somatic mutations versus an average of 2 somatic mutations per tumor, suggestive of a mutator phenotype. Mutations in two other DNA repair genes, BLM and XPC, were each found in a single tumor, but not in association with an increased number of other mutations. 
Only two tumors had missense mutations in TP53 and none had mutations in PTEN, but both tumor suppressors were commonly altered through hetero- or homozygous copy-number loss (~24 percent and ~21 percent respectively). Comparison of synonymous and non-synonymous changes detected in these samples suggests a low mutation rate in prostate cancer (~0.31 mutations/Mb). Consistent with this notion, the frequency of mutations recovered in our analysis did not exceed the expected background rate (Ding et al., 2008), although the modest number of genes and samples sequenced limits this analysis.
We next integrated the CNA, transcriptome, and mutation data to conduct a core pathways analysis, based on the success of this approach in revealing common pathway alterations in glioblastoma (Cancer Genome Atlas Research Network, 2008). Three well-known cancer pathways were commonly altered – PI3K, RAS/RAF and RB – with frequencies ranging from 34 to 43 percent in primary tumors versus 74 to 100 percent in metastases (Figure 1B). In this analysis, a tumor was considered altered if one or more genes in the pathway was mutated or significantly deregulated at the expression level (outlier expression compared to the distribution of expression in normal prostate samples, see Experimental Procedures). As in glioblastoma, the extremely high frequency of alteration in these pathways became evident only through examination of multiple genes in each pathway since individual genes are affected less commonly. Of particular interest is the PI3K pathway, which was altered in nearly half of primaries and all metastases examined. Loss of PTEN function, through deletion, mutation or reduced expression, has been well documented in prostate cancer with an estimated frequency of ~40 percent (Pourmand et al., 2007), consistent with our findings here. The frequency of PI3K pathway alteration rises substantially when PTEN alterations are considered together with alterations in the INPP4B and PHLPP phosphatases recently implicated in PI3K regulation, the PIK3CA gene itself, and the PIK3CA regulatory subunits PIK3R1 and PIK3R3 (Cancer Genome Atlas Research Network, 2008; Gao et al., 2005; Gewinner et al., 2009; Jaiswal et al., 2009; Ueki et al., 2003). These data provide strong rationale for exploring the clinical activity of PI3K pathway inhibitors, many of which are now in early clinical development, in prostate cancer.
We also conducted a core pathway analysis of AR, which is essential for growth and differentiation of the normal prostate and is responsible for treatment failure in castration-resistant, metastatic disease (Chen et al., 2004; Tran et al., 2009). As expected, alteration of AR through mutation, gene amplification, and/or overexpression was common but occurred exclusively in metastatic samples (58%, Figure 2A). However, AR pathway analysis (including several known AR coactivators and corepressors) revealed alteration in 56 percent of primaries and 100 percent of metastases (Figure 2A, Figure S2). Among AR pathway genes, the most striking finding was a peak of copy-number gain on 8q13.3 (~57 megabases away from the peak at 8q24 commonly attributed to MYC, and of even greater significance) that spans the nuclear receptor coactivator gene NCOA2 (also known as SRC2/TIF2/GRIP1). Seventeen percent of tumors had broad gains of the region spanning NCOA2 on 8q while 6.2% of tumors (1.9% of primaries and 24.3% of metastases) harbored focal or high-level amplifications of the locus, and these were significantly correlated with elevated NCOA2 transcript levels (p-value < 10^-16, Figure S2A). In addition to copy number and expression changes, AR pathway alterations included mutations in NCOA2 (2 confirmed somatic) as well as in NCOR2 (3 tumors), NRIP1, TNK2, and EP300 (1 tumor each). Overall, eight percent of primary tumors and 37 percent of metastases had NCOA2 gain of expression (determined to be outlier expression as described in the Experimental Procedures) or mutation (Figure 2A–B). Including broader gains of 8q, the frequency of NCOA2 alteration may be as high as 20 and 63 percent in primary and metastatic tumors respectively. Of note, NCOA2 mutations have also been reported in melanoma and lung cancer and, in conjunction with the prostate mutations detected here, cluster in two highly conserved regions.
These include a serine/threonine-rich stretch (S/T) known to be phosphorylated and an activation domain (AD1) that mediates binding with histone acetyltransferases such as CBP and p300 (Huang and Cheng, 2004) (Figure 2C). A third patient had an A407S NCOA2 substitution that was potentially germline (detected at low frequency in adjacent normal prostate). Interestingly, non-castrate patients with primary tumors harboring NCOA2 mutation, overexpression or high-level amplification had significantly higher rates of recurrence (Figure S2B).
Taken together, the high frequency of NCOA2 gain in primary tumors and its known role as an AR coactivator (Agoulnik et al., 2005) suggest that these two genes might collaborate in early prostate cancer progression by enhancing AR transcriptional output. We addressed this possibility by expressing increasing levels of NCOA2 in prostate cancer cells with a fixed endogenous level of non-amplified AR. The range in NCOA2 protein levels is shown by western blot (Figure 2D) and is similar to the ~2 to 4-fold increase in NCOA2 mRNA levels over the mean level seen in the overall prostate cancer cohort. As expected from other work, AR transcriptional output (measured using an androgen-responsive reporter construct) was increased when cells were treated with dihydrotestosterone (DHT) in a dose-dependent fashion, but reached a plateau between 1–10 µM (Figure 2D). Increasing levels of NCOA2 shifted the DHT dose-response curve leftward and upward, indicating that NCOA2 can not only prime AR to respond to lower androgen concentrations but can also boost the total magnitude of AR transcriptional output. One prediction from these in vitro data is that the AR transcriptional output in prostate cancers with NCOA2 gene amplification should be greater than in those without. Based on a 29-gene signature of AR transcriptional output previously used to conduct small molecule screens for novel antiandrogens (Hieronymus et al., 2006), NCOA2-amplified primary tumors displayed an increase in AR signaling (Figure 2E). Collectively, the genomic and functional data suggest that NCOA2 functions as a driver oncogene in primary tumors by increasing AR signaling, which is known to play a critical role in early and late stage prostate cancer. In contrast, AR amplification, which is largely restricted to castration-resistant metastatic disease, is more likely a mechanism of drug resistance rather than a natural step in tumor progression.
We also propose that NCOA2 and MYC both function as driver oncogenes on the 8q13 and 8q24 amplicons respectively.
The TMPRSS2-ERG fusion is the single most prevalent molecular lesion in prostate cancer (Tomlins et al., 2005). Functional studies of TMPRSS2-ERG, including transgenic expression in the mouse prostate, have shown modest evidence of oncogenic activity (Carver et al., 2009; King et al., 2009; Klezovitch et al., 2008; Tomlins et al., 2008), which raises the possibility that cooperating events are required.
Analysis of 194 tumors for CNAs associated with TMPRSS2-ERG fusion revealed three significant regions of copy-number loss: two spanning the tumor suppressors PTEN and TP53 and a third spanning the multigenic region at 3p14 (Figure 3A). PTEN loss was recently shown to cooperate with TMPRSS2-ERG in transgenic mice and in a prostate tissue reconstitution model (Carver et al., 2009; King et al., 2009; Zong et al., 2009). The 3p14 deletion, whose association with TMPRSS2-ERG was even more significant, has not been previously reported and spans only 8 genes. Further interrogation of 2550 tumors and cell lines spanning 14 tumor types (acute lymphoid leukemia, breast, colorectal, esophageal, glioma, hepatocellular, non-small cell lung, squamous lung, medulloblastoma, melanoma, myeloproliferative, ovarian, renal and prostate) for CNAs in this region suggests this deletion is only found in prostate cancer (Beroukhim et al., 2010). Indeed, the only other focal signal found in this region is an amplicon in melanoma that includes microphthalmia-associated transcription factor (MITF), a previously reported finding (Garraway et al., 2005).
Closer inspection of the 3p14 deletion in our prostate cohort revealed two distinct peaks of association within the region that, together with expression data and the focal deletion patterns, implicate only three genes: FOXP1, RYBP and SHQ1 (Figure 3B, 3C). Deletions in some tumors spanned FOXP1 only, whereas others included RYBP and SHQ1, but spared FOXP1. FOXP1 encodes a forkhead box transcription factor and functions in motor neuron specification in the spinal cord, as well as early thymocyte development, in collaboration with various HOX genes (Arber, 2008; Pfaff, 2008). A role for FOXP1 in cancer has been proposed based on reduced expression in breast and other cancers, increased expression in some lymphoid malignancies and, remarkably, by translocation-mediated fusion to the ERG homolog ETV1 in at least one prostate cancer (Goatly et al., 2008; Hermans et al., 2008; Koon et al., 2007). Furthermore, recent evidence implicates the FoxP family member FOXP3 as a potential tumor suppressor in prostate cancer (Wang et al., 2009). RYBP (Ring and YY1 Binding Protein) encodes a polycomb group transcriptional repressor implicated in homeotic development and, potentially, as a tumor suppressor through inhibition of MDM2 and subsequent p53 stabilization (Chen et al., 2009). SHQ1 encodes an accessory factor for the assembly of H/ACA ribonucleoproteins (RNP) through direct binding to NAP57, a core RNP subunit. Missense mutations in NAP57 that disrupt interaction with SHQ1 are associated with the bone marrow failure syndrome dyskeratosis congenita, raising a potential link to precancer syndromes (Grozdanov et al., 2009).
To gather further evidence for a potential tumor suppressor role of any of these three genes, we searched for point mutations through exon resequencing. We found no mutations in FOXP1 or RYBP, but detected a confirmed somatic mutation in SHQ1 (P22S) in a highly conserved region of the CS domain that is required for SHQ1 function (Singh et al., 2009). A second tumor had a deletion targeting the middle of the SHQ1 gene that resulted in production of an aberrant mRNA species truncated at exon 6 (Figure 3D). Although these data further implicate SHQ1 as a tumor suppressor in this locus, the fact that some tumors with 3p14 loss spare SHQ1 (Figure 3B) raises the possibility of multiple tumor suppressors in this region.
Given the pressing need for biomarkers that distinguish indolent from aggressive prostate cancer, we also examined the genomic data for prognostic significance. It is estimated that 30–50 percent of men diagnosed with prostate cancer could avoid surgery or radiation (and instead be followed by watchful waiting) because they have good-prognosis tumors that are unlikely to progress (Cooperberg et al., 2005). Whereas transcriptome analysis defines breast cancer subgroups with distinct prognoses and treatment outcomes that have changed clinical practice (Paik et al., 2004; van de Vijver et al., 2002), similar studies in prostate cancer have been less clinically useful (Mucci et al., 2008a; Mucci et al., 2008b). The 5-year median clinical followup linked to this tumor set provided an opportunity to address the prognosis question using various forms of oncogenomic data. While unsupervised hierarchical clustering of mRNA and microRNA data failed to identify robust clusters of patients with significant differences in prognosis, the CNA data revealed distinct subgroups with substantial differences in time to biochemical (PSA) relapse (Figure 4A,B, Figure S3A–C). Further attempts to identify individual genes whose expression has prognostic impact through outlier analysis (~1766 genes with over- or under-expressing outliers relative to normal prostate) were only modestly successful and these associations were weak relative to those observed using the CNA data.
CNA analysis revealed two notable subgroups of primary tumors: those with minimal CNA (clusters 1–4) and those with substantial CNA (clusters 5–6) that include most of the metastatic samples (Figure 4A). Clusters 5 and 6 are distinguished by the fact that cluster 5 tumors have genome-wide alterations, whereas those in cluster 6 primarily have 8q (NCOA2, MYC) or chromosome 7 gains. Among the tumors with minimal CNA, cluster 2 is characterized by largely unaltered genomes. Using the endpoint of time to biochemical relapse, the generally diploid primary tumors in the minimally altered cluster 2 had an extremely favorable prognosis versus an extremely unfavorable prognosis for the highly altered cluster 5 tumors (Figure 4B).
We next examined whether the prognostic impact of CNA is simply a reflection of genomic instability or the impact of specific genomic alterations. The fact that the two clusters with the highest prevalence of CNAs (5 and 6) have statistically different outcomes supports the latter hypothesis. To explore this question further, we systematically examined the impact of gain or loss of whole chromosome arms or more focal regions of gain or loss across the genome. Combined loss of 13q and 18q, focal amplification of two distinct 5p regions (5p13 or 5p15), and focal deletion of 5q21.1 were each significantly associated with a negative outcome (Figure S3B,C), further supporting the notion that distinct genomic alterations impact prognosis and raising the possibility that genes in these regions play functional roles in prostate cancer.
These findings raise the possibility that CNA assessment at diagnosis may have clinical utility in distinguishing low- from high-risk disease, but only if these data add to the prognostic impact currently provided by the histology-based Gleason score. Poor prognosis Gleason score (>7) tumors were distributed across clusters 2–6 (albeit with greatest frequency in cluster 5), indicating that histology and CNA are not overlapping (Figure 4C). Furthermore, low-risk Gleason scores (≤6) were not enriched among clusters 1–4. Therefore, Gleason grade cannot fully explain the association with biochemical relapse. These results raise the possibility of a CNA-based test that might guide treatment choice in men with newly diagnosed prostate cancer, though this would require validation in a larger independent data set and confirmation that such information can be obtained from biopsies rather than prostatectomy samples. Such a test might be a genome-wide assessment using array-based CGH or molecular inversion probe (MIP) technology (Wang et al., 2005) or be centered on specific regions of gain or loss identified through further confirmatory studies.
The clinical heterogeneity of prostate cancer, coupled with its high prevalence, raises challenges in the management of newly diagnosed patients as well as those with metastatic disease. Genomic-based classification offers the hope of more informed clinical decision-making and may yield novel therapeutic targets. Integrated large scale cancer genomics projects in several tumor types have established the utility of this approach for generating the datasets required to derive such classification schema (Cancer Genome Atlas Research Network, 2008; Chitale et al., 2009; Ding et al., 2008; Jones et al., 2008; Parsons et al., 2008; Sjoblom et al., 2006; Weir et al., 2007; Wood et al., 2007). These reports provide definitive overviews of the genomes of those tumor types and in cases such as TCGA provide easy, web-based access to genomic data that serve as a public resource. The prostate cancer dataset generated here is comparable in size (218 carefully selected, well-annotated tumors) and scope (transcriptome, CNA, exon resequencing) and is linked to clinical outcome. All raw and processed data are freely accessible at http://cbio.mskcc.org/prostate-portal/.
One observation from the exon resequencing data is that somatic point mutations in prostate cancer may be rare relative to other tumor types such as glioblastoma, lung cancer and melanoma (Greenman et al., 2007; Pleasance et al., 2010a; Pleasance et al., 2010b). With the caveat that only 138 genes were examined (selected primarily based on known roles in other cancers), no single gene emerged as commonly mutated. TP53 and PTEN, which are often cited as prostate cancer tumor suppressors (Dong, 2006; Pourmand et al., 2007), were commonly altered, but primarily through copy-number loss rather than point mutation. Ongoing comprehensive sequencing studies (whole-genome or whole-exome capture) will provide more insight into the overall mutation rate in prostate cancer.
Several findings have emerged from our analysis, largely based on the opportunity provided by integrated analysis of multidimensional data. The nuclear receptor coactivator NCOA2 was identified as a highly significant target gene on the 8q13 amplicon and is also subject to mutation in some tumors lacking gene amplification. Functional studies presented here support the hypothesis that increased NCOA2 dosage amplifies AR pathway transcriptional output in primary tumors, providing a mechanism for its potential role as a prostate cancer oncogene. Whereas AR gene amplification or mutation is generally restricted to metastatic, castration-resistant disease (acquired in association with treatment resistance), CNAs or mutations in NCOA2 and other regulators of nuclear receptor function such as NCOR2 are present in primary tumors, thereby extending the potential importance of AR pathway perturbation to disease initiation.
A second finding is a narrow deletion on 3p14 highly associated with TMPRSS2-ERG fusion-positive tumors that appears to be present only in prostate cancers. Integrative analysis of copy number, transcriptome and exon resequencing data implicates three genes within this region (FOXP1, RYBP and SHQ1) as potential context-specific tumor suppressors, either alone or in combination. Our methodology also confirmed prior reports of an association of TMPRSS2-ERG with PTEN loss (Han et al., 2009; Reid et al., 2010), an interaction that has now been validated by in vivo studies in mice (Carver et al., 2009; King et al., 2009; Zong et al., 2009). We found evidence of a possible association with 16q23 deletion previously reported by others (Demichelis et al., 2009), but this did not reach statistical significance in our larger dataset. As has been done with PTEN, these new associations warrant future functional studies and could define unique ERG-specific tumor suppressor interactions.
These findings together with our analysis showing the high impact of CNA data on risk of relapse relative to transcriptome profiling demonstrate the broad utility of this integrated prostate oncogenome dataset. The high prevalence of this important disease and the relative paucity of large comprehensive genomic datasets in prostate cancer make this a unique public resource for the cancer research community.
A total of 218 tumor samples and 149 matched normal samples were obtained from patients treated by radical prostatectomy at Memorial Sloan-Kettering Cancer Center. All patients provided informed consent and samples were procured and the study was conducted under Memorial Sloan-Kettering Cancer Center Institutional Review Board approval. Clinical and pathologic data were entered and maintained in our prospective prostate cancer database. Following radical prostatectomy, patients were followed with history, physical exam, and serum PSA testing every 3 months for the first year, 6 months for the second year, and annually thereafter. For all analyses described here, biochemical recurrence (BCR) was defined as PSA ≥ 0.2 ng/ml on two occasions. At the time of data analysis, patient follow-up was completed through December 2008.
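The recurrence definition above can be illustrated with a minimal sketch. The function name, the input format, and two choices are assumptions made for illustration and are not specified in the text: any two qualifying measurements count (they need not be consecutive), and the time of the second qualifying measurement is reported as the time of recurrence.

```python
from typing import Optional, Sequence, Tuple

BCR_THRESHOLD_NG_ML = 0.2  # PSA threshold stated in the text


def time_to_bcr(psa_series: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Return the follow-up time (months) of the second PSA value at or above
    the threshold, or None if fewer than two measurements qualify.

    psa_series holds (months_since_surgery, psa_ng_ml) pairs; this input
    format is hypothetical.
    """
    qualifying = 0
    for months, psa in sorted(psa_series):  # process in chronological order
        if psa >= BCR_THRESHOLD_NG_ML:
            qualifying += 1
            if qualifying == 2:
                return months
    return None
```

A patient with a single elevated value is not called recurrent under this rule, which is the point of requiring two occasions.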
DNA and RNA were extracted from dissected tissue containing greater than 70% tumor cell content as well as from seven cell lines and seven xenografts (see Supplementary Information). Resulting DNA and RNA were hybridized to Agilent 244K array comparative genomic hybridization (aCGH) microarrays, Affymetrix Human Exon 1.0 ST arrays, and/or Agilent microRNA V2 arrays respectively (Table 2). The normalization and statistical analysis of both DNA copy number and expression array data are available in Supplementary Information.
In total, 251 million bases in coding exons and adjacent intronic sequences of 138 cancer-related genes in 91 samples were PCR-amplified and sequenced by Sanger capillary sequencing (Table S5). Ninety-five sites of known mutation in 22 genes were also genotyped using the iPLEX Sequenom platform. The details of whole-genome amplification, sequencing, mutation detection pipelines, mutation validation, background mutation rate analysis, and Sequenom genotyping are described in Supplementary Information.
Outlier profiles for all transcripts and outlier assignments in all tumors were determined from normalized expression data as previously described (Ghosh and Chinnaiyan, 2009). Briefly, in this nonparametric approach an empirical distribution function generated from transcript expression in the 29 normal prostate tissues was used to transform expression in the tumor samples, from which outliers were determined with the criteria described in the Benjamini and Hochberg algorithm (Benjamini and Hochberg, 1995) at an error rate (α) = 0.01.
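To make the outlier procedure concrete, the following sketch handles a single transcript: tumor values are transformed through the empirical distribution of the normal samples into upper-tail p-values, and the Benjamini-Hochberg step-up rule is applied across tumors. It is a simplified illustration, not the published implementation. The function names are hypothetical, only over-expression is tested, and the p-value is taken as the plain fraction of normals at or above the tumor value with no pseudo-count.

```python
from typing import List, Sequence


def empirical_upper_pvalues(normal: Sequence[float],
                            tumor: Sequence[float]) -> List[float]:
    """For each tumor value, the fraction of normal samples at or above it."""
    n = len(normal)
    return [sum(1 for v in normal if v >= t) / n for t in tumor]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Step-up rule: reject the k smallest p-values, where k is the largest
    rank satisfying p_(k) <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


def overexpression_outliers(normal: Sequence[float],
                            tumor: Sequence[float],
                            alpha: float = 0.01) -> List[bool]:
    """Flag tumors whose expression is an over-expression outlier."""
    return benjamini_hochberg(empirical_upper_pvalues(normal, tumor), alpha)
```

With 29 normal samples, a tumor is flagged in this sketch only when its expression exceeds every normal value, because the smallest non-zero p-value (1/29) is already above α = 0.01.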
For the purposes of the association analysis among copy-number alterations (described in detail in Supplementary Information), we classified fusion-positive tumors exclusively from aCGH data to maximize our power to detect novel associations in CNA data alone. Tumors were considered fusion-positive if they harbored canonical 21q22.2-3 genomic deletion (D0 or D1 > 0.9 from RAE analysis) with 5’ and 3’ breakpoints in the coding loci of ERG and TMPRSS2 respectively accompanied by interstitial deletion, or those samples with micro-deletions at the expected breakpoint sites in either or both genes in conjunction with intergenic diploidy. This approach under-estimates the true frequency of TMPRSS2-ERG fusion by excluding tumors with balanced rearrangement. Therefore, all other analyses described in this study classified TMPRSS2-ERG status using the subset of cases with both aCGH and expression data. Here, fusion-positive tumors were those having either the genomic deletion described above or whole-transcript outlier expression inferred from exon expression arrays as described above. We note that re-classification of TMPRSS2-ERG status using individual exon expression adjacent to the expected breakpoints in each coding sequence produced similar results.
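The two classification schemes described above can be summarized in a short sketch. The record fields and function names are hypothetical, and the breakpoint and intergenic-diploidy conditions are collapsed into precomputed boolean inputs, so this shows only how the two rules relate rather than how the calls were actually derived.

```python
from dataclasses import dataclass

RAE_DELETION_CUTOFF = 0.9  # D0 or D1 must exceed this value, per the text


@dataclass
class TumorCalls:
    d0: float                       # RAE D0 score for the 21q22.2-3 interval
    d1: float                       # RAE D1 score for the same interval
    breakpoint_microdeletion: bool  # micro-deletion at expected breakpoints
    transcript_outlier: bool        # whole-transcript outlier expression


def fusion_by_acgh(t: TumorCalls) -> bool:
    """aCGH-only call, as used for the CNA association analysis."""
    interstitial_deletion = (t.d0 > RAE_DELETION_CUTOFF
                             or t.d1 > RAE_DELETION_CUTOFF)
    return interstitial_deletion or t.breakpoint_microdeletion


def fusion_combined(t: TumorCalls) -> bool:
    """Call used for all other analyses: genomic deletion or outlier expression."""
    return fusion_by_acgh(t) or t.transcript_outlier
```

A tumor with a balanced rearrangement would be negative under `fusion_by_acgh` but could still be recovered by `fusion_combined` through its expression, which is why the aCGH-only rule under-estimates fusion frequency.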
The details of pathway curation and gene selection for the pathway diagrams are described in the Supplementary Information. To determine pathway alteration frequencies, gene alterations were defined by up- or down-regulation compared to normal prostate (outlier expression), or by somatic non-synonymous mutations. A given tumor was considered altered if at least one gene in the pathway was altered. Mutations in genes known to be frequently deleted or down-regulated were considered as inactivating mutations (shades of blue in the figures), while mutations in genes known to be frequently amplified or up-regulated were considered as activating mutations (shades of red). Additionally, the association of NCOA2 gain-of-function alteration (outlier over-expression or copy-number amplification) with androgen signaling was assessed using a 29-gene signature of androgen stimulation (Hieronymus et al., 2006). The significance of this association was tested in non-castrate primary tumors by Student’s t-test.
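The "at least one altered gene" rule for pathway alteration frequency can be sketched as follows. The input layout (a mapping from tumor ID to its set of altered genes) and the function name are assumptions made for illustration. The gene list in the usage note is also illustrative and is not the curated pathway from the Supplementary Information.

```python
def pathway_alteration_frequency(altered_genes_by_tumor, pathway_genes):
    """Fraction of tumors with at least one altered gene in the pathway.

    altered_genes_by_tumor: dict mapping tumor ID -> iterable of altered genes
        (outlier expression or somatic non-synonymous mutation).
    pathway_genes: iterable of gene symbols belonging to the pathway.
    """
    if not altered_genes_by_tumor:
        raise ValueError("no tumors supplied")
    pathway = set(pathway_genes)
    n_altered = sum(
        1 for genes in altered_genes_by_tumor.values() if pathway & set(genes)
    )
    return n_altered / len(altered_genes_by_tumor)
```

For example, with four tumors whose altered genes are `{"PTEN"}`, `{"TP53"}`, an empty set, and `{"PIK3CA", "KRAS"}`, and an illustrative pathway `{"PTEN", "PIK3CA", "AKT1"}`, two of the four tumors count as altered.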
Unsupervised hierarchical clustering of discretized copy-number alterations (gain and loss, A0 and D0 ≥ 0.75; otherwise copy-neutral from RAE analysis) assigned to regions of the unified breakpoint profile excluding known CNVs was performed with the Manhattan distance measure and Ward’s linkage.
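A minimal sketch of this clustering step, using NumPy and SciPy, is shown below. It is not the authors' implementation and rests on several assumptions: RAE gain (A0) and loss (D0) scores are supplied as tumor-by-region matrices, a region exceeding the threshold for both is called a gain, and the function names are hypothetical. SciPy documents Ward's linkage as correctly defined only for Euclidean distances, so combining it with a precomputed Manhattan distance here mirrors the description in the text rather than a formally justified choice.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def discretize_cna(a0, d0, threshold=0.75):
    """Convert RAE scores to calls: +1 gain, -1 loss, 0 copy-neutral."""
    a0 = np.asarray(a0, dtype=float)
    d0 = np.asarray(d0, dtype=float)
    calls = np.zeros(a0.shape, dtype=int)
    calls[d0 >= threshold] = -1
    calls[a0 >= threshold] = 1  # assumption: gain takes precedence if both pass
    return calls


def cluster_tumors(calls, n_clusters):
    """Hierarchical clustering of tumors (rows) on discretized CNA calls."""
    distances = pdist(calls, metric="cityblock")  # Manhattan distance
    tree = linkage(distances, method="ward")      # Ward's linkage on that matrix
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

The number of clusters passed to `cluster_tumors` is a free parameter in this sketch. The text does not state how the dendrogram was cut.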
For NCOA2 assays, pCDNA3-NCOA2 and a PSA-Luc reporter were transfected into LNCaP cells that were androgen-starved (20 h) and then assayed for growth after another 20 h (One-Glo). Additional experimental details are available in the Supplementary Information.
Study data is deposited in NCBI GEO under accession GSE21032. The analyzed data can also be accessed and explored through the MSKCC Prostate Cancer Genomics Data Portal: http://cbio.mskcc.org/prostate-portal/
Full methods are described in the Supplementary Information.
Current knowledge of prostate cancer genomes is largely based on small patient cohorts studied with single-modality platforms. We present an integrated oncogenomic analysis of 218 primary and metastatic prostate cancers as well as 12 cell lines and xenografts. Mutations in known, commonly mutated oncogenes and tumor suppressor genes such as PIK3CA, KRAS, BRAF and TP53 are present but rare. However, integrative analysis of mutations, copy-number alterations and expression changes revealed alterations in the PI3K, RAS/RAF and androgen receptor (AR) pathways in nearly all metastatic samples and in a high proportion of primary samples. These data clarify the role of several known cancer pathways in prostate cancer, implicate several new ones, and provide a blueprint for clinical development of pathway inhibitors.
This work is dedicated to the memory of our colleague and friend William Gerald who initiated this project. We are grateful for the technical assistance and support of A. Viale (MSKCC Genomics Core), K. Huberman, S. Thomas, O. Aminova (Beene Translational Oncology Core), A. Olshen, J. Satagopan (Epidemiology and Biostatistics), L. Vargas, L. Chen (Pathology) and A. Gabow (Bioinformatics Core). This work was supported in part by the MSKCC Prostate SPORE CA092629 and by the David H. Koch Foundation. C.L.S. is an Investigator of the Howard Hughes Medical Institute.