|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Tumor metastases pose the greatest threat to a patient's survival, and thus, understanding the biology of disseminated cancer cells is critical for developing effective therapies.
Microarrays and immunohistochemistry were used to analyze primary breast tumors, regional (lymph node) metastases, and distant metastases in order to identify biological features associated with distant metastases.
When compared with each other, primary tumors and regional metastases showed statistically indistinguishable gene expression patterns. Supervised analyses comparing patients with distant metastases versus primary tumors or regional metastases showed that the distant metastases were distinct and distinguished by the lack of expression of fibroblast/mesenchymal genes, and by the high expression of a 13-gene profile (that is, the 'vascular endothelial growth factor (VEGF) profile') that included VEGF, ANGPTL4, ADM and the monocarboxylic acid transporter SLC16A3. At least 8 out of 13 of these genes contained HIF1α binding sites, many are known to be HIF1α-regulated, and expression of the VEGF profile correlated with HIF1α IHC positivity. The VEGF profile also showed prognostic significance on tests of sets of patients with breast and lung cancer and glioblastomas, and was an independent predictor of outcomes in primary breast cancers when tested in models that contained other prognostic gene expression profiles and clinical variables.
These data identify a compact in vivo hypoxia signature that tends to be present in distant metastasis samples, and which portends a poor outcome in multiple tumor types.
This signature suggests that the response to hypoxia includes the ability to promote new blood and lymphatic vessel formation, and that the dual targeting of multiple cell types and pathways will be needed to prevent metastatic spread.
Metastases are the main cause of mortality for patients with breast cancer. The molecular biology behind metastasis is complex and likely requires changes in cell cycle regulation , the repertoire of expressed proteases and protease inhibitors , proteins that promote autocrine growth loops, and/or proteins that cause an epithelial-to-mesenchymal transition . To make matters more complicated, it is clear that metastasis biology is in part governed by non-tumor cells including fibroblasts , endothelial cells , and myoepithelial cells . For example, recent evidence suggests that tumor endothelial cell interactions are important for determining patient outcomes as evidenced by the promising results from clinical trials that use bevacizumab, a monoclonal antibody directed against vascular endothelial growth factor (VEGF) [7,8].
Genomic profiling of human tumors and model systems has identified important features concerning metastasis biology. First, it has been shown that the expression profile of primary tumors without metastases can be highly predictive of the development of future metastases [9-13]. Second, cell lines can be selected that have specific end-organ tropisms with distinct expression profiles [14,15]. Finally, cell line and murine models have demonstrated many different genes as being important for breast tumor metastasis, including Twist , Snail , and CXCL12 . In this paper, we compare primary breast tumors, regional metastases, and distant metastases with each other and show that distant metastasis samples are distinct and provide unique signatures that predict poor outcomes in primary tumors.
One hundred and forty-six patients represented by 161 breast tumor specimens (with 23 paired tumor samples) and 10 normal breast samples (195 total microarrays) were profiled. Most of these samples appeared in previous publications [18-20], with 39 being new to this study, and all of which were collected using institutional review board-approved protocols. The clinical information for all samples is in the table in Additional file 1. Included within the 161 profiled tumors were 134 primary tumors, nine regional metastases and 18 distant metastases. Patients were heterogeneously treated in accordance with the standard of care dictated by their disease stage, estrogen receptor (ER) and HER2 status.
Total RNA isolation and microarray protocols are described in Hu et al . Each sample was assayed versus a common reference sample . The microarrays used were Agilent Human oligonucleotide microarrays that were scanned on an Axon GenePix 4000B, analyzed with GenePix Pro 4.1, and Lowess normalized. All microarray data have been deposited into the GEO under the accession number of GSE3521.
The background-subtracted, Lowess-normalized log2 ratio of Cy5 over Cy3 intensity values were filtered to select genes that had a signal intensity of > 30 units in both the Cy5 and Cy3 channels. Only genes that met these criteria in at least 70% of the 195 microarrays were included for subsequent analysis. Next, each patient was classified according to the following metastasis scoring system (MetScore): MetScore = 1 were patients that had a primary tumor and were clinically node negative (N = 0) and distant metastasis negative (M = 0); MetScore = 2 were patients that had a regional metastasis (N = 1–3) and no distant metastasis (M = 0); MetScore = 3 were patients with confirmed distant disease at the time of diagnosis (M = 1 and any N) or that were represented by an actual distant metastasis sample. We next performed a multi-class significance analysis of microarrays (SAM) using a single sample from each patient, biasing the sample selection to use the actual regional or distant metastasis samples (146 arrays, see Additional file 1). We identified the gene set that was associated with the MetScore 1-2-3 distinction, which gave 1195 genes at a false discovery rate of 5%. This gene set was next used in a one-way average linkage hierarchical cluster using the program 'Cluster' , with the data being displayed relative to the median expression for each gene using 'Java Treeview' .
Relationships between the gene expression data and the MetScore classification was further examined using a 10-fold cross-validation (CV) analysis to identify a set of genes that might distinguish a MetScore group from the others. 10-fold CV using five different statistical predictors including PAM , a k-nearest neighbor classifier with either Euclidean distance or one-minus-Spearman-correlation as the distance function, and a class nearest centroid metric with either Euclidean distance or one-minus-Spearman-correlation as the distance function, were used as described in Chung et al . We performed 10-fold CV using the five different statistical predictors with the reported CV prediction accuracies being the average of the five predictors (Tables (Tables1,1, ,2,2, ,33 and and44).
For the VEGF profile, an average expression value across all 13 genes (RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3 and C14ORF58) was determined and the patients were placed into a three-group classification based their 13-gene average log2 expression ratio from the University of North Carolina (UNC) training data set and using the cut off values (-0.63/0.08) that were identified using X-tile  and relapse-free survival as the endpoint. Analyses using the VEGF profile and the training set cutoffs were also applied to an independent test set of 295 patients assayed on Agilent microarrays (that is, NKI295 ), to a set of lung carcinoma samples from Bhattacharjee et al , and to the glioblastoma sample set from Nutt et al . To perform these across-data set analyses, for the NKI295 dataset we used the log ratio of red channel intensity versus green channel intensity and the data was median centered for every gene across the 295 arrays. The Netherlands Cancer Institute (NKI) dataset was then distant weight discrimination (DWD) normalized  with the UNC training dataset after collapsing by NCBI Entrez GeneID; after DWD normalization, the NKI data was also column standardized. For the Affymetrix datasets the probe level intensity .CEL files were processed by robust multi-chip average. The probe sets' log intensity was median centered for every gene across all the arrays. The Affymetrix datasets were also DWD normalized relative to the UNC training data after collapsing by NCBI Entrez GeneID, and were column standardized.
First, each sample was assigned an 'intrinsic subtype' as described in Hu et al , where a centroid was created for each of the following intrinsic subtypes: Basal-like, Luminal A, Luminal B, HER2-enriched and Normal-like. Next, we tested for associations between a tumor's intrinsic subtype, the VEGF profile and other published expression profiles implicated in metastasis biology that included a) the 70-gene outcome predictor developed by van't Veer et al [10,11], b) the 'wound-response' profile , c) the hypoxia-induced cell line signature , d) the 11-gene BMI/stem cell signature , e) a bone metastasis signature , f) a lung metastasis signature , and g) the expression profiles of HIF1α, Snail  and Twist ; we extracted as many genes as was possible from our microarrays for each predictor and followed the classification scheme described by the authors. For the bone metastasis signature , we created an average value for each patient using the 43 genes that were highly expressed in the cell line derivatives that metastasized to the bone; we performed a similar analysis for the lung metastasis signature .
Lastly, for the 11-gene stem cell signature , we created an average value across all 11 genes. We also created a 'glycolysis-profile' by starting with the nine glycolysis genes/probes present on the array, then filtering for probes that showed > 30 intensity units in both channels, and then selecting for 70% good data across all samples; next we selected the subset of glycolysis gene probes that passed filtering and showed a Pearson correlation of greater than 0.4, which resulted in the selection of six out of nine glycolysis genes (GPI, PKM2, PFKP, PGK1, GAPD, ENO1), which were then used to create an average profile for each patient.
We examined correlations between profiles using multiple methods (Additional file 1): for quantized profile testing, Chi-squared analysis and Fischer's exact test were used. For continuous variable testing, ANOVA analyses were performed. Finally, we also performed a calculation of the Cramer's V statistic for the evaluation of the strength of association between two quantized variables (see Oh et al ).
Univariate Kaplan-Meier analysis was performed with a log-rank test using WinSTAT for excel. Multivariate analysis of the NKI295 test set using Cox proportional hazards modeling was conducted in SAS version 9.1; a Cox hazard model was tested that included estrogen receptor status (coded as positive vs. negative), tumor size (coded as ≤ 2 cm vs. > 2 cm), lymph node status (codes as 0, 1–3, > 3 positive nodes or M = 1), age (continuous variable, formatted in decades), grade (coded as grade 1 vs. 2, and grade 1 vs. 3), and treatment (coded as yes if treatment with chemo and/or hormonal therapy, no if no adjuvant therapy was given), and the VEGF profile of low, intermediate or high as a single categorical variable. Another Cox model was also tested that included all the clinical variables, the VEGF profile, and other expression predictors [11,13,18,19,28,35].
In situ hybridization (ISH) on tissue microarrays containing 250 different human breast tumors (not related to the 146 patients used for microarray analysis) was performed as previously described . Briefly, digoxigenin (DIG)-labeled sense and anti-sense RNA probes are generated by PCR amplification of approximately 450 bp products with the T7 promoter incorporated into the primers; the primer sequences used for amplification were VEGF (Forward-tctccctgatcggtgacagt, Reverse-tcgaaaaactgcactagagacaa), ANGPTL4 (Forward: gggaatcttctggaagacctg, Reverse-tacacacaacagcaccagca) and ADM (Forward-gtgtttgccaggcttaagga, Reverse-tcggtgtttccttcttccac). In vitro transcription was performed with a DIG RNA-labeling kit and T7 polymerase according to the manufacturer's protocol (Roche Diagnostics, Indianapolis, IN). Immunohistochemistry (IHC) was performed for HIF1α using Mouse Anti-Human HIF1α (BD Biosciences #610958) according to the protocol from Vleugel et al ; the tumors were scored for perinecrotic and diffuse staining as described in Vleugel et al.
To identify gene expression patterns associated with breast cancer metastases, we performed 195 microarrays representing 134 primary tumors, nine regional metastases and 18 distant metastasis specimens (146 different patients and 10 normal breast tissues). Each patient was classified according to a MetScore, which is roughly analogous to stage except that tumor size was not considered (see Methods). As expected, this scoring system was highly predictive of patient outcomes (Figure (Figure1A1A and and1B).1B). Using the MetScore classifications, we performed CV analyses to determine if any MetScore group might be distinct relative to the others. Low accuracy rates (56% to 65%) for the prediction of MetScore 1 vs. MetScore 2 specimens were observed; however, when MetScore 1 vs. MetScore 3 (80% to 85%) or MetScore 2 vs. MetScore 3 samples (81% to 83%) were compared, high accuracy rates were obtained, which suggests that MetScore 3 was the most distinct group.
Next, we performed a multi-class significance analysis of microarray  analysis using a single sample from each of the 146 patients and the MetScore 1-2-3 grouping and obtained a 1195 genes at a 5% false discovery rate. This gene set was then used in a one-way average linkage hierarchical clustering analysis (Figure (Figure22 and Additional file 2) where the samples were first ordered according to MetScore, and then according to their correlation to the average profile (that is, centroid) of the MetScore 3 class. This analysis demonstrates that some MetScore 1 and 2 samples actually have a MetScore 3 profile; a similar result has been shown before by Ramaswamy et al .
The gene expression patterns from this SAM analysis were complex and there were few, if any, that directly correlated with a simple progression from MetScore 1 to 2 to 3. Included within this gene set were many clusters and/or gene sets that have been identified previously, including a luminal/ER+ pattern [11,39,40] and a proliferation signature [41,42], both of which are integral parts of a gene expression assay that predicts the likelihood of recurrence in ER+ and patients treated with tamoxifen . In addition, many other biologically important gene sets were identified, including an 'immediate early' gene cluster containing c-FOS and JUNB (Figure (Figure2A)2A) , a set of fibroblast genes containing PLAU, THSB2 and multiple collagen genes (Figure (Figure2B),2B), a set of immune cell genes (Figure (Figure2D),2D), and a gene set containing CXCL12 (Figure (Figure2C);2C); CXCL12 was the top-ranked gene from this SAM analysis and was recently identified as a chemokine whose high expression promotes tumor cell proliferation, migration and invasion . Analysis of these individual clusters by EASE , with both EASE score and Bonferroni < 0.05 used as the cut off, identified many significant gene ontology categories that included 'transcription regulation' and 'DNA/nucleic acid binding' for the FOS-JUN cluster, while the fibroblast cluster was over-represented for 'extracellular matrix', 'cell adhesion and communication', 'organogenesis', 'development', and 'regulation of protease activity'. The CXCL12 cluster was over-represented for 'cell adhesion', 'cell migration' and 'extracellular matrix'. Lastly, a small 13-gene cluster containing VEGF, Adrenomedulin (ADM) and Angiopoietin-like 4 (ANGPTL4) was identified as the 'VEGF-profile' (Figure (Figure2E),2E), which is discussed below in greater detail.
Our previous work identified five 'intrinsic' subtypes of breast cancer that are of prognostic and predictive value [18,41,45]. Subtype classification of the tumors using the centroid predictor from Hu et al  showed significant outcome predictions (Figure (Figure1C1C and and1D).1D). A Chi-squared test (p = 0.0006) showed that intrinsic subtype was significantly correlated with MetScore, with the Basal-like and HER2-enriched groups being the most frequent in MetScore 3 and with no Luminal A samples being in the MetScore 3 group. Correlations between tumor subtype and stage have been described [46,47], and were recapitulated here.
A small cluster of genes containing VEGF was identified (Figure (Figure2E)2E) that showed high expression in MetScore 3 tumors. This gene cluster contained several secreted proteins that have been implicated in endothelial cell (VEGF and ANGPTL4), lymphatic cell (ADM) and smooth muscle cell (GAL) dynamics. As a step in evaluating this profile, we performed ISH to determine what cell type was producing VEGF, ANGPTL4 and ADM. In the vast majority of cases that showed strong ISH positivity (which totaled approximately 10% of the 250 tumors tested), it was the tumor cells themselves that produced the mRNA for these three genes, and typically all three were produced (Figure (Figure3).3). In a few cases, both tumor and fibroblasts showed ISH positivity, but this was rare.
As a second step in the evaluation of the VEGF profile, we created an average expression ratio of the 13 genes for each patient and looked for correlations with outcome. By dividing the patients into low, intermediate and high-expression groups using relapse-free survival (RFS) and cutoffs determined by X-tile , we saw that the VEGF profile was prognostic of RFS (Figure (Figure4A)4A) and overall survival (data not shown) with the high expression portending a poor outcome. Rank order expression classifications (two or three groups) were also robust methods of predicting outcomes (Additional file 3). Applying the VEGF profile classification rules to an independent test set of 295 patients (that is, NKI295) [10,28] also significantly predicted outcomes (Figure (Figure4B),4B), as did rank order classifications (Additional file 3). This classification rule was also of prognostic value on a set of lung carcinoma samples (Figure (Figure4C4C and Additional file 3E and 3F), although there were too few 'low' samples to be included into the Kaplan-Meier plot analysis, and on a set of patients with glioblastoma (Figure (Figure4D4D and Additional file 3G and 3H); we noted that two genes (ANGPTL4 and C14ORF58) were not found on the Affymetrix platform for lung and glioblastoma test data sets. However, the Pearson correlation is 0.992 (UNC training dataset) and 0.986 (NKI295 test dataset) respectively between the average of the 13 genes and that of the 11 genes (omitting ANGPTL4 and C14ORF58). We repeated the survival analysis for the UNC dataset and NKI test set again using the 11 genes and the results were very similar (data not shown).
A multivariate Cox proportional hazards analysis on the NKI295 test set using RFS was performed using clinical variables and the VEGF profile, and it was determined that the VEGF profile was a significant predictor of RFS (Table (Table1).1). In Fan et al , we evaluated the prognostic powers and concordance across multiple expression predictors including the intrinsic subtypes, the NKI 70 gene signature, a microarray-based version of the Genomic Health Inc. Recurrence Score, and the wound-response profile using this same NKI patient data set, and we have also identified other profiles of prognostic significance including an estrogen pathway  and p53 mutation profiles ; therefore, we performed a Cox proportional hazards analysis (Table (Table2)2) with backwards variable selection (Table (Table3)3) to evaluate a model that contained all of the aforementioned gene expression predictors and clinical variables. The final model contained both clinical parameters and multiple gene expression predictors including the VEGF profile (Table (Table3).3). Similar results were obtained when using time to distant metastasis formation, or overall survival (data not shown), or when treating the VEGF profile as a continuous variable (Table (Table44).
A biological implication of the VEGF profile is that it is related to a tumor's response to hypoxic conditions, which historically has been referred to as the Warburg effect [49,50]. A central tenant of the Warburg effect is that a tumor's metabolism becomes more dependent upon glycolysis due to anaerobic conditions. To examine glycolysis using a genomic approach, we created a 'glycolysis-profile' using the six most highly correlated glycolytic enzyme probes (GPI, PKM2, PFKP, PGK1, GAPD, ENO1, Figure Figure5A);5A); the VEGF profile and the six best glycolysis probes were highly correlated (p < 0.001, Table Table55).
HIF1α is a known regulator of VEGF expression, and therefore we determined that HIF1α mRNA gene expression was correlated with the VEGF profile (p = 0.0004; Table Table5);5); in addition, 'perinecrotic' HIF1α IHC staining as defined by Vleugel et al  was also assayed on a subset of 66 of these tumors and was correlated with expression of the VEGF profile (ANOVA p-value = 0.018, data not shown), while a 'diffuse' HIF1α IHC profile was not. Next, the promoter region of each of the genes in the VEGF profile was examined using the program rVISTA  and showed that DDIT4, VEGF, NDRG1, SLC16A3, PLOD, ADM, ANGPTL4 and C14ORF58 all had hypoxia response elements within 2000 bp upstream of their start codons; it is already known that many of these genes including VEGF , ADM , and DDIT4  are HIF1α-regulated. Nearly identical genomic results were also obtained from the NKI295 test set (Figure (Figure5B5B).
A fibroblast/mesenchymal signature was another profile that changed with MetScore (Figure (Figure2B),2B), and thus to examine the potential fibroblast cell content present within each MetScore group we determined each patient's average expression value of the genes contained with the cluster presented in Figure Figure2B.2B. This gene set contains fibrillin, fibroblast activation protein alpha, six collagen protein subunits and versican, which are genes and/or proteins that are typically produced by fibroblast and/or mesenchymal cells . This analysis shows that the fibroblast profile is correlated with intrinsic subtype (Table (Table5,5, p = 0.012) and that the MetScore 3 samples had the lowest average expression compared with the MetScore 1 and 2 samples (ANOVA p-value = 0.005, data not shown). Pathological examination of H&E sections of the distant metastasis samples also supports this conclusion and shows scant admixed mesenchymal cells in the distant metastasis samples versus their primaries that show abundant admixed mesenchymal cells (Figure (Figure66).
We examined whether the intrinsic subtypes, the MetScore classification, and the VEGF signature correlated with any of the following expression profiles that have been associated with metastatic potential: a) the NKI 70-gene predictor [10,11], b) the 'wound-response' profile , c) a cell line-derived hypoxia profile , d) an 11-gene BMI/stem cell signature , e) a bone metastasis signature , f) a lung metastasis signature , g) a hypoxia metagene , and h) the expression profile of three individual genes (HIF1α, Snail , and Twist ). These analyses identified a large amount of concordance across profiles (Table (Table5).5). For example, the breast tumor subtype was significantly correlated with the bone and lung profiles, Snail expression, and the 11-gene stem cell signature; in particular, the bone and lung profiles were associated with both ER-negative subtypes (Basal-like and HER2-enriched), and Snail expression and the 11-gene stem cell signature were the highest within the Basal-like subtype. Similar results were also observed when the VEGF profile was compared with the other profiles. Two 'hypoxia signatures' have been described and shown to be of prognostic value across a variety of tumor types including breast [33,56]; the large signature of Chi et al  showed a four-gene overlap with the VEGF profile (ADM, NDRG1, DDIT4 and ANGPLT4) while the 'hypoxia-metagene' of Winter et al  showed a three-gene overlap (VEGF, NDRG1 and ANGPLT4); as might be expected, all three of these profiles were correlated (Table (Table5,5, p < 0.0001).
We took a genomics approach to study metastasis biology and classified patients with breast cancer according to the presence and location of their metastases (that is, MetScore). The resulting analyses showed that the most distinct group with the most distinguishing features were the distant metastases; few differences were seen between primary tumors and regional metastases, as has been shown before . When the set of genes that were correlated with MetScore was determined, many previously known gene sets were identified including proliferation , ER status [11,39,40], and fibroblast and/or mesenchymal genes [55,59]. Notable distant metastasis features included the low expression of fibroblast genes (and a corresponding paucity of fibroblasts as defined by histological examination) and the high expression of the VEGF profile. The VEGF profile represents a in vivo defined gene expression program that includes a combination of cell-intrinsic and cell-extrinsic factors. The cell-extrinsic factors have known roles as inducers of endothelial cell growth (VEGF and ANGPTL4), inducers of lymphatic vessel growth (ADM) , and smooth muscle cell dynamics (GAL); thus, the expression of this gene set would appear to increase the likelihood of tumor survival by causing de novo vessel formation and providing a dual conduit for metastatic spread. The cell-intrinsic factors include the high expression of SLC16A3, whose function is to efflux the lactic acid out of the cell that occurs during high glycolytic activity, and the expression of NDRG1, which is a known hypoxia-inducible gene [61,62]. In addition, the tumors that highly express the VEGF profile also highly express glycolytic enzymes. In total, our data suggests poor-outcome distant metastasis samples have the intrinsic ability to promote vessel formation, the intrinsic ability to live under anaerobic conditions, and have lost dependence upon fibroblasts.
Many genomic profiles for breast tumor metastasis biology have been identified, and we therefore compared them with each other and determined that significant correlations exist. In particular, all metastasis profiles tested correlated with 'intrinsic subtype'. For example, the Basal-like subtype showed significant correlation with the 11-gene stem cell profile, the lung and the bone metastasis profiles (consistent with these observations, one of the MetScore 3 Basal-like patients had distant metastases present in the bone, lung and liver). The Basal-like subtype also showed high expression of Snail, and Basal-like tumors have been shown to have other features of epithelial-mesenchymal transition  including vimentin expression .
The VEGF profile showed very significant prognostic value when using primary tumors, even when tested in models that contained many other expression predictors and clinical variables. We also believe it possible that the VEGF profile may have predictive value for angiogenesis inhibitors because it contains VEGF and ANGPTL4, which are inducers of angiogenesis. How, or if, the VEGF profile is correlated with response to angiogenesis inhibitors remains to be determined; however, our profile does suggests that effective anti-angiogenesis therapies for patients who express this profile may need to extend beyond VEGF to include the simultaneous targeting of ANGPTL4 and/or ADM.
CV: Cross-Validation; DIG: Digoxigenin; DWD: Distant Weight Discrimination; ER: Estrogen Receptor; IHC: Immunohistochemistry; ISH: In-Situ Hybridization; MetScore: Metastasis Scoring System; NKI: Netherlands Cancer Institute; RFS: Relapse-Free Survival; SAM: Significance Analysis of Microarrays; UNC: University of North Carolina; VEGF: Vascular Endothelial Growth Factor.
CMP and ZH have filed a patent application for the use of the VEGF profile for breast cancer prognosis.
ZH, CF, XH, CL, DSO, SS, RW, FI and CMP made substantial contributions to the concepts, acquisition and analysis of the data. ZH, DSO, MGE, LAC, OIO, MVDR and CMP contributed significantly to the drafting of the manuscript and its intellectual content. All authors have read and approved the final manuscript.
The pre-publication history for this paper can be accessed here:
Supplementary table. Summary of GEO submission of 202 microarrays used in this paper and clinical data of the patients in this study.
Figure S1. The complete cluster diagram of all 146 patients using the 1195 gene. MetScore-associated gene list.
Figure S2. Univariate Kaplan-Meier survival plots for patients stratified using the VEGF profile based upon rank order expression on the A-B) UNC training data set, C-D) NKI test data set, E-F) Bhattacharjee et al  lung carcinoma data set, and G-H) Nutt et al  glioblastoma data set.
We thank J. Chuck Harrell for reviewing this manuscript. This work was supported by funds for CMP from the NCI Breast SPORE program to UNC-CH (P50-CA58223-09A1), by NCI (RO1-CA-101227-01), by the Breast Cancer Research Foundation and by the V Foundation for Cancer Research. LAC was supported in part by the National Institutes of Health (M01RR00046), and OIO was supported by a Disparities Center Project Grant NIEHS P50 ES012382.