Although there have been numerous studies of clinical factors addressing the relationship between age at diagnosis and breast cancer prognosis [
12,
14,
16,
53-
55], few studies have comprehensively investigated the age dependency of the many well-established prognostic breast cancer biomarkers, and no studies have used a prospective study design [
13,
18]. Motivated by the established inverse relationship between ER status and poor-risk biomarker surrogates of breast cancer proliferation and genomic instability [
13,
18], the present study aimed to identify genomic and transcriptome changes associated with aging using DNA and RNA prospectively collected from stage-matched and histology-matched ER-positive breast cancers from younger women (age ≤ 45 years) and older women (age ≥ 70 years), analyzed by array CGH and high-throughput expression microarrays.
Similar bioinformatics-based approaches have been used to characterize aging effects in human fibroblasts [
5,
6], lymphocytes [
5] and myoblasts [
56]; however, comparable efforts to investigate aging influences on human cancer biology have not been reported. Moreover, while ER-positive breast cancers have been well studied as a subgroup within unselected breast cancer phenotypes using array CGH [
28,
57] or expression profiling [
38,
39,
49-
51], the present study represents the largest study reported to date using these powerful techniques to subset ER-positive breast cancers, while employing a statistical design powered to detect age-specific differences.
Array CGH analysis of 71 DNA samples confirmed that our ER-positive breast cancers were composed of two basic genotypes [
28]: a simple subtype characterized by few genomic copy number changes other than gain of 1q and loss of 16q, and a mixed amplifier subtype characterized by recurrent amplifications but otherwise low levels of genomic gains and losses. A third genomic subtype of breast cancer, referred to as complex, known to be almost exclusively composed of ER-negative breast cancers [
28], was not observed in either of the two age cohorts studied. Neither the simple nor the mixed amplifier genomic subtypes of ER-positive breast cancer showed any particular age bias. Direct comparison of the two age cohorts for multiple array CGH parameters also revealed no significant differences in the fraction of genome altered, in whole chromosome changes or in total or site-specific amplicon frequencies. Although nonsignificant trends suggested slightly fewer oncogene amplifications within the older cohort, overall amplification frequencies for the most common oncogenes were as expected for ER-positive breast cancers [
51,
58]:
MYC (27%),
CCND1 (23%),
ZNF217 (17%),
AIB1 (16%),
MDM2 (8%),
ESR1 (7%),
ERBB2 (7%), and
TOPO2A (7%). At the level of genomic resolution (~1 Mb) achievable by BAC-based array CGH, there appeared to be few, if any, genetic differences between ER-positive breast cancers arising in women whose ages differ by more than 25 years. Future studies employing higher-density genomic arrays are warranted to confirm this conclusion.
Microarray profiling of 101 RNA samples showed an average 65-fold range in
ESR1 transcript levels across the entire collection of ER-positive breast cancers, with the older cohort showing significantly higher
ESR1 levels as compared with the younger cohort, consistent with earlier biomarker studies [
13]. There was the expected close correlation between the
ESR1 transcript levels and commonly observed
ESR1 coexpressed genes (for example,
GATA3) as well as other genes (for example,
KRT8,
KRT18) that characteristically define luminal-type breast cancer, although this tumor collection also contained several ERBB2-positive cases (10/101) that are not characteristically found in microarray-defined clusters of luminal-type breast cancer [
38,
39,
49-
51]. Hierarchical clustering of the ~5.1 K variably expressed genes also identified six transcriptome subtypes of ER-positive breast cancer with significant age biases (
P < 0.05) but not associated with differing PR status. Based on relapse-free survival analyses of the 54 cases with known clinical outcome (30 younger women, 24 older women), there was a trend supporting a less favorable prognosis for the younger age cases (
P = 0.09) and PR-negative cases (
P = 0.08). The six age-biased transcriptome clusters, however, showed significantly different relapse-free survival outcomes (
P = 0.025, log-rank analysis), suggesting that these transcriptome subtypes represent clinically relevant phenotypes of ER-positive breast cancer. Previous expression array studies analyzing fewer ER-positive cases have identified no more than two or three subsets of luminal-type breast cancer [
38,
39,
49-
51].
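As an illustration of the log-rank comparison referred to above, the following is a minimal Python sketch of a two-group log-rank test. It is a simplification under stated assumptions: the analysis reported here compared six clusters, whereas this sketch compares only two groups, and the function name, argument names and follow-up values are hypothetical and not taken from the study data.

```python
from math import erfc, sqrt


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (illustrative sketch, not the study's code).

    times_*  : follow-up times, one per case
    events_* : 1 if relapse was observed at that time, 0 if censored
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only time points with at least one observed event contribute.
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic undefined: zero variance")

    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up follow-up times (months) for two hypothetical clusters.
    chi2, p = logrank_two_groups(
        [5, 8, 12, 20, 30], [1, 1, 1, 0, 1],
        [15, 25, 40, 50, 60], [1, 0, 0, 1, 0],
    )
    print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

A comparison of more than two clusters would require the multi-group form of the statistic, with degrees of freedom equal to the number of groups minus one.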
Reported gene signatures representing luminal, proliferation and MAPK markers were tested for their enrichment in one or the other of the age-stratified cohorts, and only the proliferation gene signature showed any significant age bias when multiple testing was accounted for, being more highly expressed in the younger cohort. This finding is consistent with earlier studies showing higher tumor grade and proliferation markers (for example, mitotic index and Ki-67 positivity) in younger age breast cancer patients [
13]. While none of the >1,000 curated gene sets in the Molecular Signatures Database that were similarly evaluated demonstrated any significant age biases when multiple testing was accounted for, a trend was observed for enrichment of cell cycle genes in the younger cohort cases. Nine genes common to both the GO biological process cell cycle set and the proliferation signature set (
BUB1,
CCNB1,
CCNE2,
CDC25A,
CDC7,
MAD2L1,
MCM4,
ORC6L,
PTTG1) were also present in our significant probe set. Among these, four genes (
BUB1,
CCNE2,
MAD2L1,
ORC6L) have been previously associated with poor-prognosis ER-positive breast cancers in a well-established 70-gene prognostic signature [
58]; these genes are therefore probably important contributors to the more aggressive tumor characteristics of ER-positive breast cancers arising in younger patients.
Using only the proliferation gene signature to perform unsupervised hierarchical clustering of the 101 cases generated two comparably sized ER-positive subsets, one with higher expression and another with lower expression of the proliferation genes; the higher expressing subset contained most of the younger age cases (34/52) and all but one of the ERBB2-positive cases. When this proliferation signature was also used to dichotomize the 54 cases with known clinical outcome, the higher expressing cases showed significantly worse disease-free survival as compared with the lower expressing cases, consistent with reports on the association of a similar proliferation signature with poor outcome in patients with ER-positive breast cancer [
59]. Interestingly, despite a presumed mechanistic link between activation of growth factor receptors, MAPK signaling and cell proliferation, there was minimal overlap between genes in the reported MAPK and proliferation signatures, and no significant association was observed between the MAPK signature, age and ERBB2 positivity.
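The idea of splitting cases into higher and lower expressing subsets by a gene signature can be sketched as follows. This is an assumption-laden illustration, not the method used here: the study applied unsupervised hierarchical clustering, whereas the sketch substitutes a simpler mean signature score with a median cutoff. The function names, sample identifiers and expression values are hypothetical; only the gene symbols come from the text.

```python
from statistics import mean, median


def signature_score(sample_expr, signature_genes):
    """Mean expression of the signature genes present in one sample."""
    values = [sample_expr[g] for g in signature_genes if g in sample_expr]
    if not values:
        raise ValueError("sample contains none of the signature genes")
    return mean(values)


def dichotomize_by_signature(expr_by_sample, signature_genes):
    """Label each sample 'high' or 'low' relative to the median score.

    Note: a median split stands in for the hierarchical clustering
    described in the text; it is a simplification for illustration.
    """
    scores = {
        sample: signature_score(expr, signature_genes)
        for sample, expr in expr_by_sample.items()
    }
    cutoff = median(scores.values())
    return {s: ("high" if sc > cutoff else "low") for s, sc in scores.items()}


if __name__ == "__main__":
    # Made-up, centered log-expression values for four hypothetical tumors.
    toy_expression = {
        "T1": {"BUB1": 2.0, "CCNB1": 1.0},
        "T2": {"BUB1": -1.0, "CCNB1": -0.5},
        "T3": {"BUB1": 0.5, "CCNB1": 0.9},
        "T4": {"BUB1": -0.2, "CCNB1": -1.2},
    }
    print(dichotomize_by_signature(toy_expression, ["BUB1", "CCNB1"]))
```

The resulting labels could then be used as the grouping variable in a survival comparison such as a log-rank test.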
Despite the observed positive association between the
ESR1 expression level and older age, no age association was seen for the luminal gene signature that included
ESR1,
ESR1-associated genes and estrogen-inducible genes. This finding is consistent with our previous report showing increased breast cancer ER protein with aging without comparably increased levels of such estrogen-inducible markers as PR, pS2, Bcl2 and cathepsin D [
13], and suggests reduced estrogen signaling in breast tumors of older patients. In keeping with these protein biomarker observations, differential gene expression analysis in the present study did not identify any known estrogen-inducible genes such as
TFF1,
PGR,
IRS1,
IGFBP4,
PCNA,
MYC,
CCNA2 or
DLEU2 as being more highly expressed in the older cohort despite higher expression of
ESR1 in this cohort. In contrast, two estrogen-inducible growth-regulating genes,
GREB1 and
AREG, showed significantly higher expression levels in the younger cohort, in keeping with a recent study demonstrating a negative correlation between these estrogen-inducible genes and age [
20]. As
GREB1 and
AREG are known to induce cell proliferation upon estrogen activation [
60,
61], their increased expression in the younger cohort offers some mechanistic basis for increased proliferative activity and gene expression in the younger cohort.
Of the 75 unique genes differentially expressed between younger and older cohorts, 24 genes showed increased expression in younger cases relative to older cases (including
GREB1 and
AREG) while 51 genes showed increased expression in older cases relative to younger cases (including
ESR1). Comparison with a well-studied estrogen-inducible gene signature set [
20] revealed that ~25% (19/75) of these differentially expressed genes overlapped with known early or late estrogen-responsive genes, and thus potentially reflected hormonal changes associated with menopause rather than aging effects. While two-thirds (13/19) of these potential estrogen-responsive genes showed appropriate directional changes according to cohort menopausal status, supporting this possibility, at least 75% of the differentially expressed genes would appear to be independent of menopausal differences in circulating estrogen levels and, therefore, potentially informative of age-related differences in ER-positive breast cancer biology. A comprehensive database search confirmed that at least 40% of these differentially expressed genes have reported direct links with malignancy; and while none have reported links with premature aging, one of the differentially expressed genes (
KIF2C) has been previously implicated in aging studies of lymphocytes and fibroblasts [
5], while six other genes (
COBLL1,
HPGD,
HOXB2,
PDE4A,
SLC25A12,
TP73L) were recently reported as differentially expressed with age in human skeletal muscle [
62].
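The proportions quoted in the preceding paragraph follow directly from the reported gene counts, as the short Python check below shows. The counts are taken from the text; the variable names are illustrative only.

```python
# Gene counts as reported in the paragraph above.
total_de_genes = 75          # unique differentially expressed genes
up_in_younger = 24           # higher in the younger cohort
up_in_older = 51             # higher in the older cohort
estrogen_responsive = 19     # overlap with the estrogen-inducible signature
direction_consistent = 13    # change in the direction expected from menopausal status

# The two directional groups should account for every gene.
assert up_in_younger + up_in_older == total_de_genes

frac_estrogen = estrogen_responsive / total_de_genes
frac_independent = (total_de_genes - estrogen_responsive) / total_de_genes
frac_consistent = direction_consistent / estrogen_responsive

print(f"estrogen-responsive overlap: {frac_estrogen:.1%}")
print(f"not in the estrogen-responsive set: {frac_independent:.1%}")
print(f"directionally consistent among overlapping genes: {frac_consistent:.1%}")
```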
A search for annotated enrichment of the differentially expressed genes for specific biological processes (GO Biological Processes, Expression Analysis Systematic Explorer score < 0.05) indicated that 'development' and 'cell cycle/M-phase' were the most overrepresented functional gene categories. In keeping with the GSEA observation indicating a trend for enrichment of cell-cycle-associated genes in the younger cohort cases, differentially expressed cell cycle/M-phase genes (including positive regulators such as
STK6,
FGFR1 and
DLG7) represented 20% (5/25) of all genes overexpressed in the younger cohort but only 8% (4/51) of those overexpressed in the older cohort. In contrast, the older cohort cases showed differentially increased expression of negative cell cycle regulators (such as
SASH1 and
RHOB) and four developmentally essential homeobox genes (
HOXB2,
HOXB5,
HOXB6,
HOXB7), the latter finding also in keeping with the GSEA-observed trend showing enrichment in the older cohort of HOX-regulated (NUP98-HOXA9-repressed) genes. Two of the overexpressed HOXB genes (
HOXB6,
HOXB7) have been specifically linked to mammary gland development and are known to be expressed in ER-positive breast cancer cells [
63].
HOXB7, in particular, known to be dependent on stromal (extracellular matrix) signaling, is transcriptionally upregulated in breast cancers metastatic to bone (relative to primary tumors), and is thought to play a role in promoting angiogenesis, growth factor-independent proliferation and DNA double-strand break repair, conferring breast cancer resistance to the genome destabilizing effects of DNA damage [
64].
PAM was used to derive an age signature that consisted of 128 unique genes, including 44 of the 75 differentially expressed genes determined by our conditional permutation approach. The age signature was independently validated against two other age-matched ER-positive breast cancer microarray datasets and proved to have >80% accuracy in distinguishing younger from older ER-positive breast cancer cases. ESR1 and AREG were among the genes in common between the age signature and the differentially expressed gene sets; it is therefore not surprising that the age-signature-defined subsets from the two independent databases showed similar differences in the mean expression levels of these two genes as found in our age-defined cohorts. Only 28% of the age signature genes overlap with known early or late estrogen-responsive genes, suggesting that this age signature largely reflects age-related differences in the phenotype of ER-positive breast cancer rather than differences in circulating estrogen levels associated with menopausal status.
The fact that a PAM-derived PR signature did not perform well upon validation implies substantial heterogeneity among ER-positive breast cancers with the same PR status, and possibly indicates that confounding age-related gene expression changes are of greater biological importance than PR-related gene expression differences. Misclassification errors using the age signature were more prevalent among the older cohort cases, also suggesting greater variation in expression of the age signature genes with aging. Of further interest, the 128-gene age signature was unable to accurately subset ER-negative cases identified from the two independent breast cancer datasets [
47,
48], consistent with expression-array-based conclusions that the biology of ER-positive and ER-negative breast cancers is fundamentally distinct, and supporting the likelihood that the PAM-derived age signature incorporates biological profiles specific to ER-positive breast cancers but not ER-negative breast cancers.