The discovery that gene expression profiles could predict breast cancer outcome has led to widespread use of the technology to develop expression profiles that improve individualized medicine for patients. It also reignited a debate in the literature as to the molecular origins of metastatic capacity (12). The prevailing theory of metastasis, the somatic evolution theory, predicted that only a small subset of tumor cells within the bulk tumor mass would acquire all of the capabilities required to successfully colonize a distant site. The ability of bulk tumor tissue to predict outcome, however, suggested that on average the majority of primary tumor cells had to express the molecular signature of metastasis, which appeared difficult to reconcile with the somatic evolution hypothesis. As a result, some investigators proposed a new hypothesis suggesting that metastatic potential might be encoded early within the tumor, potentially by the original transforming mutations themselves (12). Simultaneously, work in our laboratory demonstrated that the propensity to metastasize was at least in part due to inherited susceptibility (16). This led to an additional hypothesis that could reconcile the data supporting both the somatic evolution and early oncogenesis models. If a significant fraction of the prognostic gene signatures were encoded by inherited germline polymorphism rather than somatic mutation, then the predictive gene signatures would be present throughout the tumor, and metastasis-inducing somatic evolution could subsequently occur in susceptible individuals, resulting in disseminated disease (37).
This hypothesis makes several predictions. The most important is that, if the predictive gene signatures are due in part to inherited polymorphism, the signatures should be detectable in normal, preneoplastic tissue of susceptible individuals. The aim of this study was therefore to test this hypothesis and to evaluate whether the results of our mouse genetic model system of breast cancer progression translate to human clinical samples. To do so, we performed a series of gene expression array analyses to ask the following questions: 1) do gene expression profiles from mouse models of inherited metastasis susceptibility predict outcome in human breast cancer; 2) what are the cellular origin(s) of prognostic gene expression signatures; 3) does germline variation contribute to the induction of prognostic expression patterns in human breast cancer; and 4) if there is indeed an inherited component to such signatures, what are the relative contributions of somatic and inherited factors to the establishment of the predictive expression profiles?
Our strategy was to examine spontaneous tumors, transplant tumors and normal tissues in mouse strains with different genetic susceptibilities to metastatic progression for the presence of gene signatures able to discriminate outcomes in human breast cancer datasets. Our previous studies suggested that, like mice, humans exhibit an inherited genetic susceptibility to metastasis (14). This in turn implied that the prognostic gene expression profiles observed in human breast cancer datasets might be at least partially the result of inherited factors (14). In the current study, we provide further support for the hypothesis that metastasis susceptibility is a complex heritable trait. More significantly, we provide evidence supporting our hypothesis that metastasis-predictive microarray gene expression signatures, which are currently being evaluated as potential prognostic tools in the clinical setting, may be partially driven by host germline polymorphism.
To investigate this, we performed microarray analysis to derive a gene expression signature capturing the differences in gene expression between primary spontaneous mammary tumors from mice with a 20-fold difference in metastatic propensity (17). The resulting gene expression signature accurately predicted outcome in four of the five human breast cancer datasets examined. In addition, non-neoplastic tissues from five other organs involved in the process of tumor progression were analyzed to investigate the relative cellular contributions to signatures derived from complex, bulk human tumors. Whole blood, spleen and thymus were chosen to investigate the contribution of hematologically derived cells present within the primary tumor mass. We also characterized gene expression patterns in bone marrow, since these cells have recently been shown to promote metastasis at both the primary tumor (38) and the secondary site (40). Finally, lungs were selected for gene expression analysis since the majority of metastatic lesions in this model system form at this site.
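Testing whether a mouse-derived signature "predicts outcome" in a human dataset can be illustrated with a deliberately simplified sketch. It assumes the signature has already been mapped to human genes and split into genes that are higher (`up_genes`) or lower (`down_genes`) in tumors from the highly metastatic strain; each patient is scored as the mean expression of the up genes minus that of the down genes, and patients are split at the median score. The function names, gene names and toy cohort are hypothetical, and this is not necessarily the scoring procedure used in this study. Formal comparisons of patient outcome would typically use survival methods such as Kaplan-Meier estimates and log-rank tests rather than the raw event rates computed here.

```python
from statistics import mean, median

# Hypothetical, simplified illustration: not necessarily the study's scoring method.
# Each cohort entry maps a patient ID to (expression dict, distant-metastasis event flag).


def signature_score(expression, up_genes, down_genes):
    """Mean expression of 'up' genes minus mean expression of 'down' genes."""
    up = [expression[g] for g in up_genes if g in expression]
    down = [expression[g] for g in down_genes if g in expression]
    if not up or not down:
        raise ValueError("signature genes not measured on this array")
    return mean(up) - mean(down)


def split_by_score(cohort, up_genes, down_genes):
    """Label each patient 'high' or 'low' relative to the median signature score."""
    scores = {
        pid: signature_score(expr, up_genes, down_genes)
        for pid, (expr, _event) in cohort.items()
    }
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def event_rate(cohort, groups, label):
    """Fraction of patients in one group who developed distant metastasis."""
    events = [event for pid, (_expr, event) in cohort.items() if groups[pid] == label]
    return sum(events) / len(events)


# Toy cohort with hypothetical gene names and illustrative values (not study data).
UP_GENES = ["GENE_A", "GENE_B"]
DOWN_GENES = ["GENE_C"]
cohort = {
    "p1": ({"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": 0.1}, True),
    "p2": ({"GENE_A": 1.8, "GENE_B": 1.2, "GENE_C": 0.3}, True),
    "p3": ({"GENE_A": 0.2, "GENE_B": 0.4, "GENE_C": 1.9}, False),
    "p4": ({"GENE_A": 0.1, "GENE_B": 0.3, "GENE_C": 2.2}, False),
}

groups = split_by_score(cohort, UP_GENES, DOWN_GENES)
print(groups)
print("high:", event_rate(cohort, groups, "high"), "low:", event_rate(cohort, groups, "low"))
```

A median split keeps the example simple; a signature that discriminates outcome is one whose high- and low-score groups differ in metastasis-free survival, which in practice is assessed with the survival statistics mentioned above.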
Several important conclusions can be drawn from these experiments. First, as predicted by the genetic predisposition hypothesis, metastasis-predictive gene expression signatures could be derived from a variety of normal, non-neoplastic tissues. Specifically, normal lung, spleen and thymus from mice of differing metastatic propensities exhibited gene expression signatures that could predict outcome in breast cancer. No consistent predictive signal was observed for circulating whole blood or bone marrow, suggesting that these tissues, while potentially critical to the clinical phenotype, may not contribute a large fraction of the expression patterns of most bulk primary tumors. The ability of the lung, spleen and thymus signatures to distinguish patient outcomes suggests that basal epithelial and lymphocyte signals may together account for the majority of the signal observed in bulk tumor tissue.
The cellular origins of the inherited components of the predictive gene signatures were further investigated using a transplant strategy. Previously published analyses and earlier work in our laboratory demonstrated that genes associated with stromal tissues and the immune compartments are frequently dysregulated in tumors that are more prone to metastasize (10). We therefore sought to investigate the relative contribution of these tissues to the signatures by removing a major source of genetic heterogeneity: the tumor epithelium. This was achieved by implanting a highly metastatic, malignant mouse mammary tumor cell line into the mammary fat pad of mice with differing metastasis susceptibilities. The resulting primary tumors were therefore composed of identical tumor epithelium but contained different infiltrating host components from the two mouse genotypes. Thus, any gene expression differences between tumors from different hosts would result directly from host tissue germline polymorphism and/or the reaction of tumor cells to the differing microenvironments.
Based on the presence of numerous host-derived, non-epithelial transcripts in the prognostic signatures, we anticipated that both the spontaneous and transplant tumors would be able to discriminate patient outcome, and this was indeed the case. However, no difference was observed in the metastatic capacity of this tumor cell line, in spite of the previously observed twenty-fold difference in metastatic susceptibility of the host genotypes (17). One possible explanation lies in the highly malignant properties of the Mvt-1 cell line. The influence that host germline polymorphism exerts upon the tumor epithelium may be too subtle to be detected by in vivo orthotopic transplantation assays using a cell line selected for high malignant potential (23). Microarray analysis, however, is a very sensitive means of detecting changes in gene expression. Therefore, the prognostic gene expression signature observed in the Mvt-1 implant tumors likely reflects subtle changes in gene expression resulting from interaction with the different hosts. Alternatively, the effect of inherited polymorphisms on metastatic capacity may be tumor-autonomous, with the prognostic gene expression profile from the transplant tumors arising entirely from the infiltrating host tissues. In that case, although the prognostic signature is apparent in the bulk tumor, the presence of the same highly malignant cell line in both hosts would result in equivalent metastatic capacity. Additional work will be necessary to distinguish between these two scenarios.
Considerable variation in the number of significant probe sets and in the discriminatory ability of the tissue signatures was also observed across the human datasets. We believe that this reflects the underlying heterogeneity of the human populations represented in each dataset, which comprise mixtures of different molecular subtypes and stages. Previous bioinformatic investigation of gene expression signatures demonstrated that different subsets of predictive genes are identified depending on the particular subset of patients analyzed (43). As a result, the different sets of patients included within each dataset, as well as differing experimental variation introduced during array analysis, would be expected to yield different significant subsets of each tissue signature. Given these fluctuations, we included all of these large datasets in the analysis to increase the probability that any observed results were due to a general phenomenon rather than to a dataset-specific effect or to false positives arising from the analysis of only one or a limited number of datasets.
Differences in the clinical characteristics of each patient set may also contribute significantly to the probe set selection and discriminatory ability of each dataset. The dataset from Wang et al. (GSE2034) (1), for example, consists only of untreated lymph node-negative patients, while the other datasets contain a mixture of node-positive, node-negative and adjuvant therapy-treated patients. The GSE2034 dataset therefore represents the natural progression of node-negative breast cancer, since it is not confounded by adjuvant therapy. The Rosetta dataset, in contrast, was designed to develop a discriminatory assay for younger patients (10). The differences in the prognostic ability of our signatures between the datasets may therefore be explained, at least in part, by these confounding variables. Of note, however, the lung expression profile had prognostic value in all of the datasets, regardless of these confounding clinical differences. Since GSE2034 represents the natural progression of node-negative disease, this result supports our hypothesis that germline-encoded transcriptional differences may account for a measurable fraction of the prognostic gene signatures.
Finally, investigations over the past few years into the factors underlying the metastasis-predictive expression profiles have suggested that the prognostic gene signatures may all be sampling the same underlying network (32), most commonly thought to be cell cycle and proliferation (33). The data presented here are consistent with these being important biological functions associated with progression. The signature derived from the spontaneous PyMT-induced tumors of (AKR × PyMT)F1 and (DBA × PyMT)F1 mice was capable of discriminating outcome in four of the five human datasets and was trending toward significance in the GSE2034 dataset (). Removing potential differences in the proliferative capacity of the tumor epithelium resulting from constitutional polymorphism, by implanting the same cell line into non-transgenic hosts, eliminated any trend in GSE2034 () and somewhat reduced the risk ratio in both GSE3494 and GSE4922 (). Similar results were observed when proliferation-associated genes were stripped out of the lung gene expression signature (figure S7).
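Stripping proliferation-associated genes from a signature amounts to filtering it against an annotation set before re-testing it against patient outcome. The following is a minimal sketch, assuming the signature is held as a mapping from gene symbol to direction of change and that a set of proliferation-annotated genes (for example, drawn from a cell-cycle annotation resource) is available. The function name, the placeholder genes and the case-insensitive symbol matching are hypothetical choices, not the procedure used in this study.

```python
# Hypothetical sketch: remove genes annotated to a proliferation network from a signature.
# Upper-casing symbols is only a crude stand-in for proper mouse-to-human ortholog mapping.


def strip_annotated_genes(signature, annotated):
    """Return (truncated signature, sorted list of removed genes).

    signature: dict of gene symbol -> direction (+1 higher, -1 lower in the
               high-metastatic strain)
    annotated: set of gene symbols annotated to the proliferation network
    """
    annotated_upper = {g.upper() for g in annotated}
    kept = {g: d for g, d in signature.items() if g.upper() not in annotated_upper}
    removed = sorted(g for g in signature if g not in kept)
    return kept, removed


# Illustrative placeholder genes, not the actual lung signature.
lung_signature = {"Gene1": 1, "Prolif1": 1, "Gene2": -1, "Prolif2": -1}
proliferation_genes = {"PROLIF1", "PROLIF2", "PROLIF3"}

truncated, removed = strip_annotated_genes(lung_signature, proliferation_genes)
print("kept:", truncated)
print("removed:", removed)
```

Because a filter of this kind can only remove genes that are already annotated, any gene whose role in cell growth is unrecognized would remain in the truncated signature, so the retained signal cannot automatically be assumed to be independent of proliferation.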
The ability of the Mvt-1 transplant and truncated lung signatures to predict outcome in datasets other than GSE2034, however, raises the possibility that other biological networks may also be predictive of breast cancer outcome. Several possibilities need to be considered. First, these other pathways may not be causative factors predicting outcome: the same polymorphic differences that drive the predictive proliferation-associated gene sets may also affect the other networks as a bystander effect. Second, they may be causative factors that have not been detected as a common mechanism in analyses of the human datasets because of the dominant effect of the cell proliferation pathway and/or because they act only in subsets of the human population. Third, the genes remaining in the Mvt-1 and truncated lung profiles may in fact be members of the proliferation network that have not been annotated as such, either because their functional significance in cell growth is as yet unrecognized or because current annotations are incomplete. While it is not possible to definitively distinguish between these possibilities at this time, we favor the first two. Previous studies have demonstrated that expression profiles are predictive independently of standard clinical measures, including mitotic index. This suggests either that the signatures are a much more accurate measure of proliferation than standard immunohistochemistry, or that they measure factors in addition to cellular growth. However, additional studies will be necessary to investigate and definitively resolve these possibilities.
In summary, these results provide additional evidence for the role of inherited factors in human breast cancer progression. They also suggest that the prognostic gene signatures currently in clinical trials likely result from a complex mixture of somatic and inherited factors present not only in the tumor epithelium but also in infiltrating non-neoplastic cells. Further investigations will hopefully improve our understanding of the relationship between these various factors in both the tumor epithelium and the infiltrating non-neoplastic tissues, with the goal of improving current prognostic tools and developing more effective therapeutic strategies.