Three pathways and one signaling cascade are highlighted in this study. The top ranked pathway in this study, ‘syndecan-1 signaling’, contains 13 genes involved in different cellular processes that are mediated by syndecan-1 [SDC1]. This gene encodes a transmembrane heparan sulfate proteoglycan that mediates signal transduction cascades leading to cell proliferation, cell migration and cell adhesion following interactions with extracellular matrix proteins. There are multiple lines of evidence for a potential role of syndecan-1 in breast cancer development. For example, altered syndecan-1 expression has been detected in several different tumor types and has been linked with unfavourable breast cancer prognosis (34). Additionally, expression of syndecan-1 by stromal fibroblasts has been shown to promote breast carcinoma growth in vivo and to stimulate tumor angiogenesis (35). In this GWAS, SDC1 was only moderately associated with breast cancer susceptibility (Ptrend = 0.019 for rs7563245); however, it is a key mediator of different pathways illuminated in this study (e.g. ‘c-Met Signaling’ and ‘Fibroblast Growth Factor Signaling’; see discussion below). Therefore, it can modulate breast cancer susceptibility through different biological mechanisms.
The second highest ranked pathway in our analysis, ‘c-Met signaling’, consists of 33 genes participating in signal transduction mechanisms induced by the tyrosine-kinase proto-oncogene c-Met [MET]. Stimulation of the c-Met pathway can lead to several different cellular processes related to tumor growth and progression, such as proliferation, enhanced cell motility, invasion, and apoptosis (36). Moreover, both c-Met and its ligand, hepatocyte growth factor (HGF), have been shown to be dysregulated and correlated with poor prognosis in a number of human malignancies, including breast cancer (37). Consequently, this pathway has served as an important therapeutic target for human cancers, particularly through the development of small molecules that inhibit c-Met/HGF-dependent signaling (37). Notably, co-expression of SDC1 and MET, the key players in the top two ranked pathways in our study, has been established as a marker signature associated with poor prognostic factors in ductal carcinoma in situ (DCIS) of the breast (38).
The third ranked pathway in our study, ‘GH Signaling’, contains 22 genes participating in cellular mechanisms induced by either growth hormone or insulin receptors. These two receptors, as well as the insulin-like growth factor (IGF) receptor, are all transmembrane tyrosine kinase receptors that induce cell growth and proliferation. Alteration in the activity of these receptors or their related pathways may lead to hyperplasia and eventually to the development of a tumor (39). Naturally, the ‘GH signaling’ pathway had considerable overlap with both the ‘IGF-1 signaling’ and ‘insulin signaling’ pathways from BioCarta (32% and 41%, respectively); however, only the latter had a significant enrichment score in our study (PES = 0.0064). Examining the genes of these three closely related pathways revealed that what differentiates their respective enrichment scores is a SNP in the insulin receptor gene [INSR] with a small p-value (Ptrend = 0.0019 for rs12460755) that is absent from the ‘IGF-1 signaling’ pathway.
An interesting pathway related to syndecan-1 signaling is ‘fibroblast growth factor (FGF) signaling’. This pathway was ranked 10th in our study (PES = 0.0219) in spite of the exclusion of the FGFR2 SNPs from our analyses (see Methods). Adding the FGFR2 SNPs to this pathway improved its enrichment signal (PES = 0.0053), ranking it 4th out of the 421 pathways. This finding, in combination with the FGFR2 signal in CEGEMS (18) and elsewhere (17), suggests that variations in other genes involved in FGFR2 signaling may modulate breast cancer susceptibility.
An important extension to the gene-set enrichment analysis was the clustering analysis, which highlighted the RAS/RAF/MAPK canonical signaling cascade as the common denominator of pathways associated with breast cancer risk in this study. This cascade plays an essential role in transmitting extracellular signals from growth factors to promote the growth, proliferation, differentiation and survival of cells, and modification of its activity has been linked to multiple human malignancies (40). Notably, this cascade plays an important role in all three top pathways in this study. It is an integral component of the ‘Signaling of Hepatocyte Growth Factor Receptor’ and ‘Growth Hormone Signaling’ pathways, and a transducer for many of the signals initiated by the Syndecan-1 pathway.
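The idea of looking for a common denominator across enriched pathways can be sketched in a few lines of Python. This is only a minimal illustration, not the clustering procedure used in this study: the function name `shared_genes`, the pathway labels and the gene labels are all hypothetical placeholders and do not reflect the real pathway contents.

```python
from collections import Counter


def shared_genes(pathways, min_pathways=2):
    """Return {gene: n} for genes present in at least `min_pathways` gene sets."""
    # Count each gene once per pathway, even if a gene list contains duplicates.
    counts = Counter(gene for genes in pathways.values() for gene in set(genes))
    return {gene: n for gene, n in counts.items() if n >= min_pathways}


# Hypothetical placeholder gene sets (assumption: NOT the real pathway contents).
top_pathways = {
    "pathway_A": {"GENE1", "GENE2", "GENE3"},
    "pathway_B": {"GENE2", "GENE3", "GENE4"},
    "pathway_C": {"GENE3", "GENE5"},
}

common = shared_genes(top_pathways)
```

Genes that recur across several top-ranked gene sets are natural candidates for a shared signaling component, which is the role the RAS/RAF/MAPK cascade plays in the discussion above.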
An important limitation of the pathway-based approach for GWAS analysis is the incomplete annotation of the human genome. At present, the function of many human genes is unknown and therefore these genes cannot be assigned to known pathways. Moreover, susceptibility loci in intergenic regions are also not included in a study of this kind. As a result, when employing this approach, only a small portion of human genome variation can be studied, and therefore it should only be used as a supplementary method to the standard single-SNP analysis of GWAS. Additionally, there is no gold standard for pathway definition, and different databases have different guidelines for pathway construction and curation. Consequently, the gene content of pathways representing the same biological process may vary substantially between databases, and this may have a major impact on the sensitivity and specificity of this approach. We aimed to minimize this effect by selecting pathways from three commonly used and manually curated resources. Still, considerable differences were observed between similar pathways from different resources. For example, the ‘c-Met signaling’ pathway from BioCarta contained 33 genes and was ranked second in our analysis, while a pathway with the same name and presumably the same cellular function in the PID database included 52 genes but was ranked only 69th. Although 30 of the 33 genes in the BioCarta pathway were also included in the PID pathway, the remaining 22 genes likely attenuated the signal in the PID pathway. These differences emphasize the importance of further synthesizing the results of such pathway-based approaches. The clustering analysis we applied in this study is one way to do so, as it helps in finding the common genes underlying enrichment signals in multiple pathways and allows one to focus on a limited number of candidate genes. Other methods aiming to improve the characterization of pathways and their overlap might also be useful.
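The arithmetic behind the BioCarta versus PID comparison can be made explicit with a short Python sketch. It uses only the three counts quoted above (33 genes, 52 genes, 30 shared); the function name `overlap_summary` and the choice of the Jaccard index as a summary measure are illustrative assumptions, not part of the original analysis.

```python
def overlap_summary(size_a, size_b, shared):
    """Summarize the overlap of two gene sets from their sizes and shared count."""
    if shared > min(size_a, size_b):
        raise ValueError("shared count cannot exceed the smaller set size")
    union = size_a + size_b - shared
    return {
        "only_a": size_a - shared,         # genes unique to the first pathway
        "only_b": size_b - shared,         # genes unique to the second pathway
        "frac_a_in_b": shared / size_a,    # share of the first pathway found in the second
        "jaccard": shared / union,         # shared genes relative to all distinct genes
    }


# Counts taken from the text: BioCarta (33 genes), PID (52 genes), 30 in common.
summary = overlap_summary(size_a=33, size_b=52, shared=30)
```

With these counts, about 91% of the BioCarta pathway is contained in the PID pathway, yet 22 of the 52 PID genes fall outside the BioCarta definition, which is consistent with the suggestion that the additional genes attenuated the enrichment signal.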
A second limitation of our study is that it does not include validation of the results using a completely independent dataset. The experiment we conducted using a positive-control pathway with known breast cancer susceptibility genes provides some validation that our approach has high power to detect pathways containing multiple weakly associated susceptibility genes. Therefore, the top ranked pathways in our study should be prime targets for future analyses in independent datasets.
In conclusion, our results suggest that genetic alterations associated with the top three pathways and one canonical signaling cascade in our study may contribute to breast cancer susceptibility. Ultimately, additional studies will be needed to confirm and further explore the genetic variations underlying the association of these pathways with breast cancer. Moreover, this study highlights the potential insight that could be gained from a pathway-based approach as a complementary method to the primary single-SNP analysis of GWAS. In particular, the organization of multiple association signals according to underlying biological processes may improve our understanding of the cellular mechanisms underlying this all-too-common malignancy.