Four loci have been associated with pancreatic cancer through genome-wide association studies (GWAS). Pathway-based analysis of GWAS data is a complementary approach to identify groups of genes or biological pathways enriched with disease-associated single-nucleotide polymorphisms (SNPs) whose individual effect sizes may be too small to be detected by standard single-locus methods. We used the adaptive rank truncated product method in a pathway-based analysis of GWAS data from 3851 pancreatic cancer cases and 3934 control participants pooled from 12 cohort studies and 8 case–control studies (PanScan). We compiled 23 biological pathways hypothesized to be relevant to pancreatic cancer and observed a nominal association between pancreatic cancer and five pathways (P < 0.05), i.e. pancreatic development, Helicobacter pylori lacto/neolacto, hedgehog, Th1/Th2 immune response and apoptosis (P = 2.0 × 10⁻⁶, 1.6 × 10⁻⁵, 0.0019, 0.019 and 0.023, respectively). After excluding previously identified genes from the original GWAS in three pathways (NR5A2, ABO and SHH), the pancreatic development pathway remained significant (P = 8.3 × 10⁻⁵), whereas the others did not. The most significant genes (P < 0.01) in the five pathways were NR5A2, HNF1A, HNF4G and PDX1 for pancreatic development; ABO for H. pylori lacto/neolacto; SHH for hedgehog; TGFBR2 and CCL18 for Th1/Th2 immune response and MAPK8 and BCL2L11 for apoptosis. Our results provide a link between inherited variation in genes important for pancreatic development and cancer and show that pathway-based approaches to analysis of GWAS data can yield important insights into the collective role of genetic risk variants in cancer.
Genome-wide association studies (GWAS) have become the standard for investigating the association between common inherited genetic variants across the genome and risk of complex diseases such as cancer. Two GWAS (PanScan 1 and PanScan 2) recently identified four susceptibility loci for pancreatic cancer at chromosomes: 9q34.2, 13q22.1, 1q32.1 and 5p15.33 (1,2). Despite these important findings, the statistical tests applied in GWAS are typically restricted to single markers; furthermore, some markers and genes may be missed because of the stringent statistical threshold necessary to minimize false-positive findings (genome-wide significance) (3,4). Pathway-based analysis of GWAS data is a complementary approach for identifying groups of genes or biological pathways enriched with disease-associated single-nucleotide polymorphisms (SNPs) whose individual effect sizes may be too small to be detected by standard single-locus methods.
The idea for pathway-based approaches stems from two concepts: (i) that a functional pathway represents a series of biochemical actions leading to an end point or cellular function such as an activated or inactivated enzyme or metabolite, an enhanced or repressed signaling cascade, a repaired DNA strand or a coordinated immune response and (ii) that small changes due to variation in the expression of genes involved in a functional pathway may lead to an outcome such as cancer (5).
We performed a comprehensive pathway-based analysis of the combined dataset of two pancreatic cancer GWAS, PanScan 1 and PanScan 2, using an adaptive combination of P-values in the adaptive rank truncated product (ARTP) method (6). Twenty-three biological pathways and groups of genes known or hypothesized from the literature to be important in pancreatic tumorigenesis were selected, including pancreas development, DNA repair, apoptosis, cell cycle signaling, immune function and inflammatory pathways, insulin resistance, PI3 kinase, Wnt, Notch, hedgehog and transforming growth factor (TGF)-β signaling. We confirmed the major results from the ARTP analysis with a logic regression analysis (7,8).
The study population included 3851 pancreatic cancer cases and 3934 control participants from the previously conducted GWAS in the Pancreatic Cancer Cohort Consortium and the Pancreatic Cancer Case Control Consortium (PanC4) (1,2). Briefly, this collaborative project included 1528 incident cases and 1594 controls from nested case–control studies of 12 cohort studies and 2323 cases and 2340 controls from 8 case–control studies (1,2). Cases were defined as participants diagnosed with primary adenocarcinoma of the exocrine pancreas; controls were matched to cases according to birth year, sex and self-reported race/ethnicity and were free of pancreatic cancer at the time of recruitment (1,2). Genotyping was performed by the National Cancer Institute’s Core Genotyping Facility using the Illumina HumanHap550 and HumanHap550-Duo SNP arrays (PanScan 1) and Illumina Human 610-Quad arrays (PanScan 2) (1,2). PanScan 1 and PanScan 2 were approved by the Institutional Review Board of each participating institution and National Cancer Institute’s Special Studies Institutional Review Board (1,2).
Pathways were chosen on the basis of our current understanding of the etiology and molecular mechanisms of pancreatic cancer with the aim of constructing concise core pathways known to be important for pancreatic biology; 23 pathways or groups of genes were compiled (Table I) based on literature searches and online resources accessed between 2008 and 2010 (e.g. http://www.SNPs3D.org; http://sciencepark.mdanderson.org/labs/wood/DNA_Repair_Genes.html and http://www.genome.jp/kegg/pathway.html). These included pathways related to pancreatic organ development and differentiation, DNA repair, apoptosis, cell cycle regulation, immune response, Helicobacter pylori infection, inflammation, insulin resistance, PI3 kinase, Wnt, Notch, hedgehog and TGF-β signaling pathways. For example, the DNA repair pathway, including its subpathways, was included in this analysis based upon results from previous candidate gene association studies (9–11). Diabetes mellitus (12,13), glucose intolerance (12,14,15) and chronic pancreatitis also appear to predispose individuals to pancreatic cancer (16). Allergies have been associated with reduced risk of pancreatic cancer in several studies (17), but little is known about the genetic basis of this association. Two pathways related to allergies were considered for this study: one including genes related to the balance between T-helper 1 and T-helper 2 cells (Th1/Th2) and the other including genes related to serum IgE levels (18). The pancreas develops from the endodermal epithelium of the foregut of the vertebrate embryo. A series of transcriptional regulators governs the development and cell type differentiation of the gland.
We compiled a list of transcriptional regulators important for early pancreatic development by reviewing the literature and by searching GO and KEGG terms (19–22). Predisposing genetic factors for pancreatic cancer remain poorly understood; however, genes that influence the above risk factors are viable candidates for interrogation. The total number of genes was 577 (of which 4 were included in 3 pathways and 30 in 2 pathways).
SNP association analysis was conducted with use of the logistic regression model using a boundary for each gene beginning 20 kb upstream of the transcriptional start site and ending 10 kb downstream of the transcriptional end site of the gene (including exons, introns and untranslated regions). This model was fit for genotype trend effects (1 d.f.) adjusted for study, age, sex, self-described ancestry and 10 principal components for the population stratification adjustment, which included the top 5 principal components identified in the cohort studies and the top 5 principal components identified in the case–control studies (1,2). P-values for individual SNPs were based on the 1 d.f. Wald test derived from the fitted logistic regression model.
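The gene boundary described above can be illustrated with a minimal Python sketch. It assumes that "upstream" and "downstream" are interpreted relative to the gene's strand, which the text does not state explicitly; the `Gene` record, the function names and the SNP tuples are hypothetical and are not taken from the PanScan pipeline.

```python
from dataclasses import dataclass

UPSTREAM_BP = 20_000    # window extension before the transcriptional start site
DOWNSTREAM_BP = 10_000  # window extension after the transcriptional end site


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tx_start: int  # smaller genomic coordinate of the transcript
    tx_end: int    # larger genomic coordinate of the transcript
    strand: str    # "+" or "-"


def gene_window(gene):
    """Return (low, high) genomic bounds of the gene plus its flanking regions.

    Assumption: on the minus strand the start site is at the larger
    coordinate, so the 20 kb extension is applied on that side.
    """
    if gene.strand == "+":
        low, high = gene.tx_start - UPSTREAM_BP, gene.tx_end + DOWNSTREAM_BP
    elif gene.strand == "-":
        low, high = gene.tx_start - DOWNSTREAM_BP, gene.tx_end + UPSTREAM_BP
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    return max(0, low), high


def snps_in_gene(gene, snps):
    """Select SNP identifiers from (rsid, chrom, position) tuples inside the window."""
    low, high = gene_window(gene)
    return [rsid for rsid, chrom, pos in snps
            if chrom == gene.chrom and low <= pos <= high]
```

With this convention, a SNP located 15 kb beyond the larger coordinate of a minus-strand gene is assigned to that gene, whereas the same offset from a plus-strand gene falls outside its window.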
First, we conducted a gene-based analysis to evaluate the association between a candidate gene/region and cancer risk. The test statistic used was the minP statistic, which was the minimum P-value among all P-values from the single SNP analysis conducted within the candidate gene. The P-value for the gene-based analysis (called gene P-value) can be evaluated through a bootstrap procedure. Second, we conducted the pathway analysis to evaluate the association between a set of candidate genes included in a pathway and cancer risk. The pathway analysis was based on the ARTP method (6) and was implemented in the R package ARTP (http://dceg.cancer.gov/bb/tools/artp). The ARTP method aims at maximizing the association signal by combining gene-level P-values from a set of selected genes within the pathway into the test statistic and uses a bootstrap procedure to estimate its P-value and has been shown to account properly for the type I error (6). The bootstrap procedure is used for the purpose of generating datasets under the null hypothesis while keeping the correlation among SNPs the same as that in the observed dataset. The P-value for both the gene-based and pathway analyses was initially estimated by 30000 parametric bootstrap steps. We re-evaluated P-values for genes or pathways that had initially estimated P-values of <0.05 using 1000000 bootstrap steps.
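The two-level procedure can be sketched in Python as follows. This is a simplified illustration of an adaptive rank truncated product combination, not the code of the ARTP R package: it assumes that sets of gene-level P-values generated under the null hypothesis are already available (in the study these came from the parametric bootstrap), and all function names and the choice of truncation points are hypothetical.

```python
import math
from bisect import bisect_right


def gene_min_p(snp_pvalues):
    """minP statistic for a gene: the smallest single-SNP P-value."""
    return min(snp_pvalues)


def rtp_log_stat(pvalues, k):
    """Rank truncated product on the log scale: sum of logs of the k smallest P-values."""
    return sum(math.log(p) for p in sorted(pvalues)[:k])


def artp_pvalue(observed, null_sets, truncation_points):
    """Adaptive rank truncated product P-value for one pathway.

    observed          -- gene-level P-values from the real data
    null_sets         -- gene-level P-value lists generated under the null
    truncation_points -- candidate numbers of top genes to combine
    """
    all_sets = [list(observed)] + [list(s) for s in null_sets]
    n_sets = len(all_sets)
    best = [1.0] * n_sets  # smallest empirical P-value over truncation points
    for k in truncation_points:
        stats = [rtp_log_stat(s, k) for s in all_sets]
        ranked = sorted(stats)
        for b, w in enumerate(stats):
            # Fraction of sets whose statistic is at least as extreme (as small).
            best[b] = min(best[b], bisect_right(ranked, w) / n_sets)
    # Second level: compare the observed minimum with the minima of all sets,
    # which adjusts for having searched over several truncation points.
    return sum(1 for m in best if m <= best[0]) / n_sets
```

Because the observed dataset is counted among the sets, the smallest attainable P-value is 1/(B + 1) for B null sets, which is one reason to rerun promising genes and pathways with many more resampling steps.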
As a complementary approach to the ARTP method, we used a logic regression model (7,8) to reanalyze several promising pathways identified by the ARTP method to determine whether those pathways were enriched with interactions. The ARTP method looks for marginal effects from individual SNPs but does not aim at detecting epistatic interactions among SNPs. In contrast, logic regression is an adaptive regression methodology that attempts to identify ‘logic’ (binary) combinations of predictors that are associated with a regression outcome. Each SNP is recoded as two binary predictors: one is based on whether at least one variant allele is present (‘dominant coding’) and the other is based on whether two variant alleles are present (‘recessive coding’). We fit models using a simulated annealing algorithm. Model selection was conducted using cross-validation and permutation tests. A Bayesian approach to model selection was used to generate a list of possible candidates of predictors.
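The recoding of each SNP into two binary predictors can be written as a short Python sketch. It assumes genotypes are stored as counts of the variant allele (0, 1 or 2); the function names and the example logic expression are hypothetical and only illustrate the kind of combination a logic regression search might evaluate.

```python
def recode_snp(variant_allele_count):
    """Return (dominant, recessive) binary predictors for one genotype.

    dominant  -- True if at least one variant allele is present
    recessive -- True if two variant alleles are present
    """
    if variant_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 variant alleles")
    return variant_allele_count >= 1, variant_allele_count == 2


def example_logic_term(genotype_a, genotype_b):
    """Hypothetical logic term: (SNP A dominant) AND NOT (SNP B recessive)."""
    a_dominant, _ = recode_snp(genotype_a)
    _, b_recessive = recode_snp(genotype_b)
    return a_dominant and not b_recessive
```

A logic regression model is built from Boolean terms of this form; the simulated annealing search proposes changes to such terms and keeps those that improve the fit.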
Of the 23 pathways analyzed (Table I), the most statistically significant associations were seen for the pancreatic developmental pathway (P = 2.0 × 10⁻⁶) and the H. pylori lacto/neolacto pathway (P = 1.6 × 10⁻⁵). Three additional pathways were nominally significant: hedgehog signaling (P = 0.0019), Th1/Th2 immune response (P = 0.019) and apoptosis (P = 0.023). The top three pathways (pancreatic development, H. pylori lacto/neolacto and hedgehog) were significant after Bonferroni correction for the 23 pathways tested (P < 0.002). However, after excluding genes (i.e. removing all SNPs within the gene) previously identified by the initial GWAS (NR5A2 from the pancreatic development pathway, ABO from the H. pylori lacto/neolacto and SHH from the hedgehog pathway), the pancreatic development pathway remained significant (P = 8.3 × 10⁻⁵), whereas the other two pathways became nonsignificant (P > 0.05).
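The Bonferroni threshold can be checked with a few lines of Python. The sketch assumes a family-wise significance level of 0.05, which is implied by the reported cutoff but not stated outright; the function name is hypothetical and the P-values are the five reported above.

```python
def bonferroni_significant(pvalues, n_tests, alpha=0.05):
    """Return the names whose P-value is below alpha divided by the number of tests."""
    threshold = alpha / n_tests
    return [name for name, p in pvalues.items() if p < threshold]


# Pathway P-values reported in the text; 23 pathways were tested in total.
reported = {
    "pancreatic development": 2.0e-6,
    "H. pylori lacto/neolacto": 1.6e-5,
    "hedgehog": 0.0019,
    "Th1/Th2 immune response": 0.019,
    "apoptosis": 0.023,
}
passing = bonferroni_significant(reported, n_tests=23)
```

Here 0.05/23 is approximately 0.0022, so the hedgehog pathway (P = 0.0019) passes narrowly while the Th1/Th2 immune response and apoptosis pathways do not.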
We also computed gene-level P-values for the 577 genes included in the study; 46 genes had P-values of <0.05 (Table II). The major genes contributing to the significant pathways include NR5A2, HNF1A, HNF4G, PDX1 and HNF1B for pancreatic development; ABO for H. pylori lacto/neolacto; SHH, BTRC and HHIP for hedgehog; TGFBR2, CCL18 and IL13RA2 for Th1/Th2 immune response and MAPK8, BCL2L11, FAS, FASLG and CASP7 for the apoptosis pathway. For the other pathways analyzed, zero to four genes were nominally significant (P <0.05) (Table II).
Individual SNPs that were significant at the P < 0.001 level for the five significant pathways are listed in Table III. The pancreatic development pathway showed 15 SNPs: 6 located in the NR5A2 gene, 5 in HNF1A, 3 in HNF4G and 1 in HNF1B. Five SNPs in the H. pylori lacto/neolacto pathway were significant; however, they were all located within the ABO gene previously identified in the GWAS (1,2). Two SNPs in the hedgehog signaling pathway were significant at this level, located approximately 10–15 kb upstream of the SHH gene; again, both were identified in PanScan 1, but the association was not replicated in PanScan 2 (1,2). Two SNPs in the TGFBR2 gene within the Th1/Th2 immune response pathway were significant at a threshold of P < 0.001; these SNPs were also included in the TGF-β pathway that was not significant overall. Finally, three SNPs in the apoptosis pathway were significant at the same P-value level: one in MAPK8 and two in BCL2L11.
We also observed a significant association between the pancreatic development pathway and cancer risk using logic regression analysis. The SNPs that occurred most frequently in the models were rs2816939, rs3762399, rs2737621 (NR5A2), rs7310409, rs7953249 (HNF1A), rs2943547 (HNF4G) and rs2688 (HNF1B). The results of the Bayesian version of logic regression were compared 1000 times with a permuted response. The fit on the permuted data was always worse than the fit on the real data, thus providing strong evidence of an association between the pancreatic development pathway and pancreatic cancer. For the Th1/Th2 immune response pathway and apoptosis genes, logic regression also provided some evidence of associations with pancreatic cancer (data not shown).
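The strength of the permutation evidence can be made concrete with a small Python sketch. It assumes one common convention for empirical P-values, in which the observed fit is counted among the permutations; the text does not say which estimator was used, and the function name and score convention (lower score means better fit) are hypothetical.

```python
def permutation_pvalue(observed_score, permuted_scores):
    """Empirical P-value with the observed fit counted as one of the permutations.

    Assumes lower scores indicate better fit, so a permuted fit is
    'at least as good' when its score is less than or equal to the observed one.
    """
    at_least_as_good = sum(1 for s in permuted_scores if s <= observed_score)
    return (1 + at_least_as_good) / (1 + len(permuted_scores))
```

Under this convention, 1000 permutations that all fit worse than the real data correspond to an empirical P-value of 1/1001, i.e. just under 0.001.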
Our pathway-based analysis of GWAS data has shown that common germ line variation in pancreatic developmental genes may be an important susceptibility factor for pancreatic cancer. The genes that contributed to this significant association include NR5A2, HNF1A, HNF4G, PDX1 and HNF1B. This association remains significant (P < 0.001) even after removing variants in the NR5A2 gene, which was shown previously to be associated with pancreatic cancer risk. Four additional pathways showed nominally significant association with risk of pancreatic cancer (P < 0.05), i.e. H. pylori lacto/neolacto, hedgehog signaling, apoptosis and Th1/Th2 immune response, although genes previously implicated in pancreatic cancer risk may drive the association for the hedgehog (SHH) and H. pylori lacto/neolacto (ABO) pathways.
The five genes that contributed to the significant association with the pancreatic development pathway are important components of the transcriptional networks governing embryonic pancreatic development and differentiation as well as maintaining pancreatic homeostasis in adults (23,24). PDX1 (pancreas-duodenal homeobox 1) regulates the very early steps of exocrine pancreas development (25). NR5A2 is a direct downstream target of PDX1 in this process (26). HNF1A and HNF1B encode hepatocyte nuclear factor 1 alpha and beta, also known as transcription factors 1 and 2 (TCF1 and TCF2), respectively. These proteins belong to the homeobox family of DNA-binding proteins and regulate expression of a large number of genes. HNF1A primarily regulates the growth and function of islet β cells, and HNF1B plays an essential role in controlling pancreatic organogenesis and differentiation (23). Consistent with our observations, HNF1A was identified as the top hit for pancreatic cancer in a separate analysis of PanScan data by assessing markers previously identified in GWAS of phenotypes other than pancreatic cancer (27). Heterozygous compound knockout mouse models have shown that PDX1, NR5A2, HNF1A and HNF1B act in a tightly regulated feedback circuit in regulating pancreas development and differentiation (26,28). Therefore, even subtle differences in the relative activity of any of these genes may have profound consequences on overall network activity. Notably, the hedgehog signaling pathway, in particular the SHH gene, also plays an essential role during embryonic pancreatic development (29). Genes involved in organ development and differentiation contribute to the ability of tumor cells to proliferate and evade cell death, but they also often alter cell plasticity, i.e. reprogram cells to a state that may give rise to a tumor (29).
Mutations in HNF1A, PDX1 and HNF1B are responsible for maturity onset diabetes of the young (MODY) types 3, 4 and 5, respectively (30,31). Both mutations and common variants in HNF1A and HNF1B have been associated with the risk of type II diabetes (32–34). Common variants in NR5A2, HNF1B and HNF4G (35) also have been associated with body mass index in recent GWAS. A recent study has reported a critical role of NR5A2 in phosphatidylcholine signaling pathway regulating fatty acid and glucose homeostasis (36). Because obesity and long-term type II diabetes are known risk factors for pancreatic cancer, it is possible that these genes may contribute to pancreatic cancer, partially through an increased risk of obesity and diabetes.
In addition to their roles in regulating the development and function of the pancreas, HNF1A and HNF1B also control terminal differentiation and cell fate commitment in the gut epithelium (37,38). Somatic mutations of the HNF1A gene have been reported in several types of human cancer, suggesting a tumor suppressor role (39–41). HNF1A silencing by small interfering RNA in hepatocellular carcinoma cells induces overexpression of several genes encoding growth factor receptors, components of the translational machinery, cell cycle and angiogenesis regulators, with, in particular, activation of the mammalian target of rapamycin pathway (42). Moreover, HNF1A has been recognized as a master regulator of plasma protein fucosylation (43) and plasma levels of C-reactive protein (44,45). This suggests that HNF1A may also contribute to pancreatic cancer via regulation of immunity, tumor progression and metastasis as well as through metabolic and inflammatory pathways. Overall, the pancreatic development pathway may have an impact on pancreatic cancer risk through multiple diversified mechanisms.
We also observed weaker associations of the Th1/Th2 immune response and apoptosis genes with pancreatic cancer. Genes in the Th1/Th2 pathway influence the balance of T-helper cells; individuals with allergies, who are at lower risk of pancreatic cancer, have heightened Th2 (T-helper type 2) response. TGFBR2 and CCL18 contribute to the significance of the Th1/Th2 pathway. Although T-helper cells are mostly implicated in diseases associated with immune responses, such as allergy, asthma and infections, they may also play a role in immune surveillance of tumor cells (46). On the other hand, TGF-β is one of the core signaling pathways involved in pancreatic cancer (47), and the TGFBR2 gene is mutated in 4% of pancreatic cancer cases (48). Chemokines such as CCL18 have been implicated in biological processes involving tumor growth including leukocyte migration, angiogenesis and metastasis (49); CCL18 is associated with some allergic conditions and is induced by Th2 cytokines. However, the role of CCL18 in pancreatic carcinogenesis is unknown. Defective apoptosis represents a contributory feature in the development and progression of cancer. Among the 42 apoptosis-related genes analyzed, MAPK8 and BCL2L11 were the most notable. Mitogen-activated protein kinases are involved in cell proliferation, differentiation, apoptosis, transcription regulation and development. MAPK8 (aka JNK1 or SAPK1) is a serine–threonine kinase that belongs to the stress-activated signaling cascade and has been shown to play a role in obesity and insulin resistance (50). BCL2L11 is a member of the BCL2 family and plays a role in neuronal and lymphocyte apoptosis.
In summary, our pathway-based association analysis of pancreatic cancer GWAS data has revealed a connection between pancreatic development and cancer risk by using sets of genes previously known to be important for pancreatic cancer through various processes and molecular functions. We used the ARTP method as our primary approach and confirmed the results for the developmental pathway with a second approach, logic regression. Our selection of pathways drew on databases (such as KEGG and GO) but was narrowed to include only those genes central to each pathway, based on the literature. A more agnostic, wider pathway-based analysis might elucidate new pathways beyond those already known. Our study is the largest to date to examine candidate pathways and genes associated with pancreatic cancer. A limitation of our study is that, to maximize power, all available case–control pairs were used for the analysis, leaving no independent set for replication. Replication efforts in independent studies are therefore needed to confirm our findings. These findings may open new research avenues in our understanding of the etiology of this deadly malignancy.
This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), Division of Cancer Epidemiology and Genetics, National Cancer Institute (NCI), National Institutes of Health, Department of Health and Human Services.
E.J.D. was supported by Instituto de Salud Carlos III (RETICC, RD06/0020).
The NYU Women’s Health Study is supported by research grant (R01CA098661) and center grant (P30CA016087) from the NCI and the center grant (ES000260) from the National Institute of Environmental Health Sciences.
The WHI program is funded by the National Heart, Lung, and Blood Institute, NIH, US Department of Health and Human Services through contracts N01WH22110, 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32 and 44221. The authors thank the WHI investigators and staff for their dedication, and the study participants for making the program possible. A full listing of WHI investigators can be found at: http://www.whiscience.org/publications/WHI_investigators_shortlist.pdf.
The Mayo Clinic Molecular Epidemiology of Pancreatic Cancer study is supported by the Mayo Clinic SPORE in Pancreatic Cancer (P50 CA102701). The authors would like to acknowledge William Bamlet, Traci Hammer, Jodie Cogswell, Hugues Sicotte, Janet Olson, Martha Matsumoto, and Dennis Robinson.
The Yale University study was supported by grant number (5R01CA098870) from the NCI, NIH. The cooperation of 30 Connecticut hospitals, including Stamford Hospital, in allowing patient access, is gratefully acknowledged. This study was approved by the State of Connecticut Department of Public Health Human Investigation Committee. Certain data used in this study were obtained from the Connecticut Tumor Registry in the Connecticut Department of Public Health. The authors assume full responsibility for analyses and interpretation of these data.
The PHS, NHS, HPFS and WHS at Harvard were supported by the NCI, NIH (grants no. P01 CA87969, P01 CA55075, P50 CA127003, R01 CA124908, RO1 CA97193, RO1 CA34944, RO1 CA40360, RO1 HL26490, RO1 HL34595, RO1 CA047988, RO1 HL043851, RO1 HL080467).
The work at Johns Hopkins University was supported by the NCI (grants P50CA62924 and R01CA97075) and the Lustgarten Foundation for Pancreatic Cancer Research.
The Shanghai Men’s Health Study was supported by the NCI extramural research grant (R01 CA82729). The Shanghai Women’s Health Study was supported by the NCI extramural research grant (R37 CA70867) and, partially for biological sample collection, by the Intramural Research Program of NCI (Division of Cancer Epidemiology and Genetics). We are indebted to Drs Yu-Tang Gao and Yong-Bing Xiang for their contributions to these two cohort studies. The studies would not be possible without the continuing support and devotion of the study participants and staff of the SMHS and SWHS.
Pancreatic cancer research at Memorial Sloan-Kettering Cancer Center was supported by The Society of MSKCC and by the Geoffrey Beene Cancer Research Fund.
The work at M. D. Anderson was supported by NIH grant (RO1 CA98380).
The UCSF study was supported in part by NCI grants [CA59706, CA108370, CA109767, CA89726 (E.A.H., PI) and CA98889 (E.J.D., PI)] and by the Rombauer Pancreatic Cancer Research Fund.
The University of Toronto study was supported by grants from the NIH (R01 CA97075, as part of the PACGENE consortium), The Lustgarten Foundation for Pancreatic Cancer Research and the Ontario Cancer Research Network. We acknowledge the Pancreatic Cancer Canada Foundation (www.pancreaticcancercanada.ca) for their continued support of research into the early detection of pancreatic cancer and the Pancreas Cancer Screening Study at Mount Sinai Hospital and Princess Margaret Hospital. The authors acknowledge Ayelet Borgida and Heidi Rothenmund for their dedicated contributions toward data collection and study co-ordination.
PLCO was supported by individual contracts from the NCI to the University of Colorado Denver (NO1-CN-25514), Georgetown University (NO1-CN-25522), Pacific Health Research Institute (NO1-CN-25515), Henry Ford Health System (NO1-CN-25512), University of Minnesota (NO1-CN-25513), Washington University (NO1-CN-25516), University of Pittsburgh (NO1-CN-25511), University of Utah (NO1-CN-25524), Marshfield Clinic Research Foundation (NO1-CN-25518), University of Alabama at Birmingham (NO1-CN-75022), Westat, Inc. (NO1-CN-25476), University of California, Los Angeles (NO1-CN-25404).
The ATBC Study was supported by funding provided by the Intramural Research Program of the NCI, NIH and through US Public Health Service contracts (N01-CN-45165, N01-RC-45035, and N01-RC-37004) from the NCI.
For the EPIC cohorts, all coauthors coordinated the initial recruitment and management of the studies. All authors contributed to the final paper. The authors thank all the participants who took part in this research and the funders and support and technical staff who made this study possible. The work described in this paper was carried out with the support of the European Commission: Public Health and Consumer Protection Directorate 1993–2004; Research Directorate-General 2005–2008; Ligue contre le Cancer, Societé 3M, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center, Federal Ministry of Education and Research (Germany); Danish Cancer Society (Denmark); ISCIII RETIC (RD06/0020) of the Spanish Ministry of Health, the participating regional governments and institutions (Spain); Cancer Research UK, Medical Research Council, Stroke Association, British Heart Foundation, Department of Health, Food Standards Agency, the Wellcome Trust (UK); Greek Ministry of Health and Social Solidarity, Hellenic Health Foundation and Stavros Niarchos Foundation (Greece); Italian Association for Research on Cancer (AIRC) (Italy); Dutch Ministry of Public Health, Welfare and Sports, Dutch Prevention Funds, LK Research Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF) (The Netherlands); Swedish Cancer Society, Swedish Scientific Council, Regional Government of Skane and Västerbotten (Sweden).
CLUE II was supported by National Institute of Aging grant (5U01AG018033) and NCI grants (CA105069, CA73790). Cancer incidence data were provided by the Maryland Cancer Registry, Center for Cancer Surveillance and Control, Department of Health and Mental Hygiene, 201 W. Preston Street, Room 400, Baltimore, MD 21201, USA, www.fha.state.md.us/cancer/registry/, 410-767-4055. We acknowledge the State of Maryland, the Maryland Cigarette Restitution Fund, and the National Program of Cancer Registries of the Centers for Disease Control and Prevention for the funds that support the collection and availability of the cancer registry data.
The Cancer Prevention Study II Nutrition Cohort is supported by the American Cancer Society. The authors thank all the men and women in the Cancer Prevention Study II Nutrition Cohort for their many years of dedicated participation in the study.
This project has been funded in whole or in part with federal funds from the NCI, NIH, under Contract No. HHSN261200800001E.
Conflict of Interest Statement: The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services nor does mention of trade names, commercial products or organizations imply endorsement by the US Government.