Common genetic variation may play an important role in altering lung cancer risk. We conducted a pathway-based candidate gene evaluation to identify genetic variations that may be associated with lung cancer in a population-based case–control study in Xuan Wei, China (122 cases and 111 controls). A total of 1260 single-nucleotide polymorphisms (SNPs) in 380 candidate genes for lung cancer were successfully genotyped and assigned to one of 10 pathways based on gene ontology. Logistic regression was used to assess the marginal effect of each SNP on lung cancer susceptibility. The minP test was used to identify statistically significant associations at the gene level. Important pathways were identified using a test of proportions and the rank truncated product methods. The cell cycle pathway was identified as the most important pathway (P=0.044), with four genes significantly associated with lung cancer (PLA2G6 minP = 0.001, CCNA2 minP = 0.006, GSK3β minP = 0.007 and EGF minP = 0.013) after adjusting for multiple comparisons. Interestingly, most cell cycle genes that were associated with lung cancer in this analysis were concentrated in the AKT signaling pathway, which is essential for regulation of cell cycle progression and cellular survival, and may be important in lung cancer etiology in Xuan Wei. These results should be viewed as exploratory until they are replicated in a larger study.
Lung cancer is estimated to account for 1.4 million cancer cases and 1.2 million cancer deaths per year in the world (1). In 2000, it was estimated that 85% of lung cancer in men and 47% of lung cancer in women were attributed to tobacco smoking (1). While smoking is the primary risk factor for lung cancer, other environmental, occupational and genetic risk factors have been documented in certain populations (2). Due to the overwhelming risk associated with tobacco smoking, however, other lung cancer risk factors have not been fully elucidated.
Xuan Wei offers a unique opportunity to assess lung cancer susceptibility in a population with substantial exposure to in-home coal smoke, which is classified as a group 1 human carcinogen by the International Agency for Research on Cancer (3). Xuan Wei has the highest prevalence of lung cancer in China (4). The age-adjusted lung cancer mortality rates for men and women in Xuan Wei are 27.7 and 25.3 per 100000, respectively (5). The similarity of lung cancer rates in men and women is of considerable interest because almost all women are non-smokers (6). In Xuan Wei, nearly all women and few men cook, whereas most men and nearly no women smoke tobacco (6). The primary source of indoor air pollution in Xuan Wei is smoke from domestic fuel combustion for heating and cooking, with most residents burning smoky coal (bituminous coal) and some using smokeless coal (anthracite coal). Smoky coal use in Xuan Wei homes is associated with very high and comparable risks of lung cancer in both men and women (7,8).
Since the carcinogenic constituent of smoky coal combustion is polycyclic aromatic hydrocarbons (5,9), initial lung cancer susceptibility studies in Xuan Wei focused on individual single-nucleotide polymorphisms (SNPs) in candidate genes associated with polycyclic aromatic hydrocarbon metabolism (10,11). Subsequent studies focused on important biological pathways, such as DNA repair (12). While these studies have provided some promising results, a large-scale candidate gene analysis has not been performed to evaluate genetic susceptibility to lung cancer in Xuan Wei. Therefore, we analyzed 1260 SNPs in 380 candidate genes using an Oligo Pool with an Illumina® GoldenGate Assay. Candidate SNPs were selected from the SNP500Cancer database and genotyped if they were potentially relevant for cancer or other human diseases, had possible functional significance or expanded gene coverage of previously identified candidate genes. We hypothesized that this large-scale candidate gene study would provide insight into the pathways important to lung cancer susceptibility.
The study population of this population-based case–control study has been described previously (10). Briefly, all residents of Xuan Wei, China, from March 1995 to March 1996 were eligible for inclusion. Lung cancer cases with clinical symptoms and X-ray confirmation were identified at one of five hospitals servicing Xuan Wei County. Of the 135 eligible cases, 133 (99%) agreed to participate. To be enrolled, cases had to be histologically (n=14) or cytologically (n=91) confirmed or have died within 1 year of diagnosis (n=17), since previous studies in Xuan Wei suggest that death within 1 year of clinical diagnosis of lung cancer is a strong indicator of lung cancer diagnosis (13). Based on these criteria, 122 of 133 consenting cases (92%) were enrolled into the study.
Controls were selected from the Xuan Wei general population and were individually matched by sex, age (±2 years), village and type of fuel used for in-home cooking and heating at time of interview. The participation rate for controls was 100%. A detailed questionnaire evaluating smoking history, domestic fuel use history and other demographic information was administered by trained interviewers to cases and controls. This research protocol was approved by a United States Environmental Protection Agency Human Subjects Research Review Official, and informed consent was obtained from all study subjects.
Genotyping was performed on DNA extracted from sputum samples via phenol–chloroform extraction (14). Candidate SNPs were identified through the SNP500Cancer database (http://snp500cancer.nci.nih.gov/) and genotyped if they were potentially relevant for cancer or other human diseases, had possible functional significance or expanded gene coverage of previously identified candidate genes. High-throughput genotyping was successful for 122 (100%) cases and 111 (91%) controls with an Oligo Pool by the Illumina GoldenGate Assay (http://www.illumina.com) at the National Cancer Institute’s Core Genotyping Facility (Gaithersburg, MD). Ten controls did not have ample DNA for genotyping. Duplicate samples (n=21) of both cases and controls were randomly distributed throughout study plates to ensure quality control and determine the intra-subject concordance rate for all assays (>98%). Initially, 1442 SNPs were genotyped. Hardy–Weinberg equilibrium for each SNP was tested in controls with a Pearson χ2 test or a Fisher’s exact test if any of the cell counts were less than five. After exclusion of 166 SNPs with low minor allele frequency (<0.01) and 16 SNPs with substantial deviation from Hardy–Weinberg equilibrium (P<0.001), 1260 SNPs in 380 genes were left for analysis.
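The two SNP exclusion filters described above (minor allele frequency <0.01 and Hardy–Weinberg P<0.001 in controls) can be illustrated with a minimal sketch. The function names are hypothetical and not part of the original analysis, and only the Pearson χ2 branch is shown; the Fisher's exact test used for sparse cells is omitted.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, from genotype counts AA, AB and BB."""
    n_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / n_alleles
    return min(freq_a, 1.0 - freq_a)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_p_min=0.001):
    """Apply the two exclusion thresholds stated in the text to control genotype counts."""
    if minor_allele_frequency(n_aa, n_ab, n_bb) < maf_min:
        return False
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return p_value >= hwe_p_min
```

For example, `passes_qc(25, 50, 25)` keeps a SNP whose control genotypes match Hardy–Weinberg proportions exactly, whereas a SNP with no heterozygotes among 100 controls, `passes_qc(50, 0, 50)`, is excluded.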
First, unconditional logistic regression was used to estimate the odds ratio and 95% confidence interval for the association between each SNP and lung cancer risk, using the homozygote of the common allele as the reference group and adjusting for age (<55 and ≥55 years), sex, smoking (0 pack years, >0 and <25 pack years and ≥25 pack years) and lifetime smoky coal exposure (<130 and ≥130 tons). Gene–dose effects for each SNP were estimated by a linear trend test by coding the genotypes based on the number of variant alleles present (0, 1 and 2). Interactions between the dominant model and lifetime smoky coal exposure were tested on the multiplicative scale for significant SNPs in the four significant cell cycle genes while adjusting for age, sex and smoking.
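The gene–dose (trend) coding can be sketched as a logistic model with a single predictor, the number of variant alleles (0, 1 or 2). This is an illustration only: it is unadjusted, whereas the analysis above adjusted for age, sex, smoking and smoky coal exposure, and it is fitted by a hand-written Newton–Raphson loop rather than the SAS procedures used in the study. The function name is hypothetical.

```python
import math

def fit_trend_logistic(genotypes, is_case, iterations=25):
    """Fit logit P(case) = b0 + b1 * g, where g is the variant allele count.

    Returns the per-allele odds ratio, its 95% Wald confidence interval
    and the two-sided Wald P-value for trend.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0      # score vector and information matrix
        for g, y in zip(genotypes, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 from the inverse information matrix (evaluated at convergence)
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    p_trend = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(b1)
    ci_low = math.exp(b1 - 1.96 * se_b1)
    ci_high = math.exp(b1 + 1.96 * se_b1)
    return odds_ratio, ci_low, ci_high, p_trend
```

When the genotype takes only two values, the fitted odds ratio reduces to the familiar cross-product ratio of the 2×2 table, which gives a convenient check on the fitting loop.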
Second, gene-based analyses were performed on 380 genes. To assess the significance of the association between each gene and lung cancer, we used MATLAB to perform a minP test that assesses the significance of the minimal P-value in each gene using a permutation-based resampling procedure (1000 permutations) that takes into account the number of SNPs genotyped in each gene and their underlying linkage disequilibrium (LD) structure (15). A gene was considered significantly associated with lung cancer if it had a minP ≤0.05, after adjustment for age (<55 and ≥55 years), sex, smoking (0 pack years, >0 and <25 pack years and ≥25 pack years) and lifetime smoky coal exposure (<130 and ≥130 tons). False discovery rates (FDRs) were calculated using the Benjamini–Hochberg method to evaluate the significance of the minP results within the cell cycle pathway (16).
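The logic of the minP permutation test and the Benjamini–Hochberg adjustment can be sketched as follows. Several simplifications are assumed: an unadjusted trend statistic (N times the squared genotype–status correlation) stands in for the covariate-adjusted logistic regression, the add-one permutation P-value is one common convention rather than necessarily the one used in the study, and all function names are hypothetical. Because case–control labels are permuted while each subject's genotypes are kept together, the LD among SNPs within a gene is preserved in every permuted data set.

```python
import math
import random

def trend_p_value(genotypes, status):
    """P-value of the trend statistic N * r^2 (chi-square, 1 df)."""
    n = len(status)
    mg = sum(genotypes) / n
    ms = sum(status) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    sss = sum((s - ms) ** 2 for s in status)
    if sgg == 0 or sss == 0:
        return 1.0                       # monomorphic SNP or a single outcome class
    sgs = sum((g - mg) * (s - ms) for g, s in zip(genotypes, status))
    chi2 = n * sgs * sgs / (sgg * sss)
    return math.erfc(math.sqrt(chi2 / 2.0))

def min_p_test(snp_genotypes, status, n_perm=1000, seed=0):
    """Gene-level P-value for the smallest SNP P-value within one gene.

    snp_genotypes: one list of allele counts (0/1/2) per SNP in the gene.
    status: 1 for cases, 0 for controls, in the same subject order.
    """
    rng = random.Random(seed)
    observed = min(trend_p_value(g, status) for g in snp_genotypes)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # permute labels; genotype rows stay intact
        perm_min = min(trend_p_value(g, shuffled) for g in snp_genotypes)
        if perm_min <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):         # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene with many SNPs has more chances to produce a small P-value by chance; comparing the observed minimum with the permutation distribution of minima corrects for this without assuming the SNPs are independent.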
Third, haplotype blocks and structure were determined with Haploview using data from controls for the four significant genes in the cell cycle pathway with more than one SNP (17). Haplotype frequencies were estimated using the expectation–maximization algorithm (18). Haplotypes with frequencies <1% were excluded. The overall difference in haplotype frequencies between cases and controls was assessed using a global score test (19). Haplotype odds ratios and 95% confidence intervals were calculated and adjusted for age (<55 and ≥55 years), sex, smoking (0 pack years, >0 and <25 pack years and ≥25 pack years) and lifetime smoky coal exposure (<130 and ≥130 tons). A sliding window 3-SNP haplotype approach was also performed for the two significant genes in the cell cycle pathway with more than three SNPs to comprehensively evaluate potential disease loci in small genetic regions that may have been overlooked with the single-locus analysis (20).
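The expectation–maximization step can be illustrated for the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. This is a sketch under that restriction, not the multi-SNP procedure implemented in Haploview; the function names and the 3×3 genotype-count input layout are assumptions made for illustration. The resulting haplotype frequencies also yield the pairwise LD measures D′ and r2.

```python
def em_two_snp_haplotypes(counts, iterations=200):
    """Estimate two-SNP haplotype frequencies by EM.

    counts[i][j] is the number of subjects carrying i variant alleles at
    SNP 1 and j variant alleles at SNP 2. Haplotypes are keyed as
    (allele at SNP 1, allele at SNP 2), with 0 = common and 1 = variant.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue                  # phase-ambiguous, handled in the E-step
            a, b = alleles[i], alleles[j]
            base[(a[0], b[0])] += counts[i][j]
            base[(a[1], b[1])] += counts[i][j]
    n_double_het = counts[1][1]
    n_hap = 2 * sum(sum(row) for row in counts)
    freq = {h: 0.25 for h in base}
    for _ in range(iterations):
        # E-step: probability that a double heterozygote carries 00 + 11
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts
        freq = {
            (0, 0): (base[(0, 0)] + w * n_double_het) / n_hap,
            (1, 1): (base[(1, 1)] + w * n_double_het) / n_hap,
            (0, 1): (base[(0, 1)] + (1 - w) * n_double_het) / n_hap,
            (1, 0): (base[(1, 0)] + (1 - w) * n_double_het) / n_hap,
        }
    return freq

def ld_measures(freq):
    """Return (D', r^2) from two-SNP haplotype frequencies."""
    p_a = freq[(0, 0)] + freq[(0, 1)]     # common allele frequency at SNP 1
    p_b = freq[(0, 0)] + freq[(1, 0)]     # common allele frequency at SNP 2
    d = freq[(0, 0)] - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0, 0.0                   # a monomorphic SNP: LD is undefined
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, d * d / denom
```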
Finally, pathway-based analysis was performed on all 1260 SNPs available for analysis. Genes were categorized into biological pathways using the GoMiner software (http://discover.nci.nih.gov/gominer/), which utilizes the Gene Ontology database (http://www.geneontology.org) to identify the biological processes and functions of the genes and classify them into biologically coherent categories. To test the significance of each pathway, the proportion of statistically significant genes versus non-significant genes for each pathway was compared with the proportion of significant genes versus non-significant genes in all the remaining pathways using the one-sample test for proportions (21). Exact methods were used when cell counts were less than five. In addition, we used the rank truncated product method to evaluate the excess of highly significant SNPs within each pathway (22). Since the rank truncated product test yielded results similar to those of the one-sample test for proportions, only the latter are reported.
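The pathway comparison can be sketched as a one-sample test in which the proportion of significant genes in the remaining pathways is treated as the fixed null value. Two points are assumptions here rather than statements from the text: the P-values are one-sided (testing for an excess of significant genes), and the exact branch is a binomial upper tail. The function names are hypothetical, and the rank truncated product method is not shown.

```python
import math

def proportion_upper_p(k, n, p0):
    """Normal-approximation test that k/n exceeds the null proportion p0.

    Requires 0 < p0 < 1. Returns (z, one-sided P-value).
    """
    p_hat = k / n
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))

def binomial_upper_tail(k, n, p0):
    """Exact P(X >= k) for X ~ Binomial(n, p0), for use with sparse counts."""
    return sum(
        math.comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)
        for i in range(k, n + 1)
    )

def pathway_enrichment_p(sig_in, n_in, sig_out, n_out, min_count=5):
    """Compare one pathway with all remaining pathways combined."""
    p0 = sig_out / n_out                  # null proportion from the other pathways
    if min(sig_in, n_in - sig_in) < min_count:
        return binomial_upper_tail(sig_in, n_in, p0)
    return proportion_upper_p(sig_in, n_in, p0)[1]
```

As a usage illustration, a call such as `pathway_enrichment_p(9, 49, sig_out, 331)` would correspond to the cell cycle pathway (9 significant genes of 49, with 331 genes in the other nine pathways), where `sig_out`, the number of significant genes outside the pathway, is taken from supplementary Table 1.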
All statistical analyses were performed using SAS software, version 9.1 (SAS Institute, Cary, NC), unless stated otherwise.
Cases and controls were comparable in age, sex and smoking status (Table I). As in previous reports of this study population (10), cases tended to use significantly more in-home smoky coal over the course of their lifetimes than controls (P=0.02).
A total of 1260 functional or likely functional SNPs in 380 genes were successfully genotyped. Pathway-based analysis categorized the 380 genes into 10 pathways (supplementary Table 1 is available at Carcinogenesis Online): cell cycle genes (including apoptosis) (n=49), DNA repair genes (n=49), telomere maintenance genes (n=5), immune response genes (n=63), molecular transport genes (n=27), signal transduction genes (n=35), one-carbon metabolism genes (n=11), xenobiotic metabolism genes (n=29), other metabolism genes (n=57) and other uncategorized genes (n=55). While all pathways had at least one significant gene, including polycyclic aromatic hydrocarbon metabolism-related EPHX1 and GSTM3 in the xenobiotic pathway and telomere maintenance genes TERT and TERF2 (supplementary Table 1), only the cell cycle pathway had a significantly increased proportion of significant genes (P=0.044) (Table II).
Gene-based analysis identified 9 of 49 genes in the cell cycle pathway as significantly associated with lung cancer risk (PLA2G6 minP = 0.001, CCNA2 minP = 0.006, GSK3β minP = 0.007, v-akt murine thymoma viral oncogene homolog 1 (AKT1) minP = 0.010, EGF minP = 0.013, TP53I3 minP = 0.017, PTEN minP = 0.018, MYBL2 minP = 0.033 and CCND3 minP = 0.038) (Table II). When adjusting for the number of tested genes, only PLA2G6 (FDR = 0.0098), GSK3β (FDR = 0.0098), EGF (FDR = 0.0376) and CCNA2 (FDR = 0.0426) remained significant.
Individual SNP analyses, using logistic regression, found 24 of 44 individual SNPs genotyped in the four significant genes of the cell cycle pathway to be associated with lung cancer risk (Ptrend≤0.05). After accounting for SNPs in high LD (D′ ≥ 0.97) and highly correlated (r2≥0.90) within each gene, only one SNP in each gene was identified as the important SNP associated with lung cancer risk (Table III). Variant carriers of CCNA2 rs3217773, GSK3β rs6781942 and PLA2G6 rs84473 were associated with a significantly increased risk of lung cancer (Ptrend≤0.05), whereas variant carriers of EGF rs2237051 were associated with a significantly decreased risk of lung cancer (Ptrend≤0.05). The magnitude and direction of risk associated with EGF rs2237051, CCNA2 rs3217773, GSK3β rs6781942 and PLA2G6 rs84473 and lung cancer were similar between men and women (data not shown). Lifetime smoky coal use did not interact significantly with any significant SNP in PLA2G6, CCNA2, GSK3β and EGF (data not shown). Supplementary Table 2 (available at Carcinogenesis Online) provides lung cancer risk associated with all genotyped SNPs.
Haplotype analysis of GSK3β was based on 34 SNPs that covered 61 of 75 SNPs in this gene (Figure 1). Of the eight blocks defined by the LD in the controls, the seven blocks significantly associated with lung cancer risk were consistent with the individual SNP results (Table IV). Sliding window analysis identified the GSK3β genomic regions of blocks 4 and 5 as carrying a slightly higher risk of lung cancer than the other genomic regions (Figure 1), pointing to GSK3β rs6781942 as important, in agreement with the individual SNP analyses. Haplotype analyses for PLA2G6, EGF and CCNA2 did not provide any additional information beyond individual SNP analyses.
Through an exploratory analysis of 10 different biological pathways that are potentially important to carcinogenic processes, only the cell cycle pathway had a significantly increased proportion of significant genes compared with all other pathways. Gene-based analyses identified nine cell cycle genes that were significantly associated with lung cancer susceptibility: AKT1, CCNA2, CCND3, EGF, GSK3β, MYBL2, PLA2G6, PTEN and TP53I3. Only four genes remained significant after adjustment for multiple comparisons and individual SNP analyses identified the most important variant of each gene (CCNA2 rs3217773, EGF rs2237051, GSK3β rs6781942 and PLA2G6 rs84473). Of the four genes identified in the cell cycle pathway, three (CCNA2, EGF and GSK3β) are closely interconnected through the AKT signaling pathway, which is an important regulator of apoptosis and is essential to help cells manage apoptotic stimuli, by regulating cell cycle progression and cellular survival (23). Similar to our findings, a recent study found cell cycle genes to be important in the expression signature of smoking-related lung cancers (24). AKT-dependent apoptosis and cell cycle disruption are triggered by the binding of epidermal growth factor to EGFR on the cell surface (25). The activation of AKT leads to phosphorylation of GSK3β, which inhibits cyclin D, a regulator of entry into the S phase of the cell cycle (26).
The importance of AKT in the regulation of apoptosis and the cell cycle makes its expression essential to homeostasis. The dysregulation of AKT has been seen in many cancers, including lung cancer (27). AKT-dependent apoptosis and cell cycle disruption are triggered by extracellular growth factors that activate AKT (28), such as the binding of epidermal growth factor to EGFR on the cell surface (25). Similar to AKT, epidermal growth factor has also been shown to be upregulated in lung cancer tumors (29,30). In our study, EGF rs2237051 was associated with a decreased risk of lung cancer. Although this particular variant has not been reported previously to be associated with lung cancer, it was in LD (D′ = 0.86) with EGF rs4444903, which has been associated with increased risk of lung cancer in one Korean population (31), but not another (32).
The activation of AKT leads to the phosphorylation of GSK3β, which inhibits cyclin D (23). Recently, it has been hypothesized that EGFR, and not just AKT, might also be involved in the phosphorylation of GSK3β in lung adenocarcinomas (33). Decreased GSK3β expression has been seen in bronchial and tracheobronchial epithelial cells exposed to cigarette smoke (34,35). Whereas GSK3β has been strongly implicated as an important etiologic factor of colorectal and pancreatic cancer (36,37), mutations in GSK3β have not been reported previously in lung cancer. We observed a significant gene-based association between GSK3β and risk of lung cancer. Haplotype analyses identified a genomic region, which included rs6781942, with an increased risk of lung cancer. GSK3β rs6781942 was significantly associated with an increased risk of lung cancer in homozygote variant carriers. The strength and consistency of the associations between lung cancer and GSK3β found in our study warrant further investigation.
Finally, at the heart of the cell cycle control system and the end of the AKT pathway is a family of protein kinases known as cyclin-dependent kinases. Changes in cyclin levels result in activation of cyclin–cdk complexes, triggering cell cycle events (38). Cyclin D, which is inhibited by GSK3β, regulates entry into the S phase of the cell cycle, where it is essential for cyclin A–Cdk2 to phosphorylate E2F and inhibit its binding to DNA, thus inactivating its function as a transcription factor (39). Variant carriers of CCNA2 (rs3217773) were associated with increased risk of lung cancer in our study.
One of the major strengths of our population-based case–control study is the high participation rate. The moderate sample size, on the other hand, may lead to false-positive and false-negative findings (40). Therefore, our findings should be viewed as hypothesis generating until they are replicated in a larger study. We accounted for possible spurious findings due to multiple comparisons by using a permutation method for the gene-based analyses and then evaluating FDRs. The use of a gene-based permutation analysis identifies genes significantly associated with disease status by comparing the observed association with the distribution of gene–disease associations seen in 1000 randomly generated populations. This robust identification allows for sequential honing of important genomic regions and subsequently SNPs associated with lung cancer. Although functionality is not known for all genotyped SNPs, our results are biologically plausible given that variants in cell cycle pathway genes could contribute to lung cancer risk. However, associations with any specific SNP should be cautiously interpreted until these results are replicated, and functionality is determined, especially given that associations with a particular SNP in this study may be attributed to another SNP in LD.
In summary, our findings provide evidence of genetic variation that may be important to lung cancer susceptibility. Our results implicate the cell cycle pathway, particularly the CCNA2, EGF, GSK3β and PLA2G6 genes. The strongest findings in our study were disproportionally concentrated in the biologically interconnected AKT signaling pathway that regulates cell cycle and apoptosis. Our results should be viewed as exploratory until they are replicated in larger studies with more substantial genomic coverage.
Intramural National Cancer Institute (N01-CO-12400) program; Yale University–National Cancer Institute Partnership Predoctoral Fellowship Training Program (NCI TU2 CA105666).
Conflict of Interest Statement: None declared.