Asbestos exposure is a known risk factor for lung cancer. Although recent genome-wide association studies (GWASs) have identified some novel loci for lung cancer risk, few addressed genome-wide gene–environment interactions. To determine gene–asbestos interactions in lung cancer risk, we conducted genome-wide gene–environment interaction analyses at levels of single nucleotide polymorphisms (SNPs), genes and pathways, using our published Texas lung cancer GWAS dataset. This dataset included 317 498 SNPs from 1154 lung cancer cases and 1137 cancer-free controls. The initial SNP-level P-values for interactions between genetic variants and self-reported asbestos exposure were estimated by unconditional logistic regression models with adjustment for age, sex, smoking status and pack-years. The P-value for the most significant SNP rs13383928 was 2.17×10–6, which did not reach the genome-wide statistical significance. Using a versatile gene-based test approach, we found that the top significant gene was C7orf54, located on 7q32.1 (P = 8.90×10–5). Interestingly, most of the other significant genes were located on 11q13. When we used an improved gene-set-enrichment analysis approach, we found that the Fas signaling pathway and the antigen processing and presentation pathway were most significant (nominal P < 0.001; false discovery rate < 0.05) among 250 pathways containing 17 572 genes. We believe that our analysis is a pilot study that first describes the gene–asbestos interaction in lung cancer risk at levels of SNPs, genes and pathways. Our findings suggest that immune function regulation-related pathways may be mechanistically involved in asbestos-associated lung cancer risk.
Asbestos, a term for a group of naturally occurring hydrated silicate fibers, has been widely used in >3000 manufactured products. Exposure to asbestos can result in pleural lung fibrosis, and inhaled asbestos may cause peritoneal malignant mesothelioma and lung cancer (1). It is estimated that occupationally related asbestos exposure contributes to ~5–7% of all lung cancer cases (2). Although the use of asbestos has been banned or severely restricted since the early 1970s in many developed countries, asbestos-related lung diseases still pose a great public health threat because of the long latency period between asbestos exposure and the onset of asbestos-induced diseases (3). Genetic factors have been suggested to be involved in asbestos-related carcinogenesis and lung genotoxicity by causing oxidative stress, inflammation, DNA damage-repair response, mutations, chromosomal aberrations, mitochondrial malfunction and apoptosis (4,5). To date, the impact of individual genetic variations on asbestos-related lung cancer risk is still not well understood. A small number of studies using a candidate gene approach reported that polymorphisms in genes encoding xenobiotic metabolizing enzymes (e.g. GSTM1, GSTT1, MPO, CYP1A1 and CYP2E1) and manganese superoxide dismutase (e.g. SOD2 and MnSOD) were associated with asbestos-related lung cancer risk (6–8).
In the past few years, genome-wide association studies (GWASs) have successfully identified some novel genetic susceptibility variants located on 15q25, 5p15.33 and 6p21.33 that are involved in lung cancer etiology (9,10). However, even the most significant single nucleotide polymorphisms (SNPs) identified in GWASs have only accounted for a small proportion of the estimated heritability (11). Besides rare variants and structural variants, gene–environment interaction is expected to be a possible source of missing heritability not explained by current GWASs (12,13). Although environmental factors, such as tobacco smoke and asbestos exposure, play an important role in lung cancer development, few reported studies have investigated genome-wide gene–environment interactions following the initial findings of lung cancer GWASs. Because current GWASs, which often suffer from inadequate sample sizes, have been designed to detect the main effect of genetic variants, their ability to detect gene–environment interactions even in the single SNP analysis has been limited (14).
To improve the power of current GWASs to detect gene–environment interactions at the SNP level, both gene- and pathway-based analytic approaches, integrating prior biological knowledge and association studies, have been proposed recently (15,16). These approaches combine associations of genetic variants in either the same gene or biological pathway with disease risk and enable us to summarize the associations on a functional basis. Furthermore, these approaches enhance researchers’ ability to investigate additional susceptibility genes and pathways to address ‘missing heritability’, with the potential of unraveling the mechanisms underlying the etiology of complex diseases. Several recent studies have shown that gene-based and pathway-based approaches to gene–environment interactions using existing GWAS datasets could successfully facilitate the mining of biological information and provide additional complementary information (17,18).
Although not formerly possible, researchers are now able to search on a genome-wide scale for evidence of potential interactions between genetic variants and asbestos exposure in lung cancer risk. Access to the existing GWAS datasets for lung cancer provided us a unique opportunity to further characterize the gene–environment interaction that may play a critical role in the development of lung cancer. In this study, we investigated genetic variations involved in gene–asbestos exposure interactions in lung cancer risk by conducting genome-wide gene–environment interaction analyses at levels of SNPs, genes and pathways, using data from the published Texas lung cancer GWAS.
The Texas lung cancer GWAS population has been described previously (10). Briefly, this study included 1154 patients newly diagnosed with histopathologically confirmed and untreated non-small-cell lung cancer (NSCLC) and 1137 cancer-free controls, who were non-Hispanic white ever smokers and frequency matched by age (±5 years), sex and smoking status. Cases were recruited from The University of Texas MD Anderson Cancer Center, and cancer-free controls were recruited from the Kelsey-Seybold Clinic, Houston’s largest multispecialty group practice. All the participants signed an informed consent, provided a 30ml blood sample, and completed a personal interview using a questionnaire that included information about environmental exposures including tobacco use and asbestos exposure.
For the Texas lung cancer GWAS, genotyping procedures of Illumina HumanHap300 v1.1 BeadChips (Illumina, San Diego, CA) with genomic DNA and quality control have been described elsewhere (10). In brief, the chip contained 317 503 tagging SNPs derived from the International HapMap project phase I data. During data cleaning, markers that deviated from the Hardy–Weinberg equilibrium in the controls (P < 0.0001) and SNPs with a minor allele frequency <0.01 were excluded. The final analysis included 1154 lung cancer cases and 1137 cancer-free controls with 317 498 tagging SNPs.
Assessment methods for exposure have been described previously (6). In brief, self-reported exposure to asbestos was assessed by a positive answer to ‘have you handled, used, or been in contact with for at least 8h a week for a year or more’. Former smokers were defined as persons who had quit smoking >1 year before enrollment, and current smokers included recent quitters who had quit smoking within the past 12 months. Pack-years were calculated as the number of years smoked multiplied by the average number of cigarettes per day divided by 20. Pack-years were also classified into three groups (<25 pack-years, 25 to <50 pack-years and ≥50 pack-years) based on the lower and upper quartiles of pack-years among the controls.
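The pack-year calculation and grouping described above can be illustrated with a minimal Python sketch. The function names and group labels are hypothetical and chosen only for illustration; the cut points of 25 and 50 pack-years are the ones stated in the text.

```python
def pack_years(years_smoked: float, cigarettes_per_day: float) -> float:
    """Pack-years = years smoked x (average cigarettes per day / 20)."""
    return years_smoked * cigarettes_per_day / 20.0


def pack_year_group(value: float) -> str:
    """Assign one of the three groups from the text (cut points 25 and 50)."""
    if value < 25:
        return "<25"
    if value < 50:
        return "25 to <50"
    return ">=50"


# Example: 30 years at one pack (20 cigarettes) per day.
example = pack_years(30, 20)
print(example, pack_year_group(example))
```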
Chi-square tests were used to compare differences in distributions of demographic variables between the cases and controls. Unconditional logistic regression analyses were used to calculate odds ratios (OR) and 95% confidence intervals (CI) to determine the main effect of self-reported asbestos exposure and smoking status on the risk of lung cancer.
For the SNP-level gene–asbestos interaction, we used a standard approach to test gene–environment (G × E) interaction by performing a 1 df test of H0: βge = 0 for each SNP based on the model: logit P(D = 1 | G, E) = β0 + βgG + βeE + βgeGE, where D is an indicator of the disease status (cases, D = 1; controls, D = 0); G is the genotypic code for each genotype of a SNP (additive in this study); and E is exposure status (exposed, E = 1; unexposed, E = 0). For a given SNP, an unconditional logistic regression model that included the main effects for the genotypes (assuming an additive model), asbestos exposure, all covariates (age, sex, smoking status and pack-years) and the interaction term for genotypes and asbestos exposure was implemented in the PLINK1.07 ‘G × E’ procedure (19). The P-value of the interaction term (βgeGE) was used to assess the significance of the interaction between genetic variants and asbestos exposure. Since 307 944 SNPs had P-values for the interaction term in the logistic regression model, a P-value of 1.6×10–7 (0.05/307 944) was considered the threshold for statistical significance after correction for multiple tests.
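To make the role of the interaction coefficient concrete, the following Python sketch evaluates the model above for given coefficients and computes the Bonferroni threshold. It is an illustration only: the covariates (age, sex, smoking status and pack-years) are omitted, the function names are hypothetical, and the coefficient values in the example are invented rather than estimated from the study data.

```python
import math


def disease_probability(g, e, b0, bg, be, bge):
    """P(D = 1 | G, E) under logit P = b0 + bg*G + be*E + bge*G*E."""
    eta = b0 + bg * g + be * e + bge * g * e
    return 1.0 / (1.0 + math.exp(-eta))


def per_allele_odds_ratio(bg, bge, e):
    """Odds ratio per additional allele at exposure level e (0 or 1)."""
    return math.exp(bg + bge * e)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for n_tests tests."""
    return alpha / n_tests


# Invented coefficients: under H0 (bge = 0) the per-allele odds ratio
# is the same in exposed and unexposed subjects.
print(per_allele_odds_ratio(0.1, 0.5, 0), per_allele_odds_ratio(0.1, 0.5, 1))
print(bonferroni_threshold(0.05, 307944))  # about 1.6e-7
```

In this parameterization, exp(βge) is the ratio of the per-allele odds ratio in exposed subjects to that in unexposed subjects, which is why the 1 df test of βge = 0 is a test of multiplicative interaction.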
To determine whether any genes showed excess interaction with asbestos exposure in relation to lung cancer risk, we used a versatile gene-based test (VEGAS) to assess gene-level interaction from the P-values of the SNP-level gene–asbestos exposure interactions (20). The gene-based test statistic was the sum of the chi-squared 1 df statistics of the SNPs within a gene (with gene boundaries extended 50kb on either side). Its null distribution, which accounts for the linkage disequilibrium between markers and the number of SNPs per gene, was obtained by simulation from the multivariate normal distribution. The empirical gene-based P-value was calculated as the proportion of simulated test statistics that exceeded the observed gene-based test statistic. Since there were 17 572 autosomal genes included in the data analyses, a P-value < 2.8×10–6 (0.05/17 572) was considered statistically significant.
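The simulation idea behind the gene-based test can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the VEGAS software itself: the LD matrix is assumed to be a positive-definite correlation matrix supplied by the user, the function names are hypothetical, and a fixed number of simulations is used.

```python
import random
from statistics import NormalDist


def chi2_1df_from_p(p):
    """Upper-tail chi-squared (1 df) statistic corresponding to a P-value."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def cholesky(matrix):
    """Lower-triangular factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = (matrix[i][i] - s) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gene_based_p(snp_pvalues, ld_matrix, n_sims=10000, seed=0):
    """Empirical P-value: share of simulated sums exceeding the observed sum."""
    rng = random.Random(seed)
    observed = sum(chi2_1df_from_p(p) for p in snp_pvalues)
    lower = cholesky(ld_matrix)
    n = len(snp_pvalues)
    exceed = 0
    for _ in range(n_sims):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated normal draws with covariance equal to the LD matrix.
        z = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in z) > observed:
            exceed += 1
    return exceed / n_sims


# Toy gene with two SNPs in moderate LD (invented values).
print(gene_based_p([0.01, 0.20], [[1.0, 0.5], [0.5, 1.0]]))
```

Simulating from the LD-aware distribution is what allows correlated SNPs within a gene to be summed without treating them as independent tests.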
To analyze pathway-level interaction, we used the improved gene-set-enrichment analysis approach (i-GSEA), an implementation and extension of the original gene-set-enrichment analysis (GSEA) (21) that estimates the pathway-level interactions between genetic variants and asbestos exposure using P-values obtained at the SNP level. In the i-GSEA analysis, we used the maximum –log(P-value) among all the SNPs mapped to a gene as the statistic (t) representing that gene. Then, we ranked all genes in the GWAS by decreasing values of this statistic, and the genes in the top 5%, i.e. those with the smallest P-values, were considered significant. For each given pathway S, a significance proportion-based enrichment score was calculated to estimate the enrichment of genotype–phenotype association in that gene set. Then, i-GSEA performed label permutations to calculate nominal P-values for the pathway-based enrichment score and the false discovery rate (FDR) to correct for multiple testing; it uses SNP-label permutations instead of the phenotype-label permutations of classical GSEA. Moreover, i-GSEA focuses on the pathways or gene sets with the highest proportions of significant genes instead of relying solely on the total significance generated from either a few or many significant genes, thus improving the sensitivity to identify pathways or gene sets that represent the combined effects of all possibly modest SNPs/genes. Since i-GSEA uses P-values at both SNP and gene levels as input data, it was convenient to perform the pathway-based analysis in the existing GWAS dataset, especially for the GWAS gene–environment interaction analysis. For P-values at the SNP level, we mapped all SNPs within 20kb of a gene to that gene, so that the association of each gene was estimated from its nearby SNPs.
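The gene-level steps described above (representing each gene by its maximum –log P-value, flagging the top 5% of genes, and comparing the proportion of significant genes in a pathway with the genome-wide proportion) can be sketched as follows. This is a minimal illustration, not the i-GSEA software: base-10 logarithms are assumed, the function and gene names are hypothetical, and the enrichment score and SNP-label permutation steps are not shown. The ratio computed here is, as we understand the method, the factor by which i-GSEA weights the classical enrichment score.

```python
import math


def gene_scores(snp_pvalues_by_gene):
    """Represent each gene by the maximum -log10(P) of its mapped SNPs."""
    return {
        gene: max(-math.log10(p) for p in pvals)
        for gene, pvals in snp_pvalues_by_gene.items()
        if pvals
    }


def significant_genes(scores, top_fraction=0.05):
    """Genes ranked in the top fraction by decreasing score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    return set(ranked[:n_top])


def significance_proportion_ratio(pathway_genes, scores, top_fraction=0.05):
    """Proportion of significant genes in the pathway over that in all genes."""
    sig = significant_genes(scores, top_fraction)
    in_pathway = [g for g in pathway_genes if g in scores]
    if not in_pathway:
        return 0.0
    k = sum(g in sig for g in in_pathway) / len(in_pathway)
    big_k = len(sig) / len(scores)
    return k / big_k


# Toy data: 20 hypothetical genes, one of which carries a strong SNP signal.
toy = {f"G{i}": [0.5, 0.8] for i in range(20)}
toy["G0"] = [1e-5, 0.3]
print(significance_proportion_ratio(["G0", "G1"], gene_scores(toy)))
```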
Considering the relatively low coverage of Illumina HumanHap300 v1.1 BeadChips, we also compared the results obtained by mapping SNPs within 20kb of a gene with those obtained by mapping SNPs within 100kb of a gene. Next, the canonical pathways or gene sets were used for further pathway analyses. These canonical pathways were extracted and curated from Molecular Signatures Database (MSigDB) v2.5 (http://www.broadinstitute.org/gsea/msigdb/) (22). MSigDB included gene sets denoting canonical pathways integrated from a variety of online resources such as KEGG (23), Signal Transduction Knowledge Environment (Science Signaling), BioCarta (25) and GO (gene ontology) terms with high confidence (26). To reduce the multiple-testing issue and to avoid pathways that were too narrowly or too broadly defined, we restricted our analysis to pathways with 20–200 genes as the default (27). Pathways or gene sets with FDR < 0.25 were regarded as showing mild confidence that the G × E interactions were enriched in a pathway; those with FDR < 0.05 were regarded as showing high confidence that the interactions were enriched in a gene set or pathway.
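The pathway size restriction and the two FDR confidence tiers described above amount to simple rules, sketched here in Python. The function names and tier labels are hypothetical; the thresholds (20–200 genes, FDR < 0.25 and FDR < 0.05) are those stated in the text.

```python
def keep_pathway(n_genes, min_genes=20, max_genes=200):
    """Retain pathways whose gene count lies within the default range."""
    return min_genes <= n_genes <= max_genes


def confidence_tier(fdr):
    """Map an FDR value to the confidence tiers used in the text."""
    if fdr < 0.05:
        return "high"
    if fdr < 0.25:
        return "mild"
    return "none"


# Example: a 57-gene pathway is retained; an FDR of 0.16 is mild confidence.
print(keep_pathway(57), confidence_tier(0.16))
```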
A total of 1154 lung cancer cases and 1137 controls were included in the study. All of the cases were non-small-cell lung cancer (NSCLC) including adenocarcinoma (51.6%), squamous (26.8%) and other NSCLCs (21.5%). As shown in Table I, age distributions were significantly different but sex distribution was similar between cases and controls. There were more current smokers in controls than in cases (P = 0.005), but cases had smoked more pack-years than the controls (P < 0.001). In the models adjusted for smoking status and pack-years, we found that self-reported asbestos exposure was significantly associated with increased lung cancer risk (OR = 1.64, 95% CI = 1.36–1.97).
Overall, 307 944 SNPs were included in our analysis. The P-values for the interaction between each SNP and asbestos exposure are shown in a Manhattan plot (Figure 1). The top 20 significant SNPs are listed in Table II. SNP rs13383928, located in the 3ʹ flanking region of PTHR2, was the most significant SNP and had a P-value of 2.17×10–6 for the SNP-level interaction. The other top significant SNPs were distributed across several chromosomes with similar P-values; however, no SNP reached the genome-wide significance threshold (P < 1.6×10–7).
In the gene-based gene–asbestos interaction analysis, 17 572 genes were mapped according to positions on the University of California, Santa Cruz Genome Browser hg18 assembly (Table III). C7orf54, located on 7q32.1, had a P-value of 8.90×10–5 and was the most significant gene. Most of the other top 20 significant genes were located in a narrow region of chromosome 11q13. None of the genes reached statistical significance after correction for multiple tests.
As shown in Table IV, 171 112 of the 307 944 SNPs were mapped to 15 961 genes that were assigned to 250 pathways. When SNP mapping was limited to within 20kb of a gene, only four pathways were significantly enriched with association signals and had an FDR <0.25 and nominal P-value <0.01. The top two significant pathways were the Fas signaling pathway (nominal P < 0.001 and FDR = 0.034) and the antigen processing and presentation pathway (nominal P = 0.001 and FDR = 0.055). After we performed expanded i-GSEA analyses (i.e. mapping SNPs to a gene within a 100kb flanking range around the gene), we found that the Fas signaling pathway was still the top significant pathway with a nominal P = 0.001 and FDR = 0.16 (Supplementary Table I is available at Carcinogenesis Online).
The Fas signaling pathway was annotated by the Signal Transduction Knowledge Environment database (28), and there were 57 mapped genes covered by the Texas lung cancer GWAS dataset in this pathway, of which 23 genes were significant for the pathway-level interaction (Figure 2A). The antigen processing and presentation pathway was annotated by the KEGG pathway database, and there were 59 mapped genes covered by the Texas lung cancer GWAS dataset, of which 19 genes were significant for the pathway-level interaction (Figure 2B). Although the two pathways were significant at the pathway level, most of the significant genes in the two pathways only had relatively high P-values at the SNP level (Supplementary Tables II and III are available at Carcinogenesis Online).
To the best of our knowledge, this is the first pilot study to have investigated the genome-wide gene–asbestos interactions in lung cancer based on the current GWAS design. Although we did not find statistical evidence for the hypothesized gene–asbestos interaction in the etiology of lung cancer at the SNP and gene levels in our published Texas lung cancer GWAS dataset, the pathway-based analysis suggested that two pathways of immune function regulation (i.e. the Fas signaling and the antigen processing and presentation pathways) might play a role in the etiology of asbestos-related lung cancer.
It is not surprising that we failed to identify statistical evidence of gene–asbestos exposure interaction at the SNP level, because this finding is similar to what we have seen in most published GWASs that reported results of gene–environment interactions. For example, a GWAS of metabolic traits in the Northern Finland Birth Cohort 1966 (NFBC1966) reported few SNPs that were significant in genome-wide gene–environment interaction analyses (29). Another study of the influence of genetic and environmental factors on susceptibility to rheumatoid arthritis also failed to find genome-wide significant gene–environment interaction. Of >900 GWASs that have been published to date, very few have reported significant gene–environment interactions (11), which suggests that the current GWAS design does not provide enough statistical power to detect interactions at the single SNP level, because such a design mainly focuses on the main effect of a single SNP on risk of common, complex diseases.
The gene-based analysis of gene–environment interaction considers interactions between environmental exposures and potential biomarkers (usually SNPs) within a gene, thus leading to an increased statistical power for detecting an association (31). We identified three top significant genes, and their functions are either unknown or not well studied. For example, C7orf54 is a gene with unknown functions; LRRC4 is a tumor suppressor gene that has not been well studied but may be involved in modulating the extracellular signal-regulated kinase, protein kinase B, nuclear factor-kappaB pathway in glioma cells (32); SND1 is a component of the RNA-induced silencing complex that may lead to the degradation of specific mRNAs (33). However, there was no statistical evidence of gene–environment interaction for these three top significant genes. Interestingly, most of the other significant genes are located in a region at 11q13. For example, SNPs in RNF121 and NUMA1 have been reported to be associated with risk of Crohn’s disease, an autoimmune disease, in a large GWAS (34). C11orf59 was reported to play a role in the host response to coronavirus infection (35). Few studies have reported any associations of SNPs at 11q13 with the development of asbestos-induced diseases; however, genetic variants in this region have been reported to be associated with risk of Crohn’s disease, type 2 diabetes and breast cancer (36–38). It is biologically plausible that top significant genes identified in our gene–asbestos interaction analyses might be related to immune function regulation, and the findings from our pathway-based analysis further supported this hypothesis.
Of the two pathways we identified, the top significant pathway was the Fas signaling pathway, in which the participating components are mainly expressed on the cell membrane of lymphocytes and involved in regulation of tissue homeostasis in the immune system by inducing apoptosis (39). Fas signaling could trigger the apoptosis of T lymphocytes and play a critical role against autoimmunity and tumor development (40,41). Although polymorphisms in the Fas signaling pathway have been reported to be associated with an increased risk of lung cancer (42), few studies have reported an interaction between genetic variations in the Fas signaling pathway and asbestos exposure in the etiology of lung cancer. But previous studies showed that mice deficient for Arf were susceptible to accelerated asbestos-induced malignant mesothelioma and that homozygous loss of the Faf1 (FAS-associated factor 1) locus deregulated the tumor necrosis factor-α-induced nuclear factor-kappaB signaling, which had previously been implicated in asbestos-induced oncogenesis (43,44). Recent work also suggested that asbestos exposure might increase FAS expression in lung tissue (45). So our finding provides a possible biological explanation that genetically determined function of the Fas-mediated apoptotic pathway may be involved in asbestos-related lung carcinogenesis.
The second significant pathway, the antigen processing and presentation pathway, is involved in the regulation of immune function and consists of a subgroup of human immune-system genes located in the major histocompatibility complex (46). The genes in this pathway help to discriminate and present antigens to T-cell receptors, natural killer cells and other immune cells (47), and studies have found that asbestos can disturb autoimmunity and tumor immunity, while inducing DNA damage and apoptosis in alveolar epithelial cells (48,49). This is because asbestos acts as a superantigen for T cells, causing restricted overexpression of T-cell receptor V beta without clonal expansion (50). In addition, the function of natural killer cells and regulatory T cells was depressed when they were exposed to asbestos (51). If the immune function is impaired as a result of the asbestos-induced pathogenesis, genetic variations in these pathways may modulate the risk of asbestos-related diseases, including lung cancer. The findings of gene–asbestos interactions at levels of SNP, gene and pathway do not appear to be consistent for specific genes or specific chromosome regions in this study, but it is of interest that the interactions at all three levels tend to point to genes or pathways involved in immune function regulation, which might be one of the possible mechanisms involved in asbestos-related carcinogenesis.
Some limitations in this study should be addressed. One limitation was the use of self-reported asbestos exposure. Our previous study had shown that risk estimates for interactions of genetic variants with self-reported asbestos exposure, self-reported occupational exposure and self-reported employment in asbestos-related industries were similar, in which all these asbestos exposure measurements were reviewed and confirmed by a certified industrial hygienist (6), although reporting bias may still exist. Second, the coverage of the genotyping chip in this study is relatively low. A single SNP may be simultaneously mapped to several nearby genes in the gene–asbestos interaction analyses, which may lead to errors in the analyses at the gene and pathway levels. Third, we did not have access to similar datasets with which to replicate our findings. This may be due to a myriad of reasons, including the lack of asbestos exposure data in publicly available lung cancer GWAS datasets and/or the lack of common or comparable definitions for asbestos exposure. Furthermore, the current pathway-based analysis assumes that local SNPs only modify the function of the local gene, which is only partially true. Both cis and trans regulation of genes should be considered in future pathway-based analyses. In addition, because of the relatively small sample size, we did not perform G × E interaction analyses stratified by histology; the most informative gene–environment interactions between asbestos exposure and lung cancer risk could only be determined in large cohorts of patients with the histology of interest associated with that exposure.
Despite these limitations, our study demonstrated the advantage of performing the pathway-based analysis that integrates biological knowledge and biostatistics approaches, thus forming a useful and important tool for analyzing data from existing GWASs and the forthcoming whole genome sequence association studies (52,53).
In summary, we performed a first pilot study to describe the genome-wide gene–asbestos exposure interaction on lung cancer risk at levels of SNP, gene and pathway. Our findings suggest that currently designed GWASs may have limited power to detect gene–environment interactions at the SNP and gene levels, but the pathway-based approach may be more powerful in performing genome-wide gene–environment analyses.
National Institutes of Health (R01CA131274 and R01ES011740 to Q.W., R01CA055769 and R01CA127219 to M.S., CA121197 to C.I.A., U19 CA148127 to C.I.A., L.J.B, and N.E.C.); Cancer Center Core (P30 CA016672 to MD Anderson Cancer Center).
We thank Dr Jimmy Z. Liu from Queensland Institute of Medical Research for his kind help on gene-based data analysis support and Dr Kunlin Zhang from Institute of Psychology of Chinese Academy of Sciences for his support on pathway-based data analysis. We also thank Dakai Zhu for his UNIX server technical support.
Conflict of Interest Statement: None declared.