Lung cancer in lifetime never smokers is distinct from that in smokers, but the role of separate or overlapping carcinogenic pathways has not been explored. We therefore evaluated a comprehensive panel of 11,737 SNPs in inflammatory-pathway genes in a discovery phase (451 lung cancer cases, 508 controls from Texas). SNPs that were significant were evaluated in a second external population (303 cases, 311 controls from the Mayo Clinic). An intronic SNP in the ACVR1B gene, rs12809597, was replicated, with the significant association restricted to those reporting adult exposure to environmental tobacco smoke. Another promising candidate was a SNP in NR4A1, although the replication OR did not achieve statistical significance. ACVR1B belongs to the TGF-β superfamily, contributing to resolution of inflammation and initiation of airway remodeling. An inflammatory microenvironment (secondhand smoking, asthma, or hay fever) is necessary for risk from these gene variants to be expressed. These findings require further replication, followed by targeted resequencing and functional validation.
From etiologic, molecular genetic, and biologic viewpoints, it is now fairly well accepted that lung cancer occurring in lifetime never smokers is distinct from smoking-associated lung cancer (1). It is noteworthy that the top hit from all published ever smoking lung cancer GWAS, the chromosome 15q25 locus encoding nicotinic acetylcholine receptor (NAChR) subunits and a proteasome subunit, has not been implicated in lung cancer risk in never smokers (2). Nevertheless, it is likely that the two disease entities do share some molecular features, suggesting separate but overlapping pathways to lung carcinogenesis (1). There is increasing evidence that pathway-based approaches to identify the genetic contribution to cancer susceptibility may provide complementary information to conventional single-marker analyses.
Of intense interest in lung carcinogenesis is the inflammation pathway, since an abnormally prolonged or intense inflammatory response could create a microenvironment that promotes lung cancer development. While tobacco-induced lung cancer is characterized by increased tissue oxidative stress and an abundant and deregulated inflammatory microenvironment (3), a similar role for inflammation in lung cancer in never smokers has not been studied in depth. We therefore evaluated a comprehensive panel of germline genetic variants in inflammatory-pathway genes in risk of lung cancer in lifetime never smokers in a discovery phase of cases and controls selected from an ongoing multi-racial/ethnic lung cancer case-control study that has recruited study participants from The University of Texas MD Anderson Cancer Center from 1995 onwards (4). We performed a replication analysis in an independent sample of never smoking lung cancer cases and controls from the Mayo Clinic (5). Lung cancer in never smokers accounts for 15% of all lung cancers in the US, yet beyond passive smoking and family history of lung cancer, there are few other well established genetic or non-genetic clues to its etiology.
For the discovery phase, there were 451 non-small cell lung cancer cases and 508 controls, all lifetime never smokers (Table 1). Of these subjects, about two thirds of the cases and controls (650) were included in our previously published risk model for never smokers (6). Adenocarcinoma was diagnosed in 76% of the cases. On average, the controls were 5 years younger than the cases. Over 60% of both the cases and controls were women. The percentages of self-reported environmental tobacco smoke (ETS) exposure were 83% and 75% for the discovery cases and controls, respectively. The associations of asthma and dust exposure with lung cancer risk were not statistically significant. However, prior history of hay fever (OR = 0.70, P = 0.02), passive smoking exposure (OR = 1.59, P = 0.01), family history of 2 or more first-degree relatives with any cancer (OR = 2.24, P < 0.001), and 2 or more first-degree relatives with lung cancer (OR = 3.47, P = 0.04) all achieved statistical significance.
The replication set (Table 1) included 303 cases and 311 controls, all lifetime never smokers and well matched on age and gender. ETS exposure was reported by 67% of the cases and 56% of the controls. Passive smoking (OR = 1.56, P = 0.008) and family history of lung cancer (OR = 2.06, P = 0.0002) were significantly associated with risk. Asthma was not a risk factor in this population. Ten cases, but no controls, reported a prior history of emphysema (OR = 10.6, P = 0.02).
A total of 11,737 SNPs were available for analysis in the discovery phase. Table 2 summarizes the subpathways, genes, and SNPs included in the customized Illumina inflammation chip, as outlined in Loza et al. (7). In univariate analysis, assuming an additive model, 21 SNPs were statistically significant with P-value ≤ 0.001 and BFDP levels ≤ 0.8 (Table 3).
In the replication analysis of these 21 SNPs from the discovery phase, only one, rs12809597 in the ACVR1B gene, was concordant for direction with the discovery phase [discovery OR = 0.72 (0.59, 0.88), P = 0.0012] (Table 3) but was of borderline overall significance [replication OR = 0.80 (0.62, 1.02), P = 0.069]. For women specifically, the OR in the replication population was 0.67, P = 0.0097; the association was not statistically significant in men, although the numbers were small. In the combined data sets, the overall OR for rs12809597 was 0.72, P = 0.0002. For women only, the overall OR was 0.72, P = 0.0013; for men, the combined OR was 0.74, P = 0.05. A second SNP in this region, rs2701129 in the 5′ UTR of NR4A1, was strongly significant in our data (OR = 0.63, P = 0.0009) but did not achieve statistical significance in the Mayo Clinic data, although the OR was in the same protective direction (OR = 0.85, P = 0.36).
We also conducted stratified analysis by select variables including ETS exposure, family history of lung cancer, hay fever, and asthma (data not shown). Notably, the significant association between lung cancer risk and rs12809597 was only evident in those who reported ETS exposure (OR = 0.67, P = 7.8 × 10^−5), compared with an OR of 0.78, P = 0.39 in those who denied ETS exposure. In the discovery data this ACVR1B SNP was protective in both men [OR = 0.47 (0.30–0.73), P = 0.0010] and women with ETS exposure [OR = 0.74 (0.54–1.01), P = 0.0543], the latter at borderline significance. In the replication, this pattern was only evident in women with ETS exposure [OR = 0.60 (0.41–0.88), P = 0.009]. It is noteworthy that there were only 83 male cases in the replication set, and power is therefore limited for these subset analyses. Likewise, rs2701129 in NR4A1 was only statistically significant in ETS-exposed subjects in the discovery set [OR = 0.61 (0.43–0.87), P = 0.0068]. We also noted a greater effect for NR4A1 (OR = 0.31, P = 0.0081) in those with asthma (the risk group for lung cancer in never smokers), compared to those who denied having asthma (OR = 0.69, P = 0.0165). Although we did not note a similar pattern in our data for ACVR1B, we saw the same pattern for the ACVR1B SNP in the Mayo Clinic data for those with and without asthma (OR = 0.39, P = 0.02 vs. OR = 0.86, P = 0.27, respectively).
It is also of interest to note that in the discovery set, in those who denied having suffered from hay fever (i.e., the risk group), the ORs were significantly protective for both ACVR1B (OR = 0.70, P = 0.0026) and NR4A1 (OR = 0.54, P = 0.0003). We did not have comparable data for analysis in the replication set. We have previously reported (8) that, paradoxically, those with both conditions (asthma and hay fever) had a significantly elevated lung cancer risk (OR = 2.43, 95% CI = 1.11–5.35). It is in this subgroup (asthma and hay fever) that we detected the greatest protective effect with NR4A1 (OR = 0.28, P = 0.04).
We hypothesized that polymorphisms in genes directly associated with ACVR1B might contribute to the risk noted for the ACVR1B SNP. Therefore, we used an in silico approach, Pathway Studio (9), to identify upstream regulators and downstream targets of ACVR1B. Direct interactions between genes, i.e., direct regulation of gene expression, protein/protein binding, or binding to the promoter region were used to construct the network. Based on these criteria we identified 25 upstream regulators and 39 downstream targets of the ACVR1B gene. In this study, we had genotype data for 11 upstream regulators and 16 downstream targets. Of these, none were nominally significant at P < 0.05 in additive models.
Imputation was performed to increase coverage of SNPs in the region surrounding rs12809597 in the ACVR1B gene so that additional SNPs could be tested for association with lung cancer risk. Genotyped SNPs within 1 Mb on each side of the ACVR1B gene were retrieved (from 49.53 Mb to 51.68 Mb in build 36 positions). Before imputation, we identified three A/T or C/G SNPs that were in opposite strand orientation to the strand of the 1000 Genomes Project reference data based on comparisons of minor allele frequencies. The strands for these three SNPs were flipped before imputation. MACH version 1.016 was used for imputation, with the 1000 Genomes Project March 2010 release CEU data as the reference panel (Fig 1). Before imputation there were 30 genotyped SNPs, 23 of which were between 50.58 Mb and 50.74 Mb. After imputation, 156 SNPs between 50.58 Mb and 50.74 Mb had r2 > 0.8 and MAF > 0.01 and were considered reliably imputed. Best-guessed genotypes were used in the analysis. The most likely candidate SNP, rs1882119 (P = 1.76 × 10^−4), an imputed SNP (r2 = 0.9849) in this region, is in an intron of NR4A1, not ACVR1B. On the other hand, rs2701129 (P = 1.96 × 10^−4) was directly genotyped. Because the r2 for rs12809597 (ACVR1B) and rs2701129 (NR4A1) was only 0.013, we further investigated relevant SNPs in NR4A1.
In parallel with the ACVR1B analysis described above, we identified 170 upstream/downstream genes related to NR4A1, of which 65 genes and 568 SNPs had been included in our inflammation panel. Of these, 17 SNPs had P-values < 0.01 in univariate analysis assuming an additive model (Table 4). Five of these SNPs (in NR4A2, NR4A1, TP53, BCL2, and MAP2K2) remained statistically significant (P < 0.05) in logistic regression models using forward or stepwise selection procedures and controlling for age, sex, secondhand smoking exposure, and family history of lung cancer (Table 5).
Our original risk model was constructed based on 709 never smokers (330 lung cancer cases and 379 controls) (6). Of the total of 959 never smokers in this new analysis, 650 (68%) overlapped in both analyses. The published AUC for never smokers in that model was 0.57. The point estimate of the AUC for those not included in our original study (N = 309) was 0.56. The AUC statistic for the baseline model in the entire discovery dataset, incorporating the same clinical and epidemiologic variables (age, gender, family history of lung cancer, and ETS exposure), was 0.62 (data not shown). With the addition of the replicated SNP, rs12809597, the AUC increased to 0.64, P = 0.098. The comparable model for the Mayo Clinic data with addition of rs12809597 yielded an AUC of 0.60. The same analysis for the discovery data, adding in the NR4A1 SNPs and upstream and downstream regulators, yielded an AUC of 0.68 (P = 0.0005; data not shown).
We also summed the number of adverse alleles (ACVR1B, NR4A1, and upstream and downstream regulators) and evaluated the distribution of cases and controls across different strata to determine the cumulative risk in the discovery set (Table 6). Compared to the lowest risk stratum (0–6 risk alleles), the risks increased to an OR of 2.21 (P = 0.0272) for 7 risk alleles, an OR of 3.26 (P = 5.0 × 10^−4) for 8 risk alleles, and an OR of 5.28 (P = 3.9 × 10^−7) for 9 or more risk alleles (Table 6). There was a 46% increase in risk for each adverse allele and the P-value for trend was 1.18 × 10^−9 (Table 6). Six percent of cases and 13% of controls were in the lowest risk stratum compared with 50% and 35% in the highest risk stratum, respectively.
In this two-stage candidate pathway analysis of inflammation gene variants, we were able to replicate one variant (rs12809597) in the Activin receptor type-1B (ACVR1B)/Activin receptor-like kinase 4 (ALK4) gene that was significantly associated with lung cancer risk in lifetime never smoking cases. This risk was most prominent in women and in those risk subgroups who reported adult exposure to ETS, prior asthma, or no prior hay fever. Further analysis of SNPs 1 Mb from this polymorphism suggested that another promising target was in the 5′ UTR of the Nuclear receptor subfamily 4 group A member 1 (NR4A1) gene, although the OR in the replication Mayo Clinic data did not achieve statistical significance and the association we detected could be attributed to chance.
Inflammation is a complex host defense against biological, chemical, physical, and endogenous irritants. Innate immunity is mediated by a variety of secreted pro-inflammatory cytokines. The inflammation is resolved by anti-inflammatory cytokines. Chronic inflammation results from a dysfunction of these negative regulatory mechanisms (10). While smoking (and perhaps to a lesser extent passive exposure) is the obvious cause of a chronic inflammatory milieu in the lung parenchyma and bronchial epithelium, there are other likely precipitating factors, including infection, inhaled particulate exposures, and pulmonary scarring (11) that can lead to oxidative stress and an inflammatory response, even in non-tobacco-exposed subjects who develop lung cancer. It remains plausible, therefore, that inflammation gene polymorphisms could be important in lung cancer risk in lifetime never smokers as well.
Elevated pre-diagnostic C-reactive protein (CRP) levels, a systemic, but non-specific, marker of chronic inflammation, have been associated with subsequent lung cancer risk (12) with evidence of a dose-response relationship. Conversely, use of non-steroidal anti-inflammatory drugs (NSAID) has been associated with decreased lung cancer risk in some (13–16), but not all studies (17–19). Few of these studies have specifically evaluated the risk in lifetime never smokers, although in one cohort analysis (13) the strongest effect for total NSAID use was for long-term former smokers.
Activin receptor type-1B is a protein encoded by the ACVR1B gene with alternate splicing resulting in multiple transcript variants. Our SNP of interest, rs12809597, is intronic and no function has been reported for this SNP, although it is possible that this tagSNP may be linked to other causal SNP(s) in the gene that affect expression or function. ACVR1B, also known as ALK-4, acts as a transducer of activin or activin-like ligands that are growth and differentiation factors belonging to the transforming growth factor-β (TGF-β) superfamily of signaling proteins, essential regulators of proliferation and apoptosis, and key regulators of inflammation and angiogenesis. Activins signal through a heteromeric complex of receptor serine kinases which include at least two type I (I and IB) and two type II (II and IIB) receptors (20). Activin complexes with ACVR1B and recruits SMAD2 or SMAD3, members of the SMAD family of transcriptional coregulators. ACVR1B has been shown to be mutated in pancreatic tumors (21), and activin signaling mediates growth inhibition and cell cycle arrest in breast cancer cells (22). Moreover, differential expression of this gene has been found in the epithelial cells of a subset of smokers with lung cancer (23) and in bone marrow micrometastases from lung cancer patients (24), although the relevance of the gene’s deregulation in lung cancer is not entirely clear. Whole-genome microarray analysis of ACVR1B expression in large airway epithelial cells indicated some reduction in expression among normal smokers compared to nonsmokers (25), suggesting the possible impact of cigarette smoke exposure on activin signaling. It is therefore of interest that risk from the variant was most apparent in ETS-exposed subjects. Activins have also been implicated in the etiology of fibrotic diseases (26) and are upregulated during the fibrotic response in vivo (27).
Both the TGF-β and activin signaling pathways are activated upon allergen provocation in asthma and may contribute to the resolution of inflammation and initiation of airway remodeling after allergen challenge (28). Activin may also act as an inhibitor of cytokine-induced proinflammatory chemokine release from the airway epithelium. Activin-A is rapidly induced in TH2 cells upon T-cell activation and may also function as a TH2 immunomodulatory cytokine (29). An enhanced TH2 immune response contributes to the induction of allergy and asthma.
We have previously shown that self-reported, physician-diagnosed asthma was significantly associated with risk of lung cancer in lifetime never smokers who were a subset of this larger analysis (OR = 1.82) with evidence of a dose-response pattern for duration (P = 0.007 for trend) (8), although this pattern was not evident in our discovery data. In their meta-analysis, Santillan et al. (30) also found asthma to be a significant risk factor for lung cancer in never smokers. Our data have also demonstrated a protective effect of prior hay fever on lung cancer risk in never smokers (6). Cockcroft et al. (31) suggested that patients with respiratory atopy appeared to have some degree of protection against developing malignancies of endodermal origin, attributable to enhanced immune surveillance in a stimulated immune system.
It was of special interest that the most significant odds ratio for the ACVR1B SNP was obtained in the subset of cases and controls that reported adult exposure to ETS, although these subset analyses are based on small sample sizes. No such association was evident in those who denied such exposure. It could be argued that an inflammatory microenvironment is more likely to exist in those exposed passively to tobacco smoke, and that exposure is necessary for the impact of the gene variant to be apparent.
The association with NR4A1 (also known as Nur77) is intriguing, but must be viewed with considerable caution. It is an orphan receptor within the nuclear hormone receptor superfamily and a potent inhibitor of NF-kappa B activation (32). NR4A1 is overexpressed in patients with atopic dermatitis compared with healthy volunteers (33). Protective effects of the NR4A1 SNP were also largest in putative risk subgroups (asthma, no prior hay fever).
There was a 1% increase in the AUC (0.64) in an expanded clinical and epidemiologic risk model incorporating the ACVR1B SNP and an additional 5% (0.68) when we also added the upstream and downstream targets of NR4A1. These improvements in risk prediction incorporating these genes were statistically significant. The final AUC of 0.68 is similar to, and the incremental improvement in AUC larger than, that obtained from a risk prediction model of lung cancer in ever smokers in which we incorporated top lung cancer GWAS hits, the chromosome 15q nicotinic receptor gene cluster (tag SNP rs1051730 G>A), and two SNPs from the 5p15.33 region (rs2736100 and rs401681) (34). However, higher AUC values are desirable for the model to have clinical utility and for any public health impact or recommendation, especially since the incidence of lung cancer in never smokers is substantially lower than that in ever smokers.
In a parallel analysis, rs2701129 was associated with an OR of 0.78, P = 0.014 in 1096 cases and 727 controls, all ever smokers, whom we have genotyped using the same Illumina platform, although rs12809597 was not a risk predictor. rs12809597 was not directly genotyped in the GWAS and the r2 value was not sufficiently robust for imputation. rs2701129 was genotyped in GWAS, but was not statistically significant.
Although the chemical constituents of sidestream (SS) and mainstream (MS) smoke are qualitatively the same, differences in pH, combustion temperature, and degree of dilution with air contribute to quantitative differences in their chemical composition and their emission rates. For example, nitrosamines and other carcinogens are present in greater concentrations in SS than in MS (35). The ever smoker cases for GWAS also differ from the never smoker cases in this analysis. For example, over 25% of our ever smoker cases report preexisting chronic obstructive pulmonary disease (COPD), which is almost non-existent in never smokers. One could therefore hypothesize that the pathogenic processes for smokers and never smokers are not equivalent, although certain etiologic pathways could be shared, such as the involvement of inflammation.
We acknowledge the limitations of this study, and the challenge in drawing causal inferences from association analyses. There were relatively small sample sizes both for the discovery and replication sets, and this problem is exaggerated in subset analyses. We also relied on self-reported questionnaire data for assessment of ETS exposure, raising the potential for both misclassification and recall bias. Nevertheless, for residential exposure to ETS, most studies in the past have confirmed that self-reports were generally reliable (36), and practical approaches to alternative measurement of ETS exposure decades prior to onset of lung cancer have not been established. In national survey data, the accuracy of self-reported secondhand smoke (SHS) exposure in the work, home, or home and work ranged from 87% to 92%, although workers reporting no SHS exposure were only 28% accurate (37). Thus there could be underreporting of ETS exposure but overreporting is less likely.
In summary, this analysis used a candidate pathway approach to comprehensively evaluate SNPs in inflammation genes as predisposing to lung cancer risk in lifetime never smokers. We replicated a SNP in the TGF-β family in ETS-exposed patients or those with inflammatory/allergic conditions, and using in silico analyses, we were able to identify upstream and downstream SNPs of our target SNPs that further contributed to risk. Recent progress in identification of novel SNPs, especially those generated from the 1000 Genomes Project, has identified several polymorphisms in the ACVR1B gene which could be candidates for causal variants. Those SNPs include 6 polymorphisms located in the coding region of ACVR1B: rs34488074, rs114081852, rs117020497, rs114735080, rs77643569, and rs34050429. We plan to include these SNPs in the next phase of our targeted sequencing studies.
This analysis focuses on lung cancer cases and controls who reported themselves to be lifetime never smokers, i.e., who had smoked fewer than 100 cigarettes over a lifetime. Cases for the discovery phase were consecutive Caucasian patients with newly diagnosed, histopathologically confirmed, and previously untreated non-small cell lung cancer with no age, gender, ethnicity, tumor histology, or disease stage restrictions. Medical history, family history of cancer, adult environmental tobacco exposure history, and occupational history were obtained through an interviewer-administered risk-factor questionnaire. We did not validate self-reports of passive smoking exposure. Case exclusion criteria for the study included prior chemotherapy or radiotherapy or recent blood transfusion.
We recruited our control population from the Kelsey-Seybold Clinic, Houston’s largest multidisciplinary physician practice. Potential controls were first surveyed with a short questionnaire for their willingness to participate in research studies and provide preliminary data for matching demographic characteristics with those of cases (4). Controls were frequency matched to the cases on the basis of age (±5 years), sex, smoking status, and ethnicity. Exclusion criteria were similar and also included no prior cancer. To date, the response rate among both the cases and controls has been approximately 75%. Upon receiving informed consent, a 40 mL blood sample was drawn into coded, heparinized tubes from study participants. Genomic DNA was extracted from peripheral blood lymphocytes and stored at −80° C.
The replication phase was conducted among never smoking cases and controls recruited between January 1997 and September 2008 and who were included in a published GWAS (5). These lifetime never smoking lung cancer cases were recruited from the Mayo Clinic and community residents who were never smokers were selected as controls and matched to the patients according to age, sex, and ethnic background. Personal interviews with structured questionnaires were used to elicit demographic, epidemiologic, and exposure data. Institutional review board approval was obtained from the MD Anderson Cancer Center, Kelsey-Seybold Foundation (Houston, Texas), and Mayo Clinic (Rochester, MN).
Candidate genes for the discovery phase were selected based on the following criteria. We searched the Gene Ontology database (38) and the National Center for Biotechnology Information (NCBI) PubMed (39) to identify a list of inflammation pathway-related genes. For each gene, we selected haplotype tagging SNPs (htSNPs) located within 10 kb upstream of the transcriptional start site or 10 kb downstream of the transcriptional stop site based on data from the International HapMap Project (40) release 24/Phase II. Using the LD select program (41) and the UCSC Golden Path Gene Sorter program (42), we further divided identified SNPs into bins based on an r2 threshold of 0.8 and minor allele frequency (MAF) greater than 0.05 in Caucasians to select tagging SNPs. We also included SNPs in the coding (synonymous SNPs, nonsynonymous SNPs) and regulatory regions (promoter, splicing site, 5′ UTR, and 3′ UTR). Functional SNPs and SNPs previously reported to be associated with cancer were also included. We also made extensive use of the inflammation pathway gene list and functionally defined subpathways outlined in Loza et al. (7), which suggested that variants in multiple genes in inflammation pathways likely cooperate in additive or synergistic ways to impact disease risk. The complete set of selected SNPs was submitted to Illumina technical support for Infinium chemistry designability, beadtype analyses, and iSelect Infinium Beadchip synthesis.
Of the total number of selected SNPs, 2.9% could not be designed due to designability score failure. An additional 12% could not be incorporated into the beadchip due to manufacturing issues (within the norm stated by Illumina). Overall, slightly less than 15% of all SNPs were not designed. We did not seek surrogates for failed SNPs, because of the relatively low failure rate for designability (<3%) and constraint on the total number of beadtypes for the custom chip design.
A total of 19,949 SNPs were genotyped in the discovery samples using Illumina’s Infinium iSelect HD Custom Genotyping BeadChip according to the standard 3 day protocol (San Diego, CA). Of these, 11,930 SNPs were in inflammation pathways and the remaining SNPs were identified from ongoing GWAS for further query in separate analyses. Genotypes were autocalled using the BeadStudio software. Any SNP with a call rate lower than 95% was excluded from further analysis (n = 203). A further 27 SNPs were removed because genotypes differed between the original and the duplicate sample (error rate). We also deleted 93 SNPs that were at the same chromosomal position and 89 SNPs with MAF = 0. The final data set included 19,537 SNPs, of which 11,737 SNPs were in the inflammation pathway.
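The SNP counts quoted above can be checked with a short tally. This is only an illustrative sketch: the function name `apply_qc` and the filter labels are hypothetical, and the numbers are copied from the paragraph rather than recomputed from genotype data.

```python
# Illustrative tally of the SNP quality-control filters described in the text.
# The helper name and labels are hypothetical; counts are quoted from the paragraph.

def apply_qc(total, exclusions):
    """Subtract each exclusion count from the starting total."""
    remaining = total
    for _reason, n_removed in exclusions:
        remaining -= n_removed
    return remaining

QC_EXCLUSIONS = [
    ("call rate below 95%", 203),
    ("discordant duplicate genotypes", 27),
    ("same chromosomal position", 93),
    ("monomorphic (MAF = 0)", 89),
]

final_total = apply_qc(19_949, QC_EXCLUSIONS)

# Inflammation-pathway SNPs lost to QC: starting panel minus final panel.
removed_inflammation = 11_930 - 11_737

print(final_total, removed_inflammation)
```

The four exclusion counts sum to 412, which accounts for the difference between the 19,949 genotyped SNPs and the 19,537 retained.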
For the Mayo Clinic samples, whole genome amplification (WGA) was performed before SNP genotyping. The WGA was set up in four separate reactions, each of which included 25 ng of genomic DNA and standard amplification procedures with a total reaction volume of 25 μl (REPLI-g Midi Kit, Qiagen). After WGA, the four reactions were pooled, mixed, and quantified by the picogreen method. Genotyping was performed in Dynamic Arrays (Fluidigm, CA) containing integrated fluidic circuits (IFCs). 75 ng of the WGA-DNA was pre-amplified using 0.2X primer multiplex of the source primers. 2.3 μl of pre-amplified DNA was then loaded onto the array. 3 μl of each Applied Biosystems TaqMan genotyping assay in a 5 μl assay reaction volume was loaded onto the array. The assay was run for 40 PCR cycles under vacuum pressure. The endpoint read was performed on an EP1 machine using a CCD camera to detect VIC and FAM dyes. SNP Genotyping Analysis Software was used to autocall SNP genotype clusters with a confidence of 95%. The specific SNPs identified from this pathway-based analysis were not included in the Mayo GWAS chip (5) and were directly genotyped for this analysis. The never smoker GWAS with the Mayo Clinic samples had a rather limited sample size and an additional GWAS in never smokers is underway including our discovery set of never smoking cases and controls.
Pearson’s χ2 test was used to assess the differences in categorical variables and t-tests were used for continuous variables in both discovery and replication data sets. All tests were two-sided. For each SNP, Hardy-Weinberg equilibrium was assessed among controls using a chi-squared test. To assess case-control associations of SNP genotypes with lung cancer risk we used unconditional logistic regression, implemented using SAS/Genetics version 9.2. Single-SNP association tests were carried out using PLINK 1.07 (43).
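As a minimal sketch of the Hardy-Weinberg check described above, the following computes a 1-degree-of-freedom chi-squared statistic from genotype counts in controls. It is not the SAS/Genetics or PLINK implementation used in the study; the function name and the example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared test (1 df) of Hardy-Weinberg equilibrium from genotype counts.

    Assumes at least one genotyped subject. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(300, 400, 300)
print(chi2, p_value)
```

A SNP whose control genotypes depart strongly from equilibrium would typically be flagged for genotyping error before association testing.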
We applied the Bayesian false discovery probability test (BFDP) (44) to evaluate the chance of obtaining a false positive association. This approach calculates the probability of no association given the data and a specified prior on the presence of an association, and has a noteworthiness threshold that is defined in terms of the costs of false discovery and non-discovery. Four levels of prior probability (0.01, 0.03, 0.05, and 0.07) and odds ratios from 1.3 through 2.0 were tested, and the selected level of noteworthiness for BFDP was set at 0.8 (i.e., a false non-discovery is four times as costly as a false discovery). We used the most conservative prior of 0.01 to determine that the association was unlikely to represent a false-positive result.
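The BFDP calculation can be sketched as follows, assuming the approximate Bayes factor formulation commonly associated with this method (a normal prior on the log odds ratio whose 97.5th percentile corresponds to a chosen upper-bound OR). Whether the authors used exactly this form is an assumption; the function names are hypothetical, and the prior upper-bound OR of 1.5 is simply one value within the 1.3 to 2.0 range stated in the text. The standard error is back-calculated from the reported 95% confidence interval.

```python
import math

Z_975 = 1.96  # normal quantile used for 95% intervals

def bfdp(or_hat, ci_low, ci_high, prior_prob, prior_or_max):
    """Approximate BFDP from an odds ratio and its 95% CI (sketch, see assumptions)."""
    theta = math.log(or_hat)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    v = se ** 2                                   # variance of the log-OR estimate
    w = (math.log(prior_or_max) / Z_975) ** 2     # prior variance of the log OR
    z2 = theta ** 2 / v
    # Approximate Bayes factor in favour of the null hypothesis.
    abf = math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))
    prior_odds_null = (1 - prior_prob) / prior_prob
    return abf * prior_odds_null / (1 + abf * prior_odds_null)

def noteworthiness_threshold(cost_ratio):
    """Threshold R / (1 + R), where R = cost of false non-discovery / cost of false discovery."""
    return cost_ratio / (1 + cost_ratio)

# Discovery estimate for rs12809597 quoted in the text: OR = 0.72 (0.59, 0.88).
value = bfdp(0.72, 0.59, 0.88, prior_prob=0.01, prior_or_max=1.5)
print(value, noteworthiness_threshold(4))
```

With a cost ratio of 4 the threshold is 4 / (1 + 4) = 0.8, which matches the noteworthiness level stated above; an association is called noteworthy when its BFDP falls below that threshold.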
In stratified analyses, we used logistic regression to examine associations of selected SNPs with lung cancer case-control status for subgroups of subjects defined by sidestream tobacco exposure, history of hay fever, asthma, or family history of lung cancer, comparing each sub-group of cases against controls within that subgroup.
We also performed a stepwise forward logistic regression analysis in which we allowed significant univariate SNPs to enter a model according to the strength of association, provided they showed association with disease (P < 0.05). SNPs were retained for analysis if they continued to show association (P < 0.05) given other SNPs in the model. Linkage disequilibrium (LD) between SNPs was calculated for cases and controls using PLINK before all the SNPs were entered into the model. If two SNPs were in high LD (r2 ≥ 0.8), only one SNP was entered into the model. Linkage disequilibrium was visualized using Haploview v. 4.1 (45) to summarize r2 statistics.
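The LD filter applied before model entry can be illustrated with a small greedy routine. This sketch computes r2 as the squared Pearson correlation of genotype allele counts (0, 1, 2), which is an approximation and not necessarily the exact estimator used in the study; the function names, SNP labels, and genotype vectors are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two vectors of allele counts (0/1/2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as no LD
    return sxy ** 2 / (sxx * syy)

def prune_by_ld(ordered_snps, genotypes, threshold=0.8):
    """Keep a SNP only if its r2 with every already-kept SNP is below the threshold.

    `ordered_snps` should be sorted by strength of association, strongest first.
    """
    kept = []
    for snp in ordered_snps:
        if all(genotype_r2(genotypes[snp], genotypes[k]) < threshold for k in kept):
            kept.append(snp)
    return kept

# Hypothetical genotypes for six subjects; snpB is a perfect proxy for snpA.
example = {
    "snpA": [0, 1, 2, 1, 0, 2],
    "snpB": [0, 1, 2, 1, 0, 2],
    "snpC": [2, 0, 1, 0, 1, 2],
}
print(prune_by_ld(["snpA", "snpB", "snpC"], example))
```

Ordering candidates by association strength before pruning means that, of two SNPs in high LD, the one with the stronger univariate signal is the one carried into the regression model.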
For the replication analysis we included all SNPs that were statistically significant at P-values < 0.001 and BFDP levels ≤ 0.8 with prior probability of 0.01. For risk model construction, we retained all epidemiologic variables that were components of our published risk prediction model for never smokers (6). However, since the Mayo Clinic study did not have data available on prior hay fever, we elected to leave this variable out of the model. For each risk model, we calculated specificity and sensitivity of the resulting logistic regression model by constructing receiver operating characteristic (ROC) curves and calculating the area under the curve (AUC) statistic to estimate the models’ ability to discriminate between patients and controls for the two populations separately and combined. Approximate 95% confidence intervals for the AUC were calculated assuming a binegative exponential distribution using SAS statistical software. An AUC of 0.5 indicates chance prediction (equivalent to a coin toss), while a statistic of 0.7 or higher indicates good discrimination. We also constructed expanded models that included any replicated SNPs. We performed pairwise comparisons of AUCs of the baseline multiple logistic model and the expanded model including genetic data using a contrast matrix to evaluate differences of the areas under the empirical ROC curves (46).
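The empirical AUC has a simple interpretation: it is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is an illustration rather than the SAS procedure used in the study, and the function name and scores are hypothetical.

```python
def empirical_auc(case_scores, control_scores):
    """Empirical AUC: fraction of case-control pairs ranked correctly (ties count 0.5)."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Hypothetical predicted risks from a logistic model, for illustration only.
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3]
print(empirical_auc(cases, controls))
```

This pairwise form is quadratic in sample size, which is acceptable for data sets of a few hundred subjects; rank-based formulas give the same value more efficiently for larger samples.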
Beyond passive smoking and family history of lung cancer, little is known about the etiology of lung cancer in lifetime never smokers, which accounts for about 15% of all lung cancers in the United States. Our two-stage candidate pathway approach examined a targeted panel of inflammation genes and has identified novel structural variants that appear to contribute to risk in patients who report prior exposure to sidestream smoking.
Financial support: National Cancer Institute [CA55769 (MRS), CA127219 (MRS), CA80127 (PY), CA84354 (PY), U19CA148127 (CIA), CA121197 (CIA), CA123235 (CJE), CA131327 (CJE), CA149462 (OG)], Kelsey Seybold Research Foundation, and Mayo Foundation Fund.
Disclosure of Potential Conflicts of Interest: The authors declare that they have no competing financial interests. None of the sponsors played a role in the study design, collection, analysis, and interpretation of the data, in the writing of this report, or in the decision to submit the paper for publication.