Black/white disparities in lung cancer incidence and mortality mandate an evaluation of underlying biological differences. We have previously shown higher risks of lung cancer associated with prior emphysema in African American compared with white lung cancer patients.
We therefore evaluated a panel of 1440 inflammatory gene variants in a two-phase analysis (discovery and replication), and added top GWAS lung cancer hits from Caucasian populations and 28 SNPs from a published gene panel. The discovery set (477 self-designated African American cases and 366 controls matched on age, ethnicity, and gender) was from Houston, Texas. The external replication set (330 cases, 342 controls) was from the EXHALE study at Wayne State University.
In discovery, 154 inflammation SNPs were significant (P<0.05) on univariate analysis, as were one of the gene panel SNPs (rs3087386 in REV1, P=0.0013) and three GWAS hits: rs16969968 (P=0.0014) and rs10519203 (P=0.0003) in the 15q locus, and rs2736100 at the hTERT locus (P=0.0002). One inflammation SNP, rs950286, was successfully replicated with concordant odds ratios of 1.46 (1.14-1.87) in discovery and 1.37 (1.05-1.77) in replication, and a combined OR of 1.40 (1.17-1.68). This SNP is intergenic between the IRF4 and EXOC2 genes. We also constructed and validated epidemiologic and extended risk prediction models. In the discovery set, the AUC was 0.77 for the epidemiologic model and 0.80 for the extended model. For the combined datasets, the AUC values were 0.75 and 0.76, respectively.
As has been reported for other cancer sites and populations, incorporating top genetic hits into risk prediction models provides little improvement in model performance and no clinical relevance.
Most family-based, candidate and genome-wide studies of lung cancer etiology have focused on Caucasians or Asians, yet there is growing recognition of the disparity between blacks and whites in lung cancer rates, treatment options, and outcomes (1-4). African American (AA) men tend to smoke fewer cigarettes per day than their Caucasian counterparts, yet they exhibit higher incidence rates of lung cancer (75 per 100,000 vs. 64 per 100,000) and higher mortality rates, and their five-year survival rate is 12 percent versus 15 percent for whites (2). This disproportionate burden of lung cancer in African Americans requires a concerted approach to examine underlying biological differences.
There is strong evidence of the role of an inflammatory microenvironment in lung cancer etiology (5). Epidemiological studies have shown that lung cancer risk in humans is strongly associated with radiographic evidence of parenchymal lung destruction (emphysema). It is well established that tobacco-induced chronic obstructive airways disease is characterized by a sustained inflammatory reaction in the airways and lung parenchyma. Our data have shown higher risks of lung cancer associated with prior emphysema in African American compared with white lung cancer patients (6).
While the role of nicotine dependence genes, specifically those in the region of extensive linkage disequilibrium on chromosome 15q25.1, has been studied in detail in cross-ethnic studies that have included African American populations (7), other genetic factors associated with lung cancer in Caucasians have been less extensively explored. We therefore evaluated the association of a comprehensive panel of inflammatory gene variants with risk of lung cancer in a two-phase analysis (discovery and replication) and added other selected variants that have been implicated in lung cancer risk in Caucasian populations.
We have previously constructed a risk prediction model for African Americans that included smoking-related variables [smoking status, pack-years smoked, age at smoking cessation (former smokers), and number of years since smoking cessation (former smokers)], self-reported physician diagnoses of emphysema or hay fever, and exposures to asbestos or wood dusts (6). In this analysis we also added significant genetic variants to the model to see if they improved model performance, and validated the epidemiologic and extended models in the external population used for replication of the genetic variants.
The discovery set of self-designated AA cases and controls was derived from an ongoing multi-racial/ethnic lung cancer case-control study (8, 9); these subjects were also included in the construction of the model noted above (6) and in a previous detailed analysis of a region of extensive linkage disequilibrium on chromosome 15q25.1 (7). Cases were consecutive patients at MD Anderson Cancer Center and the Lyndon Baines Johnson County Hospital recruited from 8/30/95 to 11/5/2009 who presented with newly diagnosed, histopathologically confirmed and previously untreated lung cancer, with no age, gender, tumor histology, or disease stage restrictions. Medical history, family history of cancer, smoking habits, and occupational history were obtained through an interviewer-administered risk-factor questionnaire. Institutional review board approval at MD Anderson Cancer Center was obtained for this study. Case exclusion criteria (designed for the parent study that employed functional lymphocyte-based assays) included prior chemotherapy or radiotherapy within the past six months, or recent blood transfusion. There were 477 cases for whom DNA was available for genotyping.
Controls were selected to be free of previous cancers (excluding non melanoma skin cancer) and were recruited (366 controls) from the Kelsey-Seybold Foundation, Houston's largest multidisciplinary physician practice, and from local community centers. Controls were frequency matched to the cases on age, ethnicity and gender. Never smokers were defined as those who had smoked <100 cigarettes in their lifetimes; former smokers were those who had quit smoking >1 year before diagnosis (cases) or interview (controls); and current smokers included those who had quit smoking within the past 12 months. The overall response rate was about 75%. Since the genetic analyses were performed on a subset of all the African American cases and controls, the original study design criteria were not met for age and gender.
The external replication set (330 cases and 342 controls) was collected as part of the EXHALE study at Wayne State University (10). Cases were identified through the population-based Metropolitan Detroit Cancer Surveillance System, an NCI-funded SEER registry. Rapid case ascertainment was used to identify histologically-confirmed cases within several months of diagnosis. African-Americans diagnosed with a first primary lung cancer from November 1, 2005 through June 30, 2010 were recruited for the study. Controls were gathered through community-based recruitment and were frequency matched on age (±5 years), sex, and African-American ethnicity. (10) Institutional Review Board approval was obtained through Wayne State University.
For this analysis in African Americans, we included the top 598 SNPs (P<0.05) selected from interrogating a comprehensive panel of 11737 inflammation pathway SNPs in Caucasian ever smokers with lung cancer (11), using an Infinium iSelect BeadChip, details of which have been previously published. We added a further 842 SNPs from inflammation pathways that were nominally significant at P<0.05 in our GWAS data in Caucasians. We also included 28 SNPs from the gene panel of Young et al (12) that includes variants in genes involved in metabolism of smoking-derived carcinogens (NAT2 and CYP2E1), inflammatory cytokines (interleukins 1, 8 and 18, tumor necrosis factor alpha1 receptor, Toll-like receptor 9), smoking addiction (dopamine D2 receptor and dopamine transporter 1), anti-oxidant response to smoking (α1 anti-chymotrypsin and extracellular superoxide dismutase), cell cycle control, DNA repair and apoptosis (XPD, p73, Bcl-2, FasL, Cerb1 and REV1) and integrins implicated in apoptosis.
Finally, we added the top hits from lung cancer GWAS in Caucasian populations on chromosome 15q25 and chromosome 5p15 (13, 14). The total set of SNPs was sent to Illumina (San Diego, CA) for custom iSelect Infinium design, out of which 1773 (85%) were successfully designed. Genotyping was performed on Illumina's iScan platform according to the manufacturer's standard protocol. Genotypes were autocalled using the BeadStudio software. We excluded SNPs with a call rate lower than 95% (n=6) and 13 SNPs with MAF<0.01. Genotypes from the 27 duplicate samples were concordant for all SNPs. The final set included 1754 SNPs. Genotyping for the replication set was performed using a 5′-nuclease assay (TaqMan, Applied Biosystems, Foster City, CA).
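The quality-control filter described above can be sketched in a few lines. This is an illustration only, not the pipeline used in the study: the `SnpQc` record, the function names, and the toy SNP identifiers are hypothetical, and only the two thresholds (call rate of at least 95%, MAF of at least 0.01) and the SNP counts come from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP QC summary (names are illustrative)."""
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency


def passes_qc(snp, min_call_rate=0.95, min_maf=0.01):
    """Keep a SNP only if it meets both thresholds stated in the text."""
    return snp.call_rate >= min_call_rate and snp.maf >= min_maf


def filter_snps(snps, min_call_rate=0.95, min_maf=0.01):
    """Return the SNPs that survive the call-rate and MAF filters."""
    return [s for s in snps if passes_qc(s, min_call_rate, min_maf)]


# Toy input with made-up identifiers and values.
toy_panel = [
    SnpQc("snp_a", call_rate=0.99, maf=0.20),   # kept
    SnpQc("snp_b", call_rate=0.93, maf=0.20),   # dropped: low call rate
    SnpQc("snp_c", call_rate=0.99, maf=0.005),  # dropped: rare variant
]
kept_ids = [s.snp_id for s in filter_snps(toy_panel)]

# Bookkeeping check on the counts reported in the text.
designed = 1773
final_count = designed - 6 - 13
```

The subtraction reproduces the reported final set of 1754 SNPs, which implies that the 6 low-call-rate SNPs and the 13 low-MAF SNPs did not overlap.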
Pearson's χ2 test was used to assess the differences in categorical variables and t-tests were used for continuous variables in both discovery and replication data sets. All tests were two-sided. To assess case-control associations of SNP genotypes with lung cancer risk we used unconditional logistic regression, implemented using SAS/Genetics version 9.2, with and without adjustment for age, sex, pack-years and family history of cancer in first-degree relatives. Single-SNP association tests were carried out using PLINK 1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) (15).
We applied the Bayesian false discovery probability test (BFDP) (16) to evaluate the chance of obtaining a false positive association. This approach calculates the probability of declaring no association given the data and a specified prior on the presence of an association, and has a noteworthy threshold that is defined in terms of the costs of false discovery and non-discovery. Four levels of prior probability of 0.01, 0.03, 0.05 and 0.07 and prior odds ratios of 1.2, 1.3, 1.4 and 1.5 were tested and selected levels of noteworthiness for BFDP were set at 0.8, i.e. false non-discovery rate is four times as costly as false discovery. Since we are employing a candidate pathway rather than a hypothesis generating approach, we believe that an OR of 1.5 is reasonable. We used a conservative prior of 0.05 to determine that the association is unlikely to represent a false-positive result.
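To make the BFDP calculation concrete, the sketch below assumes the widely used approximate Bayes factor formulation with a normal prior on the log odds ratio. Whether this matches the exact computation behind reference (16) is an assumption. The prior variance is set so that the chosen prior OR (here 1.5) sits at the 97.5th percentile of the prior, the standard error is recovered from a reported 95% confidence interval, and the function names are illustrative.

```python
import math

Z_975 = 1.96  # normal quantile used for 95% intervals


def bfdp(or_hat, ci_low, ci_high, prior_alt, prior_or_upper):
    """Approximate BFDP from an odds ratio and its 95% CI.

    prior_alt      : prior probability that the association is real
    prior_or_upper : OR taken as the 97.5th percentile of the prior (assumption)
    """
    theta = math.log(or_hat)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    v = se ** 2                                  # variance of the estimate
    w = (math.log(prior_or_upper) / Z_975) ** 2  # prior variance of log OR
    z2 = theta ** 2 / v
    # Approximate Bayes factor for the null versus the alternative.
    abf = math.sqrt((v + w) / v) * math.exp(-z2 * w / (2 * (v + w)))
    prior_odds_null = (1 - prior_alt) / prior_alt
    x = abf * prior_odds_null
    return x / (1 + x)


def noteworthy_threshold(cost_ratio):
    """Threshold R/(1+R), where R is the cost of a false non-discovery
    relative to a false discovery."""
    return cost_ratio / (1 + cost_ratio)


# Illustration with the combined estimate for rs950286 reported in this paper.
value = bfdp(1.40, 1.17, 1.68, prior_alt=0.05, prior_or_upper=1.5)
is_noteworthy = value < noteworthy_threshold(4)
```

With a cost ratio of 4 the threshold is 0.8, as stated above. Lowering `prior_alt` (for example to 0.01) raises the BFDP, which is why the analysis was repeated over several priors.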
In stratified analyses, we used logistic regression to examine associations of selected SNPs with lung cancer case-control status for subgroups of subjects defined by gender, pack-years, histology, history of emphysema or family history of lung cancer, comparing each subgroup of cases against controls within that subgroup.
We incorporated exposure variables (age, sex, cigarette pack-years, prior self-reported history of emphysema as diagnosed by a physician, hay fever, and asbestos and wood dust exposures) that were components of our published risk prediction model for African Americans (6) in a logistic regression model. Wood dust exposure fell out of the final model. We also performed a backward selection logistic regression analysis in which we allowed significant univariate SNPs to remain in the model according to the strength of association, provided they showed association with disease (P<0.05). SNPs were retained for analysis if they continued to show association (P<0.10) given other SNPs in the model. Tests for SNP by SNP interaction were evaluated using logistic regression analysis.
For risk model construction and validation, we used the discovery and replication sets separately and combined to explore both epidemiologic and extended models with genetic variables added. For each risk model, we calculated specificity and sensitivity of the resulting logistic regression model by constructing receiver operator characteristic (ROC) curves and calculating the area under the curve (AUC) statistic to estimate the model's ability to discriminate between patients and controls for the two populations separately and combined. Approximate 95% confidence intervals for the AUC were calculated assuming a binegative exponential distribution using SAS statistical software. An AUC of 0.5 indicates chance prediction (equivalent to a coin toss), while a statistic of 0.7 or higher indicates good discrimination. We performed pairwise comparisons of AUCs of the baseline multiple logistic model, and the expanded model including genetic data using a contrast matrix to evaluate differences of the areas under the empirical ROC curves (17). For validation of the model we included all demographic variables and SNPs that were selected in the final model from the discovery set. We used the estimates derived from the discovery set to fit this replication set and calculated the AUC, positive predictive value (PPV) and misclassification rate.
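The discrimination measures named above can be illustrated with a minimal sketch. It computes the empirical (Mann-Whitney) AUC rather than the binegative exponential estimate produced by SAS, and the 0.5 probability cutoff used for PPV and misclassification is an assumption, since the text does not state the cutoff. The scores are made-up predicted probabilities.

```python
def auc(case_scores, control_scores):
    """Empirical AUC: probability that a random case scores higher than
    a random control, counting ties as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def classification_summary(case_scores, control_scores, threshold=0.5):
    """Return (PPV, misclassification rate) at a probability cutoff.
    The default cutoff of 0.5 is an assumption for illustration."""
    tp = sum(1 for s in case_scores if s >= threshold)
    fn = len(case_scores) - tp
    fp = sum(1 for s in control_scores if s >= threshold)
    tn = len(control_scores) - fp
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    misclassified = (fp + fn) / (tp + fn + fp + tn)
    return ppv, misclassified


# Toy predicted probabilities (hypothetical).
toy_cases = [0.8, 0.4]
toy_controls = [0.6, 0.2]
toy_auc = auc(toy_cases, toy_controls)
toy_ppv, toy_misclassified = classification_summary(toy_cases, toy_controls)
```

In the toy data, three of the four case-control pairs are ordered correctly, so the AUC is 0.75. For external validation, the scores for the replication subjects would be computed with the coefficients estimated in the discovery set, without refitting.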
In the discovery set, the cases were significantly more likely to be male and older than the controls (Table 1), reflecting incomplete matching since the genetic analyses were performed on a subset of all the available cases and controls. The replication set was well matched on gender and age. In both groups, the cases were significantly more likely to be ever smokers and report heavier smoking histories than their respective controls (P<0.001). The mean number of cigarettes smoked per day was 20.7 for the discovery cases versus 19.3 for the replication cases. In our parallel case-control study in whites, the average number of cigarettes per day was 28.1 in cases compared with 26.4 in white controls (P ≤ 0.001) (17). Adenocarcinoma was the most common histological diagnosis followed by squamous cell cancer. We performed pair-wise analysis for the significant SNPs in the model, and no interactions were found to be statistically significant at the P=0.05 level.
There were 154 inflammation pathway SNPs that were statistically significant (P<0.05) on univariate analysis. In addition, one of the 28 SNPs from the Young et al. (12) panel was significant (rs3087386 in REV1 on 2q11.1-q11.2, P=0.0013), as were three of the GWAS top hits: rs16969968 (P=0.0014) and rs10519203 (P=0.0003) in the 15q locus (r-squared=0.136), and rs2736100 at the hTERT locus on chromosome 5 (P=0.0002) (data not shown).
For replication, we selected the top inflammation hits (P<0.01, n=27) from the discovery analysis as well as the top inflammation pathway SNPs from our discovery analysis of inflammation hits (P<0.05, n=33) in white ever smokers (11). Four inflammation SNPs were statistically significant in the replication analysis (Table 2), but only one was successfully replicated with a concordant odds ratio and after controlling for false discovery. rs950286 was associated with significantly elevated risks, with ORs of 1.46 (1.14-1.87) in the discovery set and 1.37 (1.05-1.77) in the replication set. The combined risk estimate was 1.40 (1.17-1.68), P=0.002. This SNP was significant in ever but not never smokers and in those with adenocarcinoma, but not squamous cell cancer. There were no differences by gender or family history. The risk estimate was higher in those with self-reported emphysema than in those without (2.87 vs 1.47), but the former was not significant (P=0.19). In addition, rs7124327 was of borderline significance in the discovery set (OR=0.72 (0.52-1.01)), was not significant in the replication set (OR=0.85 (0.60-1.20)), but reached statistical significance in the combined dataset (OR=0.77 (0.61-0.98), P=0.0331).
In multiple logistic regression analysis of the combined population groups (Table 3), pack-years [OR=1.03 (1.02-1.03), P<0.0001]; asbestos exposure [OR=1.64 (1.24-2.16), P=0.005]; emphysema [OR=3.46 (2.18-5.48), P<0.0001]; prior hay fever [OR=0.66 (0.49-0.89), P=0.0072]; and family history of cancer [OR=1.17 (1.05-1.31), P=0.004] were all statistically significant. Next we added the significant and borderline significant inflammation SNPs (rs950286 and rs7124327), together with the chromosome 15 and chromosome 5 SNPs and the REV1 SNP, to the logistic regression model (Table 3). With the exception of one SNP that was not significant (rs3087386, P=0.3996), all others remained statistically significant.
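As a rough consistency check on the rs950286 estimates, the discovery and replication odds ratios can be pooled with fixed-effect inverse-variance weights. This is a swapped-in technique for illustration: the combined estimate reported above was presumably obtained by analyzing the pooled individual-level data, so the two results need not agree exactly. Standard errors are recovered from the published 95% confidence intervals, and the function names are illustrative.

```python
import math

Z_975 = 1.96  # normal quantile used for 95% intervals


def log_or_and_se(or_hat, ci_low, ci_high):
    """Recover the log OR and its standard error from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return math.log(or_hat), se


def pool_fixed_effect(estimates):
    """Inverse-variance pooling of (OR, CI low, CI high) triples.
    Returns the pooled OR with its 95% confidence limits."""
    weighted_sum = 0.0
    total_weight = 0.0
    for or_hat, lo, hi in estimates:
        theta, se = log_or_and_se(or_hat, lo, hi)
        weight = 1.0 / se ** 2
        weighted_sum += weight * theta
        total_weight += weight
    theta_pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(theta_pooled),
        math.exp(theta_pooled - Z_975 * se_pooled),
        math.exp(theta_pooled + Z_975 * se_pooled),
    )


# Discovery and replication estimates for rs950286 as reported in the text.
pooled_or, pooled_low, pooled_high = pool_fixed_effect(
    [(1.46, 1.14, 1.87), (1.37, 1.05, 1.77)]
)
```

The pooled value lands close to the reported combined OR of 1.40 (1.17-1.68), which is what one would expect when the two study-specific estimates are concordant.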
For the risk prediction models (Table 4), we computed the area under the curve values (C statistic) for a baseline epidemiologic model incorporating variables that were available in both data sets (age, gender, pack-years, asbestos exposure, self-reported emphysema and hay fever, as well as family history of cancer in a first-degree relative). The replication data lacked information on prior hay fever and we used the available variable of self-reported allergies. The AUC for the discovery set was 0.77 (0.74-0.81). When we added the SNPs to the epidemiologic model, the AUC increased to 0.80 (0.77-0.84), P=0.0014. In the replication set, the AUC for the extended model was 0.68, with a positive predictive value of 68%. Using only the epidemiologic data, the PPV was 72%. If we conservatively removed the variable of hay fever/allergies and rs3087386, which was not significant in multivariate analysis, the PPV for the validation set was 71%, and the misclassification rate was 36% (data not shown). Adding the genetic data to the replication epidemiologic model actually reduced model performance (AUC=0.67). Combining the data for both sets yielded AUC values of 0.75 (0.72-0.77) and 0.76 (0.73-0.79) for the baseline and extended models, respectively (Table 4). We used the Hosmer-Lemeshow goodness-of-fit test to assess model calibration for the combined data. The resultant test statistic was 6.64 with P=0.5755, indicating that the fitted model was adequate. We also performed Net Reclassification Index (NRI) analysis (18). The NRI for the expanded model was 0.27 (0.16-0.38), P<0.0001. Thirteen percent of the cases and 14% of the controls were correctly reclassified using the extended model, an improvement in model performance that can be considered only modest.
In this two-phase analysis of SNPs in the inflammation pathway and top lung cancer hits in African Americans, we replicated one SNP in an inflammation gene and found a suggestive association with another, as well as associations with REV1 from the Young et al. gene set (in the discovery set only) and three GWAS hits from Caucasian studies. Our data also showed that the epidemiologic model constructed with the discovery data fit the replication data fairly well. However, adding these SNPs to the epidemiologic risk prediction model did not substantively improve the model performance.
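The NRI arithmetic can be sketched as follows, assuming the standard definition: the net proportion of cases moved to a higher risk category plus the net proportion of controls moved to a lower one. Reading the reported 13% and 14% as these net proportions is an interpretation, and the counts in the example are hypothetical values chosen only to reproduce those percentages.

```python
def net_reclassification_index(
    cases_up, cases_down, n_cases, controls_up, controls_down, n_controls
):
    """NRI = (net upward movement among cases)
           + (net downward movement among controls)."""
    net_cases = (cases_up - cases_down) / n_cases
    net_controls = (controls_down - controls_up) / n_controls
    return net_cases + net_controls


# Hypothetical counts giving net 13% of cases and net 14% of controls
# correctly reclassified by the extended model.
toy_nri = net_reclassification_index(
    cases_up=20, cases_down=7, n_cases=100,
    controls_up=10, controls_down=24, n_controls=100,
)
```

Under that reading, the two net proportions sum to the reported NRI of 0.27.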
The replicated inflammation SNP, rs950286, was significant in ever but not never smokers and in patients with the histologic diagnosis of adenocarcinoma. In white ever smokers from our GWAS, the OR for this SNP was 1.25, P=0.022. This OR is concordant with the data reported here for African Americans. This SNP is intergenic between IRF4 and EXOC2 genes. Interferon regulatory factor 4 is a protein encoded by IRF4, also known as MUM1. MUM1/IRF4 protein is a member of the interferon regulatory factor (IRF) family of transcriptional factors that are downstream regulators of interferon signaling (19). It is expressed in cells of the immune system, where it transduces signals from various receptors to activate or repress gene expression and regulates the differentiation of mature B cells into antibody-secreting plasma cells (20). Exocyst complex component 2 is a component of the exocyst complex, a multiple protein complex essential for targeting exocytic vesicles to specific docking sites on the plasma membrane.
There was also a suggestive association with rs7124327, on CD44 that encodes a cell-surface glycoprotein involved in cell-cell interactions, cell adhesion and migration. It is a receptor for hyaluronic acid and can also interact with other ligands, including collagens, and matrix metalloproteinases (MMPs). Leung et al (21) have shown tumorigenicity of CD44+ cells using both in vitro and in vivo approaches. They also demonstrated tumor cell expression of CD44 in about half of non small cell lung tumors. High CD44 expression was a negative prognostic marker in patients with resected NSCLC, particularly those with AC histology, and was independent of tumor stage (22). This gene has not been previously implicated in lung cancer risk. However, we noted that this SNP was also significantly associated with risk (and in the same direction) in our white ever smokers (OR=0.86, P=0.04) (11).
The association with two SNPs in the chromosome 15q locus (rs16969968 and rs1051730) has been reported previously in both Caucasian GWAS populations and in African American studies, including our own (7, 23, 24). rs16969968 has been robustly associated with smoking intensity (P < 0.01) in each of three populations (European ancestry, Asian, and African American), with an odds ratio of 1.33 (95% CI=1.25–1.42, P=1.1 × 10−17) in meta-analysis across all population samples (23). rs16969968 and rs1051730 are highly correlated (r2=1) in European ancestry and Asian populations, but display only modest correlation in African Americans (r2=0.40; HapMap 3 Release 2). This variant is the most strongly associated polymorphism across all three populations and causes an amino acid change in the nicotinic receptor α5 subunit that alters function of its receptor. Chen et al (23) maintain that rs16969968, rather than rs1051730, is most likely to be driving the association with smoking intensity.
Finally, rs2736100, a G/T variation in the TERT gene (intronic) on human chromosome 5, was statistically significant in our data. A number of well-designed GWAS and meta-analyses have implicated variants at the 5p15.33 locus in cancer risk at several different sites, including lung cancer in both whites and Asians (26). To our knowledge this is the first demonstration that the SNP is also important in African Americans. The strongest risk association was noted for adenocarcinoma in all genetic models. Recently, mean relative telomere length was associated with four genetic variants of the hTERT gene, including rs2736100 (27). TERT gene amplification is responsible for TERT mRNA overexpression in a majority of adenocarcinomas (28). TERT is active in rapidly dividing cells of the immune system and is said to be related to endothelial nitric oxide synthase control (29), also involved in the immune response.
The only significant SNP from the Young et al. panel was rs3087386 in REV1 at chromosome 2q11.1-q11.2. The Rev1 proteins contain a BRCT domain, important in protein-protein interactions. A suggested role for the human Rev1-like protein is as a scaffold that recruits DNA polymerases involved in translesion synthesis triggered by several types of damaged bases, including those caused by benzo[a]pyrene. In preclinical models, lowering of REV1 transcripts was associated with a significant decrease in the multiplicity of carcinogen-induced lung tumors and complete abolishment of tumor formation in 27% of the carcinogen-exposed mice (30). These data support the central role of the translesion synthesis pathway in the development of lung cancer. Sakiyama (31) reported that ORs in homozygotes for the REV1-257Ser allele were higher in heavy-smoker squamous cell cases than light-smoker squamous cell cases.
Our previously published risk prediction model for African Americans (6), built on a slightly larger dataset than that used for this analysis, exhibited good discrimination (AUC=0.75). Our discovery data yielded a similar AUC (0.77) and we were able to demonstrate moderate discrimination (AUC=0.68) for the external validation set, which is an improvement over our parallel model for white subjects. We demonstrated an improvement with addition of the significant SNPs in the discovery set and in the combined data sets, but not in the replication data. This may partly be explained by the fact that two SNPs exhibited ORs in discordant directions in the replication set, which could be attributed to small sample sizes and allele frequencies in controls that differed between Detroit and Texas, perhaps indicating varying levels of admixture. Although the marginal increase in the extended model was statistically significant, it is not likely to be of any clinical relevance.
Other limitations of this study include differences in sampling strategies for the two populations. Although the discovery set was hospital-based and the replication set was population-based, there were no significant differences in stage between the two case groups. The percentages of local/regional disease were 55% and 59%, respectively. Likewise, the distribution of distant disease was 45% and 41%, respectively (P=0.25). We performed stratified analyses to evaluate the SNPs in the model with gender and smoking status (ever/never). However, none of those were found to be statistically significant (P>0.05). Unfortunately we lacked information on the severity of emphysema and were unable to evaluate this phenotype in greater detail. Another limitation could be the method of selection of inflammation SNPs, which was based on the CEPH database and included only those with a MAF greater than 5%. Due to differences in population structure, the allele frequencies of the selected SNPs may differ in African Americans, and other ethnic-specific SNPs may have been missed. A subset of these cases and controls from both the discovery and replication sets was included in an admixture mapping study (10). The mean proportion of African ancestry in cases and controls combined was 80.7% in the replication population and 78.4% in the Texas participants. This small difference is unlikely to drive any of the reported associations. The average percent information extracted by the marker panel, representing a measure of the overall coverage of the panel, was 74.1%. In addition, there were no significant differences in allelic distribution for our SNPs of interest in the two control groups, further suggesting that the populations are similar in ancestry.
In summary, this analysis demonstrates, as has been reported for other cancer sites and different populations, that incorporating top genetic hits into epidemiologic risk prediction models has not yet proven to be of any clinical or prognostic value in these African American lung cancer cases.
Grant support: RO1CA55769 (MR Spitz), RO1CA127219 (MR Spitz), U19 CA148127, RP100443 (CI Amos), CA121197 (CI Amos) R01 CA060691 (AG Schwartz), N01- PC35145 (AG Schwartz) and P30CA22453 (AG Schwartz).