Lung cancer continues to be the leading cause of cancer death in the USA and the best example of a cancer with undisputed evidence of environmental risk. However, a genetic contribution to lung cancer has also been demonstrated by studies of familial aggregation, family-based linkage, candidate gene studies and most recently genome-wide association studies (GWAS). The African-American population has been underrepresented in these genetic studies and has patterns of cigarette use and linkage disequilibrium that differ from patterns in other populations. Therefore, studies in African-Americans can provide complementary data to localize lung cancer susceptibility genes and explore smoking dependence-related genes. We used admixture mapping to further characterize genetic risk of lung cancer in a series of 837 African-American lung cancer cases and 975 African-American controls genotyped at 1344 ancestry informative single-nucleotide polymorphisms. Both case-only and case–control analyses were conducted using ADMIXMAP adjusted for age, sex, pack-years of smoking, family history of lung cancer, history of emphysema and study site. In case-only analyses, excess European ancestry was observed over a wide region on chromosome 1 with the largest excess seen at rs6587361 for non-small-cell lung cancer (NSCLC) (Z-score = −4.33; P = 1.5 × 10⁻⁵) and for women with NSCLC (Z-score = −4.82; P = 1.4 × 10⁻⁶). Excess African ancestry was also observed on chromosome 3q with a peak Z-score of 3.33 (P = 0.0009) at rs181696 among ever smokers with NSCLC. These results add to the findings from the GWAS in Caucasian populations and suggest novel regions of interest.
Lung cancer continues to be the leading cause of cancer death in the USA (1) and the best example of a cancer with undisputed evidence of environmental risk. Although 80–90% of lung cancer incidence is attributed to cigarette smoking, only 15% of smokers develop this disease, suggesting that there are individual differences in susceptibility. Differential risk is also seen by race; African-Americans tend to smoke fewer cigarettes (2), yet have higher incidence rates for lung cancer (1), and have higher risk associated with a family history of lung cancer than Caucasians (3). Contributions to the understanding of the genetic susceptibility to lung cancer risk have come from a family-based linkage study (4), candidate gene studies (reviewed in ref. 5) and genome-wide association studies (GWAS) (6–9) conducted in primarily Caucasian populations. Studies of candidate genes have produced mixed results (5), in part reflecting the small size of many of these studies, leaving open the question of the contribution to risk from common low-penetrant genes. The family linkage study and the GWAS have identified several chromosomal regions of interest including 6q23–25 (4), 15q25 (6–8) and 5p15 and 6p21 (10). The 15q25 region contains the nicotinic acetylcholine receptor subunit genes CHRNA3 and CHRNA5, raising the possibility that these genes affect lung cancer risk indirectly through smoking behavior (8,11) rather than through a direct effect on lung cancer risk (6–9).
Examining genetic risk in African-Americans provides complementary data to localize lung cancer susceptibility genes and explore smoking dependence-related genes because patterns of cigarette use and linkage disequilibrium (LD) in this population differ from those seen in other populations. Since admixture between Europeans and Africans occurred relatively recently, large chromosomal regions of European or African origin remain in admixture-induced LD. Admixture mapping represents an alternative strategy for gene discovery that benefits from the LD characteristics of admixed populations (12,13). This chromosomal structure supports the use of ancestry informative single-nucleotide polymorphisms (SNPs) as markers for disease genes that occur differentially between populations of different origin. This approach also requires fewer markers and smaller sample sizes than GWAS (14,15). This study is the first to use admixture mapping to further characterize genetic risk of lung cancer, taking advantage of the higher risks for lung cancer in African-Americans compared with Caucasians.
Institutional Review Board approval was obtained at all study sites and informed consent was obtained from all participants. Cases and controls were enrolled through three sites: Wayne State University (WSU), M. D. Anderson Cancer Center (MDA) and the National Cancer Institute (NCI). WSU identified population-based cases through the Metropolitan Detroit Cancer Surveillance System, a participant in the NCI’s Surveillance, Epidemiology and End Results program. Three WSU studies were used in this analysis, all conducted by the same study staff using identical procedures; studies differed only in the eligibility of cases as detailed in other publications. The Family History Study focused on never-smokers and cases diagnosed before age 50 years regardless of smoking habits (16,17), the Women’s Epidemiology of Lung Diseases study focused on non-small-cell lung cancer (NSCLC) in women (18,19) and the Exploring Health, Ancestry and Lung Epidemiology (EXHALE) study is ongoing and includes only African-Americans. With the exception of the EXHALE study, population-based controls were chosen using random digit dialing methods. The EXHALE study relies on volunteer controls identified as friends of the cases and through advertising. All study controls were frequency matched to cases by 5-year-age group, sex and self-reported race. In total, 444 African-American cases and 559 African-American controls from WSU studies were included.
The MDA cases and controls were selected from a large hospital-based case–control study. This population has been described in detail elsewhere (20). Histopathologically confirmed lung cancer cases were recruited from the University of Texas M. D. Anderson Cancer Center and the Michael E. DeBakey VA Medical Center in Houston, TX. Controls were recruited from the Kelsey–Seybold Foundation, a multispeciality physician group practice in the Houston metropolitan area, and were matched to the cases on race, age (±5 years) and sex. In total, 192 African-American cases and 188 African-American controls from MDA were included.
The NCI cases and controls were selected from an ongoing case–control study in Baltimore, MD. The study population and enrollment criteria have been described previously (21). Histopathologically confirmed lung cancer cases were recruited from seven hospitals in the Baltimore, Maryland metropolitan area. Cancer-free population controls were collected through a list derived from the Maryland Department of Transportation. Two hundred and one African-American cases and 228 African-American controls were included from the NCI study.
The WSU and NCI studies used commercially available kits to isolate DNA. The Gentra AutoPure Kit (Qiagen, Valencia, CA) was used to extract DNA from 932 whole blood samples and the Gentra Puregene Kit (Qiagen) was used for 71 mouthwash and saliva samples from WSU. DNA was isolated from whole blood for the NCI samples using the Flexi Gene DNA kit (Qiagen).
For the MDA samples, a standard protocol was used to isolate DNA from the lymphocyte cell pellet collected from the buffy coat by centrifugation of whole blood. Pellets were digested with proteinase K and RNase, and the DNA was extracted with phenol and chloroform/isoamyl alcohol and then precipitated with ethanol.
Concentrated DNA samples for 30 Center d’Etude du Polymorphisme Humain (CEPH) individuals were purchased from the Coriell Institute for Medical Research in Camden, NJ (http://ccr.coriell.org) and used for quality control.
Genotyping was conducted for all samples at the WSU Applied Genomics Technology Center. Isolated DNA samples were diluted to the optimal concentration of ~75 ng/μl and were run on the commercially available African-American Admixture Panel from Illumina (http://www.Illumina.com). This panel contains 1509 genome-wide SNPs informative for West African versus European ancestry. The Illumina BeadStudio platform was used to cluster the genotypes and provide the completion rates for individual samples and a measure of the predicted accuracy of each SNP (gentrain score).
A total of 1864 eligible samples were genotyped. Forty-one samples were excluded after failure to meet sample quality checks (call rates <95% or low DNA concentration). An additional 11 samples had high European ancestry estimates (>85% European ancestry) and were excluded due to potential sample misclassification. Thus, a total of 1812 samples were eligible for analysis.
In addition to excluding entire samples of poor quality, SNPs with poor genotyping accuracy were excluded from all analyses using the following criteria. First, SNPs were excluded from all analyses if they were in the lowest 5% of gentrain scores; the X chromosome SNPs had both poor gentrain scores and high heterozygosity rates in men, so were excluded. Second, SNPs that were out of Hardy–Weinberg equilibrium, as indicated by a P-value <1 × 10⁻⁵, were excluded from analysis. Finally, SNPs with >15% of the genotypes varying from the HapMap reported genotypes in 30 CEPH samples were excluded. In the 1257 SNPs with HapMap genotypes available for comparison, there was a concordance rate of 99.9% among the previously genotyped CEPH samples compared with genotyping conducted in this study. After all exclusions, 1344 SNPs were eligible for inclusion in the analyses.
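The Hardy–Weinberg exclusion step can be illustrated with a short sketch. The text does not state which test was used, so the 1-degree-of-freedom chi-square goodness-of-fit test below is an assumption, and the function names `hwe_chi2_p` and `passes_hwe` are hypothetical. Only the exclusion threshold (P < 1 × 10⁻⁵) comes from the text.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square goodness-of-fit test; the study may have
    used a different (e.g. exact) test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa: int, n_ab: int, n_bb: int, threshold: float = 1e-5) -> bool:
    """Keep a SNP unless its HWE P-value falls below the threshold."""
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= threshold
```

Under this sketch, a SNP with genotype counts in exact Hardy–Weinberg proportions is retained, and one with no heterozygotes despite two common alleles is dropped.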
Data from 1812 participants and 1344 SNPs were analyzed using ADMIXMAP software. This program fits a Bayesian probability model using a Markov chain Monte Carlo algorithm to compare observed versus expected ancestry across the genome conditional on a priori ancestral genotype frequencies. Regions of excess European or African ancestry suggest that risk variants more common in one of the ancestral populations might be present. Both case–control and case-only analyses were conducted adjusting for age at diagnosis for cases or age at interview for controls, sex, smoking pack-years, family history of lung cancer, history of emphysema and study site. The case-only test compares ancestry at each locus to overall ancestry. The case–control test compares ancestry at each locus between cases and controls while adjusting for individual admixture and covariates included in the model (http://homepages.ed.ac.uk/pmckeigu/admixmap/manual_desc.html). West African Yoruban (YRI) and CEPH Europeans from Utah (CEU) HapMap genotype results were used to estimate prior allele frequencies for the ancestral populations. Analyses were repeated stratified by sex, smoking status and histology type.
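The idea behind the case-only comparison can be sketched in a few lines. This is a deliberately simplified illustration and not ADMIXMAP's algorithm. ADMIXMAP fits a Bayesian model by Markov chain Monte Carlo and adjusts for covariates, whereas the sketch assumes that each case's locus ancestry is known exactly and that, under the null, each of the two alleles is African with probability equal to that case's genome-wide African proportion. The function name `case_only_z` and its inputs are hypothetical.

```python
import math
from typing import Sequence


def case_only_z(locus_african: Sequence[float],
                genome_african: Sequence[float]) -> float:
    """Simplified case-only statistic (illustrative; not ADMIXMAP).

    locus_african: per-case proportion of African alleles at the locus
                   (0.0, 0.5 or 1.0).
    genome_african: per-case genome-wide African ancestry proportion.
    Positive values indicate excess African ancestry at the locus,
    negative values excess European ancestry.
    """
    if len(locus_african) != len(genome_african) or not locus_african:
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed minus expected locus ancestry, summed over cases.
    excess = sum(l - g for l, g in zip(locus_african, genome_african))
    # Null variance of a two-allele proportion: theta * (1 - theta) / 2.
    variance = sum(g * (1.0 - g) / 2.0 for g in genome_african)
    if variance <= 0.0:
        raise ValueError("genome-wide ancestry gives zero null variance")
    return excess / math.sqrt(variance)
```

In this toy setting, a group of cases whose mean locus ancestry equals their genome-wide ancestry gives a statistic near zero, and a locus with fewer African alleles than expected gives a negative value.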
Results from the ADMIXMAP analyses are presented in terms of standard normal Z statistics, ancestry risk ratios (aRRs) from case-only analyses, ancestry odds ratios from case–control analyses and P-values. A negative Z-score indicates excess European ancestry at a locus and a positive Z-score indicates excess African ancestry at a locus among cases. Similarly, a risk ratio or an odds ratio >1 indicates increased risk of lung cancer associated with carrying two African alleles in comparison with no African alleles; values <1 indicate reduced risk of lung cancer associated with carrying two African alleles in comparison with no African alleles. Regions with an absolute value Z-score of 3.00 or greater at multiple SNPs (P-value ~0.0025) in the case-only analysis were further evaluated by conducting repeat analyses that dropped the SNP with the highest absolute Z-score to ensure that results were not driven by a single SNP. Findings are presented for any SNP with an absolute value Z-score ≥ 3.00 in the case-only analysis, which also demonstrated a similar increase in ancestry in the case–control analysis. All SNPs included in these analyses were a minimum of 200 kb apart with a median spacing of 1400 kb. In areas with high absolute Z-scores, ADMIXMAP results evaluating residual LD between SNPs were reviewed. As a result of this review, three SNPs that were tightly linked to adjacent SNPs (score test for residual LD, P < 0.001) on chromosome 1 were removed before final analysis. Additionally, ADMIXMAP was run while excluding every other SNP in the chromosome 1 peak area. This analysis continued to show a chromosome 1 peak that was reduced in size but still very strong. Two-sided tests of significance were applied to case-only and case–control analyses to capture both African and European alleles that may be associated with increased risk of lung cancer.
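Because the results are reported as standard normal Z statistics with two-sided tests, the relationship between a Z-score and its P-value can be checked directly. The sketch below uses only the Python standard library; the function name `two_sided_p` is hypothetical, and the Z-scores in the loop are those quoted in this paper.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P-value for a standard normal Z statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z-scores quoted in the paper for the chromosome 1 and 3 peaks.
for z in (-4.33, -4.82, 3.33):
    print(f"Z = {z:+.2f}  two-sided P = {two_sided_p(z):.2g}")
```

The sign of Z carries the direction of the ancestry excess (negative for European, positive for African) and does not affect the two-sided P-value.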
Table I describes the characteristics of cases and controls (overall and by study site). In the entire sample, females were overrepresented among both cases and controls. Collectively, cases were slightly older, more likely to be smokers and smoked more than controls. Cases and controls were well matched by sex for the WSU and NCI populations; however, MDA cases and controls varied significantly. Overall, 17.9% of cases and 3.7% of controls reported an emphysema diagnosis. Rates of emphysema did not differ significantly across the studies. Adenocarcinoma occurred more frequently in the Detroit population due to the overrepresentation of this histology type in one of the WSU studies and squamous cell carcinoma was more prevalent in the MDA and NCI populations. NSCLC comprised >95% of the known histology types across all three studies. Smoking status did not differ significantly across the studies (P = 0.11); however, average ancestry (P = 0.0003), age (P < 0.0001) and sex (P < 0.0001) were significantly different. The mean proportion of African ancestry in cases and controls combined was 80.7% in the WSU study participants, 78.4% in the MDA participants and 78.8% for the NCI participants. Overall, the population demonstrated six generations of admixture per morgan, and the mean African ancestry did not differ between cases and controls. The average percent information extracted by the marker panel was 74.1%.
Table II displays the local ancestry results for the peak SNP (i.e. the SNP with the highest absolute Z-score) from the case-only test in two regions of suggestive ancestry association. These regions showed an excess of either European or African ancestry in the case-only analysis that was supported by the corresponding case–control analysis. The strongest evidence for ancestry association was for a region on chromosome 1q42 that displayed excess European ancestry among cases (Figure 1a). The peak occurred at rs6587361 for all subjects, NSCLC only and ever smokers with NSCLC in the case-only analysis (Table II). The maximum Z-score was −4.33 for NSCLC (P = 1.5 × 10⁻⁵), with carriers of two African alleles at this locus having a reduced lung cancer risk (aRR = 0.49; 95% confidence interval 0.36–0.68) in comparison with carriers of two European alleles after adjustment for covariates. Excess European ancestry was strongest in women with NSCLC, with a maximum Z-score of −4.82 (P = 1.4 × 10⁻⁶) associated with rs6587361 (data not shown). Case–control analysis also indicated excess European ancestry; increased Z-scores for the same SNP demonstrated a large difference in ancestry between the cases and controls (Table II). Again, the association with European ancestry in the case–control analyses was strongest in women with NSCLC in an SNP immediately adjacent to and downstream from rs6587361 (rs2045520, Z-score = −3.93; P = 8.5 × 10⁻⁵) (data not shown). Analysis stratified on the most predominant histology type, adenocarcinoma, revealed attenuated Z-scores for both the case-only and case–control analyses. For adenocarcinomas, the peak European ancestry was located at rs493950 (Table II).
The second region of suggestive ancestry association was indicated by several SNPs included in a region of excess African ancestry on chromosome 3q25 (Figure 1b). The maximum Z-score for the case-only analysis was associated with rs181696 among ever smokers with NSCLC (Z-score = 3.33; P = 0.0009) (Table II). Carriers of two African alleles at this locus have an increased lung cancer risk (aRR = 1.72; 95% confidence interval 1.25–2.37) compared with carriers of two European alleles. Case–control analyses showed an increased risk of lung cancer associated with African ancestry for many SNPs in the region, including rs181696 (Z-score = 3.22; P-value = 0.0013). Analysis of adenocarcinoma cases again revealed a weaker association with ancestry for both the case-only and case–control analyses, with the highest Z-score at rs181696 (Table II).
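As a rough consistency check, the ratios and 95% confidence intervals reported for rs6587361 and rs181696 can be converted into approximate Wald-type Z statistics. This is only a sketch under stated assumptions: it treats each interval as symmetric on the log scale, whereas the Z-scores in the paper are ADMIXMAP test statistics, so close but not exact agreement is expected. The function name `wald_z_from_ci` is hypothetical.

```python
import math


def wald_z_from_ci(ratio: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate Z from a ratio and its 95% CI.

    Assumes the interval is ratio * exp(+/- z_crit * SE) on the log scale.
    """
    if not (0.0 < lower < upper) or ratio <= 0.0:
        raise ValueError("ratio and CI bounds must be positive, lower < upper")
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    return math.log(ratio) / se


# Values quoted in the text for the two peak SNPs (case-only analysis).
z_1q42 = wald_z_from_ci(0.49, 0.36, 0.68)   # reported Z-score: -4.33
z_3q25 = wald_z_from_ci(1.72, 1.25, 2.37)   # reported Z-score: 3.33
print(round(z_1q42, 2), round(z_3q25, 2))
```

Both approximations fall within about 0.1 of the reported Z-scores, which is the level of agreement one would expect from rounding of the published ratios and interval bounds.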
Supplementary Table I (available at Carcinogenesis Online) highlights additional SNPs that were found in the chromosomes 1 and 3 regions of interest in the NSCLC analysis, as well as two additional regions on 6p24 and 15q12–13.
In this admixture analysis of 1812 African-Americans, ADMIXMAP results revealed ancestry associations at 1q42 and 3q25. Both of these areas were identified by increased Z-scores in both the case-only and case–control analyses. This is the first reported admixture analysis of lung cancer.
For the excess European ancestry region on chromosome 1, the most significant Z-scores were found near a gene that has been strongly associated with lung cancer risk, Epoxide Hydrolase 1 (EPHX1), located at 1q42.1 (http://www.ncbi.nlm.nih.gov/gene). This gene metabolizes polycyclic aromatic hydrocarbons, which are found in cigarette smoke as well as other sources. A meta-analysis by Kiyohara showed a significantly decreased risk of lung cancer associated with the low-activity variant of the exon 3 (Tyr113His) polymorphism among whites (22). Only two studies of EPHX1 appear to have included African-Americans (23,24). Wu et al. (24) did not find significant associations between either Tyr113His or His139Arg and lung cancer among African-Americans. When London et al. (23) combined these two polymorphisms, a small decrease in risk was associated with the predicted slow activity genotype. Neither of these two more extensively studied SNPs was included in this admixture panel. It is possible that these or other SNPs in this gene or in this region are responsible for the association we see with lung cancer risk.
Excess African ancestry was displayed on chromosome 3q25. Many studies have reported increased copy number in this region in lung tumors (25). In a 2008 paper, Qian et al. (26) suggest many possible candidate genes in this region, including the oncogene PIK3CA, and report that multiple studies have shown amplification of the 3q region to be much more likely in squamous cell carcinoma cases than in adenocarcinoma cases. We did not have sufficient numbers of squamous cell carcinoma cases (n = 172) to repeat our analyses in this subpopulation; however, our adenocarcinoma results add support to this finding. When results were restricted to adenocarcinoma cases, Z-scores were significantly lower in this region.
Several recent GWAS for lung cancer have also identified potential susceptibility gene regions at 15q15, 15q25, 5p15 and 6p21 (6–10,27). Within our study, we found excess European ancestry at 6p24 and excess African ancestry at 15q11–13 in the case–control analyses (supplementary Table I is available at Carcinogenesis Online). These areas are a large distance removed from the regions identified in the GWAS (>17 Mb and >14 Mb for the 6p21 and 15q15 regions, respectively). Potential genes of interest in the 15q11–13 region include the genes encoding the inhibitory neurotransmitter GABA receptor gamma 3 (GABRG3, rs8042276), receptor alpha 5 (GABRA5) and receptor beta 3 (GABRB3). Although these 15q11–13 receptor genes have yet to be linked to lung cancer risk or nicotine addiction, GABRA gene clusters on chromosomes 4 and 5 have been associated with nicotine dependence (28,29). Another potential candidate gene is CHRNA7, located further downstream on chromosome 15q14. While CHRNA3 and CHRNA5 were both identified as candidate genes in the lung cancer GWAS, CHRNA7 was not. However, it has been shown to be upregulated in lung cell lines exposed to the tobacco-specific nitrosamine NNK (4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone) and estrogen (30). CHRNA7 is a candidate gene for schizophrenia, where it has been shown to be differentially expressed in smokers compared with nonsmokers (31).
It is possible that we did not see significant associations in the same areas as the GWAS mentioned above because those studies only included white and Asian populations. Furthermore, the regions identified using GWAS methods might be associated with lung cancer risk but might not be as strongly linked to ancestry.
We report both case-only and case–control analyses because these methods complement each other. The case-only analysis is more powerful than the case–control analysis because it compares the ancestry estimate at a given locus, with its inherent variability, to the case’s genome-wide ancestry, which is estimated over many markers and is therefore less variable (32). The case–control method compares the admixture at a given locus between the cases and controls (32). Thus, although our Z-scores were sometimes lower for the case–control results, those results validate the case-only findings in these regions by ruling out the possibility that the findings are due to selection in these regions among African-Americans.
There are several strengths to this study. The admixture mapping approach to gene discovery is a powerful method that takes advantage of the LD patterns among the recently admixed African-American population so that fewer individuals and fewer SNPs are required than for a GWAS. In addition, this study focused on African-Americans, a population that is known to have different smoking patterns and possibly different genetic risks than white and Asian populations. Although racial differences in LD and smoking behaviors warrant the inclusion of African-Americans in genetic studies, few studies have been done in this population.
A cutoff of P < 10⁻⁵ has been proposed for genome-wide statistical significance in admixture mapping studies in African-Americans (33). In the case-only NSCLC strata, SNP rs6587361 (P = 1.5 × 10⁻⁵) at the 1q42.13 locus approached this significance level, and among women with NSCLC, rs6587361 (P = 1.4 × 10⁻⁶) met this threshold. Furthermore, we performed a post-hoc power calculation based on the methods of Hoggart et al. (33) for the case-only analysis. Based on the mean observed marker information (37.98) in our sample across the marker map and a two-tailed genome-wide significance threshold of 10⁻⁵, we had 80% power to detect an aRR < 0.40 (or >2.52). This calculation indicates that the sample was reasonably powered to detect the ancestry association signal observed among NSCLC subjects at 1q42.13 (for SNP rs6587361, an aRR of 0.49 among NSCLC subjects overall and an aRR of 0.36 among female NSCLC subjects). Taken together, these results indicate that at 1q42.13, there is a suggestive ancestry association locus for NSCLC.
The study has a large sample size with >1800 cases and controls and used a well-developed set of ancestry informative markers. Women made up 57% of the study population and were disproportionately represented in the WSU group due to the nature of one of the WSU studies. It is possible that the findings in women are driven by study site differences rather than a sex-specific effect; however, study-stratified analyses did not support a study site-specific finding (data not shown). The only histology categories with large enough numbers for meaningful analyses were NSCLC and adenocarcinoma; therefore, our results might not be generalizable to all histology types. We also had too few never-smoking cases for meaningful analyses.
Lung cancer is one of the few cancers for which no substantial progress has been made in early detection and treatment. Given that it is the leading cause of death from cancer, it is exceedingly important to better understand the underlying biology to develop targeted treatments and to identify high-risk populations for targeted behavioral interventions and access to developing screening methods. It is likely that many genes contribute to an individual’s lung cancer risk. Admixture mapping provides an alternative approach to the identification of lung cancer susceptibility genes. New regions were identified in this admixture study. These results add to the findings from the GWAS in Caucasian populations and suggest novel candidate gene areas. These regions need to be confirmed in additional studies with finer mapping.
National Institutes of Health (NIH) (R01-CA87895, R01-CA60691, R01-CA55769, R01-CA127219, R01-CA141716, R01-CA133996, R01-CA121197, U19-CA148127 and contract numbers N01-PC35145 and P30CA22453); Cancer Prevention Research Institute of Texas (RP10043); Intramural Research Program of the National Cancer Institute/NIH, Center for Cancer Research.
Conflict of Interest Statement: None declared.