Cancer Res. Author manuscript; available in PMC 2011 January 10.
Published in final edited form as:
PMCID: PMC3018237

CYP1A1/2 Haplotypes and Lung Cancer and Assessment of Confounding by Population Stratification


Prior studies of lung cancer and CYP1A1/2 in African American and Latino populations have shown inconsistent results and have not yet investigated the haplotype block structure of CYP1A1/2 or addressed potential population stratification. To investigate haplotypes in the CYP1A1/2 region and lung cancer in African Americans and Latinos, we conducted a case-control study (1998-2003). African Americans (N = 535) and Latinos (N = 412) were frequency-matched on age, sex, and self-reported race/ethnicity. We used a custom genotyping panel containing 50 single nucleotide polymorphisms in the CYP1A1/2 region and 184 ancestry informative markers selected to have large allele frequency differences between Africans, Europeans, and Amerindians. Latinos exhibited significant haplotype main effects in two blocks, even after adjusting for admixture (odds ratio (OR) = 2.02, 95% confidence interval (CI): 1.28 – 3.19 and OR = 0.55, 95% CI: 0.36 – 0.83), but no main effects were found among African Americans. Adjustment for admixture revealed substantial confounding by population stratification among Latinos but not African Americans. Among both Latinos and African Americans, interactions between smoking level and haplotypes were not statistically significant. Evidence of population stratification among Latinos underscores the importance of adjusting for admixture in lung cancer association studies, particularly in Latino populations. These results suggest a variant occurring within the CYP1A2 region may be conferring an increased risk of lung cancer in Latinos.

Keywords: lung cancer, haplotype, CYP1A1/2, population stratification, admixed


Lung cancer is the third most frequently diagnosed malignancy in the United States and its mortality surpasses that of all other cancers. Racial/ethnic differences in lung cancer incidence and mortality persist, with African Americans and Latinos experiencing the highest and lowest incidence and mortality, respectively, among all racial/ethnic groups in the U.S. (1). Reasons for the disparate incidence rates remain unclear, perhaps because studies in Latinos are scarce and investigations in African Americans have yet to definitively identify genetic loci associated with their increased cancer susceptibility. By examining recently admixed populations with contrasting incidence and accounting for population substructure, genetic susceptibility loci for the disease may be identified.

CYP1A1 and CYP1A2 are located on chromosome 15q in opposite orientation and separated by 23.3 kb (2). CYP1A1 is a major phase I enzyme, present in lung tissue, which activates procarcinogens present in tobacco smoke (3-5). CYP1A2 is preferentially expressed in hepatic tissue, although it is also present in lung tissue (6, 7). CYP1A2 biotransforms the tobacco-specific nitrosamines NNK and NNN (5) and is activated among smokers (8, 9). Mechanistically, lung cancer susceptibility is mediated by allelic variants of CYP1A1/2 resulting in phenotypes characterized by intermediate or poor metabolism of procarcinogens. These carcinogens can become ultimate carcinogens or proximate intermediates able to form DNA adducts causing mutations in tumor suppressor genes and eventually initiating carcinogenesis (4, 5).

Considerable racial/ethnic differences in allele and haplotype frequencies (10, 11) occur in the CYP1A1/2 region, commensurate with racial/ethnic differences in CYP1A1 induction (12) and greater than 60-fold inter-individual differences noted in CYP1A2 activity (4). Polymorphisms in CYP1A1 have been extensively investigated in lung cancer etiology, especially the CYP1A1 T6235C (rs4646903, also referred to as M1 or Msp1) and A4889G variants (rs1048943, also known as M2 or Ile462Val). Investigations of these two loci suggest ethnic variation in genetic susceptibility of lung cancer, although ambiguity exists regarding whether these loci are involved in lung carcinogenesis in African Americans or Latinos (13-17). Less attention has been focused on CYP1A2 in lung cancer susceptibility despite its role in metabolizing important tobacco carcinogens and other compounds. Overall, the lack of consistent associations in studies of CYP1A1/2 conducted in different racial/ethnic populations may be due to differences in allele frequencies or linkage disequilibrium patterns.

Debate has ensued as to whether population stratification is of concern and whether studies in recently admixed populations provide credible results (18, 19). Because both lung cancer incidence and allele frequencies vary across populations, population stratification should be considered. We investigated CYP1A1/2 haplotypes and lung cancer in African Americans and Latinos, accounting for potential confounding by population stratification.

Materials and Methods

Study population

Newly diagnosed lung cancer patients residing in the San Francisco Bay Area were identified using rapid case ascertainment methods conducted by the Northern California Cancer Center and Summit Medical Center from September 1998 through March 2003. Incident patients were eligible for participation if they 1) self-identified as African American or Latino, 2) were 21 years or older, 3) resided within the counties of Alameda, Contra Costa, Santa Clara, San Francisco, or San Mateo, and 4) had a diagnosis of primary lung cancer. Cases meeting eligibility criteria were invited to participate in an in-person interview and to donate a biologic (blood or buccal smear) sample. A total of 368 cases (255 African Americans and 113 Latinos) are included in this analysis.

Potential controls were recruited from the following: 1) random-digit dialing, 2) Health Care Financing Administration (HCFA) records for persons aged 65 or older, and 3) community-based sources, such as churches, health fairs, and senior centers. For each case, approximately twice as many eligible controls were recruited having the same age (+/- 10 years), sex, and self-identified race/ethnicity. Eligible controls were invited to participate in an in-person interview and to donate a biologic sample. A total of 579 controls (280 African Americans and 299 Latinos) are included in the analysis.

The San Francisco Bay Area Lung Cancer Study was approved by the University of California Committee for the Protection of Human Subjects. Written, informed consent was obtained from all participating subjects.

Interview Data Collection and Specimen Processing

Epidemiologic data were collected during in-person interviews using a structured questionnaire to ascertain exposure histories from before diagnosis for cases and prior to interview date for controls. At the time of interview, blood and buccal specimens were collected. Specimens were transported to a University of California (UC), San Francisco laboratory within 48 hours of collection and processed for storage until genotyping. Once samples from all participants had been collected, biospecimens were thawed and DNA was isolated by automated phenol-chloroform extraction using the Autogen 3000 (Autogen, Inc., Holliston, MA). DNA concentration was measured by fluorescence (PicoGreen, Invitrogen Corporation, Carlsbad, CA) and normalized to 30-100 ng/µl, for a total of 150-500 ng of DNA used for genotyping. Whole genome amplification was performed on samples yielding insufficient DNA (blood, N = 2 and buccal, N = 4) in accordance with the Omniplex protocol (Sigma-Aldrich Corporation, St. Louis, MO) and the amplified product was cleaned with Millipore's Montage PCR96 filter plate (Millipore Corporation, Billerica, MA) (20).


Genetic markers included 50 SNPs in the CYP1A1/2 region on chromosome 15. CYP1A1/2 SNPs were identified from the following sources: 1) published literature, 2) International HapMap Project (21) and 3) SNP500Cancer database (22). Literature SNPs were selected if they had a minor allele frequency (MAF) greater than 5% in any HapMap population (Build 34) and were previously identified as either associated with cancer or a tagSNP of a reported haplotype in the CYP1A1/2 region, including several SNPs previously characterized (11). The HapMap database was used to identify tagSNPs (r² ≥ 0.8) in the CYP1A1/2 region having a MAF greater than 5% in the Yoruba and CEPH populations. SNPs 10,000 base pairs upstream and downstream of the CYP1A1/2 gene boundaries were included to ensure gene coverage when generating tagSNPs from HapMap data. The SNP500Cancer database was queried for SNPs in CYP1A1 and CYP1A2. SNPs were selected for genotyping if the MAF was greater than 5% in any SNP500Cancer or Human Diversity Panel population. Based on HapMap CEPH data (Build 36 and r² ≥ 0.8), the total number of SNPs either genotyped or tagged by SNPs in our panel was 85 across the genotyped CYP1A1/2 region for a marker density of one SNP per 1.4 kb. Based on HapMap Yoruban data, the total number of SNPs either genotyped or tagged by SNPs in our panel was 65 for a marker density of one SNP per 1.9 kb.

DNA collected from African American and Latino participants was genotyped according to the manufacturer's protocol at the UC Davis Genome Center using the Illumina Bead Station 500G Golden Gate™ genotyping platform. A total of 996 African American or Latino participants were genotyped. Participants were selected for genotyping if they were a lung cancer case (Latino or African American) or a Latino control. A random sample of African American controls was selected to complete the study. Participants were removed from statistical analyses if they self-reported more than one ethnicity (N = 44) or DNA sample quality was poor (N = 5) resulting in a final sample size of 947 self-described African American and Latino participants.

In addition to CYP1A1/2 SNPs, a panel of 184 autosomal biallelic ancestry informative markers (AIMs) distinguishing the continental ancestor populations comprising Latinos and African Americans was genotyped to address potential population stratification. The AIMs panel (previously designed by author M.F.S.) was developed by selecting SNPs across the genome with high informativeness for ancestry between Amerindian, European and sub-Saharan African continental populations (23, 24). Criteria for the AIMs included validation in multiple subgroups for each continent and a lack of linkage disequilibrium (r² < 0.6) between AIMs in continental populations. Genotyping of the AIMs was conducted not only using DNA collected from the Latino and African American participants but also using DNA from European Americans (San Francisco Bay Area, US, N = 47), West Africans (Bini from Edo State, and Kanuri from Nigeria, N = 46), and Amerindians (Mayans from Bola De Oro and Cienega Grande, Guatemala, N = 46) to improve genetic ancestry estimation of the Latinos and African Americans. Mean fixation indices (FST), estimated using FSTAT following Weir and Cockerham (25), were 0.52 for West Africans versus Europeans, 0.52 for West Africans versus Amerindians, and 0.48 for Europeans versus Amerindians.

Genotypes obtained from genomic and whole genome amplified (WGA) DNA were called using separate clustering analyses. Genotype call rates (GenCall ≥ 0.25) averaged 99.41% and 99.16% (standard deviation ± 0.010) for genomic and WGA samples, respectively. Genotype reproducibility was verified with duplicates of unamplified DNA and WGA/genomic DNA pairs. Unamplified duplicates (N = 31) had a mean reproducibility of 99.99%. WGA/genomic DNA pairs amplified from blood (N = 18 pairs) and buccal specimens (N = 28 pairs) exhibited a mean genotype reproducibility of 99.39% and 98.49%, respectively.

Statistical Analysis

Analyses were conducted separately for African Americans and Latinos. Exact tests for Hardy-Weinberg Equilibrium and measures of linkage disequilibrium were conducted using SAS v9.1/Genetics software (SAS, Cary, NC). Using African American and Latino control genotyping data, the haplotype block structure was estimated using the confidence interval algorithm (26) in Haploview 3.2 (27). Haplotypes and their frequencies were estimated from unphased individual genotype data using the HAPPY macro (28). Genetic ancestry of African American and Latino participants was determined using 184 AIMs and a maximum likelihood approach based on estimation methods described by Chakraborty et al. (29, 30) and Hanis et al. (31). To improve ancestry assignment, ancestral population AIM frequencies were input along with the genotypes of the admixed participants.
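As an illustration of the general idea behind maximum likelihood ancestry estimation, the following minimal sketch estimates one individual's admixture proportions from AIM genotypes and ancestral allele frequencies. It is not the estimation procedure of references 29-31: the function names, the binomial genotype model, the three-population grid search, and the step size are all assumptions made for illustration.

```python
import math

def log_likelihood(genotypes, anc_freqs, m):
    """Log-likelihood of admixture proportions m for one individual.

    genotypes: per-marker counts (0, 1 or 2) of the reference allele
    anc_freqs: per-marker tuples of reference-allele frequencies,
               one frequency per ancestral population (hypothetical inputs)
    m:         admixture proportions, one per population, summing to 1
    """
    ll = 0.0
    for g, freqs in zip(genotypes, anc_freqs):
        # Expected allele frequency for this individual at this marker
        q = sum(mk * pk for mk, pk in zip(m, freqs))
        q = min(max(q, 1e-6), 1 - 1e-6)  # guard against log(0)
        # Binomial(2, q) genotype probability (assumes independent alleles)
        ll += math.log(math.comb(2, g)) + g * math.log(q) + (2 - g) * math.log(1 - q)
    return ll

def estimate_admixture(genotypes, anc_freqs, step=0.05):
    """Grid search over the three-population simplex for the ML proportions."""
    n = round(1 / step)
    best, best_ll = None, -math.inf
    for i in range(n + 1):
        for j in range(n + 1 - i):
            m = (i / n, j / n, (n - i - j) / n)
            ll = log_likelihood(genotypes, anc_freqs, m)
            if ll > best_ll:
                best, best_ll = m, ll
    return best

# Hypothetical panel: 20 markers informative for each of three populations
freqs = [(0.9, 0.1, 0.1)] * 20 + [(0.1, 0.9, 0.1)] * 20 + [(0.1, 0.1, 0.9)] * 20
mixed = [1] * 20 + [1] * 20 + [0] * 20
estimate = estimate_admixture(mixed, freqs)
```

Because the three proportions sum to one, only two of them are free parameters, which is why the grid is indexed by two counters.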

Logistic regression models were used to estimate odds ratios assessing the association between CYP1A1/2 haplotypes and lung cancer, adjusting for the variables age, sex and smoking pack-years. Subject-specific haplotype probabilities were incorporated as covariates into regression models which estimated odds ratios associated with having a specific haplotype under an additive model. Likelihood ratio tests were used to assess the influence of the explanatory variables. Individual admixture estimates obtained from the AIMs were added as continuous covariates to logistic regression models to assess possible confounding by population heterogeneity. Only two of the three admixture proportions (European and Amerindian) were included in the model since the three admixture proportions sum to one making them collinear. Models with and without genetic admixture were compared to identify population stratification. A confounding risk ratio (CRR) comparing odds ratios adjusted and unadjusted for admixture was chosen to provide a quantitative measure of the amount of confounding. Admixture proportions were included as a covariate in follow-up logistic regression analyses if adjusted and unadjusted haplotype odds ratios differed by 10% or greater. For haplotypes demonstrating significant associations with lung cancer, individual SNP analyses were conducted for genotyped SNPs present in each haplotype. Logistic regression models for single SNP analyses included variables coding for the heterozygous and homozygous variants such that no inheritance mode was assumed. Trend tests were conducted using the log-additive model by assigning 0, 1 or 2 copies of the minor allele to genotypes. A type I error rate of α = 0.05/7 haplotype blocks = 0.007 was set for inference of main effects. This approach decreases the type I error without being too conservative for this candidate gene study having a priori hypotheses. 
Exploratory analyses examined interactions between haplotypes and smoking using logistic regression and a type I error rate of α = 0.1 to infer the presence or absence of interaction. Smoking level was represented in the model as a three-level design variable, allowing for the joint influence between smoking and haplotypes to be unconstrained.
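The confounding and multiple-testing criteria described above reduce to simple arithmetic, sketched below. The direction of the CRR (crude divided by adjusted) and the use of the crude odds ratio as the denominator for the 10% rule are assumptions, since the text does not specify either; the function names are hypothetical. The example odds ratios are those reported for haplotype 2C among Latinos.

```python
def confounding_risk_ratio(or_crude, or_adjusted):
    """Ratio of the unadjusted to the admixture-adjusted odds ratio (assumed direction)."""
    return or_crude / or_adjusted

def differs_by_threshold(or_crude, or_adjusted, threshold=0.10):
    """True when the adjusted OR differs from the crude OR by >= threshold.

    The change is expressed relative to the crude OR (an assumption).
    """
    return abs(or_crude - or_adjusted) / or_crude >= threshold

# Bonferroni-style threshold for main effects: 0.05 divided by 7 haplotype blocks
N_BLOCKS = 7
ALPHA_MAIN = 0.05 / N_BLOCKS

def bonferroni_adjust(p_value, n_tests=N_BLOCKS):
    """Multiply a p-value by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

# Haplotype 2C among Latinos: OR = 2.17 unadjusted, 2.02 after admixture adjustment
crr_2c = confounding_risk_ratio(2.17, 2.02)
flag_2c = differs_by_threshold(2.17, 2.02)
```

Under these assumptions, the haplotype 2C estimate changes by less than 10% and would not by itself trigger adjustment; the decision to adjust rests on the other haplotype estimates.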

Results


Descriptive statistics

Participation rates (completed the questionnaire and provided a biologic sample) for African American and Latino cases were 69.6% and 54.0%, respectively. The median time between diagnosis and interview date for cases was 99 days for African Americans and 123 days for Latinos. Among African American and Latino cases 19.4% and 16.2%, respectively, were not recruited due to death. Participation rates for African American and Latino controls were 58.1% and 55.6%, respectively. Among African American controls, 24.9% were from population-based sources (random-digit dialing and HCFA lists) and 75.1% were identified by community-based outreach methods. Among Latino controls, 31.8% were from population-based sources and 68.2% were from community-based outreach methods.

African American (N=535) and Latino (N=412) lung cancer cases and controls did not differ according to the frequency strata-matched variables age and sex (Table 1). Smoking patterns varied between cases and controls for African Americans and Latinos. Among Latinos, significantly more cases were born in the US and controls had lower household income compared to cases (Table 1). All genotyped CYP1A1/2 SNPs were in Hardy-Weinberg Equilibrium (HWE) among Latino controls, whereas African American controls had 8 SNPs that were not in HWE (Supplemental Table 1), which decreased to one SNP after correction for multiple testing using false discovery rate (32).
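The false discovery rate correction mentioned above can be sketched as follows, assuming the Benjamini-Hochberg step-up procedure; the text cites reference 32 without naming the procedure, so this choice, the function name, and the example p-values are all illustrative assumptions rather than the study's data.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests are rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * q
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical HWE p-values for ten SNPs (not taken from the study)
hwe_p = [0.0005, 0.02, 0.03, 0.04, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]
still_out_of_hwe = benjamini_hochberg(hwe_p)
```

In this made-up example, four SNPs fall below a nominal 0.05 threshold, but only one remains flagged after the correction.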

Table 1
Characteristics of San Francisco Bay Area Lung Cancer Study participants by race/ethnicity, San Francisco Bay Area, California, 1998-2003

Linkage disequilibrium and haplotype estimation

Block structure for the genotyped region of chromosome 15 in the 412 unrelated Latino study participants and the 535 unrelated African Americans is shown in Supplemental Figures 1 and 2, respectively. Three blocks defined the genotyped region for Latinos and four smaller blocks defined the region for African Americans (Supplemental Tables 2 and 3). Haploview identified a total of ten and fifteen haplotype-tagging SNPs among Latinos and African Americans, respectively, for discrimination of haplotypes with estimated frequencies greater than 5% in the CYP1A1/2 region (Supplemental Figures 3 and 4).

M1 and M2 variants

The M1 and M2 variants were not genotyped in the panel of 50 SNPs since M1 did not meet design criteria for the Illumina platform and a SNP near M2 was selected for genotyping. Polymerase chain reaction (PCR) genotyping data were available for both M1 and M2 loci. In Latino controls, M1 was in high linkage disequilibrium (r² ≥ 0.8) with several SNPs in block 1 (rs17861120, rs12441817, and rs886605), while the M2 variant was in high linkage disequilibrium with two SNPs in block 1 (rs17861120 and rs12441817) and two SNPs (rs16972208 and rs17861140) in block 2. In African American controls, M1 was in high linkage disequilibrium with two SNPs in block 1 (rs17861109 and rs4886605) and M2 was in high linkage disequilibrium with two SNPs in block 2 (rs16972208 and rs17861140), indicating these SNPs are tightly linked. M1 and M2 were also in linkage disequilibrium with each other. Since M1 and M2 are in linkage disequilibrium with SNPs in blocks 1 and 2 for Latinos and African Americans, observed associations with haplotypes in blocks 1 and 2 will indirectly assess associations with the frequently investigated CYP1A1 variants M1 and M2.
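For reference, the r² measure of linkage disequilibrium between two biallelic loci can be computed from haplotype and allele frequencies as in the sketch below. This is the standard textbook definition, not the SAS/Genetics implementation used in the study; the function name and the example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D measures the departure of the haplotype frequency from independence
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies: the two alleles always travel together
r2_perfect = ld_r_squared(0.3, 0.3, 0.3)
# Hypothetical frequencies: the two loci are independent
r2_none = ld_r_squared(0.12, 0.3, 0.4)
# The high-LD criterion used in the text
in_high_ld = r2_perfect >= 0.8
```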

Haplotype associations among Latinos

Associations between haplotypes and lung cancer risk for Latinos are presented in Table 2. Only haplotype C in block 2 (haplotype 2C) was significantly associated with an increased risk of lung cancer (odds ratio [OR] = 2.17, 95% confidence interval [CI]: 1.39 – 3.41), adjusting for the frequency-matched variables and number of smoking pack-years. None of the other haplotypes were significantly associated with lung cancer (p-value > 0.007). After adjusting for the influence of admixture, haplotype 2C remained significantly associated (p-value < 0.007) with lung cancer (OR = 2.02, 95% CI: 1.28 – 3.19) and haplotype 3B became significantly associated (OR = 0.55, 95% CI: 0.36 – 0.83). The global test for haplotype association for block 2 was significant both before and after adjusting for admixture (p-value = 0.002) (Table 2). Comparison of the crude and adjusted ORs using the CRR showed at least a 10% reduction in five of the point estimates (50% of the haplotype associations), with all estimates decreasing after adjustment for admixture (Table 2). Individual admixture was included as a covariate in logistic regression analyses to control for this confounding by population stratification.

Table 2
Assessment of confounding by population stratification for the association between CYP1A1/2 haplotypes and lung cancer among Latinos participating in the San Francisco Bay Area Lung Cancer Study, 1998-2003

Stratification by European ancestry revealed no significant associations among Latinos having less than 54% European ancestry (Table 3). Among participants with European ancestry greater than or equal to 54%, haplotype 2C was significantly associated with an increased risk of lung cancer (OR = 3.56, 95% CI: 1.82 – 6.95); however, the Woolf test for homogeneity did not show a significant difference between the two stratum-specific odds ratios for haplotype 2C (p = 0.014) at our specified type I error rate (α = 0.007).
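A two-stratum homogeneity test of this kind can be sketched as a weighted comparison of log odds ratios. The version below recovers standard errors from reported 95% confidence intervals, which is an adaptation for illustration and not necessarily how the study computed the test; the function names are hypothetical, and the second stratum's odds ratio in the example is invented because the text does not report it.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error of log(OR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def woolf_two_strata(or1, se1, or2, se2):
    """Homogeneity chi-square (1 df) and p-value for two stratum-specific ORs."""
    w1, w2 = 1 / se1 ** 2, 1 / se2 ** 2      # inverse-variance weights
    l1, l2 = math.log(or1), math.log(or2)
    pooled = (w1 * l1 + w2 * l2) / (w1 + w2)  # weighted mean log(OR)
    x2 = w1 * (l1 - pooled) ** 2 + w2 * (l2 - pooled) ** 2
    # Upper tail of a chi-square with 1 degree of freedom
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p

# Stratum with >= 54% European ancestry, as reported: OR = 3.56 (1.82 - 6.95)
se_high = se_from_ci(1.82, 6.95)
# Hypothetical second stratum: null OR with the same precision (assumption)
x2, p = woolf_two_strata(3.56, se_high, 1.0, se_high)
heterogeneous = p < 0.007  # compare against the study's type I error rate
```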

Table 3
Association between CYP1A1/2 haplotypes and lung cancer, stratified by European ancestry, among Latinos participating in the San Francisco Bay Area Lung Cancer Study, 1998-2003

Among light-smoking Latinos, haplotype 2C was again strongly associated (p < 0.007) with an increased risk of lung cancer (OR = 7.80, 95% CI: 2.74 – 22.15) (Table 4). A significantly decreased risk of lung cancer was observed among non-smokers having haplotype 3B (OR = 0.34, 95% CI: 0.17 – 0.71) (Table 4). An interaction was observed between smoking exposure and a haplotype in block 2 (Table 4); however, after adjusting for multiple comparisons (p-value = 0.04 × 7 = 0.28), this result was no longer statistically significant (p-value > 0.1).

Table 4
Association between CYP1A1/2 haplotypes and lung cancer, stratified by smoking status, among Latinos participating in the San Francisco Bay Area Lung Cancer Study, 1998-2003

Haplotype associations among African Americans

Associations between haplotypes and lung cancer in African Americans with and without adjustment for admixture are presented in Table 5. None of the haplotypes were associated with lung cancer and only one of the admixture adjusted estimates differed from the crude estimates by more than 10%. Further analyses did not include individual admixture as a covariate since strong evidence of confounding by population stratification was not apparent.

Table 5
Assessment of confounding by population stratification for the association between CYP1A1/2 haplotypes and lung cancer among African Americans participating in the San Francisco Bay Area Lung Cancer Study, 1998-2003

Associations remained null when stratifying by the median European ancestry (17%) among the African American controls (data not shown). Evaluation of interaction between smoking exposure and haplotypes in African Americans revealed no significant interactions between CYP1A1/2 haplotypes and smoking and no consistent relationships were observed among any of the haplotype blocks (data not shown).

Single SNP associations

Among Latinos, two SNPs (rs2472299 and rs762551) in haplotype 2C were significantly (p < 0.007) associated with an increased risk of lung cancer (Table 6), analogous to the positive association identified with haplotype 2C. The remaining associations with lung cancer for SNPs present in this haplotype were not significant. Only one SNP (rs11072508) in haplotype 3B was associated with a decreased risk of lung cancer (p = 0.007) (Supplemental Table 4), corresponding to the decreased haplotype association observed with haplotype 3B. No single SNP associations were conducted for African Americans since haplotype associations were null.

Table 6
Single SNP analysis for SNPs present in haplotype 2C, among Latinos participating in the San Francisco Bay Area Lung Cancer Study, 1998-2003

Discussion


To examine whether the frequently posited CYP1A1/2 region is associated with lung cancer incidence, haplotypes in the chromosome 15q region were analyzed in a genetic association study of lung cancer. Haplotype block structure inferred from extensive genotyping data in the CYP1A1/2 region differed between African Americans and Latinos, with African Americans demonstrating greater haplotype diversity and smaller blocks as expected based on known population origins.

Evidence suggests a positive association between haplotype 2C and lung cancer in Latinos. Haplotype 2C was consistently associated with lung cancer in these analyses, suggesting a genetic variant in haplotype 2C confers an increased risk of lung cancer in Latinos. The increased association between haplotype 2C and lung cancer in Latinos was robust, not only remaining after adjusting for admixture but becoming stronger when observed among participants with European ancestry greater than or equal to 54%. Although the odds ratios for haplotype 2C stratified by European ancestry did not significantly differ, the observed association suggests Latinos having high European ancestry may carry a susceptibility variant located in haplotype 2C. Exploratory analyses suggest an increased risk for lung cancer for this haplotype among light smokers, consistent with reports in the literature suggesting a greater susceptibility to lung cancer for certain gene variants at lower carcinogen levels (33).

Within haplotype 2C are two SNPs (rs762551 and rs2472299) with variant alleles present only in this common haplotype of block 2. Single SNP analyses for SNPs contained within haplotype 2C confirmed these were the only two SNPs associated with an increased risk of lung cancer. SNP rs762551, located in intron 1, was found to have different frequencies between high and low metabolic phenotypes for CYP1A2, although the difference was not statistically significant, perhaps due to small sample sizes (34). Jiang notes intron 1 likely contains the regulatory region of CYP1A2 since it is highly conserved between human, rat, and mouse genes (34). A recent study by Aklillu and colleagues (35) conducted in Ethiopians found this same variant (rs762551), alone and in a haplotype, did not influence CYP1A2 activity, supporting the lack of an association observed in the African Americans in this study. Pavanello et al. identified not only increased CYP1A2 metabolic activity but also increased urine mutagenicity among Italian heavy smokers having the ancestral A allele of this variant (36). Sachse et al. describe increased CYP1A2 activity for the A/A genotype of this variant in Caucasians (37). Another report demonstrated Swedish subjects with the A/A genotype had increased metabolic activity, which was not observed in Koreans (8). The variant C allele and C/C genotype were associated with an increased risk of lung cancer in our Latino study population. The reason our study and prior studies implicate opposite alleles is unknown but may be due to differences in linkage disequilibrium with other genetic variants (38). The significant association with rs2472299, which is located upstream of both CYP1A1 and CYP1A2, supports the involvement of this regulatory region with lung cancer (2).

The combined results of this study and previously published results suggest haplotype 2C may contain a variant, possibly rs2472299 or rs762551, in the CYP1A2 locus with a functional role in the genetic susceptibility of lung cancer for certain racial/ethnic groups. While it is possible the observed association of an increased lung cancer risk with haplotype 2C is a result of linkage disequilibrium with M1 and M2 in CYP1A1 rather than other variants captured by this haplotype, it is difficult to separate the individual effect of these linked polymorphisms from the variants present on haplotype 2C.

To our knowledge, only two CYP1A1 studies of lung cancer have been conducted in Latinos (16, 17), one of which includes some subjects in this analysis (16). While this study did not directly assess the M1 and M2 variants, linkage disequilibrium between these variants and SNPs in block 1 allows associations between M1 and M2 and lung cancer to be elucidated. Results of this analysis do not corroborate prior findings with these variants perhaps due to small sample sizes or different linkage disequilibrium patterns in the Latino populations. Reasons for these inconsistent results are unknown.

The significance of the negative associations observed with haplotype B in block 3 and single SNP rs11072508, located within haplotype 3B, in Latinos is unclear. SNP rs11072508 maps to the CYP1A2 gene and is the only SNP in haplotype 3B that displays a variant allele compared to the other haplotypes. This variant allele may be associated with low CYP1A2 enzyme activity reducing the risk of lung cancer or could be a marker locus in linkage disequilibrium with another variant associated with a reduced risk of lung cancer. It is also possible this association is a result of random variation. However, this haplotype was associated with a reduced risk of lung cancer in several analyses in this study, becoming significant after accounting for confounding by admixture and among non-smokers suggesting it may not be due to type I error.

CYP1A1 and CYP1A2 haplotypes do not appear to be associated with lung cancer in African Americans, although additional studies are necessary for confirmation. These findings are compatible with other published reports (13-15, 17) and with previous studies in this population (16). Among African Americans, little evidence for confounding by population stratification was apparent. Our results suggest the association between CYP1A1/2 and lung cancer differs by racial/ethnic group. The cancer promoting effects of tobacco smoke may be mediated by several metabolic pathways and it is possible other genes or enzymes may play an important role in lung cancer in African Americans.

Population stratification was present among Latino participants. After adjusting for admixture proportions, associations became less strong revealing positive confounding by population substructure. Importantly, the significance of associations remained after adjusting for admixture. Our results suggest population stratification may confound genetic association studies of lung cancer in Latinos.

A potential limitation of this study is the focus on only two CYP genes since the cancer promoting effects of tobacco smoke may be mediated by several metabolic pathways. Broad substrate specificity exists for CYP1A1/2 and other enzymes may metabolize cigarette smoke carcinogens. Consideration of CYP1A1/2 in concert with other metabolizing loci will allow evaluation of possible epistasis in lung cancer susceptibility. Another limitation is that the M1 and M2 variants were not included on the Illumina CYP panel. However, these SNPs were genotyped using PCR technology and subsequently found to be in linkage disequilibrium with SNPs in the CYP1A1/2 haplotypes, allowing inferences to be made about M1 and M2 even though they were not directly assessed in the haplotypes. An important limitation of this study is its modest sample size, which likely limits the statistical power for detection of weak associations with lung cancer. Moreover, the gene-environment interactions and sub-group analyses should be considered exploratory due to the limited statistical power. Replication of the results is warranted in studies with larger sample sizes of African Americans and Latinos to confirm the role of these two candidate genes.

A strength of this study is its haplotype-based approach to identifying a genetic variant associated with lung cancer. To our knowledge, no other lung cancer studies in African Americans or Latinos have done extensive genotyping allowing examination of haplotype associations in this CYP1A1/2 region. Haplotypes capture most of the genetic variation across a large chromosomal region, allowing for reduced genotyping efforts and an efficient statistical approach that provides more information than single marker analyses. Although there are genetic variants in the CYP1A1/2 region which were not genotyped, it is likely that many of the untested markers are in linkage disequilibrium with the inferred haplotypes. The presence of linkage disequilibrium allows interrogation of disease variants as long as a genotyped marker variant is in linkage disequilibrium with the disease susceptibility variant. Inclusion of the ancestry informative genetic marker panel provided requisite information regarding whether population heterogeneity is causing spurious associations and confounding this genetic association study of lung cancer. This analysis benefited from newly available statistical methods and genetic markers and was able to address potential confounding by population stratification. Adding ancestry markers adds confidence to these results based on relatively small sample sizes.

In summary, our results show consistent evidence that variants in the CYP1A2 region may increase lung cancer risk among Latinos. Future studies should confirm this result in Latino populations and consider examining whether this association is also present in European populations. It is unknown whether variants comprising this haplotype, or variants linked to the polymorphisms in this haplotype, alter CYP1A2 enzyme activity. Demonstration of a phenotype for variants in this haplotype, for example through examination of ethnic variation in blood levels of nicotine and cotinine, together with fine mapping of this region in an ancestral population such as Europeans, would lend support for a lung cancer susceptibility locus. Moreover, admixture was found to have an important impact on the relationship between CYP1A1/2 and lung cancer in Latinos; future studies of CYP1A1/2 in lung cancer should therefore consider potential bias from population stratification before making inferences about results.
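
How population stratification can bias an association estimate is easiest to see in a toy example. The sketch below is not the admixture adjustment used in this study; it swaps in a simpler technique, Mantel-Haenszel stratification over two discrete ancestry strata, and all counts are hypothetical. Within each stratum the haplotype is unrelated to disease, but because both haplotype frequency and disease frequency differ between strata, the pooled (crude) odds ratio is inflated.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts (NOT from the study), as (a, b, c, d) per stratum.
# Stratum 1: haplotype common (50% carriers), disease common.
# Stratum 2: haplotype rare (10% carriers), disease rare.
strata = [
    (40, 40, 10, 10),
    (2, 18, 8, 72),
]

# Crude analysis: collapse the strata into a single 2x2 table.
pooled = [sum(table[i] for table in strata) for i in range(4)]
crude_or = odds_ratio(*pooled)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR (ignoring ancestry):      {crude_or:.2f}")
print(f"Mantel-Haenszel OR (by stratum):   {adjusted_or:.2f}")
```

In admixed populations ancestry is continuous rather than categorical, so in practice an adjustment based on individual ancestry estimates is more natural than discrete strata; the toy example only illustrates why an unadjusted estimate can be misleading.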

Supplementary Material

Supp 1

Supp 2


The authors thank the Northern California Cancer Center and Summit Medical Center for their assistance with case ascertainment. We thank Dr. Rick Kittles and Dr. Gabriel Silva for kindly providing ancestral DNA from West Africans and Mayans. The authors also thank Dr. John Belmont for his support with collection of the Mayan population samples.

Grant support:

This work was supported by grants from the National Institute of Environmental Health Sciences (R01ES06717 to J.K.W., 2R01ES09137-06 to P.A.B.), the National Institute of Arthritis and Musculoskeletal and Skin Diseases (R01AR050267 to M.F.S.), and the National Institute of Diabetes and Digestive and Kidney Diseases (R01K071185 to M.F.S.).


Disclosure of Potential Conflicts of Interest: None declared



1. Ries LAG, Melbert D, Krapcho M, et al. SEER Cancer Statistics Review, 1975-2004 [database on the Internet]. National Cancer Institute; Bethesda, Maryland: [2007 April 17].
2. Corchero J, Pimprale S, Kimura S, Gonzalez FJ. Organization of the CYP1A cluster on human chromosome 15: implications for gene regulation. Pharmacogenetics. 2001;11:1–6. [PubMed]
3. Nebert DW, Dalton TP, Okey AB, Gonzalez FJ. Role of aryl hydrocarbon receptor-mediated induction of the CYP1 enzymes in environmental toxicity and cancer. J Biol Chem. 2004;279:23847–50. [PubMed]
4. Nebert DW, Dalton TP. The role of cytochrome P450 enzymes in endogenous signalling pathways and environmental carcinogenesis. Nat Rev Cancer. 2006;6:947–60. [PubMed]
5. Hecht SS. Cigarette smoking: cancer risks, carcinogens, and mechanisms. Langenbecks Arch Surg. 2006;391:603–13. [PubMed]
6. Bernauer U, Heinrich-Hirsch B, Tonnies M, Peter-Matthias W, Gundert-Remy U. Characterisation of the xenobiotic-metabolizing Cytochrome P450 expression pattern in human lung tissue by immunochemical and activity determination. Toxicol Lett. 2006;164:278–88. [PubMed]
7. Wei C, Caccavale RJ, Kehoe JJ, Thomas PE, Iba MM. CYP1A2 is expressed along with CYP1A1 in the human lung. Cancer Lett. 2001;171:113–20. [PubMed]
8. Ghotbi R, Christensen M, Roh HK, Ingelman-Sundberg M, Aklillu E, Bertilsson L. Comparisons of CYP1A2 genetic polymorphisms, enzyme activity and the genotype-phenotype relationship in Swedes and Koreans. Eur J Clin Pharmacol. 2007;63:537–46. [PubMed]
9. McLemore TL, Adelberg S, Liu MC, et al. Expression of CYP1A1 gene in patients with lung cancer: evidence for cigarette smoke-induced gene expression in normal lung tissue and for altered gene regulation in primary pulmonary carcinomas. J Natl Cancer Inst. 1990;82:1333–9. [PubMed]
10. Wooding SP, Watkins WS, Bamshad MJ, Dunn DM, Weiss RB, Jorde LB. DNA sequence variation in a 3.7-kb noncoding sequence 5' of the CYP1A2 gene: implications for human population history and natural selection. Am J Hum Genet. 2002;71:528–42. [PubMed]
11. Jiang Z, Dalton TP, Jin L, et al. Toward the evaluation of function in genetic variability: characterizing human SNP frequencies and establishing BAC-transgenic mice carrying the human CYP1A1_CYP1A2 locus. Hum Mutat. 2005;25:196–206. [PubMed]
12. Cosma G, Crofts F, Currie D, Wirgin I, Toniolo P, Garte SJ. Racial differences in restriction fragment length polymorphisms and messenger RNA inducibility of the human CYP1A1 gene. Cancer Epidemiol Biomarkers Prev. 1993;2:53–7. [PubMed]
13. Shields PG, Caporaso NE, Falk RT, et al. Lung cancer, race, and a CYP1A1 genetic polymorphism. Cancer Epidemiol Biomarkers Prev. 1993;2:481–5. [PubMed]
14. Cote ML, Wenzlaff AS, Bock CH, et al. Combinations of cytochrome P-450 genotypes and risk of early-onset lung cancer in Caucasians and African Americans: a population-based study. Lung Cancer. 2007;55:255–62. [PMC free article] [PubMed]
15. Wenzlaff AS, Cote ML, Bock CH, et al. CYP1A1 and CYP1B1 polymorphisms and risk of lung cancer among never smokers: a population-based study. Carcinogenesis. 2005;26:2207–12. [PubMed]
16. Wrensch M, Miike R, Sison J, et al. CYP1A1 variants and smoking-related lung cancer in San Francisco Bay Area Latinos and African Americans. Int J Cancer. 2005;113:141–7. [PubMed]
17. Ishibe N, Wiencke JK, Zuo ZF, McMillan A, Spitz M, Kelsey KT. Susceptibility to lung cancer in light smokers associated with CYP1A1 polymorphisms in Mexican- and African-Americans. Cancer Epidemiol Biomarkers Prev. 1997;6:1075–80. [PubMed]
18. Thomas DC, Witte JS. Point: population stratification: a problem for case-control studies of candidate-gene associations? Cancer Epidemiol Biomarkers Prev. 2002;11:505–12. [PubMed]
19. Wacholder S, Rothman N, Caporaso N. Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiol Biomarkers Prev. 2002;11:513–20. [PubMed]
20. Hansen HM, Wiemels JL, Wrensch M, Wiencke JK. DNA quantification of whole genome amplified samples for genotyping on a multiplexed bead array platform. Cancer Epidemiol Biomarkers Prev. 2007;16:1686–90. [PubMed]
21. The International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789–96. [PubMed]
22. Packer BR, Yeager M, Staats B, et al. SNP500Cancer: a public resource for sequence validation and assay development for genetic variation in candidate genes. Nucleic Acids Res. 2004;32:D528–32. [PMC free article] [PubMed]
23. Tian C, Hinds DA, Shigeta R, Kittles R, Ballinger DG, Seldin MF. A genomewide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am J Hum Genet. 2006;79:640–9. [PubMed]
24. Tian C, Hinds DA, Shigeta R, et al. A genomewide single-nucleotide-polymorphism panel for Mexican American admixture mapping. Am J Hum Genet. 2007;80:1014–23. [PubMed]
25. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–70.
26. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science. 2002;296:2225–9. [PubMed]
27. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5. [PubMed]
28. Kraft P, Cox DG, Paynter RA, Hunter D, De Vivo I. Accounting for haplotype uncertainty in matched association studies: a comparison of simple and flexible techniques. Genet Epidemiol. 2005;28:261–72. [PubMed]
29. Chakraborty R, Weiss KM. Frequencies of complex diseases in hybrid populations. Am J Phys Anthropol. 1986;70:489–503. [PubMed]
30. Chakraborty R. Gene admixture in human populations: Models and predictions. Yearb Phys Anthropol. 1986;29:1–43.
31. Hanis CL, Chakraborty R, Ferrell RE, Schull WJ. Individual admixture estimates: disease associations and individual risk of diabetes and gallbladder disease among Mexican-Americans in Starr County, Texas. Am J Phys Anthropol. 1986;70:433–41. [PubMed]
32. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B. 1995;57:289–300.
33. Vineis P. Molecular epidemiology: low-dose carcinogens and genetic susceptibility. Int J Cancer. 1997;71:1–3. [PubMed]
34. Jiang Z, Dragin N, Jorge-Nebert LF, et al. Search for an association between the human CYP1A2 genotype and CYP1A2 metabolic phenotype. Pharmacogenet Genomics. 2006;16:359–67. [PubMed]
35. Aklillu E, Carrillo JA, Makonnen E, et al. Genetic polymorphism of CYP1A2 in Ethiopians affecting induction and expression: characterization of novel haplotypes with single-nucleotide polymorphisms in intron 1. Mol Pharmacol. 2003;64:659–69. [PubMed]
36. Pavanello S, Pulliero A, Lupi S, Gregorio P, Clonfero E. Influence of the genetic polymorphism in the 5'-noncoding region of the CYP1A2 gene on CYP1A2 phenotype and urinary mutagenicity in smokers. Mutat Res. 2005;587:59–66. [PubMed]
37. Sachse C, Brockmoller J, Bauer S, Roots I. Functional significance of a C-->A polymorphism in intron 1 of the cytochrome P450 CYP1A2 gene tested with caffeine. Br J Clin Pharmacol. 1999;47:445–9. [PMC free article] [PubMed]
38. Lin PI, Vance JM, Pericak-Vance MA, Martin ER. No gene is an island: the flip-flop phenomenon. Am J Hum Genet. 2007;80:531–8. [PubMed]
39. Wacholder S, Rothman N, Caporaso N. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J Natl Cancer Inst. 2000;92:1151–8. [PubMed]