Large fractions of the human population do not express GSTM1 and GSTT1 (GSTM1/T1) enzymes because of deletions in these genes. These variations affect xenobiotic metabolism and have been evaluated in relation to lung cancer risk, mostly under null/present gene models. We measured GSTM1/T1 heterozygous deletions, which are not tested in genome-wide association studies, in 2120 controls and 2100 cases from the Environment And Genetics in Lung cancer Etiology (EAGLE) study. We evaluated their effect on mRNA expression in lung tissue and peripheral blood samples, and their association with lung cancer risk overall and by histological type. We tested the null/present, dominant and additive models using logistic regression, and studied cigarette smoking and gender as possible modifiers. Gene expression in blood and lung tissue cells was strongly down-regulated in subjects carrying GSTM1/T1 deletions under both trend and dominant models (p<0.001). In contrast to the null/present model, analyses distinguishing subjects with 0, 1 or 2 GSTM1/T1 deletions revealed several associations. Never-smokers (OR=0.44; 95% CI=0.23–0.82; p=0.01) and women (OR=0.50; 95% CI=0.28–0.90; p=0.02) carrying 1 or 2 GSTM1 deletions had a decreased lung cancer risk. Analogously, with an increasing number of GSTT1 deletions, male smokers had an increased risk (OR=1.13; 95% CI=1.0–1.28; p=0.05) and women a decreased risk (OR=0.78; 95% CI=0.63–0.97; p=0.02). The corresponding gene-smoking and gene-gender interactions were significant (p<0.05). Our results suggest that decreased activity of GSTM1/T1 enzymes elevates lung cancer risk in male smokers, likely because of impaired detoxification of carcinogens. A protective effect of the same deletions may operate in never-smokers and women, possibly because of reduced activity of other genotoxic chemicals.
GST; copy numbers; gene expression; lung cancer; smoking and gender differences
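The contrast between the null/present model and the models that separate 0, 1 or 2 deleted copies is easiest to see in how the deletion count is coded before it enters a logistic regression. The sketch below assumes each subject's number of deleted GSTM1 (or GSTT1) copies is already known from copy-number measurements; the helper name `code_deletions` is hypothetical, and the regression itself is not shown.

```python
# Hypothetical helper (not from the study): codes the number of deleted copies
# of a gene (0, 1 or 2) under the three genetic models compared in the abstract.
def code_deletions(n_deleted: int) -> dict:
    if n_deleted not in (0, 1, 2):
        raise ValueError("expected 0, 1 or 2 deleted copies")
    return {
        # null/present: only subjects with no functional copy (2 deletions) are 'null'
        "null_present": int(n_deleted == 2),
        # dominant: carrying at least one deleted copy
        "dominant": int(n_deleted >= 1),
        # additive (trend): one unit per deleted copy
        "additive": n_deleted,
    }


for n in (0, 1, 2):
    print(n, code_deletions(n))
```

Under the null/present coding, heterozygous carriers (one deletion) fall into the same reference group as subjects with no deletion, so effects carried by heterozygotes cannot be separated.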
Affordable early screening in subjects at high risk of lung cancer has great potential to improve survival from this deadly disease. We measured gene expression in lung tissue and peripheral whole blood (PWB) from adenocarcinoma cases and controls to identify dysregulated lung cancer genes that could be tested in blood to improve identification of at-risk patients in the future. Genome-wide mRNA expression analysis was conducted in 153 subjects (73 adenocarcinoma cases, 80 controls) from the Environment And Genetics in Lung cancer Etiology (EAGLE) study using PWB and paired snap-frozen tumor and non-involved lung tissue samples. Analyses were conducted using unpaired t-tests, linear mixed-effects models and ANOVA models. The area under the receiver operating characteristic curve (AUC) was computed to assess the predictive accuracy of the identified biomarkers. We identified 50 dysregulated genes in stage I adenocarcinoma versus control PWB samples (false discovery rate ≤0.1, fold change ≥1.5 or ≤0.66). Among them, eight (TGFBR3, RUNX3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3) also differentiated paired tumor from non-involved lung tissue samples in stage I cases, suggesting a similar pattern of lung cancer-related changes in PWB and lung tissue. These results were confirmed in two independent gene expression analyses: a blood-based case-control study (n=212) and a paired tumor/non-tumor tissue study (n=54). The eight genes discriminated patients with lung cancer from healthy controls with high accuracy (AUC=0.81, 95% CI=0.74–0.87). Our findings support the use of gene expression in PWB to identify early detection markers of lung cancer.
microarray gene expression; peripheral blood; lung cancer; stage I
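The gene-selection thresholds and the AUC reported above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the abstract does not name the false discovery rate procedure, so Benjamini-Hochberg adjustment is assumed; fold changes are taken as already computed on the linear scale; and the AUC is computed for a single score per subject, whereas combining the eight genes into one score would require a classifier the abstract does not describe. All function names are hypothetical.

```python
# Hypothetical helpers sketching the gene filter and AUC described above.


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, in input order (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def select_genes(genes, pvalues, fold_changes, fdr=0.1, up=1.5, down=0.66):
    """Keep genes with FDR <= 0.1 and fold change >= 1.5 or <= 0.66 (linear scale)."""
    q = bh_qvalues(pvalues)
    return [g for g, qi, fc in zip(genes, q, fold_changes)
            if qi <= fdr and (fc >= up or fc <= down)]


def auc(case_scores, control_scores):
    """AUC as the probability that a random case scores higher than a random control.

    Ties count as one half (the Mann-Whitney formulation).
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


print(select_genes(["A", "B", "C"], [0.001, 0.02, 0.5], [2.0, 1.2, 0.5]))  # ['A']
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

In the example, gene B passes the FDR threshold but is dropped because its fold change (1.2) falls inside the 0.66 to 1.5 band.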
While lung cancer is largely caused by tobacco smoking, inherited genetic factors also play a role in its etiology. Genome-wide association studies (GWAS) in Europeans have robustly identified only three polymorphic loci influencing lung cancer risk. Tumor heterogeneity may have hampered the detection of association signals when all lung cancer subtypes were analyzed together. In a GWAS of 5,355 European smoking lung cancer cases and 4,344 smoking controls, we conducted a pathway-based analysis in lung cancer histologic subtypes with 19,082 SNPs mapping to 917 genes in the HuGE-defined “inflammation” pathway. We identified a susceptibility locus for squamous cell lung carcinoma (SQ) at 12p13.33 (RAD52, rs6489769), and replicated the association in three independent samples totaling 3,359 SQ cases and 9,100 controls (odds ratio=1.20, P_combined=2.3×10^-8).
The combination of pathway-based approaches and information on disease-specific subtypes can improve the identification of cancer susceptibility loci in heterogeneous diseases.
Lung cancer; histology; squamous cell carcinoma; pathway analysis; RAD52
The molecular drivers that determine histology in lung cancer are largely unknown. We investigated whether microRNA (miR) expression profiles can differentiate histological subtypes and predict survival for non-small cell lung cancer.
We analyzed miR expression in 165 adenocarcinoma (AD) and 125 squamous cell carcinoma (SQ) tissue samples from the Environment And Genetics in Lung cancer Etiology (EAGLE) study using a custom oligo array with 440 human mature antisense miRs. We compared miR expression profiles using t-tests and F-tests and accounted for multiple testing using global permutation tests. We assessed the association of miR expression with tobacco smoking using Spearman correlation coefficients and linear regression models, and with clinical outcome using log-rank tests, Cox proportional hazards and survival risk prediction models, accounting for demographic and tumor characteristics.
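One common form of global permutation test asks whether the number of miRs that look significant at a nominal threshold is larger than expected if histology labels were unrelated to expression. The sketch below makes that idea concrete under assumptions not stated in the abstract: a Welch t statistic per miR, an arbitrary cutoff of |t| > 3, and random relabelling of the AD/SQ groups. Function names are hypothetical, and this is not necessarily the exact test used in the study.

```python
import math
import random


def welch_t(a, b):
    """Welch two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)


def n_significant(matrix, labels, t_cut):
    """Count miRs (rows) whose |t| between label groups 0 and 1 exceeds t_cut."""
    count = 0
    for row in matrix:
        a = [x for x, lab in zip(row, labels) if lab == 0]
        b = [x for x, lab in zip(row, labels) if lab == 1]
        if abs(welch_t(a, b)) > t_cut:
            count += 1
    return count


def global_permutation_p(matrix, labels, t_cut=3.0, n_perm=1000, seed=0):
    """Hypothetical global test: is the observed count of significant miRs unusually large?"""
    rng = random.Random(seed)
    observed = n_significant(matrix, labels, t_cut)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # random relabelling breaks any label-expression link
        if n_significant(matrix, perm, t_cut) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Because the whole label vector is permuted at once, correlations among miRs are preserved, which is what lets a single global p-value account for testing many miRs together.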
MiR expression profiles strongly differed between AD and SQ (global p<0.0001), particularly in the early stages, and included miRs located in chromosomal loci most often altered in lung cancer (e.g., 3p21-22). Most miRs, including all members of the let-7 family, were down-regulated in SQ. Major findings were confirmed by qRT-PCR in EAGLE samples and in an independent set of lung cancer cases. In SQ, low expression of miRs down-regulated in the histology comparison was associated with a 1.2- to 3.6-fold increased mortality risk. A 5-miR signature significantly predicted survival for SQ.
We identified a miR expression profile that strongly differentiated AD from SQ and had prognostic implications. These findings may lead to histology-based therapeutic approaches.
Genome-wide association studies (GWAS) focus on relatively few highly significant loci, while less attention is given to other genotyped markers. Applying pathway analysis to existing GWAS data may shed light on relevant biological processes and point to new candidate genes. We applied a pathway-based approach to the breast cancer GWAS data of the National Cancer Institute (NCI) Cancer Genetic Markers of Susceptibility (CGEMS) project, which includes 1145 cases and 1142 controls. Pathways were retrieved from three databases: KEGG, BioCarta, and the NCI’s Protein Interaction Database (PID). Each gene was represented by its most strongly associated SNP, and an enrichment score (ES) reflecting the overrepresentation of gene-based association signals in each pathway was calculated using a weighted Kolmogorov-Smirnov procedure. Finally, hierarchical clustering was used to identify pathways with overlapping genes, and clusters with an excess of association signals were identified with the adaptive rank-truncated product (ARTP) method. A total of 421 pathways containing 3962 genes were included in our study. Of these, three pathways (‘Syndecan-1-mediated signaling’, ‘Signaling of Hepatocyte Growth Factor Receptor’ and ‘Growth Hormone Signaling’) were highly enriched with association signals (P_ES < 0.001, false discovery rate (FDR) = 0.118). Our clustering analysis revealed that pathways containing key components of the RAS/RAF/MAPK canonical signaling cascade were significantly more likely to have an excess of association signals than expected by chance (P_ARTP = 0.0051, FDR = 0.07). These results suggest that genetic alterations associated with these three pathways and one canonical signaling cascade may contribute to breast cancer susceptibility.
Pathways; GWAS; Breast cancer; Susceptibility; Genetics
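A weighted Kolmogorov-Smirnov enrichment score can be sketched as a GSEA-style running sum: genes are ranked by their association signal, the sum steps up at pathway genes (weighted by the signal) and down at all other genes, and the score is the largest deviation from zero. This sketch assumes each gene's score is -log10 of the p-value of its most strongly associated SNP and uses a weight exponent of 1; both are assumptions, and the permutation step that would turn ES into P_ES is omitted. The function name is hypothetical.

```python
# Hypothetical sketch of a GSEA-style weighted Kolmogorov-Smirnov enrichment score.
def enrichment_score(gene_scores, pathway_genes, weight=1.0):
    """gene_scores: gene -> association score (larger = stronger, e.g. -log10 p of best SNP)."""
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    is_hit = [gene in pathway_genes for gene, _ in ranked]
    n_miss = len(ranked) - sum(is_hit)
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ranked, is_hit) if hit)
    if hit_norm == 0 or n_miss == 0:
        raise ValueError("pathway must contain some, but not all, scored genes")
    running, es = 0.0, 0.0
    for (_, score), hit in zip(ranked, is_hit):
        # Step up at pathway genes (weighted by their score), down at all other genes.
        running += (abs(score) ** weight / hit_norm) if hit else (-1.0 / n_miss)
        # ES is the running-sum value with the largest absolute deviation from zero.
        if abs(running) > abs(es):
            es = running
    return es


scores = {"g1": 5.0, "g2": 4.0, "g3": 1.0, "g4": 0.5, "g5": 0.2}
print(enrichment_score(scores, {"g1", "g2"}))  # pathway genes at the top: ES near +1
```

A pathway whose genes cluster at the top of the ranking gives an ES close to +1; one whose genes sit at the bottom gives an ES close to -1.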
Epidemiological and mechanistic evidence on the association of quercetin-rich food intake with lung cancer risk and carcinogenesis is inconclusive. We investigated the role of dietary quercetin and the interaction between quercetin and P450 and glutathione S-transferase (GST) polymorphisms on lung cancer risk in 1822 incident lung cancer cases and 1991 frequency-matched controls from the Environment And Genetics in Lung cancer Etiology study. In non-tumor lung tissue from 38 adenocarcinoma patients, we assessed the correlation between quercetin intake and messenger RNA expression of the same P450 and GST metabolic genes. Multivariate odds ratios (ORs) and 95% confidence intervals (CIs) for sex-specific quintiles of intake were calculated using unconditional logistic regression adjusting for putative risk factors. Frequent intake of quercetin-rich foods was inversely associated with lung cancer risk (OR = 0.49; 95% CI: 0.37–0.67; P-trend < 0.001), and the association did not differ by P450 or GST genotypes, gender or histological subtypes. The association was stronger in subjects who smoked >20 cigarettes per day (OR = 0.35; 95% CI: 0.19–0.66; P-trend = 0.003). Using two-sample t-tests, we compared gene expression between subjects with high versus low consumption of quercetin-rich foods and observed an overall upregulation of GSTM1, GSTM2, GSTT2, and GSTP1 as well as a downregulation of specific P450 genes (P-values < 0.05, adjusted for age and smoking status). In conclusion, we observed an inverse association of quercetin-rich food with lung cancer risk and identified a possible mechanism of quercetin-related changes in the expression of genes involved in the metabolism of tobacco carcinogens in humans. Our findings suggest an interplay between quercetin intake, tobacco smoking, and lung cancer risk. Further research on this relationship is warranted.
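Sex-specific quintiles mean that each subject's intake is ranked only against subjects of the same sex, so the same absolute intake can fall into different categories for men and women. The sketch below assigns quintiles with Python's `statistics.quantiles` (default 'exclusive' method) and `bisect`; computing cutpoints on all supplied subjects rather than on controls only is an assumption, since the abstract does not say which group defined the cutpoints. The function name is hypothetical.

```python
import bisect
import statistics


def sex_specific_quintiles(intakes, sexes):
    """Hypothetical helper: quintile (1-5) of intake computed within each subject's own sex."""
    cutpoints = {}
    for sex in set(sexes):
        values = [x for x, s in zip(intakes, sexes) if s == sex]
        # Four cutpoints split this sex's intake distribution into five groups.
        cutpoints[sex] = statistics.quantiles(values, n=5)
    # bisect_right counts how many cutpoints are less than or equal to the intake.
    return [bisect.bisect_right(cutpoints[s], x) + 1 for x, s in zip(intakes, sexes)]


intakes = list(range(1, 11)) + list(range(10, 101, 10))
sexes = ["M"] * 10 + ["F"] * 10
print(sex_specific_quintiles(intakes, sexes))
```

In this toy example, a man with an intake of 10 is placed in the top quintile while a woman with the same intake is placed in the bottom one, because each is compared only with subjects of the same sex.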
Although pneumonia has been suggested as a risk factor for lung cancer, previous studies have not evaluated the influence of number of pneumonia diagnoses in relation to lung cancer risk.
The Environment And Genetics in Lung cancer Etiology (EAGLE) population-based study of 2,100 cases and 2,120 controls collected information on pneumonia diagnosed more than one year before enrollment from 1,890 cases and 2,078 controls.
After adjusting for study design variables, smoking, and chronic bronchitis, pneumonia was associated with decreased risk of lung cancer (odds ratio (OR), 0.79; 95% confidence interval (CI), 0.64–0.97), especially among individuals with ≥3 diagnoses versus none (OR, 0.35; 95% CI, 0.16–0.75). Adjustment for chronic bronchitis contributed to this inverse association. In comparison, pulmonary tuberculosis was not associated with lung cancer (OR, 0.96; 95% CI, 0.62–1.48).
The apparent protective effect of pneumonia among individuals with multiple pneumonia diagnoses may reflect an underlying difference in immune response and requires further investigation and confirmation.
Careful evaluation of number of pneumonia episodes may shed light on lung cancer etiology.
pneumonia; epidemiology; lung cancer; multiple infections; tuberculosis
Lung cancer kills more than 1 million people worldwide each year. Whereas several human papillomavirus (HPV)–associated cancers have been identified, the role of HPV in lung carcinogenesis remains controversial.
We selected 450 lung cancer patients from an Italian population-based case-control study, the Environment and Genetics in Lung Cancer Etiology study. From patients with an adequate number of unstained tissue sections, we included all those who had never smoked and a random sample of the remaining patients. We used real-time polymerase chain reaction (PCR) to test specimens from these patients for HPV DNA, specifically for E6 gene sequences from HPV16 and E7 gene sequences from HPV18. We also tested a subset of 92 specimens, from all never-smokers and a random selection of smokers, for additional HPV types with a PCR-based test for at least 54 mucosal HPV genotypes. DNA was extracted from ethanol- or formalin-fixed paraffin-embedded tumor tissue under strict PCR-clean conditions. The prevalence of HPV in tumor tissue was investigated.
Specimens from 399 of 450 patients had adequate DNA for analysis. Nearly half of the patients (220, 48.9%) were current smokers, and 92 patients (20.4%) were women. When HPV16 and HPV18 type-specific primers were used, two specimens were positive for HPV16 at low copy number but were negative on additional type-specific HPV16 testing. Neither these specimens nor the others examined for a broad range of HPV types were positive for any HPV type.
When DNA contamination was avoided and state-of-the-art highly sensitive HPV DNA detection assays were used, we found no evidence that HPV was associated with lung cancer in a representative Western population. Our results provide the strongest evidence to date to rule out a role for HPV in lung carcinogenesis in Western populations.
MiR arrays distinguish themselves from gene expression arrays by their more limited number of probes, and the shorter and less flexible sequence in probe design. Robust data processing and analysis methods tailored to the unique characteristics of miR arrays are greatly needed. Assumptions underlying commonly used normalization methods for gene expression microarrays containing tens of thousands or more probes may not hold for miR microarrays. Findings from previous studies have sometimes been inconclusive or contradictory. Further studies to determine optimal normalization methods for miR microarrays are needed.
We evaluated many different normalization methods for data generated with a custom-made two-channel miR microarray, using two data sets that have technical replicates from several different cell lines. The impact of each normalization method was examined on both within-miR error variance (between replicate arrays) and between-miR variance, to determine which normalization methods minimized differences between replicate samples while preserving differences between biologically distinct miRs.
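The two quantities used to judge each normalization method can be written down directly. The sketch assumes the normalized values for one cell line are stored as a mapping from miR name to that miR's values on the replicate arrays; how the authors summarized these quantities across cell lines and data sets is not stated in the abstract, and the function name is hypothetical.

```python
import statistics


def variance_components(replicates):
    """Hypothetical helper.

    replicates: miR name -> normalized values of that miR on replicate arrays of one sample.
    Returns (within, between): the mean within-miR variance across replicate arrays and
    the variance of the per-miR means across miRs.
    """
    within = statistics.mean(statistics.variance(v) for v in replicates.values())
    between = statistics.variance(statistics.mean(v) for v in replicates.values())
    return within, between


within, between = variance_components({
    "miR-a": [1.0, 1.2, 0.8],
    "miR-b": [3.0, 3.2, 2.8],
    "miR-c": [5.0, 5.2, 4.8],
})
print(within, between)
```

Under this criterion, a method that lowers the within-miR variance while keeping the between-miR variance large would be preferred.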
Lowess normalization generally did not perform as well as the other methods, and quantile normalization based on an invariant set showed the best performance in many cases unless restricted to a very small invariant set. Global median and global mean methods performed reasonably well in both data sets and have the advantage of computational simplicity.
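Global median and quantile normalization are simple enough to sketch in full. The code assumes each array is a list of log-ratios for the same miR probes in the same order (for a two-channel array, for example, log2 of the ratio of the two channel intensities). The invariant-set variant of quantile normalization, which performed best in many cases, is not reproduced, because the abstract does not describe how the invariant set was chosen. Ties in quantile normalization are broken by probe position here, which is a simplification. Function names are hypothetical.

```python
import statistics


def global_median_normalize(arrays):
    """Subtract each array's median log-ratio so every array is centred at zero."""
    out = []
    for arr in arrays:
        med = statistics.median(arr)
        out.append([x - med for x in arr])
    return out


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted values across arrays.

    Ties are broken by probe position; all arrays must have the same number of probes.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays, for each k.
    reference = [statistics.mean(vals) for vals in zip(*sorted_arrays)]
    out = []
    for arr in arrays:
        order = sorted(range(n), key=lambda i: arr[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]  # replace each value by the reference at its rank
        out.append(normalized)
    return out


print(global_median_normalize([[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]))
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

Global median normalization only shifts each array, whereas quantile normalization forces identical distributions; the latter rests on the stronger assumption that most probes do not differ between samples, which is harder to justify with the small probe counts of miR arrays.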
Researchers need to consider carefully which assumptions underlying the different normalization methods appear most reasonable for their experimental setting and possibly consider more than one normalization approach to determine the sensitivity of their results to normalization method used.
Investigators planning studies within cohorts have many options for choosing an efficient sampling design for genome-wide association and other molecular epidemiology studies. Consideration of person-year and proportional hazards analyses of full cohorts may add further insight. Empirical evidence from genome-wide association studies can supplement intuition and simulations in comparing properties of various case-control designs within cohorts. Additional theoretical and empirical work, justification of sampling choice in publications, and consideration of context and scientific aims can improve designs and, thereby, increase the scientific value and cost-effectiveness of future studies.
control sampling; genome-wide; empirical study
Chronic obstructive pulmonary disease (COPD) has been consistently associated with increased risk of lung cancer. However, previous studies have had limited ability to determine whether the association is due to smoking.
The Environment And Genetics in Lung cancer Etiology (EAGLE) population-based case-control study recruited 2100 cases and 2120 controls, of whom 1934 cases and 2108 controls reported whether they had been diagnosed with chronic bronchitis, emphysema, COPD (chronic bronchitis and/or emphysema), or asthma more than 1 year before enrollment. We estimated odds ratios (OR) and 95% confidence intervals (CI) using logistic regression. After adjustment for smoking, other previous lung diseases, and study design variables, lung cancer risk was elevated among individuals with a history of chronic bronchitis (OR = 2.0, 95% CI = 1.5–2.5), emphysema (OR = 1.9, 95% CI = 1.4–2.8), or COPD (OR = 2.5, 95% CI = 2.0–3.1). Among current smokers, the association between chronic bronchitis and lung cancer was strongest among lighter smokers. Asthma was associated with a decreased risk of lung cancer in males (OR = 0.48, 95% CI = 0.30–0.78).
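For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the unadjusted version from a 2×2 table is shown below, using the Woolf (log-scale Wald) interval. This is only an illustration: the estimates in the abstract come from logistic regression adjusted for smoking, other previous lung diseases and study design variables, which this crude calculation does not reproduce. With a single binary exposure and no covariates, logistic regression gives the same point estimate. The function name and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf 95% CI from a 2x2 table (hypothetical helper).

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


print(odds_ratio_ci(20, 80, 10, 90))  # made-up counts, OR = 2.25
```

The interval is symmetric on the log scale, which is why published intervals such as 1.4–2.8 around an OR of 1.9 look asymmetric on the original scale.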
These results suggest that the associations of personal history of chronic bronchitis, emphysema, and COPD with increased risk of lung cancer are not entirely due to smoking. Inflammatory processes may both contribute to COPD and be important for lung carcinogenesis.
Polymorphisms in genes coding for enzymes that activate tobacco lung carcinogens may generate inter-individual differences in lung cancer risk. Previous studies had limited sample sizes and poor exposure characterization, and tested only a few single nucleotide polymorphisms (SNPs) in candidate genes. We analyzed 25 SNPs (some previously untested) from six phase I metabolic genes, including cytochrome P450s, microsomal epoxide hydrolase, and myeloperoxidase, in 2101 primary lung cancer cases and 2120 population controls from the Environment And Genetics in Lung cancer Etiology (EAGLE) study. We evaluated the main genotype effects and genotype-smoking interactions on lung cancer risk, overall and in the major histology subtypes. We tested the combined effect of multiple SNPs on lung cancer risk and on gene expression. Findings were prioritized based on significance thresholds and consistency across different analyses, accounting for multiple testing and prior knowledge. Two haplotypes in EPHX1 were significantly associated with lung cancer risk in the overall population. In addition, CYP1B1 and CYP2A6 polymorphisms were inversely associated with adenocarcinoma and squamous cell carcinoma risk, respectively. Moreover, the association between CYP1A1 rs2606345 genotype and lung cancer was significantly modified by intensity of cigarette smoking, suggesting an underlying dose-response mechanism. Finally, an increasing number of variants at the CYP1A1/A2 genes was associated with significant protection in never smokers and increased risk in ever smokers. These results were supported by differential gene expression in non-tumor lung tissue samples: CYP1A1/A2 variants were associated with down-regulation of CYP1A1 in never smokers and up-regulation in smokers. The significant haplotype associations emphasize that the effect of multiple SNPs may be important despite null single-SNP associations, and warrant consideration in genome-wide association studies (GWAS).
Our findings emphasize the necessity of post-GWAS fine mapping and SNP functional assessment to further elucidate cancer risk associations.
Lung cancer is the leading cause of cancer mortality worldwide. Tobacco smoking is its primary cause, and yet the precise molecular alterations induced by smoking in lung tissue that lead to lung cancer and impact survival have remained obscure. A new framework of research is needed to address the challenges offered by this complex disease.
We designed a large population-based case-control study that combines a traditional molecular epidemiology design with a more integrative approach to investigate the dynamic process that begins with smoking initiation, proceeds through dependency/smoking persistence, continues with lung cancer development and ends with progression to disseminated disease or response to therapy and survival. The study allows the integration of data from multiple sources in the same subjects (risk factors, germline variation, genomic alterations in tumors, and clinical endpoints) to tackle the disease etiology from different angles. Before beginning the study, we conducted a phone survey and pilot investigations to identify the best approach to ensure acceptable participation of cases and controls. Between 2002 and 2005, we enrolled 2101 incident primary lung cancer cases and 2120 population controls, with participation rates of 86.6% and 72.4%, respectively, from a catchment area including 216 municipalities in the Lombardy region of Italy. Lung cancer cases were enrolled in 13 hospitals, and population controls were randomly sampled from the area to match the cases by age, gender and residence. Detailed epidemiological information and biospecimens were collected from each participant, and clinical data and tissue specimens from the cases. Collection of follow-up data on treatment and survival is ongoing.
EAGLE is a new population-based case-control study that explores the full spectrum of lung cancer etiology, from smoking addiction to lung cancer outcome, through examination of epidemiological, molecular, and clinical data. We have provided a detailed description of the study design, field activities, management, and opportunities for research following this integrative approach, which allows a sharper and more comprehensive vision of the complex nature of this disease. The study is poised to accelerate the emergence of new preventive and therapeutic strategies with potentially enormous impact on public health.
Tobacco smoking is responsible for over 90% of lung cancer cases, and yet the precise molecular alterations induced by smoking in lung that develop into cancer and impact survival have remained obscure.
We performed gene expression analysis using HG-U133A Affymetrix chips on 135 fresh frozen tissue samples of adenocarcinoma and paired noninvolved lung tissue from current, former and never smokers, with biochemically validated smoking information. ANOVA adjusted for potential confounders, multiple-testing procedures, Gene Set Enrichment Analysis, and GO functional classification were used for gene selection. Results were confirmed in independent adenocarcinoma and non-tumor tissues from two studies. We identified a gene expression signature characteristic of smoking that includes cell cycle genes, particularly those involved in mitotic spindle formation (e.g., NEK2, TTK, PRC1). Expression of these genes strongly differentiated smokers from non-smokers in lung tumors, and early-stage tumor tissue from non-tumor tissue (p<0.001 and fold change >1.5 for each comparison), consistent with an important role for this pathway in smoking-induced lung carcinogenesis. These changes persisted many years after smoking cessation. NEK2 (p<0.001) and TTK (p = 0.002) expression in the noninvolved lung tissue was also associated with a 3-fold increased risk of mortality from lung adenocarcinoma in smokers.
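The ANOVA and fold-change criteria above can be illustrated for a single gene. The sketch computes an unadjusted one-way ANOVA F statistic across smoking groups (current, former, never) and a fold change from log2-scale expression values; the log2 scale is an assumption about how expression was summarized, and the confounder adjustment, multiple-testing correction and gene set analyses described in the abstract are not reproduced. Function names are hypothetical.

```python
def one_way_anova_f(*groups):
    """Unadjusted one-way ANOVA F statistic for one gene across groups (hypothetical helper)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of subjects around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def fold_change(log2_a, log2_b):
    """Linear-scale fold change of group a over group b from log2 expression values."""
    return 2 ** (sum(log2_a) / len(log2_a) - sum(log2_b) / len(log2_b))


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
print(fold_change([2.0, 2.0], [1.0, 1.0]) > 1.5)
```

Working on the log2 scale turns ratios into differences, so a fold change above 1.5 corresponds to a difference in mean log2 expression of about 0.58.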
Our work provides insight into the smoking-related mechanisms of lung neoplasia, and shows that the very mitotic genes known to be involved in cancer development are induced by smoking and affect survival. These genes are candidate targets for chemoprevention and treatment of lung cancer in smokers.