Conceived and designed the experiments: SW JJ ML JS PB NC. Performed the experiments: TD HL FM JF MH SM. Analyzed the data: SW JJ ML JF JS DC AP PB TD MR AD AB NC. Contributed reagents/materials/analysis tools: JJ PY ML PB NC. Wrote the paper: ML. Other: Participated in data interpretation: DC PB PY AP AD JF TD HL AB SM JJ NC JS SW MR. Read and approved the manuscript: JJ NC DC PB PY AP JF AB SM HL TD AD JS SW MR. Interpreted the data: ML.
Tobacco smoking is responsible for over 90% of lung cancer cases, and yet the precise molecular alterations induced by smoking in the lung that lead to cancer and affect survival have remained obscure.
We performed gene expression analysis using HG-U133A Affymetrix chips on 135 fresh frozen tissue samples of adenocarcinoma and paired noninvolved lung tissue from current, former and never smokers, with biochemically validated smoking information. Gene selection was based on ANOVA adjusted for potential confounders, a multiple-testing procedure, Gene Set Enrichment Analysis, and GO functional classification. Results were confirmed in independent adenocarcinoma and non-tumor tissues from two studies. We identified a gene expression signature characteristic of smoking that includes cell cycle genes, particularly those involved in mitotic spindle formation (e.g., NEK2, TTK, PRC1). Expression of these genes strongly differentiated both smokers from non-smokers in lung tumors and early stage tumor tissue from non-tumor tissue (p<0.001 and fold-change >1.5 for each comparison), consistent with an important role for this pathway in smoking-induced lung carcinogenesis. These changes persisted many years after smoking cessation. NEK2 (p<0.001) and TTK (p=0.002) expression in the noninvolved lung tissue was also associated with a 3-fold increased risk of mortality from lung adenocarcinoma in smokers.
Our work provides insight into the smoking-related mechanisms of lung neoplasia, and shows that the very mitotic genes known to be involved in cancer development are induced by smoking and affect survival. These genes are candidate targets for chemoprevention and treatment of lung cancer in smokers.
Lung cancer is the leading cause of cancer death worldwide. Cigarette smoking is responsible for about 90% of lung cancers and decreases survival, and yet the precise molecular alterations induced by smoking in the lung that lead to cancer and affect survival have remained obscure. Using Affymetrix HG-U133A microarrays on 135 fresh frozen adenocarcinoma and paired non-tumor tissue samples from current, former and never smokers in the Environment And Genetics in Lung cancer Etiology (EAGLE) study (http://dceg.cancer.gov/eagle), we sought to identify the genes that are altered by smoking in the lung and, within the smoking signature, those that have a role in lung carcinogenesis and in outcome from lung cancer. We chose adenocarcinoma, the predominant histological subtype of lung cancer, because it occurs in subjects with no history of smoking as well as in smokers, providing a range of exposures well suited to the study of smoking-induced carcinogenesis. Specifically, in early stage adenocarcinoma tissue we compared gene expression between current (C) and never (N) smokers and identified the major genes using stringent criteria for gene selection (p<0.001 and fold change >1.5), the Benjamini-Hochberg procedure to calculate the False Discovery Rate (FDR), and Gene Ontology (GO) to classify gene functional categories. We then verified whether the comparison between former (F) and never (N) smokers identified similar genes. We performed Gene Set Enrichment Analysis (GSEA) to identify common gene patterns where single-gene analysis revealed only a few overlapping genes. We further explored whether the genes that differentiated lung tumors of smokers from those of never smokers (C/N and F/N) also differentiated early stage tumor tissue (T) from paired non-tumor (NT) tissue, to confirm the role of these genes in smoking-related lung carcinogenesis. Finally, we explored the impact of the smoking signature on survival from lung cancer in smokers.
We validated the C/N genes by real-time PCR in 68 samples used for the present microarray analysis, and confirmed them in 40 independent samples from EAGLE and a Mayo Clinic study of lung cancer.
This study included 105 subjects from EAGLE, a large population-based study of lung cancer conducted in the Lombardy region of Italy. EAGLE lung cancer cases were enrolled from the following 13 hospitals: A.O. Ospedale Niguarda Ca' Granda, Milano; A.O. Spedali Civili, Brescia; Istituto Clinico Humanitas, Rozzano (MI); Ospedale di Circolo e Fondazione Macchi, Varese; Fondazione IRCCS Ospedale Maggiore Policlinico, Mangiagalli and Regina Elena, Milano; Istituto Scientifico Universitario Ospedale San Raffaele, Milano; A.O. Ospedale Luigi Sacco, Milano; A.O. San Paolo, Milano; A.O. Ospedale San Carlo Borromeo, Milano; IRCCS Policlinico San Matteo, Pavia; A.O. San Gerardo, Monza; A.O. Ospedale Fatebenefratelli, Milano; Ospedale San Giuseppe, Milano. The healthy controls in EAGLE were randomly selected from the same residential area as the lung cancer cases. After the study personnel described the EAGLE study and discussed it with potential participants, written informed consent was obtained under a protocol approved by the Institutional Review Board of each participating hospital and by the National Cancer Institute (Bethesda, MD). Subjects in this gene expression study, aged 44–79 years, had histologically confirmed primary adenocarcinoma of the lung, stages I–IV, and provided detailed smoking and medical history information.
Overall, 180 adenocarcinoma and non-tumor tissue samples were selected for the analyses, including duplicate or triplicate samples from 14 subjects for quality control. Samples had been snap-frozen in liquid nitrogen within 20 minutes of surgical resection. A single pathologist confirmed the hospital-based diagnosis of adenocarcinoma, estimated the percentage of malignant cells in each sample from H&E-stained fresh frozen sections, and classified the samples as Tumor (T) or Non-Tumor (NT). Of the original 180 samples, 148 provided a sufficient quantity of high-quality RNA for microarray analyses; 13 additional samples were excluded because of technical problems. Normalization was conducted on the remaining 135 microarrays; the corresponding CEL files and MIAME-compliant sample information are publicly available in the GEO database (accession number GSE10072). After normalization, 13 samples were excluded because of a low percentage of tumor cells in the tumor tissues. This report is based on 122 samples, of which 15 duplicates/triplicates were averaged, resulting in 107 final expression values from 58 tumor and 49 non-tumor tissues from 20 never smokers, 26 former smokers, and 28 current smokers. Quality assurance and the distribution of cell types across smoking groups are described in Appendix S1A, S1B, and S1C.
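The sample flow above can be re-derived with simple arithmetic. The sketch below only restates counts given in this paragraph (the variable names are illustrative) and checks that the exclusions and the averaging of duplicate arrays add up to the reported totals.

```python
# Counts stated in the text; each step subtracts the reported exclusions.
selected = 180                 # tumor and non-tumor samples initially selected
good_rna = 148                 # samples with enough high-quality RNA
technical_failures = 13        # arrays excluded for technical problems
normalized = good_rna - technical_failures      # arrays entering normalization
low_tumor_content = 13         # excluded after normalization
analyzed = normalized - low_tumor_content       # arrays this report is based on

# Duplicate/triplicate arrays were averaged into one value per sample.
tumor, non_tumor = 58, 49
final_values = tumor + non_tumor
collapsed_by_averaging = analyzed - final_values

never, former, current = 20, 26, 28
subjects = never + former + current

print(selected - good_rna, normalized, analyzed, final_values,
      collapsed_by_averaging, subjects)
# prints: 32 135 122 107 15 74
```

The 74 subjects obtained here are the same 74 subjects entering the survival analysis described below.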
All statistical analyses were performed in the R programming language. Gene expression data were processed and normalized with the Bioconductor affy package, using the Robust Multichip Average (RMA) method for single-channel Affymetrix chips. All 22,283 probe sets, summarized by RMA, were used in class comparison analyses.
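For readers unfamiliar with RMA, the sketch below illustrates two of its core steps: quantile normalization across arrays, followed by median-polish summarization of log2 probe intensities into one value per probe set and array. It is a simplified, hypothetical pure-Python illustration (background correction is omitted, ties are broken arbitrarily, and the `probe_sets` mapping is an assumed input format), not the Bioconductor affy implementation used in the study.

```python
import math
import statistics


def quantile_normalize(arrays):
    """Give every array (a list of probe intensities) the same distribution:
    the k-th smallest value of each array is replaced by the mean of the
    k-th smallest values across all arrays. Ties are broken arbitrarily."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, n_iter=10):
    """Tukey median polish of a probes x arrays matrix (log2 scale).
    Returns one summary per array: overall effect + array (column) effect."""
    rows, cols = len(matrix), len(matrix[0])
    resid = [row[:] for row in matrix]
    overall, row_eff, col_eff = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(n_iter):
        for i in range(rows):                      # sweep out probe (row) medians
            m = statistics.median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = statistics.median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        for j in range(cols):                      # sweep out array (column) medians
            m = statistics.median(resid[r][j] for r in range(rows))
            col_eff[j] += m
            for r in range(rows):
                resid[r][j] -= m
        m = statistics.median(row_eff)
        row_eff = [v - m for v in row_eff]
        overall += m
    return [overall + c for c in col_eff]


def rma_sketch(arrays, probe_sets):
    """arrays: positive intensities, one list per chip.
    probe_sets: hypothetical mapping of probe-set name -> probe indices."""
    norm = quantile_normalize(arrays)
    return {
        name: median_polish([[math.log2(chip[i]) for chip in norm] for i in idx])
        for name, idx in probe_sets.items()
    }
```

Median polish is used because it is robust to a few outlying probes within a probe set, which is the main reason RMA summaries are more stable than simple averages.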
Average linkage hierarchical clustering of samples was based on one minus Pearson correlation as the dissimilarity metric.
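Average linkage (UPGMA) clustering with a 1 - Pearson dissimilarity can be sketched in a few lines. The pure-Python version below is an illustration under stated assumptions, not the R code used in the study (in R, `hclust(as.dist(1 - cor(x)), method = "average")` is a common idiom for the same procedure). Each sample is assumed to be a list of expression values over the same probe sets, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(samples):
    """Agglomerative clustering with 1 - Pearson r as the dissimilarity.
    Returns the merge history as (members_a, members_b, height) tuples."""
    n = len(samples)
    dist = {(i, j): 1 - pearson(samples[i], samples[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = {i: [i] for i in range(n)}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for ai in range(len(keys)):
            for bi in range(ai + 1, len(keys)):
                a, b = keys[ai], keys[bi]
                # Average linkage: mean dissimilarity over all cross-cluster sample pairs.
                d = sum(dist[min(p, q), max(p, q)]
                        for p in clusters[a] for q in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges
```

Because 1 - r depends only on the shape of each expression profile, two samples whose values differ by a constant scale or offset are treated as identical, which is usually the desired behavior for array data.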
ANOVA adjusting for sex was used to test whether genes were differentially expressed between smoking groups (C/N and F/N), between tumor and non-tumor tissue (T/NT), or by pack-years of cigarette smoking. Further analyses adjusting for tumor grade, or excluding 6 subjects with emphysema or chronic bronchitis or 3 subjects who had received chemotherapy before the study, gave essentially unaltered results. For analyses including paired tissues (T and NT samples from the same subjects), a linear mixed effects model was used to account for intra-person correlation.
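For a two-group comparison such as C/N, an ANOVA adjusted for sex amounts to fitting the linear model expression ~ group + sex for each probe set; the group F statistic then equals the square of the group t statistic. The sketch below fits that model by ordinary least squares for one probe set. The function names and toy data are hypothetical, p-values are not computed (no t distribution in the standard library), and the linear mixed effects model used for paired T/NT samples is not shown.

```python
import math


def invert(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def group_effect_adjusted_for_sex(expr, group, sex):
    """OLS fit of expr ~ 1 + group + sex (group, sex coded 0/1).
    Returns (group coefficient, t statistic for the group coefficient)."""
    X = [[1.0, g, s] for g, s in zip(group, sex)]
    n, k = len(X), 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, expr)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(X, expr)]
    sigma2 = sum(e * e for e in resid) / (n - k)     # residual variance
    se = math.sqrt(sigma2 * inv[1][1])               # standard error of the group coefficient
    return beta[1], beta[1] / se
```

On the log2 scale used by RMA, the group coefficient is the sex-adjusted log2 fold change between the two smoking groups.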
To limit false positive findings, genes were considered statistically significant only if their p-values were below the stringent threshold of 0.001. Under the null hypothesis of no difference in expression profiles, testing 22,283 probe sets at this threshold yields on average at most 23 false positive findings by chance (22,283 × 0.001 ≈ 22.3). We used the Benjamini-Hochberg procedure to calculate the False Discovery Rate (FDR). We further restricted significant genes to those showing at least a 1.5-fold ratio of geometric means of expression between the two groups. Gene selection based on p<0.001 (two-sided) and fold-change >1.5 is referred to as “stringent criteria”.
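The selection rules in this paragraph can be made concrete with a short sketch. It computes Benjamini-Hochberg adjusted p-values (q-values), converts a difference of mean log2 (RMA) values into a ratio of geometric means, and applies the stringent criteria. Two points are assumptions rather than details stated in the text: the fold-change threshold is applied in both directions (so down-regulated genes qualify when the ratio is below 1/1.5), and the helper names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (q-values), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0                     # also caps q-values at 1
    for rank in range(m, 0, -1):          # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_change(mean_log2_a, mean_log2_b):
    # On the log2 scale, the ratio of geometric means is 2**(difference of means).
    return 2 ** (mean_log2_a - mean_log2_b)


def passes_stringent(p, fc, alpha=0.001, min_fold=1.5):
    # Assumption: the 1.5-fold threshold applies to both up- and down-regulation.
    return p < alpha and (fc > min_fold or fc < 1 / min_fold)


# Expected number of chance findings among 22,283 null probe sets at p < 0.001.
expected_false_positives = 22283 * 0.001   # about 22.3
```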
The Cox proportional hazards model was used to estimate the effect of C/N gene expression changes on survival from lung cancer in smokers. Of the 74 subjects included in this study (all stages), 34 (22 smokers) were alive and 40 (32 smokers) had died as of May 2007. Among the deceased subjects, 36 died of lung cancer; the remaining 4 (2 smokers) died of other cancers and were censored at the time of death in the analysis. The time from lung cancer diagnosis to death or last follow-up ranged from 28 days to 5.0 years for the deceased subjects, and from 3.7 to 5.7 years for the subjects alive in May 2007. The relative risk associated with gene expression was defined as the hazard ratio per one-standard-deviation change in expression. Analyses were adjusted for stage, sex, and smoking. Age was similarly distributed across the groups and was not adjusted for in the analysis.
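A minimal univariate sketch of the hazard ratio per standard deviation is shown below: the expression values are standardized, and a Cox model is fitted by Newton-Raphson on the partial likelihood (Breslow handling of ties), so exp(beta) is the hazard ratio per one-SD change. This is an assumption-laden illustration, not the study's analysis: the published models were also adjusted for stage, sex, and smoking, which the sketch omits, and the function name and toy data are hypothetical. Deaths from other causes would be entered with event = 0 (censored), as described above.

```python
import math


def cox_hr_per_sd(times, events, x, max_iter=50, tol=1e-10):
    """Univariate Cox model on standardized x.
    Returns (hazard ratio per 1 SD, standard error of log HR)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]              # standardized expression
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue                          # censored subjects only enter risk sets
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * z[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * z[j] for wj, j in zip(w, risk))
            s2 = sum(wj * z[j] ** 2 for wj, j in zip(w, risk))
            score += z[i] - s1 / s0               # gradient of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2      # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta), math.sqrt(1.0 / info)


# Toy example: higher expression tends to go with earlier death.
hr, se = cox_hr_per_sd([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 1, 0], [3, 1, 2, 1, 0, 0])
```

Standardizing first makes the hazard ratio independent of the units of the expression measure, so ratios can be compared across genes with different expression ranges.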
We verified the self-reported current smoking status by measuring plasma cotinine levels. The total cotinine (free plus cotinine N-glucuronide) concentration in plasma was quantified by GC/MS analysis using a method similar to that used for urinary cotinine, with the addition of a solid phase extraction step carried out on an MCX column (Waters Corporation, Milford, MA).
One individual who reported having quit smoking 2.6 years before the study had a high cotinine level (135 ng/ml) and was reclassified as a current smoker.
Gene Set Enrichment Analysis (GSEA) was used to compare expression of groups of genes (gene sets) between different tissues, or between different comparison groups within the same tissue. GSEA can reveal a common pattern across gene sets even when single-gene analysis finds very few overlapping genes between groups. We modified the standard GSEA method by substituting an ANOVA test adjusted for sex for the standard two-sample t-test. We also changed the permutation test used to calculate p-values: residuals were permuted, and the observed ANOVA coefficients divided by their standard errors were used as weights. Up- and down-regulated genes were included in separate gene sets for the analyses.
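The core GSEA statistic is a weighted running-sum enrichment score. The sketch below follows the standard weighted form: walking down the ranked list, a gene in the set adds its |weight|^p normalized over all set members, and a gene outside the set subtracts 1/(number of genes outside the set); the enrichment score is the maximum deviation from zero. Here the weights stand in, as an assumption, for the ANOVA coefficient/standard-error ratios described above; the residual-permutation step that yields p-values is not shown, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, weights, gene_set, p=1.0):
    """Weighted running-sum enrichment score.
    ranked_genes: genes ordered from most up- to most down-regulated.
    weights: ranking statistic for each gene, in the same order."""
    hits = [g in gene_set for g in ranked_genes]
    hit_weight_total = sum(abs(w) ** p for w, h in zip(weights, hits) if h)
    n_miss = len(ranked_genes) - sum(hits)
    running, best = 0.0, 0.0
    for w, h in zip(weights, hits):
        if h:
            running += abs(w) ** p / hit_weight_total   # step up at a set member
        else:
            running -= 1.0 / n_miss                      # step down otherwise
        if abs(running) > abs(best):
            best = running                               # keep the maximum deviation
    return best
```

A set concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1, which is why up- and down-regulated genes are more naturally analyzed as separate sets.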
We used quantitative real-time PCR (QRTPCR) to confirm the differential expression of 19 selected C/N genes (20 probes), including 14 genes from the T and 5 from the NT analyses. Primer and probe sets for the selected genes, as well as control probes for GUSB and S18 (ABI), were run on a 7500 TaqMan system following the manufacturer's standard protocol. Ct values were normalized to GUSB expression.
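The text states only that Ct values were normalized to GUSB. A common way to turn such normalized values into a C/N fold change is the 2^-ddCt (delta-delta Ct) method; the sketch below assumes that method and roughly 100% amplification efficiency, neither of which the paper specifies, and the function names are hypothetical.

```python
def delta_ct(ct_target, ct_gusb):
    # Normalize each sample to the GUSB reference gene.
    return ct_target - ct_gusb


def fold_change_ddct(delta_ct_current, delta_ct_never):
    """2^-ddCt fold change, current vs never smokers (assumes ~100% PCR efficiency).
    A lower delta Ct means more transcript, so a negative ddCt means up-regulation."""
    mean_c = sum(delta_ct_current) / len(delta_ct_current)
    mean_n = sum(delta_ct_never) / len(delta_ct_never)
    return 2 ** -(mean_c - mean_n)
```

For example, mean delta Ct values of 4 in current and 6 in never smokers correspond to a 4-fold up-regulation in current smokers.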
Validation assays were performed in 68 samples used in the original microarray analyses, including 43 T (27 C and 16 N smokers), and 25 NT (18 C and 7 N smokers).
Confirmation assays were performed in 40 independent samples, including 19 T (12 C and 7 N smokers) and 21 NT samples (12 C and 9 N smokers). These samples were collected in EAGLE (10 T samples from 7 C and 3 N smokers, and 12 NT samples from 7 C and 5 N smokers; these samples were not used for the microarray analyses), and at the Mayo Clinic, Rochester, MN (9 T and 9 NT paired samples from 5 C and 4 N smokers).
To investigate the molecular changes associated with smoking in the tumor tissue, we compared gene expression between current and never (C/N) smokers (Table 1). To avoid potential alterations of gene expression due to advanced tumor status, we limited smoking comparisons in tumor tissue to early stages (stages I and II). Unless otherwise specified, “T” samples represent early stage adenocarcinomas. Results from advanced-stage tumor tissues are reported for completeness in Appendix S2C.
Using stringent selection criteria (fold-change >1.5 and p-value<0.001), we identified 64 up- and 98 down-regulated probe-sets, representing 54 up- and 81 down-regulated genes (Appendix S2A, S2B). Most of the significantly up-regulated genes were involved in cell cycle/mitosis/cell division (e.g., TTK, CENPF, NEK2), while many of those down-regulated were involved in cell adhesion/cell cycle arrest (e.g., ADRB2, APLP2, MACF1), consistent with a role of these genes in neoplasia development.
The GoMiner results (Appendix S2D) confirmed that mitosis genes (12 altered genes among the 127 mitotic genes on the HG-U133A chip, p<0.001) and, more generally, cell cycle genes were the most commonly altered in the tumor tissue (Table 2).
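Category enrichment of this kind is typically assessed with a one-sided hypergeometric (Fisher exact) test, the type of test GoMiner reports. The sketch below computes the probability of seeing at least k category members among the selected genes. The total number of genes on the chip is not given in the text, so the value of 13,000 used in the usage line, and the use of all 135 C/N genes in T as the selected list, are hypothetical assumptions for illustration only.

```python
from math import comb


def hypergeometric_enrichment_p(k, n_selected, k_category, n_total):
    """P(X >= k), X ~ Hypergeometric: the chance of drawing at least k category
    members when n_selected genes are taken from n_total genes, k_category of
    which belong to the category (one-sided Fisher exact test)."""
    denom = comb(n_total, n_selected)
    upper = min(n_selected, k_category)
    tail = sum(comb(k_category, i) * comb(n_total - k_category, n_selected - i)
               for i in range(k, upper + 1))
    return tail / denom


# Illustrative only: 12 of 127 mitotic genes among 135 selected genes,
# with a hypothetical chip total of 13,000 genes.
p_mitosis = hypergeometric_enrichment_p(12, 135, 127, 13000)
```

With these assumed totals, only about 1.3 mitotic genes would be expected among 135 randomly chosen genes, so observing 12 is far in the upper tail.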
To verify whether the C/N smoking signature in the tumor was also present in former smokers, we compared the C/N and F/N signatures in T and found 26 probes (22 down- and 4 up-regulated, representing 21 genes) that differentiated both C/N and F/N using the stringent selection criteria (Appendix S2E). Some of these genes, e.g., STOM, SSX2IP, TRPC6, APLP2 (2 probes), and DHRS7, remained altered even in subjects (n=6) who had quit smoking more than 20 years before the study. The GSEA analysis showed that, among the 64 up- and 98 down-regulated probes found in the C/N comparison in T, 58 and 90 probes (representing 50 up- and 73 down-regulated genes) were also up- and down-regulated, respectively, in the F/N comparison (p<0.001, Fig. 1, and Appendix S2F, S2G). All cell cycle genes that differentiated C/N were also altered in F/N, although less prominently (Table 2), indicating that alterations of these genes persist after smoking cessation. Importantly, the mitosis/cell cycle genes identified in C/N and F/N also differentiated early stage tumor from non-tumor tissue samples (T/NT, paired analysis) (Table 2). In contrast, pack-years of cigarette smoking, a composite index of intensity and duration that does not account for when smoking occurred, was not associated with gene expression in either T or NT.
The C/N comparison in NT revealed 28 up- and 75 down-regulated probes, representing 25 up- and 73 down-regulated genes, with the stringent selection criteria (Table 1, and Appendix S3A, S3B). As expected, CYP1B1, a gene known to be induced by smoking, was strongly up-regulated. The GoMiner results showed that the genes most altered by smoking were involved in cellular defense response (5 of 90 cellular defense genes on the chip, p<0.001) and, more generally, in immune response (Appendix S3C).
MACF1, UBE2I, and CBX7 (p<0.001), and C16orf30 (p=0.001) were shared between the T and NT C/N comparisons. C16orf30 and UBE2I, both on chromosome 16p13.3, lie within 246 kb of each other, but they do not appear to share specific transcriptional regulation mechanisms (Appendix S4A). The GSEA analysis revealed some similarities between T and NT in the overall pattern of smoking-induced alteration (p=0.08 and 0.04 for up- and down-regulated genes, respectively; Appendix S4B, S4C, and S4D). Notably, NEK2 and TTK were among the genes similarly altered in both T and NT in the GSEA analysis. In contrast, the F/N comparison in NT yielded no statistically significant genes (Table 1) and was not explored further.
We studied the overall gene expression signature of smoking in T and NT (98+64 C/N probes in T and 75+28 C/N probes in NT, minus 3 probes overlapping between T and NT, for a total of 262 probe sets representing 230 genes) in relation to survival from adenocarcinoma in smokers (n=54, Appendix S5A). Because only 262 probe sets were included in this analysis, we used a less stringent criterion of p<0.01 for gene selection (Table 3). Altered expression in NT of genes involved in mitotic spindle formation, e.g., NEK2 (p<0.001) and TTK (p=0.001), was associated with a 3-fold increased mortality risk (Table 3, analysis adjusted for stage, sex, and smoking).
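The probe count and the rationale for relaxing the threshold can be checked directly: testing 262 probe sets at p<0.01 is expected to produce only about 2.6 chance findings, fewer than the roughly 22 expected when all 22,283 probe sets are tested at p<0.001. The sketch below simply restates the counts from this paragraph; variable names are illustrative.

```python
probes_T = 98 + 64        # C/N down- + up-regulated probes in tumor tissue
probes_NT = 75 + 28       # C/N down- + up-regulated probes in non-tumor tissue
overlap = 3               # probes found in both tissues
n_probes = probes_T + probes_NT - overlap        # probe sets in the survival analysis

expected_chance_hits = n_probes * 0.01           # at the relaxed threshold p < 0.01
expected_chance_hits_chip = 22283 * 0.001        # whole chip at p < 0.001

print(n_probes, round(expected_chance_hits, 2), round(expected_chance_hits_chip, 1))
# prints: 262 2.62 22.3
```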
We selected 19 genes (20 probes) for validation by QRTPCR, including 14 genes for T and 5 for NT tissue, based on fold change (>2) and cancer relevance.
Validation was based on 68 samples, including 43 T and 25 NT, also used for the microarray analysis. All 19 genes were up-regulated in the C/N comparison in these samples (Table 4).
Confirmation was based on 40 independent samples (19 T and 21 NT) from EAGLE (samples not used for the microarray analysis) and the Mayo Clinic, Rochester, MN. All 14 genes in T and 4 of the 5 genes in NT were also up-regulated by smoking in the independent samples (Table 4).
In a population-based study with fresh frozen tissue samples of adenocarcinoma and noninvolved lung tissue (mostly paired samples), we identified a smoking signature that persists years after smoking cessation and is related to lung cancer development and survival.
Aneuploidy and chromosome instability are two of the most common abnormalities in cancer cells; both arise through unequal segregation of chromosomes between daughter cells during mitosis. Thus, mitotic alterations are highly relevant for carcinogenesis. We found that smoking deregulates this mitotic process along the whole course from lung tissue changes through cancer development to death from cancer or survival. In fact, the smoking signature we identified comprises genes that regulate mitotic spindle formation. These genes, such as NEK2 and CENPF (both on 1q32-q41), TPX2 and STK6 (or AURKA) (related to the Aurora-A activation pathway important in tumor progression), TTK (linked to cell mitosis through EGFR, a critical drug target for lung adenocarcinoma), and BIRC5 (Survivin), have all been found over-expressed in smoking-related tumors. While previous studies have proposed these genes as targets for therapeutic interventions, our work suggests that they may also be targets for chemoprevention in smokers. In fact, they were strongly induced by smoking in early stage tumor tissue, and some, e.g., NEK2 and TTK, were also associated with increased mortality risk. The latter finding was most evident in non-tumor tissue, likely reflecting the widely recognized field-cancerization effect of smoking, whereas in tumor tissue the effects of smoking-related genes on survival may be masked by the extensive molecular alterations that occur during tumorigenesis.
In the non-tumor tissue, current smoking strongly altered immune response genes, consistent with the defense mechanisms of lung tissue against the acute toxic effects of smoking. Among the genes most strongly down-regulated in NT was CX3CR1, located on chromosome 3p21.3, a region known to be often deleted in lung cancer, particularly in smokers.
Current knowledge of gene expression altered by cigarette smoking is based on airway epithelial cells or macrophages obtained by bronchoscopy, or on peripheral leukocytes from healthy smokers, rather than directly on lung tissue. The few studies with lung tissue samples are very small or used RNA amplification or RNA pooling methods. Our results are consistent with some previous findings, such as smoking-related alteration of CYP1B1 or of the mitotic pathway in cancer survival. However, earlier studies were often limited by small sample size, lacked information on potential confounders, or lacked paired tumor and non-tumor lung tissue samples with which to distinguish gene changes involved in lung carcinogenesis from those representing a transient smoking effect. We overcame these pitfalls with a relatively large sample size of fresh tumor and non-tumor lung tissues, detailed covariate information (e.g., sex, age, stage, previous lung diseases or chemotherapy), biochemical validation of smoking status, and confirmation of the main findings in independent tissue samples.
In conclusion, our study provides clues to how cigarette smoking affects lung cancer development and survival. Functional assays to confirm these findings are warranted. If confirmed, these genes could become important targets for chemoprevention and treatment of lung cancer in smokers.
Appendix S1. Quality Assurance. S1A: Description of the analysis of sample quality assurance. S1B: Sample description. S1C: Surfactant genes in Tumor (T) and Non-Tumor (NT) lung tissues by smoking.
Appendix S2. Current/Never (C/N) and Former/Never (F/N) smoking comparisons in early stage Tumor (T) tissue. S2A: Current/Never (C/N) comparison, early stage Tumor (T) tissues: up-regulated probes. S2B: Current/Never (C/N) comparison, early stage Tumor (T) tissues: down-regulated probes. S2C: Current/Never (C/N) comparison, late stage Tumor tissues: up- and down-regulated probes. S2D: Gene Ontology (GO) functional categories for the Current/Never (C/N) smoker comparison. S2E: Current/Never (C/N) and Former/Never (F/N) comparisons: overlapping probe list. S2F: Gene list from GSEA comparison of up-regulated C/N genes and F/N genes in early stage Tumor (T) tissues. S2G: Gene list from GSEA comparison of down-regulated C/N genes and F/N genes in early stage Tumor (T) tissues.
Appendix S3. Current/Never (C/N) smoking comparisons in Non-Tumor (NT) lung tissue. S3A: Current/Never (C/N) comparison in Non-Tumor (NT) lung tissues: up-regulated probes. S3B: Current/Never (C/N) comparison in Non-Tumor (NT) lung tissues: down-regulated probes. S3C: Gene Ontology (GO) functional categories for the Current/Never (C/N) comparison (up- and down-regulated genes) in Non-Tumor (NT) lung tissues.
Appendix S4. Comparison between Tumor (T) and Non-Tumor (NT) lung tissue for the genes whose expression significantly differentiates Current from Never smokers (C/N) in early stage lung Tumor (T). S4A: C16orf30 and UBE2I transcription sites. S4B: Comparison of C/N results in early stage Tumor (T) tissues vs. C/N results in Non-Tumor (NT) lung tissues by GSEA analysis. S4C: Gene list from GSEA comparison of up-regulated C/N genes between early stage Tumor (T) tissues and Non-Tumor (NT) tissues. S4D: Gene list from GSEA comparison of down-regulated C/N genes between early stage Tumor (T) tissues and Non-Tumor (NT) tissues.
Appendix S5. Mortality risk in smokers associated with the expression of genes differentiating Current from Never smokers (C/N) in Tumor and Non-Tumor tissue samples. S5A: Current/Never (C/N) genes and related mortality risk in Tumor and Non-Tumor lung tissues (all stages) from Current and Former smokers.
We thank Ms. Juliet Joly for her technical help with the manuscript preparation, and Drs. Subhashree Madhavan and Wei Lin for their help with the CEL files on the GEO site, Drs. Abbas Shakoori and Konstantin Shilo for assistance with the pathology specimens, and Dr. Kenneth Buetow for support.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This study was supported by the Intramural Research Program of NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics and the Center for Cancer Research, and by the NIH-R01-84354 grant to P.Y. J.F. was supported by the Cancer Prevention Fellowship Program, National Cancer Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.