1.  Relation between smoking history and gene expression profiles in lung adenocarcinomas 
BMC Medical Genomics  2012;5:22.
Lung cancer is the leading cause of cancer death worldwide. Tobacco usage is the major pathogenic factor, but not all lung cancers are attributable to smoking. Specifically, lung cancer in never-smokers has been suggested to represent a distinct disease entity compared to lung cancer arising in smokers due to differences in etiology, natural history and response to specific treatment regimes. However, the genetic aberrations that differ between lung carcinomas of smokers and never-smokers remain to a large extent unclear.
Unsupervised gene expression analysis of 39 primary lung adenocarcinomas was performed using Illumina HT-12 microarrays. Results from unsupervised analysis were validated in six external adenocarcinoma data sets (n=687), and six data sets comprising normal airway epithelial or normal lung tissue specimens (n=467). Supervised gene expression analysis between smokers and never-smokers was performed in seven adenocarcinoma data sets, and the results were validated in the six normal data sets.
Initial unsupervised analysis of 39 adenocarcinomas identified two subgroups of which one harbored all never-smokers. A generated gene expression signature could subsequently identify never-smokers with 79-100% sensitivity in external adenocarcinoma data sets and with 76-88% sensitivity in the normal materials. A notable fraction of current/former smokers were grouped with never-smokers. Intriguingly, supervised analysis of never-smokers versus smokers in seven adenocarcinoma data sets generated similar results. Overlap in classification between the two approaches was high, indicating that both approaches identify a common set of samples from current/former smokers as potential never-smokers. The gene signature from unsupervised analysis included several genes implicated in lung tumorigenesis, immune-response associated pathways, genes previously associated with smoking, as well as marker genes for alveolar type II pneumocytes, while the best classifier from supervised analysis comprised genes strongly associated with proliferation, but also genes previously associated with smoking.
Based on gene expression profiling, we demonstrate that never-smokers can be identified with high sensitivity in both tumor material and normal airway epithelial specimens. Our results indicate that tumors arising in never-smokers, together with a subset of tumors from smokers, represent a distinct entity of lung adenocarcinomas. Taken together, these analyses provide further insight into the transcriptional patterns occurring in lung adenocarcinoma stratified by smoking history.
PMCID: PMC3447685  PMID: 22676229
Lung cancer; Smoking; Gene expression analysis; Adenocarcinoma; EGFR; Never-smokers; Immune response
2.  Gene expression subtraction of non-cancerous lung from smokers and non-smokers with adenocarcinoma, as a predictor for smokers developing lung cancer 
Lung cancer is the commonest cause of cancer death in developed countries. Adenocarcinoma is becoming the most common form of lung cancer. Cigarette smoking is the main risk factor for lung cancer. Long-term cigarette smoking may be characterized by genetic alteration and diffuse injury of the airway surface, termed field cancerization, while cancer in non-smokers is usually clonally derived. Detecting specific gene expression changes in non-cancerous lung tissue from smokers with adenocarcinoma may provide an instrument for predicting which smokers are going to develop this malignancy.
We describe gene expression in non-cancerous lung tissue from 21 smokers with lung adenocarcinoma and compare it to gene expression in non-cancerous lung tissue from 10 non-smokers with primary lung adenocarcinoma.
Total RNA was isolated from peripheral non-cancerous lung tissue. The cDNA was hybridized to the U133A GeneChip array. Hierarchical clustering analysis was performed on genes obtained from smokers and non-smokers; after subtraction, the genes were exported to the Ingenuity Pathway Analysis software for further analysis.
Gene subtraction disclosed 36 genes with a high score. They were subsequently mapped and sorted based on location, cellular components, and biochemical activity. Functional analysis disclosed 20 genes that are involved in cancer processes (P = 7.05E-5 to 2.92E-2).
The detected genes may serve as predictors for smokers who are at high risk of developing lung cancer. In addition, since these genes originate from non-cancerous lung tissue, which makes up the major area of the lungs, a sample from induced sputum may represent it.
PMCID: PMC2570656  PMID: 18811983
3.  The Role of DNA Methylation in the Development and Progression of Lung Adenocarcinoma 
Disease Markers  2007;23(1-2):5-30.
Lung cancer, caused by smoking in ∼87% of cases, is the leading cause of cancer death in the United States and Western Europe. Adenocarcinoma is now the most common type of lung cancer in men and women in the United States, and the histological subtype most frequently seen in never-smokers and former smokers. The increasing frequency of adenocarcinoma, which occurs more peripherally in the lung, is thought to be at least partially related to modifications in cigarette manufacturing that have led to a change in the depth of smoke inhalation. The rising incidence of lung adenocarcinoma and its lethal nature underline the importance of understanding the development and progression of this disease. Alterations in DNA methylation are recognized as key epigenetic changes in cancer, contributing to chromosomal instability through global hypomethylation, and aberrant gene expression through alterations in the methylation levels at promoter CpG islands. The identification of sequential changes in DNA methylation during progression and metastasis of lung adenocarcinoma, and the elucidation of their interplay with genetic changes, will broaden our molecular understanding of this disease, providing insights that may be applicable to the development of targeted drugs, as well as powerful markers for early detection and patient classification.
PMCID: PMC3851711  PMID: 17325423
AAH; adenocarcinoma; BAC; CpG island; DNA methylation; hypermethylation; hypomethylation
4.  A smoking-associated 7-gene signature for lung cancer diagnosis and prognosis 
International Journal of Oncology  2012;41(4):1387-1396.
Smoking is responsible for 90% of lung cancer cases. There is currently no clinically available gene test for early detection of lung cancer in smokers, or an effective patient selection strategy for adjuvant chemotherapy in lung cancer treatment. In this study, concurrent coexpression with multiple signaling pathways was modeled among a set of genes associated with smoking and lung cancer survival. This approach identified and validated a 7-gene signature for lung cancer diagnosis and prognosis in smokers using patient transcriptional profiles (n=847). The smoking-associated gene coexpression networks in lung adenocarcinoma tumors (n=442) were highly significant in terms of biological relevance (network precision = 0.91, FDR<0.01) when evaluated with numerous databases containing multi-level molecular associations. The gene coexpression network in smoking lung adenocarcinoma patients was confirmed in qRT-PCR assays of the identified biomarkers and involved signaling pathway genes in human lung adenocarcinoma cells (H23) treated with 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK). Furthermore, the western blotting results of p53, phospho-p53, Rb and EGFR in NNK-treated H23 and transformed normal human lung epithelial cells (BEAS-2B) support their functional involvement in smoking-induced lung cancer carcinogenesis and progression.
PMCID: PMC3481011  PMID: 22825454
smoking; lung cancer diagnosis and prognosis; gene signature; signaling pathway; coexpression networks
6.  An embryonic stem cell-like signature identifies poorly-differentiated lung adenocarcinoma, but not squamous cell carcinoma 
An embryonic stem cell profile correlates with poorly differentiated breast, bladder and glioma cancers. In this manuscript, we assess the correlation between the embryonic stem cell profile and clinical variables in lung cancer.
Experimental Design
Microarray gene expression analysis was done using Affymetrix Human Genome U133A on 443 samples of human lung adenocarcinoma and 130 samples of squamous cell carcinoma. To identify gene-set enrichment patterns we used the Genomica software.
Our analysis showed that an increased expression of the embryonic stem cell gene set and decreased expression of Polycomb target gene set identified poorly-differentiated lung adenocarcinoma. In addition, this gene expression signature was associated with markers of poor prognosis and worse overall survival in lung adenocarcinoma. However, there was no correlation between this embryonic stem cell gene signature and any histological or clinical variable assessed in lung squamous cell carcinoma.
This work suggests that not all poorly-differentiated non-small cell lung cancers exhibit a gene expression profile similar to ESC, and that other characteristics may play a more important role in the determination of differentiation and survival in squamous cell carcinoma of the lung.
PMCID: PMC2787085  PMID: 19808871
Embryonic genes; stem cell; Affymetrix; lung; cancer
7.  Effect of active smoking on the human bronchial epithelium transcriptome 
BMC Genomics  2007;8:297.
Lung cancer is the most common cause of cancer-related deaths. Tobacco smoke exposure is the strongest aetiological factor associated with lung cancer. In this study, using serial analysis of gene expression (SAGE), we comprehensively examined the effect of active smoking by comparing the transcriptomes of clinical specimens obtained from current, former and never smokers, and identified genes showing both reversible and irreversible expression changes upon smoking cessation.
Twenty-four SAGE profiles of the bronchial epithelium of eight current, twelve former and four never smokers were generated and analyzed. In total, 3,111,471 SAGE tags representing over 110 thousand potentially unique transcripts were generated, comprising the largest human SAGE study to date. We identified 1,733 constitutively expressed genes in current, former and never smoker transcriptomes. We have also identified both reversible and irreversible gene expression changes upon cessation of smoking; reversible changes were frequently associated with either xenobiotic metabolism, nucleotide metabolism or mucus secretion. Increased expression of TFF3, CABYR, and ENTPD8 was found to be reversible upon smoking cessation. Expression of GSK3B, which regulates COX2 expression, was irreversibly decreased. MUC5AC expression was only partially reversed. Validation of select genes was performed using quantitative RT-PCR on a secondary cohort of nine current smokers, seven former smokers and six never smokers.
Expression levels of some of the genes related to tobacco smoking return to levels similar to never smokers upon cessation of smoking, while expression of others appears to be permanently altered despite prolonged smoking cessation. These irreversible changes may account for the persistent lung cancer risk despite smoking cessation.
PMCID: PMC2001199  PMID: 17727719
8.  Effects of Prenatal Tobacco Exposure on Gene Expression Profiling in Umbilical Cord Tissue 
Pediatric research  2008;64(2):147-153.
Maternal smoking doubles the risk of delivering a low birth weight infant. The purpose of this study was to analyze differential gene expression in umbilical cord tissue as a function of maternal smoking, with an emphasis on growth-related genes. We recruited 15 pregnant smokers and 15 women who never smoked during pregnancy to participate. RNA was isolated from umbilical cord tissue collected and snap frozen at the time of delivery. Microarray analysis was performed using the Affymetrix GeneChip Scanner 3000. Six hundred seventy-eight probes corresponding to 545 genes were differentially expressed (i.e., had an intensity ratio > +/−1.3 and a corrected significance value p < 0.005) in tissue obtained from smokers versus nonsmokers. Genes important for fetal growth, angiogenesis, or development of connective tissue matrix were up regulated among smokers. The most highly up-regulated gene was CSH1, a somatomammotropin gene. Two other somatomammotropin genes (CSH2 and CSH-L1) were also up regulated. The most highly down-regulated gene was APOBEC3A; other down-regulated genes included those that may be important in immune and barrier protection. Validation of the three somatomammotropin genes showed a high correlation between qPCR and microarray expression. We conclude that maternal smoking may be associated with altered gene expression in the offspring.
PMCID: PMC2624573  PMID: 18437100
prenatal tobacco exposure; pregnancy; microarray; genetics
9.  Frequency and Distinctive Spectrum of KRAS Mutations in Never Smokers with Lung Adenocarcinoma 
KRAS mutations are found in ~ 25% of lung adenocarcinomas in Western countries and, as a group, have been strongly associated with cigarette smoking. These mutations are predictive of poor prognosis in resected disease as well as resistance to treatment with erlotinib or gefitinib.
Experimental Design:
We determined the frequency and type of KRAS codon 12 and 13 mutations and characterized their association with cigarette smoking history in patients with lung adenocarcinomas.
KRAS mutational analysis was performed on 482 lung adenocarcinomas, 81 (17%) of which were obtained from patients who had never smoked cigarettes. KRAS mutations were found in 15% (12/81; 95% CI 8%-24%) of tumors from never smokers. Similarly, 22% (69/316; 95% CI 17%-27%) of tumors from former smokers, and 25% (21/85; 95% CI 16%-35%) of tumors from current smokers had KRAS mutations. The frequency of KRAS mutation was not associated with age, gender, or smoking history. The number of pack years of cigarette smoking did not predict an increased likelihood of KRAS mutations. Never smokers were significantly more likely than former or current smokers to have a transition mutation (G→A) rather than the transversion mutations known to be smoking related (G→T or G→C; p<0.0001).
Based upon our data, KRAS mutations are not rare among never smokers with lung adenocarcinoma and such patients have a distinct KRAS mutation profile. The etiologic and biological heterogeneity of KRAS mutant lung adenocarcinomas is worthy of further study.
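The confidence intervals quoted for the KRAS mutation frequencies in this abstract can be approximated from the reported counts. The sketch below uses the Wilson score interval, which is an assumption: the abstract does not say which interval method the authors used, so the bounds may differ from the published ones by about a percentage point. The function name `wilson_interval` is illustrative only.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: (tumors with a KRAS mutation, tumors tested).
groups = {"never": (12, 81), "former": (69, 316), "current": (21, 85)}

for name, (mutated, total) in groups.items():
    low, high = wilson_interval(mutated, total)
    print(f"{name} smokers: {mutated / total:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

The three intervals overlap substantially, which is consistent with the abstract's statement that mutation frequency was not associated with smoking history.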
PMCID: PMC2754127  PMID: 18794081
10.  A gene expression signature from peripheral whole blood for stage I lung adenocarcinoma 
Affordable early screening in subjects with high risk of lung cancer has great potential to improve survival from this deadly disease. We measured gene expression from lung tissue and peripheral whole blood (PWB) from adenocarcinoma cases and controls to identify dysregulated lung cancer genes that could be tested in blood to improve identification of at-risk patients in the future. Genome-wide mRNA expression analysis was conducted in 153 subjects (73 adenocarcinoma cases, 80 controls) from the Environment And Genetics in Lung cancer Etiology (EAGLE) study using PWB and paired snap-frozen tumor and non-involved lung tissue samples. Analyses were conducted using unpaired t-tests, linear mixed effects and ANOVA models. The area under the receiver operating characteristic curve (AUC) was computed to assess the predictive accuracy of the identified biomarkers. We identified 50 dysregulated genes in stage I adenocarcinoma versus control PWB samples (False Discovery Rate ≤0.1, fold change ≥1.5 or ≤0.66). Among them, eight (TGFBR3, RUNX3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3) differentiated paired tumor versus non-involved lung tissue samples in stage I cases, suggesting a similar pattern of lung cancer-related changes in PWB and lung tissue. These results were confirmed in two independent gene expression analyses in a blood-based case-control study (n=212) and a tumor-non tumor paired tissue study (n=54). The eight genes discriminated patients with lung cancer from healthy controls with high accuracy (AUC=0.81, 95% CI=0.74–0.87). Our finding suggests the use of gene expression from PWB for the identification of early detection markers of lung cancer in the future.
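The AUC reported in this abstract can be read as the probability that a randomly chosen case receives a higher biomarker score than a randomly chosen control. The sketch below computes it that way (the Mann-Whitney formulation). The scores are made-up illustrative values, not data from the study, and the function name `auc` is hypothetical.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly; ties count as half."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, for illustration only.
cases = [0.9, 0.8, 0.6, 0.55]
controls = [0.7, 0.5, 0.4, 0.3]
print(f"AUC = {auc(cases, controls):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported 0.81 sits well above chance but short of perfect discrimination.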
PMCID: PMC3188352  PMID: 21742797
microarray gene expression; peripheral blood; lung cancer; stage I
11.  Pathway-based identification of a smoking associated 6-gene signature predictive of lung cancer risk and survival 
Smoking is a prominent risk factor for lung cancer. However, it is not an established prognostic factor for lung cancer in clinics. To date, no gene test is available for diagnostic screening of lung cancer risk or prognostication of clinical outcome in smokers. This study sought to identify a smoking associated gene signature in order to provide a more precise diagnosis and prognosis of lung cancer in smokers.
Methods and materials
An implication network based methodology was used to identify biomarkers by modeling crosstalk with major lung cancer signaling pathways. Specifically, the methodology contains the following steps: 1) identifying genes significantly associated with lung cancer survival; 2) selecting candidate genes which are differentially expressed in smokers versus non-smokers from the survival genes identified in Step 1; 3) from these candidate genes, constructing gene coexpression networks based on prediction logic for the smoker group and the non-smoker group, respectively; 4) identifying smoking-mediated differential components, i.e., the unique gene coexpression patterns specific to each group; and 5) from the differential components, identifying genes directly co-expressed with major lung cancer signaling hallmarks.
A smoking-associated 6-gene signature was identified for prognosis of lung cancer from a training cohort (n=256). The 6-gene signature could separate lung cancer patients into two risk groups with distinct post-operative survival (log-rank P < 0.04, Kaplan-Meier analyses) in three independent cohorts (n=427). The expression-defined prognostic prediction is strongly related to smoking association and smoking cessation (P < 0.02; Pearson’s Chi-squared tests). The 6-gene signature is an accurate prognostic factor (hazard ratio = 1.89, 95% CI: [1.04, 3.43]) compared to common clinical covariates in multivariate Cox analysis. The 6-gene signature also provides an accurate diagnosis of lung cancer with an overall accuracy of 73% in a cohort of smokers (n=164). The coexpression patterns derived from the implication networks were validated with interactions reported in the literature retrieved with STRING8, Ingenuity Pathway Analysis, and Pathway Studio.
The pathway-based approach identified a smoking-associated 6-gene signature that predicts lung cancer risk and survival. This gene signature has potential clinical implications in the diagnosis and prognosis of lung cancer in smokers.
PMCID: PMC3351561  PMID: 22326768
implication networks based on prediction logic; gene coexpression networks based on formal logic; smoking; gene signature; lung cancer diagnosis and prognosis; signaling pathways
12.  Cigarette smoking and lung cancer – relative risk estimates for the major histological types from a pooled analysis of case-control studies 
Lung cancer is mainly caused by smoking, but the quantitative relations between smoking and histologic subtypes of lung cancer remain inconclusive. Using one of the largest lung cancer datasets ever assembled, we explored the impact of smoking on risks of the major cell types of lung cancer. This pooled analysis included 13,169 cases and 16,010 controls from Europe and Canada. Studies with population controls comprised 66.5% of the subjects. Adenocarcinoma (AdCa) was the most prevalent subtype in never smokers and in women. Squamous cell carcinoma (SqCC) predominated in male smokers. Age-adjusted odds ratios (ORs) were estimated with logistic regression. ORs were elevated for all metrics of exposure to cigarette smoke and were higher for SqCC and small cell lung cancer (SCLC) than for AdCa. Current male smokers with an average daily dose of >30 cigarettes had ORs of 103.5 (95% CI 74.8-143.2) for SqCC, 111.3 (95% CI 69.8-177.5) for SCLC, and 21.9 (95% CI 16.6-29.0) for AdCa. In women, the corresponding ORs were 62.7 (95% CI 31.5-124.6), 108.6 (95% CI 50.7-232.8), and 16.8 (95% CI 9.2-30.6), respectively. Whereas ORs started to decline soon after quitting, they did not fully return to the baseline risk of never smokers even 35 years after cessation. The major result that smoking exerted a steeper risk gradient on SqCC and SCLC than on AdCa is in line with previous population data and biological understanding of lung cancer development.
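For readers unfamiliar with odds ratios, the sketch below computes a crude (unadjusted) odds ratio and a Woolf-type 95% confidence interval from a 2x2 table. This is a simpler technique than the age-adjusted logistic regression the study used, and the counts are invented purely for illustration; they are not taken from the pooled analysis. The function name `odds_ratio_ci` is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1 / c for c in cells))
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Invented counts: 90 of 100 cases and 30 of 100 controls are smokers.
or_value, low, high = odds_ratio_ci(90, 10, 30, 70)
print(f"OR = {or_value:.1f} (95% CI {low:.1f} to {high:.1f})")
```

Because the interval is symmetric on the log scale, it is wide and skewed on the natural scale when a cell is small, which is one reason the very large ORs in the abstract carry wide intervals.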
PMCID: PMC3296911  PMID: 22052329
cigarette smoking; lung cancer; relative risk characterization; tobacco smoke; stem cells
13.  Effects of Cigarette Smoke on the Human Oral Mucosal Transcriptome 
Use of tobacco is responsible for approximately 30% of all cancer-related deaths in the United States including cancers of the upper aerodigestive tract. In the current study, 40 current and 40 age- and gender-matched never smokers underwent buccal biopsies to evaluate the effects of smoking on the transcriptome. Microarray analyses were carried out using Affymetrix HGU 133 Plus2 arrays. Smoking altered the expression of numerous genes: 32 genes showed increased expression and 9 genes showed reduced expression in the oral mucosa of smokers vs. never smokers. Increases were found in genes involved in xenobiotic metabolism, oxidant stress, eicosanoid synthesis, nicotine signaling and cell adhesion. Increased numbers of Langerhans cells were found in the oral mucosa of smokers. Interestingly, smoking caused greater induction of aldo-keto reductases, enzymes linked to polycyclic aromatic hydrocarbon induced genotoxicity, in the oral mucosa of women than men. Striking similarities in expression changes were found in oral compared to the bronchial mucosa. The observed changes in gene expression were compared to known chemical signatures using the Connectivity Map database, and suggested that geldanamycin, an Hsp90 inhibitor, might be an anti-mimetic of tobacco smoke. Consistent with this prediction, geldanamycin caused dose-dependent suppression of tobacco smoke extract-mediated induction of CYP1A1 and CYP1B1 in vitro. Collectively, these results provide new insights into the carcinogenic effects of tobacco smoke, support the potential use of oral epithelium as a surrogate tissue in future lung cancer chemoprevention trials and illustrate the potential of computational biology to identify chemopreventive agents.
PMCID: PMC2833216  PMID: 20179299
tobacco; smoking; microarray; aryl hydrocarbon receptor; heat shock protein 90
14.  Genetic analysis of lung tumours of non-smoking subjects: p53 gene mutations are constantly associated with loss of heterozygosity at the FHIT locus. 
British Journal of Cancer  1998;78(1):73-78.
Lung cancer is strictly associated with tobacco smoking. Tumours developed in non-smoking subjects account for less than 10% of all lung cancers and show peculiar histopathological features, being prevalently adenocarcinomas. A number of genetic data suggest that their biological behaviour may be different from that of lung tumours caused by smoking; however, the number of cases investigated to date is too low to draw definitive conclusions. We have examined the status of p53 and K-ras genes and the presence of loss of heterozygosity (LOH) at the FHIT locus in a series of 35 lung adenocarcinomas that developed in subjects who had never smoked. Results were compared with those obtained in a series of 35 lung adenocarcinomas from heavy-smoking subjects. In the group of non-smoking subjects p53 mutations and LOH at the FHIT locus were present in seven (20%) cases, and the two alterations were constantly associated (P < 0.0001), whereas they were not related in the series of carcinomas caused by smoking. In tumours developed in heavy-smoking subjects, the frequency of LOH at the FHIT locus was significantly higher (P = 0.006) than in tumours from non-smoking subjects. The frequency of p53 mutations in adenocarcinomas caused by smoking was not different from that seen in non-smoking subjects. However, in the group of smoking subjects we observed mostly G:C → T:A transversions, whereas frameshift mutations and G:C → A:T transitions were more frequently found in tumours from non-smoking subjects. No point mutations of the K-ras gene at codon 12 were seen in subjects who had never smoked, whereas they were present (mostly G:C → T:A transversions) in 34% of tumours caused by smoking (P = 0.002). Our data suggest that lung adenocarcinomas developed in subjects who had never smoked represent a distinct biological entity involving a co-alteration of the p53 gene and the FHIT locus in 20% of cases.
PMCID: PMC2062949  PMID: 9662254
15.  Smoking and lung cancer with special regard to type of smoking and type of cancer. A case-control study in north Sweden. 
British Journal of Cancer  1986;53(5):673-681.
The aetiologic role of tobacco smoking was elucidated in a case-control study comprising 579 cases of male lung cancer registered during 1972-1977 in northern Sweden. The population aetiologic fraction attributable to smoking was about 80% in this series. Pipe smoking was as common as cigarette smoking and gave similar relative risk. The pipe smoking cases, however, had significantly higher mean age and mean smoking years at the time of diagnosis than the cigarette smoking cases. An obvious dose-response relation was found for both cigarette and pipe smoking. In ex-smokers, the relative risk gradually decreased from five years after cessation of smoking. This decrease was, however, much less pronounced in ex-pipe smokers than in ex-cigarette smokers. High relative risks were obtained for small cell and squamous cell carcinomas. For adenocarcinomas the relative risk was considerably lower but still significantly increased. Two types of controls were used, i.e. deceased and living. Comparison with living controls gave generally higher risk estimates than comparison with deceased controls.
PMCID: PMC2001370  PMID: 3013266
16.  The accumulated evidence on lung cancer and environmental tobacco smoke. 
BMJ : British Medical Journal  1997;315(7114):980-988.
OBJECTIVE: To estimate the risk of lung cancer in lifelong non-smokers exposed to environmental tobacco smoke. DESIGN: Analysis of 37 published epidemiological studies of the risk of lung cancer (4626 cases) in non-smokers who did and did not live with a smoker. The risk estimate was compared with that from linear extrapolation of the risk in smokers using seven studies of biochemical markers of tobacco smoke intake. MAIN OUTCOME MEASURE: Relative risk of lung cancer in lifelong non-smokers according to whether the spouse currently smoked or had never smoked. RESULTS: The excess risk of lung cancer was 24% (95% confidence interval 13% to 36%) in non-smokers who lived with a smoker (P < 0.001). Adjustment for the effects of bias (positive and negative) and dietary confounding had little overall effect; the adjusted excess risk was 26% (7% to 47%). The dose-response relation of the risk of lung cancer with both the number of cigarettes smoked by the spouse and the duration of exposure was significant. The excess risk derived by linear extrapolation from that in smokers was 19%, similar to the direct estimate of 26%. CONCLUSION: The epidemiological and biochemical evidence on exposure to environmental tobacco smoke, with the supporting evidence of tobacco specific carcinogens in the blood and urine of non-smokers exposed to environmental tobacco smoke, provides compelling confirmation that breathing other people's tobacco smoke is a cause of lung cancer.
PMCID: PMC2127653  PMID: 9365295
17.  Stat3 Downstream Genes Serve as Biomarkers in Human Lung Carcinomas and Chronic Obstructive Pulmonary Disease 
Smoking causes lung cancer and chronic obstructive pulmonary disease (COPD), which impose severe health problems on humans. The two diseases are related to each other and can be induced by chronic inflammation in the lung. To identify the molecular mechanism for lung cancer formation, a CCSP-rtTA/(Teto)7Stat3C bitransgenic model was generated recently. In this model, persistent activation of the Stat3 signaling pathway induced pulmonary inflammation and adenocarcinoma formation in the lung. A group of Stat3 downstream genes that can be used as biomarkers for lung cancer diagnosis and prognosis was identified by Affymetrix GeneChip microarray analysis. To determine which human lung cancers are related to the Stat3 pathway, multiple Stat3 downstream genes were screened in human lung cancers (adenocarcinomas and squamous cell carcinomas) and lung tissue with COPD. In both cancer and COPD, the Stat3 gene was up-regulated. A panel of Stat3-up-regulated downstream genes in mice was up-regulated in human adenocarcinomas, but not in human squamous cell carcinomas. This panel of genes was also modestly up-regulated in lung tissue with COPD from patients with a history of smoking and not up-regulated in those without histories of smoking. Several Stat3-down-regulated downstream genes also showed differential expression patterns in carcinoma and COPD. These studies support a concept that Stat3 is a potent oncogenic molecule that plays a role in formation of lung adenocarcinomas in both mice and humans. The carcinogenesis of adenocarcinoma and squamous cell carcinoma is mediated by different molecular mechanisms and pathways in vivo. Stat3 and its downstream genes can serve as biomarkers for lung adenocarcinoma and COPD diagnosis and prognosis in mice and humans.
PMCID: PMC2664999  PMID: 18614255
Stat3; Adenocarcinoma; Squamous Cell Carcinoma; COPD; Biomarkers; Real-Time qRT-PCR
18.  Global Changes in Gene Expression of Barrett's Esophagus Compared to Normal Squamous Esophagus and Gastric Cardia Tissues 
PLoS ONE  2014;9(4):e93219.
Barrett's esophagus (BE) is a metaplastic precursor lesion of esophageal adenocarcinoma (EA), the most rapidly increasing cancer in western societies. While the prevalence of BE is increasing, the vast majority of EA occurs in patients with undiagnosed BE. Thus, we sought to identify genes that are altered in BE compared to the normal mucosa of the esophagus, and which may be potential biomarkers for the development or diagnosis of BE.
We performed gene expression analysis using HG-U133A Affymetrix chips on fresh frozen tissue samples of Barrett's metaplasia and matched normal mucosa from squamous esophagus (NE) and gastric cardia (NC) in 40 BE patients.
Using a cut off of 2-fold and P<1.12E-06 (0.05 with Bonferroni correction), we identified 1324 differentially-expressed genes comparing BE vs NE and 649 differentially-expressed genes comparing BE vs NC. Except for individual genes such as the SOXs and PROM1 that were dysregulated only in BE vs NE, we found a subset of genes (n = 205) whose expression was significantly altered in both BE vs NE and BE vs NC. These genes were overrepresented in different pathways, including TGF-β and Notch.
Our findings provide additional data on the global transcriptome in BE tissues compared to matched NE and NC tissues which should promote further understanding of the functions and regulatory mechanisms of genes involved in BE development, as well as insight into novel genes that may be useful as potential biomarkers for the diagnosis of BE in the future.
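The significance cutoff quoted in this abstract (P<1.12E-06, described as 0.05 with Bonferroni correction) follows from dividing the family-wise alpha by the number of tests. The sketch below reproduces it under an assumption the abstract does not state: that the test count was roughly 22,283 probe sets (the commonly cited size of the HG-U133A array) times the two comparisons (BE vs NE and BE vs NC). The helper names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_differentially_expressed(fold_change: float, p_value: float,
                                threshold: float, min_fold: float = 2.0) -> bool:
    """Apply the abstract's two criteria: at least 2-fold change in either direction and P below the cutoff."""
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    return max(fold_change, 1 / fold_change) >= min_fold and p_value < threshold


N_PROBE_SETS = 22283   # assumption: not stated in the abstract
N_COMPARISONS = 2      # assumption: BE vs NE and BE vs NC

threshold = bonferroni_threshold(0.05, N_PROBE_SETS * N_COMPARISONS)
print(f"Bonferroni threshold: {threshold:.2e}")
```

Under these assumed counts the threshold rounds to the value given in the abstract; a different probe-set count or number of comparisons would shift it proportionally.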
PMCID: PMC3979678  PMID: 24714516
19.  The Impact of Cigarette Smoking on the Frequency of and Qualitative Differences in KRAS Mutations in Korean Patients with Lung Adenocarcinoma 
Yonsei Medical Journal  2013;54(4):865-874.
This study was designed to determine the relationship of cigarette smoking to the frequency and qualitative differences among KRAS mutations in lung adenocarcinomas from Korean patients.
Materials and Methods
Detailed smoking histories were obtained from 200 consecutively enrolled patients with lung adenocarcinoma according to a standard protocol. EGFR (exons 18 to 21) and KRAS (codons 12/13) mutations were determined via direct-sequencing.
The incidence of KRAS mutations was 8% (16 of 200) in patients with lung adenocarcinoma. KRAS mutations were found in 5.8% (7 of 120) of tumors from never-smokers, 15% (6 of 40) from former-smokers, and 7.5% (3 of 40) from current-smokers. The frequency of KRAS mutations did not differ significantly according to smoking history (p=0.435). Never-smokers were significantly more likely than former or current smokers to have a transition mutation (G→A or C→T) rather than a transversion mutation (G→T or G→C) that is known to be smoking-related (p=0.011). In a Cox regression model, the adjusted hazard ratios for the risk of progression with epidermal growth factor receptor-tyrosine kinase inhibitors (EGFR-TKIs) were 0.24 (95% CI, 0.14-0.42; p<0.001) for the EGFR mutation and 1.27 (95% CI, 0.58-2.79; p=0.537) for the KRAS mutation.
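The transition/transversion distinction used above follows the standard definition: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and any other single-base change is a transversion. The sketch below is illustrative only; the function names are hypothetical, and the counts are the ones reported in the abstract.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def percent(count, total):
    """Frequency as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

# KRAS mutation counts by smoking status, as reported in the abstract.
kras_counts = {"never": (7, 120), "former": (6, 40), "current": (3, 40)}
kras_frequencies = {group: percent(n, total)
                    for group, (n, total) in kras_counts.items()}
```

With these definitions, G→A and C→T are classified as transitions and G→T and G→C as transversions, matching the grouping in the abstract.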
Cigarette smoking did not influence the frequency of KRAS mutations in lung adenocarcinomas in Korean patients, but influenced qualitative differences in the KRAS mutations.
PMCID: PMC3663229  PMID: 23709419
EGFR; KRAS; pulmonary adenocarcinoma; cigarette smoking; EGFR-tyrosine kinase inhibitors
20.  Exon-array profiling unlocks clinically and biologically relevant gene signatures from formalin-fixed paraffin-embedded tumour samples 
British Journal of Cancer  2011;104(6):971-981.
Degradation and chemical modification of RNA in formalin-fixed paraffin-embedded (FFPE) samples hamper their use in expression profiling studies. This study aimed to show that useful information can be obtained by Exon-array profiling archival FFPE tumour samples.
Nineteen cervical squamous cell carcinoma (SCC) and 9 adenocarcinoma (AC) FFPE samples (10–16-year-old) were profiled using Affymetrix Exon arrays. The gene signature derived was tested on a fresh-frozen non-small cell lung cancer (NSCLC) series. Exploration of biological networks involved gene set enrichment analysis (GSEA). Differential gene expression was confirmed using Quantigene, a multiplex bead-based alternative to qRT–PCR.
In all, 1062 genes were higher in SCC vs AC, and 155 genes higher in AC. The 1217-gene signature correctly separated 58 NSCLC into SCC and AC. A gene network centered on hepatic nuclear factor and GATA6 was identified in AC, suggesting a role in glandular cell differentiation of the cervix. Quantigene analysis of the top 26 differentially expressed genes correctly partitioned cervix samples as SCC or AC.
FFPE samples can be profiled using Exon arrays to derive gene expression signatures that are sufficiently robust to be applied to independent data sets, identify novel biology and design assays for independent platform validation.
PMCID: PMC3065290  PMID: 21407225
cervix cancer; exon array; expression profiling; FFPE; histology
21.  Molecular Epidemiology of EGFR and KRAS Mutations in 3026 Lung Adenocarcinomas: Higher Susceptibility of Women to Smoking-related KRAS-mutant Cancers 
The molecular epidemiology of most EGFR and KRAS mutations in lung cancer remains unclear.
Experimental Design
We genotyped 3026 lung adenocarcinomas for the major EGFR (exon 19 deletions and L858R) and KRAS (G12, G13) mutations and examined correlations with demographic, clinical and smoking history data.
EGFR mutations were found in 43% of never smokers (NS) and in 11% of smokers. KRAS mutations occurred in 34% of smokers and in 6% of NS. In patients with smoking histories up to 10 pack-years, EGFR predominated over KRAS. Among former smokers with lung cancer, multivariate analysis showed that, independent of pack-years, increasing smoking-free years raise the likelihood of EGFR mutation. NS were more likely than smokers to have a KRAS G>A transition mutation (mostly G12D) (58% vs. 20%, p=0.0001). KRAS G12C, the most common G>T transversion mutation in smokers, was more frequent in women (p=0.007) and these women were younger than men with the same mutation (median 65 vs. 69, p=0.0008) and had smoked less.
The distinct types of KRAS mutations in smokers vs. NS suggest that most KRAS-mutant lung cancers in NS are not due to secondhand smoke exposure. The higher frequency of KRAS G12C in women, their younger age, and lesser smoking history together support a heightened susceptibility to tobacco carcinogens.
PMCID: PMC3500422  PMID: 23014527
lung cancer; tobacco; EGFR; KRAS; molecular epidemiology
22.  Lungs don’t forget: Comparison of the KRAS and EGFR mutation profile and survival of “collegiate smokers” and never smokers with advanced lung cancers 
We hypothesize that among patients with lung cancers the KRAS/EGFR mutation profile and overall survival of “collegiate smokers” (former smokers who smoked between 101 lifetime cigarettes and 5 pack years) are distinct from those of never smokers and former smokers with ≥ 15 pack years.
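The exposure groups named above can be expressed with the usual pack-year definition (packs smoked per day multiplied by years smoked, with 20 cigarettes per pack). The sketch below is an illustration, not the study's procedure: the function names are hypothetical, and the handling of patients at exactly 5 or 15 pack years is an assumption, since the abstract does not specify which group the boundary values fall into.

```python
CIGARETTES_PER_PACK = 20  # standard pack size used in the pack-year definition

def pack_years(cigarettes_per_day, years_smoked):
    """Pack years = packs smoked per day multiplied by years smoked."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked

def former_smoker_group(lifetime_cigarettes, pack_yrs):
    """Assign a never or former smoker to one of the groups named in the abstract.

    Boundary handling (<= 5 and < 15 pack years) is an assumption.
    """
    if lifetime_cigarettes <= 100:
        return "never smoker"
    if pack_yrs <= 5:
        return "collegiate smoker"
    if pack_yrs < 15:
        return "former smoker, 5-15 pack years"
    return "former smoker, >=15 pack years"

# Hypothetical example: 10 cigarettes a day for 4 years.
example_exposure = pack_years(10, 4)
example_group = former_smoker_group(10 * 365 * 4, example_exposure)
```

In this hypothetical example, half a pack a day for four years gives 2 pack years, which falls in the "collegiate smoker" range.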
We collected age, sex, stage, survival, and smoking history for patients evaluated from 2004 to 2009 with advanced stage lung cancers and known KRAS/EGFR status. Mutation profile and overall survival were compared using Fisher’s exact test and log-rank test, respectively.
Data were available for 852 patients with advanced stage lung cancers with known KRAS/EGFR status. 6% were “collegiate smokers”, 36% were never smokers, and 30% were former smokers with ≥ 15 pack years. The mutation profile of “collegiate smokers” (15% KRAS mutations, 27% EGFR mutations) was distinct from those of never smokers (p < .001) and former smokers with ≥ 15 pack years (p < .001) and not significantly different from those of former smokers with 5 to 15 pack years (p = 0.9). Median overall survival for “collegiate smokers” was 25 months, compared to 32 months for never smokers (p = 0.4), 33 months for former smokers with 5–15 pack years (p = 0.48), and 21 months for former smokers with ≥ 15 pack years (p = 0.63).
“Collegiate smokers” with advanced stage lung cancers represent a distinct subgroup of patients with a higher frequency of KRAS mutations and lower frequency of EGFR mutations compared to never smokers. These observations reinforce the recommendation for routine mutation testing for all patients with lung cancers and the view that no degree of tobacco exposure is safe.
PMCID: PMC3534987  PMID: 23242442
Collegiate Smokers; non-small cell lung cancers; epidermal growth factor receptor mutation; KRAS mutation
23.  Development and Validation of a Prognostic Gene-Expression Signature for Lung Adenocarcinoma 
PLoS ONE  2012;7(9):e44225.
Although several prognostic signatures have been developed in lung cancer, their application in clinical practice has been limited because they have not been validated in multiple independent data sets. Moreover, the lack of common genes between the signatures makes it difficult to know what biological process may be reflected or measured by the signature. By using a classical data exploration approach with gene expression data from patients with lung adenocarcinoma (n = 186), we uncovered two distinct subgroups of lung adenocarcinoma and identified a prognostic 193-gene gene expression signature associated with the two subgroups. The signature was validated in 4 independent lung adenocarcinoma cohorts, including 556 patients. In multivariate analysis, the signature was an independent predictor of overall survival (hazard ratio, 2.4; 95% confidence interval, 1.2 to 4.8; p = 0.01). An integrated analysis of the signature revealed that E2F1 plays key roles in regulating genes in the signature. Subset analysis demonstrated that the gene signature could identify high-risk patients with early-stage (stage I) disease, and patients who would benefit from adjuvant chemotherapy. Thus, our study provided evidence for a molecular basis of two distinct, clinically relevant subtypes of lung adenocarcinoma.
PMCID: PMC3436895  PMID: 22970185
24.  Smoking-induced gene expression changes in the bronchial airway are reflected in nasal and buccal epithelium 
BMC Genomics  2008;9:259.
Cigarette smoking is a leading cause of preventable death and a significant cause of lung cancer and chronic obstructive pulmonary disease. Prior studies have demonstrated that smoking creates a field of molecular injury throughout the airway epithelium exposed to cigarette smoke. We have previously characterized gene expression in the bronchial epithelium of never smokers and identified the gene expression changes that occur in the mainstem bronchus in response to smoking. In this study, we explored relationships in whole-genome gene expression between extrathoracic (buccal and nasal) and intrathoracic (bronchial) epithelium in healthy current and never smokers.
Using genes that have been previously defined as being expressed in the bronchial airway of never smokers (the "normal airway transcriptome"), we found that bronchial and nasal epithelium from non-smokers were most similar in gene expression when compared to other epithelial and nonepithelial tissues, with several antioxidant, detoxification, and structural genes being highly expressed in both the bronchus and nose. Principal component analysis of previously defined smoking-induced genes from the bronchus suggested that smoking had a similar effect on gene expression in nasal epithelium. Gene set enrichment analysis demonstrated that this set of genes was also highly enriched among the genes most altered by smoking in both nasal and buccal epithelial samples. The expression of several detoxification genes was commonly altered by smoking in all three respiratory epithelial tissues, suggesting a common airway-wide response to tobacco exposure.
Our findings support a relationship between gene expression in extra- and intrathoracic airway epithelial cells and extend the concept of a smoking-induced field of injury to epithelial cells that line the mouth and nose. This relationship could potentially be utilized to develop a non-invasive biomarker for tobacco exposure as well as a non-invasive screening or diagnostic tool providing information about individual susceptibility to smoking-induced lung diseases.
PMCID: PMC2435556  PMID: 18513428
25.  Increased miR-708 Expression in NSCLC and Its Association with Poor Survival in Lung Adenocarcinoma from Never Smokers 
MicroRNA plays an important role in human diseases and cancer. We seek to investigate the expression status, clinical relevance, and functional role of microRNA in non-small cell lung cancer.
Experimental Design
We performed miRNA expression profiling in matched lung adenocarcinoma and uninvolved lung using 56 pairs of fresh-frozen (FF) and 47 pairs of formalin-fixed, paraffin-embedded (FFPE) samples from never smokers. The most differentially expressed miRNA genes were evaluated by Cox analysis and log-rank test. Among the best candidates, miR-708 was further examined for differential expression in two independent cohorts. Functional significance of miR-708 expression in lung cancer was examined by identifying its candidate mRNA target and through manipulating its expression levels in cultured cells.
Among the 20 miRNAs most differentially expressed between tested tumor and normal samples, high expression level of miR-708 in the tumors was most strongly associated with an increased risk of death after adjustments for all clinically significant factors including age, sex, and tumor stage (FF cohort: HR, 1.90; 95% CI, 1.08-3.35; P=.025 and FFPE cohort: HR, 1.93; 95% CI, 1.02-3.63; P=.042). The transcript for the TMEM88 gene has a miR-708 binding site in its 3′ UTR and was significantly reduced in tumors with high miR-708 expression. Forced miR-708 expression reduced TMEM88 transcript levels and increased the rate of cell proliferation, invasion, and migration in culture.
MicroRNA-708 acts as an oncogene contributing to tumor growth and disease progression by directly downregulating TMEM88, a negative regulator of the Wnt signaling pathway in lung cancer.
PMCID: PMC3616503  PMID: 22573352
NSCLC; adenocarcinoma; miR-708; never smoker; survival; TMEM88; Wnt signaling

