In the work of Chari et al. entitled "Effect of active smoking on the human bronchial epithelium transcriptome", the authors use SAGE to identify candidate gene expression changes in bronchial brushings from never, former, and current smokers. These gene expression changes are categorized into those that are reversible or irreversible upon smoking cessation. A subset of these identified genes is validated on an independent cohort using RT-PCR. The authors conclude that their results support the notion of gene expression changes in the lungs of smokers which persist even after an individual has quit.
This correspondence raises questions about the validity of the approach used by the authors to analyze their data. The majority of the reported results suffer deficiencies due to the methods used. The most fundamental of these are explained in detail: biases introduced during data processing, lack of correction for multiple testing, and an incorrect use of clustering for gene discovery. A randomly generated "null" dataset is used to show the consequences of these shortcomings.
Most of Chari et al.'s findings are consistent with what would be expected by chance alone. Although there is clear evidence of reversible changes in gene expression, the majority of those identified appear to be false positives. However, contrary to the authors' claims, no irreversible changes were identified. There is a broad consensus that genetic change due to smoking persists once an individual has quit smoking; unfortunately, this study lacks sufficient scientific rigour to support or refute this hypothesis or identify any specific candidate genes. The pitfalls of large-scale analysis, as exemplified here, may not be unique to Chari et al.
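The consequence of skipping multiple-testing correction can be illustrated with a minimal sketch in the spirit of the "null" dataset described above; it is not the correspondence's own analysis. It assumes two SAGE libraries drawn from identical tag frequencies (so any tag called significant is a false positive), a pooled two-proportion z-test per tag, and the Benjamini-Hochberg procedure as the correction. The tag count, library size, seed and function names are arbitrary, hypothetical choices.

```python
import math
import random
from collections import Counter


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test for one tag."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # tag absent from both libraries: no evidence of change
    z = (x1 / n1 - x2 / n2) / se
    # 2 * (1 - Phi(|z|)) for a standard normal equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank  # largest rank that meets its threshold
    return set(order[:k])


def null_experiment(n_tags=2000, library_size=50_000, alpha=0.05, seed=1):
    """Draw two libraries from the SAME tag frequencies, so every hit is false."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_tags)]
    lib_a = Counter(rng.choices(range(n_tags), weights=weights, k=library_size))
    lib_b = Counter(rng.choices(range(n_tags), weights=weights, k=library_size))
    pvals = [two_proportion_p(lib_a[t], library_size, lib_b[t], library_size)
             for t in range(n_tags)]
    uncorrected = sum(p < alpha for p in pvals)
    corrected = len(benjamini_hochberg(pvals, alpha))
    return uncorrected, corrected
```

Under these assumptions, roughly 5% of tags typically pass an uncorrected p < 0.05, whereas Benjamini-Hochberg usually rejects none or very few; the exact numbers depend on the seed.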
Cigarette smoking is a leading cause of preventable death and a significant cause of lung cancer and chronic obstructive pulmonary disease. Prior studies have demonstrated that smoking creates a field of molecular injury throughout the airway epithelium exposed to cigarette smoke. We have previously characterized gene expression in the bronchial epithelium of never smokers and identified the gene expression changes that occur in the mainstem bronchus in response to smoking. In this study, we explored relationships in whole-genome gene expression between extrathoracic (buccal and nasal) and intrathoracic (bronchial) epithelium in healthy current and never smokers.
Using genes that have been previously defined as being expressed in the bronchial airway of never smokers (the "normal airway transcriptome"), we found that bronchial and nasal epithelium from non-smokers were most similar in gene expression when compared to other epithelial and nonepithelial tissues, with several antioxidant, detoxification, and structural genes being highly expressed in both the bronchus and nose. Principal component analysis of previously defined smoking-induced genes from the bronchus suggested that smoking had a similar effect on gene expression in nasal epithelium. Gene set enrichment analysis demonstrated that this set of genes was also highly enriched among the genes most altered by smoking in both nasal and buccal epithelial samples. The expression of several detoxification genes was commonly altered by smoking in all three respiratory epithelial tissues, suggesting a common airway-wide response to tobacco exposure.
Our findings support a relationship between gene expression in extra- and intrathoracic airway epithelial cells and extend the concept of a smoking-induced field of injury to epithelial cells that line the mouth and nose. This relationship could potentially be utilized to develop a non-invasive biomarker for tobacco exposure as well as a non-invasive screening or diagnostic tool providing information about individual susceptibility to smoking-induced lung diseases.
Serial Analysis of Gene Expression (SAGE) is becoming a widely
used gene expression profiling method for the study of development,
cancer and other human diseases. Investigators using SAGE rely heavily
on the quantitative aspect of this method for cataloging gene expression
and comparing multiple SAGE libraries. We have developed additional
computational and statistical tools to assess the quality and reproducibility
of a SAGE library. Using these methods, a critical variable in the
SAGE protocol was identified that has the potential to bias the
Tag distribution relative to the GC content of the 10 bp SAGE Tag
DNA sequence. We also detected this bias in a number of publicly
available SAGE libraries. It is important to note that the GC content bias
went undetected by quality control procedures in the current SAGE
protocol and was only identified with the use of these statistical
analyses on as few as 750 SAGE Tags. In addition to keeping any
solution of free DiTags on ice, an analysis of the GC content should
be performed before sequencing large numbers of SAGE Tags to be
confident that SAGE libraries are free from experimental bias.
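The text does not specify which statistics were used, so the following is only one possible way to screen a small batch of tags for GC bias: a minimal sketch that compares the GC-count histogram of a test library against a reference library with a Pearson chi-square statistic. The reference library, the 10 bp default tag length and the function names are assumptions for illustration.

```python
from collections import Counter


def gc_count(tag):
    """Number of G or C bases in a tag sequence."""
    return sum(base in "GC" for base in tag.upper())


def gc_histogram(tags):
    """How many tags (counting every occurrence) carry 0, 1, ..., L G/C bases."""
    return Counter(gc_count(tag) for tag in tags)


def gc_homogeneity_chi2(tags_a, tags_b, tag_length=10):
    """Pearson chi-square statistic for the 2 x (tag_length + 1) table of GC
    counts in two tag collections; large values suggest the libraries differ
    in GC composition."""
    hist_a, hist_b = gc_histogram(tags_a), gc_histogram(tags_b)
    n_a, n_b = len(tags_a), len(tags_b)
    total = n_a + n_b
    stat = 0.0
    for k in range(tag_length + 1):
        column = hist_a[k] + hist_b[k]
        if column == 0:
            continue  # GC class absent from both libraries
        for observed, row_total in ((hist_a[k], n_a), (hist_b[k], n_b)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat
```

The statistic can be referred to a chi-square distribution with (number of non-empty GC classes − 1) degrees of freedom, for example with `scipy.stats.chi2.sf`, to obtain a p-value.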
More than half of the approximately 500,000 women diagnosed with cervical cancer worldwide each year will die from this disease. Investigating the genes expressed in precancerous lesions, compared with those expressed in normal cervical epithelium, will yield insight into the early stages of disease. Establishing a baseline for comparison is therefore critical to elucidating the abnormal biology of disease. In this study, we examine the normal cervical tissue transcriptome and investigate its similarities to and differences from CIN III using Long-SAGE (L-SAGE).
We have sequenced 691,390 tags from four L-SAGE libraries, increasing the existing gene expression data on cervical tissue 20-fold. One hundred and eighteen unique tags were highly expressed in normal cervical tissue; 107 of them mapped to unique genes, most of which belong to the ribosomal, calcium-binding, and keratinizing gene families. We assessed these genes for aberrant expression in CIN III, and five genes showed altered expression. In addition, we identified twelve unique HPV 16 SAGE tags in the CIN III libraries that were absent from the normal libraries.
Establishing a baseline of gene expression in normal cervical tissue is key for identifying changes in cancer. We demonstrate the utility of this baseline data by identifying genes with aberrant expression in CIN III when compared to normal tissue.
Cigarette smoke creates a molecular field of injury in epithelial cells that line the respiratory tract. We hypothesized that transcriptome sequencing (RNA-Seq) would enhance our understanding of the field of molecular injury in response to tobacco smoke exposure and lung cancer pathogenesis by identifying gene expression differences not interrogated or accurately measured by microarrays. We sequenced the high-molecular-weight fraction of total RNA (>200 nt) from pooled bronchial airway epithelial cell brushings (n = 3 patients per pool) obtained during bronchoscopy from healthy never smoker (NS) and current smoker (S) volunteers and from smokers with (C) and without (NC) lung cancer undergoing lung nodule resection surgery. RNA-Seq libraries were prepared using 2 distinct approaches: one capable of capturing non-polyadenylated RNA (the prototype NuGEN Ovation RNA-Seq protocol) and the other designed to measure only polyadenylated RNA (the standard Illumina mRNA-Seq protocol). Sequencing generated approximately 29 million 36 nt reads per pool and approximately 22 million 75 nt paired-end reads per pool, respectively. The NuGEN protocol captured additional transcripts not detected by the Illumina protocol at the expense of reduced coverage of polyadenylated transcripts, while longer read lengths and a paired-end sequencing strategy significantly improved the number of reads that could be aligned to the genome. The aligned reads derived from the two complementary protocols were used to define the compendium of genes expressed in the airway epithelium (n = 20,573 genes). Pathways related to the metabolism of xenobiotics by cytochrome P450, retinol metabolism, and oxidoreductase activity were enriched among genes differentially expressed in smokers, whereas chemokine signaling pathways, cytokine–cytokine receptor interactions, and cell adhesion molecules were enriched among genes differentially expressed in smokers with lung cancer.
There was a significant correlation between the RNA-Seq gene expression data and Affymetrix microarray data generated from the same samples (P < 0.001); however, the RNA-Seq data detected additional smoking- and cancer-related transcripts that were either not interrogated by microarrays or not found to be significantly altered on them, including smoking-related changes in the inflammatory genes S100A8 and S100A9 and cancer-related changes in MUC5AC and secretoglobin (SCGB3A1). Quantitative real-time PCR confirmed differential expression of selected genes and non-coding RNAs within individual samples. These results demonstrate that transcriptome sequencing has the potential to provide new insights into the biology of the airway field of injury associated with smoking and lung cancer. The measurement of both coding and non-coding transcripts by RNA-Seq has the potential to help elucidate mechanisms of response to tobacco smoke and to identify additional biomarkers of lung cancer risk and novel targets for chemoprevention.
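The abstract does not state which correlation measure was used. As a hedged illustration, a rank-based (Spearman) correlation between per-gene RNA-Seq and microarray values could be computed as below; the function names are hypothetical, and tied values receive averaged ranks.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because Spearman correlation depends only on ranks, it gives the same value whether RNA-Seq counts are used raw or log-transformed, which sidesteps the different dynamic ranges of the two platforms.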
Serial Analysis of Gene Expression (SAGE) is a powerful tool to determine gene expression profiles. Two types of SAGE libraries, ShortSAGE and LongSAGE, are distinguished by the length of the SAGE tag (10 vs. 17 basepairs). LongSAGE libraries are thought to be more useful than ShortSAGE libraries, but their information content has not been widely compared. To dissect the differences between these two types of libraries, we used four libraries (two LongSAGE and two ShortSAGE libraries) generated from the hippocampus of Alzheimer disease (AD) and control samples. We also generated two additional short SAGE libraries, the truncated LongSAGE (tSAGE) libraries, from the LongSAGE libraries by deleting seven 5' basepairs from each LongSAGE tag.
One problem in SAGE studies is that, because tags are short, an individual tag may match multiple different genes. We found that a LongSAGE tag maps to up to 15 UniGene clusters, whereas ShortSAGE and tSAGE tags map to up to 279 UniGene clusters. Both long and short SAGE libraries contain a large number of orphan tags (tags with no gene information in UniGene), reflecting limitations of the UniGene database. Among 100 orphan LongSAGE tags, the complete sequences (17 basepairs) of nine orphan tags match 17 genomic sequences; four of these orphan tags match a single genomic sequence. Our data therefore show the potential to resolve 4–9% of orphan LongSAGE tags. Finally, among 400 tSAGE tags showing significant differential expression between AD and control, 79 tags (19.8%) were derived from multiple non-significant LongSAGE tags, implying that these results are false positives.
Our data show that LongSAGE tags have higher specificity in gene mapping than ShortSAGE tags. LongSAGE tags also show an advantage over ShortSAGE tags in identifying novel genes by BLAST analysis. Most importantly, the chances of obtaining false positive results are higher for ShortSAGE than for LongSAGE libraries because of the lower specificity of short tags in gene mapping. Therefore, we recommend considering the number of UniGene clusters (genes or ESTs) to which a tag corresponds when prioritizing significant results.
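How truncation can merge distinct LongSAGE tags into one tSAGE tag can be sketched as follows. Following the description above, the helper drops the seven 5' bases of each 17 bp tag; the function names and the dictionary-of-counts input format are assumptions for illustration.

```python
from collections import Counter, defaultdict


def truncate_tag(long_tag, n_drop=7):
    """Hypothetical helper: drop the first n_drop (5') bases of a 17 bp
    LongSAGE tag, leaving a 10 bp tSAGE tag as described in the text."""
    return long_tag[n_drop:]


def collapse_to_tsage(long_counts):
    """Sum LongSAGE counts per truncated tag and record which LongSAGE tags
    were merged into each tSAGE tag."""
    short_counts = Counter()
    sources = defaultdict(set)
    for long_tag, count in long_counts.items():
        short_tag = truncate_tag(long_tag)
        short_counts[short_tag] += count
        sources[short_tag].add(long_tag)
    return short_counts, sources
```

A truncated tag with more than one entry in `sources` pools counts from different LongSAGE tags, which is how a significant tSAGE tag can arise from several non-significant LongSAGE tags, as observed for 79 tags above.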
Non-small cell lung cancer (NSCLC) presents as a progressive disease spanning precancerous, preinvasive, locally invasive, and metastatic lesions. Identification of biological pathways reflective of these progressive stages, and aberrantly expressed genes associated with these pathways, would conceivably enhance therapeutic approaches to this devastating disease.
Through the construction and analysis of SAGE libraries, we have determined transcriptome profiles for preinvasive carcinoma-in-situ (CIS) and invasive squamous cell carcinoma (SCC) of the lung and, using Ingenuity Pathway Analysis, compared these with expression profiles generated from both bronchial epithelium and precancerous metaplastic and dysplastic lesions. Expression of genes associated with epidermal development, and loss of expression of genes associated with mucociliary biology, are predominant features of CIS that are largely shared with precancerous lesions. Additionally, expression of genes associated with xenobiotic metabolism/detoxification is a notable feature of CIS and is largely maintained in invasive cancer. Genes related to tissue fibrosis and the acute phase immune response are characteristic of the invasive SCC phenotype. Moreover, the data presented here suggest that tissue remodeling/fibrosis is initiated at the early stages of CIS. Finally, this study indicates that alteration in copy-number status is a plausible mechanism for differential gene expression in CIS and invasive SCC.
This study is the first report of large-scale expression profiling of CIS of the lung. Unbiased expression profiling of these preinvasive and invasive lesions provides a platform for further investigations into the molecular genetic events relevant to the early stages of squamous NSCLC development. Additionally, up-regulated genes showing extreme expression differences between CIS and invasive cancer may have the potential to serve as biomarkers for early detection.
To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE).
Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries, made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors, and a human central retina longSAGE library, made from 41- to 66-year-old donors, were generated. Their transcription profiles were entered into a relational database, EyeSAGE, which also contains microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type-specific expression was validated by quantitative and single-cell RT-PCR.
Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of the candidate gene expression table for the BBS5 disease region ranked the identified Bardet-Biedl syndrome disease gene (BBS5) as the top candidate. Compelling candidates for inherited retinal diseases were identified.
The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.
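Prioritizing candidates within a mapped disease region can be illustrated with a hypothetical ranking rule: keep genes whose start coordinate falls inside the interval and sort them by retina expression relative to the mean of the other tissues. This is not the EyeSAGE implementation; the `GeneProfile` record, the tissue names and the pseudocount are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GeneProfile:
    """Hypothetical record: gene symbol, location and tags-per-million by tissue."""
    symbol: str
    chrom: str
    start: int
    tpm: dict


def prioritize_candidates(genes, chrom, region_start, region_end,
                          target="retina", pseudocount=1.0):
    """Return genes inside chrom:region_start-region_end, most target-enriched first."""
    in_region = [g for g in genes
                 if g.chrom == chrom and region_start <= g.start <= region_end]

    def enrichment(gene):
        # mean expression across all non-target tissues acts as the background
        others = [v for tissue, v in gene.tpm.items() if tissue != target]
        background = sum(others) / len(others) if others else 0.0
        return (gene.tpm.get(target, 0.0) + pseudocount) / (background + pseudocount)

    return sorted(in_region, key=enrichment, reverse=True)
```

The pseudocount keeps genes with no background expression from producing infinite scores.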
Oligonucleotide microarray analysis revealed 175 genes that are differentially expressed in large airway epithelial cells of people who currently smoke compared with those who never smoked, with 28 classified as irreversible, 6 as slowly reversible, and 139 as rapidly reversible.
Tobacco use remains the leading preventable cause of death in the US. The risk of dying from smoking-related diseases remains elevated for former smokers years after quitting. The identification of irreversible effects of tobacco smoke on airway gene expression may provide insights into the causes of this elevated risk.
Using oligonucleotide microarrays, we measured gene expression in large airway epithelial cells obtained via bronchoscopy from never, current, and former smokers (n = 104). Linear models identified 175 genes differentially expressed between current and never smokers, and classified these as irreversible (n = 28), slowly reversible (n = 6), or rapidly reversible (n = 139) based on their expression in former smokers. A greater percentage of irreversible and slowly reversible genes were down-regulated by smoking, suggesting possible mechanisms for persistent changes, such as allelic loss at 16q13. Similarities with airway epithelium gene expression changes caused by other environmental exposures suggest that common mechanisms are involved in the response to tobacco smoke. Finally, using irreversible genes, we built a biomarker of ever exposure to tobacco smoke capable of classifying an independent set of former and current smokers with 81% and 100% accuracy, respectively.
We have categorized smoking-related changes in airway gene expression by their degree of reversibility upon smoking cessation. Our findings provide insights into the mechanisms leading to reversible and persistent effects of tobacco smoke that may explain former smokers' increased risk of developing tobacco-induced lung disease, and they provide novel targets for chemoprophylaxis. Airway gene expression may also serve as a sensitive biomarker to identify individuals with past exposure to tobacco smoke.
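The reversibility categories can be made concrete with a simplified, hypothetical rule. The study itself used linear models; here, the fraction of the current-vs-never smoking effect still present in recent and long-term former smokers is compared with an assumed threshold of 0.5. The split of former smokers into two groups, the threshold and the function names are illustrative assumptions.

```python
def fraction_retained(never, current, former):
    """Share of the smoking effect (current - never) still seen in former smokers."""
    effect = current - never
    if effect == 0:
        raise ValueError("gene shows no smoking effect")
    return (former - never) / effect


def classify_reversibility(never, current, recent_former, long_term_former,
                           threshold=0.5):
    """Hypothetical rule applied to group mean expression of one smoking-altered gene."""
    if fraction_retained(never, current, long_term_former) >= threshold:
        return "irreversible"       # effect persists even long after quitting
    if fraction_retained(never, current, recent_former) >= threshold:
        return "slowly reversible"  # persists shortly after quitting, then fades
    return "rapidly reversible"     # largely gone soon after quitting
```

Because the effect is expressed as a signed ratio, the same rule handles genes that smoking up-regulates and genes it down-regulates.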
SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, the cross-comparison of different tissues is seldom addressed, thus limiting the identification of common- and tissue-specific tumor markers.
To improve SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data that combines mathematical set theory and logic with a unique “multi-pool method”, which analyzes multiple pools of pair-wise case-control comparisons individually. When all settings are in “inclusion”, the common SAGE tag sequences are mined. When one tissue type is in “inclusion” and the other tissue types are not, the tag sequences specific to the selected tissue are generated. Results are displayed as tags-per-million (TPM) and fold values and are visualized on four kinds of scales in a color-gradient pattern. In the fold visualization display, the top-scoring SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file enables cross-tissue comparison using libraries selected from publicly available databases or user-defined libraries.
The hSAGEing tool provides, for the first time, a combination of user-friendly cross-tissue analysis and an interface for comparing SAGE libraries. Several up- or down-regulated genes that are tissue-specific or common tumor markers and suppressors are identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing
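The inclusion logic described above maps onto ordinary set operations. In this sketch, each tissue is assumed to contribute one set of tags (for example, tags altered in that tissue's case-control pool); the included sets are intersected and the tags of every non-included tissue are subtracted, while TPM is the usual count scaled to one million tags. This is not the hSAGEing source code, and the function and tissue names are hypothetical.

```python
def tags_per_million(counts):
    """Scale raw tag counts so that the library sums to one million."""
    total = sum(counts.values())
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


def select_tags(tag_sets, included):
    """Intersect the tag sets of included tissues, then remove every tag seen in
    a non-included tissue: all included gives common tags, one included gives
    tissue-specific tags."""
    chosen = [tag_sets[t] for t in included]
    # set.intersection returns a new set, so the input sets are never modified
    result = set.intersection(*chosen) if chosen else set()
    for tissue, tags in tag_sets.items():
        if tissue not in included:
            result -= tags
    return result
```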
Serial Analysis of Gene Expression (SAGE) is a new technique that provides detailed quantitative and qualitative knowledge of gene expression profiles without prior knowledge of the sequences of the analyzed genes. We applied a modification of the SAGE methodology (microSAGE), useful for analyzing limited quantities of tissue, to normal human cervical tissue obtained from a donor without histopathological lesions. Cervical epithelium consists mainly of cervical keratinocytes, which are the targets of human papillomavirus (HPV); persistent HPV infection of the cervical epithelium is associated with an increased risk of developing cervical carcinoma (CC).
We report here a transcriptome analysis of cervical tissue by SAGE, derived from 30,418 sequenced tags, that provides a wealth of information about the gene products involved in normal cervical epithelium physiology, including genes involved in epidermal differentiation that had not previously been found in uterine cervix tissue.
This first comprehensive analysis of the uterine cervix transcriptome should be useful for identifying genes involved in normal uterine cervix function, as well as candidate genes associated with cervical carcinoma.
Tobacco smoking is responsible for over 90% of lung cancer cases, and yet the precise molecular alterations induced by smoking in lung that develop into cancer and impact survival have remained obscure.
We performed gene expression analysis using HG-U133A Affymetrix chips on 135 fresh frozen tissue samples of adenocarcinoma and paired noninvolved lung tissue from current, former and never smokers, with biochemically validated smoking information. ANOVA adjusted for potential confounders, a multiple testing procedure, Gene Set Enrichment Analysis, and GO functional classification were used for gene selection. Results were confirmed in independent adenocarcinoma and non-tumor tissues from two studies. We identified a gene expression signature characteristic of smoking that includes cell cycle genes, particularly those involved in mitotic spindle formation (e.g., NEK2, TTK, PRC1). Expression of these genes strongly differentiated smokers from non-smokers in lung tumors, and early-stage tumor tissue from non-tumor tissue (p<0.001 and fold-change >1.5 for each comparison), consistent with an important role for this pathway in smoking-induced lung carcinogenesis. These changes persisted many years after smoking cessation. NEK2 (p<0.001) and TTK (p = 0.002) expression in the noninvolved lung tissue was also associated with a 3-fold increased risk of mortality from lung adenocarcinoma in smokers.
Our work provides insight into the smoking-related mechanisms of lung neoplasia, and shows that the very mitotic genes known to be involved in cancer development are induced by smoking and affect survival. These genes are candidate targets for chemoprevention and treatment of lung cancer in smokers.
Cystatin A (gene: CSTA) is up-regulated in non-small-cell lung cancer (NSCLC) and in dysplastic vs normal human bronchial epithelium. Because chronic obstructive pulmonary disease (COPD), a small airway epithelium (SAE) disorder, is independently associated with NSCLC (especially squamous cell carcinoma, SCC) but occurs in only a subset of smokers, we hypothesized that genetic variation, smoking and COPD modulate CSTA gene expression levels in SAE, with further up-regulation in SCC. Gene expression was assessed by microarray in SAE of 178 individuals [healthy nonsmokers (n=60), healthy smokers (n=82), and COPD smokers (n=36)], with corresponding large airway epithelium (LAE) data in a subset (n=52). Blood DNA was genotyped by SNP microarray. Twelve SNPs upstream of the CSTA gene were all significantly associated with CSTA SAE gene expression (p<0.04 to 5 × 10⁻⁴). CSTA gene expression levels in SAE were higher in COPD smokers (28.4 ± 2.0) than in healthy smokers (19.9 ± 1.4, p<10⁻³), who in turn had higher levels than nonsmokers (16.1 ± 1.1, p<0.04). CSTA LAE gene expression was also smoking-responsive (p<10⁻³). Using comparable publicly available NSCLC expression data, CSTA was up-regulated in SCC vs LAE (p<10⁻²) and down-regulated in adenocarcinoma vs SAE (p<10⁻⁷). All phenotypes were associated with significantly different proportional gene expression of CSTA to cathepsins. The data demonstrate that regulation of CSTA expression in human airway epithelium is influenced by genetic variability, smoking, and COPD, and is further up-regulated in SCC, all of which should be taken into account when considering the role of CSTA in NSCLC pathogenesis.
cystatin; small airway epithelium; gene expression; genotype; COPD
Prior microarray studies of smokers at high risk for lung cancer have demonstrated that heterogeneity in the bronchial airway epithelial cell gene expression response to smoking can serve as an early diagnostic biomarker for lung cancer. As a first step in applying functional genomic analysis to population studies, we have examined the relationship between gene expression variation and genetic variation in a central molecular pathway (the NRF2-mediated antioxidant response) associated with smoking exposure and lung cancer. We assessed global gene expression in histologically normal airway epithelial cells obtained at bronchoscopy from smokers who developed lung cancer (SC, n = 20), smokers without lung cancer (SNC, n = 24), and never smokers (NS, n = 8). Functional enrichment analysis showed that the expression of NRF2-mediated, antioxidant response element (ARE)-regulated genes was significantly lower in SC than in SNC. Importantly, we found that the expression of MAFG (a binding partner of NRF2) was correlated with the expression of ARE genes, suggesting that MAFG levels may limit target gene induction. Using bioinformatic methods, we identified single nucleotide polymorphisms (SNPs) in putative ARE genes; to test the impact of genetic variation, we genotyped these putative regulatory SNPs and other tag SNPs in selected NRF2 pathway genes. By sequencing the MAFG locus, we identified 30 novel SNPs, two of which were associated with either gene expression or lung cancer status among smokers. This work demonstrates an analysis approach that integrates bioinformatic pathway and transcription factor binding site analysis with genotype, gene expression and disease status to identify SNPs that may be associated with individual differences in gene expression and/or cancer status in smokers. These polymorphisms might ultimately contribute to lung cancer risk via their effect on the airway gene expression response to tobacco smoke exposure.
Use of tobacco is responsible for approximately 30% of all cancer-related deaths in the United States including cancers of the upper aerodigestive tract. In the current study, 40 current and 40 age- and gender-matched never smokers underwent buccal biopsies to evaluate the effects of smoking on the transcriptome. Microarray analyses were carried out using Affymetrix HGU 133 Plus2 arrays. Smoking altered the expression of numerous genes: 32 genes showed increased expression and 9 genes showed reduced expression in the oral mucosa of smokers vs. never smokers. Increases were found in genes involved in xenobiotic metabolism, oxidant stress, eicosanoid synthesis, nicotine signaling and cell adhesion. Increased numbers of Langerhans cells were found in the oral mucosa of smokers. Interestingly, smoking caused greater induction of aldo-keto reductases, enzymes linked to polycyclic aromatic hydrocarbon induced genotoxicity, in the oral mucosa of women than men. Striking similarities in expression changes were found in oral compared to the bronchial mucosa. The observed changes in gene expression were compared to known chemical signatures using the Connectivity Map database, and suggested that geldanamycin, an Hsp90 inhibitor, might be an anti-mimetic of tobacco smoke. Consistent with this prediction, geldanamycin caused dose-dependent suppression of tobacco smoke extract-mediated induction of CYP1A1 and CYP1B1 in vitro. Collectively, these results provide new insights into the carcinogenic effects of tobacco smoke, support the potential use of oral epithelium as a surrogate tissue in future lung cancer chemoprevention trials and illustrate the potential of computational biology to identify chemopreventive agents.
tobacco; smoking; microarray; aryl hydrocarbon receptor; heat shock protein 90
Why do we observe a wage differential between smokers and non-smokers? Pooling reports of current and prior smoking activity across 15 years from the Panel Study of Income Dynamics (PSID) allows the reconstruction of individual smoking histories. Dividing the sample into smoking history groups, the four largest of which are persistent smokers, never smokers, former smokers, and future quitters, reveals that there is no observed wage gap between former smokers and those who have never smoked. There is, however, a wage gap between those smokers who will continue smoking and three other groups of individuals: (1) smokers who will quit smoking in the future, (2) smokers who have already quit, and (3) those who never smoked. The wage gap between smokers and non-smokers observed in the 1986 cross-section is largely driven by those who persist as smokers over 1986–2001. These results support the hypothesis that the cross-sectional wage differential is not driven by smoking per se but may have a non-causal explanation. One plausible interpretation is that a common factor, such as myopia, leads to reduced investment in both health capital and firm-specific or other human capital.
smoking; wages; health capital
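The reconstruction of smoking-history groups can be sketched as a classification rule over panel waves. The PSID coding actually used is not described here, so the record format (year, currently smokes, ever smoked), the 1986 reference-year default, the function name and the handling of relapsers as "other" are assumptions.

```python
def smoking_history_group(reports, reference_year=1986):
    """Classify one person from panel reports.

    reports: list of (year, smokes_now, ever_smoked) tuples, one per wave.
    """
    reports = sorted(reports)
    at_ref = [r for r in reports if r[0] == reference_year]
    if not at_ref:
        return "unclassified"  # no report in the cross-section year
    smoking_at_ref = at_ref[0][1]
    if not any(now or ever for _, now, ever in reports):
        return "never smoker"
    smokes_at_last_wave = reports[-1][1]
    if smoking_at_ref:
        return "persistent smoker" if smokes_at_last_wave else "future quitter"
    smoked_by_ref = any(now or ever for year, now, ever in reports
                        if year <= reference_year)
    smoked_after_ref = any(now for year, now, _ in reports if year > reference_year)
    if smoked_by_ref and not smoked_after_ref:
        return "former smoker"
    return "other"  # e.g. relapsers or people who started after the reference year
```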
Genetically engineered mouse cancer models are among the most useful tools for testing the in vivo effectiveness of the various chemopreventive approaches. We previously characterized the p53-null mouse model of mammary carcinogenesis at the cellular, molecular, and pathological levels. In a companion article, Medina et al. (2009) analyzed the efficacy of bexarotene, gefitinib, and celecoxib as chemopreventive agents in the same model. Here we report the global gene expression effects of these compounds on mammary epithelium, analyzing the data in light of their effectiveness as chemopreventive agents. SAGE was used to profile the transcriptome of p53-null mammary epithelium obtained from mice treated with each compound vs. controls. This information was also compared with SAGE data from p53-null mouse mammary tumors. Gene expression changes induced by the chemopreventive treatments revealed a common core of 87 affected genes across treatments (p<0.05). The effective compounds, bexarotene and gefitinib, may at least in part exert their chemopreventive activity by affecting a set of 34 genes related to specific cellular pathways. The gene expression signature revealed various genes previously described as associated with breast cancer, such as the AP-1 complex member Fos-like antigen 2, Early growth response 1, Gelsolin and Tumor protein translationally-controlled 1, among others. The concerted modulation of many of these transcripts prior to malignant transformation appears predominantly to decrease cell proliferation. This study has revealed candidate key pathways that can be experimentally tested in the same model system and may constitute novel targets for future translational research.
Chemoprevention; gene expression profile; SAGE; Bexarotene; Gefitinib; Celecoxib
Lung cancer is the leading cause of cancer death in the United States, and the majority of diagnoses are made in former smokers. Although avoidance of tobacco abuse and smoking cessation clearly will have the greatest impact on lung cancer development, effective chemoprevention could prove to be more effective than treatment of established, advanced-stage disease. Chemoprevention is the use of dietary or pharmaceutical agents to reverse or block the carcinogenic process and has been successfully applied to common malignancies other than lung (including recent reports on the prevention of breast cancer in high-risk individuals). Although previous lung cancer chemoprevention studies have failed to identify effective agents, our ability to define the highest-risk populations and our understanding of lung tumor and premalignant biology continue to advance. Squamous cell carcinogenesis in the bronchial epithelium starts with normal epithelium and progresses through hyperplasia, metaplasia, dysplasia, and carcinoma in situ to invasive cancer. Precursor lesions also have been identified for adenocarcinoma, and these premalignant lesions are targeted by chemopreventive agents in current and future trials. Chemopreventive agents can currently only be recommended as part of well-designed clinical trials, and multiple trials have recently been completed or are enrolling subjects.
lung cancer; chemoprevention; premalignant dysplasia; biomarkers; clinical trials
Serial analysis of gene expression (SAGE) is a widely used and powerful technique to characterize and compare transcriptomes. Although several modifications have been proposed to the initial protocol with the aim of reducing the amount of starting material, unless additional PCR steps are added, the technique is still limited by the need for at least 1 µg of total RNA. As extra PCR amplification might introduce representation biases, current SAGE protocols are not fully suitable for the study of small, microdissected tissue samples. We propose here an alternative method involving the linear amplification of small mRNA fragments containing the SAGE tags. The procedure allows preparation of libraries of over 100 000 tags from as few as 2500 cells. A satisfactory correlation was observed between a microSAGE library made from 5 µg of total thyroid RNA, and a library prepared from 50 ng of the same RNA preparation according to the present protocol.
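The library-to-library comparison described above can be illustrated with a minimal sketch. It assumes that each library is a mapping from tag sequence to raw count, that counts are scaled to tags per million so libraries of different depth are comparable, and that a Pearson correlation is computed on log10-transformed values with a pseudocount. These are common choices for comparing SAGE libraries, but they are assumptions and not necessarily the measure the authors used. All function names are hypothetical.

```python
import math

def tags_per_million(counts):
    """Scale raw tag counts so libraries of different depth are comparable."""
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def library_correlation(lib_a, lib_b, pseudocount=1.0):
    """Correlate two SAGE libraries (hypothetical helper).

    Tags seen in only one library count as 0 in the other; the pseudocount
    keeps log10 defined for those tags.
    """
    a = tags_per_million(lib_a)
    b = tags_per_million(lib_b)
    tags = sorted(set(a) | set(b))
    xs = [math.log10(a.get(t, 0.0) + pseudocount) for t in tags]
    ys = [math.log10(b.get(t, 0.0) + pseudocount) for t in tags]
    return pearson(xs, ys)
```

Because counts are normalized before correlating, a library that is simply sequenced twice as deeply, with identical relative abundances, correlates perfectly with the original. This is the property that lets a 50 ng library be compared with a 5 µg one.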
Idiopathic pulmonary fibrosis (IPF) is a progressive, chronic interstitial lung disease that is unresponsive to current therapy and often leads to death. However, the rate of disease progression differs among patients. We hypothesized that comparing the gene expression profiles between patients with stable disease and those in which the disease progressed rapidly will lead to biomarker discovery and contribute to the understanding of disease pathogenesis.
Methodology and Principal Findings
To begin to address this hypothesis, we applied Serial Analysis of Gene Expression (SAGE) to generate lung expression profiles from diagnostic surgical lung biopsies in 6 individuals with relatively stable (or slowly progressive) IPF and 6 individuals with progressive IPF (based on changes in DLCO and FVC over 12 months). Our results indicate that this comprehensive lung IPF SAGE transcriptome is distinct from normal lung tissue and from other chronic lung diseases. To identify candidate markers of disease progression, we compared the IPF SAGE profiles in stable and progressive disease and identified a set of 102 transcripts that were at least 5-fold upregulated and a set of 89 transcripts that were at least 5-fold downregulated in the progressive group (P-value≤0.05). The overexpressed genes included surfactant protein A1, two members of the MAPK-EGR-1-HSP70 pathway that regulate cigarette-smoke-induced inflammation, and Plunc (palate, lung and nasal epithelium associated), a gene not previously implicated in IPF. Interestingly, 26 of the upregulated genes are also increased in lung adenocarcinomas and have low or no expression in normal lung tissue. More importantly, we defined a SAGE molecular expression signature of 134 transcripts that distinguished relatively stable from progressive IPF.
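The selection rule above (at least 5-fold change with P ≤ 0.05) can be sketched as a simple per-tag filter. This sketch makes several assumptions. Tag counts are pooled within each group (progressive vs. stable). Fold change is computed on library-size-normalized counts with a pseudocount. Significance comes from a 2×2 Pearson chi-square test with 1 degree of freedom. The chi-square test is a stand-in chosen for illustration; the abstract does not state which test the authors applied, and SAGE studies often use other tests, such as the Audic–Claverie or Fisher exact test. Function names are hypothetical.

```python
import math

def chi2_2x2_pvalue(a, b, total_a, total_b):
    """Pearson chi-square (1 df, no continuity correction) for one tag:
    a of total_a tags in group A vs b of total_b tags in group B."""
    table = [[a, total_a - a], [b, total_b - b]]
    n = total_a + total_b
    row = [total_a, total_b]
    col = [a + b, n - a - b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 df
    return math.erfc(math.sqrt(chi2 / 2))

def classify_tags(progressive, stable, min_fold=5.0, alpha=0.05, pseudocount=1.0):
    """Return (upregulated, downregulated) tags in the progressive group."""
    total_p = sum(progressive.values())
    total_s = sum(stable.values())
    up, down = [], []
    for tag in set(progressive) | set(stable):
        p = progressive.get(tag, 0)
        s = stable.get(tag, 0)
        # Ratio of normalized abundances; the pseudocount avoids division by zero
        fold = ((p + pseudocount) / total_p) / ((s + pseudocount) / total_s)
        if chi2_2x2_pvalue(p, s, total_p, total_s) <= alpha:
            if fold >= min_fold:
                up.append(tag)
            elif fold <= 1 / min_fold:
                down.append(tag)
    return sorted(up), sorted(down)
```

Requiring both a large fold change and a small P-value keeps tags that differ strongly in relative abundance while discarding large ratios that rest on very few counts.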
These findings indicate that molecular signatures from lung parenchyma at the time of diagnosis could prove helpful in predicting the likelihood of disease progression or possibly understanding the biological activity of IPF.
Serial Analysis of Gene Expression (SAGE) is a powerful expression profiling method that allows the expression of thousands of transcripts to be analyzed simultaneously. A disadvantage of the method, however, is the relatively large amount of input RNA required. Consequently, SAGE cannot be used to generate expression profiles when RNA is limited, for example in small biological samples such as tissue biopsies or microdissected material. Here we describe a modification of SAGE, named microSAGE, which requires 500- to 5000-fold less starting material. Compared with SAGE, microSAGE is simplified by the incorporation of a 'single-tube' procedure for all steps from RNA isolation to tag release. Furthermore, a limited number of additional PCR cycles are performed. Using microSAGE, gene expression profiles can be obtained from minute quantities of tissue, such as a single hippocampal punch from a rat brain slice 325 micrometers thick, estimated to contain at most 10⁵ cells. This method opens up many new applications for SAGE, for example the characterization of expression profiles in tissue biopsies, tumor metastases, or other cases where tissue is scarce, and the generation of region-specific expression profiles of complex heterogeneous tissues.
To compare full transcriptome expression levels of matched tumor and normal samples from patients with oropharyngeal carcinoma stratified by known tumor etiologic factors.
Patients and Methods
Full transcriptomes were sequenced and analyzed for 10 matched tumor and normal tissue samples from patients with previously untreated oropharyngeal carcinoma. Transcriptomes were analyzed using massively parallel messenger RNA sequencing and validated using the NanoString nCounter system. Global gene expression levels were compared in samples grouped by smoking status and human papillomavirus status. This study was completed between June 10, 2010, and June 30, 2011.
Global gene expression analysis indicated that tumor tissue from former smokers grouped more closely with that from never smokers than with that from current smokers. Pathway analysis revealed altered expression of genes in the p53 DNA damage-repair pathway, including CHEK2 and ATR. These genes showed increased expression associated with human papillomavirus–negative current smokers rather than with former or never smokers.
These findings support the application of messenger RNA sequencing technology as an important clinical tool for more accurately stratifying patients based on individual tumor biology with the goal of improving our understanding of tumor prognosis and treatment response, ultimately leading to individualized patient care strategies.
Melanoma antigens (MAGE) are frequently expressed in lung cancer and are promising targets of anticancer immunotherapy. Our preliminary data suggested that MAGE may be expressed during early lung carcinogenesis, raising the possibility of targeting MAGE as a lung cancer prevention strategy. The purpose of this study was to investigate MAGE activation patterns in the airways of chronic smokers without lung cancer. MAGE-A1, -A3 and -B2 gene expression was determined in bronchial brush cells from chronic former smokers without lung cancer by reverse transcription-PCR (RT-PCR). The results were correlated with clinical parameters. The 123 subjects had a median age of 57 years, a median of 40 pack-years smoking history, and had quit smoking for at least one year prior to enrollment. Among the subjects, 31 (25%), 38 (31%), and 46 (37%) had detectable MAGE-A1, -A3 and -B2 expression, respectively, in their bronchial brush samples. Expression of MAGE-A1 and -B2 positively correlated with pack-years smoking history (P=0.03 and 0.03, respectively). The frequency of expression did not decrease despite a prolonged smoking cessation period. In conclusion, MAGE-A1, -A3 and -B2 genes are frequently expressed in the bronchial epithelial cells of chronic smokers without lung cancer, suggesting that chronic exposure to cigarette smoke activates these genes even before the malignant transformation of bronchial cells in susceptible individuals. Once activated, the expression persists despite long-term smoking cessation. These data support the targeting of MAGE as a novel lung cancer prevention strategy.
melanoma antigens; airway; smokers; lung cancer; prevention
To examine if the risk of lung cancer declines with increasing time since ceasing exposure to asbestos and quitting smoking, and to determine the relative asbestos effect between non‐smokers and current smokers.
A cohort study of 2935 former workers of the crocidolite mine and mill at Wittenoom who responded to a questionnaire on smoking, first issued in 1979, and for whom quantitative estimates of asbestos exposure were available. Conditional logistic regression was used to relate asbestos exposure, smoking category, and risk of lung cancer.
Eighteen per cent of the cohort reported never smoking; 66% of cases and 50% of non‐cases were current smokers. After adjustment for asbestos exposure and age, the following groups had higher risks of lung cancer than never smokers: past smokers who ceased smoking within six years of the survey (OR = 22.1, 95% CI 5.6 to 87.0); those who ceased smoking 20 or more years before the survey (OR = 1.9, 95% CI 0.50 to 7.2); current smokers of <20 cigarettes per day (OR = 6.8, 95% CI 2.0 to 22.7); and current smokers of >20 cigarettes per day (OR = 13.2, 95% CI 4.1 to 42.5). The asbestos effect between non‐smokers and current smokers was 1.23 (95% CI 0.35 to 4.32).
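The odds ratios and 95% confidence intervals reported above can be checked for internal consistency with a short calculation. This assumes that each interval is a Wald interval computed on the log scale, exp(ln OR ± 1.96·SE), which is the usual output of logistic regression but is not stated in the abstract. Under that assumption, the point estimate is the geometric mean of the two bounds. The standard error and an approximate two-sided P-value can also be recovered from the bounds alone. Function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def implied_or(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)

def implied_se(lower, upper, z=Z_95):
    """Standard error of ln(OR) implied by the CI width."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_pvalue(odds_ratio, se):
    """Approximate two-sided P-value for H0: OR = 1."""
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))

# (OR, lower, upper) as reported in the abstract
reported = {
    "quit within 6 years of survey": (22.1, 5.6, 87.0),
    "quit 20+ years before survey": (1.9, 0.50, 7.2),
    "current, <20 cigarettes/day": (6.8, 2.0, 22.7),
    "current, >20 cigarettes/day": (13.2, 4.1, 42.5),
    "asbestos effect, non-smokers vs current": (1.23, 0.35, 4.32),
}

for label, (odds, lo, hi) in reported.items():
    se = implied_se(lo, hi)
    print(f"{label}: OR={odds}, implied OR={implied_or(lo, hi):.2f}, "
          f"SE(lnOR)={se:.2f}, P~{wald_pvalue(odds, se):.3g}")
```

The same arithmetic makes explicit that the intervals for the group that quit 20 or more years earlier and for the asbestos effect both include 1.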
Persons exposed to asbestos and tobacco who subsequently quit smoking remain at an increased risk of lung cancer for up to 20 years after smoking cessation, compared with never smokers. Although the relative risk of lung cancer appears higher in never and ex‐smokers than in current smokers, those who both smoke and have been exposed to asbestos have the highest risk; this study emphasises the importance of smoking prevention and smoking cessation programmes within this high-risk cohort.
smoking cessation; asbestos; lung cancer; relative asbestos effect
"Open" transcriptome analysis methods allow gene expression to be studied without a priori knowledge of the transcript sequences. At present, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high-throughput sequencing method used in MPSS enables rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered.
To carry out a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it with a LongSAGE library of mouse hypothalamus sequenced by the Sanger method.
We found that Solexa sequencing technology combined with LongSAGE is well suited to deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome that is as reliable as that of a LongSAGE library sequenced by the Sanger method.
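To show what "isolation of 21 bp tag sequences from each transcript" means, the sketch below extracts a "virtual" LongSAGE tag from known transcript sequences. It assumes the standard LongSAGE design: the anchoring enzyme is NlaIII (site CATG), and the tag is the 3'-most CATG plus the 17 nucleotides downstream of it, for 21 bp in total. Transcripts whose 3'-most site lies too close to the end to yield a full tag are skipped. This is an illustrative model of tag content, not the wet-lab protocol or the authors' pipeline. Function names are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)
TAG_BODY = 17    # nucleotides taken downstream of the anchor (assumed LongSAGE length)

def virtual_longsage_tag(transcript):
    """Return the 21 bp tag at the 3'-most CATG, or None if there is none."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)  # 3'-most anchoring-enzyme site
    if pos == -1:
        return None
    tag = seq[pos:pos + len(ANCHOR) + TAG_BODY]
    # Skip sites too close to the 3' end to yield a full-length tag
    return tag if len(tag) == len(ANCHOR) + TAG_BODY else None

def tag_counts(transcripts):
    """Count tags across a collection of transcript molecules."""
    tags = (virtual_longsage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)
```

Each transcript contributes at most one tag, and the expression level is read from how often that tag appears in the library. Sequencing depth, whether Sanger or Solexa, therefore determines how many low-abundance transcripts are detected.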