SAGE offers an unbiased and comprehensive approach to expression profiling, limited only by the depth of sequencing chosen by the researcher, and offers an unprecedented opportunity for transcript discovery. This is in sharp contrast to microarray analysis, where the expression profile and scope of analysis are predetermined by target design, which typically covers a limited number of the most commonly characterized genes. In a previous study, we described the transcriptomes of smoke-damaged bronchial epithelium and lung parenchyma by way of SAGE. Here we extend our analysis to the largely unexplored area of early-stage lung cancer development, and present the first report of large-scale gene expression profiling of carcinoma-in-situ (CIS) of the lung. An understanding of the molecular genetics governing the preinvasive stages is critical to facilitate early detection and immediate therapeutic intervention before progression to invasive cancer ensues. In the current study, we present a comparative analysis, with emphasis on genes over-expressed in CIS and invasive cancer transcriptomes relative to non-cancerous transcriptomes of the lung, including bronchial epithelium (BE) and precancerous lesions (PC: squamous metaplasia and dysplasia).
Twenty-seven lung SAGE libraries comprising 3,997,729 total sequence tags (~40 megabases of high quality DNA sequence) were analyzed in this study. Normal lung is represented by 14 bronchial epithelial libraries (BE-1 through BE-14). The precancer stage is represented by two libraries derived from squamous metaplasia (Met) and squamous dysplasia (Dys). Squamous cell carcinoma of the lung is represented by five carcinoma-in-situ libraries (CIS-1 through CIS-5) and six invasive carcinoma libraries (SCC-1 through SCC-6) (detailed in and ). (It is noted that specimens comprising the BE, PC, CIS, and SCC datasets were from a mixed population of current and former smokers.) These data identify more than 129,000 unique sequence tags/potential transcripts in CIS lesions, and nearly 140,000 unique sequence tags/potential transcripts in invasive squamous NSCLC.
Summary of SAGE libraries generated and tags sequenced.
Analysis of the top 300 most abundant tags in BE, CIS and SCC SAGE datasets
Cluster analysis yielded the anticipated grouping of SAGE libraries, attesting to sample quality. For this analysis, the 300 most abundant tags were retained from each library, yielding a merged list of 1128 unique tags. Average linkage clustering based on these 1128 tags reveals that all cancer libraries (both CIS and invasive SCC) cluster together, and separately from the BE libraries (). We note some clustering of the invasive SCC libraries (four out of six). Similar clustering is observed when using the top 500 or top 1000 unique tags per library (data not shown).
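The construction of the merged tag list described above can be illustrated with a minimal Python sketch. It covers only the step of retaining the N most abundant tags per library and taking their union; the clustering itself is not shown. The function names, tag sequences, and counts are hypothetical and are not taken from the study, and the tie-breaking rule (alphabetical by tag) is an assumption.

```python
def top_tags(library, n):
    """Return the n most abundant tags in one library (dict: tag -> count)."""
    # Sort by descending count, then by tag so that ties break deterministically.
    ranked = sorted(library.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:n]]


def merged_top_tags(libraries, n):
    """Union of the per-library top-n tags, returned as a sorted list."""
    merged = set()
    for counts in libraries.values():
        merged.update(top_tags(counts, n))
    return sorted(merged)


# Toy libraries with made-up tags and counts (illustration only).
toy_libraries = {
    "BE-1": {"AAAA": 50, "CCCC": 30, "GGGG": 5},
    "CIS-1": {"AAAA": 10, "TTTT": 40, "GGGG": 35},
    "SCC-1": {"CCCC": 2, "TTTT": 60, "ACGT": 45},
}

merged = merged_top_tags(toy_libraries, n=2)
print(merged)
```

Because the most abundant tags overlap heavily between libraries, the merged list is much shorter than the number of libraries multiplied by N, which is consistent with 1128 unique tags arising from the top 300 tags of each library.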
Analysis of the top 300 most abundant tags from the BE, CIS, and invasive cancer datasets.
Ingenuity pathway analysis
To characterize and compare the bronchial epithelial and cancer transcriptomes, we computed average normalized tag counts for the BE (14 libraries), CIS (5 libraries), and invasive SCC (6 libraries) datasets, and subsequently selected the 300 most abundant unique tags from each dataset for analysis (Table S1). Tags mapping to mitochondrial-encoded genes and ribosomal protein genes were found at similar frequencies within the top 300 most abundant tags across all three datasets (BE, CIS, and SCC), at ~8% and ~18%, respectively. We used the core analysis component of Ingenuity Pathway Analysis (IPA) to categorize these genes according to biological functions. Only those molecules having at least one functional annotation in the IPA Knowledge Base qualify for analysis; these comprised 220 (BE), 231 (CIS), and 233 (SCC) IPA eligible genes. These analyses reveal that genes within the category of Hair and Skin Development and Function are highly expressed in the CIS transcriptome relative to both BE and SCC, and genes within the categories of Hematological System Development and Function and Immune Cell Trafficking are highly expressed in the SCC transcriptome relative to both BE and CIS. Hence, high expression of genes associated with epidermal development is identified here as a characteristic feature of the CIS transcriptome, and high expression of genes associated with cellular movement as a characteristic feature of the SCC transcriptome. See for a summary of these analyses.
A notable feature of both the CIS and SCC datasets is the abundance of tags mapping to the constant region of immunoglobulin heavy chains. In fact, the SAGE tag for IGHG1 is the most abundant tag in both the CIS and the invasive cancer datasets (Table S1). Although it is possible that expression of these immunoglobulin chains originates from infiltrating lymphoid tissue and surrounding stroma, previous studies have demonstrated expression of heavy chain constant and variable regions of immunoglobulins in breast cancer epithelial cells, and expression of IgG heavy and light chains in various epithelial cancer cells including lung SCC. These immunoglobulin chains may represent cancer cell autoantibodies, and may stimulate growth in an autocrine/paracrine fashion. Intriguingly, the abundance of tags mapping to immunoglobulin heavy chain transcripts in CIS preinvasive lesions of the lung, in contrast to the relatively low detection in both BE and precancerous metaplastic and dysplastic lesions (detailed in subsequent tables), suggests that up-regulation of these transcripts may have relevance to initiation of NSCLC.
Gene expression changes common to carcinoma-in-situ and precancerous lesions
Transition from a healthy bronchial epithelium to invasive cancer is thought to proceed via progression of histological and genetic abnormalities: BE to PC to CIS to SCC, where PC represents precancerous lesions (squamous metaplasia and dysplasia). Squamous metaplasia is a transient component of normal wound healing of the bronchial epithelium, and typically resolves to a re-differentiated epithelium composed of pseudostratified ciliated and secretory cells, restoring bronchial function 
. (Use of the term PC here does not imply an obligatory progression to cancer, but rather refers to lesions/abnormalities that despite infrequent progression to cancer, are considered as precursors to cancer.) Conversely, CIS lesions demonstrate a low regression frequency with a high incidence of progression to invasive cancer 
and are characterized by a more extensive stratification of squamous cell types compared to PC 
. By identifying gene expression changes common to both PC and CIS relative to BE, we focus on those genetic events which occur early and persist through to CIS. In accordance with our selection criteria (minimal three-fold difference in average normalized tag abundance; minimal average normalized tag abundance of 40 TPM in the over-expressing dataset), 868 SAGE tags were found to be similarly differentially expressed in PC and CIS relative to BE, consisting of 190 up-regulated tags, and 678 down-regulated tags ().
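The selection criteria stated above (at least a three-fold difference in average normalized tag abundance, and at least 40 TPM in the over-expressing dataset) can be sketched in Python as follows. This is an illustrative reading of the criteria, not the pipeline used in the study: the function names and toy values are hypothetical, the thresholds are treated as inclusive, and the fold test is written as a product so that tags absent from one dataset do not cause a division by zero.

```python
def to_tpm(raw_counts):
    """Scale the raw tag counts of one library to tags per million (TPM)."""
    total = sum(raw_counts.values())
    return {tag: count * 1_000_000 / total for tag, count in raw_counts.items()}


def average_tpm(libraries):
    """Average each tag's TPM across libraries; absent tags count as 0."""
    tags = set().union(*libraries)
    return {
        tag: sum(lib.get(tag, 0.0) for lib in libraries) / len(libraries)
        for tag in tags
    }


def direction(group, baseline, min_fold=3.0, min_tpm=40.0):
    """Classify one tag as 'up', 'down', or None relative to the baseline."""
    # The abundance floor applies to whichever dataset over-expresses the tag.
    if group >= min_tpm and group >= min_fold * baseline:
        return "up"
    if baseline >= min_tpm and baseline >= min_fold * group:
        return "down"
    return None


def common_changes(be, pc, cis):
    """Tags changed in the same direction in both PC and CIS relative to BE."""
    common = {}
    for tag in set(be) | set(pc) | set(cis):
        in_pc = direction(pc.get(tag, 0.0), be.get(tag, 0.0))
        in_cis = direction(cis.get(tag, 0.0), be.get(tag, 0.0))
        if in_pc is not None and in_pc == in_cis:
            common[tag] = in_pc
    return common


# Made-up average TPM values per dataset (illustration only).
be = {"tagA": 10.0, "tagB": 300.0, "tagC": 50.0, "tagD": 20.0}
pc = {"tagA": 90.0, "tagB": 40.0, "tagC": 60.0, "tagD": 100.0}
cis = {"tagA": 45.0, "tagB": 90.0, "tagC": 55.0, "tagD": 30.0}

common = common_changes(be, pc, cis)
print(sorted(common.items()))
```

In this toy example, tagD passes both criteria in PC but falls below the 40 TPM floor in CIS, so it is not counted as a change common to both lesion types.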
Venn diagrams of differentially expressed genes discussed in this manuscript.
Up-regulated expression changes
Approximately 35% of the tags up-regulated in CIS relative to BE (190 out of 529 tags) were found to be commonly up-regulated in PC lesions, and approximately 25% of tags up-regulated in PC lesions relative to BE (190 out of 754 tags) were found to be similarly up-regulated in CIS (). See Table S2 for a description of these tags up-regulated in both PC and CIS. IPA functional analysis (based on 143 eligible mapped IDs) indicates that roughly 35% of these commonly up-regulated genes are associated with epidermal development and associated disorders, as described in . In addition to those genes described in , a review of the literature identified other genes within the PC/CIS up-regulated dataset to be associated with epidermal development, including SBSN, CNFN, CRCT1, additional members of the small proline-rich family of proteins (SPRR2E and SPRR3), and additional members of the S100A family of calcium-binding proteins. Many of these genes are encoded either within the epidermal differentiation complex (EDC) locus on 1q21 or within a conserved locus on 19q13, and specify components of the cornified cell envelope, a structure that provides barrier protection to epidermis and internal epithelium in response to insult or injury.
Genes associated with epidermal development in the CIS_PC over BE dataset by IPA functional analysis.
IPA pathway graphical representation of the genes commonly up-regulated in CIS and PC relative to BE, is presented in . Considering the molecular interactions identified by IPA, functional associations among desmosomal cadherins and catenins are prominent within the PC/CIS up-regulated dataset. Desmosomes are intercellular adhesion junctions that provide mechanical integrity to the epithelium, and studies indicate that desmosomal cadherins modulate keratinocyte differentiation and epidermal morphogenesis 
. This analysis also suggests that a signaling cascade mediated by members of the 14-3-3 family of proteins, may be active here. 14-3-3 sigma (SFN) mediates keratinocyte differentiation and stratification of epidermis 
. The pathway diagram also suggests that specific aspects of keratinocyte terminal differentiation may be mediated by the AP-1 transcription factor FOSL2, which may have an additional role in extracellular matrix remodeling. Indeed, FOSL2 has been identified as a mediator of pulmonary fibrosis 
. A consideration of genes associated with the transcription factor HIF1A in the PC/CIS lesions, is suggestive of a remodeling/profibrotic response to hypoxic growth conditions 
. Indeed, a functional link between hypoxia and fibrosis is documented in the literature 
. Expression of other genes identified here, such as NCF1 (the regulatory subunit of NADPH oxidase), and the heme catabolic enzyme HMOX1, is also indicative of an oxygen-related stress response and tissue remodeling/fibrosis 
. It is noted that additional IPA analysis identified Hepatic Fibrosis/Hepatic Stellate Cell Activation and 14-3-3-mediated Signaling as significant categories within Canonical Pathways, and identified Hepatic Fibrosis as the most significant category within Toxicity Lists (data not shown).
IPA pathway graphical representation for the CIS_PC over BE dataset of up-regulated genes.
Additional analysis by Gene Ontology using the GATHER annotation tool similarly identified epidermal development as a prominent component of the transcriptome of commonly up-regulated genes in CIS and PC lesions, relative to BE (Figure S1A, Table S3).
Down-regulated expression changes
The majority of tags down-regulated in CIS relative to BE were found to be commonly down-regulated in PC lesions (678 out of 904 tags), and vice versa (678 out of 912 tags down-regulated in PC relative to BE) (). See Table S4 for a description of these tags down-regulated in both PC and CIS. IPA functional analysis (based on 347 eligible mapped IDs) identified Cellular Assembly and Organization and Embryonic Development as the two most significant functional categories (p-values 1.29E-05–4.67E-02 and 1.30E-05–4.32E-02, respectively) for this dataset of commonly down-regulated genes (data not shown). Within the former, seven genes associated with biogenesis and formation of cilia were identified. These include dynein components of the cilium axoneme (DNAI2, DYNC2H1), the FOXJ1 transcription factor and master regulator of motile ciliogenesis, intraflagellar transport proteins IFT172 and IFT88, kinesin family member KIF3A, and BBS5, a protein family member linked to Bardet-Biedl Syndrome and localized to ciliary basal bodies. Genes identified within the functional category of Embryonic Development are associated with patterning, specification of the midline axis, and formation of the neural tube. As these developmental processes have been linked to ciliary activity, specifically primary cilia-mediated Hedgehog signaling, these data overall presumably reflect loss of the ciliated cell phenotype as common to PC and CIS lesions.
We also note, at a lower significance, down-regulation of genes associated with DNA recombination and repair in CIS and PC lesions (p-value 1.16E-04–4.42E-02). These include DNA repair genes (CCNO and NEIL1, glycosylases associated with base-excision repair; cyclin-dependent kinase CDK2; glycoprotein CLU; p53-inducible ribosomal protein RPS27L associated with the G1 DNA damage checkpoint; helicase RUVBL2; TRIP13, a regulator of double-strand break repair and meiotic checkpoint control; antioxidant SOD1), genes associated with DNA modification (editing enzyme APOBEC3G; antioxidant CAT) and DNA catabolism (exoribonuclease XRN2), mediators of ATP hydrolysis (ATPIF1, MAPK1, N4BP2, RUVBL1, RUVBL2, TGM2), genes associated with centriole duplication (AKAP9, CETN2), and folate receptor FOLR1.
IPA pathway graphical representation of genes commonly down-regulated in CIS and PC also highlights genes associated with ciliogenesis including axonemal components and centrosomal proteins, genes associated with goblet cell differentiation, and genes associated with epithelial cell polarization and ion transport (Figure S2). Down-regulation of these genes, as well as many others in the BE over PC_CIS dataset also associated with ciliogenesis but not identified as such by IPA, reflects a pronounced loss of mucociliary differentiation in both PC and CIS lesions, presumably accompanied by deficiency in clearance and defense of the airways. For further description of the genes identified in Figure S2, see Text S1.
Additional analysis by Gene Ontology using the GATHER annotation tool identified processes associated with cilia function such as gametogenesis and spermatogenesis, and microtubule-based processes, as prominent components of the transcriptome of commonly down-regulated genes in CIS and PC lesions, relative to BE (Figure S1B, Table S3).
Identification of differentially expressed genes in carcinoma-in-situ and invasive cancer transcriptomes key to cancer development
By identifying genes differentially expressed between preinvasive and invasive stages of lung cancer development (CIS and SCC, respectively), relative to both non-cancerous bronchial epithelium and precancerous metaplasia/dysplasia lesions (BE and PC, respectively), we propose to identify expression changes instrumental to both initiation (CIS) and progression (SCC) of lung cancer. In accordance with our selection criteria (minimal three-fold difference in average normalized tag abundance; minimal average normalized tag abundance of 40 TPM in the over-expressing dataset), 309 SAGE tags were found to be differentially expressed in CIS relative to BE and PC, and 280 tags were differentially expressed in SCC relative to BE and PC, with 116 tags similarly differentially expressed (). It is noted that the stringent selection criteria imposed in this study for differential expression would preclude certain genes, although present in the SAGE datasets and relevant to cancer development, from further analysis (see example below). However, a high stringency within the selection process, typically lends greater confidence to the relevance of those genes identified as differentially expressed in the cancer datasets.
Up-regulated expression changes
We identified 225 SAGE tags to be over-expressed in CIS relative to both BE and PC (Table S5
), and 232 tags to be over-expressed in invasive SCC relative to both BE and PC (Table S6
). It is noted that greater than 35% of the over-expressed tags within the CIS dataset (85 tags) were commonly up-regulated in SCC (), suggesting that significant expression changes relating to advanced cancer have already occurred by the time a diagnosis of CIS has been made, in accordance with irreversibility of CIS lesions. Discrepancy between the number of up-regulated tags and the number of IPA mapped IDs within each dataset, indicates that a significant proportion of potentially up-regulated genes in early-stage lung cancer remain to be identified (). IPA pathway graphical representation for up-regulated tags with mapped IDs for the two cancer datasets, is presented in . A higher proportion of up-regulated gene products are localized to the extracellular space in the SCC dataset relative to the CIS dataset. Considering the molecular interactions identified by IPA, functional networks involving the cell surface/extracellular matrix adhesion protein FN1, and transcriptional/cell cycle regulator CDKN2A, highlight the SCC dataset. Up-regulation of FN1-interacting proteins associated with tissue remodeling/fibrosis, and FN1-interacting proteins associated with acute phase response, suggests a link between these processes in SCC. A link between acute phase response and tissue repair has been previously proposed 
. Activation of a CDKN2A functional network associated with cellular senescence, may reflect a protective response of the involved organ to acute tissue injury 
. No outstanding molecular interactions were apparent for the CIS up-regulated dataset. Differential expression for a subset of the up-regulated genes in was validated by real-time quantitative RT-PCR (Table S7).
Genes up-regulated in the CIS and invasive SCC datasets relative to BE and PC.
We performed analysis by Gene Ontology using the GATHER annotation tool for genes up-regulated in CIS and SCC relative to BE and PC (Figure S3; Table S8). This analysis identified fatty acid biosynthesis (described by a cluster of 5 genes) as a component of the CIS dataset. In agreement with the IPA analysis described above, gene ontology analysis identified defense response (described by a cluster of 20 genes) as a notable component of the invasive cancer dataset. Skeletal development, response to wounding, anion transport, and carbohydrate catabolism were also identified by GATHER gene ontology analysis as components of the invasive cancer dataset.
Considering the notable expression of genes associated with epidermal development common to the PC and CIS datasets, we investigated whether functionally related genes are also enriched in the cancer datasets (CIS and SCC) relative to both BE and PC. The genes identified from this investigation using IPA functional analysis are described in . These data indicate that gene expression patterns reflective of epidermal development are not restricted to precancerous lesions, but rather also present as a component of CIS (~15% of IPA eligible mapped IDs), and invasive cancer (~26% of IPA eligible mapped IDs) apart from precancerous lesions.
Up-regulated genes associated with epidermal development in CIS and invasive SCC according to IPA functional analysis.
In addition to those genes described in , other genes up-regulated in CIS relative to both BE and PC that are associated with epidermal development include KRTDAP (encoded on 19q13), KPRP, SPRR2F, SPRR2G, and LCE3D (all encoded on 1q21). KRTDAP is associated with epidermal morphogenesis, and is a potential regulator of keratinocyte differentiation 
. KPRP is an epidermal marker expressed in stratified squamous epithelia 
, and has a potential role in calcium-induced keratinocyte differentiation, and expression is increased in psoriasis 
. It is noted that genes associated with development of the cornified cell envelope encoded on 1q21/19q13 tend to be expressed at notably lower levels in invasive cancer relative to CIS (see Table S2 and Table S5).
As described in , transcriptional regulators associated with keratinocyte differentiation over-expressed within invasive cancer include IRF6, CDKN2A, and JUNB. IRF6 is an interferon-induced transcription factor associated with the switch between keratinocyte proliferation and differentiation 
. JUNB, a member of the AP-1 transcription factor family, is associated with keratinocyte differentiation during wound healing and psoriasis 
. The cyclin-dependent protein kinase inhibitor/transcription factor CDKN2A (see above), is associated with cell cycle arrest/senescence of keratinocytes, and differentiation of epidermis 
. Although considered a tumor suppressor protein, increased expression of CDKN2A at the invasive front of basal cell carcinomas and colon cancer has been reported, and the correlation of increased invasiveness with decreased proliferation, suggests that CDKN2A may play a role in cancer cell invasion 
An association between epidermal development and squamous cell lung cancer development is frequently studied through analysis of the EGFR pathway 
. Anti-EGFR therapies have been initiated for various types of cancer, including NSCLC 
. Also, up-regulation of KGF, a member of the fibroblast growth factor family and mediator of epidermal differentiation, is associated with pancreatic cancer 
. The identification of additional genes associated with epidermal development and up-regulated in the early stages of NSCLC, may enhance our understanding of the role of this pathway in lung cancer development, and broaden treatment options.
Additional IPA analysis
We utilized IPA core analysis to identify additional functions within the CIS and invasive SCC datasets of up-regulated genes (). Canonical Pathways analysis and Toxicity Lists analysis identified metabolism/detoxification of xenobiotics as a component of the CIS dataset, and hepatic fibrosis as a major characteristic of the invasive cancer phenotype. Specific genes associated with these phenotypes are listed in and and described below.
IPA canonical pathways analysis and toxicity lists analysis of the CIS over BE_PC and the SCC over BE_PC datasets.
Up-regulated genes in CIS and invasive SCC associated with metabolism/detoxification of xenobiotics according to specific categories within IPA canonical pathways and toxicity lists as indicated.
Up-regulated genes in invasive SCC associated with tissue fibrosis according to IPA functions, canonical pathways, and toxicity lists.
Metabolism/detoxification of xenobiotics
Metabolism/detoxification of xenobiotics is a protective cellular response to prevent damage to macromolecules upon exposure to both exogenous and an excess of endogenous stressors [such as reactive oxygen species (ROS)]. Most notably, glutathione metabolic enzymes that mediate phase II detoxification of xenobiotics via conjugation with glutathione, and members of the aldose-reductase family of oxidoreductases, have been identified to be up-regulated in the CIS dataset (). Most of these genes are also up-regulated in the invasive cancer dataset. Functionally related genes not included within , include the oxidoreductases SRXN1 and AKR1B10. SRXN1 expression protects against cigarette smoke-induced oxidative stress, and is important for redox homeostasis 
. Similar to AKR1C1, AKR1B10 catalyzes NADPH-dependent reduction and inhibition of 4-HNE and other toxic aldehydes resulting from peroxidation of membrane lipids 
. The pentose phosphate pathway (PPP) provides reducing equivalents for glutathione reduction/recycling and maintenance of redox status; three genes associated with the PPP (ALDOC, G6PD, TKTL1), are up-regulated in CIS lesions. Expression of many of these genes (AKR1B10, AKR1C1, AKR1C3, G6PD, GPX2, GSTM1, GSTM3, GSTM4, SRXN1) is regulated by the redox-sensitive NRF2 transcription factor, and is induced by cigarette smoke 
. However, these protective responses may also promote adaptation to adverse environmental conditions (redox stressors), and survival with propagation of damaged cells. For example, SRXN1 may play a role in development of skin malignancies 
. Both AKR1C isoforms and AKR1B10 catalyze oxidative activation of xenobiotic proximate carcinogen PAH trans-dihydrodiols (such as B[a]P, a component of cigarette smoke), to generate reactive ortho-quinones, and mediate redox cycling with ROS amplification 
. Over-expression of AKR1B10 and AKR1C1 has been reported for many cancer types 
. Expression of NRF2-regulated anti-oxidant/glutathione metabolic gene HMOX1, and NCF1 (the p47phox subunit of NADPH oxidase, a major cellular source of ROS) is up-regulated in CIS lesions, but is also relatively high in precancerous lesions (, Table S2
), suggesting that xenobiotic/oxidative stress may initiate early in the pathway leading to invasive lung cancer. It has recently been suggested that NADPH oxidase may stimulate the protective activity of the NRF2-KEAP1 signaling pathway 
Tissue fibrosis, initiated in response to injury and facilitated by inflammatory mediators, is characterized by the excessive accumulation of extracellular matrix components. Genes typically associated with tissue fibrosis, and up-regulated in the invasive SCC dataset described in this study, include fibrillar collagens and other fibrillar matrix components, matrix metalloproteases, metalloprotease inhibitors, proteoglycans, chemotactic proteins, transcriptional regulators, and contractile proteins (). In addition to those genes identified by IPA, other genes within the invasive cancer dataset associated with tissue fibrosis include extracellular proteins MFGE8 
, and POSTN, a mediator of collagen fibrillogenesis 
. Related genes up-regulated in the SCC dataset include constituents of the fibrillar extracellular matrix such as MFAP2 
, FBLN1 
, and the small leucine-rich proteoglycan FMOD, a mediator of collagen fibrillogenesis and matrix assembly 
. In addition to structural properties, many of these components, such as MFAP2, MFGE8, and POSTN have signaling activities relevant to cancer development 
Tissue fibrosis is a component of various cancer types, and studies suggest that advanced fibrosis contributes to aggressiveness and resistance to chemotherapy 
,  
. The myofibroblast cell, thought to originate from various sources including transformation of resident or bone marrow-derived fibroblasts, transdifferentiation of epithelial cells to mesenchymal-type cells via EMT (epithelial-mesenchymal transition), and activation of resident stellate (astrocyte) cells, is a mediator of tissue fibrosis 
. Activation of pancreatic stellate cells mediates the fibrotic process inherent to pancreatic cancer and contributes to cancer progression 
. Activated hepatic stellate cells are the major mediators of liver fibrosis and contribute to liver cancer 
. The epidermal growth factor receptor regulates pancreatic fibrosis via stimulation of pancreatic stellate cells 
. The cellular origins of the tissue fibrosis apparent from analysis of the invasive lung cancer dataset presented in this study are not known. Some genes identified in are associated with hepatic stellate cell activation. Others, including POSTN, contribute to pancreatic stellate cell activation 
, and also mediate EMT 
. MFGE8 is also a mediator of EMT 
, and the cytoskeletal intermediate filament protein, VIM (over-expressed here) is a mesenchymal cell marker, and an indicator of EMT 
. EMT is associated with progression to invasive cancer 
Down-regulated expression changes
We identified 84 SAGE tags to be down-regulated in CIS relative to both BE and PC (Table S9), and 48 SAGE tags to be down-regulated in invasive SCC relative to both BE and PC (Table S10), with 31 tags in common (). It is noted that IPA functional analysis did not identify any down-regulated genes from these datasets as specifically associated with ciliogenesis, and although multiple significant functional categories were identified, no specific biological process within these categories took prominence (data not shown). This was the case whether the datasets were analyzed separately or as a single dataset of down-regulated genes, and may be partially attributable to the relatively small size of the datasets. However, when based on indirect as well as direct molecular connections, IPA pathway graphical representation identified the receptor tyrosine kinase ERBB2 as central to a network associated with airway biology (Figure S4). In addition to associations with multiple developmental processes, ERBB2 also plays a role in airway repair including differentiation of ciliated and goblet cells, while inhibiting squamous metaplasia. In this regard, it is intriguing to hypothesize that failure to initiate a potential ERBB2 signaling complex in CIS lesions may compromise redifferentiation/restoration of the bronchial epithelium following injury, and contribute to initiation of in-situ cancer. For further description of the down-regulated genes identified in Figure S4, see Text S2.
We performed analysis by Gene Ontology using the GATHER annotation tool for down-regulated genes in CIS and SCC relative to BE and PC (Figure S5; Table S8). Genes associated with defense response were most prominently identified by this analysis.
Genes expressed at notably different levels among normal and cancer datasets, have the potential to serve as biomarkers for early detection. For a listing of potential biomarkers for both CIS and invasive SCC, based upon a minimal 20-fold up-regulation (and a minimal average tag abundance of 40 TPM), see . Generally, genes associated with epidermal development (KRTDAP, SPRR2G, SPRR2E) may potentially serve as biomarkers for CIS, whereas genes associated with immune response (MHC class I receptor HLA-G, acute-phase response protein CRP) may potentially serve as biomarkers for invasive SCC. It is noted that, although precise identification of the expressing cell type(s) for the genes associated with immune response may warrant further investigation, these genes may nonetheless accurately reflect the tumor cell microenvironment and provide diagnostic potential. Multiple literature reports support the potential of HLA-G and CRP to serve as biomarkers for invasive cancer of various tissue types including NSCLC 
. Additionally, the data presented in this study suggest that the neuropeptide NTS may have potential as a biomarker for CIS lesions, whereas CST1, a peptidase inhibitor within the cystatin superfamily, may have potential as a biomarker for invasive SCC lesions. Intriguingly, a recent study describes the potential of CST1 as a urinary marker for colorectal cancer.
Potential biomarkers for CIS, invasive SCC, and squamous cell lung cancer.
Due to the rarity of CIS specimens, we were not able to validate up-regulation of gene expression directly. However, an analysis of publicly available microarray expression data from squamous lung carcinoma specimens, and microarray expression data from internally profiled bronchial brushings, indicates that genes associated with epidermal development and showing specific expression in the CIS SAGE dataset, such as KRTDAP and SPRR2G, may actually retain up-regulation in a subset of invasive tumors (Figure S6). Thus, these genes may have potential to serve as early-stage biomarkers for SCC in a combinatorial manner.
To identify potential biomarkers associated with both preinvasive and invasive squamous cell lung cancer, we selected SAGE tags on the basis of a minimal 20-fold up-regulation in both the CIS and invasive cancer datasets (relative to BE and PC) (). Intriguingly, several of these tags map to immunoglobulin heavy chain and light chain genes (see above). Other genes identified here include NTRK2, a member of the neurotrophin tyrosine kinase receptor family, and GSTM3, a mediator of glutathione/xenobiotic metabolism (see above). Genetic polymorphisms of GSTM3 and other members of the GST family of proteins, have been associated with cancer risk 
. Enhanced expression of NTRK2 has been associated with poor prognosis in neuroblastoma and other cancer types 
. It is noted that several of the selected tags in are currently unmapped to a gene ID; further experimentation to resolve these mappings may provide additional genes for biomarker evaluation.
In an effort to evaluate the frequency of over-expression of these candidate genes in SCC on a broader scale, we again consulted publicly available microarray expression data from squamous lung carcinoma specimens, and microarray expression data from internally profiled bronchial brushings. Although sporadic, a trend of up-regulation was observed for COL3A1, SFTPC, CST1, IGHG1, GSTM3, SLCO1A2, and NTRK2 in squamous tumors relative to bronchial epithelium, lending support to the potential of these genes to serve as biomarkers for invasive SCC (Figure S6). See Table S11 for raw microarray data used for this analysis.
Differential gene expression and genomic copy-number status in CIS lesions
To investigate whether alterations in gene dosage contribute mechanistically to the differential gene expression identified here in early-stage lung cancer, we compared segmental copy number gain/loss from 20 independent CIS specimens with locus information for up-regulated and down-regulated genes in CIS/SCC relative to BE and PC. A subset of up-regulated genes localized to regions of stable frequent gain, and a subset of down-regulated genes localized to regions of stable frequent loss. The most prominent corresponding regions of copy number gain include 1q21–1q42.13 and 3q12.1–3q29, and loci within chromosomal arms 7q, 8q, 17q, and 20q. Notably, the region 1q21 encodes the EDC, a region over-expressed early in CIS lesions (see above). The most prominent corresponding regions of copy number loss include loci within chromosomal arms 3p and 6p. This data agrees with previously published data for lung cancer, particularly amplification at 3q, 7q, 8q, and loss of 3p 
. [Notably, a recent study describes genomic amplification and over-expression of transcription factor SOX2 encoded at 3q26.33 in lung squamous cell carcinoma 
. Although we detect enhanced expression of SOX2 in CIS relative to both BE and PC, the tag abundance ratio falls marginally below the three-fold threshold applied in this study, excluding this gene from the copy-number analysis presented here.] The data presented here suggests that frequent chromosomal gain/loss of specific loci represents a significant mechanism for differential gene expression in early, preinvasive stages of squamous cell lung cancer. See Table S12
and Table S13
for raw data describing copy-number status for up-regulated genes and down-regulated genes, respectively.
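The comparison of gene loci with regions of frequent copy-number change can be pictured as an interval-overlap test. The sketch below is a minimal illustration under stated assumptions, not the method used in the study: the `Interval` type, the function names, the chromosome labels, the gene names, and all coordinates are hypothetical.

```python
# Illustrative sketch only: hypothetical genes, chromosomes, and coordinates.
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_regions(gene_loci: Dict[str, Interval], regions: List[Interval]) -> List[str]:
    """Return the genes whose locus overlaps any of the given copy-number regions."""
    return sorted(
        gene
        for gene, locus in gene_loci.items()
        if any(overlaps(locus, region) for region in regions)
    )


# One made-up region of frequent gain and three made-up up-regulated genes.
gain_regions = [Interval("chrA", 1_000, 5_000)]
up_regulated = {
    "GENE_1": Interval("chrA", 4_500, 5_500),  # overlaps the gained region
    "GENE_2": Interval("chrA", 6_000, 7_000),  # same chromosome, outside the region
    "GENE_3": Interval("chrB", 1_500, 2_500),  # different chromosome
}

concordant_up = genes_in_regions(up_regulated, gain_regions)
print(concordant_up)
```

The same function could be applied to down-regulated genes and regions of frequent loss; only the inputs change.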
Correlation between up-regulated gene expression in CIS and SCC relative to BE and PC, with regions of frequent copy-number gain in CIS specimens.
Correlation between down-regulated gene expression in CIS and SCC relative to BE and PC, with regions of frequent copy-number loss in CIS specimens.
In this study, we used SAGE profiling to describe genes differentially regulated in early stages of squamous NSCLC development. Differentially expressed genes found in common between CIS and precancerous lesions relative to bronchial epithelium are presumed to reflect early expression changes during CIS development. Those genes differentially expressed in CIS relative to the precancerous lesions/bronchial epithelium, and in invasive SCC relative to the precancerous lesions/bronchial epithelium, presumably reflect gene expression changes more instrumental to cancer initiation and cancer cell invasion, respectively. Data was analyzed primarily using Ingenuity Pathway Analysis, complemented by literature searches pertaining to specific genes. Here we have described the up-regulation of genes associated with epidermal development, and the down-regulation of genes associated with mucociliary development, in both CIS and precancerous lesions relative to bronchial epithelium. Increased expression of genes associated with desmosomal cell-cell junctions and epidermal barrier formation would conceivably enhance tissue integrity, and may reflect a protective response to tissue damage occurring early in CIS lesions. Although genes associated with epidermal development are also elevated in SCC, those genes specifically associated with epidermal barrier formation and desmosomal structures show relatively low expression in invasive SCC, suggesting further tissue architectural changes upon transition to invasive cancer. Our data also suggests that tissue remodeling/fibrosis is present in early-stage CIS lesions, where it may reflect a cellular response to hypoxia/oxidative stress.
Our analysis has identified up-regulation of genes associated with xenobiotic metabolism/detoxification in CIS and invasive SCC relative to bronchial epithelium and precancerous lesions, implying an enhanced requirement for protection against electrophile and/or oxidative stress upon the transition from precancer to CIS. Up-regulation of genes specifying tissue fibrosis is a pronounced feature of the invasive cancer dataset, where it appears in association with acute phase immune components. Thus, the data presented here suggests that a fibrotic tissue response is initiated in early-stage CIS and is further developed in invasive cancer. Considering that many of these matrix components have signaling activities associated with regulation of cellular proliferation and migration similar to those described for EMT, the profibrotic phenotype described here may represent a defining component of advanced lung SCC. Additionally, by selecting SAGE tags showing extreme up-regulation among the various datasets, we have identified a small number of genes that may have potential as biomarkers for early diagnosis. Although some of these genes have previously been investigated as biomarkers for invasive cancer by other researchers, this is the first description of potential biomarkers for CIS. Lastly, a comparison of differential gene expression in CIS lesions and invasive carcinoma with array CGH data from independent CIS specimens suggests that copy number alterations play a significant role in differential gene expression in CIS lesions.