Visually normal cells adjacent to, and extending from, tumors of the lung may carry molecular alterations characteristic of the tumor itself, an effect referred to as airway field of cancerization. This airway field has been postulated as a model for early events in lung cancer pathogenesis. Yet the genomic landscape of somatically acquired molecular alterations in airway epithelia of lung cancer patients has remained unknown. To begin to fill this void, we sought to comprehensively characterize the genomic architecture of chromosomal alterations inducing allelic imbalance (AI) in the airway field of the most common type of lung tumors, non–small cell lung cancer (NSCLC). To do so, we conducted a genome-wide survey of multiple spatially distributed normal-appearing airways, multiregion tumor specimens, and uninvolved normal tissues or blood from 45 patients with early-stage NSCLC. We detected alterations in airway epithelia from 22 patients, with an increased frequency in NSCLCs of squamous histology. Our data also indicated a spatial gradient of AI in samples at closer proximity to the NSCLC. Chromosome 9 displayed the highest levels of AI and comprised recurrent independent events. Furthermore, the airway field AI included oncogenic gains and tumor suppressor losses in known NSCLC drivers. Our results demonstrate that genome-wide AI is common in the airway field of cancerization, providing insights into early events in the pathogenesis of NSCLC that may comprise targets for early treatment and chemoprevention.
Lung cancer is the leading cause of cancer-related deaths in both men and women (1). Non-small cell lung cancer (NSCLC) comprises the majority (~85%) of all lung tumors, with lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) the most frequently diagnosed histologic subtypes (2). NSCLC exhibits relatively poor prognosis with an average 5-year survival rate of 15% (2). Importantly, even early-stage NSCLC exhibits relatively poor clinical outcome compared with other similar stage malignancies with 5-year survival rates of stage I NSCLC reaching only approximately 50% (3). Data from the National Lung Screening Trial suggest that screening is expected to increase detection rates and augment the number of diagnosed early-stage lung cancers (4), warranting the need for better early treatment strategies for this growing subpopulation. Understanding molecular events that drive NSCLC development will permit identification of early targets for prevention and treatment.
Studies in field carcinogenesis of the lung airway have revealed molecular alterations in normal-appearing cells adjacent to lung tumors that are characteristic of the tumor itself (5). This airway “field of cancerization” in the lung has been linked to smoking-associated damage and is thought to be highly pertinent to lung oncogenesis (5). Previously described airway field changes include loss-of-heterozygosity (LOH) at chromosomal regions 3p and 9p (6), promoter methylation of CDKN2A (7), mutations in the EGFR (8) and KRAS (9) oncogenes, and gene expression profiles that are common between tumors and adjacent normal airway cells (10). A better understanding of these field changes may provide important biologic insights into lung tumorigenesis.
LOH and other forms of acquired chromosomal alterations that induce allelic imbalance (AI) have an established role in oncogenesis (11). Yet the genomic landscape of somatically acquired alterations in the airway field of cancerization remains largely unexplored. In this study, we interrogated a rich collection of airways from NSCLC patients, analyzing whole genome copy number alterations (CNA) using SNP arrays and applying novel computational tools that allow us to detect alterations that are present at low cellular fractions. We report the landscape of AI (CNAs and copy-neutral LOH, cn-LOH) in the airway field of cancerization, offering new insights into the spatiotemporal evolution of lung cancer from the premalignant field.
NSCLC tumors and airway field cancerization samples were obtained from early-stage (I–IIIA) NSCLC (31 LUADs, 14 LUSCs) patients who were evaluated at The University of Texas MD Anderson Cancer Center (MD Anderson; Houston, TX). The median follow-up times to survival and recurrence were 22 months (range 1–41) and 21 months (range 1–42), respectively. Tumor stage was classified as described in previous work (10). Individuals who had smoked at least 100 cigarettes in their lifetime but quit smoking more than 12 months before NSCLC diagnosis were considered former smokers (12). The study was approved by the Institutional Review Board and all participants provided written informed consent. None of the 45 NSCLC patients received neoadjuvant therapy prior to surgery; their clinicopathologic information is summarized in Table 1. Paired NSCLC and uninvolved normal lung tissues were obtained snap-frozen. We also obtained multiregion core-needle biopsies (CNB) from 20 of the 45 NSCLC tumors (Supplementary Table S1).
Each patient set comprised samples from the primary tumor and normal-appearing airways paired with blood cells and/or uninvolved normal lung tissue (all samples, n = 435; Fig. 1). The type (including numbers) of airway samples obtained from each case is summarized in Supplementary Table S1. White blood cells were available for 36 of the 45 NSCLC cases and, along with normal lung tissue, were used for somatic contrasts. Brushings from nasal epithelia and ipsilateral large (mainstem bronchi) airways (L1) were obtained from 27 and 22 NSCLC patients, respectively, during endoscopic bronchoscopy prior to resective surgery as described previously (13). Nasal brushings were obtained using sterile Cytosoft Cytology Brushes (Medical Packaging Corporation), whereas brushings from large airways were obtained endoscopically using ConMed disposable bronchial cytology brushes (ConMed Corporation). Small airway epithelia (S) adjacent to NSCLCs were obtained from the resected specimens as described previously (10). Briefly, small airway epithelia were collected by brushing, using Cytosoft brushes, 1–5 sequential bronchiolar structures at varying distances from the tumors (S1, the small airway closest to the NSCLC, through S5, the small airway farthest from the tumor). Confirmation of epithelial cell collection by pan-cytokeratin immunohistochemical analysis as well as cytopathologic control was performed as described previously (10). All airway samples included in this study were determined to be normal by pathology review. Airway brushings were placed in PBS, pelleted, and immediately stored at −80°C until further processing.
Genomic DNA was isolated from all samples (n = 435) using the QIAamp DNA Kit from Qiagen according to the manufacturer's instructions. Double-stranded genomic DNA was quantified using the RNase P Assay (Life Technologies) according to the manufacturer's instructions. DNA quality was assessed by running samples on 1% agarose gels to confirm absence of DNA degradation and RNA contamination. High-quality DNA was then surveyed for genome-wide AI using the whole genome Human OmniExpressExome (v.1.2) BeadChip Array Platform (Illumina), which queries approximately 981,000 SNPs (741K tag SNPs, 240K exonic). All raw data and sample annotations were submitted to the Gene Expression Omnibus (GEO) under series GSE80519 (samples GSM2129208-GSM2129642).
To survey normal-appearing airways with expected low fractions of aberrant DNA, a sensitive haplotype-based statistical algorithm, hapLOH (14), was used to infer genome-wide AI. This method is based on identifying subtle B-allele frequency (BAF) shifts among heterozygous markers that are congruent with one of the parental haplotypes and thus consistent with an AI event such as deletion, duplication, or cn-LOH (Supplementary Fig. S1). The following parameters were used to run hapLOH: mean event size, 20 Mb; event prevalence, 0.001; and max iterations, 100. The BAF values (sample-specific) and germline haplotype estimates (patient-specific) were input to hapLOH. The number of aberrant event states was set to 1 for nontumor samples and to 2 for tumor/CNB samples. For each patient, a germline sample (blood or uninvolved normal lung tissue) was designated, after which fastPHASE (15) was applied to each patient for statistical reconstruction of the haplotypes. The hidden Markov model of hapLOH was used to compute the probability that a set of adjacent markers span a region of AI. An AI event was defined as a continuous set of markers with posterior probabilities exceeding the threshold of 0.5, although 81% of identified events exceeded 0.9 (Supplementary Fig. S2). Events had an average size of 10,480 markers (median = 5,346; max = 44,650; min = 58). As all but two events comprised more than 100 markers, the arbitrarily selected minimum of 20 markers had virtually no impact on our findings. To complement findings from hapLOH, BAFsegmentation (16) was also applied using the following parameters: noninformative mBAF threshold, 0.85; triplet filtering cutoff, 0.8; AI calling mBAF threshold, 0.56; and minimum segment size, 4.
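The event definition above (a continuous run of markers whose posterior probability exceeds 0.5, subject to a 20-marker minimum) can be illustrated with a minimal sketch. This is not hapLOH itself: the function name `call_events` and the toy posterior values are hypothetical, and the sketch assumes per-marker posteriors are already available as a list ordered along one chromosome.

```python
from typing import List, Tuple


def call_events(
    posteriors: List[float],
    threshold: float = 0.5,   # posterior cutoff stated in the text
    min_markers: int = 20,    # minimum event size stated in the text
) -> List[Tuple[int, int]]:
    """Return (start, end) marker-index pairs, end exclusive, for each
    contiguous run of markers whose posterior exceeds `threshold` and
    that spans at least `min_markers` markers."""
    events = []
    start = None
    for i, p in enumerate(posteriors):
        if p > threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            if i - start >= min_markers:
                events.append((start, i))
            start = None           # the run has ended
    # Close a run that extends to the last marker.
    if start is not None and len(posteriors) - start >= min_markers:
        events.append((start, len(posteriors)))
    return events


# Toy input (hypothetical values): one 40-marker run and one 5-marker run.
toy = [0.1] * 30 + [0.95] * 40 + [0.2] * 10 + [0.7] * 5 + [0.1] * 15
print(call_events(toy))
```

With the default settings only the 40-marker run is retained; the 5-marker run exceeds the posterior cutoff but fails the size minimum.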
Events called by either method were removed if any of the following were true for a called event: fewer than 20 markers, greater than 50% reciprocal overlap with copy number variants in the Database of Genomic Variants (for putative gains), or at least 10% overlap with an event called in the paired germline (blood or normal lung) sample. Calls from BAFsegmentation with fewer than 50 markers were removed unless they showed marginal statistical support from hapLOH (P < 0.05).
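The three removal criteria can be sketched as a single predicate. This is an illustrative sketch rather than the pipeline used in the study: the names `Interval`, `reciprocal_overlap`, and `keep_event` are hypothetical, coordinates are assumed to be half-open base-pair intervals, and the germline criterion is interpreted here as the fraction of the airway or tumor event covered by a germline call, which the text does not specify.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two fractions: overlap/len(a) and overlap/len(b)."""
    ov = overlap_bp(a, b)
    return min(ov / (a.end - a.start), ov / (b.end - b.start))


def keep_event(
    event: Interval,
    n_markers: int,
    is_gain: bool,
    known_cnvs: List[Interval],
    germline_events: List[Interval],
) -> bool:
    """Apply the three filters described in the text."""
    if n_markers < 20:
        return False
    # CNV-database filter applies to putative gains only.
    if is_gain and any(reciprocal_overlap(event, c) > 0.5 for c in known_cnvs):
        return False
    # Assumed interpretation: fraction of the event covered by a germline call.
    length = event.end - event.start
    if any(overlap_bp(event, g) / length >= 0.10 for g in germline_events):
        return False
    return True


event = Interval("chr9", 1_000_000, 2_000_000)
cnv = Interval("chr9", 1_200_000, 2_100_000)
print(keep_event(event, 150, True, [cnv], []))   # gain overlapping a known CNV
print(keep_event(event, 150, False, [cnv], []))  # same region, not a gain
```

Requiring the overlap to be reciprocal prevents a small catalogued variant from disqualifying a much larger arm-level event that merely contains it.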
Bedtools was used to identify genes impacted by AI events and gene coordinates were downloaded from the UCSC table browser RefSeq release 68 and genome build hg19 (GRCh37), the same build used for the array design.
Fisher exact test was applied to assess associations between smoking and presence of AI in the airway. Logistic regression was used to assess associations with histology, adjusting for smoking status. A zero-inflated Poisson model (R; www.r-project.org) was used to examine the interplay of smoking, histology, and AI, treating smoking (pack-years) and AI (counts of events) as continuous variables.
See Supplementary Information for a full description of Methods.
We sought to understand the extent and role of chromosomal alterations in the airway field by analysis of genome- and airway-wide AI in NSCLC. To do so, we collected a rich set of tumor and normal-appearing airway samples from 45 early-stage NSCLC patients (31 LUADs and 14 LUSCs; Table 1). We performed AI analysis on a total of 435 samples including 1–5 small normal-appearing airways adjacent to the tumor as well as NSCLCs from all 45 patients (Fig. 1; Supplementary Table S1). Our sample set also included centrally (in the lung) located normal-appearing large airways and nasal epithelia (obtained from 22 and 27 cases, respectively) for a better assessment of airway-wide patterns of AI in the airway field (Supplementary Table S1). In addition, we analyzed multiregion tumor core-needle biopsies (CNB) from 20 patients for a comprehensive assessment of the primary tumor as well as white blood cells (in 36 cases) and uninvolved normal lung tissue (in all 45) as comparators to aid identification of somatic AI events in the field (Supplementary Table S1).
We applied a haplotype-based computational method, hapLOH (14), to infer genome-wide AI in the normal-appearing airway field of cancerization. In total, we detected 255 somatic airway AI events (247 autosomal, 8 events in the X chromosome; Supplementary Fig. S3). These somatic events were found in 22 of 45 patients (Fig. 2), distributed across 30 small airways adjacent to tumors and three relatively more-distant large airways (Fig. 2; Supplementary Table S2). Of note, we did not find AI events in any of the 27 nasal samples, which are the most distant airway samples from the tumor. Our findings suggest that somatic events are distributed along a spatial gradient in the normal-appearing airway field, particularly in LUAD patients, in which samples in closer proximity to the primary tumor are more likely to exhibit somatic chromosomal alterations (Fig. 2; Supplementary Table S2).
We sought to assess the relationship between AI (as presence and burden) in the airway with clinical and pathologic features. Overall, lifetime smokers had a higher AI burden (252 events in 19 of 37 smoker patients) relative to nonsmokers (3 events in 3 of 8 nonsmokers; Fig. 2). We found that events in chromosome 9 were the most frequent AI alterations in the airway field and were detected in smokers only (Fig. 2). When we compared AI events by tumor stage, we found no positive correlation between AI burden and progression of pathologic stage, which may be in part due to the relatively small number of stage III samples (n = 6) in our early-stage cohort. Notably, histology significantly predicted somatic alterations, with 79% (11/14) of LUSC patients exhibiting AI, compared with 35% (11/31) for LUAD (P = 0.011). When we excluded nonsmokers from this analysis, the difference between histologic subtypes was still statistically significant, with the presence of AI in LUSC (11/14) patients significantly exceeding that in LUAD smokers (9/23; P = 0.02 for a χ2 test and P = 0.03 using logistic regression and adjusting for pack-years as a continuous covariate). Although our cohort is relatively young, we attempted to probe the association between presence of AI and tumor recurrence. While this association exhibited an OR of 2.5, it did not reach statistical significance (P = 0.2).
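As a quick check of the unadjusted histology comparison, the counts above (11 of 14 LUSC vs. 11 of 31 LUAD patients with airway AI) can be placed in a 2×2 table and evaluated with a two-sided Fisher exact test. The sketch below is an independent, standard-library illustration; it is not necessarily the test that produced the reported P value, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table. Integer
    weights are compared to avoid floating-point ties.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def weight(x: int) -> int:
        # Unnormalized probability of a table whose top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# Rows: LUSC, LUAD; columns: AI present, AI absent (counts from the text).
p = fisher_exact_two_sided(11, 3, 11, 20)
print(f"P = {p:.3f}")
```

For these counts the test gives P of approximately 0.011, which agrees with the significance level reported for the histology comparison.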
We then interrogated the spectrum of different types of alterations (deletions, cn-LOH, or duplications) detected in the airway field. To do so, we jointly analyzed BAF and log R ratio values within a region of detected AI (Supplementary Fig. S4). Still, at cell fractions below 5%, determination of the specific alteration becomes exceedingly difficult. Of our detected AI events in the airway field, 39 were gains, 33 were losses, 4 were cn-LOH, and 179 were deemed undeterminable. Among the somatic events in the normal-appearing airways for which we confidently assign alteration type, 88% matched the classification of the event found in a corresponding tumor or CNB sample from the same patient. For purposes of overall summaries of the alterations, we assigned alteration types to the undeterminable events based on the classification of suitably matched events in a paired tumor sample (see Supplementary Methods and Supplementary Fig. S4), resulting in 46 gains, 64 losses, and 27 cn-LOH and 118 undeterminable (Fig. 2 and Supplementary Fig. S3).
Thirteen airway AI events matched an alteration in the paired tumor by physical position but differed in one of two ways. For 4 of these 13, the event designations (e.g., deletion, duplication) differed between airway and tumor. For the other 9 events, the specific haplotype in relative excess differed between the airway and tumor, that is, the maternal copy of the chromosome was observed to be in relative excess in one sample and the paternal copy in excess in the other sample (Supplementary Fig. S5). These observations imply regions of genomic instability and recurrent independent AI (independent mutations in the airway and tumor).
In fact, these 13 observations (listed in bold in Supplementary Table S3) understate the actual rates of “recurrent” (within-patient, across-sample) mutation. An expected half of recurrent AI events will induce imbalance in the same direction and thus go undetected by the above analysis of haplotype consistency. In our data, 6 of the 9 aforementioned examples of opposite-direction AI comprised an entire chromosomal arm. Interestingly, all 6 were found on chromosome 9q. There were 15 total 9q AI mutations in the field, leading to an estimated 9q recurrent mutation rate of 12/15 (0.80, ± 0.15; see Supplementary Methods).
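The arithmetic behind the 12/15 estimate follows directly from the reasoning above: if only about half of recurrent events are expected to fall on opposite haplotypes, the observed opposite-direction count is doubled to estimate the total. A minimal sketch using the counts from the text (variable names are illustrative; the ± 0.15 uncertainty is derived in the Supplementary Methods and is not reproduced here):

```python
# Counts taken from the text.
observed_opposite_9q = 6   # 9q arm-level events with opposite haplotypes in excess
total_9q_events = 15       # all 9q AI events detected in the airway field

# Same-direction recurrences are indistinguishable from shared events, so
# the observed opposite-direction count is assumed to capture about half.
estimated_recurrent = 2 * observed_opposite_9q
rate = estimated_recurrent / total_9q_events

print(f"{estimated_recurrent}/{total_9q_events} = {rate:.2f}")  # prints "12/15 = 0.80"
```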
To understand the effects of these AI field events on the pathobiology of NSCLC, we annotated the identified field alterations as bona fide drivers in cancer (17) and with genes previously reported to be aberrant in NSCLCs (e.g., lineage-restricted oncogenes). Overall, there were more CNAs in the airway field in LUSCs, relative to LUADs, in driver genes (105 relative to 73; Fig. 3). Losses or cn-LOH in 9q, spanning KLF4 (9q31), PTCH1 (9q22), GNAQ (9q21), TSC1 (9q34), ABL1 (9q34), and NOTCH1 (9q34), were the most frequent CNAs in the airway field of both LUADs and LUSCs (Fig. 3). The airway field of both LUAD and LUSC also displayed focal or arm losses in 19p13 comprising STK11, KEAP1, and SMARCA4 tumor suppressors. We also noted different airway field CNAs between LUADs and LUSCs. In particular, we observed gains in 3q26 that include PIK3CA and the squamous lineage-specific transcription factor SOX2 in the airway field of 3 LUSCs, whereas a gain in the adenocarcinoma-restricted lineage oncogene NKX2-1 (14q13) was observed in the field of one LUAD patient (Fig. 3). In addition, we detected copy number gain of the MYC oncogene (8q24) in the airway field of 2 LUADs, whereas focal or chromosomal arm losses in regions comprising the tumor suppressors VHL (3p25), RB1 (13q14), TP53 (17p13), MTUS1 (8p22), and SMARCB1 (22q11) were restricted to the airway field of LUSCs.
In this study, we sought to characterize the heretofore unknown landscape of somatic genome-wide chromosomal alterations in the airway field of cancerization in NSCLC. To comprehensively interrogate genome-wide AI in the airway field, we compiled and studied a rich set of matched NSCLCs and spatially distributed normal-appearing airway epithelia. We applied a novel algorithm, hapLOH (14), and performed genome-wide assessment of AI in matched NSCLC tissues, germline samples (normal lung parenchyma or blood cells), and multiple normal-appearing airway epithelia. We also investigated “intrafield heterogeneity” by genome-wide survey of multiple spatially distributed airway field samples found in both the local/adjacent airway field (airways adjacent to tumors) and relatively more distant fields (large airways and nasal epithelia). We found that almost half (22/45) of the NSCLC patients harbored AI events in the normal-appearing airway field, the majority of which matched alterations in the paired NSCLC. We observed that the airway field of LUSCs comprised significantly more AI than the field of smoker LUADs. AI was more frequently found in adjacent (to the tumor), relative to more distant, airways and was absent in the nasal epithelia, suggestive of a spatial gradient of AI across the field. Importantly, we found consistent somatic variation in driver oncogenes and tumor suppressor genes between the field and the paired NSCLCs.
This genomic profile of the airway field of cancerization offers a window to interrogate early or critical events in lung cancer pathogenesis. For example, the airway field profiles comprised losses in chromosomal regions harboring known tumor suppressors such as 3p25 (VHL), 8p22 (MTUS1), 9q (TSC1), 19p (STK11, KEAP1, SMARCA4), 13q14 (RB1), and 17p13 (TP53) as well as gains in 3q26 (PIK3CA and SOX2), 8q24 (MYC), and 14q13 (NKX2-1). It is noteworthy that, for the most part, these airway field AI events were reported to be present in premalignant lesions. LOH in 9q (TSC1) and 17p13 (TP53) as well as reduced protein expression of STK11 were reported in atypical adenomatous hyperplasias, precursor lesions in the histopathologic sequence of LUAD development (18). Also, AI events in 8p22 (MTUS1), 13q14 (RB1), and 17p13 (TP53) were detected in squamous preinvasive lesions (6). Moreover, our finding of loss of 3p25 (VHL) in the airway field of LUSCs but not of LUADs is consistent with earlier reports demonstrating loss of chromosome 3p in normal epithelium adjacent to LUSCs (6). It is important to mention that our analysis revealed that the airway field in one LUAD harbored gain of NKX2-1, a lineage-specific oncogene that is specific to adenocarcinoma histology (19). We also observed gain of SOX2, a lineage-specific oncogene for lung tumors of squamous histology (20), in 3 LUSCs. This is of particular interest, as SOX2 has been found to be amplified in preinvasive squamous lesions (21) in the sequence of LUSC development. On the basis of the above, it is plausible to speculate that these events may be among the earliest aberrations in field carcinogenesis of the lung and, if so, represent targets for chemoprevention.
Our study revealed differences in somatic AI between the normal-appearing airway fields of LUSCs and LUADs. Genome-wide analysis of AI in multiple airway samples per case that were sampled along the respiratory tract pointed to a genomic spatial gradient in the airway field of LUADs and not in LUSCs. It cannot be neglected that this spatial gradient may be largely due to known differences (5) in anatomic locations, from which LUADs (relatively peripheral in the lung) and LUSCs (more centrally located) develop. In addition, our analysis revealed that normal-appearing airways of LUSC patients exhibited more AI events compared with normal airways of smoker LUAD patients, suggesting the significant association of squamous histology with AI burden in the airway field of NSCLCs. It is worthwhile to mention that previous work demonstrated that LUSC tumors harbor more frequent copy number alterations compared with ever-smoker LUADs (20, 22). In addition, our previous analysis of the transcriptomic architecture of the airway field of cancerization adjacent to NSCLCs revealed that the local airway field surrounding early-stage LUSCs comprised substantially more tumor-associated expression changes compared with the field adjacent to smoker LUADs (10). It is reasonable to surmise that the increased levels of AI in the airways of LUSC patients are a reflection of the overall increased genomic anomalies and instability in the LUSC tumors themselves.
Our analysis identified AI in the large proximal airways (mainstem bronchi) of three NSCLC patients, suggesting that airway epithelia that are still in situ in the lung following surgical removal of the tumor may continue to carry genomic alterations in definitively treated patients. It is intriguing to suggest that airway field aberrations in the lung may be implicated in relapse of early-stage NSCLC patients and thus warrant analysis following surgery to derive prognostic biomarkers. Another observation was that there were events, predominantly in 9q, that comprised independent and subclonal alterations. Although we studied genome-wide AI in a large set of samples (n = 435) with the primary objective of investigating the airway field to model early somatic events in the pathogenesis of NSCLC, the role of the airway field alterations in development of recurrence could not be statistically addressed at present because of our cohort's size and relatively short follow-up time, and warrants further assessment in future adequately powered studies. However, it is plausible that the identified somatically acquired AI events are involved in lung carcinogenesis, as they were shared between the airway field and NSCLCs. Nonetheless, our study represents the first in-depth attempt to characterize the landscape and architecture of genome-wide somatic alterations (e.g., AI) in the adjacent (to tumor) and more distant fields of cancerization, and efforts are currently underway to expand this model to study airway AI attributable to recurrence and other clinicopathologic features such as smoking.
Earlier work pointed to common gene expression profiles between normal large airways and nasal epithelia of phenotypically healthy smokers (23), suggesting that intrathoracic gene expression changes may extend to extrathoracic (e.g., nasal) cavities. In our analysis of somatic DNA alterations in the airway field of cancerization, we did not find AI events in any of the 27 nasal brushings. It cannot be neglected that our relatively small cohort (n = 45) and the availability of nasal brushings in only a subset of the cases (9/14 LUSCs and 18/31 LUADs) may impact conclusions on the frequency of DNA alterations such as detectable AI in the nasal epithelia of NSCLC patients. It is important to note that the previously reported airway field profiling studies in smokers with cancer centered on the bronchial compartment, demonstrating that gene expression changes in bronchial epithelia can improve diagnostic performance of bronchoscopy (24, 25). It is also noteworthy that our study, based on its goal of identifying potential genomic drivers in the airway field, did not assess gene expression changes but rather performed a genome-wide survey of DNA alterations (e.g., AI). It is likely that the nasal epithelial field may harbor different types of alterations (e.g., epigenetic), which warrants future studies.
In conclusion, we characterized the architecture of genome-wide AI in the airway field of cancerization of early-stage NSCLC. We found that AI is common in the airway field of NSCLC and exhibits intrafield heterogeneity within patients. We also shed light on driver gene copy number alterations that are present in the normal-appearing airway field of cancerization and, thus, may embody early events in the pathogenesis of NSCLC.
Grant Support: This work was supported in part by Molecular Genetics of Cancer training grant T32 CA009299 (Y. Jakubek), Department of Defense (DoD) grant W81XWH-10-1-1007 (I.I. Wistuba and H. Kadara), Lung Cancer SPORE grant P50CA70907 from the NCI (I.I. Wistuba), Cancer Prevention and Research Institute of Texas (CPRIT) award RP150079 (P. Scheet and H. Kadara), NIH grant R01HG005859 (P. Scheet), and by the Institutional Cancer Center Support Grant CA16672.
Disclosure of Potential Conflicts of Interest: No potential conflicts of interest were disclosed.
Authors' Contributions:
Conception and design: P. Scheet, H. Kadara
Development of methodology: L. Xu, Z. Weber, J. Fujimoto, S.G. Swisher, P. Scheet
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): W. Lang, W. Lu, Z. Weber, G. Davies, C. Behrens, N. Kalhor, C. Moran, J. Fujimoto, R. Mehran, R. El-Zein, S.G. Swisher, E.A. Ehli, P. Scheet, H. Kadara
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): Y. Jakubek, S. Vattathil, L. Huang, S.-Y. Yoo, L. Shen, J. Huang, J. Wang, J. Fowler, I.I. Wistuba, P. Scheet, H. Kadara
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. Garcia, L. Xu, C.-W. Chow, J. Fujimoto, P. Scheet
Writing, review, and/or revision of the manuscript: Y. Jakubek, W. Lang, G. Davies, J. Fujimoto, R. El-Zein, S.G. Swisher, J. Fowler, A.E. Spira, E.A. Ehli, I.I. Wistuba, P. Scheet, H. Kadara
Study supervision: P. Scheet, H. Kadara