Affordable early screening in subjects with high risk of lung cancer has great potential to improve survival from this deadly disease. We measured gene expression from lung tissue and peripheral whole blood (PWB) from adenocarcinoma cases and controls to identify dysregulated lung cancer genes that could be tested in blood to improve identification of at-risk patients in the future. Genome-wide mRNA expression analysis was conducted in 153 subjects (73 adenocarcinoma cases, 80 controls) from the Environment And Genetics in Lung cancer Etiology (EAGLE) study using PWB and paired snap-frozen tumor and non-involved lung tissue samples. Analyses were conducted using unpaired t-tests, linear mixed effects and ANOVA models. The area under the receiver operating characteristic curve (AUC) was computed to assess the predictive accuracy of the identified biomarkers. We identified 50 dysregulated genes in stage I adenocarcinoma versus control PWB samples (False Discovery Rate ≤0.1, fold change ≥1.5 or ≤0.66). Among them, eight (TGFBR3, RUNX3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3) differentiated paired tumor versus non-involved lung tissue samples in stage I cases, suggesting a similar pattern of lung cancer-related changes in PWB and lung tissue. These results were confirmed in two independent gene expression analyses in a blood-based case-control study (n=212) and a tumor-non tumor paired tissue study (n=54). The eight genes discriminated patients with lung cancer from healthy controls with high accuracy (AUC=0.81, 95% CI=0.74–0.87). Our finding suggests the use of gene expression from PWB for the identification of early detection markers of lung cancer in the future.
Lung cancer causes more deaths than any other cancer in both men and women, with over 160,000 deaths annually in the United States and one million worldwide (1). Unfortunately, the average 5-year survival rate has remained relatively stable at 15% over many decades, due to minimal improvements in early detection and treatment. Non-invasive assays for detection of lung cancer at a curable stage could offer the best therapeutic option for these patients. Although promising, imaging techniques such as low-dose helical computed tomography are expensive and potentially associated with increased risk due to ionizing radiation exposure. Blood-based biomarker assays are a potentially important alternative non-invasive method to screen for lung cancer. Technological advances in methods of blood collection and RNA stabilization have only recently increased our ability to detect transcript levels in gene expression studies of human blood samples. Recent studies of gene expression from blood cells have successfully identified gene signatures for diverse exposures (e.g., tobacco smoking (2) or benzene (3)), and health conditions, including autoimmune disorders (4–6), inflammatory diseases (7), and cancer (8–11). In our study, we first compared gene expression changes in blood between adenocarcinoma cases and non-cancer controls to select the genes whose expression most strongly differentiated cases from controls. We then compared this signature in paired adenocarcinoma vs. non-involved lung tissue samples to identify the subset of genes differentiating both cases/controls (blood samples) and tumor/non-tumor (tissue samples). These expression changes could be specifically due to early development of cancer. Finally, we validated the overlapping gene expression signature in additional blood-based and tissue-based independent studies.
If confirmed in prospective studies, gene expression changes from blood tests can provide a useful tool for the early detection of cancer in at-risk individuals.
Our study design included three phases. (i) First, we aimed to identify molecular changes in blood due to cancer, by comparing stage I adenocarcinoma cases (n=26) to controls (n=80). We restricted the analysis to stage I cases to focus on early molecular changes not affected by systemic metabolic disruption such as weight loss or other sequelae of advanced disease. We then verified whether these gene changes in PWB were also present in later stages (n=47). Since tobacco smoking is the most important risk factor for lung cancer (12) and has been associated with lung cancer progression (13), we also explored potentially distinct gene signatures by smoking groups. (ii) We then compared the blood-related gene expression signature distinguishing stage I cases from controls with the gene expression signature differentiating fresh frozen paired tumor versus non-involved tissue samples in a subgroup (n=15) of the same stage I cases. With this comparison we aimed to identify expression changes in PWB due to lung cancer that paralleled changes in the target organ. (iii) Finally, we sought to validate the main results using a) qRT-PCR analysis from PWB for all identified genes in an additional 82 stage I adenocarcinoma patients and 130 controls from the same population and b) microarray gene expression data for all identified genes in 54 lung adenocarcinoma and non-involved paired tissue samples from a previously published independent study (14).
Individuals with lung adenocarcinoma (n=73 for the microarray experiment; n=82 for qRT-PCR validation) and healthy controls (n=80 for the microarray experiment; n=130 for qRT-PCR validation) were randomly sampled from a large, well-defined population-based case-control study, the Environment And Genetics in Lung cancer Etiology (EAGLE) study (15–21), including 2,100 consecutive incident lung cancer cases and 2,120 controls (all Caucasians) from Italy. Selected cases had histologically confirmed primary adenocarcinoma of the lung, including all stages, and controls were matched to cases by age, sex, and smoking status (never, former, and current smoking). For the validation set we focused on current smoker stage I cases and controls. Detailed subject characteristics are described in Table 1.
The study was approved by the Institutional Review Board (IRB) of each participating institution in Italy and by the National Cancer Institute, Bethesda, MD. All participants provided signed informed consent.
PWB was collected for all EAGLE participants (after lung cancer diagnosis and before treatment for cases, and at enrollment for controls) using the PAXgene® Blood RNA System (PreAnalytiX, Hombrechtikon, CH), which contains a proprietary solution that reduces RNA degradation and gene induction (22, 23). Fresh lung tissue samples were snap-frozen within 20 minutes of surgical resection.
Data on microarray gene expression from PWB were obtained using the Affymetrix GeneChip® HG-U133A v2.0. After exclusion of two samples with a poor quality profile (see quality assessment in Supplemental Material 1), the remaining 162 samples were processed and normalized with the Robust Multichip Average (RMA) method. The corresponding CEL files and information conforming to the MIAME guidelines are publicly available in the GEO database (accession number GSE20189). Nine subjects were excluded after data normalization because of reclassification to non-adenocarcinoma morphology during histologic review. The final analyses were therefore based on 153 subjects: 73 adenocarcinoma cases and 80 controls. All 22,277 probe sets based on RMA summary measures were used in the analyses.
The detailed description of the gene expression examination of lung tissues in lung cancer cases in EAGLE (also based on the Affymetrix HG-U133A GeneChip®) and the sample inclusion and exclusion criteria have been published previously (24). For the present study we used data from paired tumor and non-involved lung tissue samples from 15 of the same stage I adenocarcinoma cases included in the PWB-based study.
The validation lung tissue set consisted of 27 tumor and 27 non-involved paired lung tissue samples from a previously published independent study (14). Details of the specimens, mRNA processing and hybridization, and data access are described in the corresponding publication (14).
We followed the procedure described in Hu N et al. (25). Briefly, RNA quality and quantity were determined using the RNA 6000 LabChip/Agilent 2100 Bioanalyzer. RNA purification was performed according to the manufacturer’s instructions (Qiagen Inc.). After reverse transcription of RNA, all real-time PCR reactions were performed using an ABI Prism 7000 Sequence Detection System with the designed primers and probes for target genes and an internal control gene, glyceraldehyde-3-phosphate dehydrogenase (GAPDH). Each sample for each gene was run in triplicate. Quantitative methods require that PCR efficiencies be similar for all genes and ≥90%. Efficiency was measured using a standard curve generated by serial dilutions of the RNA as described in http://docs.appliedbiosystems.com/search.
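As an illustration of the efficiency check described above, the following Python sketch estimates amplification efficiency from the slope of a standard curve using the commonly cited relation E = 10^(−1/slope) − 1. The dilution series and Ct values are hypothetical, and the function name `pcr_efficiency` is ours; it is not taken from the published protocol.

```python
import math

def pcr_efficiency(dilutions, ct_values):
    """Fit Ct against log10(relative input) by least squares and
    return (slope, efficiency), with efficiency = 10**(-1/slope) - 1."""
    x = [math.log10(d) for d in dilutions]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency

# Hypothetical 10-fold serial dilution with mean Ct per dilution (illustrative only).
dilutions = [1.0, 0.1, 0.01, 0.001]
cts = [20.0, 23.4, 26.8, 30.2]

slope, eff = pcr_efficiency(dilutions, cts)
meets_threshold = eff >= 0.90  # the >=90% requirement stated in the text
print(f"slope={slope:.2f}, efficiency={eff:.1%}, acceptable={meets_threshold}")
```

A slope close to −3.32 corresponds to perfect doubling per cycle; shallower or steeper slopes indicate efficiencies above or below 100%, respectively.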
(i) A two-sample t-test was conducted to test whether blood RNA expression differed between cases and controls (overall and stratified by stage and by smoking status). Age, sex, and smoking variables were similarly distributed across the groups (Table 1) and were not associated with the expression of the 61 selected gene-targeting probes (gene-probes) among controls or cases. Analyses adjusted or unadjusted for these factors provided almost identical results. Unadjusted results are shown throughout the paper. We used the Benjamini-Hochberg procedure (26) to calculate the False Discovery Rate (FDR) to adjust for the ~22,000 comparisons and only further considered results with a maximum FDR≤0.1 (based on a single gene-probe p-value threshold of 0.001). In addition, only gene-probes with a fold change (FC) ≤0.66 for down-regulated gene-probes or ≥1.5 for up-regulated gene-probes were considered for follow-up in subsequent analyses. (ii) Since far fewer hypotheses (61 probes) were tested in the following analyses, less stringent significance criteria were applied (p-value <0.005). For analyses of tumor versus non-involved paired tissues from the same subjects, a linear mixed effects model was used to account for intra-person correlation. Gene-probes with p-value <0.005 and the same FC direction and intensity (i.e., FC ≤0.66 or ≥1.5) as in the case/control blood RNA comparison were selected for validation analyses. (iii) To validate the significant results, we analyzed: a) the qRT-PCR gene expression PWB-based data using the 2^(−ΔΔCt) method (27) to compare cases to controls and b) the microarray gene expression tissue-based data using a linear mixed effects model to compare tumor to non-involved paired lung tissue samples.
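For readers unfamiliar with relative quantification, the sketch below shows the arithmetic behind a ΔΔCt fold change in Python: the target gene Ct is first normalized to the reference gene (GAPDH in this study) within each group, and the case-minus-control difference is then exponentiated. All Ct values are hypothetical, the function name is ours, and the calculation assumes near-100% amplification efficiency for both genes.

```python
from statistics import mean

def fold_change_ddct(case_target, case_ref, ctrl_target, ctrl_ref):
    """Return the case/control fold change as 2 ** (-ddCt).

    Each argument is a list of Ct values (e.g. replicate means per group).
    """
    delta_case = mean(case_target) - mean(case_ref)   # dCt in cases
    delta_ctrl = mean(ctrl_target) - mean(ctrl_ref)   # dCt in controls
    ddct = delta_case - delta_ctrl
    return 2.0 ** (-ddct)

# Hypothetical Ct values: the target amplifies one cycle later in cases,
# while the reference gene is unchanged between groups.
fc = fold_change_ddct(
    case_target=[26.0, 26.0, 26.0],
    case_ref=[18.0, 18.0, 18.0],
    ctrl_target=[25.0, 25.0, 25.0],
    ctrl_ref=[18.0, 18.0, 18.0],
)
print(f"fold change = {fc:.2f}")
```

A fold change below 1 indicates lower expression in cases than in controls, and a value above 1 indicates higher expression.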
In addition, receiver operating characteristic (ROC) analysis was performed on the PWB-based validation data and the area under the curve (AUC) was estimated to assess the accuracy of the identified biomarkers, both individually and combined, in discriminating between lung cancer patients and controls.
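The AUC has a simple rank-based interpretation: it is the probability that a randomly chosen case has a higher marker score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it directly from that definition. The scores are hypothetical, and this is not the software used in the study (the analyses were run in R).

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve from pairwise case/control comparisons.

    Counts 1 when the case score exceeds the control score and 0.5 for ties.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical marker scores (higher = more case-like).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]

value = auc(cases, controls)
print(f"AUC = {value:.3f}")
```

For a down-regulated gene, the expression value would be negated (or the groups swapped) before scoring so that higher scores still point toward cases.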
All statistical analyses were conducted using the R programming language, v2.10.
(i) We compared mRNA expression from PWB in stage I adenocarcinoma cases versus controls, in the overall sample and stratified by smoking categories (Table 2). Two gene signatures in stage I cases were detected: one in the combined smokers and non-smokers (FDR=0.10) and the second among current smokers only (FDR=0.15). No significant results were found within subsets of former or never smokers (FDR=0.97 and 1.00, respectively). However, gene expression changes significant in the analysis among current smokers showed similar, although not significant, trends in the analyses among never and former smokers (data not shown). At the same time, the analysis among current smokers revealed distinct alterations, which might be particularly important for individuals who smoke. Thus, for the comparison of stage I cases vs. controls, we considered both results from all subjects and from current smokers only (221 and 144 gene-probes, respectively, 81 overlapping between the two). To increase specificity, we restricted the subsequent analyses to gene-probes with FC ≤0.66 or ≥1.5. The resulting 25 down-regulated gene-probes (20 genes) and 36 up-regulated gene-probes (30 genes) are shown in the heatmap of Figure 1 and Supplemental Material 2. In general, fold changes were stronger in the analysis restricted to current smokers than in the overall analysis. Since there was no significant difference in cigarettes per day or cumulative pack-years between cases and controls (Table 1) and the analysis adjusted for these covariates provided almost identical results, our findings are unlikely to be due to differences in smoking quantity between cases and controls. We verified whether the identified 61 gene-probes were also differentially expressed between cases and controls in late stage disease. FCs were consistently stronger in the analysis limited to stage I cases, but had concordant directions in all groups analyzed (Figure 3).
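The two-step selection rule used in this phase (an FDR cutoff followed by a fold-change filter) can be sketched in Python as follows. The Benjamini-Hochberg adjustment is implemented from its standard definition; the probe identifiers, p-values, and fold changes are hypothetical, and the helper names are ours.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_probes(probes, fdr_max=0.1, fc_low=0.66, fc_high=1.5):
    """Keep probes with q <= fdr_max and FC <= fc_low or FC >= fc_high."""
    q = benjamini_hochberg([p["p"] for p in probes])
    return [
        p["id"]
        for p, qi in zip(probes, q)
        if qi <= fdr_max and (p["fc"] <= fc_low or p["fc"] >= fc_high)
    ]

# Hypothetical gene-probes (illustrative values only).
probes = [
    {"id": "probe_a", "p": 0.001, "fc": 0.5},
    {"id": "probe_b", "p": 0.01, "fc": 1.2},
    {"id": "probe_c", "p": 0.03, "fc": 1.6},
    {"id": "probe_d", "p": 0.5, "fc": 1.0},
]

qvalues = benjamini_hochberg([p["p"] for p in probes])
selected = select_probes(probes)
print(selected)
```

In this toy example, probe_b passes the FDR cutoff but is dropped by the fold-change filter, which mirrors how the 221 and 144 FDR-selected gene-probes were narrowed to 61.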
(ii) We aimed to identify changes in gene expression related to early stage lung cancer that are detectable in both blood cells and lung tissue cells. Thus, for the 61 gene-probes (50 genes) in the analysis of stage I patients and controls (Figure 1 and Supplemental Material 2), we compared gene expression in tumor versus paired non-involved lung tissue samples in 15 stage I adenocarcinoma cases. We found that 10 probes from 8 genes (TGFBR3, RUNX3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3) were differentially expressed (p-values ≤0.003) in tumor compared to non-involved lung tissue samples and in the same direction and intensity as in stage I adenocarcinoma cases compared to controls (Figure 2).
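To make the paired tumor versus non-involved comparison concrete, the Python sketch below computes a fold change and a paired t-statistic from within-subject differences. This is a simpler stand-in for the linear mixed effects model used in the study; for a balanced design with one tumor and one non-involved sample per subject, a random-intercept model is expected to give the same test for the tissue effect. Expression values are assumed to be on the log2 scale (as produced by RMA) and are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_comparison(tumor_log2, normal_log2):
    """Return (fold_change, t_statistic) for paired log2 expression values.

    fold_change is 2 ** (mean within-subject difference, tumor - normal).
    """
    diffs = [t - n for t, n in zip(tumor_log2, normal_log2)]
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / math.sqrt(len(diffs))
    return 2.0 ** mean_diff, mean_diff / std_err

# Hypothetical log2 expression for one probe in four subjects.
tumor = [7.0, 6.5, 7.2, 6.8]
normal = [8.0, 7.7, 8.1, 7.8]

fc, t_stat = paired_comparison(tumor, normal)
down_regulated = fc <= 0.66  # same FC threshold as in the blood analysis
print(f"FC = {fc:.2f}, t = {t_stat:.1f}, down-regulated = {down_regulated}")
```

Converting the t-statistic to a p-value requires the t distribution with n − 1 degrees of freedom, which is left to a statistics package.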
(iii) We validated the PWB-based gene expression differences in stage I cases compared to controls using qRT-PCR measurements of RNA extracted from PWB of an additional 82 stage I adenocarcinoma patients and 130 controls from EAGLE. Each gene was covered by a single ABI probe with the exception of TARP, covered by both the TRGC2 and the TRGV9 ABI probes, due to overlap between these 3 genes. Results were strongly confirmed for all examined genes: RUNX3, TGFBR3, TRGC2/TARP, and TRGV9/TARP were significantly down-regulated in stage I lung cancer patients compared to controls (FCs = 0.6, 0.5, 0.5, 0.6, P-values = 1.0×10^−7, 1.4×10^−8, 3.4×10^−7, 2.6×10^−6, respectively) and VCAN, ACP1 and TSTA3 were significantly up-regulated in stage I lung cancer patients compared to controls (FCs = 1.2, 1.2, 1.3, P-values = 5.0×10^−3, 5.0×10^−3, 3.0×10^−3, respectively). We then validated gene expression differences between tumor and non-involved lung tissue samples for all 8 genes using microarray gene expression data from a previously published dataset (14), which included 27 adenocarcinoma and non-involved paired lung tissue samples. The direction of change was fully consistent with our original findings: RUNX3, TGFBR3, TRGV9, TARP, and TRGC2 were down-regulated in tumor compared to non-involved tissues (FCs = 0.7, 0.2, 0.3, 0.6, 0.5, P-values = 0.06, 3.0×10^−11, 4.0×10^−7, 3.4×10^−5, 2.5×10^−6, respectively; all significant except RUNX3, which was borderline) and VCAN, ACP1 and TSTA3 were significantly up-regulated in tumor compared to non-involved tissues (FCs = 2.6, 1.5, 2.5, P-values = 0.002, 5.7×10^−5, 3.8×10^−9, respectively). We evaluated the ability of PWB-based expression of each gene to discriminate lung cancer patients from controls in the validation set by means of receiver operating characteristic (ROC) curves (Figure 4).
The area under the curve (AUC) ranged from 0.55 (95% CI = 0.46–0.64) for ACP1 to 0.73 (95% CI = 0.66–0.81) for TGFBR3 (Figure 4), thus indicating a reasonable discrimination power between lung cancer cases and controls for most genes when considered individually. In addition, a combination of all markers based on a logistic regression model showed the best diagnostic accuracy with an AUC of 0.81 (95% CI=0.74–0.87, red ROC curve in Figure 4).
We identified a gene expression signature from blood samples consisting of 8 genes (RUNX3, TGFBR3, TRGC2, TRGV9, TARP, ACP1, VCAN and TSTA3) that differentiates stage I lung adenocarcinoma cases from controls and mirrors cancer-related gene expression changes in the target tissue. Results were validated in additional independent sets of tissue-based and blood-based gene expression analyses of adenocarcinoma cases and controls. Although present in all stages, expression changes were weaker in advanced stages, possibly because of secondary changes due to the spread of the disease. Similarly, the changes were stronger in current smokers but present in all smoking categories. The accuracy in discriminating between stage I lung adenocarcinoma cases and controls was good for most genes when considered separately, in particular those that were down-regulated in cases relative to controls. A multiplex model based on the expression of all 8 genes combined showed high diagnostic accuracy, with an AUC of 0.81 (Figure 4). If further validated in prospective studies using PWB of cases drawn prior to lung cancer diagnosis (28), this gene expression signature may be used as a blood-based biomarker for early detection of lung adenocarcinoma in heavy smokers at high risk of lung cancer. We validated its use in current smokers. Further study in never and former smokers is warranted. Moreover, it will be important to test the identified biomarkers in other lung cancer histologies.
The identified genes are promising with regard to potential mechanistic relevance. RUNX3 (runt-related transcription factor 3), down-regulated in our analyses and with an AUC of 0.69, is involved in the negative regulation of epithelial cell proliferation, functions as a tumor suppressor, and is frequently deleted or transcriptionally silenced in cancer. Hypermethylation of RUNX3 has also been associated with the evolution of lung cancer (29) and specifically of lung adenocarcinoma (30). In addition, higher protein expression of RUNX3 has been associated with increased survival from lung adenocarcinoma (31). TGFBR3 (transforming growth factor beta receptor III) encodes a glycoprotein that binds TGFB, a cytokine that modulates several tissue development and repair processes. TGFBR3 is the TGF-beta component most commonly down-regulated at both the message and protein levels in several cancers (32–36), including non-small cell lung cancer (37). Our study is the first to show down-regulation of TGFBR3 mRNA expression in both blood and tumor tissue cells of lung adenocarcinoma patients. TGFBR3 showed the highest accuracy among the single gene models in discriminating cases from controls (AUC = 0.73). TRGC2 (T cell receptor gamma constant 2), TRGV9 (T cell receptor gamma variable 9), and TARP (T cell receptor gamma alternate reading frame protein) are colocalized at chromosome locus 7p14.1, close to the 7p14.3 chromosomal region that frequently shows allelic loss in non-small cell lung cancer (38). TARP is embedded within the TCR gamma locus, and cDNA probes that detect TCR gamma mRNA also detect TARP mRNA. Accordingly, probes in TRGC2, TRGV9, and TARP showed very similar results in our study. TRGV9 cells have been shown to contribute to the natural immune surveillance against colon cancers (39). TARP has been previously studied as a prostate-specific gene and an androgen-regulated protein that may carry out its biological functions via action on mitochondria (40).
Down-regulation in cases with respect to controls and in tumor compared to non-involved tissues of TRGC2, TRGV9, and TARP points to an immune-related alteration as a possible contribution to lung adenocarcinoma development. Case-control discrimination based on TRGC2, TRGV9, and TARP was also good (average AUC = 0.70). ACP1 (acid phosphatase 1) gene, up-regulated in our analysis, is polymorphic and encodes at least two electrophoretically different isozymes. An increase of fast isozyme concentration increases cancer cells’ invasiveness, whereas a decrease of slow isozyme concentration in cancer results in cancer cell proliferation (41). In the validation set ACP1 showed the poorest accuracy in discriminating cases and controls (AUC = 0.55). VCAN (versican) encodes a protein involved in cell adhesion, proliferation, migration, angiogenesis, tissue morphogenesis and maintenance. VCAN was initially identified in cultures of lung fibroblasts (42) and has been recognized to play a role in the invasion of several cancers (43) including lung cancer (44). VCAN mRNA expression was up-regulated in both lung tumor tissue and PWB of adenocarcinoma cases in our study. TSTA3 (tissue specific transplantation antigen P35B) gene, also up-regulated in our analysis, is involved in the expression of many glycoconjugates. Intriguingly, TSTA3 is located in chromosomal region 8q24, which contains several polymorphic variants recently associated with several cancers (45–47). VCAN and TSTA3 also showed a reasonable performance in discriminating between cases and controls (AUC = 0.61 and 0.59, respectively). In addition to the described eight genes, we also identified 42 additional genes whose expression in PWB distinguishes stage I lung adenocarcinoma from controls (Figure 1 and Supplemental Material 2) and was stronger among subjects who currently smoked. 
If further confirmed in additional blood-based analyses, these genes could also contribute to the detection of early lung adenocarcinoma lesions.
In conclusion, gene expression changes from peripheral blood samples can differentiate early stage lung adenocarcinoma cases from controls and resemble gene expression changes in early stage lung adenocarcinoma tissue. This finding suggests that early processes of lung adenocarcinoma development may lead to systemic alterations that can be detected in peripheral blood tests. Gene expression from PWB can provide an important tool for the identification of early detection markers of cancer in the future.
Financial support: This research was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD.
We would like to thank the EAGLE participants and study collaborators listed on the EAGLE website (http://eagle.cancer.gov/).
Conflicts of interest: The authors declare no conflicts of interest.