Lung cancer is the leading cause of cancer-related mortality in both men and women throughout the world. Detecting lung cancer at an early, potentially curable stage is essential and may reduce mortality by 20%. The aim of this study was to identify distinct proteomic profiles in bronchoalveolar lavage fluid (BALF) and plasma that are able to discriminate individuals with benign disease from those with non-small cell lung cancer (NSCLC).
Using label-free mass spectrometry analysis of BALF during discovery-phase analysis, a significant number of proteins were found to have different abundance levels when comparing control to adenocarcinoma (AD) or squamous cell lung carcinoma (SqCC). Validation of candidate biomarkers identified in BALF was performed in a larger cohort of plasma samples by enzyme-linked immunosorbent assay (ELISA).
Four proteins (Cystatin-C, TIMP-1, Lipocalin-2 and HSP70/HSPA1A) were selected as a representative group from discovery phase mass spectrometry BALF analysis. Plasma levels of TIMP-1, Lipocalin-2 and Cystatin-C were found to be significantly elevated in AD and SqCC compared to control.
The results presented in this study indicate that BALF is an important proximal biofluid for the discovery and identification of candidate lung cancer biomarkers.
There is good correlation between the trend of protein abundance levels in BALF and that in plasma, which supports this approach to developing a blood biomarker to aid lung cancer diagnosis, particularly in the era of lung cancer screening. The protein signatures identified also provide insight into the molecular mechanisms associated with lung malignancy.
Lung cancer is the most common cancer worldwide. In 2012 it contributed 13% of the total number of new cases diagnosed, and it is the most common cause of cancer-related mortality, accounting for more than 1.4 million deaths per year globally. The overall prognosis remains poor, with just over one in eight lung cancer patients living for five years or more after their diagnosis. Approximately 85% to 90% of patients with lung cancer have had direct exposure to tobacco. Other risk factors include environmental smoke exposure and occupational exposure to agents such as asbestos.
Lung cancer is categorised into two groups, small cell (SCLC) and non-small cell (NSCLC). The majority of cases are NSCLC (85%), of which 40% are adenocarcinoma (AD), 25–30% squamous cell carcinoma (SqCC), 10–15% large cell carcinoma (LCC), and the remainder rarer variants such as mixed/undifferentiated pulmonary carcinomas. Substantial progress has been made in our understanding of the biological processes and mutations that cause lung cancer. This has led to the development of targeted therapy for lung cancer, with some marked improvement in survival in selected groups. Stereotactic radiotherapy for early stage disease is also a promising new treatment modality for those who are unfit for surgery, and has demonstrated excellent control of local disease. Despite this array of new targeted treatments, cure remains elusive for the vast majority of patients diagnosed with lung cancer.
Approximately 85% of patients with lung cancer are symptomatic at presentation; the remainder are detected on chest radiographs performed for an unrelated health issue. Computed tomography (CT) screening of high-risk individuals, particularly smokers, is helping to detect the disease in its early, more curable stages. Screening of asymptomatic individuals has been shown to decrease mortality by 20% in those aged 55–74 years who are current smokers or quit within the last 15 years. Lung cancer screening has now been endorsed by a number of different medical societies and is available to many in the United States. However, although screening has been implemented, concerns remain about its high false-positive rate, unnecessary interventions, and overall cost-effectiveness. Efforts to further risk stratify patients clinically beyond the criteria employed in the National Lung Screening Trial, along with technological developments in lung nodule volume measurement, may ultimately reduce the number and frequency of low-dose CT scans required for population screening. Developing a lung cancer biomarker to complement radiological imaging is a strategy that may reduce false-positive and false-negative screening rates, improve cost-effectiveness, and aid the earlier detection of this disease.
Biomarkers circulating in blood could constitute the gold standard for non-invasive cancer diagnostics. Blood is the ideal final test medium; however, its high dynamic range makes the identification of candidate protein biomarkers difficult. This dynamic range, spanning 10–12 orders of magnitude, arises because a few abundant proteins, such as albumin, represent more than 99% of the total protein mass. Fractionation techniques, such as immunodepletion, can reduce this dynamic range, but biomarker discovery in blood remains a challenge.
One approach to overcome this problem is to analyse more proximal biofluids such as saliva, urine, exhaled breath condensate and bronchoalveolar lavage fluid (BALF). Proximal biofluids have a much reduced dynamic range of protein abundances and in some cases are in direct contact with the site of the disease. BALF is routinely sampled during bronchoscopy in individuals with suspected lung cancer. It contains a wide variety of cellular material such as macrophages and neutrophils, a large number of proteins produced by epithelial and inflammatory cells, and tumour cells if present.
In the present study, our aim was to identify altered levels of candidate protein biomarkers in BALF samples from individuals with lung cancer compared to a control group diagnosed with benign nodules or sarcoidosis. Using quantitative mass spectrometry, we identified significantly changed proteins and catalogued their biological processes and molecular functions. With this information, a number of candidate biomarkers were selected for verification and initial validation in blood samples from lung cancer and control groups. This study may enhance our knowledge of the proteomic profile induced in the lung microenvironment by the presence of tumour cells, and may help elucidate the protein signature created by the tumour cells themselves. Furthermore, this data may facilitate the identification of a useful clinical biomarker panel for the diagnosis and monitoring of lung cancer in the future.
To remove debris and cells, BALF samples were centrifuged at 4000 × g for 15 min at 4 °C. Proteins from the resulting supernatants were concentrated using 5 kDa Amicon filters (Millipore) by centrifugation at 4000 × g for 45 min at 4 °C. A 200 μl aliquot of the resulting retentate was treated using the ReadyPrep 2-D clean-up kit (Bio-Rad) to precipitate protein. The isolated BALF proteins were resuspended in buffer containing 8 M urea/50 mM NH4HCO3/0.1% ProteaseMax. The protein amount was estimated using an RC/DC protein assay from Bio-Rad. Mass spectrometry analysis was performed according to a previously optimised protocol.
Progenesis label-free LC-MS software version 3.1 from Non-Linear Dynamics (Newcastle upon Tyne, UK) was used to process the raw data generated from LC-MS/MS analysis. Data alignment was based on the LC retention time of each sample, allowing for any drift in retention time and giving an adjusted retention time for all runs in the analysis. The sample run that yielded the most features (i.e. peptide ions) was used as the reference run. The retention times of all of the other runs were aligned to this reference run and peak intensities were then normalized. A number of criteria were applied to ensure proper identification of BALF-derived proteins: an ANOVA p-value between experimental groups of ≤ 0.05, a fold change ≥ 2, and ≥ 2 peptides matched per protein.
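The three selection criteria above can be sketched as a simple filter. This is a minimal illustration in Python, not the Progenesis implementation; the record type `ProteinResult`, its field names, and the example values are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    """One protein-level result (hypothetical record layout)."""
    accession: str       # placeholder identifier, not a real accession number
    anova_p: float       # ANOVA p-value between experimental groups
    fold_change: float   # ratio of highest to lowest mean abundance
    peptide_count: int   # number of peptides matched to the protein


def passes_criteria(result, max_p=0.05, min_fold=2.0, min_peptides=2):
    """Return True if the protein meets all three thresholds stated in the text."""
    return (
        result.anova_p <= max_p
        and result.fold_change >= min_fold
        and result.peptide_count >= min_peptides
    )


def select_candidates(results):
    """Keep only the proteins that pass every criterion."""
    return [r for r in results if passes_criteria(r)]


# Made-up example values, for illustration only.
example_results = [
    ProteinResult("P_EXAMPLE_1", anova_p=0.001, fold_change=4.8, peptide_count=5),
    ProteinResult("P_EXAMPLE_2", anova_p=0.20, fold_change=3.0, peptide_count=4),   # fails p-value
    ProteinResult("P_EXAMPLE_3", anova_p=0.01, fold_change=1.5, peptide_count=6),   # fails fold change
    ProteinResult("P_EXAMPLE_4", anova_p=0.01, fold_change=2.5, peptide_count=1),   # fails peptide count
    ProteinResult("P_EXAMPLE_5", anova_p=0.05, fold_change=2.0, peptide_count=2),   # on the thresholds
]

for protein in select_candidates(example_results):
    print(protein.accession)
```

All three conditions must hold together, so a protein with a large fold change but a single matched peptide is excluded. Values lying exactly on a threshold are retained because the criteria are stated as inclusive.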
The following DuoSet ELISA kits (R&D Systems, Oxon, UK) were used for the verification study: Human HSP70/HSPA1A (DY1663-05), Human Cystatin C (DY1196), Human TIMP-1 (DY970-05) and Human Lipocalin-2/NGAL (DY1757). Each ELISA was performed according to the manufacturer's protocol and guidelines. The optical density (OD) of each sample was read using a microplate reader (Bio-Tek), and the concentration of each protein in the serum samples was determined by comparison with the standard curve calculated for each ELISA kit.
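As an illustration of how an OD reading can be converted to a concentration from a standard curve, the sketch below assumes a four-parameter logistic (4PL) model, which is a common choice for ELISA data. The source does not state which curve model was used, and the parameter values in `HYPOTHETICAL_FIT` are invented for the example rather than fitted to any kit.

```python
import math


def four_pl(conc, a, b, c, d):
    """OD predicted by a 4PL curve at concentration `conc`.

    a: OD at zero concentration, d: OD at saturation,
    c: concentration at the inflection point, b: slope factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def od_to_concentration(od, a, b, c, d):
    """Invert the 4PL curve to recover a concentration from a measured OD."""
    lower, upper = min(a, d), max(a, d)
    if not (lower < od < upper):
        # Readings at or beyond the asymptotes cannot be interpolated.
        raise ValueError("OD lies outside the range of the standard curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


# Invented parameters (not from any kit datasheet); concentration units arbitrary.
HYPOTHETICAL_FIT = {"a": 0.05, "b": 1.2, "c": 500.0, "d": 2.5}

measured_od = four_pl(250.0, **HYPOTHETICAL_FIT)
estimate = od_to_concentration(measured_od, **HYPOTHETICAL_FIT)
print(f"OD {measured_od:.3f} corresponds to a concentration of {estimate:.1f}")
```

In practice the four parameters would be fitted to the ODs of the kit standards on each plate, and samples reading outside the curve would be re-assayed at a different dilution rather than extrapolated.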
Four proteins (Cystatin-C, TIMP-1, Lipocalin-2 and HSP70/HSPA1A) were selected as a representative group from the discovery phase experiments in BALF to verify in patient plasma samples. Commercially available ELISA kits were used to evaluate the abundance of these proteins in the verification plasma sample cohort. Cystatin-C, TIMP-1, Lipocalin-2 and HSP70/HSPA1A passed the following criteria for selection: p-value ≤ 0.05, protein fold change > 2 and ≥ 2 peptides matched per protein. Label-free mass spectrometry analysis of BALF samples found that Cystatin-C was elevated in AD compared to SqCC (4.8-fold; p-value = 4.89E-08) and elevated in SqCC compared to control (4.5-fold; p-value = 0.01) (Tables – supplemental). TIMP-1 was elevated in AD compared to SqCC (2.2-fold; p-value = 0.0001), elevated in SqCC compared to control (2.2-fold; p-value = 0.02) and elevated in AD compared to control (2-fold; p-value = 2.39E-05) (Tables – supplemental). Lipocalin-2 had an increased abundance in SqCC compared to control (2.2-fold; p-value = 0.006) (Tables – supplemental). HSP70/HSPA1A was increased in AD compared to SqCC (2.2-fold; p-value = 0.001) and in control compared to SqCC (2.1-fold; p-value = 0.002) (Tables – supplemental).
Blood is an accessible, acceptable, and widely used biological sample for identification of biomarkers of disease. However, one of the greatest challenges in proteomic analysis of both serum and plasma samples is the wide range of concentration of different proteins. The dynamic range of proteins in blood limits the ability to examine the blood proteome for discovery due to the presence of a few proteins at very high concentrations.
Arising from the inherent analytical challenges related to blood proteomics, proximal fluids have gained increasing attention for conducting candidate protein biomarker discovery. Compared to distal biofluids (e.g. blood), proximal biofluids are less complex and are likely to be significantly enriched in potential biomarker candidates due to their vicinity to the site of disease. Once a candidate biomarker is discovered in a proximal biofluid, targeted verification and validation can then be performed in blood.
BALF is a mixture of different cell types as well as a wide variety of soluble components such as phospholipids, nucleic acids, peptides and proteins. The utility of BALF has been exploited for many years in clinical research, with recent technological advances permitting detailed proteomic profiling of the protein/peptide signature in particular lung diseases. This approach provides a rich source of candidate biomarkers, in addition to providing insight into the complex pathological mediators associated with lung diseases at the molecular level.
The data presented in this study demonstrate a marked increase in the abundance of proteins involved in cellular and metabolic processes in patients with lung cancer compared to controls. In agreement with our findings, Almatroodi et al. published a list of proteins found to be increased in BALF from primary lung adenocarcinoma compared to controls using liquid chromatography-mass spectrometry. Of the 33 proteins reported in that study as consistently overexpressed in lung adenocarcinoma samples compared to controls, ACTN4, ANXA2, CLIC1, GRP78, H4, LKHA4, S10A8 and SAMH1 were also found to be of greater abundance in AD compared to control groups in this study.
Following selection of four candidate proteins for verification in plasma, TIMP-1, Lipocalin-2 and Cystatin-C were all found to be significantly elevated in NSCLC compared to control. This indicates that there is a good level of agreement between proteins found to have different abundance levels in BALF (cancer vs. control) and in plasma samples (cancer vs. control). Furthermore, the increase was a consistent finding irrespective of cancer subtype, i.e. AD or SqCC.
TIMP-1 was found to be elevated in SqCC compared to control and in AD compared to control when analysing BALF samples by mass spectrometry. Verification of these results in plasma samples found that TIMP-1 had significantly higher levels in AD and SqCC compared to the control group as measured by ELISA. TIMPs are natural inhibitors of matrix metalloproteinases (MMPs) present in most tissues and body fluids. By inhibiting MMP activity, they participate in remodelling of the extracellular matrix. The balance between MMP and TIMP activity is involved in both normal and pathological processes such as wound healing, tissue remodelling, angiogenesis, tumour development and metastasis. In cancer, MMP over-expression is thought to play an important role in tumour invasion and metastasis. The major function of TIMP-1 is as an inhibitor of MMPs, but several independent roles in tumour development are likely.
Neutrophil gelatinase-associated lipocalin (NGAL), also known as Lipocalin-2, is a 178-amino acid protein which exists in three molecular forms: a 25-kDa monomer, a 45-kDa homodimer, and a 135-kDa heterodimer. A number of malignant tumours consistently overexpress NGAL, with increased concentrations in blood, urine, and other biological fluids. In this study, BALF levels of Lipocalin-2 were increased in SqCC compared to control, and mean plasma concentrations were significantly elevated in AD and SqCC compared to control during verification analysis. NGAL is also commonly associated with tumour size, stage, and invasiveness. Ricci et al. found an association between NGAL serum levels and bladder cancer stage. The authors suggest that serum NGAL may be a useful non-invasive biomarker to provide clinical information for bladder cancer disease management. Studies investigating the role of NGAL in lung cancer have found high levels of NGAL associated with lung adenocarcinoma at both the protein and mRNA level.
Cystatin C is a small 13 kDa protein that is a member of the cysteine proteinase inhibitor family. Cystatin-C was elevated in SqCC compared to control during discovery phase analysis of BALF. This was confirmed by ELISA in plasma, where it was significantly elevated in AD and SqCC compared to control. Cystatin-C is likely to be biologically relevant, as an imbalance between cysteine proteases and their inhibitors arises in malignancy, influencing tumour cell invasion and metastasis. Microarray analysis has revealed that cystatin C was one of the most highly upregulated genes in multiple myeloma. Additionally, Kos et al. demonstrated a significant correlation between increased serum cystatin C and malignant progression in melanoma and colorectal cancer.
While the main focus of this investigation was to identify proteins with different abundance levels in BALF and verify these results in plasma samples, a significant number of proteins were found to be changed in BALF samples when comparing AD to SqCC. A finding of particular interest was that folate receptor alpha was increased 14-fold in AD compared to SqCC. Previous work has shown that folate receptor alpha expression had a high discriminatory capacity for lung adenocarcinomas versus squamous cell carcinomas as determined by immunohistochemistry. The use of MS-based quantitative proteomics as an additional source of information to help clinicians classify tumours is a significant area in development, as more treatments are specifically aimed at squamous or non-squamous cell carcinoma. A better understanding of the NSCLC histological subtypes may also provide valuable insight in support of developing targeted therapeutics that are specific for precisely classified lung tumours. Several new targeted therapies have recently been approved for non-squamous NSCLC that inhibit Vascular Endothelial Growth Factor (VEGF), Epidermal Growth Factor Receptor (EGFR), Anaplastic Lymphoma Kinase (ALK), and ROS proto-oncogene 1 (ROS1). As more driver mutations are discovered and therapeutic compounds developed to target these abnormal pathways, precise diagnosis of lung cancer and its subtypes will be crucial. Protein expression and signalling in either BALF or plasma may have clinical relevance in the future through its ability to precisely differentiate subtypes and facilitate lung cancer diagnosis.
The variety of patients treated in the clinic, tumour heterogeneity and treatment-related factors will significantly influence the protein constituents of BALF. This is an important limitation in the identification of accurate biomarker profiles, and necessitates the development of biomarker panels and the stratification of patients by previous medical history so that measured protein levels can be interpreted as accurately as possible. Proteins are dynamic and constantly turned over in cells, characteristics that make them very sensitive to factors such as disease severity and previous or current medication history. A single BALF-based biomarker is unlikely to have sufficient sensitivity or specificity for use as a stand-alone test; a panel of unrelated biomarkers may therefore be more effective. Ambiguous results from biomarker panel testing may indicate technical difficulties with measuring multiple biomolecules. It is also likely that some complex diseases are not amenable to monitoring with biomarkers, and that other approaches such as scanning-based procedures may be more suitable.
The BALF proteome is a very complex mixture, with many of the top protein hits by mass spectrometry found to be well-established blood-based proteins, namely albumin, haptoglobin and immunoglobulin proteins. However, when compared to unfractionated plasma/serum digests, BALF yields more identifications of low-abundance proteins and therefore may facilitate the analysis of more cellular secreted/shed proteins associated with disease. Watson and co-workers presented data showing the differences in the proteolytic fingerprints of serum and BALF in guinea pigs. This is very important in the context of protein degradation, the subsequent measurement of candidate biomarkers in serum compared to BALF, and the interpretation of data from these two pools of bodily fluids.
In conclusion, the findings of this study indicate that the BALF proteome is significantly altered in NSCLC samples compared to control. In addition, significant differences in the protein profile between AD and SqCC in BALF could also be used to help distinguish lung cancer sub-types in this population. The elevation of specific proteins in patients diagnosed with AD and SqCC may be used to develop a specific biomarker panel associated with each NSCLC sub-type. Certain candidate proteins, particularly Cystatin-C, TIMP-1 and Lipocalin-2, had altered abundance levels in BALF that correlated well with increased abundance in plasma. Proteins expressed at different levels in BALF or plasma may also provide insight into the molecular mechanisms associated with malignancy, particularly in relation to metabolism, cellular processing, and the immune response.
Overall the findings of this study support the use of BALF as a robust proximal biofluid for the discovery of candidate biomarkers in lung cancer. Our results demonstrate that BALF proteomic profiles are transferrable into blood, which is a promising finding in the search for non-invasive biomarkers to diagnose and monitor lung cancer.
This research work was funded by the Ministry of Higher Education and Scientific Research of Libya.
LC-MS facilities were funded by competitive awards from Science Foundation Ireland (12/RI/2346 (3)) and the Irish Higher Education Authority.
The following are the supplementary data related to this article.
List of differentially expressed proteins (complete list) when comparing control to AD BALF patient samples. The table includes information on accession number, peptide count, confidence score (XCorr), ANOVA (p-value), fold-change, highest/lowest condition and protein description.
List of differentially expressed proteins (complete list) when comparing control to SqCC BALF patient samples. The table includes information on accession number, peptide count, confidence score (XCorr), ANOVA (p-value), fold-change, highest/lowest condition and protein description.
List of differentially expressed proteins (complete list) when comparing SqCC to AD BALF patient samples. The table includes information on accession number, peptide count, confidence score (XCorr), ANOVA (p-value), fold-change, highest/lowest condition and protein description.