Conceived and designed the experiments: LZ SK DC BK DTW. Performed the experiments: LZ HX. Analyzed the data: LZ HX HZ DE. Contributed reagents/materials/analysis tools: SK BK JG DA XY. Wrote the paper: LZ.
A sensitive assay to identify biomarkers using non-invasively collected clinical specimens is ideal for breast cancer detection. While there are other studies showing disease biomarkers in saliva for breast cancer, our study tests the hypothesis that there are breast cancer discriminatory biomarkers in saliva using de novo discovery and validation approaches. This is the first study of this kind and no other study has engaged a de novo biomarker discovery approach in saliva for breast cancer detection. In this study, a case-control discovery and independent preclinical validations were conducted to evaluate the performance and translational utilities of salivary transcriptomic and proteomic biomarkers for breast cancer detection.
Salivary transcriptomes and proteomes of 10 breast cancer patients and 10 matched controls were profiled using Affymetrix HG-U133-Plus-2.0 Array and two-dimensional difference gel electrophoresis (2D-DIGE), respectively. Preclinical validations were performed to evaluate the discovered biomarkers in an independent sample cohort of 30 breast cancer patients and 63 controls using RT-qPCR (transcriptomic biomarkers) and quantitative protein immunoblot (proteomic biomarkers). Transcriptomic and proteomic profiling revealed significant variations in salivary molecular biomarkers between breast cancer patients and matched controls. Eight mRNA biomarkers and one protein biomarker, which were not affected by the confounding factors, were pre-validated, yielding an accuracy of 92% (83% sensitive, 97% specific) on the preclinical validation sample set.
Our findings support that transcriptomic and proteomic signatures in saliva can serve as biomarkers for the non-invasive detection of breast cancer. The salivary biomarkers possess discriminatory power for the detection of breast cancer, with high specificity and sensitivity, which paves the way for prediction model validation study followed by pivotal clinical validation.
Early detection of breast cancer is the key to positive, long-lasting outcomes, thus reducing the suffering and cost to society associated with the disease. The high burden of breast cancer in women worldwide underscores the unmet potential of biomarkers for early detection. A significant obstacle to early detection of breast cancer is the lack of methods that efficiently and accurately identify potentially affected individuals.
Breast cancer has been among the earliest and most intensely studied diseases using gene expression profiling and protein profiling technologies. The resulting molecular signatures help reveal the biological spectrum of breast cancers, providing diagnostic tools as well as prognostic and predictive gene signatures. Breast cancer detection is currently based on physical examination and imaging (mammography, ultrasound, and MRI), although emerging methods include direct examination of the cytomorphology of exfoliated cells and the molecular analysis of tumor biomarkers in nipple aspirate fluid or in ductal lavage. In the last decade, biomarker discoveries for breast cancer detection have focused on blood and/or tissue, using proteomic, transcriptomic, and genomic approaches. In comparison to prognostic biomarkers, the development of detection biomarkers has been limited, mainly due to a lack of sensitivity and specificity for this clinical context. Most importantly, the use of tissue biomarkers for early detection will be limited to patients at very high risk because they rely on invasive procedures.
Recently, the study of salivary biomarkers has developed beyond oral diseases to systemic diseases, broadening the potential for systemic disease detection. Saliva-based translational research and technology is now at a mature juncture and can be evaluated to determine its utility for breast cancer detection. Exploratory studies have evaluated the potential use of salivary proteins such as c-erbB-2, VEGF, EGF, and CEA in the initial detection and/or follow-up screening for the recurrence of breast cancer. However, these investigations were not based on biomarker discoveries from saliva specimens; rather, they tested blood biomarkers in saliva. Here, we report the use of transcriptomic and proteomic approaches to discover and pre-validate biomarkers in saliva for the non-invasive detection of breast cancer. Our results demonstrate significant differences in salivary transcriptomic and proteomic profiles between breast cancer patients and controls. The discovered salivary biomarkers possess discriminatory power for the detection of breast cancer, with high specificity and sensitivity.
Schematic of the study design and demographic information of all subjects used for the discovery and pre-validation phases are shown in Figure 1 and Table 1, respectively. Transcriptomic profiling identified 1402 genes exhibiting >2 fold up-regulation, and 2247 genes exhibiting >2 fold down-regulation, in the saliva of breast cancer patients, relative to the matched controls (n=20, P<0.05). These transcriptomic changes were unlikely to be due to chance alone (χ2 test, P<0.0001), considering the false positive rate with P<0.05. Using a predefined criterion of a change in regulation >2-fold, and a more stringent cutoff of P<0.01, 358 up-regulated and 943 down-regulated transcripts were identified in the saliva of breast cancer samples. RT-qPCR was performed to verify the microarray results on the discovery sample set (n=20). The top 27 up-regulated candidates (Table S1) were selected based on p-value and fold-change (P<0.01, and >10-fold). The RT-qPCR results confirmed that the relative RNA expression levels of 11 up-regulated transcripts were consistent with the microarray. These verified transcriptomic biomarker candidates were then subjected to independent pre-validation by RT-qPCR using a cohort of 30 breast cancer patients and 63 controls (Figure 1). Eight up-regulated genes were pre-validated, showing significant differences between breast cancer and healthy controls (n=93, Table 2).
Proteomic profiling by 2D-DIGE revealed 35 up-regulated proteins/spots and 32 down-regulated proteins/spots in the saliva of breast cancer patients, relative to the matched controls (n=20). Twenty spots, 14 up-regulated (>1.5 fold) and 6 down-regulated (>1.5 fold), were selected for protein identification, resulting in the identification of 10 up-regulated and 4 down-regulated proteins (Table S1). Four proteins (carbonic anhydrase VI (CA6), psoriasin, transthyretin, and cyclophilin A) with available antibodies were subjected to verification using immunoblot on the discovery sample set. The levels of CA6 and psoriasin between cancer and control samples showed significant differences (p=0.012 and 0.014, respectively). These verified proteomic biomarker candidates were then independently validated by protein immunoblotting using the pre-validation cohort (30 breast cancer patients versus 63 controls). The level of CA6 showed a significant difference between breast cancer and healthy controls (n=93, Table 2).
Using logistic regression, the accuracy, sensitivity, and specificity of the combination of nine validated biomarkers on the pre-validation sample set (n=93) were 92% (86 of 93), 83% (25 of 30), and 97% (61 of 63), respectively (Figure 2A). Principal component analysis (PCA) of this 9-biomarker combination could separate the breast cancer patients from the controls along the first principal component (t-test, P-value=2.7E-15) (Figure 2B). None of the confounding factors (age, ethnicity, smoking status, menopausal status, and HRT treatment) significantly affected the validated biomarkers (Table 2). These results indicate that cancer onset is a major source of variation in the expression of the validated biomarkers. Furthermore, cross-disease comparisons showed that none of the validated mRNA biomarkers' expression was significantly altered in other salivary transcriptomic profiling studies, indicating their specificity for breast cancer detection (Table 3).
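As a worked check of the reported performance figures, the following minimal sketch recomputes accuracy, sensitivity, and specificity from the counts given in the text (25 of 30 cancers and 61 of 63 controls classified correctly). The function name and its argument names are illustrative and do not come from the study.

```python
def classification_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    total = true_pos + false_neg + true_neg + false_pos
    accuracy = (true_pos + true_neg) / total
    sensitivity = true_pos / (true_pos + false_neg)   # correctly called cancers
    specificity = true_neg / (true_neg + false_pos)   # correctly called controls
    return accuracy, sensitivity, specificity


# Counts taken from the text: 25/30 cancers and 61/63 controls correct.
acc, sens, spec = classification_metrics(
    true_pos=25, false_neg=5, true_neg=61, false_pos=2
)
print(f"accuracy={acc:.1%} sensitivity={sens:.1%} specificity={spec:.1%}")
```

Rounded to whole percentages, these values agree with the 92%, 83%, and 97% reported above.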
Early detection of breast cancer offers the promise of easier treatment (smaller surgeries, less radiation or chemotherapy) and improved survival. Conventional screening (physical examination and mammography) has a less-than-desirable sensitivity and specificity. There is a soaring need for new therapeutic strategies, as well as biomarkers that can achieve effective non-invasive early detection of breast cancer. Our long-term goal is to develop a saliva-based non-invasive tool for the early detection of breast cancer. We envision a clinical context in which a salivary test may enable clinicians to detect breast cancer earlier (by identifying patients warranting closer follow-up and additional imaging), and reduce the number of unnecessary biopsies (currently about 80% according to the American Cancer Society), in a cost-effective manner. The purpose of this study, which is an essential step toward attaining our long-range goal, is to evaluate the potential utility of salivary transcriptomes and proteomes for breast cancer detection. We applied two high-throughput technologies in order to assess 1) whether the salivary transcriptome and proteome profiles change with the onset of breast cancer, and 2) whether discriminatory biomarkers can be identified and validated. By addressing both questions, our profiling results, and further independent validation of the discovered biomarkers, will open new research directions and support the idea that saliva is a useful biomarker source for breast cancer detection.
The salivary transcriptome is a novel diagnostic alphabet we have explored for discovering breast cancer biomarkers. Salivary transcriptional profiling technology has been successfully applied for discovering detection biomarkers of resectable pancreatic cancer. Consistent with that study, high-throughput profiling revealed significant variations in gene signature profiles between the breast cancer patients and the controls, demonstrating that the salivary transcriptome is an informative biomarker source for systemic cancer detection. The gene ontology analysis could categorize the 1301 up/down-regulated genes (>2 fold up/down-regulation, P<0.01) into various biological processes based on their known roles or functions. The 1301 genes were enriched in functions related to metabolic processes (35.46%), biological regulation (30.31%), and regulation of biological process (28.24%) (Figure S1). Based on the microarray data of 358 up-regulated transcripts (>2-fold change, P<0.01), breast cancer patients (n=10) and matched controls (n=10) could be classified into two distinct groups using unsupervised clustering, indicating the discriminatory power of salivary mRNA biomarkers (Figure S2). Our aim with transcriptomic profiling is not to identify large numbers of differentially expressed genes; rather, we seek to find a small number of truly differentially expressed genes that can be validated. In this study, eight of the 27 top up-regulated transcripts (P<0.01, and >10-fold) were pre-validated using an independent cohort, yielding a validation rate of 29.6%, which is similar to that of one of our previous studies for pancreatic cancer (validation rate, 24.5%).
Proteomic profiling, without independent validation, has been recently performed for discovering salivary biomarkers using stimulated whole saliva. The results of our proteomic study overlap little with this previous proteomic profiling. This discrepancy could be due to the use of different disease types (invasive ductal carcinomas (IDC) versus ductal carcinoma in situ (DCIS)), different sample materials (unstimulated versus stimulated saliva), and different technical platforms. More importantly, we have conducted a pre-validation of the discovered protein biomarkers using an independent sample set. Interestingly, CA6, which was validated in our study, was also discovered in this previous proteomic profiling study using saliva samples from non-invasive breast cancer patients (DCIS), indicating the potential of this biomarker for the early detection of breast cancer.
In order to obtain a more realistic estimate of the clinical utility of the validated biomarkers, and to avoid the consequences of potential data overfitting, we employed leave-one-out cross-validation. The cross-validation error rate (cv.err) gives a more accurate estimate of the true prediction accuracy of each biomarker. Except for CA6, all comparisons have cross-validation error rates of ≤0.333, indicating that the validated biomarkers in general have high prediction accuracy (Table 2). Despite our moderate sample size, we appear to have identified biomarkers that significantly correlate with the presence of breast cancer.
Although the underlying relationships among systemic diseases and the saliva biomarkers are unclear, our recent study using mouse models has indicated that upon systemic disease development, cancer-specific changes occur in the salivary transcriptomic profiles. Stimulation of the salivary glands by mediators released from remote tumors plays an important role in regulating the salivary surrogate biomarker profiles. There may be extracellular communication between the ductal tissues of the breast and those of the salivary glands, since the histophysiology is very similar between these two distant tissues. Interestingly, all validated biomarkers were previously implicated in breast cancer or other cancers (Table 2). Further investigation into the mechanism of salivary biomarkers for systemic cancers is warranted.
In summary, our study has identified transcriptomic and proteomic biomarkers in saliva that have the potential to impact current diagnostic triage for breast cancer. The salivary biomarkers' discriminatory power paves the way for a PRoBE-designed definitive validation study. The critical feature of PRoBE design involves prospective clinical sample collection, before outcome ascertainment, from a study cohort that is relevant to the clinical application. Any biomarker test intended for FDA approval and clinical use should incorporate the PRoBE principles as early as possible, as these principles eliminate potential biases commonly seen at the discovery stage.
This study, which was approved by the UCLA and Cedars-Sinai Medical Center Institutional Review Boards (#06-07-043 and #3870, respectively), began sample collection in February 2007. Written informed consent and questionnaire data sheets were obtained from all patients who agreed to serve as saliva donors. The saliva bank for breast cancer project at the UCLA Dental Research Institute, in collaboration with the Cedars-Sinai Medical Center, has collected 178 saliva samples from subjects recruited from the Saul and Joyce Brandman Breast Cancer Center. Of these, 113 samples, including 40 breast cancer patients and 73 healthy control individuals (Table 1), were used for the discovery and pre-validation phases of this study. Inclusion criteria of cancer patients consisted of a confirmed diagnosis of breast cancer. Exclusion criteria of cancer patients included therapy/surgery and/or a diagnosis of other malignancies within 5 years prior to saliva collection. Exclusion criteria of control patients included a diagnosis of any malignancies within 5 years prior to saliva collection (Figure 1). The information on patient characteristics, such as age, ethnicity, smoking history, menopausal status, and hormone replacement therapy (HRT), is presented in Table 1. Unstimulated saliva samples were consistently collected, stabilized, and preserved as previously described (Figure S3). The sample supernatants were stored at −80°C prior to assay.
This study consisted of a discovery phase, followed by an independent preclinical validation phase. Of the 113 samples, 10 breast cancer samples and 10 matched control samples were used for the discovery phase. All breast cancer cases were invasive ductal carcinoma (IDC), the most common type of breast cancer. Biomarkers identified from the discovery studies were first verified using the discovery sample set. An independent sample set, including 30 breast cancer patients and 63 controls, was used for the biomarker pre-validation phase (Figure 1).
RNA was isolated from 330 µl of saliva supernatant using MagMax™ Viral RNA Isolation Kit (Ambion, Austin, TX). This process was automated using KingFisher mL technology (Thermo Fisher Scientific, Waltham, MA), followed by TURBO™ DNase treatment (Ambion, Austin, TX). Extracted RNA was linearly amplified using the RiboAmp RNA Amplification kit (Molecular Devices, Sunnyvale, CA). After purification, cDNA was transcribed and biotinylated using GeneChip Expression 3′-Amplification Reagents for in vitro transcription labeling (Affymetrix, Santa Clara, CA). Chip hybridization and scanning were performed at the UCLA microarray core facility. Using the MIAME criteria, all Affymetrix Human Genome U133 Plus 2.0 Array data generated in this study were uploaded to the GEO database under accession number GSE20266.
The analysis was performed using R 2.7.0 with the samr and ROC packages. The Probe Logarithmic Intensity Error Estimation (PLIER) expression measures were computed after background correction and quantile normalization for each microarray dataset. Probeset-level quantile normalization was performed across all samples to make the effect sizes similar among all datasets. Finally, for every probeset, significance analysis of microarray (SAM) was applied to identify differential expression between the cancer and healthy control samples. The probesets were then ranked by the false discovery rate (FDR) corrected p-values.
The identified mRNA biomarkers were first verified by RT-qPCR using the discovery sample set (10 cancer versus 10 controls) as described previously. RT-qPCR primers were designed using Primer Express 3.0 software (Applied Biosystems, Foster City, CA) (Table S2). All primers were synthesized by Sigma-Genosys (Woodlands, TX), and the amplicons were intron spanning whenever possible. RT-qPCR was carried out in duplicate. Verified biomarkers were then assayed by RT-qPCR in the set of 93 independent samples (30 breast cancer patients versus 63 controls). Raw data were normalized by subtracting GAPDH Ct values from the biomarker Ct values to generate ΔCt. The Mann-Whitney rank sum test was used for between-group biomarker comparisons.
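The ΔCt normalization and the rank-based comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the Ct values are invented, the function names are hypothetical, and only the Mann-Whitney U statistic is computed (in practice a statistics library such as SciPy's `scipy.stats.mannwhitneyu` would be used to obtain a p-value).

```python
def delta_ct(biomarker_ct, gapdh_ct):
    """Normalize a biomarker Ct value against the GAPDH reference (dCt)."""
    return biomarker_ct - gapdh_ct


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical (biomarker Ct, GAPDH Ct) pairs, invented for illustration.
cancer_ct = [(27.5, 20.0), (28.0, 21.0), (26.5, 20.5)]
control_ct = [(31.0, 20.5), (30.0, 20.0), (29.5, 21.0), (27.0, 20.0)]

cancer_dct = [delta_ct(b, g) for b, g in cancer_ct]
control_dct = [delta_ct(b, g) for b, g in control_ct]

u_cancer = mann_whitney_u(cancer_dct, control_dct)
u_control = mann_whitney_u(control_dct, cancer_dct)
print(u_cancer, u_control)
```

A lower ΔCt corresponds to higher relative expression, so an up-regulated transcript in the cancer group appears as a small U for that group. The two U statistics always sum to the product of the group sizes, which is a convenient sanity check.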
Two-dimensional difference gel electrophoresis (2D-DIGE) was performed by Applied Biomics (Hayward, CA). Briefly, by taking equal amounts of protein from each sample, 10 cancer samples and 10 control samples were pooled separately, with each pool containing 250 µg of proteins. The proteins in each pool were precipitated by methanol and labeled with Cy3 and Cy5, respectively, and then combined for 2D-DIGE. After loading the labeled samples, the isoelectric focusing (IEF, pH 3–10) was run following the protocol provided by Amersham BioSciences (Piscataway, NJ). The immobilized pH gradient (IPG) strips were rinsed in the SDS-gel running buffer before transferring onto 13.5% SDS gels. The fold change of the protein expression levels was obtained from in-gel DeCyder analysis (Amersham BioSciences). Spots with a fold-change larger than 1.5 on the gel were subjected to in-gel trypsin digestion. The digested tryptic peptides were then mixed with CHCA matrix (alpha-cyano-4-hydroxycinnamic acid) and spotted into wells of a MALDI plate for MALDI-TOF MS identification (ABI4800, Applied Biosystems, Foster City, CA).
Protein immunoblotting was used to verify and validate the proteomic biomarker candidates. Reduced protein samples (15 µg total protein per lane) were loaded onto a 10% Bis-Tris gel and run at 150 Volt for one hour. Pre-stained protein standard (Invitrogen, USA) was used to track protein migration. The proteins were transferred to a nitrocellulose membrane and blocked for one hour in 5% non-fat dry milk. After further washes in TBST wash buffer, the membrane was incubated with the primary antibody (Lifespan bioscience, Seattle, WA) at room temperature for two hours. The membrane was then washed in TBST wash buffer before applying the secondary antibody (Anti-mouse IgG, peroxidase-linked species-specific whole antibody from sheep, GE healthcare, Piscataway, NJ) for one hour at room temperature. Finally, the membrane was washed in TBST wash buffer and visualized using the ECL Plus detection kit (GE Healthcare, Piscataway, NJ). The signal intensity of the bands was measured using Image J software (NIH, Bethesda, MD, USA). The intensity of a band representing the protein of interest was divided by the intensity of its corresponding β-actin expression on the same blot for normalization.
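The β-actin normalization described above amounts to a per-lane ratio. The sketch below illustrates it with invented band intensities; the function name and the lane values are hypothetical and are not taken from the study.

```python
def normalize_to_actin(target_intensity, actin_intensity):
    """Divide the band intensity of the protein of interest by the
    beta-actin band intensity measured on the same blot."""
    if actin_intensity <= 0:
        raise ValueError("beta-actin intensity must be positive")
    return target_intensity / actin_intensity


# Hypothetical (target band, beta-actin band) intensities per lane.
lanes = {
    "cancer_01": (1200.0, 2400.0),
    "control_01": (3000.0, 2000.0),
}

normalized = {
    sample: normalize_to_actin(target, actin)
    for sample, (target, actin) in lanes.items()
}
print(normalized)
```

Normalizing within each blot makes lanes comparable even when the total amount of protein loaded differs slightly between samples.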
Leave-one-out cross-validation was applied to assess the true accuracy of the model. In this procedure, each observation is iteratively taken out and the model is trained using all other observations. A prediction is then made on the left-out observation. The overall accuracy rate for each model is then the proportion of left-out observations that are correctly predicted. To evaluate possible confounders of the marker-versus-cancer relationship, we examined factors such as age, ethnicity, smoking status, menopausal status, and HRT treatment. A linear regression model was constructed for each marker, with cancer/normal status and one of the above confounders as factors.
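The leave-one-out procedure described above can be sketched in a few lines. Note the assumptions: for brevity this sketch swaps the study's regression models for a simple single-marker classifier that thresholds at the midpoint between the two class means, and the marker values and labels are invented for illustration.

```python
def fit_midpoint_threshold(values, labels):
    """Train a one-marker classifier: threshold halfway between class means.
    Also records whether the cancer class (label 1) lies above the threshold."""
    cancer = [v for v, y in zip(values, labels) if y == 1]
    control = [v for v, y in zip(values, labels) if y == 0]
    mean_cancer = sum(cancer) / len(cancer)
    mean_control = sum(control) / len(control)
    threshold = (mean_cancer + mean_control) / 2
    return threshold, mean_cancer > mean_control


def predict(model, value):
    """Return 1 (cancer) or 0 (control) for a single marker value."""
    threshold, cancer_is_high = model
    return int((value > threshold) == cancer_is_high)


def loocv_error(values, labels):
    """Leave each observation out in turn, train on the rest, and return
    the fraction of left-out observations that are misclassified."""
    errors = 0
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit_midpoint_threshold(train_values, train_labels)
        if predict(model, values[i]) != labels[i]:
            errors += 1
    return errors / len(values)


# Hypothetical marker values (e.g. dCt); 1 = cancer, 0 = control.
values = [1.0, 1.5, 2.0, 5.0, 4.5, 5.5, 6.0, 6.5]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cv_err = loocv_error(values, labels)
print(cv_err)
```

Because every prediction is made by a model that never saw the left-out observation, the resulting error rate is less optimistic than one computed on the training data itself.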
The pre-validated breast cancer mRNA biomarkers were checked in other microarray studies that have been conducted in our laboratory on different diseases, including oral cancer, primary Sjögren's Syndrome (pSS), pancreatic cancer, lung cancer, ovarian cancer, and type 2 diabetes. Briefly, P-values derived from the Wilcoxon rank sum test were calculated for all breast-cancer-study-validated genes in the other microarray datasets to check whether the significant variation seen between breast cancer and controls also appeared in those disease datasets. After Bonferroni correction, variation was considered significant at p-values less than 0.006.
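The Bonferroni cutoff can be illustrated with a short sketch. The text does not state the family-wise significance level or the number of tests used; the values below (alpha of 0.05 and 8 tests, one per validated mRNA biomarker) are assumptions chosen because they reproduce a threshold of about 0.006. The p-values are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumption: alpha = 0.05 split across 8 tests (one per validated gene).
threshold = bonferroni_threshold(0.05, 8)
print(threshold)

# Hypothetical p-values for 8 genes in one other-disease dataset.
p_values = [0.001, 0.02, 0.2, 0.5, 0.004, 0.0063, 0.9, 0.03]
flags = significant_after_bonferroni(p_values)
print(flags)
```

Under these assumptions the per-test threshold is 0.00625, which rounds to the 0.006 stated above.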
Figure S1. Gene ontology analysis of the up/down-regulated genes (1301 genes, >2 fold up/down-regulation, P<0.01).
Figure S2. Heatmap of 358 up-regulated transcripts based on microarray data (>2-fold change, P<0.01). Hierarchical clustering and gene function enrichment were performed using the Euclidean distance metric and the average linkage method (unsupervised clustering). Breast cancer patients (n=10) and healthy controls (n=10) could be classified into distinct groups, indicating the discriminatory power of salivary mRNA biomarkers. The GEO database accession number of all microarray experiments is GSE20266.
Figure S3. Protocol for saliva collection.
Table S1. Biomarker candidates selected from transcriptomic and proteomic profiling.
Table S2. Primers of 11 verified transcripts and GAPDH.
The authors thank the UCLA microarray core facility and Amersham BioSciences for technical support.
Competing Interests: DW is the co-founder of RNAmeTRIX. However, RNAmeTRIX is a virtual company that currently has no income and employs no one. It does not fund the research at all. If pertinent, this does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials. The other authors disclosed no potential conflicts of interest.
Funding: This work was supported by grants from the National Institutes of Health (UO1DE016275 and R21CA126733) to DTW; and Innovative, Exploratory & Developmental Awards (IDEA) from California Breast Cancer Research Program (16IB-0004) to LZ. URLs: http://www.nih.gov and http://www.cbcrp.org. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.