Related Articles

1.  Identification of diagnostic subnetwork markers for cancer in human protein-protein interaction network 
BMC Bioinformatics  2010;11(Suppl 6):S8.
Background
Finding reliable gene markers for accurate disease classification is very challenging due to a number of reasons, including the small sample size of typical clinical data, high noise in gene expression measurements, and the heterogeneity across patients. In fact, gene markers identified in independent studies often do not coincide with each other, suggesting that many of the predicted markers may have no biological significance and may be simply artifacts of the analyzed dataset. To find more reliable and reproducible diagnostic markers, several studies proposed to analyze the gene expression data at the level of groups of functionally related genes, such as pathways. Studies have shown that pathway markers tend to be more robust and yield more accurate classification results. One practical problem of the pathway-based approach is the limited coverage of genes by currently known pathways. As a result, potentially important genes that play critical roles in cancer development may be excluded. To overcome this problem, we propose a novel method for identifying reliable subnetwork markers in a human protein-protein interaction (PPI) network.
Results
In this method, we overlay the gene expression data with the PPI network and look for the most discriminative linear paths that consist of discriminative genes that are highly correlated to each other. The overlapping linear paths are then optimally combined into subnetworks that can potentially serve as effective diagnostic markers. We tested our method on two independent large-scale breast cancer datasets and compared the effectiveness and reproducibility of the identified subnetwork markers with gene-based and pathway-based markers. We also compared the proposed method with an existing subnetwork-based method.
Conclusions
The proposed method can efficiently find reliable subnetwork markers that outperform the gene-based and pathway-based markers in terms of discriminative power, reproducibility and classification performance. Subnetwork markers found by our method are highly enriched in common GO terms, and they can more accurately classify breast cancer metastasis compared to markers found by a previous method.
doi:10.1186/1471-2105-11-S6-S8
PMCID: PMC3026382  PMID: 20946619
2.  Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value 
PLoS Medicine  2013;10(5):e1001453.
Background
Colon cancer (CC) pathological staging fails to accurately predict recurrence, and to date, no gene expression signature has proven reliable for prognosis stratification in clinical practice, perhaps because CC is a heterogeneous disease. The aim of this study was to establish a comprehensive molecular classification of CC based on mRNA expression profile analyses.
Methods and Findings
Fresh-frozen primary tumor samples from a large multicenter cohort of 750 patients with stage I to IV CC who underwent surgery between 1987 and 2007 in seven centers were characterized for common DNA alterations, including BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, and were screened with whole genome and transcriptome arrays. 566 samples fulfilled RNA quality requirements. Unsupervised consensus hierarchical clustering applied to gene expression data from a discovery subset of 443 CC samples identified six molecular subtypes. These subtypes were associated with distinct clinicopathological characteristics, molecular alterations, specific enrichments of supervised gene expression signatures (stem cell phenotype–like, normal-like, serrated CC phenotype–like), and deregulated signaling pathways. Based on their main biological characteristics, we distinguished a deficient mismatch repair subtype, a KRAS mutant subtype, a cancer stem cell subtype, and three chromosomal instability subtypes, including one associated with down-regulated immune pathways, one with up-regulation of the Wnt pathway, and one displaying a normal-like gene expression profile. The classification was validated in the remaining 123 samples plus an independent set of 1,058 CC samples, including eight public datasets. Furthermore, prognosis was analyzed in the subset of stage II–III CC samples. The subtypes C4 and C6, but not the subtypes C1, C2, C3, and C5, were independently associated with shorter relapse-free survival, even after adjusting for age, sex, stage, and the emerging prognostic classifier Oncotype DX Colon Cancer Assay recurrence score (hazard ratio 1.5, 95% CI 1.1–2.1, p = 0.0097). However, a limitation of this study is that information on tumor grade and number of nodes examined was not available.
Conclusions
We describe the first, to our knowledge, robust transcriptome-based classification of CC that improves the current disease stratification based on clinicopathological variables and common DNA markers. The biological relevance of these subtypes is illustrated by significant differences in prognosis. This analysis provides possibilities for improving prognostic models and therapeutic strategies. In conclusion, we report a new classification of CC into six molecular subtypes that arise through distinct biological pathways.
Editors' Summary
Background
Cancer of the large bowel (colorectal cancer) is the third most common cancer in men and the second most common cancer in women worldwide. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year from this form of cancer—8% of all cancer deaths. The prognosis and treatment options for colorectal cancer depend on five pathological stages (0–IV), each of which has a different treatment option and five year survival rate, so it is important that the stage is correctly identified. Unfortunately, pathological staging fails to accurately predict recurrence (relapse) in patients undergoing surgery for localized colorectal cancer, which is a concern, as 10%–20% of patients with stage II and 30%–40% of those with stage III colorectal cancer develop recurrence.
Why Was This Study Done?
Previous studies have investigated whether there are any possible gene expression profiles (identified through microarray techniques) that can help predict prognosis of colorectal cancer, but so far, there have been no firm conclusions that can aid clinical practice. In this study, the researchers used genetic information from a French multicenter study to identify a standard, reproducible molecular classification based on gene expression analysis of colorectal cancer. The authors also assessed whether there were any associations between the identified molecular subtypes and clinical and pathological factors, common DNA alterations, and prognosis.
What Did the Researchers Do and Find?
The researchers used genetic information from a cohort of 750 patients with stage I to IV colorectal cancer who underwent surgery between 1987 and 2007 in seven centers in France. The researchers identified relevant clinical and pathological staging information for each patient from the medical records and calculated recurrence-free survival (the time from surgery to the first recurrence) for patients with stage II or III disease. In the genetic analysis, 566 tumor samples were suitable—443 were used in a discovery set, to create the classification, and the remainder were used in a validation set, to test the classification. The researchers also used information from eight public datasets to validate their findings.
Using these methods, the researchers classified the colon cancer samples into six molecular subtypes (based on gene expression data) and, on further analysis and validation, were able to distinguish the main biological characteristics and deregulated pathways associated with each subtype. Importantly, the researchers found that these six subtypes were associated with distinct clinical and pathological characteristics, molecular alterations, specific gene expression signatures, and deregulated signaling pathways. In the prognostic analysis based on recurrence-free survival, the researchers found that patients whose tumors were classified in one of two clusters (C4 and C6) had poorer recurrence-free survival than the other patients.
What Do These Findings Mean?
These findings suggest that it is possible to classify colorectal cancer into six robust molecular subtypes that might help identify new prognostic subgroups and could provide a basis for developing robust prognostic genetic signatures for stage II and III colorectal cancer and for identifying specific markers for the different subtypes that might be targets for future drug development. However, as this study was retrospective and did not include some known predictors of colorectal cancer prognosis, such as tumor grade and number of nodes examined, the significance and robustness of the prognostic classification requires further confirmation with large prospective patient cohorts.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001453.
The American Cancer Society provides information about colorectal cancer and also about how colorectal cancer is staged
The US National Cancer Institute also provides information on colon and rectal cancer and colon cancer stages
doi:10.1371/journal.pmed.1001453
PMCID: PMC3660251  PMID: 23700391
3.  Significance Analysis of Prognostic Signatures 
PLoS Computational Biology  2013;9(1):e1002875.
A major goal in translational cancer research is to identify biological signatures driving cancer progression and metastasis. A common technique applied in genomics research is to cluster patients using gene expression data from a candidate prognostic gene set, and if the resulting clusters show statistically significant outcome stratification, to associate the gene set with prognosis, suggesting its biological and clinical importance. Recent work has questioned the validity of this approach by showing in several breast cancer data sets that “random” gene sets tend to cluster patients into prognostically variable subgroups. This work suggests that new rigorous statistical methods are needed to identify biologically informative prognostic gene sets. To address this problem, we developed Significance Analysis of Prognostic Signatures (SAPS) which integrates standard prognostic tests with a new prognostic significance test based on stratifying patients into prognostic subtypes with random gene sets. SAPS ensures that a significant gene set is not only able to stratify patients into prognostically variable groups, but is also enriched for genes showing strong univariate associations with patient prognosis, and performs significantly better than random gene sets. We use SAPS to perform a large meta-analysis (the largest completed to date) of prognostic pathways in breast and ovarian cancer and their molecular subtypes. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we show that prognostic signatures in ER negative breast cancer are more similar to prognostic signatures in ovarian cancer than to prognostic signatures in ER positive breast cancer. 
SAPS is a powerful new method for deriving robust prognostic biological signatures from clinically annotated genomic datasets.
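The comparison against "random" gene sets described above can be illustrated with a small empirical p-value calculation. This is a minimal sketch and not the SAPS implementation: the function name `random_gene_set_p_value`, the per-gene score dictionary, and the mean-absolute-score summary are all assumptions made for illustration. It only shows the general idea of asking how often a random gene set of the same size scores at least as well as the candidate set.

```python
import random


def random_gene_set_p_value(gene_scores, gene_set, n_random=1000, seed=0):
    """Empirical p-value of a candidate gene set against random sets of equal size.

    gene_scores: dict mapping gene name -> univariate prognostic score
                 (hypothetical input, e.g. a signed test statistic per gene).
    gene_set:    list of gene names forming the candidate signature.
    """
    rng = random.Random(seed)
    genes = sorted(gene_scores)  # fixed order keeps sampling reproducible

    def set_score(members):
        # Assumed summary statistic: mean absolute per-gene score.
        return sum(abs(gene_scores[g]) for g in members) / len(members)

    observed = set_score(gene_set)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(genes, len(gene_set))
        if set_score(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)
```

A small p-value here means that few random gene sets match the candidate set's score, which is the property the abstract says a significant signature should have in addition to passing standard prognostic tests.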
Author Summary
A major goal in biomedical research is to identify sets of genes (or “biological signatures”) associated with patient survival, as these genes could be targeted to aid in diagnosing and treating disease. A major challenge in using prognostic associations to identify biologically informative signatures is that in some diseases, “random” gene sets are associated with prognosis. To address this problem, we developed a new method called “Significance Analysis of Prognostic Signatures” (or “SAPS”) for the identification of biologically informative gene sets associated with patient survival. To test the effectiveness of SAPS, we use SAPS to perform a subtype-specific meta-analysis of prognostic signatures in large breast and ovarian cancer meta-data sets. This analysis represents the largest of its kind ever performed. Our analyses show that only a small subset of the gene sets found statistically significant using standard measures achieve significance by SAPS. We identify new prognostic signatures in breast and ovarian cancer and their corresponding molecular subtypes, and we demonstrate a striking similarity between prognostic pathways in ER negative breast cancer and ovarian cancer, suggesting new shared therapeutic targets for these aggressive malignancies. SAPS is a powerful new method for deriving robust prognostic biological pathways from clinically annotated genomic datasets.
doi:10.1371/journal.pcbi.1002875
PMCID: PMC3554539  PMID: 23365551
4.  Molecular Subtyping of Serous Ovarian Tumors Reveals Multiple Connections to Intrinsic Breast Cancer Subtypes 
PLoS ONE  2014;9(9):e107643.
Objective
Transcriptional profiling of epithelial ovarian cancer has revealed molecular subtypes correlating to biological and clinical features. We aimed to determine gene expression differences between malignant, benign and borderline serous ovarian tumors, and investigate similarities with the well-established intrinsic molecular subtypes of breast cancer.
Methods
Global gene expression profiling using Illumina's HT12 Bead Arrays was applied to 59 fresh-frozen serous ovarian malignant, benign and borderline tumors. Nearest centroid classification was performed applying previously published gene profiles for the ovarian and breast cancer subtypes. Correlations to gene expression modules representing key biological breast cancer features were also sought. Validation was performed using an independent, publicly available dataset.
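Nearest centroid classification, as mentioned in the methods above, assigns each sample to the subtype whose reference expression profile it most resembles. The sketch below assumes Pearson correlation as the similarity measure and uses hypothetical centroid names and values; the abstract does not state which similarity measure or gene lists were used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def nearest_centroid(sample, centroids):
    """Return the name of the centroid most correlated with the sample.

    sample:    expression values over the classifier genes.
    centroids: dict mapping subtype name -> reference profile over the same genes.
    """
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))


# Hypothetical four-gene centroids, for illustration only.
example_centroids = {
    "basal_like": [2.0, -1.0, 0.5, -2.0],
    "luminal_a": [-1.5, 2.0, -0.5, 1.0],
}
example_call = nearest_centroid([1.8, -0.7, 0.2, -1.5], example_centroids)
```

In practice the sample and the centroids must be restricted to the same genes in the same order, and a minimum correlation threshold is often applied before a call is accepted.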
Results
5,944 genes were significantly differentially expressed between benign and malignant serous ovarian tumors, with cell cycle processes enriched in the malignant subgroup. Borderline tumors were split between the two clusters. The malignant serous tumors correlated significantly with the highly aggressive ovarian cancer signatures and with the basal-like breast cancer subtype. The benign and borderline serous tumors together were significantly correlated to the normal-like breast cancer subtype and the ovarian cancer signature derived from borderline tumors. The borderline tumors in the study dataset also correlated significantly to the luminal A breast cancer subtype. These findings remained when analyzed in an independent dataset, supporting links between the molecular subtypes of ovarian cancer and breast cancer beyond those recently acknowledged.
Conclusions
These data link the transcriptional profiles of serous ovarian cancer to the intrinsic molecular subtypes of breast cancer, in line with the shared clinical and molecular features between high-grade serous ovarian cancer and basal-like breast cancer, and suggest that biomarkers and targeted therapies may overlap between these tumor subsets. The link between benign and borderline ovarian cancer and luminal breast cancer may indicate endocrine responsiveness in a subset of ovarian cancers.
doi:10.1371/journal.pone.0107643
PMCID: PMC4166462  PMID: 25226589
5.  A Steiner tree-based method for biomarker discovery and classification in breast cancer metastasis 
BMC Genomics  2012;13(Suppl 6):S8.
Background
Metastatic breast cancer is a leading cause of cancer-related deaths in women worldwide. DNA microarrays have become an important tool to help identify biomarker genes for improving the prognosis of breast cancer. Recently, it was shown that pathway-level relationships between genes can be incorporated to build more robust classification models and to obtain more useful biological insight from such models. Because complete pathways are unavailable, protein-protein interaction (PPI) networks are becoming more popular among researchers and open a new way to investigate the development of breast cancer.
Methods
In this study, a network-based method is proposed that combines microarray gene expression profiles with a PPI network for biomarker discovery in breast cancer metastasis. The key idea in our approach is to identify a small number of genes that connect differentially expressed genes into a single component in a PPI network; these intermediate genes contain important information about the pathways involved in metastasis and have a high probability of being biomarkers.
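The idea of connecting differentially expressed genes through a small number of intermediate genes can be sketched with a simple greedy shortest-path heuristic for the Steiner tree problem. This is an illustration under stated assumptions, not the randomized algorithm of the paper: the function name `connect_terminals`, the unweighted adjacency-list graph, and the toy gene names are hypothetical.

```python
from collections import deque


def connect_terminals(graph, terminals):
    """Greedy heuristic: grow a tree by repeatedly attaching the nearest terminal.

    graph:     dict mapping node -> list of neighbours (undirected, unweighted).
    terminals: nodes that must be connected (e.g. differentially expressed genes).
    Returns the set of nodes in the connecting tree.
    """
    terminals = list(terminals)
    tree = {terminals[0]}
    remaining = set(terminals[1:])
    while remaining:
        # Multi-source breadth-first search starting from every node in the tree.
        parent = {node: None for node in tree}
        queue = deque(sorted(tree))
        found = None
        while queue:
            node = queue.popleft()
            if node in remaining:
                found = node
                break
            for neighbour in graph.get(node, ()):
                if neighbour not in parent:
                    parent[neighbour] = node
                    queue.append(neighbour)
        if found is None:
            raise ValueError("terminals are not all connected in the graph")
        # Walk back along the BFS parents and add the whole path to the tree.
        node = found
        while node is not None:
            tree.add(node)
            node = parent[node]
        remaining.discard(found)
    return tree


# Toy network: A, B and C are terminals; X and Y link them; Z is a dead end.
toy_ppi = {
    "A": ["X", "Z"], "X": ["A", "B"], "B": ["X", "Y"],
    "Y": ["B", "C"], "C": ["Y"], "Z": ["A"],
}
toy_tree = connect_terminals(toy_ppi, ["A", "B", "C"])
intermediate_genes = toy_tree - {"A", "B", "C"}
```

The nodes left after removing the terminals play the role of the intermediate genes described above. A greedy heuristic like this does not guarantee the smallest possible connecting set.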
Results
We applied this approach to two breast cancer microarray datasets, and in both cases we identified significant numbers of well-known biomarker genes for breast cancer metastasis. The selected genes are significantly enriched in biological processes and pathways related to carcinogenesis and, importantly, have much higher stability across different datasets than in previous studies. Furthermore, our selected genes significantly increased cross-data classification accuracy of breast cancer metastasis.
Conclusions
The randomized Steiner tree based approach described in this study is a new way to discover biomarker genes for breast cancer, and improves the prediction accuracy of metastasis. Though the analysis is limited here only to breast cancer, it can be easily applied to other diseases.
doi:10.1186/1471-2164-13-S6-S8
PMCID: PMC3481447  PMID: 23134806
6.  Ovarian Carcinoma Subtypes Are Different Diseases: Implications for Biomarker Studies 
PLoS Medicine  2008;5(12):e232.
Background
Although it has long been appreciated that ovarian carcinoma subtypes (serous, clear cell, endometrioid, and mucinous) are associated with different natural histories, most ovarian carcinoma biomarker studies and current treatment protocols for women with this disease are not subtype specific. With the emergence of high-throughput molecular techniques, distinct pathogenetic pathways have been identified in these subtypes. We examined variation in biomarker expression rates between subtypes, and how this influences correlations between biomarker expression and stage at diagnosis or prognosis.
Methods and Findings
In this retrospective study we assessed the protein expression of 21 candidate tissue-based biomarkers (CA125, CRABP-II, EpCam, ER, F-Spondin, HE4, IGF2, K-Cadherin, Ki-67, KISS1, Matriptase, Mesothelin, MIF, MMP7, p21, p53, PAX8, PR, SLPI, TROP2, WT1) in a population-based cohort of 500 ovarian carcinomas that was collected over the period from 1984 to 2000. The expression of 20 of the 21 biomarkers differs significantly between subtypes, but does not vary across stage within each subtype. Survival analyses show that nine of the 21 biomarkers are prognostic indicators in the entire cohort but when analyzed by subtype only three remain prognostic indicators in the high-grade serous and none in the clear cell subtype. For example, tumor proliferation, as assessed by Ki-67 staining, varies markedly between different subtypes and is an unfavourable prognostic marker in the entire cohort (risk ratio [RR] 1.7, 95% confidence interval [CI] 1.2–2.4) but is not of prognostic significance within any subtype. Prognostic associations can even show an inverse correlation within the entire cohort, when compared to a specific subtype. For example, WT1 is more frequently expressed in high-grade serous carcinomas, an aggressive subtype, and is an unfavourable prognostic marker within the entire cohort of ovarian carcinomas (RR 1.7, 95% CI 1.2–2.3), but is a favourable prognostic marker within the high-grade serous subtype (RR 0.5, 95% CI 0.3–0.8).
Conclusions
The association of biomarker expression with survival varies substantially between subtypes, and can easily be overlooked in whole cohort analyses. To avoid this effect, each subtype within a cohort should be analyzed discretely. Ovarian carcinoma subtypes are different diseases, and these differences should be reflected in clinical research study design and ultimately in the management of ovarian carcinoma.
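The way a whole-cohort association can differ from, or even reverse, the association within a subtype can be shown with simple arithmetic. The counts below are invented purely for illustration and are not taken from the study; the subtype labels and the helper `risk_ratio` are likewise hypothetical.

```python
def risk_ratio(deaths_pos, n_pos, deaths_neg, n_neg):
    """Risk of death in marker-positive patients divided by risk in marker-negative."""
    return (deaths_pos / n_pos) / (deaths_neg / n_neg)


# Invented counts per subtype: (deaths_pos, n_pos, deaths_neg, n_neg).
# The marker is common in the aggressive subtype and rare in the other one.
toy_cohort = {
    "high_grade_serous": (40, 80, 16, 20),
    "other": (1, 10, 9, 90),
}

# Risk ratio computed separately within each subtype.
per_subtype = {name: risk_ratio(*counts) for name, counts in toy_cohort.items()}

# Risk ratio computed after pooling all patients, ignoring subtype.
pooled_counts = [sum(column) for column in zip(*toy_cohort.values())]
pooled = risk_ratio(*pooled_counts)
```

With these toy numbers the pooled risk ratio is above 1 while the ratio inside the aggressive subtype is below 1, because marker-positive patients are concentrated in the subtype with the worse baseline outcome. This is the pattern that analyzing each subtype separately is meant to expose.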
David Huntsman and colleagues describe the associations between biomarker expression patterns and survival in different ovarian cancer subtypes. They suggest that the management of ovarian cancer should reflect differences between these subtypes.
Editors' Summary
Background
Every year, about 200,000 women develop ovarian cancer and more than 100,000 die from the disease. Ovarian epithelial cancer (carcinoma) occurs when epithelial cells from the ovary or fallopian tube acquire mutations or equivalent changes that allow them to grow uncontrollably within one of the ovaries (two small organs in the pelvis that produce eggs) and acquire the potential to spread around the body (metastasize). While the cancer is confined to the ovaries, cancer specialists call this stage I disease; 70%–80% of women diagnosed with stage I ovarian cancer survive for at least 5 y. However, only a fifth of ovarian cancers are diagnosed at this stage; in the majority of patients the cancer has spread into the pelvis (stage II disease), into the peritoneal cavity (the space around the gut, stomach, and liver; stage III disease), or metastasized to distant organs such as the brain (stage IV disease). This peritoneal spread is often associated with only vague abdominal pain and mild digestive disturbances. Patients with advanced-stage ovarian carcinoma are treated with a combination of surgery and chemotherapy but, despite recent advances in treatment, only 15% of women diagnosed with stage IV disease survive for 5 y.
Why Was This Study Done?
Although it is usually regarded as a single disease, there are actually several distinct subtypes of ovarian carcinoma. These are classified according to their microscopic appearance as high-grade serous, low-grade serous, clear cell, endometrioid, and mucinous ovarian carcinomas. These subtypes develop differently and respond differently to chemotherapy. Yet scientists studying ovarian carcinoma usually regard this cancer as a single entity, and current treatment protocols for the disease are not subtype specific. Might better progress be made toward understanding ovarian carcinoma and toward improving its treatment if each subtype were treated as a separate disease? Why are some tumors confined to the ovary, whereas the majority spread beyond the ovary at time of diagnosis? In this study, the researchers address these questions by asking whether correlations between the expression of “biomarkers” (molecules made by cancer cells that can be used to detect tumors and to monitor treatment effectiveness) and the stage at diagnosis or length of survival can be explained by differential biomarker expression between different subtypes of ovarian carcinoma. They also address the question of whether early stage and late stage ovarian carcinomas are fundamentally different.
What Did the Researchers Do and Find?
The researchers measured the expression of 21 candidate protein biomarkers in 500 ovarian carcinoma samples collected in British Columbia, Canada, between 1984 and 2000. For 20 of the biomarkers, the fraction of tumors expressing the biomarker varied significantly between ovarian carcinoma subtypes. Considering all the tumors together, ten biomarkers had different expression levels in early and late stage tumors. However, when each subtype was considered separately, the expression of none of the biomarkers varied with stage. When the researchers asked whether the expression of any of the biomarkers correlated with survival times, they found that nine biomarkers were unfavorable indicators of outcome when all the tumors were considered together. That is, women whose tumors expressed any of these biomarkers had a higher risk of dying from ovarian cancer than women whose tumors did not express these biomarkers. However, only three biomarkers were unfavorable indicators for high-grade serous carcinomas considered alone and the expression of a biomarker called WT1 in this subtype of ovarian carcinoma is associated with a lower risk of dying. Similarly, expression of the biomarker Ki-67 was an unfavorable prognostic indicator when all the tumors were considered, but was not a prognostic indicator for any individual subtype.
What Do These Findings Mean?
These and other findings indicate that biomarker expression is more strongly associated with ovarian carcinoma subtype than with stage. In other words, biomarker expression is constant from early to late stage, but only within a given subtype. Second, the association of biomarker expression with survival varies between subtypes, hence lumping all subtypes together can yield misleading results. Although these findings need confirming in more tumor samples, they support the view that ovarian carcinoma subtypes are different diseases. In practical terms, therefore, these findings suggest that better ways to detect and treat ovarian cancer are more likely to be found if future biomarker studies and clinical research studies investigate each subtype of ovarian carcinoma separately rather than grouping them all together.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050232.
The US National Cancer Institute provides a brief description of what cancer is and how it develops and information on all aspects of ovarian cancer for patients and professionals. It also provides a fact sheet on tumor markers (in English and Spanish)
The UK charity Cancerbackup provides general information about cancer and more specific information about ovarian cancer, including tumor staging
doi:10.1371/journal.pmed.0050232
PMCID: PMC2592352  PMID: 19053170
7.  Nuclear Receptor Expression Defines a Set of Prognostic Biomarkers for Lung Cancer 
PLoS Medicine  2010;7(12):e1000378.
David Mangelsdorf and colleagues show that nuclear receptor expression is strongly associated with clinical outcomes of lung cancer patients, and this expression profile is a potential prognostic signature for lung cancer patient survival time, particularly for individuals with early stage disease.
Background
The identification of prognostic tumor biomarkers that also would have potential as therapeutic targets, particularly in patients with early stage disease, has been a long sought-after goal in the management and treatment of lung cancer. The nuclear receptor (NR) superfamily, which is composed of 48 transcription factors that govern complex physiologic and pathophysiologic processes, could represent a unique subset of these biomarkers. In fact, many members of this family are the targets of already identified selective receptor modulators, providing a direct link between individual tumor NR quantitation and selection of therapy. The goal of this study, which begins this overall strategy, was to investigate the association between mRNA expression of the NR superfamily and the clinical outcome for patients with lung cancer, and to test whether a tumor NR gene signature provided useful information (over available clinical data) for patients with lung cancer.
Methods and Findings
Using quantitative real-time PCR to study NR expression in 30 microdissected non-small-cell lung cancers (NSCLCs) and their pair-matched normal lung epithelium, we found great variability in NR expression among patients' tumor and non-involved lung epithelium, found a strong association between NR expression and clinical outcome, and identified an NR gene signature from both normal and tumor tissues that predicted patient survival time and disease recurrence. The NR signature derived from the initial 30 NSCLC samples was validated in two independent microarray datasets derived from 442 and 117 resected lung adenocarcinomas. The NR gene signature was also validated in 130 squamous cell carcinomas. The prognostic signature in tumors could be distilled to expression of two NRs, short heterodimer partner and progesterone receptor, as single gene predictors of NSCLC patient survival time, including for patients with stage I disease. Of equal interest, the studies of microdissected histologically normal epithelium and matched tumors identified expression in normal (but not tumor) epithelium of NGFIB3 and mineralocorticoid receptor as single gene predictors of good prognosis.
Conclusions
NR expression is strongly associated with clinical outcomes for patients with lung cancer, and this expression profile provides a unique prognostic signature for lung cancer patient survival time, particularly for those with early stage disease. This study highlights the potential use of NRs as a rational set of therapeutically tractable genes as theragnostic biomarkers, and specifically identifies short heterodimer partner and progesterone receptor in tumors, and NGFIB3 and MR in non-neoplastic lung epithelium, for future detailed translational study in lung cancer.
Editors' Summary
Background
Lung cancer, the most common cause of cancer-related death, kills 1.3 million people annually. Most lung cancers are “non-small-cell lung cancers” (NSCLCs), and most are caused by smoking. Exposure to chemicals in smoke causes changes in the genes of the cells lining the lungs that allow the cells to grow uncontrollably and to move around the body. How NSCLC is treated and responds to treatment depends on its “stage.” Stage I tumors, which are small and confined to the lung, are removed surgically, although chemotherapy is also sometimes given. Stage II tumors have spread to nearby lymph nodes and are treated with surgery and chemotherapy, as are some stage III tumors. However, because cancer cells in stage III tumors can be present throughout the chest, surgery is not always possible. For such cases, and for stage IV NSCLC, where the tumor has spread around the body, patients are treated with chemotherapy alone. About 70% of patients with stage I and II NSCLC but only 2% of patients with stage IV NSCLC survive for five years after diagnosis; more than 50% of patients have stage IV NSCLC at diagnosis.
Why Was This Study Done?
Patient responses to treatment vary considerably. Oncologists (doctors who treat cancer) would like to know which patients have a good prognosis (are likely to do well) to help them individualize their treatment. Consequently, the search is on for “prognostic tumor biomarkers,” molecules made by cancer cells that can be used to predict likely clinical outcomes. Such biomarkers, which may also be potential therapeutic targets, can be identified by analyzing the overall pattern of gene expression in a panel of tumors using a technique called microarray analysis and looking for associations between the expression of sets of genes and clinical outcomes. In this study, the researchers take a more directed approach to identifying prognostic biomarkers by investigating the association between the expression of the genes encoding nuclear receptors (NRs) and clinical outcome in patients with lung cancer. The NR superfamily contains 48 transcription factors (proteins that control the expression of other genes) that respond to several hormones and to diet-derived fats. NRs control many biological processes and are targets for several successful drugs, including some used to treat cancer.
What Did the Researchers Do and Find?
The researchers analyzed the expression of NR mRNAs using “quantitative real-time PCR” in 30 microdissected NSCLCs and in matched normal lung tissue samples (mRNA is the blueprint for protein production). They then used an approach called standard classification and regression tree analysis to build a prognostic model for NSCLC based on the expression data. This model predicted both survival time and disease recurrence among the patients from whom the tumors had been taken. The researchers validated their prognostic model in two large independent lung adenocarcinoma microarray datasets and in a squamous cell carcinoma dataset (adenocarcinomas and squamous cell carcinomas are two major NSCLC subtypes). Finally, they explored the roles of specific NRs in the prediction model. This analysis revealed that the ability of the NR signature in tumors to predict outcomes was mainly due to the expression of two NRs—the short heterodimer partner (SHP) and the progesterone receptor (PR). Expression of either gene could be used as a single gene predictor of the survival time of patients, including those with stage I disease. Similarly, the expression of either nerve growth factor induced gene B3 (NGFIB3) or mineralocorticoid receptor (MR) in normal tissue was a single gene predictor of a good prognosis.
What Do These Findings Mean?
These findings indicate that the expression of NR mRNA is strongly associated with clinical outcomes in patients with NSCLC. Furthermore, they identify a prognostic NR expression signature that provides information on the survival time of patients, including those with early stage disease. The signature needs to be confirmed in more patients before it can be used clinically, and researchers would like to establish whether changes in mRNA expression are reflected in changes in protein expression if NRs are to be targeted therapeutically. Nevertheless, these findings highlight the potential use of NRs as prognostic tumor biomarkers. Furthermore, they identify SHP and PR in tumors and two NRs in normal lung tissue as molecules that might provide new targets for the treatment of lung cancer and new insights into the early diagnosis, pathogenesis, and chemoprevention of lung cancer.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000378.
The Nuclear Receptor Signaling Atlas (NURSA) is a consortium of scientists sponsored by the US National Institutes of Health that provides scientific reagents, datasets, and educational material on nuclear receptors and their co-regulators to the scientific community through a Web-based portal
The Cancer Prevention and Research Institute of Texas (CPRIT) provides information and resources to anyone interested in the prevention and treatment of lung and other cancers
The US National Cancer Institute provides detailed information for patients and professionals about all aspects of lung cancer, including information on non-small-cell carcinoma and on tumor markers (in English and Spanish)
Cancer Research UK also provides information about lung cancer and information on how cancer starts
MedlinePlus has links to other resources about lung cancer (in English and Spanish)
Wikipedia has a page on nuclear receptors (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1000378
PMCID: PMC3001894  PMID: 21179495
8.  Identification of Robust Pathway Markers for Cancer through Rank-Based Pathway Activity Inference 
Advances in Bioinformatics  2013;2013:618461.
One important problem in translational genomics is the identification of reliable and reproducible markers that can be used to discriminate between different classes of a complex disease, such as cancer. The typical small sample setting makes the prediction of such markers very challenging, and various approaches have been proposed to address this problem. For example, it has been shown that pathway markers, which aggregate the gene activities in the same pathway, tend to be more robust than gene markers. Furthermore, gene expression ranking has been shown to be robust to batch effects and to lead to more interpretable results. In this paper, we propose an enhanced pathway activity inference method that uses gene ranking to predict the pathway activity in a probabilistic manner. The main focus of this work is on identifying robust pathway markers that can ultimately lead to robust classifiers with reproducible performance across datasets. Simulation results based on multiple breast cancer datasets show that the proposed inference method identifies better pathway markers that can predict breast cancer metastasis with higher accuracy. Moreover, the identified pathway markers can lead to better classifiers with more consistent classification performance across independent datasets.
doi:10.1155/2013/618461
PMCID: PMC3600350  PMID: 23533400
9.  Comparative survival analysis of breast cancer microarray studies identifies important prognostic genetic pathways 
BMC Cancer  2010;10:573.
Background
An estimated 12% of females in the United States will develop breast cancer in their lifetime. Although there have been advances in treatment options, including surgery and chemotherapy, breast cancer is still the second most lethal cancer in women. Thus, there is a clear need for better methods to predict prognosis for each breast cancer patient. With the advent of large genetic databases and the reduction in cost for the experiments, researchers are faced with choosing from a large pool of potential prognostic markers from numerous breast cancer gene expression profile studies.
Methods
Five microarray datasets related to breast cancer were examined using gene set analysis and the cancers were categorized into different subtypes using a scoring system based on genetic pathway activity.
Results
We have observed that significant genes in the individual studies show little reproducibility across the datasets. From our comparative analysis, using gene pathways with clinical variables is more reliable across studies and shows promise in assessing a patient's prognosis.
Conclusions
This study concludes that, in light of clinical variables, there are significant gene pathways in common across the datasets. Specifically, several pathways can further significantly stratify patients for survival. These candidate pathways should help to develop a panel of significant biomarkers for the prognosis of breast cancer patients in a clinical setting.
doi:10.1186/1471-2407-10-573
PMCID: PMC2972286  PMID: 20964848
10.  Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood 
BMC Medical Genomics  2013;6(Suppl 1):S4.
Background
Worldwide, breast cancer is the second most common type of cancer after lung cancer. Traditional mammography and tissue microarrays have been studied for early cancer detection and cancer prediction. However, there is a need for more reliable diagnostic tools for early detection of breast cancer. This can be a challenge due to a number of factors and logistics. First, obtaining tissue biopsies can be difficult. Second, mammography may not detect small tumors, and is often unsatisfactory for younger women, who typically have dense breast tissue. Lastly, breast cancer is not a single homogeneous disease but consists of multiple disease states, each arising from a distinct molecular mechanism and having a distinct clinical progression path, which makes the disease difficult to detect and predict in early stages.
Results
In this paper, we present a Support Vector Machine based on Recursive Feature Elimination and Cross Validation (SVM-RFE-CV) algorithm for early detection of breast cancer in peripheral blood and show how to use SVM-RFE-CV to model the classification and prediction problem of early detection of breast cancer in peripheral blood.
The training set (32 healthy and 33 cancer samples) and the testing set (31 healthy and 34 cancer samples) were randomly drawn from a peripheral blood breast cancer dataset downloaded from the Gene Expression Omnibus. First, we identified 42 biomarkers differentially expressed between "normal" and "cancer". Then, with SVM-RFE-CV we extracted 15 biomarkers that yield zero cross validation score. Lastly, we compared the classification and prediction performance of SVM-RFE-CV with that of SVM and SVM Recursive Feature Elimination (SVM-RFE).
Conclusions
We found that 1) SVM-RFE-CV is suitable for analyzing noisy high-throughput microarray data, 2) it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features, and 3) it can improve the prediction performance (Area Under Curve) in the testing data set from 0.5826 to 0.7879. Further pathway analysis showed that the biomarkers are associated with Signaling, Hemostasis, Hormones, and Immune System, which is consistent with previous findings. Our prediction model can serve as a general model for biomarker discovery in early detection of other cancers. In the future, Polymerase Chain Reaction (PCR) is planned for validation of the ability of these potential biomarkers for early detection of breast cancer.
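The Area Under Curve figures quoted above can be read as the probability that a randomly chosen cancer sample receives a higher classifier score than a randomly chosen healthy sample. The sketch below computes that quantity by pairwise comparison. It is illustrative only: the function name `auc_from_scores` and the example scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0   # positive sample ranked above negative sample
            elif pos == neg:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores, for illustration only.
cancer_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2]
print(auc_from_scores(cancer_scores, healthy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported change from 0.5826 to 0.7879 should be read.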
doi:10.1186/1755-8794-6-S1-S4
PMCID: PMC3552693  PMID: 23369435
11.  Intra-tumor Genetic Heterogeneity and Mortality in Head and Neck Cancer: Analysis of Data from The Cancer Genome Atlas 
PLoS Medicine  2015;12(2):e1001786.
Background
Although the involvement of intra-tumor genetic heterogeneity in tumor progression, treatment resistance, and metastasis is established, genetic heterogeneity is seldom examined in clinical trials or practice. Many studies of heterogeneity have had prespecified markers for tumor subpopulations, limiting their generalizability, or have involved massive efforts such as separate analysis of hundreds of individual cells, limiting their clinical use. We recently developed a general measure of intra-tumor genetic heterogeneity based on whole-exome sequencing (WES) of bulk tumor DNA, called mutant-allele tumor heterogeneity (MATH). Here, we examine data collected as part of a large, multi-institutional study to validate this measure and determine whether intra-tumor heterogeneity is itself related to mortality.
Methods and Findings
Clinical and WES data were obtained from The Cancer Genome Atlas in October 2013 for 305 patients with head and neck squamous cell carcinoma (HNSCC), from 14 institutions. Initial pathologic diagnoses were between 1992 and 2011 (median, 2008). Median time to death for 131 deceased patients was 14 mo; median follow-up of living patients was 22 mo. Tumor MATH values were calculated from WES results. Despite the multiple head and neck tumor subsites and the variety of treatments, we found in this retrospective analysis a substantial relation of high MATH values to decreased overall survival (Cox proportional hazards analysis: hazard ratio for high/low heterogeneity, 2.2; 95% CI 1.4 to 3.3). This relation of intra-tumor heterogeneity to survival was not due to intra-tumor heterogeneity’s associations with other clinical or molecular characteristics, including age, human papillomavirus status, tumor grade and TP53 mutation, and N classification. MATH improved prognostication over that provided by traditional clinical and molecular characteristics, maintained a significant relation to survival in multivariate analyses, and distinguished outcomes among patients having oral-cavity or laryngeal cancers even when standard disease staging was taken into account. Prospective studies, however, will be required before MATH can be used prognostically in clinical trials or practice. Such studies will need to examine homogeneously treated HNSCC at specific head and neck subsites, and determine the influence of cancer therapy on MATH values. Analysis of MATH and outcome in human-papillomavirus-positive oropharyngeal squamous cell carcinoma is particularly needed.
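The abstract does not spell out how MATH is computed. As a minimal sketch, assuming the definition given in the authors' earlier MATH publication (100 times the ratio of the scaled median absolute deviation to the median of the mutant-allele fractions at tumor-specific mutated loci), the calculation could look like the following. The function name `math_score`, the constant name, and the example fractions are hypothetical.

```python
from statistics import median

# Scale factor that makes the MAD comparable to a standard deviation
# for normally distributed data (assumed here, following the usual convention).
MAD_SCALE = 1.4826


def math_score(mutant_allele_fractions):
    """MATH = 100 * scaled MAD / median of the mutant-allele fractions."""
    fractions = list(mutant_allele_fractions)
    if not fractions:
        raise ValueError("need at least one mutant-allele fraction")
    center = median(fractions)
    if center <= 0:
        raise ValueError("median mutant-allele fraction must be positive")
    deviations = [abs(f - center) for f in fractions]
    mad = MAD_SCALE * median(deviations)
    return 100.0 * mad / center


# Hypothetical mutant-allele fractions for one tumor, for illustration only.
example_fractions = [0.10, 0.20, 0.30, 0.40, 0.50]
print(round(math_score(example_fractions), 2))
```

Under this definition, a tumor whose mutated loci all share the same allele fraction scores 0, and a wider spread of fractions relative to their median gives a higher value.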
Conclusions
To our knowledge this study is the first to combine data from hundreds of patients, treated at multiple institutions, to document a relation between intra-tumor heterogeneity and overall survival in any type of cancer. We suggest applying the simply calculated MATH metric of heterogeneity to prospective studies of HNSCC and other tumor types.
In this study, Rocco and colleagues examine data collected as part of a large, multi-institutional study, to validate a measure of tumor heterogeneity called MATH and determine whether intra-tumor heterogeneity is itself related to mortality.
Editors’ Summary
Background
Normally, the cells in human tissues and organs only reproduce (a process called cell division) when new cells are needed for growth or to repair damaged tissues. But sometimes a cell somewhere in the body acquires a genetic change (mutation) that disrupts the control of cell division and allows the cell to grow continuously. As the mutated cell grows and divides, it accumulates additional mutations that allow it to grow even faster and eventually form a lump, or tumor (cancer). Other mutations subsequently allow the tumor to spread around the body (metastasize) and destroy healthy tissues. Tumors can arise anywhere in the body (there are more than 200 different types of cancer), and about one in three people will develop some form of cancer during their lifetime. Many cancers can now be successfully treated, however, and people often survive for years after a diagnosis of cancer before, eventually, dying from another disease.
Why Was This Study Done?
The gradual acquisition of mutations by tumor cells leads to the formation of subpopulations of cells, each carrying a different set of mutations. This “intra-tumor heterogeneity” can produce tumor subclones that grow particularly quickly, that metastasize aggressively, or that are resistant to cancer treatments. Consequently, researchers have hypothesized that high intra-tumor heterogeneity leads to worse clinical outcomes and have suggested that a simple measure of this heterogeneity would be a useful addition to the cancer staging system currently used by clinicians for predicting the likely outcome (prognosis) of patients with cancer. Here, the researchers investigate whether a measure of intra-tumor heterogeneity called “mutant-allele tumor heterogeneity” (MATH) is related to mortality (death) among patients with head and neck squamous cell carcinoma (HNSCC)—cancers that begin in the cells that line the moist surfaces inside the head and neck, such as cancers of the mouth and the larynx (voice box). MATH is based on whole-exome sequencing (WES) of tumor and matched normal DNA. WES uses powerful DNA-sequencing systems to determine the variations of all the coding regions (exons) of the known genes in the human genome (genetic blueprint).
What Did the Researchers Do and Find?
The researchers obtained clinical and WES data for 305 patients who were treated in 14 institutions, primarily in the US, after diagnosis of HNSCC from The Cancer Genome Atlas, a catalog established by the US National Institutes of Health to map the key genomic changes in major types and subtypes of cancer. They calculated tumor MATH values for the patients from their WES results and retrospectively analyzed whether there was an association between the MATH values and patient survival. Despite the patients having tumors at various subsites and being given different treatments, every 10% increase in MATH value corresponded to an 8.8% increased risk (hazard) of death. Using a previously defined MATH-value cutoff to distinguish high- from low-heterogeneity tumors, compared to patients with low-heterogeneity tumors, patients with high-heterogeneity tumors were more than twice as likely to die (a hazard ratio of 2.2). Other statistical analyses indicated that MATH provided improved prognostic information compared to that provided by established clinical and molecular characteristics and human papillomavirus (HPV) status (HPV-positive HNSCC at some subsites has a better prognosis than HPV-negative HNSCC). In particular, MATH provided prognostic information beyond that provided by standard disease staging among patients with mouth or laryngeal cancers.
What Do These Findings Mean?
By using data from more than 300 patients treated at multiple institutions, these findings validate the use of MATH as a measure of intra-tumor heterogeneity in HNSCC. Moreover, they provide one of the first large-scale demonstrations that intra-tumor heterogeneity is clinically important in the prognosis of any type of cancer. Before the MATH metric can be used in clinical trials or in clinical practice as a prognostic tool, its ability to predict outcomes needs to be tested in prospective studies that examine the relation between MATH and the outcomes of patients with identically treated HNSCC at specific head and neck subsites, that evaluate the use of MATH for prognostication in other tumor types, and that determine the influence of cancer treatments on MATH values. Nevertheless, these findings suggest that MATH should be considered as a biomarker for survival in HNSCC and other tumor types, and raise the possibility that clinicians could use MATH values to decide on the best treatment for individual patients and to choose patients for inclusion in clinical trials.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001786.
The US National Cancer Institute (NCI) provides information about cancer and how it develops and about head and neck cancer (in English and Spanish)
Cancer Research UK, a not-for-profit organization, provides general information about cancer and how it develops, and detailed information about head and neck cancer; the Merseyside Regional Head and Neck Cancer Centre provides patient stories about HNSCC
Wikipedia provides information about tumor heterogeneity, and about whole-exome sequencing (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
Information about The Cancer Genome Atlas is available
A PLOS Blog entry by Jessica Wapner explains more about MATH
doi:10.1371/journal.pmed.1001786
PMCID: PMC4323109  PMID: 25668320
12.  Identifying cancer biomarkers by network-constrained support vector machines 
BMC Systems Biology  2011;5:161.
Background
One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers thus resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.
Results
We developed an integrated approach, namely network-constrained support vector machine (netSVM), for cancer biomarker identification with an improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based methods and gene-based methods for network biomarker identification. We then applied the netSVM approach to two breast cancer data sets to identify prognostic signatures for prediction of breast cancer metastasis. The experimental results show that: (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression; (2) prediction performance is much improved when tested across different data sets. Specifically, many genes related to apoptosis, cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis, were identified by the netSVM approach. More importantly, several novel hub genes, biologically important with many interactions in the PPI network but often showing little change in expression as compared with their downstream genes, were also identified as network biomarkers; the genes were enriched in signaling pathways such as the TGF-beta signaling pathway, MAPK signaling pathway, and JAK-STAT signaling pathway. These signaling pathways may provide new insight into the underlying mechanism of breast cancer metastasis.
Conclusions
We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.
doi:10.1186/1752-0509-5-161
PMCID: PMC3214162  PMID: 21992556
13.  Survival-Related Profile, Pathways, and Transcription Factors in Ovarian Cancer 
PLoS Medicine  2009;6(2):e1000024.
Background
Ovarian cancer has a poor prognosis due to advanced stage at presentation and either intrinsic or acquired resistance to classic cytotoxic drugs such as platinum and taxoids. Recent large clinical trials with different combinations and sequences of classic cytotoxic drugs indicate that further significant improvement in prognosis from this type of drug is not to be expected. Currently, a large number of drugs targeting dysregulated molecular pathways in cancer cells have been developed and are being introduced into the clinic. A major challenge is to identify those patients who will benefit from drugs targeting these specific dysregulated pathways. The aims of our study were (1) to develop a gene expression profile associated with overall survival in advanced stage serous ovarian cancer, (2) to assess the association of pathways and transcription factors with overall survival, and (3) to validate our identified profile and pathways/transcription factors in an independent set of ovarian cancers.
Methods and Findings
According to a randomized design, profiling of 157 advanced stage serous ovarian cancers was performed in duplicate using ∼35,000 70-mer oligonucleotide microarrays. A continuous predictor of overall survival was built taking into account well-known issues in microarray analysis, such as multiple testing and overfitting. A functional class scoring analysis was utilized to assess pathways/transcription factors for their association with overall survival. The prognostic value of genes that constitute our overall survival profile was validated on a fully independent, publicly available dataset of 118 well-defined primary serous ovarian cancers. Furthermore, functional class scoring analysis was also performed on this independent dataset to assess the similarities with results from our own dataset. An 86-gene overall survival profile discriminated between patients with unfavorable and favorable prognosis (median survival, 19 versus 41 mo, respectively; permutation p-value of log-rank statistic = 0.015) and maintained its independent prognostic value in multivariate analysis. Genes that composed the overall survival profile were also able to discriminate between the two risk groups in the independent dataset. In our dataset 17/167 pathways and 13/111 transcription factors were associated with overall survival, of which 16 and 12, respectively, were confirmed in the independent dataset.
Conclusions
Our study provides new clues to genes, pathways, and transcription factors that contribute to the clinical outcome of serous ovarian cancer and might be exploited in designing new treatment strategies.
Ate van der Zee and colleagues analyze the gene expression profiles of ovarian cancer samples from 157 patients, and identify an 86-gene expression profile that seems to predict overall survival.
Editors' Summary
Background.
Ovarian cancer kills more than 100,000 women every year and is one of the most frequent causes of cancer death in women in Western countries. Most ovarian cancers develop when an epithelial cell in one of the ovaries (two small organs in the pelvis that produce eggs) acquires genetic changes that allow it to grow uncontrollably and to spread around the body (metastasize). In its early stages, ovarian cancer is confined to the ovaries and can often be treated successfully by surgery alone. Unfortunately, early ovarian cancer rarely has symptoms so a third of women with ovarian cancer have advanced disease when they first visit their doctor with symptoms that include vague abdominal pains and mild digestive disturbances. That is, cancer cells have spread into their abdominal cavity and metastasized to other parts of the body (so-called stage III and IV disease). The outlook for women diagnosed with stage III and IV disease, which are treated with a combination of surgery and chemotherapy, is very poor. Only 30% of women with stage III, and 5% with stage IV, are still alive five years after their cancer is diagnosed.
Why Was This Study Done?
If the cellular pathways that determine the biological behavior of ovarian cancer could be identified, it might be possible to develop more effective treatments for women with stage III and IV disease. One way to identify these pathways is to use gene expression profiling (a technique that catalogs all the genes expressed by a cell) to compare gene expression patterns in the ovarian cancers of women who survive for different lengths of time. Genes with different expression levels in tumors with different outcomes could be targets for new treatments. For example, it might be worth developing inhibitors of proteins whose expression is greatest in tumors with short survival times. In this study, the researchers develop an expression profile that is associated with overall survival in advanced-stage serous ovarian cancer (more than half of ovarian cancers originate in serous cells, epithelial cells that secrete a watery fluid). The researchers also assess the association of various cellular pathways and transcription factors (proteins that control the expression of other proteins) with survival in this type of ovarian carcinoma.
What Did the Researchers Do and Find?
The researchers analyzed the gene expression profiles of tumor samples taken from 157 patients with advanced stage serous ovarian cancer and used the “supervised principal components” method to build a predictor of overall survival from these profiles and patient survival times. This 86-gene predictor discriminated between patients with favorable and unfavorable outcomes (average survival times of 41 and 19 months, respectively). It also discriminated between groups of patients with these two outcomes in an independent dataset collected from 118 additional serous ovarian cancers. Next, the researchers used “functional class scoring” analysis to assess the association between pathway and transcription factor expression in the tumor samples and overall survival. Seventeen of 167 KEGG pathways (“wiring” diagrams of molecular interactions, reactions and relations involved in cellular processes and human diseases listed in the Kyoto Encyclopedia of Genes and Genomes) were associated with survival, 16 of which were confirmed in the independent dataset. Finally, 13 of 111 analyzed transcription factors were associated with overall survival in the tumor samples, 12 of which were confirmed in the independent dataset.
What Do These Findings Mean?
These findings identify an 86-gene overall survival gene expression profile that seems to predict overall survival for women with advanced serous ovarian cancer. However, before this profile can be used clinically, further validation of the profile and more robust methods for determining gene expression profiles are needed. Importantly, these findings also provide new clues about the genes, pathways and transcription factors that contribute to the clinical outcome of serous ovarian cancer, clues that can now be exploited in the search for new treatment strategies. Finally, these findings suggest that it might eventually be possible to tailor therapies to the needs of individual patients by analyzing which pathways are activated in their tumors and thus improve survival times for women with advanced ovarian cancer.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000024.
This study is further discussed in a PLoS Medicine Perspective by Simon Gayther and Kate Lawrenson
See also a related PLoS Medicine Research Article by Huntsman and colleagues
The US National Cancer Institute provides a brief description of what cancer is and how it develops, and information on all aspects of ovarian cancer for patients and professionals (in English and Spanish)
The UK charity Cancerbackup provides general information about cancer, and more specific information about ovarian cancer
MedlinePlus also provides links to other information about ovarian cancer (in English and Spanish)
The KEGG Pathway database provides pathway maps of known molecular networks involved in a wide range of cellular processes
doi:10.1371/journal.pmed.1000024
PMCID: PMC2634794  PMID: 19192944
14.  Systematic antibody generation and validation via tissue microarray technology leading to identification of a novel protein prognostic panel in breast cancer 
BMC Cancer  2013;13:175.
Background
Although omic-based discovery approaches can provide powerful tools for biomarker identification, several reservations have been raised regarding the clinical applicability of gene expression studies, such as their prohibitive cost. However, the limited availability of antibodies is a key barrier to the development of a lower cost alternative, namely a discrete collection of immunohistochemistry (IHC)-based biomarkers. The aim of this study was to use a systematic approach to generate and screen affinity-purified, mono-specific antibodies targeting progression-related biomarkers, with a view towards developing a clinically applicable IHC-based prognostic biomarker panel for breast cancer.
Methods
We examined both in-house and publicly available breast cancer DNA microarray datasets relating to invasion and metastasis, thus identifying a cohort of candidate progression-associated biomarkers. Of these, 18 antibodies were released for extended analysis. Validated antibodies were screened against a tissue microarray (TMA) constructed from a cohort of consecutive breast cancer cases (n = 512) to test the immunohistochemical surrogate signature.
Results
Antibody screening revealed 3 candidate prognostic markers: the cell cycle regulator, Anillin (ANLN); the mitogen-activated protein kinase, PDZ-Binding Kinase (PBK); and the estrogen response gene, PDZ-Domain Containing 1 (PDZK1). Increased expression of ANLN and PBK was associated with poor prognosis, whilst increased expression of PDZK1 was associated with good prognosis. A 3-marker signature comprised of high PBK, high ANLN and low PDZK1 expression was associated with decreased recurrence-free survival (p < 0.001) and breast cancer-specific survival (BCSS) (p < 0.001). This novel signature was associated with high tumour grade (p < 0.001), positive nodal status (p = 0.029), ER-negativity (p = 0.006), Her2-positivity (p = 0.036) and high Ki67 status (p < 0.001). However, multivariate Cox regression demonstrated that the signature was not a significant predictor of BCSS (HR = 6.38; 95% CI = 0.79-51.26, p = 0.082).
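The hazard ratio, confidence interval, and p-value reported for the multivariate model can be cross-checked against one another. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual convention for Cox regression output but is not stated in the abstract. The function name `wald_p_from_hazard_ratio_ci` is hypothetical.

```python
import math


def wald_p_from_hazard_ratio_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log HR), the Wald z statistic, and a two-sided p-value.

    Assumes the 95% CI was built as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported in the abstract: HR = 6.38; 95% CI = 0.79-51.26.
se, z, p = wald_p_from_hazard_ratio_ci(6.38, 0.79, 51.26)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under that assumption, the recovered p-value lands close to the reported p = 0.082. The very wide interval, which spans 1, is why the signature is not a significant independent predictor despite the large point estimate.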
Conclusions
We have developed a comprehensive biomarker pathway that extends from discovery through to validation on a TMA platform. This proof-of-concept study has resulted in the identification of a novel 3-protein prognostic panel. Additional biochemical markers, interrogated using this high-throughput platform, may further augment the prognostic accuracy of this panel to a point that may allow implementation into routine clinical practice.
doi:10.1186/1471-2407-13-175
PMCID: PMC3668187  PMID: 23547718
Prognostic biomarkers; Tissue microarray; Breast cancer; Antibody screening; Antibody validation
15.  Estrogen receptor negative/progesterone receptor positive breast cancer is not a reproducible subtype 
Introduction
Estrogen receptor (ER) and progesterone receptor (PR) testing are performed in the evaluation of breast cancer. While the clinical utility of ER as a predictive biomarker to identify patients likely to benefit from hormonal therapy is well-established, the added value of PR is less well-defined. The primary goals of our study were to assess the distribution, inter-assay reproducibility, and prognostic significance of breast cancer subtypes defined by patterns of ER and PR expression.
Methods
We integrated gene expression microarray (GEM) and clinico-pathologic data from 20 published studies to determine the frequency (n = 4,111) and inter-assay reproducibility (n = 1,752) of ER/PR subtypes (ER+/PR+, ER+/PR-, ER-/PR-, ER-/PR+). To extend our findings, we utilized a cohort of patients from the Nurses’ Health Study (NHS) with ER/PR data recorded in the medical record and assessed on tissue microarrays (n = 2,011). In both datasets, we assessed the association of ER and PR expression with survival.
Results
In a genome-wide analysis, progesterone receptor was among the least variable genes in ER- breast cancer. The ER-/PR+ subtype was rare (approximately 1 to 4%) and showed no significant reproducibility (Kappa = 0.02 and 0.06, in the GEM and NHS datasets, respectively). The vast majority of patients classified as ER-/PR+ in the medical record (97% and 94%, in the GEM and NHS datasets) were re-classified by a second method. In the GEM dataset (n = 2,731), progesterone receptor mRNA expression was associated with prognosis in ER+ breast cancer (adjusted P <0.001), but not in ER- breast cancer (adjusted P = 0.21). PR protein expression did not contribute significant prognostic information to multivariate models considering ER and other standard clinico-pathologic features in the GEM or NHS datasets.
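Kappa values near zero mean that two assays agree on a category about as often as chance would predict. The following is a minimal sketch of Cohen's kappa for a square agreement table. The counts are hypothetical and chosen only to show how a rare category can combine high raw agreement with a kappa close to zero; they are not taken from the study.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table
    (rows: calls by method A, columns: calls by method B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts: rows/columns are (ER-/PR+, any other subtype).
hypothetical = [[1, 30],
                [35, 934]]
print(round(cohens_kappa(hypothetical), 3))
```

In this made-up table the two methods agree on 93.5% of samples, yet almost all of that agreement is expected by chance because the category is rare.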
Conclusion
ER-/PR+ breast cancer is not a reproducible subtype. PR expression is not associated with prognosis in ER- breast cancer, and PR does not contribute significant independent prognostic information to multivariate models considering ER and other standard clinico-pathologic factors. Given that PR provides no clinically actionable information in ER+ breast cancer, these findings question the utility of routine PR testing in breast cancer.
doi:10.1186/bcr3462
PMCID: PMC3978610  PMID: 23971947
Estrogen receptor; Progesterone receptor; Breast cancer; Immunohistochemistry; Gene expression microarrays; Biomarkers; Inter-assay reproducibility
16.  The Preclinical Natural History of Serous Ovarian Cancer: Defining the Target for Early Detection 
PLoS Medicine  2009;6(7):e1000114.
Pat Brown and colleagues carry out a modeling study and define what properties a biomarker-based screening test would require in order to be clinically useful.
Background
Ovarian cancer kills approximately 15,000 women in the United States every year, and more than 140,000 women worldwide. Most deaths from ovarian cancer are caused by tumors of the serous histological type, which are rarely diagnosed before the cancer has spread. Rational design of a potentially life-saving early detection and intervention strategy requires understanding the lesions we must detect in order to prevent lethal progression. Little is known about the natural history of lethal serous ovarian cancers before they become clinically apparent. We can learn about this occult period by studying the unsuspected serous cancers that are discovered in a small fraction of apparently healthy women who undergo prophylactic bilateral salpingo-oophorectomy (PBSO).
Methods and Findings
We developed models for the growth, progression, and detection of occult serous cancers on the basis of a comprehensive analysis of published data on serous cancers discovered by PBSO in BRCA1 mutation carriers. Our analysis yielded several critical insights into the early natural history of serous ovarian cancer. First, these cancers spend on average more than 4 y as in situ, stage I, or stage II cancers and approximately 1 y as stage III or IV cancers before they become clinically apparent. Second, for most of the occult period, serous cancers are less than 1 cm in diameter, and not visible on gross examination of the ovaries and Fallopian tubes. Third, the median diameter of a serous ovarian cancer when it progresses to an advanced stage (stage III or IV) is about 3 cm. Fourth, to achieve 50% sensitivity in detecting tumors before they advance to stage III, an annual screen would need to detect tumors of 1.3 cm in diameter; 80% detection sensitivity would require detecting tumors less than 0.4 cm in diameter. Fifth, to achieve a 50% reduction in serous ovarian cancer mortality with an annual screen, a test would need to detect tumors of 0.5 cm in diameter.
Conclusions
Our analysis has formalized essential conditions for successful early detection of serous ovarian cancer. Although the window of opportunity for early detection of these cancers lasts for several years, developing a test sufficiently sensitive and specific to take advantage of that opportunity will be a challenge. We estimated that the tumors we would need to detect to achieve even 50% sensitivity are more than 200 times smaller than the clinically apparent serous cancers typically used to evaluate performance of candidate biomarkers; none of the biomarker assays reported to date comes close to the required level of performance. Overcoming the signal-to-noise problem inherent in detection of tiny tumors will likely require discovery of truly cancer-specific biomarkers or development of novel approaches beyond traditional blood protein biomarkers. While this study was limited to ovarian cancers of serous histological type and to those arising in BRCA1 mutation carriers specifically, we believe that the results are relevant to other hereditary serous cancers and to sporadic ovarian cancers. A similar approach could be applied to other cancers to aid in defining their early natural history and to guide rational design of an early detection strategy.
Editors' Summary
Background
Every year about 190,000 women develop ovarian cancer and more than 140,000 die from the disease. Ovarian cancer occurs when a cell on the surface of the ovaries (two small organs in the pelvis that produce eggs) or in the Fallopian tubes (which connect the ovaries to the womb) acquires genetic changes (mutations) that allow it to grow uncontrollably and to spread around the body (metastasize). For women whose cancer is diagnosed when it is confined to the site of origin—ovary or Fallopian tube—(stage I disease), the outlook is good; 70%–80% of these women survive for at least 5 y. However, very few ovarian cancers are diagnosed this early. Usually, by the time the cancer causes symptoms (often only vague abdominal pain and mild digestive disturbances), it has spread into the pelvis (stage II disease), into the space around the gut, stomach, and liver (stage III disease), or to distant organs (stage IV disease). Patients with advanced-stage ovarian cancer are treated with surgery and chemotherapy but, despite recent treatment improvements, only 15% of women diagnosed with stage IV disease survive for 5 y.
Why Was This Study Done?
Most deaths from ovarian cancer are caused by serous ovarian cancer, a tumor subtype that is rarely diagnosed before it has spread. Early detection of serous ovarian cancer would save the lives of many women but no one knows what these cancers look like before they spread or how long they grow before they become clinically apparent. Learning about this occult (hidden) period of ovarian cancer development by observing tumors from their birth to late-stage disease is not feasible. However, some aspects of the early natural history of ovarian cancer can be studied by using data collected from healthy women who have had their ovaries and Fallopian tubes removed (prophylactic bilateral salpingo-oophorectomy [PBSO]) because they have inherited a mutated version of the BRCA1 gene that increases their ovarian cancer risk. In a few of these women, unsuspected ovarian cancer is discovered during PBSO. In this study, the researchers identify and analyze the available reports on occult serous ovarian cancer found this way and then develop mathematical models describing the early natural history of ovarian cancer.
What Did the Researchers Do and Find?
The researchers first estimated the time period during which the detection of occult tumors might save lives using the data from these reports. Serous ovarian cancers, they estimated, spend more than 4 y as in situ (a very early stage of cancer development), stage I, or stage II cancers and about 1 y as stage III and IV cancers before they become clinically apparent. Next, the researchers used the data to develop mathematical models for the growth, progression, and diagnosis of serous ovarian cancer (the accuracy of which depends on the assumptions used to build the models and on the quality of the data fed into them). These models indicated that, for most of the occult period, serous cancers had a diameter of less than 1 cm (too small to be detected during surgery or by gross examination of the ovaries or Fallopian tubes) and that more than half of serous cancers had advanced to stage III/IV by the time they measured 3 cm across. Furthermore, to enable the detection of half of serous ovarian cancers before they reached stage III, an annual screening test would need to detect cancers with a diameter of 1.3 cm and to halve deaths from serous ovarian cancer, an annual screening test would need to detect 0.5-cm diameter tumors.
What Do These Findings Mean?
These findings suggest that the time period over which the early detection of serous ovarian cancer would save lives is surprisingly long. More soberingly, the authors find that a test that is sensitive and specific enough to take advantage of this “window of opportunity” would need to detect tumors hundreds of times smaller than clinically apparent serous cancers. So far no ovarian cancer-specific protein or other biomarker has been identified that could be used to develop a test that comes anywhere near this level of performance. Identification of truly ovarian cancer-specific biomarkers or novel strategies will be needed in order to take advantage of the window of opportunity. The stages prior to clinical presentation of other lethal cancers are still very poorly understood. Similar studies of the early natural history of these cancers could help guide the development of rational early detection strategies.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000114.
The US National Cancer Institute provides a brief description of what cancer is and how it develops and information on all aspects of ovarian cancer for patients and professionals. It also provides a fact sheet on BRCA1 mutations and cancer risk (in English and Spanish)
The UK charity Cancerbackup also provides information about all aspects of ovarian cancer
MedlinePlus provides a list of links to additional information about ovarian cancer (in English and Spanish)
The Canary Foundation is a nonprofit organization dedicated to development of effective strategies for early detection of cancers including ovarian cancer.
doi:10.1371/journal.pmed.1000114
PMCID: PMC2711307  PMID: 19636370
17.  Core module biomarker identification with network exploration for breast cancer metastasis 
BMC Bioinformatics  2012;13:12.
Background
In a complex disease, the expression of many genes can be significantly altered, leading to the appearance of a differentially expressed "disease module". Some of these genes directly correspond to the disease phenotype, (i.e. "driver" genes), while others represent closely-related first-degree neighbours in gene interaction space. The remaining genes consist of further removed "passenger" genes, which are often not directly related to the original cause of the disease. For prognostic and diagnostic purposes, it is crucial to be able to separate the group of "driver" genes and their first-degree neighbours, (i.e. "core module") from the general "disease module".
Results
We have developed COMBINER: COre Module Biomarker Identification with Network ExploRation. COMBINER is a novel pathway-based approach for selecting highly reproducible discriminative biomarkers. We applied COMBINER to three benchmark breast cancer datasets for identifying prognostic biomarkers. COMBINER-derived biomarkers exhibited 10-fold higher reproducibility than other methods, with up to 30-fold greater enrichment for known cancer-related genes, and 4-fold enrichment for known breast cancer susceptible genes. More than 50% and 40% of the resulting biomarkers were cancer and breast cancer specific, respectively. The identified modules were overlaid onto a map of intracellular pathways that comprehensively highlighted the hallmarks of cancer. Furthermore, we constructed a global regulatory network intertwining several functional clusters and uncovered 13 confident "driver" genes of breast cancer metastasis.
Conclusions
COMBINER can efficiently and robustly identify disease core module genes and construct their associated regulatory network. In the same way, it is potentially applicable in the characterization of any disease that can be probed with microarrays.
doi:10.1186/1471-2105-13-12
PMCID: PMC3349569  PMID: 22257533
18.  Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment 
PLoS Computational Biology  2013;9(3):e1002975.
Cox regression is commonly used to predict the outcome by the time to an event of interest and in addition, identify relevant features for survival analysis in cancer genomics. Due to the high-dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and apply it in a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into the Cox's proportional hazard model to explore the co-expression or functional relation among high-dimensional gene expression features in the gene network. Net-Cox was applied to analyze three independent gene expression datasets including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Net-Cox with the network information from gene co-expression or functional relations identified highly consistent signature genes across the three datasets, and because of the better generalization across the datasets, Net-Cox also consistently improved the accuracy of survival prediction over the Cox models regularized by the L2 or L1 norm. This study focused on analyzing the death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can more reliably predict the events. The signature genes comprise dense protein-protein interaction subnetworks, enriched by extracellular matrix receptors and modulators or by nuclear signaling components downstream of extracellular signal-regulated kinases. 
In the laboratory validation of the signature genes, a tumor array experiment by protein staining on an independent patient cohort from Mayo Clinic showed that the protein expression of the signature gene FBN1 is a biomarker significantly associated with the early recurrence after 12 months of the treatment in the ovarian cancer patients who are initially sensitive to chemotherapy. Net-Cox toolbox is available at http://compbio.cs.umn.edu/Net-Cox/.
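The abstract does not state the Net-Cox objective, so the following is only a generic sketch of a network-regularized Cox model. It assumes a Breslow-style partial likelihood without tie correction and an unnormalized graph-Laplacian penalty that pulls the coefficients of connected genes toward each other. All function names and the weight lam are hypothetical.

```python
import math

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood of a Cox model (no tie correction)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        if not d_i:
            continue  # censored samples enter only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        nll -= eta[i] - math.log(risk)
    return nll

def laplacian_penalty(beta, edges):
    """Sum over edges (u, v, w) of w * (beta_u - beta_v)^2,
    which equals beta^T L beta for the graph Laplacian L."""
    return sum(w * (beta[u] - beta[v]) ** 2 for u, v, w in edges)

def network_cox_objective(beta, X, time, event, edges, lam=1.0):
    """Penalized objective: Cox loss plus lam times the network penalty."""
    return (cox_neg_log_partial_likelihood(beta, X, time, event)
            + lam * laplacian_penalty(beta, edges))

# Toy data: 3 patients, 2 genes joined by one edge of weight 1.
X = [[0.5, 0.4], [-0.2, -0.1], [0.1, 0.3]]
time, event = [1.0, 2.0, 3.0], [1, 1, 0]
edges = [(0, 1, 1.0)]
print(network_cox_objective([0.3, 0.2], X, time, event, edges, lam=0.5))
```

A penalty of this kind favours solutions in which neighbouring genes carry similar weights, which is one plausible reason why network-informed signatures would transfer better across datasets than coefficients fitted gene by gene.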
Author Summary
Network-based computational models are attracting increasing attention in studying cancer genomics because molecular networks provide valuable information on the functional organizations of molecules in cells. Survival analysis mostly with the Cox proportional hazard model is widely used to predict or correlate gene expressions with time to an event of interest (outcome) in cancer genomics. Surprisingly, network-based survival analysis has not received enough attention. In this paper, we studied resistance to chemotherapy in ovarian cancer with a network-based Cox model, called Net-Cox. The experiments confirm that networks representing gene co-expression or functional relations can be used to improve the accuracy and the robustness of survival prediction of outcome in ovarian cancer treatment. The study also revealed subnetwork signatures that are enriched by extracellular matrix receptors and modulators and the downstream nuclear signaling components of extracellular signal-regulators, respectively. In particular, FBN1, which was detected as a signature gene of high confidence by Net-Cox with network information, was validated in the laboratory as a biomarker for predicting early recurrence in platinum-sensitive ovarian cancer patients.
doi:10.1371/journal.pcbi.1002975
PMCID: PMC3605061  PMID: 23555212
19.  Cancer Screening: A Mathematical Model Relating Secreted Blood Biomarker Levels to Tumor Sizes  
PLoS Medicine  2008;5(8):e170.
Background
Increasing efforts and financial resources are being invested in early cancer detection research. Blood assays detecting tumor biomarkers promise noninvasive and financially reasonable screening for early cancer with high potential of positive impact on patients' survival and quality of life. For novel tumor biomarkers, the actual tumor detection limits are usually unknown and there have been no studies exploring the tumor burden detection limits of blood tumor biomarkers using mathematical models. Therefore, the purpose of this study was to develop a mathematical model relating blood biomarker levels to tumor burden.
Methods and Findings
Using a linear one-compartment model, the steady state between tumor biomarker secretion into and removal out of the intravascular space was calculated. Two conditions were assumed: (1) the compartment (plasma) is well-mixed and kinetically homogenous; (2) the tumor biomarker consists of a protein that is secreted by tumor cells into the extracellular fluid compartment, and a certain percentage of the secreted protein enters the intravascular space at a continuous rate. The model was applied to two pathophysiologic conditions: tumor biomarker is secreted (1) exclusively by the tumor cells or (2) by both tumor cells and healthy normal cells. To test the model, a sensitivity analysis was performed assuming variable conditions of the model parameters. The model parameters were primed on the basis of literature data for two established and well-studied tumor biomarkers (CA125 and prostate-specific antigen [PSA]). Assuming biomarker secretion by tumor cells only and 10% of the secreted tumor biomarker reaching the plasma, the calculated minimally detectable tumor sizes ranged between 0.11 mm³ and 3,610.14 mm³ for CA125 and between 0.21 mm³ and 131.51 mm³ for PSA. When biomarker secretion by healthy cells and tumor cells was assumed, the calculated tumor sizes leading to positive test results ranged between 116.7 mm³ and 1.52 × 10⁶ mm³ for CA125 and between 27 mm³ and 3.45 × 10⁵ mm³ for PSA. One of the limitations of the study is the absence of quantitative data available in the literature on the secreted tumor biomarker amount per cancer cell in intact whole body animal tumor models or in cancer patients. Additionally, the fraction of secreted tumor biomarkers actually reaching the plasma is unknown. Therefore, we used data from published cell culture experiments to estimate tumor cell biomarker secretion rates and assumed a wide range of secretion rates to account for their potential changes due to field effects of the tumor environment.
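The steady-state idea can be sketched for the tumor-only secretion case. The following is a minimal illustration, assuming first-order elimination from a single well-mixed plasma compartment; the function names, parameter values and units are hypothetical placeholders, not the values used in the study.

```python
import math

def steady_state_concentration(n_cells, secretion_per_cell, fraction_to_plasma,
                               half_life_h, plasma_volume_ml):
    """Plasma concentration at which influx equals first-order removal.

    dC/dt = f * R * N / V - k_el * C = 0  =>  C_ss = f * R * N / (k_el * V)
    """
    k_el = math.log(2) / half_life_h           # elimination rate constant (1/h)
    influx = fraction_to_plasma * secretion_per_cell * n_cells  # units/h
    return influx / (k_el * plasma_volume_ml)  # units/ml

def min_detectable_cells(detection_limit, secretion_per_cell, fraction_to_plasma,
                         half_life_h, plasma_volume_ml):
    """Smallest tumor cell count whose steady-state level reaches the assay limit."""
    k_el = math.log(2) / half_life_h
    return (detection_limit * k_el * plasma_volume_ml
            / (fraction_to_plasma * secretion_per_cell))

# Hypothetical inputs (assumptions for illustration only).
params = dict(secretion_per_cell=1e-6,  # units per cell per hour
              fraction_to_plasma=0.10,  # 10% of secreted marker reaches plasma
              half_life_h=48.0,         # plasma half-life of the marker
              plasma_volume_ml=3000.0)
n_min = min_detectable_cells(detection_limit=1.0, **params)
print(f"minimum detectable cell count: {n_min:.3g}")
```

Because the relation is linear, halving the fraction that reaches the plasma or the per-cell secretion rate doubles the minimum detectable cell count, which is why the sensitivity analysis over these poorly known parameters yields such wide size ranges. Converting a cell count to a volume would need an additional assumption about cell density.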
Conclusions
This study introduced a linear one-compartment mathematical model that allows estimation of minimal detectable tumor sizes based on blood tumor biomarker assays. Assuming physiological data on CA125 and PSA from the literature, the model predicted detection limits of tumors that were in qualitative agreement with the actual clinical performance of both biomarkers. The model may be helpful in future estimation of minimal detectable tumor sizes for novel proteomic biomarker assays if sufficient physiologic data for the biomarker are available. The model may address the potential and limitations of tumor biomarkers, help prioritize biomarkers, and guide investments into early cancer detection research efforts.
Sanjiv Gambhir and colleagues describe a linear one-compartment mathematical model that allows estimation of minimal detectable tumor sizes based on blood tumor biomarker assays.
Editors' Summary
Background.
Cancers—disorganized masses of cells that can occur in any tissue—develop when cells acquire genetic changes that allow them to grow uncontrollably and to spread around the body (metastasize). If a cancer (tumor) is detected when it is small, surgery can often provide a cure. Unfortunately, many cancers (particularly those deep inside the body) are not detected until they are large enough to cause pain or other symptoms by pressing against surrounding tissue. By this time, it may be impossible to remove the original tumor surgically and there may be metastases scattered around the body. In such cases, radiotherapy and chemotherapy can sometimes help, but the outlook for patients whose cancers are detected late is often poor. Consequently, researchers are trying to develop early detection tests for different types of cancer. Many tumors release specific proteins—“cancer biomarkers”—into the blood and the hope is that it might be possible to find sets of blood biomarkers that detect cancers when they are still small and thus save many lives.
Why Was This Study Done?
For most biomarkers, it is not known how the amount of protein detected in the blood relates to tumor size or how sensitive the assays for biomarkers must be to improve patient survival. In this study, the researchers develop a “linear one-compartment” mathematical model to predict how large tumors need to be before blood biomarkers can be used to detect them and test this model using published data on two established cancer biomarkers—CA125 and prostate-specific antigen (PSA). CA125 is used to monitor the progress of patients with ovarian cancer after treatment; ovarian cancer is rarely diagnosed in its early stages and only one-fourth of women with advanced disease survive for 5 y after diagnosis. PSA is used to screen for prostate cancer and has increased the detection of this cancer in its early stages when it is curable.
What Did the Researchers Do and Find?
To develop a model that relates secreted blood biomarker levels to tumor sizes, the researchers assumed that biomarkers mix evenly throughout the patient's blood, that cancer cells secrete biomarkers into the fluid that surrounds them, that 0.1%–20% of these secreted proteins enter the blood at a continuous rate, and that biomarkers are continuously removed from the blood. The researchers then used their model to calculate the smallest tumor sizes that might be detectable with these biomarkers by feeding in existing data on CA125 and on PSA, including assay detection limits and the biomarker secretion rates of cancer cells growing in dishes. When only tumor cells secreted the biomarker and 10% of the secreted biomarker reached the blood, the model predicted that ovarian tumors between 0.11 mm³ (smaller than a grain of salt) and nearly 4,000 mm³ (about the size of a cherry) would be detectable by measuring CA125 blood levels (the range was determined by varying the amount of biomarker secreted by the tumor cells and the assay sensitivity); for prostate cancer, the detectable tumor sizes ranged from a similar lower size to about 130 mm³ (pea-sized). However, healthy cells often also secrete small quantities of cancer biomarkers. With this condition incorporated into the model, the estimated detectable tumor sizes (or total tumor burden including metastases) ranged from grape-sized to melon-sized for ovarian cancers and from pea-sized to about grapefruit-sized for prostate cancers.
What Do These Findings Mean?
The accuracy of the calculated tumor sizes provided by the researchers' mathematical model is limited by the lack of data on how tumors behave in the human body and by the many assumptions incorporated into the model. Nevertheless, the model predicts detection limits for ovarian and prostate cancer that broadly mirror the clinical performance of both biomarkers. Somewhat worryingly, the model also indicates that a tumor may have to be very large for blood biomarkers to reveal its presence, a result that could limit the clinical usefulness of biomarkers, especially if they are secreted not only by tumor cells but also by healthy cells. Given this finding, as more information about how biomarkers behave in the human body becomes available, this model (and more complex versions of it) should help researchers decide which biomarkers are likely to improve early cancer detection and patient outcomes.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050170.
The US National Cancer Institute provides a brief description of what cancer is and how it develops and a fact sheet on tumor markers; it also provides information on all aspects of ovarian and prostate cancer for patients and professionals, including information on screening and testing (in English and Spanish)
The UK charity Cancerbackup also provides general information about cancer and more specific information about ovarian and prostate cancer, including the use of CA125 and PSA for screening and follow-up
The American Society of Clinical Oncology offers a wide range of information on various cancer types, including online published articles on the current status of cancer diagnosis and management from the educational book developed by the annual meeting faculty and presenters. Registration is mandatory, but information is free
doi:10.1371/journal.pmed.0050170
PMCID: PMC2517618  PMID: 18715113
20.  Accurate and Reliable Cancer Classification Based on Probabilistic Inference of Pathway Activity 
PLoS ONE  2009;4(12):e8161.
With the advent of high-throughput technologies for measuring genome-wide expression profiles, a large number of methods have been proposed for discovering diagnostic markers that can accurately discriminate between different classes of a disease. However, factors such as the small sample size of typical clinical data, the inherent noise in high-throughput measurements, and the heterogeneity across different samples, often make it difficult to find reliable gene markers. To overcome this problem, several studies have proposed the use of pathway-based markers, instead of individual gene markers, for building the classifier. Given a set of known pathways, these methods estimate the activity level of each pathway by summarizing the expression values of its member genes, and use the pathway activities for classification. It has been shown that pathway-based classifiers typically yield more reliable results compared to traditional gene-based classifiers. In this paper, we propose a new classification method based on probabilistic inference of pathway activities. For a given sample, we compute the log-likelihood ratio between different disease phenotypes based on the expression level of each gene. The activity of a given pathway is then inferred by combining the log-likelihood ratios of the constituent genes. We apply the proposed method to the classification of breast cancer metastasis, and show that it achieves higher accuracy and identifies more reproducible pathway markers compared to several existing pathway activity inference methods.
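The log-likelihood-ratio idea described above can be sketched as follows. This is a minimal illustration, assuming Gaussian class-conditional densities for each gene and a plain sum over member genes; the abstract does not specify the density model or any normalization, and the gene names and parameters here are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution with the given mean and std."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)

def gene_llr(x, params_a, params_b):
    """Log-likelihood ratio of phenotype A versus phenotype B for one gene.

    params_a and params_b are (mean, std) pairs estimated from training samples.
    """
    return gaussian_logpdf(x, *params_a) - gaussian_logpdf(x, *params_b)

def pathway_activity(sample, gene_params):
    """Combine per-gene log-likelihood ratios into one pathway-level score."""
    return sum(gene_llr(sample[g], pa, pb) for g, (pa, pb) in gene_params.items())

# Hypothetical pathway with two member genes: ((mean_A, std_A), (mean_B, std_B)).
gene_params = {
    "GENE1": ((1.0, 1.0), (-1.0, 1.0)),
    "GENE2": ((0.5, 1.0), (-0.5, 1.0)),
}
sample = {"GENE1": 1.0, "GENE2": 0.5}
print(pathway_activity(sample, gene_params))
```

A positive score means the pathway's member genes jointly look more like phenotype A, a negative score more like phenotype B. The resulting activity values can then be used as features for a classifier.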
doi:10.1371/journal.pone.0008161
PMCID: PMC2781165  PMID: 19997592
21.  An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer 
Genome Biology  2007;8(8):R157.
A feature selection method was used in an analysis of three major microarray expression datasets to identify molecular subclasses and prognostic markers in estrogen receptor-negative breast cancer, showing that it is a heterogeneous disease with at least four main subtypes.
Background
Estrogen receptor (ER)-negative breast cancer specimens are predominantly of high grade, have frequent p53 mutations, and are broadly divided into HER2-positive and basal subtypes. Although ER-negative disease has overall worse prognosis than does ER-positive breast cancer, not all ER-negative breast cancer patients have poor clinical outcome. Reliable identification of ER-negative tumors that have a good prognosis is not yet possible.
Results
We apply a recently proposed feature selection method in an integrative analysis of three major microarray expression datasets to identify molecular subclasses and prognostic markers in ER-negative breast cancer. We find a subclass of basal tumors, characterized by over-expression of immune response genes, which has a better prognosis than the rest of ER-negative breast cancers. Moreover, we show that, in contrast to ER-positive tumours, the majority of prognostic markers in ER-negative breast cancer are over-expressed in the good prognosis group and are associated with activation of complement and immune response pathways. Specifically, we identify an immune response related seven-gene module and show that downregulation of this module confers greater risk for distant metastasis (hazard ratio 2.02, 95% confidence interval 1.2-3.4; P = 0.009), independent of lymph node status and lymphocytic infiltration. Furthermore, we validate the immune response module using two additional independent datasets.
Conclusion
We show that ER-negative basal breast cancer is a heterogeneous disease with at least four main subtypes. Furthermore, we show that the heterogeneity in clinical outcome of ER-negative breast cancer is related to the variability in expression levels of complement and immune response pathway genes, independent of lymphocytic infiltration.
doi:10.1186/gb-2007-8-8-r157
PMCID: PMC2374988  PMID: 17683518
22.  Breast Cancer DNA Methylation Profiles Are Associated with Tumor Size and Alcohol and Folate Intake 
PLoS Genetics  2010;6(7):e1001043.
Although tumor size and lymph node involvement are the current cornerstones of breast cancer prognosis, they have not been extensively explored in relation to tumor methylation attributes in conjunction with other tumor and patient dietary and hormonal characteristics. Using primary breast tumors from 162 (AJCC stage I–IV) women from the Kaiser Division of Research Pathways Study and the Illumina GoldenGate methylation bead-array platform, we measured 1,413 autosomal CpG loci associated with 773 cancer-related genes and validated select CpG loci with Sequenom EpiTYPER. Tumor grade, size, estrogen and progesterone receptor status, and triple negative status were significantly (Q-values <0.05) associated with altered methylation of 209, 74, 183, 69, and 130 loci, respectively. Unsupervised clustering, using a recursively partitioned mixture model (RPMM), of all autosomal CpG loci revealed eight distinct methylation classes. Methylation class membership was significantly associated with patient race (P<0.02) and tumor size (P<0.001) in univariate tests. Using multinomial logistic regression to adjust for potential confounders, patient age and tumor size, as well as known disease risk factors of alcohol intake and total dietary folate, were all significantly (P<0.0001) associated with methylation class membership. Breast cancer prognostic characteristics and risk-related exposures appear to be associated with gene-specific tumor methylation, as well as overall methylation patterns.
Author Summary
The current standard prognostic indicator for breast cancer is tumor-node-metastasis staging; though, as population-based studies and clinical trials are conducted, molecular characterization of disease is beginning to allow improved markers of prognosis and assist clinicians in choosing the most appropriate therapies. We investigated DNA methylation profiles in over 160 well annotated breast tumor samples and found significant relationships with standard and other known predictors of prognosis, as well as established risk factors for disease: alcohol intake and dietary folate. Recently the United States National Cancer Institute Cancer Biomarkers Research Group articulated a need for a “Strategic Approach to Validating Methylated Genes as Biomarkers for Breast Cancer,” and our work is extremely responsive to this call for a national strategy. Recognizing the increasing use of pre-operative chemotherapy for patients with operable, early-stage disease, there is added complexity in breast cancer staging. Since chemotherapy can considerably decrease tumor size, it is still unclear whether pre-operative or post-operative stage best informs prognosis and treatment decisions for patients electing pre-operative chemotherapy. However, our data clearly illustrate the promise of tumor DNA methylation for augmenting tumor staging and can be attained with minimal tissue in a pre-operative context.
doi:10.1371/journal.pgen.1001043
PMCID: PMC2912395  PMID: 20686660
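The abstract above reports associations at Q-values <0.05 without stating how the Q-values were computed. As a minimal sketch only, the following shows one common way to adjust a list of per-locus p-values for multiple testing, the Benjamini-Hochberg procedure. The choice of this procedure, the function name `benjamini_hochberg`, and the example p-values are assumptions made for illustration and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: this is an illustrative stand-in; the study may have
    used a different Q-value estimator.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four CpG loci (not data from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
significant = [i for i, q in enumerate(example_q) if q < 0.05]
```

A locus would then be called significant when its adjusted value falls below the chosen threshold, which is 0.05 in the abstract.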
23.  Mining expressed sequence tags identifies cancer markers of clinical interest 
BMC Bioinformatics  2006;7:481.
Background
Gene expression data are a rich source of information about the transcriptional dysregulation of genes in cancer. Genes that display differential regulation in cancer are a subtype of cancer biomarkers.
Results
We present an approach to mine expressed sequence tags to discover cancer biomarkers. A false discovery rate analysis suggests that the approach generates less than 22% false discoveries when applied to combined human and mouse whole genome screens. With this approach, we identify the 200 genes most consistently differentially expressed in cancer (called HM200) and proceed to characterize these genes. When used for prediction in a variety of cancer classification tasks (in 24 independent cancer microarray datasets, 59 classifications total), we show that HM200 and the shorter gene list HM100 are very competitive cancer biomarker sets. Indeed, when compared to 13 published cancer marker gene lists, HM200 achieves the best or second best classification performance in 79% of the classifications considered.
Conclusion
These results indicate the existence of at least one general cancer marker set whose predictive value spans several tumor types and classification types. Our comparison with other marker gene lists shows that HM200 markers are mostly novel cancer markers. We also identify the previously published Pomeroy-400 list as another general cancer marker set. Strikingly, Pomeroy-400 has 27 genes in common with HM200. Our data suggest that a core set of genes is responsive to the deregulation of pathways involved in tumorigenesis in a variety of tumor types and that these genes could serve as transcriptional cancer markers in applications of clinical interest. Finally, our study suggests new strategies to select and evaluate cancer biomarkers in microarray studies.
doi:10.1186/1471-2105-7-481
PMCID: PMC1635568  PMID: 17078886
24.  Module-Based Breast Cancer Classification 
The reliability and reproducibility of gene biomarkers for classification of cancer patients have been challenged due to measurement noise and biological heterogeneity among patients. In this paper, we propose a novel module-based feature selection framework, which integrates biological network information and gene expression data to identify biomarkers not as individual genes but as functional modules. Results from four breast cancer studies demonstrate that the identified module biomarkers i) achieve higher classification accuracy in independent validation datasets; ii) are more reproducible than individual gene markers; iii) improve the biological interpretability of results; and iv) are enriched in cancer “disease drivers”.
PMCID: PMC3736598  PMID: 23819260
Cancer biomarkers; systems biology; feature selection; disease classification
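The abstract above describes using functional modules rather than individual genes as features, but does not say how a module is scored. As a rough sketch under stated assumptions, one simple way to turn a module into a single per-sample feature is to average the z-scored expression of its member genes. The scoring rule, the function names `zscore` and `module_activity`, and the toy gene names and values are all hypothetical and are not the method of the paper.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression values across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with constant expression carries no signal.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def module_activity(expression, module_genes):
    """Average z-scored expression of the module's genes, per sample.

    expression: dict mapping gene name -> list of per-sample values.
    module_genes: gene names in the module; genes absent from
    `expression` are skipped. Assumption: illustrative scoring only.
    """
    z_rows = [zscore(expression[g]) for g in module_genes if g in expression]
    if not z_rows:
        raise ValueError("no module genes found in expression data")
    n_samples = len(z_rows[0])
    return [mean(row[s] for row in z_rows) for s in range(n_samples)]


# Hypothetical expression values for three samples (not real data).
toy_expression = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [2.0, 4.0, 6.0]}
toy_activity = module_activity(toy_expression, ["GENE_A", "GENE_B", "GENE_C"])
```

The resulting per-sample scores could then be passed to any classifier in place of single-gene expression values.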
25.  Clinical relevance of breast cancer-related genes as potential biomarkers for oral squamous cell carcinoma 
BMC Cancer  2014;14:324.
Background
Squamous cell carcinoma of the oral cavity (OSCC) is a common form of cancer with relatively low 5-year survival rates, due partly to late detection and a lack of complementary molecular markers as targets for treatment. Molecular profiling of head and neck cancer has revealed biological similarities with basal-like breast and lung carcinoma. Recently, we showed that 16 genes were consistently altered in invasive breast tumors displaying varying degrees of aggressiveness.
Methods
To extend our findings from breast cancer to another cancer type with similar characteristics, we performed an integrative analysis of transcriptomic and proteomic data to evaluate the prognostic significance of the 16 putative breast cancer-related biomarkers in OSCC using independent microarray datasets and immunohistochemistry. Predictive models for disease-specific (DSS) and/or overall survival (OS) were calculated for each marker using Cox proportional hazards models.
Results
We found that CBX2, SCUBE2, and STK32B protein expression were associated with important clinicopathological features for OSCC (peritumoral inflammatory infiltration, metastatic spread to the cervical lymph nodes, and tumor size). Notably, SCUBE2 and STK32B are involved in the hedgehog signaling pathway, which plays a pivotal role in metastasis and angiogenesis in cancer. In addition, CNTNAP2 and S100A8 protein expression were correlated with DSS and OS, respectively.
Conclusions
Taken together, these candidates and the hedgehog signaling pathway may be putative targets for drug development and clinical management of OSCC patients.
doi:10.1186/1471-2407-14-324
PMCID: PMC4031971  PMID: 24885002
Oral squamous cell carcinoma; Outcome prediction; Molecular biomarker; Immunohistochemistry; Model validation
