Related Articles

1.  A large, consistent plasma proteomics data set from prospectively collected breast cancer patient and healthy volunteer samples 
Variability of plasma sample collection and of proteomics technology platforms has been detrimental to generation of large proteomic profile datasets from human biospecimens.
We carried out a clinical trial-like protocol to standardize collection of plasma from 204 healthy and 216 breast cancer patient volunteers. The breast cancer patients provided follow up samples at 3 month intervals. We generated proteomics profiles from these samples with a stable and reproducible platform for differential proteomics that employs a highly consistent nanofabricated ChipCube™ chromatography system for peptide detection and quantification with fast, single dimension mass spectrometry (LC-MS). Protein identification is achieved with subsequent LC-MS/MS analysis employing the same ChipCube™ chromatography system.
With this consistent platform, over 800 LC-MS plasma proteomic profiles from prospectively collected samples of 420 individuals were obtained. Using a web-based data analysis pipeline for LC-MS profiling data, analyses of all peptide peaks from these plasma LC-MS profiles reveal an average coefficient of variation of less than 15%. Protein identification of peptide peaks of interest has been achieved with subsequent LC-MS/MS analyses and by referring to a spectral library created from about 150 discrete LC-MS/MS runs. Verification of peptide quantity and identity is demonstrated with several Multiple Reaction Monitoring analyses. These plasma proteomic profiles are publicly available through ProteomeCommons.
From a large prospective cohort of healthy and breast cancer patient volunteers and using a nano-fabricated chromatography system, a consistent LC-MS proteomics dataset has been generated that includes more than 800 discrete human plasma profiles. This large proteomics dataset provides an important resource in support of breast cancer biomarker discovery and validation efforts.
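As an illustration of the variability metric reported above, the following minimal Python sketch computes a coefficient of variation (CV) for each peptide peak and averages it across peaks. The peak names and intensity values are hypothetical, and the study's own web-based pipeline may compute the statistic differently.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / m


def average_cv(peak_table):
    """Average the per-peak CVs over every peak in the table."""
    return mean(coefficient_of_variation(v) for v in peak_table.values())


# Hypothetical intensities for three peptide peaks measured in four LC-MS runs.
peaks = {
    "peak_A": [100.0, 110.0, 90.0, 100.0],
    "peak_B": [50.0, 55.0, 45.0, 50.0],
    "peak_C": [200.0, 210.0, 190.0, 200.0],
}

print(f"average CV: {average_cv(peaks):.1f}%")
```

Averaging per-peak CVs is only one possible summary; a median would be less sensitive to a few noisy peaks.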
PMCID: PMC3120690  PMID: 21619653
2.  Precision of Multiple Reaction Monitoring Mass Spectrometry Analysis of Formalin-Fixed, Paraffin-Embedded Tissue 
Journal of Proteome Research  2012;11(6):3498-3505.
We compared the reproducibility of multiple reaction monitoring (MRM) mass spectrometry-based peptide quantitation in tryptic digests from formalin-fixed, paraffin-embedded (FFPE) and frozen clear cell renal cell carcinoma tissues. The analyses targeted a candidate set of 114 peptides previously identified in shotgun proteomic analyses, of which 104 were detectable in FFPE and frozen tissue. Although signal intensities for MRM of peptides from FFPE tissue were on average 66% of those in frozen tissue, median coefficients of variation (CV) for measurements in FFPE and frozen tissues were nearly identical (18–20%). Measurements of lysine C-terminal peptides and arginine C-terminal peptides from FFPE tissue were similarly reproducible (19.5% and 18.3% median CV, respectively). We further evaluated the precision of MRM-based quantitation by analysis of peptides from the HER2 receptor in FFPE and frozen tissues from a HER2-overexpressing mouse xenograft model of breast cancer and in human FFPE breast cancer specimens. We obtained equivalent MRM measurements of HER2 receptor levels in FFPE and frozen mouse xenografts derived from HER2-overexpressing BT474 cells and HER2-negative Sum159 cells. MRM analyses of 5 HER2-positive and 5 HER2-negative human FFPE breast tumors confirmed the results of immunohistochemical analyses, thus demonstrating the feasibility of HER2 protein quantification in FFPE tissue specimens. The data demonstrate that MRM analyses can be performed with equal precision on FFPE and frozen tissues and that lysine-containing peptides can be selected for quantitative comparisons, despite the greater impact of formalin fixation on lysine residues. The data further illustrate the feasibility of applying MRM to quantify clinically important tissue biomarkers in FFPE specimens.
PMCID: PMC3368395  PMID: 22530795
formalin-fixed; paraffin-embedded tissue; multiple reaction monitoring; breast cancer; biomarkers; HER2
3.  The Standard Protein Mix Database: A Diverse Dataset to Assist in the Production of Improved Peptide and Protein Identification Software Tools 
Journal of Proteome Research  2007;7(1):96-103.
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training datasets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last five years, we sought to generate a dataset of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the “ISB standard protein mix”, using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF) and two MALDI-TOF-TOF platforms. The resulting dataset, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at
PMCID: PMC2577160  PMID: 17711323
Proteomics; reference dataset; database search software; standard protein mix; Standard Protein Mix Database
4.  Targeted Strategy for Selective Identification of Secreted Breast Tumor Proteins in Plasma Using Mouse Xenograft Models 
Early detection of breast cancer is associated with improved patient survival. While early disease is commonly identified by patient self-examination and breast mammography, interpretation of these findings is highly subjective and often requires significant disease burden to achieve sensitivity. Cancer screening utilizing blood-based assays, such as measurement of prostate-specific antigen (PSA) abundance for prostate cancer, has proven to be a minimally invasive method that aids in detecting early disease. The generation of a blood-based assay for the detection of early disease in breast cancer would enable more facile disease diagnosis and thus expedite patient care.
The discovery of proteins actively shed or secreted by tumor cells into blood plasma by global proteomic analyses has proven analytically challenging, due mainly to the large dynamic range of protein abundances in blood. Common methods to enrich for tumor-specific proteins include depletion of abundant proteins from plasma samples, such as albumin and immunoglobulins. Furthermore, strategies are needed to detect blood-based candidates derived specifically from tumor cell populations to provide high-confidence candidates for further validation efforts.
To this end, we have developed a method combining global proteomic analyses of plasma collected from a mouse xenograft model of primary human breast cancer with post-data acquisition filtering of species-specific peptide search results. Primary xenograft models enable analyses of human tumor tissue in non-native biological backgrounds. Therefore, species-specific protein and gene sequences can be exploited in discovery efforts to selectively identify tumor cell-specific characteristics. Preliminary studies of plasma analyzed from xenograft-bearing mice have resulted in the identification of human-specific peptides corresponding to proteins previously described as being secreted from breast tissue and associated with breast cancer pathogenesis. Application of this strategy to proteomic analyses from a cohort of xenograft mice bearing HER2+ and triple negative breast cancer tissues will be presented.
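The post-acquisition filtering step described above can be sketched as a set operation: keep only identified peptides that occur in a human in silico digest and not in a mouse one. This is a minimal Python sketch under stated assumptions, not the authors' pipeline. The function names, the toy protein sequences, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all illustrative.

```python
import re


def tryptic_peptides(sequence, min_length=6):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def human_specific(identified, human_proteins, mouse_proteins):
    """Return identified peptides found in the human digest but absent from the mouse digest."""
    human = set().union(*(tryptic_peptides(s) for s in human_proteins))
    mouse = set().union(*(tryptic_peptides(s) for s in mouse_proteins))
    return sorted(p for p in identified if p in human and p not in mouse)


# Toy sequences (hypothetical): the two orthologs differ by one residue in the middle peptide.
human_db = ["MAAAAAKCCCCCCRDDDDDDK"]
mouse_db = ["MAAAAAKCCCCCSRDDDDDDK"]
identified = ["CCCCCCR", "MAAAAAK", "CCCCCSR"]

print(human_specific(identified, human_db, mouse_db))
```

A production filter would also need to treat leucine and isoleucine as indistinguishable by mass and to allow for missed cleavages; both are omitted here for brevity.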
PMCID: PMC3630706
5.  A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet 
BMC Bioinformatics  2012;13(Suppl 16):S1.
PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet is discussed from a statistical model-building perspective followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification.
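To make the mixture-modeling idea concrete, here is a minimal Python sketch that fits a two-component mixture to discriminant scores with expectation-maximization and converts a score into a posterior probability of being correct. It is a simplification for illustration only: both components are assumed Gaussian, the scores are simulated, and all function names are hypothetical. It is not the model or code shipped with PeptideProphet.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_correct(x, pi1, incorrect, correct):
    """P(correct | score x) by Bayes' rule for a two-component mixture."""
    a = pi1 * normal_pdf(x, *correct)
    b = (1.0 - pi1) * normal_pdf(x, *incorrect)
    return a / (a + b) if a + b > 0.0 else 0.5


def fit_mixture(scores, iterations=200):
    """Fit the mixture by EM; returns (pi1, (mu0, sd0), (mu1, sd1))."""
    lo, hi = min(scores), max(scores)
    spread = (hi - lo) or 1.0
    incorrect = (lo + 0.25 * spread, spread / 4.0)
    correct = (lo + 0.75 * spread, spread / 4.0)
    pi1 = 0.5
    for _ in range(iterations):
        # E-step: probability that each score belongs to the "correct" component.
        resp = [posterior_correct(x, pi1, incorrect, correct) for x in scores]
        # M-step: re-estimate the mixing weight, means, and standard deviations.
        n1 = max(sum(resp), 1e-12)
        n0 = max(len(scores) - sum(resp), 1e-12)
        mu1 = sum(r * x for r, x in zip(resp, scores)) / n1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, scores)) / n0
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / n1)
        sd0 = math.sqrt(sum((1.0 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / n0)
        incorrect, correct = (mu0, max(sd0, 1e-6)), (mu1, max(sd1, 1e-6))
        pi1 = n1 / len(scores)
    return pi1, incorrect, correct


# Simulated scores (hypothetical): 300 incorrect matches near 0, 100 correct matches near 6.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(100)]
pi1, incorrect, correct = fit_mixture(scores)
print(f"estimated fraction correct: {pi1:.2f}")
print(f"P(correct | score=4.0) = {posterior_correct(4.0, pi1, incorrect, correct):.3f}")
```

The same posterior probabilities can then be thresholded or summed to express confidence in sets of peptides, which is the use the abstract describes.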
PMCID: PMC3489532  PMID: 23176103
6.  Discovery of pathway biomarkers from coupled proteomics and systems biology methods 
BMC Genomics  2010;11(Suppl 2):S12.
Breast cancer is the second most common type of cancer worldwide, after lung cancer. Plasma proteome profiling may offer a better chance of identifying protein changes between sample groups, such as normal and breast cancer. Breast cancer cell lines have long been used by researchers as a model system for identifying protein biomarkers. Comparing the set of proteins that change in plasma with previously published findings from proteomic analysis of human breast cancer cell lines may identify a subset of candidate protein biomarkers with higher confidence.
In this study, we analyzed a liquid chromatography (LC) coupled tandem mass spectrometry (MS/MS) proteomics dataset from plasma samples of 40 healthy women and 40 women diagnosed with breast cancer. Using two-sample t-statistics and a permutation procedure, we identified 254 statistically significant, differentially expressed proteins, among which 208 are over-expressed and 46 are under-expressed in breast cancer plasma. We validated this result against previously published proteomic results of human breast cancer cell lines and signaling pathways to derive 25 candidate protein biomarkers in a panel. Using pathway analysis, we observed that the 25 “activated” plasma proteins were present in several cancer pathways, including ‘Complement and coagulation cascades’, ‘Regulation of actin cytoskeleton’, and ‘Focal adhesion’, and match well with previously reported studies. Additional gene ontology analysis of the 25 proteins also showed that cellular metabolic process and response to external stimulus (especially proteolysis and acute inflammatory response) were enriched functional annotations of the proteins identified in the breast cancer plasma samples. By cross-validation using two additional proteomics studies, we obtained 86% and 83% similarities in pathway-protein matrix between the first study and the two testing studies, which is much higher than the similarity we measured at the protein level.
We presented a ‘systems biology’ method to identify, characterize, analyze and validate panel biomarkers in breast cancer proteomics data, which includes 1) t statistics and permutation process, 2) network, pathway and function annotation analysis, and 3) cross-validation of multiple studies. Our results showed that the systems biology approach is essential to understanding the molecular mechanisms of panel protein biomarkers.
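The first step of the method, a two-sample t-statistic assessed by permutation, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the unequal-variance (Welch) form of the statistic, the number of permutations, and the abundance values are assumptions.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-statistic without assuming equal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided p-value: how often shuffled group labels give an |t| at least as large."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical abundances of one protein in healthy and breast cancer plasma.
healthy = [1.0, 2.0, 3.0, 4.0, 5.0]
cancer = [6.0, 7.0, 8.0, 9.0, 10.0]
print(welch_t(healthy, cancer), permutation_p_value(healthy, cancer))
```

When many proteins are tested at once, the resulting p-values would normally also be corrected for multiple testing; that step is not shown.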
PMCID: PMC2975409  PMID: 21047379
7.  Adaptive Discriminant Function Analysis and Re-ranking of MS/MS Database Search Results for Improved Peptide Identification in Shotgun Proteomics 
Journal of Proteome Research  2008;7(11):4878-4889.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum dataset. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
PMCID: PMC3744223  PMID: 18788775
Tandem Mass Spectrometry; Database searching; Peptide Identification; Statistical Modeling; Adaptive Discriminant Analysis; Mass Accuracy; Decoy Sequences
8.  Risk Prediction for Breast, Endometrial, and Ovarian Cancer in White Women Aged 50 y or Older: Derivation and Validation from Population-Based Cohort Studies 
PLoS Medicine  2013;10(7):e1001492.
Ruth Pfeiffer and colleagues describe models to calculate absolute risks for breast, endometrial, and ovarian cancers for white, non-Hispanic women over 50 years old using easily obtainable risk factors.
Please see later in the article for the Editors' Summary
Breast, endometrial, and ovarian cancers share some hormonal and epidemiologic risk factors. While several models predict absolute risk of breast cancer, there are few models for ovarian cancer in the general population, and none for endometrial cancer.
Methods and Findings
Using data on white, non-Hispanic women aged 50+ y from two large population-based cohorts (the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial [PLCO] and the National Institutes of Health–AARP Diet and Health Study [NIH-AARP]), we estimated relative and attributable risks and combined them with age-specific US-population incidence and competing mortality rates. All models included parity. The breast cancer model additionally included estrogen and progestin menopausal hormone therapy (MHT) use, other MHT use, age at first live birth, menopausal status, age at menopause, family history of breast or ovarian cancer, benign breast disease/biopsies, alcohol consumption, and body mass index (BMI); the endometrial model included menopausal status, age at menopause, BMI, smoking, oral contraceptive use, MHT use, and an interaction term between BMI and MHT use; the ovarian model included oral contraceptive use, MHT use, and family history of breast or ovarian cancer. In independent validation data (Nurses' Health Study cohort) the breast and ovarian cancer models were well calibrated; expected to observed cancer ratios were 1.00 (95% confidence interval [CI]: 0.96–1.04) for breast cancer and 1.08 (95% CI: 0.97–1.19) for ovarian cancer. The number of endometrial cancers was significantly overestimated, expected/observed = 1.20 (95% CI: 1.11–1.29). The areas under the receiver operating characteristic curves (AUCs; discriminatory power) were 0.58 (95% CI: 0.57–0.59), 0.59 (95% CI: 0.56–0.63), and 0.68 (95% CI: 0.66–0.70) for the breast, ovarian, and endometrial models, respectively.
These models predict absolute risks for breast, endometrial, and ovarian cancers from easily obtainable risk factors and may assist in clinical decision-making. Limitations are the modest discriminatory ability of the breast and ovarian models and that these models may not generalize to women of other races.
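The two validation measures quoted above, calibration (expected/observed ratio) and discrimination (AUC), can be illustrated with a short Python sketch. The confidence interval uses a common Poisson-based approximation, which is an assumption here and not necessarily the method used in the study; the counts and risk values are hypothetical.

```python
import math


def expected_observed_ratio(expected, observed, z=1.96):
    """Return (E/O, lower, upper) using the approximation E/O * exp(+/- z * sqrt(1/O))."""
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


def auc(case_risks, noncase_risks):
    """Probability that a random case has a higher predicted risk than a random non-case.

    Ties count as one half. This pairwise form is quadratic in the sample size.
    """
    wins = 0.0
    for c in case_risks:
        for n in noncase_risks:
            wins += 1.0 if c > n else 0.5 if c == n else 0.0
    return wins / (len(case_risks) * len(noncase_risks))


# Hypothetical validation data: 120 cancers predicted, 100 observed.
print(expected_observed_ratio(120.0, 100))
# Hypothetical predicted risks for two cases and two non-cases.
print(auc([0.9, 0.8], [0.1, 0.85]))
```

An E/O ratio near 1 indicates good calibration, while an AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation.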
Editors' Summary
In 2008, just three types of cancer accounted for 10% of global cancer-related deaths. That year, about 460,000 women died from breast cancer (the most frequently diagnosed cancer among women and the fifth most common cause of cancer-related death). Another 140,000 women died from ovarian cancer, and 74,000 died from endometrial (womb) cancer (the 14th and 20th most common causes of cancer-related death, respectively). Although these three cancers originate in different tissues, they nevertheless share many risk factors. For example, current age, age at menarche (first period), and parity (the number of children a woman has had) are all strongly associated with breast, ovarian, and endometrial cancer risk. Because these cancers share many hormonal and epidemiological risk factors, a woman with a high breast cancer risk is also likely to have an above-average risk of developing ovarian or endometrial cancer.
Why Was This Study Done?
Several statistical models (for example, the Breast Cancer Risk Assessment Tool) have been developed that estimate a woman's absolute risk (probability) of developing breast cancer over the next few years or over her lifetime. Absolute risk prediction models are useful in the design of cancer prevention trials and can also help women make informed decisions about cancer prevention and treatment options. For example, a woman at high risk of breast cancer might decide to take tamoxifen for breast cancer prevention, but ideally she needs to know her absolute endometrial cancer risk before doing so because tamoxifen increases the risk of this cancer. Similarly, knowledge of her ovarian cancer risk might influence a woman's decision regarding prophylactic removal of her ovaries to reduce her breast cancer risk. There are few absolute risk prediction models for ovarian cancer, and none for endometrial cancer, so here the researchers develop models to predict the risk of these cancers and of breast cancer.
What Did the Researchers Do and Find?
Absolute risk prediction models are constructed by combining estimates for risk factors from cohorts with population-based incidence rates from cancer registries. Models are validated in an independent cohort by testing their ability to identify people with the disease and their ability to predict the observed numbers of incident cases. The researchers used data on white, non-Hispanic women aged 50 years or older that were collected during two large prospective US cohort studies of cancer screening and of diet and health, and US cancer incidence and mortality rates provided by the Surveillance, Epidemiology, and End Results Program to build their models. The models all included parity as a risk factor, as well as other factors. The model for endometrial cancer, for example, also included menopausal status, age at menopause, body mass index (an indicator of the amount of body fat), oral contraceptive use, menopausal hormone therapy use, and an interaction term between menopausal hormone therapy use and body mass index. Individual women's risk for endometrial cancer calculated using this model ranged from 1.22% to 17.8% over the next 20 years depending on their exposure to various risk factors. Validation of the models using data from the US Nurses' Health Study indicated that the endometrial cancer model overestimated the risk of endometrial cancer but that the breast and ovarian cancer models were well calibrated: the predicted and observed risks for these cancers in the validation cohort agreed closely. Finally, the discriminatory power of the models (a measure of how well a model separates people who have a disease from people who do not have the disease) was modest for the breast and ovarian cancer models but somewhat better for the endometrial cancer model.
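How relative risk, attributable risk, incidence, and competing mortality combine into an absolute risk can be sketched in Python. The sketch below assumes hazards that are constant within one-year intervals and a woman whose relative risk stays fixed over the projection period. The function name and all rates are hypothetical, and the published models may differ in detail.

```python
import math


def absolute_risk(cause_hazards, competing_hazards, relative_risk, attributable_risk):
    """Probability of developing the cancer over consecutive one-year intervals.

    cause_hazards: population incidence rate of the cancer for each year.
    competing_hazards: mortality rate from other causes for each year.
    The baseline hazard is the population rate scaled by (1 - attributable risk).
    """
    risk = 0.0
    survival = 1.0  # probability of being alive and cancer-free at the start of the year
    for h1, h2 in zip(cause_hazards, competing_hazards):
        h_cancer = h1 * (1.0 - attributable_risk) * relative_risk
        total = h_cancer + h2
        if total > 0.0:
            # Chance of an event this year, times the share of events that are the cancer.
            risk += survival * (h_cancer / total) * (1.0 - math.exp(-total))
        survival *= math.exp(-total)
    return risk


# Hypothetical 20-year projection: incidence 3 per 1,000 per year, relative risk 2,
# attributable risk 0.5, with and without competing mortality of 1% per year.
no_competing = absolute_risk([0.003] * 20, [0.0] * 20, 2.0, 0.5)
with_competing = absolute_risk([0.003] * 20, [0.01] * 20, 2.0, 0.5)
print(f"{no_competing:.4f} {with_competing:.4f}")
```

Including competing mortality lowers the projected risk, because some women die of other causes before the cancer could be diagnosed.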
What Do These Findings Mean?
These findings show that breast, ovarian, and endometrial cancer can all be predicted using information on known risk factors for these cancers that is easily obtainable. Because these models were constructed and validated using data from white, non-Hispanic women aged 50 years or older, they may not accurately predict absolute risk for these cancers for women of other races or ethnicities. Moreover, the modest discriminatory power of the breast and ovarian cancer models means they cannot be used to decide which women should be routinely screened for these cancers. Importantly, however, these well-calibrated models should provide realistic information about an individual's risk of developing breast, ovarian, or endometrial cancer that can be used in clinical decision-making and that may assist in the identification of potential participants for research studies.
Additional Information
Please access these websites via the online version of this summary at
This study is further discussed in a PLOS Medicine Perspective by Lars Holmberg and Andrew Vickers
The US National Cancer Institute provides comprehensive information about cancer (in English and Spanish), including detailed information about breast cancer, ovarian cancer, and endometrial cancer;
Information on the Breast Cancer Risk Assessment Tool, the Surveillance, Epidemiology, and End Results Program, and on the prospective cohort study of screening and the diet and health study that provided the data used to build the models is also available on the NCI site
Cancer Research UK, a not-for-profit organization, provides information about cancer, including detailed information on breast cancer, ovarian cancer, and endometrial cancer
The UK National Health Service Choices website has information and personal stories about breast cancer, ovarian cancer, and endometrial cancer; the not-for-profit organization Healthtalkonline also provides personal stories about dealing with breast cancer and ovarian cancer
PMCID: PMC3728034  PMID: 23935463
9.  Extensive Mass Spectrometry-based Analysis of the Fission Yeast Proteome 
We report a high-quality, system-wide proteome catalogue covering 71% (3,542 proteins) of the predicted genes of fission yeast, Schizosaccharomyces pombe, presenting the largest protein dataset to date for this important model organism. We obtained this high proteome and peptide (11.4 peptides/protein) coverage by a combination of extensive sample fractionation, high resolution Orbitrap mass spectrometry, and combined database searching using the iProphet software as part of the Trans-Proteomic Pipeline. All raw and processed data are made accessible in the S. pombe PeptideAtlas. The identified proteins showed no biases in functional properties and allowed global estimation of protein abundances. The high coverage of the PeptideAtlas allowed correlation with transcriptomic data in a system-wide manner, indicating that post-transcriptional processes control the levels of at least half of all identified proteins. Interestingly, the correlation was not equally tight for all functional categories, ranging from r_s > 0.80 for proteins involved in translation to r_s < 0.45 for signal transduction proteins. Moreover, many proteins involved in DNA damage repair could not be detected in the PeptideAtlas despite their high mRNA levels, strengthening the translation-on-demand hypothesis for members of this protein class. In summary, the extensive and publicly available S. pombe PeptideAtlas together with the generated proteotypic peptide spectral library will be a useful resource for future targeted, in-depth, and quantitative proteomic studies on this microorganism.
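The rank correlation used to compare protein and mRNA levels can be computed as in the following Python sketch, which implements Spearman's coefficient as the Pearson correlation of average ranks. The abundance values are hypothetical, and the sketch assumes neither input is constant (a constant input would make the denominator zero).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical mRNA and protein abundance estimates for five genes.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [10.0, 20.0, 30.0, 40.0, 50.0]
print(spearman(mrna, protein))
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic rescaling of either abundance measure.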
PMCID: PMC3675828  PMID: 23462206
10.  Tumor Microenvironment-Derived Proteins Dominate the Plasma Proteome Response During Breast Cancer Induction and Progression 
Cancer Research  2011;71(15):5090-5100.
Tumor development relies upon essential contributions from the tumor microenvironment and host immune alterations. These contributions may inform the plasma proteome in a manner that could be exploited for cancer diagnosis and prognosis. In this study, we employed a systems biology approach to characterize the plasma proteome response in the inducible HER2/neu mouse model of breast cancer during tumor induction, progression and regression. Mass spectrometry data derived from ∼ 1.6 million spectra identified protein networks involved in wound healing, microenvironment and metabolism that coordinately changed during tumor development. The observed alterations developed prior to cancer detection, increased progressively with tumor growth, and reverted toward baseline with tumor regression. Gene expression and immunohistochemical analyses suggested that the cancer-associated plasma proteome was derived from transcriptional responses in the non-cancerous host tissues as well as the developing tumor. The proteomic signature was distinct from a non-specific response to inflammation. Overall, the developing tumor simultaneously engaged a number of innate physiological processes, including wound repair, immune response, coagulation and complement cascades, tissue remodeling and metabolic homeostasis that were all detectable in plasma. Our findings offer an integrated view of tumor development with relevance to plasma-based strategies to detect and diagnose cancer.
PMCID: PMC3148311  PMID: 21653680
11.  A Mouse to Human Search for Plasma Proteome Changes Associated with Pancreatic Tumor Development 
PLoS Medicine  2008;5(6):e123.
The complexity and heterogeneity of the human plasma proteome have presented significant challenges in the identification of protein changes associated with tumor development. Refined genetically engineered mouse (GEM) models of human cancer have been shown to faithfully recapitulate the molecular, biological, and clinical features of human disease. Here, we sought to exploit the merits of a well-characterized GEM model of pancreatic cancer to determine whether proteomics technologies allow identification of protein changes associated with tumor development and whether such changes are relevant to human pancreatic cancer.
Methods and Findings
Plasma was sampled from mice at early and advanced stages of tumor development and from matched controls. Using a proteomic approach based on extensive protein fractionation, we confidently identified 1,442 proteins that were distributed across seven orders of magnitude of abundance in plasma. Analysis of proteins chosen on the basis of increased levels in plasma from tumor-bearing mice and corroborating protein or RNA expression in tissue documented concordance in the blood from 30 newly diagnosed patients with pancreatic cancer relative to 30 control specimens. A panel of five proteins selected on the basis of their increased level at an early stage of tumor development in the mouse was tested in a blinded study in 26 humans from the CARET (Carotene and Retinol Efficacy Trial) cohort. The panel discriminated pancreatic cancer cases from matched controls in blood specimens obtained between 7 and 13 mo prior to the development of symptoms and clinical diagnosis of pancreatic cancer.
Our findings indicate that GEM models of cancer, in combination with in-depth proteomic analysis, provide a useful strategy to identify candidate markers applicable to human cancer with potential utility for early detection.
Samir Hanash and colleagues identify proteins that are increased at an early stage of pancreatic tumor development in a mouse model and may be a useful tool in detecting early tumors in humans.
Editors' Summary
Cancers are life-threatening, disorganized masses of cells that can occur anywhere in the human body. They develop when cells acquire genetic changes that allow them to grow uncontrollably and to spread around the body (metastasize). If a cancer is detected when it is still small and has not metastasized, surgery can often provide a cure. Unfortunately, many cancers are detected only when they are large enough to press against surrounding tissues and cause pain or other symptoms. By this time, surgical removal of the original (primary) tumor may be impossible and there may be secondary cancers scattered around the body. In such cases, radiotherapy and chemotherapy can sometimes help, but the outlook for patients whose cancers are detected late is often poor. One cancer type for which late detection is a particular problem is pancreatic adenocarcinoma. This cancer rarely causes any symptoms in its early stages. Furthermore, the symptoms it eventually causes—jaundice, abdominal and back pain, and weight loss—are seen in many other illnesses. Consequently, pancreatic cancer has usually spread before it is diagnosed, and most patients die within a year of their diagnosis.
Why Was This Study Done?
If a test could be developed to detect pancreatic cancer in its early stages, the lives of many patients might be extended. Tumors often release specific proteins—“cancer biomarkers”—into the blood, a bodily fluid that can be easily sampled. If a protein released into the blood by pancreatic cancer cells could be identified, it might be possible to develop a noninvasive screening test for this deadly cancer. In this study, the researchers use a “proteomic” approach to identify potential biomarkers for early pancreatic cancer. Proteomics is the study of the patterns of proteins made by an organism, tissue, or cell and of the changes in these patterns that are associated with various diseases.
What Did the Researchers Do and Find?
The researchers started their search for pancreatic cancer biomarkers by studying the plasma proteome (the proteins in the fluid portion of blood) of mice genetically engineered to develop cancers that closely resemble human pancreatic tumors. Through the use of two techniques called high-resolution mass spectrometry and acrylamide isotopic labeling, the researchers identified 165 proteins that were present in larger amounts in plasma collected from mice with early and/or advanced pancreatic cancer than in plasma from control mice. Then, to test whether any of these protein changes were relevant to human pancreatic cancer, the researchers analyzed blood samples collected from patients with pancreatic cancer. These samples, they report, contained larger amounts of some of these proteins than blood collected from patients with chronic pancreatitis, a condition that has similar symptoms to pancreatic cancer. Finally, using blood samples collected during a clinical trial, the Carotene and Retinol Efficacy Trial (a cancer-prevention study), the researchers showed that the measurement of five of the proteins present in increased amounts at an early stage of tumor development in the mouse model discriminated between people with pancreatic cancer and matched controls up to 13 months before cancer diagnosis.
What Do These Findings Mean?
These findings suggest that in-depth proteomic analysis of genetically engineered mouse models of human cancer might be an effective way to identify biomarkers suitable for the early detection of human cancers. Previous attempts to identify such biomarkers using human samples have been hampered by the many noncancer-related differences in plasma proteins that exist between individuals and by problems in obtaining samples from patients with early cancer. The use of a mouse model of human cancer, these findings indicate, can circumvent both of these problems. More specifically, these findings identify a panel of proteins that might allow earlier detection of pancreatic cancer and that might, therefore, extend the life of some patients who develop this cancer. However, before a routine screening test becomes available, additional markers will need to be identified and extensive validation studies in larger groups of patients will have to be completed.
Additional Information
Please access these Web sites via the online version of this summary at
The MedlinePlus Encyclopedia has a page on pancreatic cancer (in English and Spanish). Links to further information are provided by MedlinePlus
The US National Cancer Institute has information about pancreatic cancer for patients and health professionals (in English and Spanish)
The UK charity Cancerbackup also provides information for patients about pancreatic cancer
The Clinical Proteomic Technologies for Cancer Initiative (a US National Cancer Institute initiative) provides a tutorial about proteomics and cancer and information on the Mouse Proteomic Technologies Initiative
PMCID: PMC2504036  PMID: 18547137
12.  Classification of HER2 Receptor Status in Breast Cancer Tissues by MALDI Imaging Mass Spectrometry 
Clinical laboratory testing for HER2 status in newly diagnosed, primary breast cancer tissues is critically important for therapeutic decision making. Matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS) is a powerful tool for investigating proteins through the direct and morphology-driven analysis of tissue sections. Unlike immunohistochemistry (IHC), MALDI-IMS enables the acquisition of complex protein expression profiles without any labeling. We hypothesized that MALDI-IMS may determine HER2 status directly from breast cancer tissues. Breast cancer tissues (n=48) predefined for HER2 status by IHC and fluorescence in situ hybridization (FISH) were subjected to MALDI-IMS and protein profiles were obtained through direct analysis of tissue sections. Protein identification was performed by tissue micro-extraction and fractionation followed by top-down tandem mass spectrometry on a spherical ion trap with ETD. A discovery and an independent validation set were used to predict HER2 status by applying proteomic classification algorithms. We found that specific protein/peptide expression changes strongly correlated with HER2 overexpression (m/z 4740, 8404, 8419, 8455, 8570, 8607, 8626). Among these, we identified m/z 8404 as cysteine-rich intestinal protein 1 (CRIP1). Of particular note, the proteomic signature accurately distinguished HER2-positive from HER2-negative tissues, achieving a sensitivity of 83%, a specificity of 92% and an overall accuracy of 89% (95% CI: 65% to 99%). Our results underscore the potential of MALDI-IMS proteomic algorithms for morphology-driven tissue diagnostics such as HER2 testing and show that MALDI-IMS can reveal biologically significant molecular details from tissues which are not limited to traditional high-abundance proteins. CRIP1 is a cytosolic protein that is potentially useful for serum-based diagnostics of HER2 if tissue leakage can be demonstrated.
PMCID: PMC2918067
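For readers less familiar with the performance figures quoted in this abstract, the following minimal Python sketch shows how sensitivity, specificity and overall accuracy are conventionally computed from confusion-matrix counts. The function name and the counts used are illustrative assumptions; they are not taken from the study, which does not report its confusion matrix here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positive cases called positive/negative by the classifier
    tn/fp: negative cases called negative/positive by the classifier
    """
    sensitivity = tp / (tp + fn)          # fraction of true positives detected
    specificity = tn / (tn + fp)          # fraction of true negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (NOT the study's data).
sens, spec, acc = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Because accuracy pools both classes, it depends on the ratio of positive to negative samples, which is one reason sensitivity and specificity are reported alongside it.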
13.  Design and utilization of the colorectal and pancreatic neoplasm virtual biorepository: An early detection research network initiative 
The Early Detection Research Network (EDRN) colorectal and pancreatic neoplasm virtual biorepository is a bioinformatics-driven system that provides high-quality clinicopathology-rich information for clinical biospecimens. This NCI-sponsored EDRN resource supports translational cancer research. The information model of this biorepository is based on three components: (a) development of common data elements (CDE), (b) a robust data entry tool and (c) comprehensive data query tools.
The aim of the EDRN initiative is to develop and sustain a virtual biorepository for support of translational research. High-quality biospecimens were accrued and annotated with pertinent clinical, epidemiologic, molecular and genomic information. A user-friendly annotation tool and query tool were developed for this purpose. The components of this system are as follows. The CDEs are developed from the College of American Pathologists (CAP) Cancer Checklists and North American Association of Central Cancer Registries (NAACCR) standards. The CDEs provide semantic and syntactic interoperability of the data sets by describing them in the form of metadata or data descriptors. The data entry tool is a portable and flexible Oracle-based data entry application, which is an easily mastered, web-based tool. The data query tool lets investigators search deidentified information within the warehouse through a “point and click” interface, so that only the selected data elements are copied into a data mart using a dimensional-modeled structure from the warehouse’s relational structure.
The EDRN Colorectal and Pancreatic Neoplasm Virtual Biorepository database contains multimodal datasets that are available to investigators via a web-based query tool. At present, the database holds 2,405 cases and 2,068 tumor accessions. The data disclosure is strictly regulated by user’s authorization. The high-quality and well-characterized biospecimens have been used in different translational science research projects as well as to further various epidemiologic and genomics studies.
The EDRN Colorectal and Pancreatic Neoplasm Virtual Biorepository, with a tangible translational biomedical informatics infrastructure, facilitates translational research. The data query tool acts as a central source and provides a mechanism for researchers to efficiently query clinically annotated datasets and biospecimens that are pertinent to their research areas. The tool ensures patient health information protection by disclosing only deidentified data in accordance with Institutional Review Board and Health Insurance Portability and Accountability Act protocols.
PMCID: PMC2956178  PMID: 21031013
Colorectal and pancreatic neoplasm; tissue banking informatics
14.  A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer 
BMC Proceedings  2013;7(Suppl 7):S10.
In the past several years, there has been increasing interest and enthusiasm in molecular biomarkers as tools for early detection of cancer. Liquid chromatography tandem mass spectrometry (LC/MS/MS) based plasma proteomics profiling technique is a promising technology platform to study candidate protein biomarkers for early detection of cancer. Factors such as inherent variability, protein detectability limitation, and peptide discovery biases among LC/MS/MS platforms have made the classification and prediction of proteomics profiles challenging. Developing proteomics data analysis methods to identify multi-protein biomarker panels for breast cancer diagnosis based on neural networks provides hope for improving both the sensitivity and the specificity of candidate cancer biomarkers for early detection.
In our previous method, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for plasma samples of breast cancer and then applied the classifier to predict a blind dataset of breast cancer. However, the optimal combination C* in our previous method was actually determined by applying the trained FFNN to the testing set with the combination. Therefore, in this paper, we applied a three-way data split to the Feed Forward Neural Network for training, validation and testing. We found that the prediction performance of the FFNN model based on the three-way data split outperforms our previous method and the prediction performance is improved from (AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, specificity = 82.5% for the testing set) to (AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, specificity = 87.5% for the testing set).
Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which are consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and it can be applied to other clinical proteomics.
PMCID: PMC4044889  PMID: 24565503
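The three-way data split described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the split fractions, the function names, and the `fit` and `score` callables are all hypothetical placeholders, and no neural network is trained here. The point it shows is that the marker panel is chosen using the validation set only, so the testing set stays untouched until the final evaluation.

```python
import random


def three_way_split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and split them into disjoint train/validation/test lists.

    The fractions are assumed values; the remainder goes to the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def select_panel(candidate_panels, fit, score, train, val):
    """Return the panel (and its fitted model) that scores best on validation.

    fit(panel, train) -> model and score(model, val) -> float are hypothetical
    callables supplied by the caller; the test set is never seen here.
    """
    best_panel, best_model, best_score = None, None, float("-inf")
    for panel in candidate_panels:
        model = fit(panel, train)
        current = score(model, val)
        if current > best_score:
            best_panel, best_model, best_score = panel, model, current
    return best_panel, best_model
```

Once `select_panel` has returned, the chosen model would be evaluated once on the held-out test list, which avoids the optimistic bias that arises when the same data are used both to pick the panel and to report its performance.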
15.  Proteomic characterization of Her2/neu-overexpressing breast cancer cells 
Proteomics  2010;10(21):3800-3810.
The receptor tyrosine kinase HER2 is an oncogene amplified in invasive breast cancer and its overexpression in mammary epithelial cell lines is a strong determinant of a tumorigenic phenotype. Accordingly, HER2-overexpressing mammary tumors are commonly indicative of a poor prognosis in patients. Several quantitative proteomic studies have employed two-dimensional gel electrophoresis in combination with tandem mass spectrometry, which provides only limited information about the molecular mechanisms underlying HER2/neu signaling. In the present study, we used a SILAC-based approach to compare the proteomic profile of normal breast epithelial cells with that of Her2/neu-overexpressing mammary epithelial cells, isolated from primary mammary tumors arising in MMTV-Her2/neu transgenic mice. We identified 23 proteins with relevant annotated functions in breast cancer, showing a substantial differential expression. This included overexpression of creatine kinase, retinol-binding protein 1, thymosin beta 4 and tumor protein D52, which correlated with the tumorigenic phenotype of Her2-overexpressing cells. The differential expression pattern of two genes, gelsolin and retinol binding protein 1, was further validated in normal and tumor tissues. Finally, an in silico analysis of published cancer microarray datasets revealed a 23-gene signature which can be used to predict the probability of metastasis-free survival in breast cancer patients.
PMCID: PMC4327899  PMID: 20960451
Cancer biomarker; Her2; quantitative proteomics and SILAC
16.  A feedback framework for protein inference with peptides identified from tandem mass spectra 
Proteome Science  2012;10:68.
Protein inference is an important computational step in proteomics. There exists a natural nested relationship between protein inference and peptide identification, but these two steps are usually performed separately in existing methods. We believe that both peptide identification and protein inference can be improved by exploring this nested relationship.
In this study, a feedback framework is proposed to process peptide identification reports from search engines, and an iterative method is implemented to exemplify the processing of Sequest peptide identification reports according to the framework. The iterative method is verified on two datasets with known validity of proteins and peptides, and compared with ProteinProphet and PeptideProphet. The results show that the iterative method not only infers more true positive and fewer false positive proteins than ProteinProphet, but also identifies more true positive and fewer false positive peptides than PeptideProphet.
The proposed iterative method implemented according to the feedback framework can unify and improve the results of peptide identification and protein inference.
PMCID: PMC3776439  PMID: 23164319
17.  National Mesothelioma Virtual Bank: A standard based biospecimen and clinical data resource to enhance translational research 
BMC Cancer  2008;8:236.
Advances in translational research have led to the need for well characterized biospecimens for research. The National Mesothelioma Virtual Bank is an initiative which collects annotated datasets relevant to human mesothelioma to develop an enterprising biospecimen resource to fulfill researchers' need.
The National Mesothelioma Virtual Bank architecture is based on three major components: (a) common data elements (based on College of American Pathologists protocol and North American Association of Central Cancer Registries standards), (b) clinical and epidemiologic data annotation, and (c) data query tools. These tools work interoperably to standardize the entire process of annotation. The National Mesothelioma Virtual Bank tool is based upon the caTISSUE Clinical Annotation Engine, developed by the University of Pittsburgh in cooperation with the Cancer Biomedical Informatics Grid™ (caBIG™). This application provides a web-based system for annotating, importing and searching mesothelioma cases. The underlying information model is constructed utilizing Unified Modeling Language class diagrams, hierarchical relationships and Enterprise Architect software.
The database provides researchers real-time access to richly annotated specimens and integral information related to mesothelioma. The data disclosed is tightly regulated depending upon users' authorization and depending on the participating institute that is amenable to the local Institutional Review Board and regulation committee reviews.
The National Mesothelioma Virtual Bank currently has over 600 annotated cases available for researchers that include paraffin embedded tissues, tissue microarrays, serum and genomic DNA. The National Mesothelioma Virtual Bank is a virtual biospecimen registry with robust translational biomedical informatics support to facilitate basic science, clinical, and translational research. Furthermore, it protects patient privacy by disclosing only de-identified datasets to assure that biospecimens can be made accessible to researchers.
PMCID: PMC2533341  PMID: 18700971
18.  MUMAL: Multivariate analysis in shotgun proteomics using machine learning techniques 
BMC Genomics  2012;13(Suppl 5):S4.
The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of obtained peptide-spectrum matches (PSMs) needs to be evaluated also by algorithms, as a manual curation of these huge datasets would be impractical. The target-decoy database strategy is largely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. Besides, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant augmentation in sensitivity.
Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, leading to a higher chance to retrieve more true positives. Furthermore, we need few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches.
Our approach not only enhances the computational performance, and thus the turnaround time of MS-based experiments in proteomics, but also improves the information content, with the benefit of higher proteome coverage. This improvement, for instance, increases the chance of identifying important drug targets or biomarkers for drug development or molecular diagnostics.
PMCID: PMC3477001  PMID: 23095859
Machine learning; Bioinformatics; Peptide/protein identification; Shotgun proteomics; Phosphoproteomics; Tandem mass spectrometry
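The target-decoy strategy mentioned in this abstract can be illustrated with a short Python sketch. It implements only the common baseline estimate, in which the false discovery rate above a score cutoff is approximated by the number of decoy matches divided by the number of target matches; it is not the MUMAL or MUDE method. The function name and the `(score, is_decoy)` tuple layout are assumptions made for this example, and tied scores are not handled specially.

```python
def target_decoy_cutoff(psms, max_fdr=0.01):
    """Find the most permissive score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    Returns (cutoff_score, accepted_target_count), or (None, 0) if no cutoff
    satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff, accepted = None, 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Baseline estimate: FDR ~ decoy hits / target hits above this score.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff, accepted = score, targets
    return cutoff, accepted


# Hypothetical scored matches for illustration only.
example = [(9.0, False), (8.0, False), (7.0, False),
           (6.0, True), (5.0, False), (4.0, False)]
print(target_decoy_cutoff(example, max_fdr=0.25))
```

A fixed error rate such as this fixes only the error side of the trade-off; the methods discussed in the abstract aim to raise the number of accepted matches at that same error rate.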
19.  Analysis of cancer risk and BRCA1 and BRCA2 mutation prevalence in the kConFab familial breast cancer resource 
Breast Cancer Research  2006;8(1):R12.
The Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer (kConFab) is a multidisciplinary, collaborative framework for the investigation of familial breast cancer. Based in Australia, the primary aim of kConFab is to facilitate high-quality research by amassing a large and comprehensive resource of epidemiological and clinical data with biospecimens from individuals at high risk of breast and/or ovarian cancer, and from their close relatives.
Epidemiological, family history and lifestyle data, as well as biospecimens, are collected from multiple-case breast cancer families ascertained through family cancer clinics in Australia and New Zealand. We used the Tyrer-Cuzick algorithms to assess the prospective risk of breast cancer in women in the kConFab cohort who were unaffected with breast cancer at the time of enrolment in the study.
Of kConFab's first 822 families, 518 families had multiple cases of female breast cancer alone, 239 had cases of female breast and ovarian cancer, 37 had cases of female and male breast cancer, and 14 had both ovarian cancer as well as male and female breast cancer. Data are currently held for 11,422 people and germline DNAs for 7,389. Among the 812 families with at least one germline sample collected, the mean number of germline DNA samples collected per family is nine. Of the 747 families that have undergone some form of mutation screening, 229 (31%) carry a pathogenic or splice-site mutation in BRCA1 or BRCA2. Germline DNAs and data are stored from 773 proven carriers of BRCA1 or BRCA2 mutations. kConFab's fresh tissue bank includes 253 specimens of breast or ovarian tissue – both normal and malignant – including 126 from carriers of BRCA1 or BRCA2 mutations.
These kConFab resources are available to researchers anywhere in the world, who may apply to kConFab for biospecimens and data for use in ethically approved, peer-reviewed projects. A high calculated risk from the Tyrer-Cuzick algorithms correlated closely with the subsequent occurrence of breast cancer in BRCA1 and BRCA2 mutation positive families, but this was less evident in families in which no pathogenic BRCA1 or BRCA2 mutation has been detected.
PMCID: PMC1413975  PMID: 16507150
20.  A decade of experience in the development and implementation of tissue banking informatics tools for intra and inter-institutional translational research 
Tissue banking informatics deals with standardized annotation, collection and storage of biospecimens that can further be shared by researchers. Over the last decade, the Department of Biomedical Informatics (DBMI) at the University of Pittsburgh has developed various tissue banking informatics tools to expedite translational medicine research. In this review, we describe the technical approach and capabilities of these models.
Clinical annotation of biospecimens requires data retrieval from various clinical information systems and the de-identification of the data by an honest broker. Based upon these requirements, DBMI, with its collaborators, has developed both Oracle-based organ-specific data marts and a more generic, model-driven architecture for biorepositories. The organ-specific models are developed utilizing Oracle server tools and software applications and the model-driven architecture is implemented in a J2EE framework.
The organ-specific biorepositories implemented by DBMI include the Cooperative Prostate Cancer Tissue Resource, the Pennsylvania Cancer Alliance Bioinformatics Consortium, the EDRN Colorectal and Pancreatic Neoplasm Database and the Specialized Programs of Research Excellence (SPORE) Head and Neck Neoplasm Database. The model-based architecture is represented by the National Mesothelioma Virtual Bank. These biorepositories provide thousands of well-annotated biospecimens for researchers that are searchable through query interfaces available via the Internet.
These systems, developed and supported by our institute, serve to form a common platform for cancer research to accelerate progress in clinical and translational research. In addition, they provide a tangible infrastructure and resource for exposing research resources and biospecimen services in collaboration with the clinical anatomic pathology laboratory information system (APLIS) and the cancer registry information systems.
PMCID: PMC2941965  PMID: 20922029
Tissue banking informatics; information models for translational research
21.  Development of Automated SISCAPA Assays for High-Throughput Quantitation of Protein Biomarkers 
Quantitation of proteotypic peptides in digests of plasma by SRM-MS allows specific, internally-standardized measurement of protein biomarkers and can achieve sub-nanogram/mL detection levels when specific anti-peptide antibodies are used to enrich target peptides from the plasma digests (SISCAPA). For this study, proteotypic tryptic peptides (initially 5 peptides per protein) were selected representing known protein biomarkers: PAI3 (protein C inhibitor), LPS binding protein, transferrin receptor, osteopontin, ferritin light chain, mesothelin, alpha-fetoprotein, HER2/neu, CA-125 and thyroglobulin (Tg). Affinity-purified polyclonal antibodies against the two peptides for each protein showing highest titers were characterized in SISCAPA assays, after which rabbit monoclonal antibodies (RabMAbs) were prepared (Epitomics, Inc.) against the best performing peptide for each target, except for Tg, for which mAbs were made against two peptides. The SISCAPA assay has been automated, allowing processing of 96 samples in less than 30 minutes.
The eluted peptides are delivered in a volume (20 μL) and solvent (5% acetic acid) suitable for subsequent injection into a reversed-phase LC system. Parameters for each of the 11 target peptides and cognate labeled standards have been optimized, permitting use of retention-time scheduled MRM data collection and rapid (3 min) analysis times. Results will be presented demonstrating the performance of the workflow for high-throughput quantitation of protein biomarkers.
PMCID: PMC3630603
22.  Expression Signature Developed from a Complex Series of Mouse Models Accurately Predicts Human Breast Cancer Survival 
The capability of microarray platforms to interrogate thousands of genes has led to the development of molecular diagnostic tools for cancer patients. While large-scale comparative studies of clinical samples are often limited by access to human tissues, expression profiling databases of various human cancer types are publicly available for researchers. Given that mouse models have been instrumental to our current understanding of cancer progression, we aimed to test the hypothesis that novel gene signatures possessing predictability in clinical outcome can be derived by coupling genomic analyses in mouse models of cancer with publicly available human cancer datasets.
Experimental Design
We established a complex series of syngeneic metastatic animal models using a murine breast cancer cell line. Tumor RNA was hybridized on Affymetrix MouseGenome-430A2.0 GeneChips. With the use of Venn logic, gene signatures that represent metastatic competency were derived and tested against publicly available human breast and lung cancer datasets.
Survival analyses showed that the spontaneous metastasis gene signature was significantly associated with metastasis-free and overall survival (p<0.0005). A six-gene model was subsequently derived and was statistically predictive of survival in breast cancer patients. In addition, the model was able to stratify poor from good prognosis for lung cancer patients in the majority of the datasets analyzed.
Together, our data support the view that novel gene signatures derived from mouse models of cancer can be used to predict human cancer outcome. Our approach sets a precedent that similar strategies may be used to decipher novel gene signatures for clinical utility.
PMCID: PMC2866744  PMID: 20028755
Mouse model; Gene Expression; Breast Cancer; Lung Cancer; Survival; Signature
23.  Disparities in knowledge and willingness to donate research biospecimens: a mixed-methods study in an underserved urban community 
Journal of Community Genetics  2014;5(4):329-336.
Although research involving biospecimens is essential in advancing cancer research, minorities, especially African-Americans, are underrepresented in such research. We conducted a mixed-method (qualitative focus groups among African-Americans and quantitative cross-sectional surveys) study on factors associated with biospecimen knowledge and donation intent in the medically underserved urban communities in Southeast and Southwest Washington, DC. Focus groups were conducted among 41 African-Americans and survey data was available from 302 community residents of different races/ethnicities using convenience sampling. We used logistic regression to model the association between biospecimen knowledge and donation intent with selected sociodemographic variables using survey data. Only 47 % of the participants had knowledge of the different types of biospecimens. In multivariate logistic regression models, male gender, African-American race, and low education levels were significantly associated with lower knowledge about biospecimens. Compared to Whites (79 %), fewer African-Americans (39 %) and Hispanics (57 %) had knowledge of biospecimens but the difference was significant for African-Americans only. Positive intent to donate biospecimens for research was observed among 36 % of the survey respondents. After multivariate adjustment, only biospecimen knowledge was associated with donation intent (odds ratio = 1.91, 95 % confidence interval 1.12, 3.27). Contrary to popular opinion, “mistrust of the medical community” was not the most commonly reported barrier for biospecimen donation among African-Americans. “Not knowing how biospecimens will be used” and “lack of knowledge of biospecimens” were the most common barriers. Our study highlights the importance of education on biospecimens among community residents to increase minority participation in biospecimen research.
Electronic supplementary material
The online version of this article (doi:10.1007/s12687-014-0187-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4159473  PMID: 24771039
Biospecimen; Knowledge; African-American; Disparities
24.  Association between Cutaneous Nevi and Breast Cancer in the Nurses' Health Study: A Prospective Cohort Study 
PLoS Medicine  2014;11(6):e1001659.
Using data from the Nurses' Health Study, Jiali Han and colleagues examine the association between number of cutaneous nevi and the risk for breast cancer.
Please see later in the article for the Editors' Summary
Cutaneous nevi are suggested to be hormone-related. We hypothesized that the number of cutaneous nevi might be a phenotypic marker of plasma hormone levels and predict subsequent breast cancer risk.
Methods and Findings
We followed 74,523 female nurses for 24 y (1986–2010) in the Nurses' Health Study and estimated the relative risk of breast cancer according to the number of cutaneous nevi. We adjusted for the known breast cancer risk factors in the models. During follow-up, a total of 5,483 invasive breast cancer cases were diagnosed. Compared to women with no nevi, women with more cutaneous nevi had higher risks of breast cancer (multivariable-adjusted hazard ratio, 1.04, 95% confidence interval [CI], 0.98–1.10 for 1–5 nevi; 1.15, 95% CI, 1.00–1.31 for 6–14 nevi, and 1.35, 95% CI, 1.04–1.74 for 15 or more nevi; p for continuous trend = 0.003). Over 24 y of follow-up, the absolute risk of developing breast cancer increased from 8.48% for women without cutaneous nevi to 8.82% (95% CI, 8.31%–9.33%) for women with 1–5 nevi, 9.75% (95% CI, 8.48%–11.11%) for women with 6–14 nevi, and 11.4% (95% CI, 8.82%–14.76%) for women with 15 or more nevi. The number of cutaneous nevi was associated with increased risk of breast cancer only among estrogen receptor (ER)–positive tumors (multivariable-adjusted hazard ratio per five nevi, 1.09, 95% CI, 1.02–1.16 for ER+/progesterone receptor [PR]–positive tumors; 1.08, 95% CI, 0.94–1.24 for ER+/PR− tumors; and 0.99, 95% CI, 0.86–1.15 for ER−/PR− tumors). Additionally, we tested plasma hormone levels according to the number of cutaneous nevi among a subgroup of postmenopausal women without postmenopausal hormone use (n = 611). Postmenopausal women with six or more nevi had a 45.5% higher level of free estradiol and a 47.4% higher level of free testosterone compared to those with no nevi (p for trend = 0.001 for both). Among a subgroup of 362 breast cancer cases and 611 matched controls with plasma hormone measurements, the multivariable-adjusted odds ratio for every five nevi attenuated from 1.25 (95% CI, 0.89–1.74) to 1.16 (95% CI, 0.83–1.64) after adjusting for plasma hormone levels.
Key limitations in this study are that cutaneous nevi were self-counted in our cohort and that the study was conducted in white individuals, and thus the findings do not necessarily apply to other populations.
Our results suggest that the number of cutaneous nevi may reflect plasma hormone levels and predict breast cancer risk independently of previously known factors.
Editors' Summary
One woman in eight will develop breast cancer during her lifetime. Breast cancer begins when cells in the breast acquire genetic changes that allow them to divide uncontrollably (which leads to the formation of a lump in the breast) and to move around the body (metastasize). The treatment of breast cancer, which is diagnosed using mammography (a breast X-ray) or manual breast examination and biopsy, usually involves surgery to remove the lump, or the whole breast (mastectomy) if the cancer has started to metastasize. After surgery, women often receive chemotherapy or radiotherapy to kill any remaining cancer cells and may also be given drugs that block the action of estrogen and progesterone, female sex hormones that stimulate the growth of some breast cancer cells. Globally, half a million women die from breast cancer each year. However, in developed countries, nearly 90% of women affected by breast cancer are still alive five years after diagnosis.
Why Was This Study Done?
Several sex hormone–related factors affect breast cancer risk, including at what age a woman has her first child (pregnancy alters sex hormone levels) and her age at menopause, when estrogen levels normally drop. Moreover, postmenopausal women with high circulating levels of estrogen and testosterone (a male sex hormone) have an increased breast cancer risk. Interestingly, moles (nevi)—dark skin blemishes that are a risk factor for the development of melanoma, a type of skin cancer—often darken or enlarge during pregnancy. Might the number of nevi be a marker of hormone levels, and could nevi counts therefore be used to predict an individual's risk of breast cancer? In this prospective cohort study, the researchers look for an association between number of nevi and breast cancer risk among participants in the US Nurses' Health Study (NHS). A prospective cohort study enrolls a group of people, determines their baseline characteristics, and follows them over time to see which characteristics are associated with the development of certain diseases. The NHS, which enrolled 121,700 female nurses aged 30–55 years in 1976, is studying risk factors for cancer and other chronic diseases in women.
What Did the Researchers Do and Find?
In 1986, nearly 75,000 NHS participants (all of whom were white) reported how many nevi they had on their left arm. Over the next 24 years, 5,483 invasive breast cancers were diagnosed in these women. Compared to women with no nevi, women with increasing numbers of nevi had a higher risk of breast cancer after adjustment for known breast cancer risk factors. Specifically, among women with 1–5 nevi, the hazard ratio (HR) for breast cancer was 1.04, whereas among women with 15 or more nevi the HR was 1.35. An HR compares how often a particular event occurs in two groups with different characteristics; an HR greater than one indicates that a specific characteristic is associated with an increased risk of the event. Over 24 years of follow-up, the absolute risk of developing breast cancer was 8.48% in women with no nevi but 11.4% for women with 15 or more nevi. Notably, postmenopausal women with six or more nevi had higher blood levels of estrogen and testosterone than women with no nevi. Finally, in a subgroup analysis, the association between number of nevi and breast cancer risk disappeared after adjustment for hormone levels.
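As a rough arithmetic check on the figures in this paragraph, the short Python sketch below compares the two 24-year absolute risks quoted above. The function names are illustrative. Note that a crude ratio of cumulative risks is not the same quantity as the multivariable-adjusted hazard ratio reported by the study; the two are compared here only to show that they point in the same direction and are of similar size.

```python
def risk_ratio(risk_exposed, risk_reference):
    """Crude ratio of two absolute risks (both given in percent)."""
    return risk_exposed / risk_reference


def risk_difference(risk_exposed, risk_reference):
    """Absolute risk difference in percentage points."""
    return risk_exposed - risk_reference


# Absolute 24-year risks quoted in the summary above (percent).
no_nevi = 8.48
fifteen_or_more_nevi = 11.4

print(f"crude risk ratio: {risk_ratio(fifteen_or_more_nevi, no_nevi):.2f}")
print(f"risk difference: {risk_difference(fifteen_or_more_nevi, no_nevi):.2f} percentage points")
```

The crude ratio comes out close to the reported hazard ratio of 1.35 for women with 15 or more nevi, while the risk difference expresses the same contrast in absolute terms.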
What Do These Findings Mean?
These findings support the hypothesis that the number of nevi reflects sex hormone levels in women and may predict breast cancer risk. Notably, they show that the association between breast cancer risk and nevus number was independent of known risk factors for breast cancer, and that the risk of breast cancer increased with the number of nevi in a dose-dependent manner. These findings also suggest that a hormonal mechanism underlies the association between nevus number and breast cancer risk. Because this study involved only white participants, these findings may not apply to non-white women. Moreover, the use of self-reported data on nevus numbers may affect the accuracy of these findings. Finally, because this study is observational, these findings are insufficient to support any changes in clinical recommendations for breast cancer screening or diagnosis. Nevertheless, these data and those in an independent PLOS Medicine Research Article by Kvaskoff et al. support the need for further investigation of the association between nevi and breast cancer risk and of the mechanisms underlying this relationship.
Additional Information
Please access these websites via the online version of this summary at
An independent PLOS Medicine Research Article by Kvaskoff et al. also investigates the relationship between nevi and breast cancer risk
The US National Cancer Institute provides comprehensive information about cancer (in English and Spanish), including detailed information for patients and professionals about breast cancer; it also has a fact sheet on moles
Cancer Research UK, a not-for-profit organization, provides information about cancer, including detailed information on breast cancer
The UK National Health Service Choices website has information and personal stories about breast cancer; the not-for-profit organization Healthtalkonline also provides personal stories about dealing with breast cancer
More information about the Nurses' Health Study is available
PMCID: PMC4051600  PMID: 24915186
25.  Gene Expression Profiling for Guiding Adjuvant Chemotherapy Decisions in Women with Early Breast Cancer 
Executive Summary
In February 2010, the Medical Advisory Secretariat (MAS) began work on evidence-based reviews of published literature surrounding three pharmacogenomic tests. This project came about when Cancer Care Ontario (CCO) asked MAS to provide evidence-based analyses on the effectiveness and cost-effectiveness of three oncology pharmacogenomic tests currently in use in Ontario.
Evidence-based analyses have been prepared for each of these technologies. These have been completed in conjunction with internal and external stakeholders, including a Provincial Expert Panel on Pharmacogenomics (PEPP). Within the PEPP, subgroup committees were developed for each disease area. For each technology, an economic analysis was also completed by the Toronto Health Economics and Technology Assessment Collaborative (THETA) and is summarized within the reports.
The following reports can be publicly accessed at the MAS website at: or at
Gene Expression Profiling for Guiding Adjuvant Chemotherapy Decisions in Women with Early Breast Cancer: An Evidence-Based and Economic Analysis
Epidermal Growth Factor Receptor Mutation (EGFR) Testing for Prediction of Response to EGFR-Targeting Tyrosine Kinase Inhibitor (TKI) Drugs in Patients with Advanced Non-Small-Cell Lung Cancer: An Evidence-Based and Economic Analysis
K-RAS Testing in Treatment Decisions for Advanced Colorectal Cancer: An Evidence-Based and Economic Analysis
To review and synthesize the available evidence regarding the laboratory performance, prognostic value, and predictive value of Oncotype-DX for the target population.
Clinical Need: Condition and Target Population
The target population of this review is women with newly diagnosed early stage (stage I–IIIa) invasive breast cancer that is estrogen-receptor (ER) positive and/or progesterone-receptor (PR) positive. Much of this review, however, is relevant for women with early stage (I and II) invasive breast cancer that is specifically ER positive, lymph node (LN) negative and human epidermal growth factor receptor 2 (HER-2/neu) negative. This refined population represents an estimated incident population of 3,315 new breast cancers in Ontario (according to 2007 data). Currently it is estimated that only 15% of these women will develop a distant metastasis at 10 years; however, a far greater proportion currently receive adjuvant chemotherapy, suggesting that more women are being treated with chemotherapy than can benefit. There is therefore a need to develop better prognostic and predictive tools to improve the selection of women who may benefit from adjuvant chemotherapy.
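To make the scale of the estimate above concrete, the following sketch applies the quoted 15% ten-year distant metastasis rate to the 3,315 incident cases. It is back-of-the-envelope arithmetic on the two figures given in the text, not a calculation taken from the review; the proportion of women receiving chemotherapy is not stated here, so it is deliberately left out. Variable names are illustrative assumptions.

```python
# Back-of-the-envelope arithmetic using only the two figures quoted above.

incident_cases = 3315          # estimated new cases in the refined population (2007 data)
metastasis_rate_10yr = 0.15    # estimated share developing distant metastasis at 10 years

expected_with_metastasis = incident_cases * metastasis_rate_10yr
expected_without_metastasis = incident_cases - expected_with_metastasis

print(f"Expected to develop distant metastasis: about {round(expected_with_metastasis)}")
print(f"Expected not to develop distant metastasis: about {round(expected_without_metastasis)}")
```

Under these figures, roughly 500 of the 3,315 women would be expected to develop a distant metastasis, which is the group that adjuvant chemotherapy could in principle help.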
Technology of Concern
The Oncotype-DX Breast Cancer Assay (Genomic Health, Redwood City, CA) quantifies gene expression for 21 genes in breast cancer tissue by performing reverse transcription polymerase chain reaction (RT-PCR) on formalin-fixed paraffin-embedded (FFPE) tumour blocks that are obtained during initial surgery (lumpectomy, mastectomy, or core biopsy) of women with newly diagnosed early breast cancer. The panel of 21 genes includes genes associated with tumour proliferation and invasion, as well as other genes related to HER-2/neu expression, ER expression, and progesterone receptor (PR) expression.
Research Questions
What is the laboratory performance of Oncotype-DX?
How reliable is Oncotype-DX (i.e., how repeatable and reproducible is Oncotype-DX)?
How often does Oncotype-DX fail to give a useable result?
What is the prognostic value of Oncotype-DX?*
Is Oncotype-DX recurrence score associated with the risk of distant recurrence or death due to any cause in women with early breast cancer receiving tamoxifen?
What is the predictive value of Oncotype-DX?*
Does Oncotype-DX recurrence score predict significant benefit in terms of improvements in 10-year distant recurrence or death due to any cause for women receiving tamoxifen plus chemotherapy in comparison to women receiving tamoxifen alone?
How does Oncotype-DX compare to other known predictors of risk such as Adjuvant! Online?
How does Oncotype-DX impact patient quality of life and clinical/patient decision-making?
Research Methods
Literature Search
Search Strategy
A literature search was performed on March 19th, 2010 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations, EMBASE, the Cumulative Index to Nursing & Allied Health Literature (CINAHL), the Cochrane Library, and the International Agency for Health Technology Assessment (INAHTA) for studies published from January 1st, 2006 to March 19th, 2010. A starting search date of January 1st, 2006 was chosen because a comprehensive systematic review of Oncotype-DX was identified in preliminary literature searching. This systematic review, by Marchionni et al. (2008), included literature up to January 1st, 2007. All studies identified in the review by Marchionni et al., as well as those identified in updated literature searching, were used to form the evidentiary base of this review. The quality of the overall body of evidence was rated as high, moderate, low or very low according to GRADE methodology.
Inclusion Criteria
Any observational trial, controlled clinical trial, randomized controlled trial (RCT), meta-analysis or systematic review that reported on the laboratory performance, prognostic value and/or predictive value of Oncotype-DX testing, or other outcome relevant to the Key Questions, specific to the target population was included.
Exclusion Criteria
Studies that did not report original data or original data analysis,
Studies published in a language other than English,
Studies reported only in abstract or as poster presentations (such publications were not sought nor included in this review since the MAS does not generally consider evidence that is not subject to peer review nor does the MAS consider evidence that lacks detailed description of methodology).
Outcomes of Interest
Outcomes of interest varied depending on the Key Question. For the Key Questions of prognostic and predictive value (Key Questions #2 and #3), the prospectively defined primary outcome was risk of 10-year distant recurrence. The prospectively defined secondary outcome was 10-year death due to any cause (i.e., overall survival). All additional outcomes such as risk of locoregional recurrence or disease-free survival (DFS) were not prospectively determined for this review but were reported as presented in included trials; these outcomes are referenced as tertiary outcomes in this review. Outcomes for other Key Questions (i.e., Key Questions #1, #4 and #5) were not prospectively defined due to the variability in endpoints relevant for these questions.
Summary of Findings
A total of 26 studies were included. Of these 26 studies, only five studies were relevant to the primary questions of this review (Key Questions #2 and #3). The following conclusions were drawn from the entire body of evidence:
There is a lack of external validation to support the reliability of Oncotype-DX; however, the current available evidence derived from internal industry validation studies suggests that Oncotype-DX is reliable (i.e., Oncotype-DX is repeatable and reproducible).
Current available evidence suggests a moderate failure rate of Oncotype-DX testing; however, the failure rate observed across clinical trials included in this review is likely inflated; the current Ontario experience suggests an acceptably lower rate of test failure.
In women with newly diagnosed early breast cancer (stage I–II) that is estrogen-receptor positive and/or progesterone-receptor positive and lymph-node negative:
There is low quality evidence that Oncotype-DX has prognostic value in women who are being treated with adjuvant tamoxifen or anastrozole (the latter for postmenopausal women only),
There is very low quality evidence that Oncotype-DX can predict which women will benefit from adjuvant CMF/MF chemotherapy in women being treated with adjuvant tamoxifen.
In postmenopausal women with newly diagnosed early breast cancer that is estrogen-receptor positive and/or progesterone-receptor positive and lymph-node positive:
There is low quality evidence that Oncotype-DX has limited prognostic value in women who are being treated with adjuvant tamoxifen or anastrozole,
There is very low quality evidence that Oncotype-DX has limited predictive value for predicting which women will benefit from adjuvant CAF chemotherapy in women who are being treated with adjuvant tamoxifen.
There are methodological and statistical limitations that affect both the generalizability of the current available evidence, as well as the magnitude and statistical strength of the observed effect sizes; in particular:
In the major predictive trials, Oncotype-DX scores were only produced for a small subset of women (<40% of the original randomized population), potentially undermining the effects of treatment randomization and opening the possibility of selection bias;
Data are not specific to HER-2/neu-negative women;
There were limitations with multivariate statistical analyses.
Additional trials of observational design may provide further validation of the prognostic and predictive value of Oncotype-DX; however, it is unlikely that prospective or randomized data will become available in the near future due to ethical, time and resource considerations.
There is currently insufficient evidence investigating how Oncotype-DX compares to other known prognostic estimators of risk, such as Adjuvant! Online, and there is insufficient evidence investigating how Oncotype-DX would impact clinician/patient decision-making in a setting generalizable to Ontario.
PMCID: PMC3382301  PMID: 23074401
