The identification of molecular markers with prognostic significance may help cancer patients avoid treatment that is unlikely to be successful. In breast cancer, for example, clinical studies have shown that adding adjuvant chemotherapy to tamoxifen in the treatment of node-negative, hormone-receptor (HR)-positive breast cancer improves disease outcome.1
However, treatment with tamoxifen alone is associated with a 15% likelihood of distant recurrence at 10 years in this population, suggesting that 85% of these patients would do well without the addition of cytotoxic chemotherapy and could avoid the adverse events inherent to such treatment.1
Nevertheless, the current National Comprehensive Cancer Network (NCCN)2 and St. Gallen3 clinical practice guidelines, utilizing classical histopathology and immunohistochemical prognostic markers, categorize less than 10% of node-negative HR-positive patients at low enough risk of recurrence to forgo adjuvant chemotherapy. These treatment guidelines assume that patients will derive the same degree of benefit from chemotherapy regardless of their baseline risk.
Clinical oncologists want two basic pieces of information to make treatment decisions for individual patients: 1) a patient’s baseline risk after legacy treatment and 2) the expected degree of additional benefit she will receive from systemic therapy added to legacy treatment. Legacy treatment can consist of surgery alone, surgery plus anti-hormonal therapy, or surgery plus chemohormonal therapy, depending on the stage and hormone receptor status of the tumor. Because the clinical context varies in this way, the most clinically useful prognostic and predictive markers are those developed with a specific clinical context in mind and then tested and validated within that context. A clinical test can provide prognostic information, predictive information, or both.
In the past, investigators made no distinction between “prognostic” and “predictive” markers when conducting studies, and this practice was a source of great confusion in the field. More recently, distinguishing between the two has become the norm rather than the exception. However, it has also become clear that for most chemotherapy regimens in use today, general prognosticators end up as predictive markers of the degree of benefit from chemotherapy. For example, estrogen receptor status is not only prognostic and predictive of response to anti-hormone therapy, but also predictive of response to chemotherapy (the presence of the estrogen receptor is associated with less benefit from chemotherapy). Another example is uPA/PAI-1. Harbeck and colleagues analyzed samples from more than 8000 women with node-negative breast cancer.4
Intratumoral concentrations of urokinase-type plasminogen activator (uPA) and its inhibitor, plasminogen activator inhibitor type 1 (PAI-1), were found to correlate directly with risk of disease recurrence. Although these markers have no suspected direct role in chemotherapy response, patients categorized as high-risk based on uPA/PAI-1 levels derived significant benefit from chemotherapy, whereas low-risk patients gained little from the addition of chemotherapy.
To understand the intriguing interaction between prognosis and prediction of chemotherapy response, one needs to look at the studies published by the “Seattle Project.”5
In this project, a collection of mutant yeast strains harboring specific mutations in genes involved in DNA repair or cell cycle control pathways was generated and then tested against FDA-approved chemotherapeutic agents. Such tests provide synthetic lethality screening; for example, among all mutant yeast strains, only those with defects in double-strand DNA repair pathways will be sensitive to mitoxantrone, which binds topoisomerase II alpha and causes DNA double-strand breaks. The investigators hoped to catalog specific targets for all chemotherapy agents. What they found, however, was highly disappointing: most of the chemotherapeutic agents tested, especially those used for treatment of breast cancer such as 5-fluorouracil and doxorubicin, did not show any selectivity for targets. Because these agents lack a selective target, their effect is expected to depend mainly on how actively cells are dividing, so tumors with high proliferation rates are expected to be sensitive to them. This explains why most of the prognostic markers are also predictive markers for chemotherapy response.
Classic prognostic tools such as tumor grade have traditionally been regarded as important indicators of breast cancer risk.6
Indeed, Adjuvant On Line (AOL), a SEER data-based algorithm integrating clinical (age, nodal status) and histopathological (estrogen receptor, size, grade) features of breast cancer, has been shown to accurately predict 10-year mortality rates.7, 8
In a study we carried out (unpublished), AOL was also predictive of the degree of benefit from chemotherapy. Yet, while excellent in providing average risk assessment for cohorts of patients, assessment of these markers can be highly variable at the individual patient level. One study that compared tumor grade assessments by three independent pathologists found that concordance was less than 50%,9 suggesting that the accuracy of risk estimates based on histological grade may vary considerably. It is hoped that gene-expression-based markers will provide more reproducible individualized risk assessments.
With this general background, let’s examine the current state of the art of gene-expression-based prognostic and predictive markers for breast cancer.
Assays for Gene Expression
There are many different ways to quantify the levels of expression of genes. Broadly, some techniques require the initial conversion of messenger RNA (mRNA) into complementary DNA (cDNA) using reverse transcriptase in order to measure gene expression. Other methods do not require such conversion and directly manipulate mRNA to determine expression. All of the currently available clinical prognostic tests are based on the conversion of mRNA into cDNA, which can be achieved by using the poly-A tail of mRNA as a template (oligo-dT priming) or by using a gene-specific template (gene-specific priming). Oligo-dT priming provides the ability to convert the entire mRNA population into a library of cDNAs. This library can then be used as a template for many different assays, e.g., polymerase chain reaction (of each gene target of interest) or microarray gene expression profiling, which allows the examination of expression levels of essentially all genes. Microarray gene expression profiling is achieved by hybridizing fluorescence-labeled cDNA to a library of oligonucleotides representing essentially all known human genes that are bound to a solid matrix (as in the Affymetrix GeneChip® [Santa Clara, CA] and the Agilent array [Santa Clara, CA]) or to beads (Illumina bead array [San Diego, CA]).10
After hybridization, expression levels of each gene are quantified using laser scanning microscopes (scanners). Academic institutions used to print their own microarrays on glass slides. However, because of decreasing cost and technical reproducibility issues, most have switched to mass-produced arrays from manufacturers such as Affymetrix, Agilent, or Illumina. Each microarray manufacturer uses different methodologies to construct microarrays, and the resulting microarrays have different hybridization characteristics and dynamic ranges. One example of a clinical prognostic assay based on microarray technology is the MammaPrint® assay (Agendia [Amsterdam, Netherlands]).11, 12
This assay is based on an Agilent microarray platform. Patients are assigned to either a good or a poor prognosis category based on the distance of the expression levels of 70 genes from the model expression levels (centroids) of the good and poor prognosis groups, which were derived from the reference clinical samples used in the development of the assay. MammaPrint® is a validated, FDA-cleared assay that provides additional prognostic information beyond what clinical and pathological features offer. However, it requires high-quality RNA that can be obtained only from fresh tissue procured in RNARetain solution or from snap-frozen tumor tissue, and therefore it cannot be applied to formalin-fixed, paraffin-embedded tumor (FPET) blocks.
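The centroid-based assignment described above can be sketched in a few lines. This is only an illustration of nearest-centroid classification in general: the four-gene profiles, the centroid values, and the choice of Euclidean distance are all assumptions made for the example, and the commercial assay's actual gene set, distance measure, and thresholds are not reproduced here.

```python
import math


def euclidean_distance(a, b):
    """Distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_prognosis_group(profile, centroids):
    """Return the label of the centroid closest to the sample profile."""
    return min(centroids, key=lambda label: euclidean_distance(profile, centroids[label]))


# Hypothetical centroids (averaged expression levels of reference cases).
# A real signature would use 70 genes; four are used here for brevity.
centroids = {
    "good": [0.5, -0.2, 1.0, -0.8],
    "poor": [-0.4, 0.9, -0.6, 0.7],
}

sample = [0.3, 0.1, 0.7, -0.5]  # hypothetical tumor profile
group = assign_prognosis_group(sample, centroids)
print(group)
```

Whatever distance is used, the result is a category label: two samples at very different distances from the same centroid receive the same assignment.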
Overcoming FPET Barriers
What makes RNA extracted from FPET a poor starting material for microarray analysis?
Masuda et al. analyzed the RNA extracted from FPET for chemical modifications.13
RNA extracted from freshly made FPET that had been fixed and processed under ideal conditions (fixed in 10% buffered formalin at 4°C) was fairly well preserved. Although the extracted RNA showed no sign of degradation compared with fresh samples, it was a poor substrate for cDNA synthesis and subsequent polymerase chain reaction (PCR) amplification, so that only PCR amplification of short targets was possible. This seems to be due to chemical modification of nucleic acids in FPET, i.e., the addition of mono-methylol (-CH2OH) groups to all four bases to varying degrees, and dimerization of adenines through methylene bridging. In addition to chemical modification, RNAs in FPET continue to be degraded or fragmented over time during storage for reasons that are unclear. Cronin et al. systematically examined the quality of RNA extracted from breast cancer FPET specimens taken at different times.14
RNAs from FPET archived for ~1 year were less fragmented than RNA archived for ~6 or 17 years.
For this reason, gene-specific priming with short PCR amplification target sequences is recommended for FPET.14
In this scenario, a short gene-specific cDNA is synthesized by reverse transcription with a gene-specific primer, and then a new set of primers targeting that cDNA is used for PCR amplification. This process is called reverse transcription PCR (RT-PCR).
RT-PCR can be performed with end-point product quantification or real-time product quantification. For clinical assays, usually real-time product quantification is used (QRT-PCR). QRT-PCR can be performed because DNA polymerase used for a PCR reaction also has exonuclease activity which degrades DNA already bound to the template. Thus if an oligonucleotide probe (which binds to the middle of the PCR-amplified DNA region and is designed in such a way that the fluorescence signal is released only upon degradation by exonuclease) is mixed into a PCR reaction tube, then the release of the fluorescence signal should be directly proportional to the quantity of PCR product. Real-time PCR devices measure fluorescence signals at defined points of each thermal cycle which allows quantification of PCR products in real time. In the gene expression field, QRT-PCR is regarded as the gold standard to which all other assays are compared. When combined with gene-specific priming, QRT-PCR can provide highly accurate relative expression levels using degraded RNA from FPET as a starting material. An additional problem with FPET is that the absolute signal of RT-PCR from the same amount of starting RNA decreases significantly if blocks have been stored for a long time, resulting in an approximate 100-fold reduction in signal if the block is 10 years old compared to a freshly made block.14
However, careful normalization based on genes with minimal variation of expression level among different tumor samples can largely compensate for these differences in absolute signal.14
The Oncotype Dx® assay (Genomic Health, Redwood City, CA), a prognostic test for node-negative, estrogen receptor-positive breast cancer offered by a commercial reference laboratory, is based on QRT-PCR measurement of 16 cancer genes that are normalized to the measurement of 5 reference genes.9, 15
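A small numerical sketch may help show why reference-gene normalization compensates for the loss of absolute signal in old blocks. It assumes the usual real-time PCR readout of a threshold cycle (Ct, a term not used in the text above), ideal doubling of product per cycle, and a signal loss that affects every gene equally; all Ct values are hypothetical. Under these idealized assumptions a 100-fold loss of signal delays every Ct by log2(100), about 6.6 cycles, and the delay cancels when a gene is expressed relative to the mean of the reference genes.

```python
import math
from statistics import mean


def normalized_expression(gene_ct, reference_cts):
    """Log2 expression of a gene relative to the mean of the reference genes.

    A lower Ct means more template, so the value is mean(reference Ct) - gene Ct.
    """
    return mean(reference_cts) - gene_ct


# Hypothetical Ct values from a freshly made block.
fresh_gene_ct = 26.0
fresh_reference_cts = [24.0, 25.0, 23.5, 24.5, 25.5]  # five reference genes

# Idealized effect of long storage: a uniform 100-fold loss of signal.
shift = math.log2(100)  # about 6.64 cycles when product doubles each cycle
old_gene_ct = fresh_gene_ct + shift
old_reference_cts = [ct + shift for ct in fresh_reference_cts]

fresh_value = normalized_expression(fresh_gene_ct, fresh_reference_cts)
old_value = normalized_expression(old_gene_ct, old_reference_cts)
print(fresh_value, old_value)
```

In real specimens the loss is not perfectly uniform across genes, which is consistent with the statement that normalization compensates largely rather than completely.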
Development and Validation of the Oncotype Dx® Assay
Statistical experts agree that the steps taken to develop the Oncotype Dx® assay were exemplary and recommend that others take a similar approach in developing new clinical assays.16
To develop a context-specific prognostic assay that addresses which women diagnosed with axillary node-negative, estrogen receptor-positive breast cancer require more than 5 years of tamoxifen therapy, National Surgical Adjuvant Breast and Bowel Project (NSABP) investigators collaborated with scientists at Genomic Health, Inc. (Redwood City, CA), the developers of methods for high-throughput QRT-PCR using RNA extracted from FPET. First, individual QRT-PCR assays optimized for the fragmented RNA substrates that can be isolated from FPET were developed for 250 candidate genes identified through literature and database searches. Many genes identified as prognostic through gene expression profiling studies, including the 70 genes from the MammaPrint® assay, were included in this set of 250 genes. Three cohorts, including women in the tamoxifen-treated arm of NSABP trial B-20, were examined for the expression of these 250 genes. Correlating clinical outcome with expression levels yielded many prognostic genes from these three studies, and the 16 top-performing genes were selected for final model building and validation. Relative expression levels of the 16 genes were measured in relation to the average expression level of 5 reference genes. While the majority of these 16 genes were related to estrogen receptor (ESR1, PGR, BCL2, SCUBE2) or proliferation (Ki67, STK15, Survivin, CCNB1, MYBL2), some belonged to neither category (HER2, GRB7, MMP11, CTSL2, GSTM1, CD68, BAG1). An unscaled recurrence score (RSu) was calculated using coefficients defined on the basis of regression analysis of gene expression. The Recurrence Score (RS) was rescaled from RSu as follows: RS = 0 if RSu < 0; RS = 20 × (RSu − 6.7) if 0 ≤ RSu ≤ 100; and RS = 100 if RSu > 100. Final validation of the RS was achieved by examining its performance in an independent cohort from NSABP trial B-14 that was not used in the model-building process.
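The rescaling step can be illustrated with a short function. The sketch below applies the linear transformation 20 × (RSu − 6.7) and then limits the result to the 0 to 100 range. Applying the bounds to the scaled value rather than to RSu itself is an assumption made here, chosen so that the output always stays within the range the assay reports. The function name and the input values are hypothetical.

```python
def rescale_recurrence_score(rsu):
    """Map an unscaled score (RSu) onto the 0-100 Recurrence Score scale.

    Assumption: the limits of 0 and 100 are applied after the linear
    transformation, so the returned value never leaves the 0-100 range.
    """
    scaled = 20.0 * (rsu - 6.7)
    return min(100.0, max(0.0, scaled))


# Hypothetical unscaled scores spanning the three branches of the rule.
for rsu in (5.0, 7.6, 12.0):
    print(rsu, rescale_recurrence_score(rsu))
```

Under this reading, an unscaled score at or below 6.7 maps to 0 and a score at or above 11.7 maps to 100.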
The validation study was conducted with a rigorous predefined statistical analysis plan with pre-specified outcome endpoints and cut-offs for RS. The predefined low-risk group (RS below 18) demonstrated a significantly better prognosis than the higher-risk groups.9
Compared to the NCCN or St. Gallen criteria, which assigned less than 10% of these patients from B-14 into the low risk group, RS was able to categorize 50% of these patients into a low-risk group that had similar 10-year distant disease–free survival (DDFS) rates as the low-risk groups identified by the NCCN or St. Gallen criteria (unpublished data).
When reporting on the B-14 study, Paik et al. used three subgroups, chosen arbitrarily but prospectively, for Kaplan-Meier plot comparisons.9
However, RS was originally developed as a continuous variable and should be utilized as such. For example, the prognosis for a patient with an RS of 17 (who is categorized as low risk in the B-14 study) would not be very different from that of a patient with an RS of 19 (categorized as intermediate risk). On the other hand, the prognosis for a patient with an RS of 2 would be fairly different from that of a patient with an RS of 17, although both are in the low-risk group. As discussed above, because it includes proliferation- and estrogen receptor-related genes, RS was also expected to be predictive of chemotherapy response; this was tested in NSABP trial B-20 where, indeed, a higher RS was associated with a higher degree of benefit from adjuvant chemotherapy.15
Although this interaction between RS and chemotherapy had not been validated outside NSABP trial B-20 until recently, the oncology community came to a consensus that, given the low baseline risk of low-RS patients and the high likelihood that their benefit from chemotherapy would be small, there was enough evidence not to use chemotherapy in such patients. However, for patients in the intermediate range of RS, whose baseline risk is high enough to cause concern but whose degree of benefit from chemotherapy is uncertain based on B-20 results, it was felt that a randomized prospective comparison of modern antihormonal therapy (including aromatase inhibitors) with or without chemotherapy would be important. This formed the basis for the design of the TAILORx trial, a US Intergroup study that employs upfront testing with Oncotype Dx® and the assignment of intermediate-risk patients to either hormonal therapy alone or chemohormonal therapy.17
The tumor tissue bank established from this trial will be highly valuable in the further optimization of gene-expression-based or other prognostic and predictive assays.
Which Gene-expression-based Prognostic Test is Better?
None of these tests is perfect. Performance among the tests is similar, as has been demonstrated elegantly by the group at North Carolina.18
Thus their utility should be based on the clinical context for which they were developed.16
However, it may be useful for pathologists and clinicians to understand the fundamental philosophical differences in how researchers developed the various gene expression assays. The accompanying table summarizes the differences between two commercial tests (Oncotype Dx® and MammaPrint®).
Summary of Differences Between Two Commercially Available Prognostic Tests in USA
From a conceptual viewpoint, there are two kinds of gene-expression-based tests: those that provide results as a continuous variable, and those that provide categorical (usually dichotomous) results. The Oncotype Dx® assay is an example of the former, and the MammaPrint®11 and intrinsic subtype assays19 are examples of the latter. The Oncotype Dx® assay assumes a biological continuum and reports a score ranging from 0 to 100, which is the result of a mathematical transformation of a Cox model of the expression levels of 16 cancer-related genes in the tumor.9
Each score is associated with an expected distant recurrence rate at 10 years among estrogen receptor-positive, node-negative patients treated with tamoxifen and with an expected degree of benefit from adding chemotherapy to tamoxifen. The MammaPrint® assay compares the expression levels of 70 cancer genes, measured by microarray, with centroids (averaged expression levels) of the good and poor prognosis groups from a reference study, and cases are assigned to either group based on their distance from each centroid.11
Although each case is different and represents one point on a biological continuum, cases are assigned to a dichotomous classification based on resemblance. For the intrinsic subtypes developed by the Stanford group, that assignment is to multiple groups (luminal A/B, HER2, and basal-like). As expected, within each subgroup, there is a continuum of expression levels of classification genes.19
Physicians often prefer to deal with dichotomous results in the clinic, but many fail to recognize that such dichotomous tests result from transforming continuous results into dichotomous ones, and that an assessment of risk based on that dichotomous classification provides an averaged risk rather than an individual risk. It is a mistake to assume that the MammaPrint® assay is better than the Oncotype Dx® assay because it can assign intermediate-risk patients defined by RS into either a good or a poor prognosis group.
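The loss of information that comes with categorization can be shown with the scores used as examples earlier in this section. The sketch below uses only the low-risk cut-off of 18 stated for the B-14 validation. The function name and labels are hypothetical, and the code illustrates categorization in general rather than either commercial test.

```python
LOW_RISK_CUTOFF = 18  # RS below 18 was the predefined low-risk group in B-14


def is_low_risk(rs):
    """Dichotomize a continuous Recurrence Score at the low-risk cut-off."""
    return rs < LOW_RISK_CUTOFF


scores = [2, 17, 19]
labels = {rs: ("low" if is_low_risk(rs) else "not low") for rs in scores}
print(labels)
```

Scores of 2 and 17 receive the same label although they are 15 points apart, while scores of 17 and 19 receive different labels although they are only 2 points apart. A category therefore conveys the average risk of everyone assigned to it, not the risk of the individual patient.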