Cancers of unknown primary origin (CUP) constitute 3%–5% (50,000 to 70,000 cases) of all newly diagnosed cancers per year in the United States. Including cancers of uncertain primary origin, the total number increases to 12%–15% (180,000 to 220,000 cases) of all newly diagnosed cancers per year in the United States. Cancers of unknown/uncertain primary origins present major diagnostic and clinical challenges because the tumor tissue of origin is crucial for selecting optimal treatment. MicroRNAs are a family of noncoding, regulatory RNA genes involved in carcinogenesis. MicroRNAs that are highly stable in clinical samples and tissue specific serve as ideal biomarkers for cancer diagnosis. Our first-generation assay identified the tumor of origin based on 48 microRNAs measured on a quantitative real-time polymerase chain reaction platform and differentiated 25 tumor types.
We present here the development and validation of a second-generation assay that identifies 42 tumor types using a custom microarray. A combination of a binary decision-tree and a k-nearest-neighbor classifier was developed to identify the tumor of origin based on the expression of 64 microRNAs.
Overall assay sensitivity (positive agreement), measured blindly on a validation set of 509 independent samples, was 85%. The sensitivity reached 90% for cases in which the assay reported a single answer (>80% of cases). A clinical validation study on 52 true CUP patients showed 88% concordance with the clinicopathological evaluation of the patients.
The abilities of the assay to identify 42 tumor types with high accuracy and to maintain the same performance in samples from patients clinically diagnosed with CUP promise improved utility in the diagnosis of cancers of unknown/uncertain primary origins.
Cancers of unknown primary origin (CUP) constitute 3%–5% (50,000 to 70,000 cases) of all newly diagnosed cancers per year in the United States. Including cancers of uncertain primary origin, the total number increases to 12%–15% (180,000 to 220,000 cases) of all newly diagnosed cancers per year in the United States. The identification of the tissue of origin presents a challenge in many cases, even after a complete assessment that includes patient history, physical examination, imaging, serum markers, and pathological evaluation of tumor samples [1–4]. However, identification of tumor origin is crucial for the patient management plan because many oncology treatments are based on knowledge of the specific tumor type, especially with the growing number of cytotoxic and targeted therapies shown to be effective against specific cancers [5–9]. In addition, entry criteria into clinical trials and reimbursement strategies are based on knowing the primary origin. Most importantly, it has been demonstrated that tumor-specific therapy leads to better survival [9, 11]; however, a broad-spectrum treatment approach is used when the putative site of origin cannot be assessed, which is suboptimal. Although immunohistochemistry (IHC) markers are widely used and well characterized, they are unable to determine a definitive tissue of origin in over 30% of cases; in addition, they are highly subjective and dependent on many variables. Therefore, there is a substantial need to find complementary diagnostic tools for determining tissue of origin.
Currently, molecular profiling of cancers of unknown primary origin is available using expression microarrays and quantitative real-time polymerase chain reaction (qRT-PCR), targeting different molecules, namely mRNA or microRNA. MicroRNAs are particularly suitable as biomarkers for identifying tumor origin as their expression levels reflect tissue differentiation and tumorigenesis [14–17]. In addition, microRNAs have been shown to be highly stable in formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common and readily available specimen type in pathology [18–20]. In fact, microRNA profiling has been described as being superior to mRNA profiling in FFPE tissue [20, 21].
We recently described the development and validation of a qRT-PCR assay that identifies the tissue of origin for FFPE tumor samples based on the expression levels of 48 microRNAs. The qRT-PCR miRview mets assay was designed to discriminate between 25 possible classes corresponding to 17 distinct tissues and organs of origin; it was shown to correctly identify the tissue of origin in 85% of the cases in an independent validation study. The assay was further validated in two additional studies using samples from actual patients with CUP, demonstrating its usefulness in the more clinically relevant cases of CUP [23, 24].
Although these 25 tumor types cover the majority of tumor types seen in adults with cancer of unknown/uncertain origin [25, 26], we set out to develop a second-generation diagnostic assay to identify a wider range of tumor types. Such an assay could enable physicians to resolve more cases of unknown or uncertain diagnosis and therefore select more appropriate treatment. The tumor panel of the second-generation assay, miRview mets2, has been expanded to include additional carcinomas and neuroendocrine tumors, as well as a variety of sarcomas and lymphoma. The 42 tumor types in its panel are described in Table 1. The main clinical need is for identifying the origin of metastases, but it is not uncommon for physicians to be uncertain whether a tumor is a metastasis or a primary tumor, such as with malignancy in the lung or liver or cases for which the clinical presentation is not consistent with the pathology. The assay is therefore designed to identify the tissue of origin of both metastases and primary tumors at the site of the biopsy/resection.
To achieve the expansion of the tumor panel and enable more efficient upscaling of sample volume, we developed the second-generation assay on custom-designed microarrays, which offer several advantages as discussed later. Here we describe the validation of the assay on 509 blinded samples of known origin, as well as results from an interlaboratory reproducibility study on 179 samples. We further extended the validation to a more challenging group of actual patients with CUP by evaluating the assay performance on 52 CUP cases from the same set studied before on the first-generation assay. The results of this validation confirm the high level of accuracy of microRNA-based profiling in CUP cases and also demonstrate the importance of adding tumor types to the assay.
Tumor samples were obtained from several sources (see supplemental online data). Institutional review board approvals were obtained in accordance with institutes' guidelines. Samples were obtained by surgical resections and biopsies (dated 1990–2010) and included primary tumors and metastases of defined origins. An additional review of specimens confirmed the reference diagnosis as defined in the original records. In 37 cases, microdissection was performed (supplemental online data). Tumor cellular content reached at least 60% for >95% of the samples (based on hematoxylin-eosin slides). Tumors containing significant necrosis (cutoff arbitrarily set at >35%) and sections containing significant hemorrhage (cutoff arbitrarily set at >50%) were excluded. Tumors with significant fibrosis or desmoplastic reaction (>50%) were also excluded, although the fibrotic tissue is typically not very cellular.
Total RNA was extracted as previously described. Briefly, FFPE sections were deparaffinized with xylene, washed in ethanol, and digested with proteinase K. RNA was extracted using acid-phenol:chloroform followed by ethanol precipitation and DNase digestion. Following a second acid-phenol:chloroform extraction, the pellet was resuspended in nuclease-free water and analyzed for its concentration and purity by spectrophotometry (NanoDrop1000).
Custom-designed arrays from Agilent Technologies (Santa Clara, CA) that harbor 8 identical subarrays (8 × 15,000 format) were used. Then 0.25–1 μg of total RNA was labeled by ligation of an RNA linker, p-rCrU-Cy/dye (BioSpring, Frankfurt, Germany; Cy3 or Cy5) to the 3′ end. Synthetic small RNA controls were spiked before labeling. Slides were incubated with the labeled RNA for 12–16 hours at 55°C and washed according to the Agilent protocol. Arrays were scanned using the Agilent DNA Microarray Scanner Bundle at a resolution of 5 μm, dual pass at 100%, and 10% laser power.
Array images were analyzed using Agilent Feature Extraction software version 10.7.1.1. Triplicate spots were combined to produce one signal by taking the logarithmic mean of reliable spots. Analysis was performed in log space (log2). Normalization was performed for each sample with respect to a reference vector (R), calculated by taking the median expression level over the training set. For each sample data vector S, a second-degree polynomial F was found so as to provide the best fit between S and R, such that R ≈ F(S). This was performed on a set of invariant microRNAs; remote datapoints (outliers) were not used for fitting the polynomial. For each probe in the sample (element Si in the vector S), the normalized value (in log2) Mi is calculated from the initial value Si by transforming it with the polynomial function F, so that Mi = F(Si).
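The normalization step described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the software used in the study: the function name normalize_sample, the argument invariant_idx, and in particular the outlier rule (a residual cutoff based on the median absolute deviation) are assumptions, because the text states only that remote datapoints were excluded from the fit.

```python
import numpy as np

def normalize_sample(sample_log2, reference_log2, invariant_idx, outlier_cutoff=2.0):
    """Map a sample vector S onto the reference vector R with a fitted
    second-degree polynomial F, so that M_i = F(S_i) for every probe.

    The outlier rule below is a hypothetical stand-in for the unspecified
    'remote datapoints' criterion mentioned in the text.
    """
    s = np.asarray(sample_log2, dtype=float)
    r = np.asarray(reference_log2, dtype=float)
    s_inv, r_inv = s[invariant_idx], r[invariant_idx]

    # First pass: fit R ~ F(S) on all invariant microRNAs.
    coeffs = np.polyfit(s_inv, r_inv, deg=2)
    residuals = r_inv - np.polyval(coeffs, s_inv)

    # Drop remote points using a robust scale (MAD) of the residuals.
    centered = np.abs(residuals - np.median(residuals))
    scale = 1.4826 * np.median(centered)
    keep = centered <= outlier_cutoff * max(scale, 1e-9)

    # Second pass: refit without the remote points, if enough remain.
    if keep.sum() >= 3:
        coeffs = np.polyfit(s_inv[keep], r_inv[keep], deg=2)

    # Apply F to every probe in the sample, not only the invariant set.
    return np.polyval(coeffs, s)
```

Fitting on the invariant set and then applying F to all probes reflects the description above, in which the polynomial is estimated on invariant microRNAs but every element Si is transformed.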
Following extraction, seven RNA samples together with a positive control (PC) underwent labeling and hybridization to one array. The PC is an RNA sample that was set as a reference and met defined quality assurance (QA) criteria: Pearson correlation to the reference hybridization, median of differences from reference, and the number of the expressed microRNAs in the dynamic range (expression >300). QA for each sample was based on several parameters, such as the number of microRNAs in the dynamic range, the 98th percentile expression level of the microRNA, the Pearson correlation between the hybridization spikes and the reference, the expression of the negative control probes, and the number of microRNAs with consistent triplicate signals. The signal values of the 64 assay microRNAs for each sample were obtained following normalization and used as input to the assay classifier.
Figure 1 describes the development and validation of the second-generation assay, which was developed using the same principles and statistical methods as the first-generation assay, with several improvements allowing for the expansion of the tumor panel. The assay development was mainly based on expression profiles of 1282 primary and metastatic FFPE samples from the 42 tumor types described in Table 1. The assay was first validated on 509 samples (Table 1) of known origin in a blinded manner, for which it demonstrated 85% accuracy; the vast majority of samples resulted in a single reported origin, which was accurate in 90% of these cases.
We then extended the validation to address CUP, which is the most challenging diagnostic dilemma for a test designed to identify tumor origin. Because by definition no definitive reference diagnosis exists in CUP, these cases present a challenge for test validation. We performed validation of the assay on CUP cases, assessing the performance of the assay using clinicopathologic evaluation, and demonstrated that the performance of the assay remains the same for these challenging cases.
We have developed an array platform that measures the expression level of almost 1,000 microRNAs. This platform was the basis for the development of the miRview mets2 assay. The custom-made array is designed to harbor eight identical subarrays allowing for the simultaneous hybridization of seven samples plus a PC. To increase the measurement precision, each microRNA-related DNA oligonucleotide probe was spotted in triplicate and the logarithmic mean signal intensity was calculated.
To determine the performance of these subarrays, several parameters were studied. A reference sample was labeled and rehybridized to the array on different days. When either an RNA sample extracted from a fresh-frozen sample or RNA extracted from a FFPE sample was measured dozens of times, the overall mean correlation coefficient of both was 0.99, demonstrating the high reproducibility of the process (supplemental online Figs. S1A and S1B). Reproducibility was also demonstrated by comparing 179 RNA samples hybridized in two different laboratories (supplemental online Fig. S2). The sensitivity and dynamic range of the platform were measured using five artificial RNAs (similar in length and composition to endogenous microRNAs) in different concentrations. The lowest sensitivity was 0.1 fmol with a linear dynamic range of 10³. The specificity was measured by hybridizing five members of the hsa-let-7 family (with 1–4 nucleotide mismatches) and comparing the signal of the relevant probe to the other probes. As seen in supplemental online Fig. S1C, specificity of 10- to 100-fold is achieved (except for let-7c when hybridized to labeled let-7b), demonstrating a high level of specificity for as little as a single nucleotide mismatch.
The assay relies on two classifiers to determine the tissue of origin, a binary decision tree and a k-nearest-neighbor (KNN). Both classifiers assign a tissue of origin based on the normalized expression of 64 microRNAs as measured by the array. The decision tree predicts the tissue of origin by following the branches and choosing the left or right branch at each node (Fig. 2). This binary decision is made at each node by comparing a combination of microRNA expression levels to a preset threshold. This approach is described in detail elsewhere.
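The node-by-node walk through the decision tree can be illustrated with a short Python sketch. Several details are assumptions made for illustration only: the combination of microRNA levels is taken to be a weighted sum, a score above the threshold is taken to select the right branch, and the microRNA names (miR-A, miR-B, miR-C) and tumor labels are placeholders rather than nodes of the actual tree in Fig. 2.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

@dataclass
class Leaf:
    label: str  # tumor type or tumor class reported at this leaf

@dataclass
class Node:
    weights: Dict[str, float]  # hypothetical: weighted sum of microRNA levels
    threshold: float           # preset threshold for this node
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]

def classify(tree: Union[Node, Leaf], expr: Dict[str, float]) -> Tuple[str, List[str]]:
    """Follow the branches from the root to a leaf.

    Returns the predicted label and the list of branches taken.
    """
    path: List[str] = []
    node = tree
    while isinstance(node, Node):
        score = sum(w * expr[name] for name, w in node.weights.items())
        go_right = score > node.threshold  # assumed branch convention
        path.append("right" if go_right else "left")
        node = node.right if go_right else node.left
    return node.label, path

# Toy two-level tree with placeholder microRNAs and labels.
toy_tree = Node(
    weights={"miR-A": 1.0, "miR-B": -1.0},
    threshold=0.0,
    left=Leaf("type X"),
    right=Node({"miR-C": 1.0}, 8.0, Leaf("type Y"), Leaf("type Z")),
)
```

For instance, classify(toy_tree, {"miR-A": 10, "miR-B": 7, "miR-C": 9}) takes the right branch at both nodes and ends at the "type Z" leaf.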
The prediction of the tree is accompanied by a confidence measure, p, which is the cumulative probability (between 0 and 1) over all individual probabilities in the decisions taken in the nodes of the path taken to the tree result. The KNN approach compares the expression across all 64 microRNAs to the dataset of the 1,282 training samples and selects the majority vote among the nearest five samples, measured by Pearson correlation (see Rosenwald et al. for more details on this approach). The KNN prediction is also accompanied by a confidence measure, V, which is the number of neighbors (between 1 and 5) agreeing with the KNN reported result.
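The two confidence measures described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cumulative probability p is interpreted here as the product of the per-node probabilities along the path, which the text does not spell out, and the function names knn_predict and path_confidence are hypothetical. Ties in the vote are broken in favor of the label of the closest neighbor, which is also an assumption.

```python
from collections import Counter
import numpy as np

def knn_predict(sample, train_X, train_labels, k=5):
    """Majority vote among the k training samples with the highest
    Pearson correlation to the sample.

    Returns (label, V), where V is the number of the k neighbors that
    agree with the reported label.
    """
    sample = np.asarray(sample, dtype=float)
    train_X = np.asarray(train_X, dtype=float)

    # Pearson correlation between the sample and every training profile.
    sc = sample - sample.mean()
    tc = train_X - train_X.mean(axis=1, keepdims=True)
    corr = (tc @ sc) / (np.linalg.norm(tc, axis=1) * np.linalg.norm(sc))

    # Indices of the k most correlated training samples, best first.
    nearest = np.argsort(-corr)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    label, v = votes.most_common(1)[0]
    return label, v

def path_confidence(node_probabilities):
    """Tree confidence p, assumed to be the product of the probabilities
    of the individual decisions along the path."""
    return float(np.prod(node_probabilities))
```

With k = 5, V ranges from 1 (no two neighbors share a label) to 5 (all neighbors agree), matching the range given above.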
Each of the two classifiers predicts one of the 42 tumor types listed in Table 1 or one of the following seven tumor classes:
These additional possible diagnoses are reported when the classifier has high certainty regarding the tumor class (e.g., sarcoma) but low certainty regarding the specific tumor type (e.g., which type of sarcoma). Importantly, for these tumor classes, knowledge of the specific subtypes does not have major therapeutic implications, or the subtypes can be determined by further investigation. The two predictions are then combined into a single predicted tissue of origin or two different predictions, based on whether the two classifiers agree (either on tumor type or on one of the seven tumor classes) and on their confidence measures (p and V). When two predictions are reported, they are ordered by the likelihood as estimated by the positive predictive value of each of the answers. When both classifiers exhibit very low confidence in their result (low p and V), the assay does not generate a result and reports that the microRNA expression pattern of the sample does not match any of the expression patterns in the panel closely enough.
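The way the two predictions are merged can be sketched as below. The text gives the principles (agreement, the confidence measures p and V, ordering by positive predictive value, and no result when both confidences are very low) but not the actual cutoffs or rules, so everything specific here is hypothetical: the thresholds p_min and v_min, the rule that a single confident classifier is reported alone, and the ppv lookup table. Agreement at the level of the seven tumor classes is not modeled.

```python
from typing import Dict, List

def combine_predictions(
    tree_label: str,
    p: float,
    knn_label: str,
    v: int,
    ppv: Dict[str, float],
    p_min: float = 0.5,  # hypothetical cutoff for tree confidence
    v_min: int = 3,      # hypothetical cutoff for KNN confidence
) -> List[str]:
    """Return zero, one, or two reported origins.

    An empty list means the expression pattern does not match any
    pattern in the panel closely enough (no result).
    """
    tree_ok = p >= p_min
    knn_ok = v >= v_min

    # Both classifiers have very low confidence: no result.
    if not tree_ok and not knn_ok:
        return []

    # The classifiers agree: a single origin is reported.
    if tree_label == knn_label:
        return [tree_label]

    # Assumed rule: only one classifier is confident, report it alone.
    if tree_ok and not knn_ok:
        return [tree_label]
    if knn_ok and not tree_ok:
        return [knn_label]

    # Both confident but different: report both, higher PPV first.
    return sorted([tree_label, knn_label],
                  key=lambda t: ppv.get(t, 0.0), reverse=True)
```

Returning a list keeps the three possible outcomes (no result, single answer, two ordered answers) in one return type.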
We estimated the performance of the assay by cross-validating the training set data and then by additional validation sets as detailed later. Cross-validation of the training data showed that the estimated overall accuracy of the assay is 87%, and that in 86% of the cases a single origin is reported, with an accuracy of 89%.
The assay performance was assessed using an independent set of 509 validation samples (Fig. 1; Table 1). These archival samples included primary as well as metastatic tumor samples, whose original clinical diagnosis (reference diagnosis) was one of the 42 tumor types on which the classifier was trained. The samples were processed according to the appropriate standard operating procedures by personnel blinded to the original reference diagnosis of the samples, and classifications were automatically generated by dedicated software. In all, 11 of the 509 samples (2%) failed QA and an additional 9 samples (2%) completed processing but did not generate a result. For 489 samples (96%), including 146 metastatic tumor samples (30% of the samples), the assay was completed successfully and produced tissue-of-origin predictions. For 418 of the 489 samples, the reference diagnosis was predicted by at least one of the two classifiers, resulting in an overall sensitivity (positive agreement) of 85%. Specificity (negative agreement) was >99%. One of the seven tumor classes, rather than a specific tumor type, was reported for 54 (11%) of the cases. For 403 samples (82%), the assay reported a single tissue of origin (supplemental online Fig. S5). For these single-prediction cases, the sensitivity was 90% (361 of 403). Reassuringly, these performance values are very similar to the results obtained by cross-validation on the training data.
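The percentages in this paragraph follow directly from the reported counts. The short Python check below recomputes them; it introduces no data beyond the counts stated above, and the helper name pct is arbitrary.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)

total = 509        # validation samples
qa_failed = 11     # failed QA
no_result = 9      # processed but no result generated
with_result = total - qa_failed - no_result  # samples with a prediction

metrics = {
    "QA failure rate": pct(qa_failed, total),
    "no-result rate": pct(no_result, total),
    "samples with a prediction": pct(with_result, total),
    "metastatic share of predicted samples": pct(146, with_result),
    "overall sensitivity": pct(418, with_result),
    "tumor class reported": pct(54, with_result),
    "single-prediction share": pct(403, with_result),
    "single-prediction sensitivity": pct(361, 403),
}

for name, value in metrics.items():
    print(f"{name}: {value}%")
```

Note that the overall sensitivity and the single-prediction share use the 489 samples with a prediction as the denominator, whereas the single-prediction sensitivity uses the 403 single-answer cases.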
We further analyzed the assay validation data set by dividing it into subgroups. The performance of the assay in metastatic and primary tumors showed no significant difference for all origins except prostate, which was previously discussed as a particularly challenging case. Tumor percentage in the acceptable range for the assay (>60%) also had no effect on the assay performance, regardless of whether the sample underwent microdissection. To test the performance of the assay according to biopsy site, we calculated for each biopsy site the expected performance based on the distribution of the origins of the metastases to this site (in the validation set) and checked whether there was any site with a significant performance difference from the expected performance. No biopsy site showed any significant difference, indicating that the assay's performance is insensitive to the biopsy site.
Interlaboratory reproducibility was assessed by processing RNA from the training and validation samples independently and blindly in two Rosetta Genomics laboratories (Philadelphia and Israel). Data and classifications for 179 samples that produced results in both laboratories were compared. A Pearson correlation on the expression of the 64 assay microRNAs of >0.95 was achieved in 160 (89%) samples (supplemental online Fig. S3). In addition, the two laboratories agreed on the diagnosis in 175 (98%) of the cases, demonstrating the robustness of the assay.
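The two reproducibility figures above (per-sample Pearson correlation and agreement of the reported diagnosis) can be computed along the following lines. This is an illustrative sketch: the function name interlab_summary and the input layout (one row of 64 normalized microRNA values per sample and per laboratory, plus one reported call per sample) are assumptions.

```python
import numpy as np

def interlab_summary(expr_lab_a, expr_lab_b, calls_lab_a, calls_lab_b, r_cutoff=0.95):
    """Compare paired results from two laboratories.

    Returns (n_high, n_agree): the number of samples whose Pearson
    correlation across the assay microRNAs exceeds r_cutoff, and the
    number of samples with the same reported diagnosis in both labs.
    """
    # Pearson correlation per sample between the two laboratories.
    r = np.array([np.corrcoef(a, b)[0, 1]
                  for a, b in zip(expr_lab_a, expr_lab_b)])
    n_high = int((r > r_cutoff).sum())

    # Agreement of the reported classification, sample by sample.
    n_agree = sum(x == y for x, y in zip(calls_lab_a, calls_lab_b))
    return n_high, n_agree
```

Applied to the 179 paired samples, the counts reported above correspond to n_high = 160 and n_agree = 175.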
The ability of the assay to correctly identify the primary tumor type in patients with brain metastases from unknown origin was tested in a cohort of 55 CUP samples (52 patients) published previously. One sample (<2%) failed QA. For 3 of the remaining 54 samples (6%), the assay did not generate a result. For the remaining 51 samples (48 patients), the assay was completed successfully and produced results (Table 2). Three different brain metastases from one patient (ID 21), later found to have lung adenocarcinoma, were correctly identified by the assay. For another patient (ID 39), two metachronous metastases were studied and resulted in a classification of a carcinoid tumor either in the lung or the intestine. Clinical evaluation of the patient determined a neuroendocrine tumor of unknown primary, a diagnosis that is compatible with both assay predictions. For performance evaluation, we used only one sample per patient.
To evaluate the performance of the assay, we implemented the same concordance score as published previously, based on the clinicopathological data available at the time of diagnosis, additional information gathered during patient follow-up, and in some cases data resulting from investigations following the assay result.
The score divides the results into four main categories:
The assay result predicted a convincing suggested origin (i.e., score type 1 or type 2) in 42 (88%) of the 48 cases that had a suggested origin based on clinical and/or pathological data. For 23 (48%) out of 48 cases, the assay generated a single answer. A clinical and/or pathological match was achieved in 21 (91%) of these samples.
Case ID 38 illustrates the power of the assay in a patient in whom extensive clinical and pathological workup failed to provide a convincing tissue of origin. miRview mets2 suggested a sarcoma as the origin, although sarcoma was not part of the original differential. Following this result, IHC evaluation of the sample was extended by numerous panepithelial markers as well as lymphoma- and melanoma-markers. In line with the assay result, of all markers tested, the tumor cells revealed a robust and strong expression only for smooth muscle actin and focal robust tumor cell expression of vimentin, both of which are mesenchymal antigens frequently encountered in sarcomas (Fig. 3).
We present here an improved assay for prediction of the tissue of origin in metastatic samples. The second-generation assay employs the expression of 64 microRNAs to predict 42 tumor types, covering >92% of all solid tumors. The assay uses a custom-designed microarray and the results were highly reproducible when the assay was performed in two laboratories. The overall accuracy of the assay, based on an independent validation set of 509 samples, was high (85%), with 82% of the samples producing a single predicted origin with 90% accuracy. The assay was also validated on a set of CNS metastatic samples of patients with CUP, resulting in 88% concordance with the clinicopathological evaluation of the patients. This concordance is extremely high compared with published studies looking at different genomic profiling approaches for diagnosing the tumor of origin in patients with CUP [9, 28, 29].
Our first-generation clinical assay for the identification of the origin of metastatic tumors, which identifies the tissue of origin from 17 organs with a total of 25 histologic subtypes, has proven useful in clinical studies that demonstrated the high accuracy of the molecular profiling results [23, 24]. Even though the most common primary tissues of origin for CUP were represented in our first-generation assay, there was a desire to improve its clinical utility by including other carcinomas, such as urothelial carcinoma, carcinoma of the uterine cervix, additional histological subtypes for renal cell carcinoma, adrenocortical carcinoma and pheochromocytoma, and different types of sarcoma, mesothelioma, lymphoma, and primitive germ cell tumors of the ovary.
One of the challenges for the development of FFPE-based assays with retrospective samples is that older archival blocks may not provide RNA of sufficient quality to obtain meaningful results. Penland et al. reported successful mRNA expression analysis using microarrays in only one-quarter of unselected FFPE blocks that were between 2 and 8 years old. More recently, substantial failure rates have been described for mRNA-based expression assays in commercial clinical use: 22% for the Pathwork CUP assay and 6%–29% for the bioTheranostics CUP assay [28, 31]. The QA failure rate for the microRNA-based assay presented here is 2% (11 of 509 cases of known origin and 1 of 55 CUP samples) for specimens 1–20 years old, without reduction in the quality of the RNA extraction or the accuracy of the assay results (supplemental online Fig. S4).
Potential issues were recently raised about the use of microarray platforms as compared to qRT-PCR, claiming lower sensitivity, batch effects, and a limited dynamic range of 10². These limitations were all indeed demonstrated for mRNA microarray measurement. In contrast, our microRNA microarray platform (supplemental online Fig. S1) demonstrated an extremely high reproducibility (at least 10 different batches of microarrays were used), sensitivity, specificity, and a dynamic range of >10³, thereby demonstrating the validity of this platform for use in a clinical setting.
Another potential issue often raised regarding the development and validation of molecular profiling assays is the number of specimens used. The miRview mets2 assay presented here was developed based on 1,282 tumor samples and validated on a cohort of 509 tissue specimens that was independent of the discovery and training cohort. The size of the validation cohort is similar to the 547 tissue samples used for the validation of the Pathwork CUP assay that uses an Affymetrix microarray platform and significantly more than the 187 samples used for the validation of the bioTheranostics CUP assay that uses a qRT-PCR platform. The number of specimens per tumor class that were used in the training phase of the assay ranged from 5 to 140 (median 24); in the validation cohort, the range was 2–26 samples per tumor class (median 15; Table 1). The tumors with the smallest numbers were typically subgroups of larger categories (e.g., different types of sarcomas). The validation panel included primaries and metastases from different differentiation levels, including poorly and undifferentiated tumors.
The assay was further validated on a cohort of actual CUP patients, previously studied on the first version of the microRNA-based assay. This validation confirms the high level of accuracy of microRNA-based profiling in CUP cases that we have seen in the earlier study and also demonstrates the improvement of the new assay with an overall concordance to the clinicopathological evaluation in 88% of the samples compared with 80% concordance in the previous study. This high level of concordance can be compared to other commercial tests, which have similar performance in validations based on known primaries but show marked deterioration in performance when testing real patients with CUP. Pathwork reported 62% concordance and bioTheranostics reported 75%–76% concordance [28, 31], compared with the 88% concordance when using the miRview mets2 assay.
Molecular profiling in CUP should be considered in the context of IHC, which is a standard diagnostic method used to determine tissue origin. IHC is a powerful tool in CUP cases [32, 33], but even with the use of IHC, there remains a need for additional diagnostic methods. The choice of the IHC panel itself is a subjective decision that may be biased by the clinical history and presentation of the patient. Interpretation of the IHC results is also subjective, resulting in high interobserver and intraobserver variability. The objective and unbiased approach of this assay is a major advantage, as is its high reproducibility demonstrated in the interlaboratory comparison. Moreover, in >30% of the cases, the staining pattern of IHC does not result in a conclusive diagnosis. This may be the case for tumor locations for which no specific markers are available or for dedifferentiated tumors that have lost expression of characteristic markers. The fact that we found no deterioration of performance of our assay between cases of known primary and CUP cases that are more difficult to diagnose suggests that this molecular assay adds information to that obtained by IHC. Thus, the miRview mets2 assay may complement IHC and guide diagnosis in difficult or uncertain cases, especially when IHC studies are inconclusive or incompatible with clinical findings.
Finally, any given assay able to predict tissue of origin with high sensitivity and specificity is potentially interesting for clinical oncologists. It is the more practical issues, however, that determine its definite clinical implementation in day-to-day practice. One major issue with expression platform-based analyses is time. Ideally, the timeframe from obtaining the tissue to the decision to process the tissue on the platform to the result of the platform analysis guiding all further clinical decisions should not exceed the time usually needed for a standard pathology workup of a surgically obtained specimen. The total turnaround time for the miRview mets2 assay is 7–10 days, which is a timeframe well suited to meet clinical needs. In addition, in the case of patients with cancers of unknown or uncertain primary origin, this short processing time allows unguided tumor evaluation and staging investigations to be put on hold until the analysis data are available. Besides better guiding patient management and therapy, this might also help reduce constantly growing evaluation costs in patients with cancers of unknown or uncertain primary origin.
In summary, this improved second-generation microRNA-based assay can serve as a reliable diagnostic tool to aid physicians with challenging diagnostic dilemmas.
See www.TheOncologist.com for supplemental material available online.
E.M., W.C.M., and S.R. contributed equally to this work. T.B.E. is currently affiliated with the Department of Pathology, Cooper University Hospital, Camden, NJ.
Conception/Design: Wolf C. Mueller, Shai Rosenwald, Tina Bocker Edmonston, Ayelet Chajut, Yael Spector, Ranit Aharonov
Provision of study material or patients: Wolf C. Mueller, Margot Werner, Ulrike Lass, Iris Barshack, Meora Feinmesser, Monica Huszar, Franz Fogt
Collection and/or assembly of data: Eti Meiri, Wolf C. Mueller, Shai Rosenwald, Merav Zepeniuk, Elizabeth Klinke, Tina Bocker Edmonston, Iris Barshack, Meora Feinmesser, Monica Huszar, Franz Fogt, Karin Ashkenazi, Mats Sanden, Eran Goren, Nir Dromi, Orit Zion, Ilanit Burnstein, Yael Spector
Data analysis and interpretation: Eti Meiri, Wolf C. Mueller, Shai Rosenwald, Tina Bocker Edmonston, Karin Ashkenazi, Mats Sanden, Nir Dromi, Ayelet Chajut, Yael Spector, Ranit Aharonov
Manuscript writing: Wolf C. Mueller, Shai Rosenwald, Tina Bocker Edmonston, Mats Sanden, Ayelet Chajut, Yael Spector, Ranit Aharonov
Final approval of manuscript: Eti Meiri, Wolf C. Mueller, Shai Rosenwald, Merav Zepeniuk, Elizabeth Klinke, Tina Bocker Edmonston, Margot Werner, Ulrike Lass, Iris Barshack, Meora Feinmesser, Monica Huszar, Franz Fogt, Karin Ashkenazi, Mats Sanden, Eran Goren, Nir Dromi, Orit Zion, Ilanit Burnstein, Ayelet Chajut, Yael Spector, Ranit Aharonov