In the clinical management of early-stage cutaneous melanoma, it is critical to determine which patients are cured by surgery alone and which should be treated with adjuvant therapy. To assist in this decision, many groups have made an effort to use molecular information. However, although there are hundreds of studies that have sought to assess the potential prognostic value of molecular markers in predicting the course of cutaneous melanoma, at this time, no molecular method to improve risk stratification is part of recommended clinical practice. To help understand this disconnect, we conducted a systematic review and meta-analysis of the published literature that reported immunohistochemistry-based protein biomarkers of melanoma outcome. Three parallel search strategies were applied to the PubMed database through January 15, 2008, to identify cohort studies that reported associations between immunohistochemical expression and survival outcomes in melanoma that conformed to the REMARK criteria. Of the 102 cohort studies, we identified only 37 manuscripts, collectively describing 87 assays on 62 distinct proteins, which met all inclusion criteria. Promising markers that emerged included melanoma cell adhesion molecule (MCAM)/MUC18 (all-cause mortality [ACM] hazard ratio [HR] = 16.34; 95% confidence interval [CI] = 3.80 to 70.28), matrix metalloproteinase-2 (melanoma-specific mortality [MSM] HR = 2.6; 95% CI = 1.32 to 5.07), Ki-67 (combined ACM HR = 2.66; 95% CI = 1.41 to 5.01), proliferating cell nuclear antigen (ACM HR = 2.27; 95% CI = 1.56 to 3.31), and p16/INK4A (ACM HR = 0.29; 95% CI = 0.10 to 0.83, MSM HR = 0.4; 95% CI = 0.24 to 0.67). We further noted incomplete adherence to the REMARK guidelines: 14 of 27 cohort studies that failed to adequately report their methods and nine studies that failed to either perform multivariable analyses or report their risk estimates were published since 2005.
Cutaneous malignant melanoma (CMM), which accounted for 62500 new cases of cancer in 2008, is the sixth most common malignancy in men and the seventh most common in women in the United States (1). Although 80% of new lesions are localized to the skin where effective surgical resections result in more than 95% 5-year survival (1), disease can recur in individuals with localized lesions despite appropriate management (2). Because adjuvant therapy is not broadly indicated for localized melanoma due to unfavorable risk–benefit ratios (3), there is a critical need to identify, at the time of diagnosis, the subset of patients most likely to benefit from adjuvant treatment to improve overall survival outcomes. Although, in addition to localization, nine clinicopathologic prognostic markers have been identified for CMM and have been used to establish clinically validated risk stratifications among melanoma patients (4,5), risk models based on these markers do not account for all of the observed variability in melanoma-related survival. Indeed, in melanoma (6–8) as in other cancers (9,10), tumors with identical clinical and histological parameters have markedly different mRNA expression profiles, and tumor subgroups classified by gene expression can be strongly associated with differential survival.
Immunohistochemistry (IHC) is a widely accepted and well-documented method for characterizing patterns of protein expression while preserving tissue and cellular architecture (11). The introduction of tissue microarray (TMA) technology, in which samples from several hundred individual tissue blocks can be spotted on a single glass slide (12), extends the rigor of IHC-based biomarker assays both by facilitating high-throughput analysis of candidate proteins across large patient cohorts and by substantially reducing misclassification of expression across the cohort through the application of consistent staining conditions and reagents (13). However, unlike genomic or proteomic experiments that can be performed in parallel on a massive scale, IHC/TMA experiments must be done serially using a candidate gene approach and data from individual experiments must be combined to establish multimarker prognostic discriminators.
Several recent reviews have been published, each of which surveyed published IHC data on melanoma and focused on prognostic applications (14–16). However, none of these surveys prioritized the available data according to REMARK study design or methodological assessment quality metrics (17). In addition, even among the high-quality studies, the heterogeneity in experimental procedures such as antigen retrieval, choice and final dilution of primary antibody, and antibody validation through appropriate positive and negative controls and interobserver variability in describing the staining patterns, selection of cut points, and assignment of specimens to categories could have influenced the direction, magnitude, or statistical significance of the proposed association (18,19). Furthermore, none of these reviews limited inclusion to proteins evaluated in a multivariable setting adjusted for known clinical prognostic characteristics, a REMARK requirement (17). Because new molecular markers need to enhance the current routine estimators of prognosis to be adopted for use in the clinic, studies that do not extend statistical analyses beyond univariate survival measures are less valuable than studies that do.
In this systematic review and meta-analysis, we sought to determine the candidate biomarkers for which there was sufficient evidence to support prospective validation in a controlled clinical environment and to identify the functional pathways for which the data either suggest a lack of involvement in melanoma prognosis or the need for additional investigation due to insufficient rigor among previously executed studies. We identified the subset of candidate IHC-based protein predictors of melanoma outcome from the published literature that were evaluated according to robust sampling, laboratory, and statistical methods. Then, by applying a systems-based approach to the eligible data, we examined which tumor-sustaining pathways and component proteins are prognostic for melanoma all-cause mortality (ACM), melanoma-specific mortality (MSM), and disease-free survival (DFS).
To identify all primary research articles that evaluated levels of candidate protein expression, as measured by IHC, as a prognostic factor among individuals diagnosed with CMM, we searched the PubMed medical literature database on January 15, 2008, without language restrictions, using the following three independent queries:
One reviewer (B. E. G. Rothberg) inspected the title and abstract of each electronic citation to identify those manuscripts that were likely to report the assay of melanoma samples by IHC and obtained their full texts. Supplemental PubMed searches by names of authors contributing to five or more potentially relevant manuscripts were performed to identify any additional manuscripts not included in the primary queries. In those cases in which several publications derived from the same set of IHC data, only the study presenting the largest dataset was included. Five manuscripts that were not published in English were translated into English for further evaluation.
We used published guidelines for reporting IHC-based tumor marker studies (17) and quality metrics for evaluating IHC-based studies for inclusion in cancer-related meta-analyses (19) as inclusion criteria for this review. Studies were eligible if they met each of the following six criteria: 1) prospective or retrospective cohort design with a clearly defined source population and justifications for all excluded eligible cases; 2) assay of primary cutaneous tumor specimens; 3) clear descriptions of methods for tissue handling and IHC, including antigen retrieval, selection and preparation of both primary and secondary antibodies, as well as visualization techniques; 4) a clear statement on the choice of positive and negative controls and on the outcome of the assay to ensure that the primary antibody used was a well-validated reagent; 5) statistical analysis using multivariable proportional hazards modeling that adjusted for clinical prognostic factors; and 6) reporting of the resultant adjusted hazard ratios (HRs) and their 95% confidence intervals (CIs). Because acral lentiginous melanomas, mucosal melanomas, and ocular melanomas display different clinical courses and molecular phenotypes from the more common cutaneous superficial spreading and nodular histological subtypes (20–22), studies describing results on non-Caucasian populations as well as those specific for acral lentiginous, mucosal, choroidal, or uveal melanomas were excluded. Studies were also excluded if they did not describe protein expression levels in melanoma cells and limited analysis to the associated stroma or vasculature. Within each study, only assays that evaluated proteins corresponding to mapped genetic loci were included; IHC reagents that targeted nonspecific “activities” or uncharacterized antigens were eliminated from further consideration.
When authors described having assessed multivariable proportional hazards but the manuscript did not meet inclusion criteria because details describing the cohort, IHC methods, or the hazard ratio and 95% confidence interval were omitted, the corresponding author was contacted in an attempt to obtain the missing information. Letters were sent to 26 investigators, and responses were received from nine of them. Six responses provided missing IHC methods and/or risk estimate information for seven manuscripts (23–29), one additional response reported an indeterminate risk estimate that could not be used in meta-analysis (30), and one response (31) indicated that the authors no longer had the information that we had requested.
One investigator (B. E. G. Rothberg) reviewed each eligible manuscript and extracted data on the characteristics of the study, including number and type of melanoma tumors assayed, IHC methodology, and results. The study-level data recorded included first author’s name, institution, and country of origin; journal and year of publication; sample size; starting material (frozen vs paraffin embedded, whole slides vs TMA); clinical covariates incorporated in the multivariable statistical analysis; outcomes assessed; mention of blinding of those who assessed IHC staining to outcome; and the set of candidate proteins selected for analysis. We also extracted additional data concerning methods within each study, including primary antibody and dilution used, secondary signal amplification and coloration methods, IHC stain scoring scheme, survival analysis cut points and reference group, the computed multivariable hazard ratio and its 95% confidence interval, and the corresponding P value. When results were presented without confidence intervals or SEs, the P value was used to estimate the SE via the z-statistic.
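The back-calculation of an SE from a P value can be sketched as follows. This is a minimal illustration, not the authors' own procedure: it assumes a two-sided P value from a Wald-type test on the log hazard ratio, and the function names `se_from_p` and `ci_from_se` are hypothetical. The worked figures use the p16/INK4A all-cause mortality estimate reported in this article (HR = 0.29, P = .02); because the published P value is rounded, the recovered interval is only approximate.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def se_from_p(hazard_ratio: float, p_value: float) -> float:
    """Approximate the SE of ln(HR) from a two-sided P value.

    Assumes a Wald test, so |z| = |ln(HR)| / SE. Undefined when
    p_value is 1 (z = 0) or when the P value is only given as a bound.
    """
    z = STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)  # |z| for a two-sided P
    return abs(log(hazard_ratio)) / z


def ci_from_se(hazard_ratio: float, se: float, level: float = 0.95):
    """Rebuild a confidence interval for the HR from the SE of ln(HR)."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    centre = log(hazard_ratio)
    return exp(centre - z_crit * se), exp(centre + z_crit * se)


# p16/INK4A and ACM: HR = 0.29, P = .02 (values reported in this article)
se = se_from_p(0.29, 0.02)      # approximately 0.53
lower, upper = ci_from_se(0.29, se)
print(round(se, 2), round(lower, 2), round(upper, 2))
```

The interval recovered this way is close to the published 0.10 to 0.83, which suggests the approximation is adequate when only a rounded P value is available. A P value reported as an inequality (for example, P < .001) yields only a bound on the SE.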
All eligible individual protein assays were first sorted according to outcome and then according to the protein's major biological function. Protein function was determined following comprehensive review of the current scientific literature and classified according to the six acquired capabilities of cancer as defined by Hanahan and Weinberg (32): limitless replicative potential, evading apoptosis, insensitivity to antigrowth signals, self-sufficiency in growth signals, tissue invasion and metastasis, and sustained angiogenesis. To accommodate melanoma-associated antigens (eg, gp100, MelanA/MART-1) and immunomodulatory molecules (eg, major histocompatibility complex class II), the Hanahan–Weinberg classification system was supplemented by two additional melanoma-specific functional categories: altered immunocompetence and melanocyte differentiation. For the set of proteins evaluated in a single study within one of the study outcomes, the summary hazard ratio (95% confidence interval) represents the value reported in that study. For proteins assayed in multiple studies, summary hazard ratios and 95% confidence intervals were calculated using both a fixed effects model based on the generic inverse variance method (33) and a random effects model according to the DerSimonian–Laird method (34). Interstudy heterogeneity was assessed using the I² statistic (35). An observed hazard ratio of more than 1 implied a worse outcome for the test group relative to the reference group and would be considered statistically significant if the 95% confidence interval did not overlap with 1 (P < .05). Meta-analyses were conducted using the REVMAN systematic review and meta-analysis software package, version 4.2 (Cochrane Collaboration; www.cochrane.org).
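The two pooling models and the I² statistic named above can be sketched in a few lines. This is an illustrative reimplementation of the textbook formulas, not the REVMAN code used for the analysis; the function names and the input hazard ratios in the usage example are hypothetical. Each study is supplied as a hazard ratio with its 95% confidence limits, and the SE of the log hazard ratio is recovered from the width of the interval.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_hr_and_se(hr, lower, upper):
    """Return ln(HR) and its SE, recovered from a 95% confidence interval."""
    return log(hr), (log(upper) - log(lower)) / (2 * Z95)


def pool(studies):
    """Pool (hr, lower, upper) tuples with fixed and random effects models."""
    effects, ses = zip(*(log_hr_and_se(*s) for s in studies))
    w = [1 / se ** 2 for se in ses]              # inverse variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1 / (se ** 2 + tau2) for se in ses]  # random effects weights
    random_ = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

    def summary(estimate, weights):
        se = sqrt(1 / sum(weights))
        return exp(estimate), exp(estimate - Z95 * se), exp(estimate + Z95 * se)

    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"fixed": summary(fixed, w), "random": summary(random_, w_re),
            "tau2": tau2, "I2": i2}


# Hypothetical inputs: two studies of one marker with discordant estimates
result = pool([(1.0, 0.8, 1.25), (4.0, 3.2, 5.0)])
print(result["fixed"], result["random"], result["I2"])
```

When Cochran's Q does not exceed its degrees of freedom, the estimated between-study variance is truncated at zero and the two models return identical summaries; otherwise the random effects interval is wider than the fixed effects interval.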
To determine whether the proteins demonstrating statistically significant associations with outcome were equally distributed across the Hanahan–Weinberg acquired capabilities of cancer, the proportion of such proteins in each individual functional group was compared with their overall proportion using the one-sample test for a proportion (36).
The literature search strategy identified 1797 manuscripts for consideration (Figure 1; Supplementary Table 1, available online). Following the title and abstract search of these, as well as the supplemental-directed author searches, 515 manuscripts were identified that suggested the execution of IHC experiments on cutaneous melanoma samples, and full-text versions of these were retrieved. Of these, 30 studies recruited non-Caucasian patients’ samples, six solely evaluated staining of stromal or vascular elements and 24, after careful review of the study methods, did not perform IHC on human melanoma samples, so all 60 of these studies were excluded from further analysis. The remaining 455 studies, which collectively described IHC results for 387 unique proteins, were first triaged according to study design. Whereas 102 met the criteria for cohort study, 353 manuscripts were excluded for inappropriate study design. Among those excluded, 16 were case–control studies, 284 were cross-sectional analyses limited to determining the association between levels of marker expression with melanocytic lesion progression or with clinicopathologic parameters, and 53 were classified as case series for which the investigators failed to provide details on either the source population of melanocytic tumors or the sampling strategy. This latter group included 14 reports that performed multivariable proportional hazards modeling of survival outcomes (37–50), of which four included sample size greater than 70 (37,41,44,45) and one that met all other inclusion criteria, except for study design (43).
Among the 102 cohort studies, an additional 65 studies were excluded according to methodological or statistical criteria. Twenty-seven studies failed to completely describe their IHC methods and enumerate their positive and negative controls for antibody specificity validation (5,51–76), and an additional 21 methodologically robust manuscripts limited their analysis to univariate log-rank or proportional hazards computations (77–97). Eleven studies conducted multivariable analyses on methodologically robust data but failed to publish a hazard ratio (95% confidence interval) (31,98–107). One study restricted its analysis to metastatic lesions (108), and four studies had data that were completely redundant with larger included studies (109–112). We also excluded the study by Mihic-Probst et al. (30). In this otherwise eligible study, the choice of a p16/INK4A cut point at 50% of cells stained led to no events among the patients with high expression to yield an indeterminate multivariable hazard ratio that cannot be combined in meta-analysis.
Thirty-seven high-quality cohort studies from 21 independent research groups met the eligibility criteria for this systematic review by presenting multivariable survival estimates for differential levels of candidate protein expression as measured by IHC on primary cutaneous melanoma samples (Table 1). The included studies consist of one prospective cohort study (29) and 36 retrospective cohort studies (23–28,113–142). All 37 studies sampled archival formalin-fixed, paraffin-embedded tissue blocks. Of these, in 23 studies, IHC was performed on individual whole-slide tissue sections, and in 14 studies, TMAs were created using 1.5-mm- (113,114), 1.0-mm- (134,135), or 0.6-mm- (23,27,115–117,128–132) diameter cores from representative tissue regions. Among the TMA subset, the five studies performed at the Restoration of Appearance and Function Trust Institute (Middlesex, UK) included redundant sampling of individual tissue blocks (128–132). Three studies (23,115,117) used immunofluorescence-based staining, with the remaining 34 studies reporting data obtained from chromogenic stains. Twenty-three studies (23,29,114–118,122–125,129–132,134–136,138,139,141,142), including four that documented automated image capture and staining analysis (23,115,117,125), indicated that staining assessment was blinded to outcome status, but blinding status was unknown for the remaining 14 studies. Effective sample size of included melanoma patients ranged from 37 to 1270, with six studies including 75 or fewer individuals (26,28,114,133,136,141), 16 including 76–150 individuals (24,25,27,113,118,122,125–132,138,139), and 12 including between 151 and 300 individuals (23,115–117,119–121,123,124,137,140,142). Three studies (29,134,135), among them the prospective cohort that enrolled 1270 individuals (29), included more than 300 individuals.
Fifteen unique clinicopathologic factors were incorporated in one or more of the eligible multivariable analyses (Figure 2, A). Breslow thickness as measured in millimeters (143), the strongest and most reproducible clinical prognostic factor (4,144), was the most commonly occurring clinical covariate, with inclusion in 34 analyses. All five studies that included a single clinical covariate adjusted for Breslow thickness (24,113,114,125,133). Clark level of dermal invasion (145), which overlaps with (146) and, in smaller populations (n ~ 1000), can be collinear with Breslow thickness (147), was considered in 18 studies, of which 17 simultaneously adjusted for Breslow thickness, and one study (137) used Clark level of invasion as the exclusive measure of tumor thickness. Two studies (118,138) did not include any measure for tumor thickness. Ulceration, a validated prognostic factor that prompts tumor upstaging when present (4,5,148), was adjusted for in 21 of 37 studies. Other common adjustment parameters included gender (18 of 37 studies), age at diagnosis (17 of 37 studies), and anatomic location of the melanoma (12 of 37 studies). Twenty-one studies included three to five clinical parameters in their multivariable proportional hazards models; eight studies included fewer than three parameters, and another eight included more than five covariates (Figure 2, B).
Collectively, these 37 studies present data on 62 unique proteins. The majority of eligible manuscripts (n = 28) restricted their analysis to a single candidate protein marker, and eight additional studies considered between two and five proteins. Only Alonso et al. (114) reported multivariable hazard ratios on a large series of proteins, with data available for 35 evaluated markers. Twenty-two of the 62 candidate biomarkers were evaluated for two outcomes, and two candidate biomarkers, Ki-67 and gp100, were evaluated across all three outcomes. Stratified by outcome, data were available on 43 proteins for ACM, 20 proteins for MSM, and 24 proteins for DFS. For 79 of the 87 unique marker–outcome combinations, a multivariable hazard ratio and associated 95% confidence interval were available only from a single study, with that value extracted as the corresponding summary estimate. For the remaining eight marker–outcome combinations, data were available from two or more studies and were combined using both fixed effects generic inverse variance and DerSimonian–Laird random effects modeling to obtain a single summary hazard ratio and 95% confidence interval (Figure 3). For four of these associations (cyclin D1–ACM, Skp2–ACM, nm23–ACM, and metallothionein-1–DFS), which each combined two individual studies to create summary estimates, the fixed effects summary point estimate and 95% confidence interval were identical to the random effects summary statistic. For the remaining four associations, the random effects analysis yielded a more conservative result than the fixed effects estimate.
The 43 proteins evaluated for ACM were sorted among seven of the eight modified Hanahan–Weinberg functional capabilities (Table 2); no eligible assays were available for “sustained angiogenesis.” Thirteen (30.2%) of the 43 candidates had a statistically significant association with ACM at P < .05. Of the eight cell cycle proteins evaluated, only cyclin E (P = .03) had a statistically significant association with ACM, but two of four cell cycle regulators (p16/INK4A [P = .02] and p27/KIP1 [P = .02]) showed statistically significant associations. Four of eight DNA-damage checkpoint and repair proteins (Ki-67 [P = .002], PCNA [P = .03], Ku70 [P < .001], and Ku80 [P < .001]) also showed statistically significant multivariable associations. Among the regulators of tissue invasion and metastasis, chemokine receptor CXCR4 (P = .02), matrix metalloproteinase (MMP)-2 (P = .006), MCAM/MUC18 (P < .001), and tissue plasminogen activator (tPA; P = .04) were statistically significant. None of the four polycomb transcriptional repressor complex proteins assayed demonstrated any statistically significant associations. Overall, the proportions of biomarker candidates with statistically significant associations to ACM observed among evading apoptosis, insensitivity to antigrowth signals, limitless replicative potential, and tissue invasion and metastasis did not differ more than would be expected by chance (P > .05), with only the functional category of tissue invasion and metastasis (four proteins that had statistically significant associations among seven assays; 57.14%) approaching statistical significance (z-score = 1.55; P = .12).
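The z-score quoted above for tissue invasion and metastasis can be reproduced with a one-sample test for a proportion. The sketch below assumes the normal approximation with the overall proportion (13 of 43) as the null value and a two-sided P value; the article cites a reference for the test without spelling out the formula, so this form and the function name `one_sample_proportion_test` are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_test(successes: int, n: int, p0: float):
    """Normal-approximation z-test of an observed proportion against p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value


# Tissue invasion and metastasis: 4 significant proteins among 7 assays,
# compared with the overall proportion of 13 significant among 43 candidates.
z, p = one_sample_proportion_test(4, 7, 13 / 43)
print(round(z, 2), round(p, 2))  # prints 1.55 0.12
```

With only seven assays in the category, the normal approximation is rough, so the result is best read as descriptive rather than as a precise significance level.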
Twelve of the 20 candidate biomarkers with eligible data for MSM demonstrated a statistically significant association with this outcome (Table 3). All eight modified Hanahan–Weinberg functional capabilities were represented, and all but altered immunocompetence possessed at least one candidate biomarker statistically significantly associated with MSM. Two functional categories, limitless replicative potential and self-sufficiency in growth signals, included statistically significant associations for more than 50% of assayed candidates with 3/4 and 3/3 proteins, respectively, showing statistically significant associations with MSM. The melanocyte differentiation category was unique in returning 33% or fewer statistically significant candidates, with only gp100 (P = .045) yielding a marginally statistically significant result. The small number of protein candidates within each functional category precluded analysis of proportions.
Eight MSM candidates were also evaluated for ACM. Among these, five proteins showed concordant associations between the two outcomes. Elevated p16/INK4A was protective for both ACM and MSM, whereas elevated MMP-2 or Ki-67 increased risk of both ACM and MSM. Changes in MelanA/MART-1 or double minute-2 (HDM-2) were not associated with either ACM or MSM. Discordant results were observed for three candidates: gp100, p53, and Bcl-2. In 2004, Alonso et al. (114) reported null associations for these with ACM, but other groups presented statistically significant associations with MSM in separate reports. To determine whether publication bias could have contributed to this discrepancy, results from the 14 methodologically robust studies that omitted the hazard ratio were reviewed. Two studies were identified that evaluated the association of p53 with MSM (100,106) and one with ACM (127). All three studies indicated that following adjustment for Breslow thickness, p53 was not statistically significantly associated with outcome. Niezabitowski et al. (n = 93) and Karjalainen et al. (n = 283) only indicated that P > .05, whereas Talve et al. (n = 80) specified P = .96. Without the actual point estimates and precise P values, however, we cannot determine whether the missing data would be sufficient to cancel the strong association (HR = 8.9; 95% CI = 2.7 to 29.0) observed by Straume et al. (137).
Twenty-four proteins representing six of eight functional capabilities were assayed for DFS, and 15 (62.5%) statistically significant associations were found (Table 4). Three categories, limitless replicative potential, self-sufficiency in growth signals, and tissue invasion and metastasis, each had four or more proteins assayed for DFS, and the number of statistically significant candidates was equally distributed among these categories (50%–75%; P = .70) and not different from the overall proportion of statistically significant markers (P > .40). Seventeen DFS candidates were also evaluated for either or both of the survival outcomes, with nine yielding concordant results for disease-free and overall survival. Among these, Id1, cyclin D1, and cyclin D3 were not associated with either outcome, whereas differential levels of PCNA, NCOA3/AIB-1, AP-2α, CXCR4, MCAM/MUC18, and metallothionein were predictive of both overall survival and DFS, with similar directionality and magnitude of each association. The remaining eight proteins yielded discordant results, and statistically significant results were only observed for a subset of the evaluated outcomes. Survivin (P = .017), cyclin A (P < .001), and tenascin-C (P = .04) were statistically significant for DFS but not for mortality outcomes. Conversely, osteopontin (P = .10) and tPA (P = .15) were not associated with DFS but achieved statistical significance for mortality. Ki-67, which was independently statistically significant for both mortality outcomes (ACM, P = .002; MSM, P < .001), did not achieve statistical significance for DFS (P = .26) among eligible studies, and gp100 was statistically significantly associated with both MSM (P = .045) and DFS (P = .01) but not with ACM (P = .43) despite similar selection of cut points. Qualitative discordance, in which a protein would be protective for DFS but promoting of either mortality endpoint, was not observed.
In response to the need for independently prognostic molecular markers for CMM that are readily assayable on routinely acquired clinical specimens, we conducted a systematic review and meta-analysis of the published melanoma IHC literature to identify the subset of proteins for which the data support validation as prognostic biomarkers of melanoma outcomes. Using stringent inclusion and exclusion criteria that examined patient selection, as well as laboratory and statistical methods (17,19), we identified 37 high-quality cohort studies that published multivariable survival point estimates and SEs for 62 unique proteins. Individual biomarker assay data were organized according to outcome (ACM, MSM, or DFS) and, within each outcome, according to functional groupings that reflected the acquired capabilities of cancer as defined by Hanahan and Weinberg (32).
In terms of functional capabilities, proteins that facilitate tissue invasion and metastasis were most likely associated with melanoma prognosis as numerous subclasses displayed statistically significant results with one or more outcome. Increased expression of three cellular adhesion molecules, melanocyte-specific MCAM/MUC18, neuron-specific L1-CAM, and glandular tissue–associated CEACAM-1, was statistically significantly associated with worse DFS. MCAM/MUC18 was also evaluated for mortality and yielded concordant results. Overexpression of CEACAM-1 and L1-CAM typically occurs at the leading edge of tumors (139,149), and both CEACAM-1 and MCAM interact with β3-integrin (150,151). Thus, both findings support the involvement of these molecules with abnormal tumor–stroma interactions. We also found statistically significant results for multiple members of the matricellular protein family, which consists of secreted molecules that interface between the extracellular matrix and the cell surface receptors (152). Increased levels of osteopontin expression were statistically significantly associated with worse MSM and trended toward statistical significance for DFS. Increased levels of tenascin-C were associated with worse DFS and trended toward worse MSM. Elevated osteonectin expression trended toward, but did not achieve, statistical significance for worsened DFS. Among the proteases, increased levels of tPA (P = .04) and MMP-2 (72 kDa type IV collagenase; P = .006), but not of MMP-9 (92 kDa type IV collagenase; P = .46), were statistically significant for mortality outcomes. Although only two MMPs were addressed in studies eligible for this analysis, the observed differences in their prognostic value suggest that only a subset of melanoma-expressed MMPs affect outcome. Evaluation of additional MMPs will be necessary to test this hypothesis. 
Available data as well as data from a recent, otherwise eligible study published after the cutoff date for inclusion do not support independent prognostic roles for cadherins or catenins (113,130,153).
Among proteins that contributed to limitless replicative potential, effectors of DNA replication and repair (eg, Ki-67, PCNA, metallothionein, Ku70, Ku80, and microtubule-associated protein-2) were most consistently associated with disease-free and overall survival. In contrast, cyclins and cyclin-dependent kinases were not associated with melanoma prognosis. Cyclin E was the only cyclin among five cyclins examined to achieve statistical significance, but because these data were from a single small cohort study (114), validation in an independent, larger cohort is needed. The statistically significant association between cyclin A and DFS was mitigated by three separate studies showing no association between cyclin A and mortality. Cyclin-dependent kinase inhibitors were statistically significantly associated with mortality. Elevated levels of p16/INK4A demonstrated protective effects for both ACM (P = .02) and MSM (P = .007), consistent with the established role for p16/INK4A in regulating aberrant cell proliferation in cells of melanocytic origin (154). The paradoxically increased mortality observed with elevated p27/KIP1 levels (P = .02) supports recent observations that p27/KIP1 dysregulation occurs through its cytoplasmic sequestration rather than through protein degradation; cytoplasmic accumulation of p27/KIP1 has been associated with increased metastatic potential (155).
Although strengths of our study include a broad, unbiased survey of the available literature and the application of standard systematic review and meta-analysis methods to objectively identify the subset of studies with robust data for summarization, there were several limitations inherent to our approach. We did not extend our search criteria to meeting abstracts or other sources of unpublished data that may contain increased proportions of null results. Although limiting our search to published manuscripts risks publication bias for studies with statistically significant associations, these alternate sources likely contain inadequate methodological descriptions to satisfy our inclusion criteria. We also elected to divide the standard oncological endpoint of overall survival into ACM and MSM. Whereas ACM is considered more robust because it avoids nondifferential outcome misclassification due to cause-of-death misadjudication (156), it also requires adjustment for age where MSM does not (157). We separated ACM and MSM outcomes in anticipation of different adjustment covariates and potential sources of outcome measurement error. In doing so, we were not able to combine biomarker data across the two mortality outcomes, which compromised our ability to calculate robust summary estimates of individual biomarkers through meta-analysis.
This study is also limited because, for 38 of the included proteins, summary data across all outcomes were derived from association data presented in a single study, which, in 29 cases, included fewer than 100 samples. False-positive as well as false-negative results, the latter due to insufficient statistical power, cannot be ruled out. Validation of these results in additional, independent studies is warranted. For the subset of proteins that were evaluated in two or more studies, the cross-study heterogeneity in the execution of IHC experiments as well as the categorization and statistical adjustments for the clinicopathologic criteria may also contribute to measurement error of biomarker to outcome associations. Although the authors of the majority of these manuscripts adjusted for Breslow thickness, their approaches to parameterization varied, ranging from continuous assessment to binary categorizations. Both positive and negative confounding of risk estimates could arise from inconsistent adjustment for other accepted clinicopathologic prognostic factors such as ulceration, gender, age, and stage at diagnosis.
Variability in assessment of protein expression and subsequent cut-point selection across studies must also be considered as a potential source of bias. First, although four studies reported automated image capture and digitized assessment of candidate biomarker expression, in the remaining studies, one or more of the investigators visually determined levels of protein expression, which could contribute to misclassification, especially among the 14 studies for which blinding status of these pathologists was not known. Next, for the majority of markers, selection of cut points to define categories of protein expression was arbitrary and could vary from study to study. For Ki-67, two studies (114,127) selected a cut point at 20% cells staining positively, one study selected 16% (137), and one study selected 5% (120). Similar variability was also observed for p16/INK4A, gp100, MMP-2, and osteopontin. Validation and adoption of consensus cut points across the melanoma community could facilitate replication of results. Finally, as automated image capture platforms that calculate expression as a continuous parameter gain popularity, the challenge of combining results reflecting dose–response relationships must be addressed. Of the eligible data in this review, six of 87 associations relied on quantiles of expression (23,117), reported a dose–response (119,124), or defined expression based on ratios of subcellular localizations (115,116). Reporting of such results requires extra rigor because meta-analysis of these data is hampered if the reporting is not done correctly or consistently. Although the simplest and most straightforward method to report such data for combination in a meta-analysis is categorical parameterization and estimation of hazard ratio for all categories relative to a baseline category, this approach consumes more degrees of freedom than estimation of dose–response from categorized data.
If dose–response is estimated from categorized data, the data must be accompanied by the exposure value assigned to each category so that the hazard ratio per unit increase can be extracted for meta-analysis (158).
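To illustrate why the assigned exposure values are needed, the following is a minimal sketch, not the method of any study reviewed here, of how a hazard ratio per unit increase might be extracted from categorized results. It assumes a log-linear trend and fits an inverse-variance weighted regression through the reference category; as a simplification, it treats the category estimates as independent even though they share a reference group. The function name `hr_per_unit` and its arguments are hypothetical.

```python
import math


def hr_per_unit(reference_dose, categories):
    """Estimate the hazard ratio per unit increase in exposure.

    categories: list of (assigned_dose, hr, se_log_hr) tuples, one per
    non-reference category. Fits log(HR) = slope * (dose - reference_dose)
    by inverse-variance weighted least squares through the origin.

    Simplifying assumption: category estimates are treated as independent,
    although in practice they are correlated through the shared reference.
    """
    numerator = 0.0
    denominator = 0.0
    for dose, hr, se in categories:
        distance = dose - reference_dose  # exposure contrast vs reference
        weight = 1.0 / se ** 2            # inverse-variance weight
        numerator += weight * distance * math.log(hr)
        denominator += weight * distance * distance
    slope = numerator / denominator       # log(HR) per unit of exposure
    se_slope = math.sqrt(1.0 / denominator)
    return math.exp(slope), se_slope
```

Without the exposure value assigned to each category, the contrast `dose - reference_dose` is undefined and no per-unit estimate can be recovered, which is the reporting gap described above.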
The execution of this systematic review and meta-analysis has illuminated gaps in IHC-based melanoma prognostic biomarker research. Most notable is the limited and highly selective number of proteins with eligible data. Several factors may have contributed to the paucity of rigorously studied candidate proteins. First, unlike genome-wide massively parallel genomics or proteomics platforms, IHC analyses must begin with candidate nominations that are based on a priori biological rationales, followed by their prioritization for execution in serial assays. The strong influence of research trends leads to significant selection bias in candidate prioritization. The rather comprehensive evaluation of cell cycle proteins and their regulators originates from the well-characterized increased risk of familial melanoma in individuals with heritable mutations in the gene encoding the p16/INK4A cyclin-dependent kinase inhibitor (159). Examination of the proliferation markers Ki-67 and PCNA as well as the DNA-damage regulator p53 was supported by their long-standing roles in regulating the progression of many cancers (160–162). Conversely, proteins that have not been often linked to direct involvement in melanoma are less likely to have been rigorously examined as potential prognostic biomarkers. For example, selective expression of chemokines and their cognate receptors on tumor cells contributes to metastasis during both initial invasion and selective homing to distinct target organ sites (163,164). Although basic research has associated expression of chemokine receptors CXCR1, CXCR2, CXCR3, CXCR4, CCR5, CCR7, and CCR10 with metastatic behavior of melanomas (165–171), rigorous prognostic data are only available for CXCR4, for which increased expression is associated with poorer outcome (136). 
Although CXCR1, CXCR2, CXCR3, CCR9, CCR10, and CCXR1 have been evaluated, these analyses were either limited to correlations with progression (172–174) or did not meet the criteria defining a cohort (175).
Another factor driving candidate selection is reagent availability. Even if mRNA expression profiling were used to generate an unbiased list of candidate genes for subsequent independent validation by IHC, such an approach would be thwarted if no validated antibodies against selected candidates were to exist. Many transcripts that are highlighted in microarray experiments lack functional characterization or are only annotated according to their clone identifier from high-throughput transcript sequencing projects (eg, KIAA, German Cancer Research Center [DKFZ]), and the corresponding proteins, if they exist, are least likely to have commercially available antibodies. For example, a comparative analysis of DNA microarray data suggested that CITED, an X-linked gene that regulates the transcription of tyrosinase and dopachrome tautomerase, was one of few genes consistently associated with melanoma progression across multiple studies (176). Despite these results, IHC correlates are limited to a single descriptive cross-sectional study that considered a small sample of eight nevi and 14 primary melanomas using a proprietary rabbit polyclonal antibody obtained from a collaborating laboratory (177).
The attrition of IHC-based studies lacking one or more inclusion criteria most severely limited the number of analyzable proteins. Our three parallel keyword-based PubMed searches identified 455 manuscripts describing IHC staining patterns for 387 distinct proteins. Yet, only 37 studies that collectively published 87 assays on 62 unique proteins met all eligibility criteria for inclusion in this systematic review. For 173 proteins, best analysis was restricted to cross-sectional correlations with melanocytic lesion progression or clinicopathologic criteria, and many proteins showed statistically significant associations with these endpoints. Although statistical significance in such cases does not guarantee prognostic relevance, none of these candidates were pursued in prognostic experiments. For example, among growth signaling proteins, statistically significant associations were most frequent among either transcription factors (ATF-2, AP-2α) or transcriptional coactivators (NCOA3/AIB-1), suggesting altered transcriptional regulation as a pivotal step in regulation of melanoma-specific survival. Because only five additional high-quality assays across all three outcomes reported associations for growth factor receptors and intermediate signal transduction molecules, we cannot rule out the possibility that upstream signaling components share a similar role. Eleven studies reported data on c-Kit (31,109,114,178–185), with only one eligible for inclusion in this review (114). Additional signal transduction components with melanoma IHC data that did not meet eligibility criteria included c-Met (79,186,187), epidermal growth factor receptor (188–190), fibroblast growth factor receptor-1 (95,179,191), trk-C (192,193), akt (54,93,194), PTEN (93,195,196), p42/44 extracellular signal–regulated kinases (85,97), p38 mitogen–activated protein kinase (84), jun amino-terminal kinase (84), and c-myc (69,197–201).
Another functional capability lacking eligible data despite numerous published experiments is sustained angiogenesis. Although only a single protein from this group, iNOS (118), was available for inclusion in our study, all 18 reports regarding VEGF (31,94,95,98,110,182,202–213) as well as those that evaluated VEGF receptors (94,95,208), ephrins and their receptors (95,214,215), or hypoxia-inducible transcription factors (98) as melanoma biomarkers did not meet inclusion criteria.
Of greatest concern is the subset of 125 proteins for which best evidence came from a study that described a prognostic endpoint but was dropped from this analysis due to methodological inadequacies; this included 13 proteins from case–control studies, 39 from case series, and 73 from cohort studies that did not meet all the prespecified inclusion criteria. Although the REMARK guidelines outlining minimum reporting criteria for molecular prognostic studies had been published in seven cancer-based peer-reviewed journals from 2005 to 2006, 14 of the 27 cohort studies that failed to adequately describe their IHC methods were published since 2005. An additional 96 potential biomarkers from 35 otherwise robust cohort studies (nine published since 2005) were excluded because either only univariate survival data were published (n = 44) or a multivariable analysis was executed, but the actual hazard ratio and 95% confidence interval were omitted (n = 52). REMARK guidelines state that the investigators must execute a multivariable analysis that includes the marker with all standard prognostic variables and must report this hazard ratio and associated confidence intervals regardless of statistical significance (17). Because of the volume of pre-REMARK manuscripts, we sent letters to 26 investigators requesting methodological details that had been omitted, or, for those who had reported multivariable statistical analysis, the missing point estimate and confidence intervals. Responses were received from nine groups, three of which indicated that they no longer had our requested information; the remaining 17 queries went unanswered. Taken together, these findings suggest slow uptake and implementation of the REMARK guidelines, at least in the melanoma research community.
Exclusion of the 52 otherwise eligible biomarker assays in which estimated effects and confidence intervals were omitted, of which all but two predated the REMARK guidelines, constitutes an important source of publication bias because 44 described results that were not statistically significant, three indicated indeterminate results, and only five reported statistically significant associations. Omission of these data may contribute to overestimation of the prognostic utility for these markers and for their assigned functional pathways. Four excluded assays described associations between mortality and Ki-67, with three (total n = 206) yielding results that were not statistically significant (28,104,107) and one (105) demonstrating a statistically significant relationship. Although summary estimates among the eligible data were statistically significant, substantial interstudy heterogeneity was also observed, which suggests that these omitted studies will likely influence the true relationship between Ki-67 expression and ACM or MSM.
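The dependence of a summary estimate on the reported hazard ratio and confidence interval can be made concrete with a minimal sketch of fixed-effect inverse-variance pooling, together with Cochran's Q and the I² statistic for interstudy heterogeneity. This is an illustrative example under stated assumptions rather than the exact procedure used in this review: it assumes that each reported 95% confidence interval is symmetric on the log scale, and the function names and the input values in the usage note are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(estimates):
    """Pool (hr, lower, upper) tuples by inverse-variance weighting.

    Returns the pooled HR, its 95% CI, Cochran's Q, and I^2 in percent.
    A study that omits its HR or CI cannot be passed to this function.
    """
    logs = [log_hr_and_se(*estimate) for estimate in estimates]
    weights = [1.0 / se ** 2 for _, se in logs]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, logs)) / total
    se_pooled = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, logs))
    dof = len(logs) - 1
    # I^2 is defined as 0 when Q does not exceed its degrees of freedom.
    i2 = 100.0 * (q - dof) / q if dof > 0 and q > dof else 0.0
    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - Z95 * se_pooled),
               math.exp(pooled + Z95 * se_pooled)),
        "q": q,
        "i2": i2,
    }
```

For example, `pool_fixed_effect([(4.0, 2.0, 8.0), (1.0, 0.5, 2.0)])` combines two hypothetical studies of equal precision. A study that reports only "not statistically significant" contributes nothing to such a calculation, so the pooled estimate is built from a selected subset of the evidence.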
Finally, IHC-based prognostic marker studies, by serially investigating individual candidates and estimating their independent effects, evaluate only the marginal effects of individual proteins on prognosis and overlook the complex interplay between molecular pathways and their constituent proteins to support tumor progression. Modeling joint effects for complementary proteins requires evaluation on the same cohort, entry into a single statistical model, and analysis for effect modification. Third or higher order interactions typically require sophisticated statistical models such as classification and regression tree (CART) analysis for survival outcomes (216).
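As a hedged illustration of what analysis for effect modification could look like for two markers measured on the same cohort, one common formulation is a proportional hazards model with a product term. The symbols below are assumptions introduced for this sketch: $x_1$ and $x_2$ denote expression of the two proteins, $\mathbf{z}$ the standard clinicopathologic covariates, and $h_0(t)$ the baseline hazard.

```latex
h(t \mid x_1, x_2, \mathbf{z})
  = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2
      + \boldsymbol{\gamma}^{\top}\mathbf{z}\right)
```

Under this model, $\exp(\beta_1)$ is the hazard ratio for the first marker only when $x_2 = 0$, and a nonzero $\beta_{12}$ indicates that the effect of one marker depends on the level of the other. Serial single-marker studies cannot estimate $\beta_{12}$.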
This systematic review of published IHC-based CMM molecular prognostic marker research supports involvement of cyclin-dependent kinase inhibitors, effectors of DNA replication and cell proliferation, growth-promoting transcription factors, and multiple regulators of tissue invasion and metastasis (the latter including cell adhesion molecules, matricellular proteins, and selected matrix metalloproteinases) in modulating melanoma outcomes. These results, however, need to be validated in adequately powered prospective studies designed to test both joint and marginal effects. At the same time, this study revealed substantial limitations in areas ranging from the choice of assayed proteins to the consistency and quality of published studies that strongly impacted the set of candidates available for consideration. The persistence of incomplete adoption of the 2005 REMARK guidelines should be addressed by the collective melanoma research community. This list of shortcomings may explain why molecular prognostic markers have largely failed to be incorporated into guidelines, staging systems, or the standard of care for melanoma patients.
National Institutes of Health (CA R01 CA 114277 to D.L.R., P50 CA121974 to Ruth Halaban).