Personalized cancer risk assessment remains an essential imperative in post-genomic cancer medicine. In hereditary melanoma, germline CDKN2A mutations have been reproducibly identified in melanoma-prone kindreds worldwide. However, genetic risk counseling for hereditary melanoma remains clinically challenging. To address this challenge, we developed and validated MelaPRO: an algorithm that provides germline CDKN2A mutation probabilities and melanoma risk to individuals from melanoma-prone families. MelaPRO builds upon comprehensive genetic information, and uses Mendelian modeling to provide fine resolution and high accuracy. In an independent validation on 195 individuals from 167 families, MelaPRO exhibited good discrimination with a concordance index (C) of 0.86 (95% CI: 0.75–0.97) and good calibration, with no significant difference between predicted and observed carriers (26 predicted; 95% CI: 20–35, as compared to 22 observed). In cross-validation, MelaPRO outperformed the existing predictive model MELPREDICT (C: 0.82; 95% CI: 0.61–0.93), with a difference of 0.05 (95% CI: 0.007 to 0.17). MelaPRO is a clinically accessible tool that can effectively support personalized risk counseling for members of hereditary melanoma families.
In 2009, there will be an estimated 68,720 new cases of melanoma with 8,650 deaths (1). Despite decades of therapeutic investigation, metastatic melanoma is still considered incurable, thereby making identification of high-risk individuals with an eye towards early detection a cornerstone in the strategy for cure.
A fundamental goal of personalized medicine is to uncover germline variants that identify individuals at the greatest risk for disease. For melanoma, the first such mutations were found in CDKN2A over a decade ago (2). Since then, heritable alterations in CDKN2A (encoding two proteins: p16/Ink4a and p14/ARF) and CDK4, the inhibitory target of p16/Ink4a, have also been found in a significant subset of melanoma-prone families (2–8). An earlier validation of MELPREDICT, a computational tool for estimating CDKN2A carrier probability, showed reasonable performance in ranking carriers higher than noncarriers among melanoma patients (9). However, MELPREDICT is based on logistic regression models and therefore cannot effectively incorporate crucial biological information embedded within the pedigree structure. Moreover, it lacks the flexibility to account for variations in CDKN2A mutation prevalence and penetrance across geographical regions (3,4).
To this end, we developed MelaPRO, a new model to estimate the probability of carrying a mutation in CDKN2A in melanoma families, using a general Mendelian risk prediction approach (10) that integrates Mendelian inheritance and Bayesian probability theories. This computational strategy effectively translates genetic information into a clinically useful algorithm for carrier probability estimation and has been successfully applied to develop BRCAPRO (11–14) for the breast and ovarian cancer syndrome, MMRpro (15) for the Lynch syndrome and PancPRO (16) for familial pancreatic cancer. In this initial validation, we show that MelaPRO exhibits strong discrimination and calibration ability, and outperforms the regression model, MELPREDICT.
MelaPRO translates population estimates of the mutation prevalence and penetrance of CDKN2A into mutation prediction for any designated family member (the counselee), given his or her family history and assuming autosomal dominant inheritance. The penetrance refers to the age-specific risk of developing cutaneous melanoma depending on CDKN2A carrier status and gender.
The carrier probability is modeled via Bayes’ rule as follows (10):

$$\Pr(\text{genotype} \mid \text{history}) = \frac{\Pr(\text{genotype}) \times \Pr(\text{history} \mid \text{genotype})}{\Pr(\text{history})}$$

Here, Pr denotes probability, genotype denotes whether the counselee carries a deleterious mutation in CDKN2A, and history denotes family history (as detailed in Table 1). The Pr(genotype) term is the mutation prevalence; the Pr(history|genotype) term is a weighted average of the probabilities of family history given each possible genotype configuration of all relatives, where the weights are the probabilities of the genotype configuration based on Mendelian transmission. This step uses the Elston-Stewart algorithm (17), as implemented in the latest version of the R package BayesMendel (footnote 1). The probability of family history given each genotype configuration can be broken down into the product of each relative’s probability of phenotype given genotype, assuming conditional independence. Here, each probability term is calculated as either the age-specific cumulative penetrance for affected relatives (with multiple primary melanomas, MPM, or a single primary melanoma, SPM) or 1 minus the cumulative penetrance for unaffected relatives. The Pr(history) term is the sum of terms of the form Pr(genotype) × Pr(history|genotype) across all possible genotypes of the counselee. Risks of developing SPM and MPM for unaffected individuals are estimated by a weighted average of the carrier’s and noncarrier’s penetrance, where the weights are the carrier probabilities.
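As a rough illustration of the calculation described above, the following Python sketch enumerates genotype configurations by brute force for a two-person pedigree (one parent and the counselee). It is not the BayesMendel implementation, which relies on the Elston-Stewart algorithm; the function names, the binary carrier/noncarrier coding, the treatment of the unmodeled second parent, and the penetrance numbers are all hypothetical choices made for the example.

```python
from itertools import product


def founder_carrier_prob(q):
    """Pr(a founder carries at least one mutant allele), allele frequency q."""
    return 1.0 - (1.0 - q) ** 2


def child_carrier_prob(parent_is_carrier, q):
    """Pr(child is a carrier | status of the one modeled parent).

    Simplifying assumptions (not from the paper): a carrier parent is
    heterozygous and transmits the mutant allele with probability 1/2;
    the other, unmodeled parent transmits a mutant allele with probability q.
    """
    from_parent = 0.5 if parent_is_carrier else 0.0
    return 1.0 - (1.0 - from_parent) * (1.0 - q)


def phenotype_likelihood(affected, carrier, cum_penetrance):
    """Cumulative penetrance if affected, 1 - cumulative penetrance if not."""
    p = cum_penetrance[carrier]
    return p if affected else 1.0 - p


def counselee_carrier_probability(q, parent_affected, child_affected,
                                  pen_parent, pen_child):
    """Pr(genotype | history) for the child in a parent-child pair.

    pen_parent and pen_child map carrier status (0 or 1) to the cumulative
    penetrance at that person's age (hypothetical inputs).
    """
    p_founder = founder_carrier_prob(q)
    joint = {0: 0.0, 1: 0.0}  # Pr(counselee genotype, history)
    for g_parent, g_child in product((0, 1), repeat=2):
        p_gp = p_founder if g_parent else 1.0 - p_founder
        p_c = child_carrier_prob(g_parent, q)
        p_gc = p_c if g_child else 1.0 - p_c
        lik = (phenotype_likelihood(parent_affected, g_parent, pen_parent)
               * phenotype_likelihood(child_affected, g_child, pen_child))
        joint[g_child] += p_gp * p_gc * lik
    # Dividing by Pr(history) = joint[0] + joint[1] completes Bayes' rule.
    return joint[1] / (joint[0] + joint[1])


if __name__ == "__main__":
    pen = {0: 0.01, 1: 0.30}  # hypothetical cumulative penetrances
    print(counselee_carrier_probability(0.00015, True, True, pen, pen))
```

Brute-force enumeration grows exponentially with family size, which is why a peeling method such as Elston-Stewart is used for real pedigrees; the sketch is only meant to make the roles of prevalence, transmission and penetrance concrete.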
MelaPRO incorporates three distinct penetrance estimates. The GenoMEL consortium (3) collected high-risk families (>2 affected family members) and used logistic regression to estimate separate penetrances, up to age 80, for areas with high baseline incidence (HBI) and low baseline incidence (LBI). Alternatively, the GEM Study Group (4) collected melanoma patients from the general population and estimated the penetrance in 5-year age intervals using the nonparametric kin-cohort method. We extrapolated the GenoMEL data and interpolated the GEM data, using estimates from the SEER DevCan software (footnote 2) as a reference, to establish the age-specific penetrance between ages 1 and 110. We then calculated the mutation prevalence indirectly from the penetrance estimates. By Bayes’ rule, Pr(G) = Pr(G|B) × Pr(B) / Pr(B|G), where G denotes being a CDKN2A carrier and B denotes being a new melanoma case (new cases per year between 2001 and 2005). From previous studies, we obtained Pr(G|B) = 0.0179 (4) and Pr(B) = 19.4/100,000 for the North American population. Pr(B|G) is a weighted average of the probability distribution function of the penetrance for CDKN2A carriers, where the weights are the melanoma incidences within 10-year age intervals (footnote 3).
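The prevalence calculation can be sketched in a few lines of Python. The values of Pr(G|B) and Pr(B) are those quoted in the text; Pr(B|G) is not reported here, so the per-interval rates and weights below are placeholders, and the function names are hypothetical.

```python
def carrier_prevalence(p_g_given_b, p_b, p_b_given_g):
    """Bayes' rule: Pr(G) = Pr(G|B) * Pr(B) / Pr(B|G)."""
    if p_b_given_g <= 0:
        raise ValueError("Pr(B|G) must be positive")
    return p_g_given_b * p_b / p_b_given_g


def incidence_weighted_rate(rates_in_carriers, incidence_weights):
    """Weighted average of per-age-interval rates in carriers.

    The weights stand in for melanoma incidence within each age interval
    and are normalized here so that they sum to one.
    """
    total = sum(incidence_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(rates_in_carriers, incidence_weights)) / total


if __name__ == "__main__":
    P_G_GIVEN_B = 0.0179        # from the text (ref. 4)
    P_B = 19.4 / 100000         # from the text
    # Placeholder inputs, NOT values from the paper:
    rates = [0.002, 0.006, 0.010, 0.012]
    weights = [1.0, 2.0, 3.0, 4.0]
    p_b_given_g = incidence_weighted_rate(rates, weights)
    print(carrier_prevalence(P_G_GIVEN_B, P_B, p_b_given_g))
```

Because Pr(B|G) sits in the denominator, a higher assumed penetrance yields a lower derived prevalence, which matches the ordering of the three module-specific estimates reported in the Results.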
We used X to indicate number of primary melanomas and G to indicate carrier status, with X=1 for SPM and X≥2 for MPM. The published penetrance estimates are P0 = Pr(X≥1|G = 0) and P1 = Pr(X≥1|G = 1). The relative risk of MPM for carriers and noncarriers among melanoma cases is Pr(X≥2|X≥1,G=1)/Pr(X≥2|X≥1,G=0)=1.8 (18), and the risk ratio of having MPM versus SPM for carriers is Pr(X≥2|G=1)/Pr(X=1|G=1) = 1.14 (by age 50, ref. 4). Based on these numbers we estimated the MPM and SPM specific penetrances.
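One way to read the two ratios above is sketched below in Python. Since Pr(X≥1|G=1) = Pr(X=1|G=1) + Pr(X≥2|G=1), a ratio of 1.14 implies that a fraction 1.14/2.14 of diagnoses in carriers are MPM, and dividing by the relative risk of 1.8 gives the corresponding fraction in noncarriers. This reproduces the 53% and 30% figures quoted in the Results, but the authors' exact calculation may differ; the function names are hypothetical, and applying the same fractions at every age is a simplifying assumption.

```python
def mpm_fraction_in_carriers(mpm_to_spm_ratio):
    """Pr(X>=2 | X>=1, G=1) from the ratio Pr(X>=2|G=1) / Pr(X=1|G=1)."""
    return mpm_to_spm_ratio / (1.0 + mpm_to_spm_ratio)


def mpm_fraction_in_noncarriers(carrier_fraction, relative_risk):
    """Pr(X>=2 | X>=1, G=0) from the carrier fraction and the relative risk."""
    return carrier_fraction / relative_risk


def split_penetrance(overall_penetrance, mpm_fraction):
    """Split Pr(X>=1 | G) into (SPM, MPM) parts using a constant fraction."""
    mpm = overall_penetrance * mpm_fraction
    return overall_penetrance - mpm, mpm


if __name__ == "__main__":
    f_carrier = mpm_fraction_in_carriers(1.14)              # ratio from ref. 4
    f_noncarrier = mpm_fraction_in_noncarriers(f_carrier, 1.8)  # RR from ref. 18
    print(round(f_carrier, 2), round(f_noncarrier, 2))
    # 0.30 is a placeholder overall penetrance, not a value from the paper.
    print(split_penetrance(0.30, f_carrier))
```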
For the genetic results, the default specificity was set at 1.0, because only known mutations were included in the analysis; putative polymorphisms (e.g. Ala148Thr) and variants of unknown significance (VUS) were excluded since accurate penetrance data are not available for these alterations. As such, the model does not currently calculate the probability of detecting established polymorphisms or variants of unknown significance (VUS). Other possibilities for a false positive, such as sample confusion, can be considered negligible. Since our mutational screen does not detect deep intronic mutations and large chromosomal deletions, we set our sensitivity at 0.9 presuming that these types of deleterious changes occur in no more than 10 percent of the cases. It is straightforward for users to replace these estimates with different ones.
The MELPREDICT model is a multiple logistic regression in which the estimated probability that the counselee is a mutation carrier is given by $e^{L}/(1+e^{L})$, where $L = \beta_0 + \beta_1 \times (\text{no. of counselee primaries}) + \beta_2 \times (\text{no. of additional family primaries}) - \beta_3 \times \ln(\text{counselee age})$.
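The logistic form can be written out as a short Python function. The coefficient values are not given in this text, so the tuple used in the example is purely hypothetical, as is the function name; only the structure of the linear predictor follows the description above.

```python
import math


def melpredict_probability(coef, n_counselee_primaries,
                           n_family_primaries, counselee_age):
    """Carrier probability e^L / (1 + e^L) for a MELPREDICT-style model.

    coef is (b0, b1, b2, b3); note the minus sign on the log-age term.
    """
    if counselee_age <= 0:
        raise ValueError("age must be positive to take its logarithm")
    b0, b1, b2, b3 = coef
    linear = (b0
              + b1 * n_counselee_primaries
              + b2 * n_family_primaries
              - b3 * math.log(counselee_age))
    # 1 / (1 + e^-L) is algebraically identical to e^L / (1 + e^L).
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    hypothetical_coef = (1.0, 0.8, 0.4, 1.2)  # placeholders, not fitted values
    print(melpredict_probability(hypothetical_coef, 2, 3, 40))
```

With positive coefficients, the probability rises with the number of primaries and falls with the counselee's age, but nothing in the formula distinguishes which relatives are affected or how they are related to the counselee.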
We used data from the Massachusetts General Hospital Melanoma and Pigmented Lesion Center (PLC). This series was not used in the development of MelaPRO and provides an independent validation. This study was performed in accordance with a protocol approved by the MGH Institutional Review Board. From April 2001 to January 2008, all patients with invasive or in-situ melanoma evaluated at the PLC were screened for eligibility as follows: (1) ≥1 first-degree relatives with melanoma, or (2) ≥2 affected relatives with melanoma on one side of the family (first- or second-degree), or (3) ≥3 primary cutaneous melanomas irrespective of family history. The presence and number of melanomas for counselees were confirmed via pathology reports, except for a small number of cases (<10%, data not shown). Medical record confirmation of reported family histories was pursued but limited to relatives who provided prior consent to participate in the study. We excluded two families: one because it lacked counselee information and the other because the counselee is unaffected and therefore ineligible for comparison with MELPREDICT.
CDKN2A exons 1α, 1β and 2 were screened for sequence variants as previously described (9).
All analyses were performed in R (footnote 4). Within each family, we assigned each CDKN2A-tested individual in turn as the counselee and calculated the probability of detecting a CDKN2A mutation using MELPREDICT (9) and all modules of MelaPRO. For the MelaPRO modules, this probability is obtained by multiplying the probability of carrying a CDKN2A mutation, provided by the model, by the sensitivity of the mutation analysis (default=0.9). The comparison between models required the additional exclusion of four cases in which the tested individuals were unaffected, as MELPREDICT does not apply to unaffected individuals. We evaluated the discrimination, calibration and accuracy of each model by comparing the calculated probabilities with the observed mutational status. Discrimination reflects a model’s ability to differentiate individuals with positive outcomes from those with negative outcomes. It can be visualized using Receiver Operating Characteristic (ROC) curves and summarized by the area under the curve, or concordance index (C). Calibration is a model’s ability to make unbiased estimates of the proportion of carriers. We also used positive predictive value (PPV) and negative predictive value (NPV) to measure accuracy, and mean squared error (MSE) for an overall comparison of performance. MELPREDICT was developed based on a subset of our validation set. Therefore, we used leave-one-out cross-validation to obtain evaluation statistics for MELPREDICT. In our cross-validation, we fixed the covariates selected by the original MELPREDICT model, but re-estimated the coefficients in each training set. We obtained 95% confidence intervals using the bootstrap (19). We also evaluated sensitivity and specificity for the descriptive classifier (FH) defined by having at least 2 affected relatives. We present hypothetical but realistic family history scenarios for illustration.
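The evaluation measures described above can be sketched in plain Python as follows. The analyses in the paper were run in R, so this is an illustrative re-expression rather than the authors' code; the function names and the toy data are hypothetical.

```python
def detection_probability(carrier_prob, sensitivity=0.9):
    """Probability that testing detects a mutation: Pr(carrier) x sensitivity."""
    return carrier_prob * sensitivity


def concordance_index(probs, labels):
    """Fraction of (carrier, noncarrier) pairs ranked correctly; ties count 1/2."""
    pairs = 0
    score = 0.0
    for p_i, y_i in zip(probs, labels):
        for p_j, y_j in zip(probs, labels):
            if y_i == 1 and y_j == 0:
                pairs += 1
                if p_i > p_j:
                    score += 1.0
                elif p_i == p_j:
                    score += 0.5
    if pairs == 0:
        raise ValueError("need at least one carrier and one noncarrier")
    return score / pairs


def observed_expected_ratio(probs, labels):
    """Calibration summary: observed carriers over predicted carriers."""
    return sum(labels) / sum(probs)


def mean_squared_error(probs, labels):
    """Average squared gap between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


if __name__ == "__main__":
    toy_probs = [0.9, 0.8, 0.3, 0.1]   # toy predictions, not study data
    toy_labels = [1, 0, 1, 0]          # 1 = mutation detected
    print(concordance_index(toy_probs, toy_labels))
    print(observed_expected_ratio(toy_probs, toy_labels))
    print(mean_squared_error(toy_probs, toy_labels))
```

The pairwise definition of C used here is equivalent to the area under the ROC curve; the quadratic loop is adequate for a few hundred individuals but would be replaced by a rank-based formula for large data sets.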
The MelaPRO model treats melanoma family history as a diagnostic test or profile, and CDKN2A genotype as an occult condition to be diagnosed. To use MelaPRO during a typical counseling session, the counselor collects the counselee’s family history information and enters it into MelaPRO to obtain a carrier probability and, if the counselee is still free of the disease, an estimate of future risk. The family history information is detailed in Table 1 and includes family members’ relationships, occurrence of cutaneous melanoma (including whether single or multiple primaries were found), age at diagnosis (or age at last contact for unaffected family members), and earlier germline testing results of any family members, if available. There is no restriction on which family member can be designated the counselee and no limit to the size of the family tree that can be processed, as long as there is no inbreeding. Predictions can be obtained using any subset of the information in Table 1.
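The future-risk estimate for an unaffected counselee is described in the Methods as a weighted average of the carrier and noncarrier penetrance, weighted by the carrier probability. A minimal Python sketch of that mixture follows; the function name and the numerical inputs are hypothetical.

```python
def mixed_melanoma_risk(carrier_prob, risk_if_carrier, risk_if_noncarrier):
    """Weighted average of carrier and noncarrier risk.

    The weights are the counselee's carrier probability and its complement.
    """
    if not 0.0 <= carrier_prob <= 1.0:
        raise ValueError("carrier_prob must lie in [0, 1]")
    return (carrier_prob * risk_if_carrier
            + (1.0 - carrier_prob) * risk_if_noncarrier)


if __name__ == "__main__":
    # Placeholder risks over some future age window, not values from the paper.
    print(mixed_melanoma_risk(0.43, 0.20, 0.01))
```

The result always lies between the noncarrier and carrier risks and moves toward the carrier risk as the family history becomes more suggestive of a mutation.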
Figure S1 shows the penetrance estimates from GenoMEL (3) and GEM (4), which applied to all melanoma diagnoses combined. We estimated the allele frequency (mutation prevalence) as 0.00015 using the HBI penetrance; 0.0003 using the LBI penetrance; and 0.0004 using the GEM penetrance. Additionally, MelaPRO incorporated an estimate that 53% of diagnoses in carriers are MPM compared to 30% for noncarriers.
MelaPRO provides three modules, MelaPRO-HBI (HBI), MelaPRO-LBI (LBI) and MelaPRO-GEM (GEM), reflecting different penetrances, now adjusted to be MPM/SPM-specific. Users choose the module whose source study best matches the population in which the model is being applied. To illustrate, for scenario 1 (Figure 1, Table 2), MelaPRO gave probability estimates of 0.43 (HBI), 0.90 (LBI) and 0.85 (GEM). For comparison, the probabilities without the MPM/SPM adjustment are 0.27 (HBI), 0.83 (LBI) and 0.74 (GEM).
Users can also specify the sensitivity (default=0.9) and specificity (default=1) of the germline testing method when results are available for some family members.
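To show how sensitivity and specificity enter, the Python sketch below updates a single person's carrier probability after that person's own test result. This is a simplification: in MelaPRO, relatives' results are folded into the pedigree likelihood rather than applied one person at a time, and the function name is hypothetical.

```python
def update_for_test_result(prior, positive, sensitivity=0.9, specificity=1.0):
    """Bayes update of a carrier probability given a germline test result."""
    if positive:
        numerator = prior * sensitivity
        denominator = numerator + (1.0 - prior) * (1.0 - specificity)
    else:
        numerator = prior * (1.0 - sensitivity)
        denominator = numerator + (1.0 - prior) * specificity
    if denominator == 0:
        raise ValueError("result is impossible under these settings")
    return numerator / denominator


if __name__ == "__main__":
    print(update_for_test_result(0.43, positive=True))
    print(update_for_test_result(0.43, positive=False))
```

With the default specificity of 1, a positive result implies carrier status with certainty, whereas a negative result leaves a residual probability because one in ten mutations is assumed to be missed.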
MelaPRO is open source and freely available as part of the BayesMendel (10) risk prediction package at http://astor.som.jhmi.edu/BayesMendel/ and the Cancer-Gene (20) counseling package at http://www4.utsouthwestern.edu/breasthealth/cagene/.
Figure 1 illustrates how MelaPRO provides high-resolution information to support clinical counseling, by presenting carrier probability estimates for several hypothetical, but realistic, scenarios. We compared our results to the descriptive classifier FH, and to MELPREDICT (9), a logistic regression model based on the number of primary melanomas in the counselee, the number in all other family members, and the counselee’s age at diagnosis.
In the pedigree, MELPREDICT estimated a carrier probability of 0.24, as compared to MelaPRO’s estimate of 0.43 (HBI; see other modules in Table 2). MelaPRO captured the two relatives’ earlier disease onset (compared with 59 years in the general population; footnote 5) as an additional indication of carrier status. It also responded, with a considerable increase in probabilities, to modification of the father’s disease history, while the total number of familial melanomas remained the same (HBI: 0.43 to 0.77 and 0.43 to 0.82). The additional scenarios demonstrate how carrier probabilities varied as the number of affected individuals and the patients’ relationship to the counselee were changed (i.e. aunt healthy or brother affected).
We assembled a validation set containing 167 families with an average of 29 members. There were, in total, 26 carriers, 22 of whom were affected with melanoma, and 603 primary melanomas. The mean number of primary melanomas in families of carriers and noncarriers was 7.9 (95%CI: 5.5–10.3) and 3.2 (95%CI: 2.9–3.5), respectively. There were 207 genotyped individuals within the 167 families. Among these, 195 were cases, with 85 males and 110 females. The mean age at diagnosis was 46.4 years (95% CI: 43.5–49.3) for males and 41.3 years (95% CI: 38.6–43.9) for females, and it was 36.6 years (95% CI: 30.7–42.6) for affected carriers and 44.4 years (95% CI: 42.3–46.4) for affected noncarriers. The proportion of mutation carriers increased with the number of primary melanomas in the counselee, the number of affected relatives, and the number of primary melanomas in relatives (Table S1). A total of eight relatives from seven families were affected with pancreatic cancer.
The Boston validation cohort is derived from a relatively high incidence region (3) and is familial in ascertainment. We deployed MelaPRO-HBI and predicted the presence of approximately 26 mutations (95%CI: 20, 35); the MelaPRO-GEM module predicted 41 mutations (95%CI: 31, 58), and MELPREDICT predicted 20 mutations (95%CI: 19, 24, see Observed/Expected (O/E) ratios in Table 3). Both MelaPRO-HBI and MELPREDICT showed a close correspondence with the observed 22 mutations. MelaPRO-GEM and MelaPRO-LBI predicted a substantially higher number of mutations than was observed, likely because their parameter estimates do not fit our cohort profile.
MelaPRO shows good discriminatory ability with all three modules. Figure 2 shows the ROC curves for MelaPRO and MELPREDICT, as well as the sensitivity and specificity based on the summary family history criterion FH. The corresponding AUCs are presented in Table 3. The difference between the AUC for MelaPRO-HBI and that for MELPREDICT is 0.05 (95% CI: 0.007 to 0.17). Part of this difference is attributable to the gap visible at the top right of Figure 2: MelaPRO achieved an estimated sensitivity of 90% at the cost of about 70% false positives, while MELPREDICT provided limited discrimination at this level of sensitivity. The point corresponding to the sensitivity and specificity based on FH lay below the ROC curves, with an 81% sensitivity at the cost of a >40% false positive rate, while model-based prediction achieved higher sensitivity with 10% fewer false positives.
We also investigated the accuracy of MelaPRO and MELPREDICT predictions associated with a carrier probability cutoff of 50%. The positive predictive value (PPV) was 0.70, 0.57 and 0.44 for MelaPRO-HBI, MelaPRO-GEM and MELPREDICT, respectively. The negative predictive values (NPVs) were 0.97, 0.97 and 0.90 for the same three models. The mean squared error of prediction, which evaluates the overall performance of the algorithm, was significantly better in the MelaPRO-HBI (0.06, 95%CI: 0.03, 0.08) and MelaPRO-GEM (0.08, 95%CI: 0.06, 0.11) modules than the MelaPRO-LBI (0.19, 95%CI: 0.15, 0.22) module, with the former two slightly better than MELPREDICT (0.09, 95%CI: 0.04, 0.12, see Table 3). We then considered how often MelaPRO led to a re-classification compared to MELPREDICT and FH. As shown in Table 4, the re-classification fraction ranged from 4% to 34%. MelaPRO-HBI re-classified correctly 5, 10 and 65 more individuals than MelaPRO-GEM, MELPREDICT and FH respectively. The 50% threshold was chosen for illustrative purposes only and is not based on any clinical recommendations.
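The threshold-based measures in this paragraph can be computed as in the Python sketch below. Treating a probability equal to the cutoff as a positive call is an assumption, and the function names and toy data are hypothetical.

```python
def predictive_values(probs, labels, cutoff=0.5):
    """Return (PPV, NPV) when probabilities >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        called_positive = p >= cutoff
        if called_positive and y == 1:
            tp += 1
        elif called_positive and y == 0:
            fp += 1
        elif not called_positive and y == 0:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def reclassification_fraction(probs_a, probs_b, cutoff=0.5):
    """Share of individuals whose call differs between two models."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same individuals")
    changed = sum((a >= cutoff) != (b >= cutoff)
                  for a, b in zip(probs_a, probs_b))
    return changed / len(probs_a)


if __name__ == "__main__":
    toy_labels = [1, 0, 1, 0]            # toy outcomes, not study data
    model_a = [0.9, 0.8, 0.3, 0.1]
    model_b = [0.7, 0.4, 0.6, 0.2]
    print(predictive_values(model_a, toy_labels))
    print(reclassification_fraction(model_a, model_b))
```

Counting how many of the reclassified individuals moved to the correct side of the cutoff, as reported in Table 4, additionally requires comparing each changed call with the observed mutation status.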
One’s ability to create a personalized risk portfolio for patients with hereditary melanoma remains a formidable challenge. To this end, we have developed and successfully validated MelaPRO for individualized CDKN2A carrier estimation. This open-source tool delivers a useful and easily deployable instrument for cancer risk counselors who wish to frame a more informative discussion for individuals pursuing CDKN2A genetic testing. Our results indicate that MelaPRO provides high resolution and accurate risk assessment, discriminating between individuals with or without germline mutations in CDKN2A.
An ideal personalized risk model would rely on a menu of modules that best fit the clinical profile. Since geographical location and other unknown genetic factors which may co-segregate with melanoma families appear to influence both penetrance and prevalence of CDKN2A mutations (4), we constructed three distinct MelaPRO modules based on separate penetrance estimates: GenoMEL-HBI, GenoMEL-LBI and GEM. This is a first step towards accounting for both genetic and environmental factors. We also derived the corresponding mutation prevalence of CDKN2A based on penetrance. As more data emerges, these estimates can be easily updated, so that the model will continue to operate using the best available information. For example, using allele frequency that is specific to Europe might further improve the performance of GenoMEL-LBI in this population. The prevalence estimate (0.00015) for MelaPRO-HBI matches that from Bishop et al. (3). Sensitivity analysis with variation in prevalence (between 0.00015 and 0.0004) showed similar discrimination performance in all modules, and higher O/E ratios with lower prevalence values for GEM and LBI. In the Boston validation set, the GenoMEL-HBI presented significantly better performance than others (Table 3), suggesting the utility of the geography and ascertainment specific modules.
MelaPRO treats individuals with SPM and MPMs differently, providing higher resolution and more accurate CDKN2A mutation risk. With the Boston validation data, the MelaPRO-HBI model without MPM adjustment gave higher probabilities to SPM families, where the counselees are often noncarriers. Overall, it gave a lower O/E ratio: 0.73, and a slightly lower C-index: 0.84. Our assumption of constant MPM/SPM risk ratios across ages can be modified as more data becomes available. Similarly, the current MelaPRO provides the basis for more refined models incorporating polygenic effects, risk modifiers and biomarkers as their role becomes clarified. Future iterations of MelaPRO will also incorporate two known risk factors: MC1R status and history of pancreatic cancer in the family.
MelaPRO captures the full pedigree data, including information on affected and unaffected family members, and is therefore able to further discriminate between individuals at higher and lower risk with the same number of affected family members. Part of the Boston validation set was used to develop MELPREDICT, specifically for choosing the covariates in the final model. Although the cross-validation should correct for part of the optimism that is associated with internal validations, it does not account for variability across studies. Therefore the gap between MelaPRO’s and MELPREDICT’s performances would likely be wider in a new independent set of families. The improvement of 0.05 in the concordance index C corresponds to real advances clinically at a personal level, as evidenced by the PPV/NPV and reclassification results. Lastly, MELPREDICT is not applicable to unaffected individuals in melanoma-prone families. MelaPRO is more powerful as a clinical instrument because it is applicable to the entire family and calculates pre-disease estimates of carrier probability and melanoma risk.
In the current study, we built and evaluated MelaPRO on known deleterious mutations while excluding known polymorphisms (e.g. Ala148Thr). However, MelaPRO quantifies degree of genetic segregation in melanoma families and may give high probabilities to carriers of variants of unknown significance (VUS) that have similar effects to the known variants in CDKN2A. Going forward, MelaPRO can further accommodate errors in classification of variants as deleterious or polymorphic, by changing the sensitivity and specificity accordingly. In broader terms, what is critically needed is a robust biochemical or genetic assay for p16/Ink4a and p14/ARF functionality, which will fundamentally improve the accuracy of risk predictions.
From the clinical perspective, MelaPRO can be easily incorporated into any genetic counseling session. Most melanoma clinicians appreciate the importance of family history in a qualitative, but not necessarily quantitative, sense. However, in clinics, individuals are frequently referred for genetic counseling without regard for pedigree structure or counselee affection status, features that MelaPRO can capture for better estimation. The current consensus is that melanoma patients who have either an affected first-degree relative or more than one affected relative on one side of the family, or unaffected individuals with two or more cases of melanoma in close relatives, may benefit from a genetic risk assessment. In some situations, MPM patients without family history may also consider counseling. In contrast, unaffected relatives from single-case kindreds who present to their physicians for routine mole checks comprise the largest group of at-risk individuals with a “family history” of melanoma, but these individuals generally do not need genetic consultation, since their probability of harboring a germline CDKN2A mutation is likely to be close to 1% (18). Likewise, a melanoma patient with a single, distant family history of melanoma, especially if not substantiated by a medical record, would not routinely need genetic risk counseling unless special circumstances exist. Beyond the valuable exercise of counseling, the decision to undergo CDKN2A germline testing should be made in conjunction with a trained professional who can integrate the genetic, psychological and social implications of genetic testing.
Our ability to assess calibration and discrimination is limited by the 22 CDKN2A mutation carriers found in the validation set. We excluded four carriers, as they were ineligible for MELPREDICT analysis. We also evaluated the performance of MelaPRO alone, using all carriers, and obtained similar results. Although suboptimal for analysis, the 11% mutation rate among all individuals with a family history of melanoma is probably appropriate for general clinical use (3, 4, 21). In addition, since most counselees in the validation set were melanoma patients, we could not properly evaluate the model on unaffected individuals. Finally, MelaPRO does not explicitly account for germline CDK4 variants. However, since CDK4 mutations are thought to be mutually exclusive with CDKN2A mutations and since extant data suggest that the CDK4-mutation phenotype is identical to the CDKN2A-mutation phenotype (9), MelaPRO treats CDKN2A and CDK4 as a single genetic unit without resorting to a two-locus model. There were no CDK4 kindreds among our families.
In summary, we have developed and validated a risk prediction model, MelaPRO, whose central goal is to enhance melanoma risk counseling by providing accurate pre-test assessment of CDKN2A carrier probability. MelaPRO’s architecture is of such flexibility that when data become available through ongoing and proposed gene/gene and gene/environment studies, such biological information can be readily assimilated into the model, achieving higher resolution and higher accuracy in risk assessment.
This work was supported in part by American Cancer Society grant RSG MGO-112970 (to H.T. and G.P.), NCI grants P50 CA-93683 (to H.T.) and R01CA105090-01A1 (to G.P.), and the generous philanthropic donors to the Massachusetts General Hospital.
There is no potential conflict of interest with any author. A preliminary version of this work was presented at the 2006 International Genetic Epidemiology Society Meeting.
2. DevCan: Probability of Developing or Dying of Cancer Software, Version 6.1.1. Statistical Research and Applications Branch, National Cancer Institute, 2005. URL http://srab.cancer.gov/devcan.
3. Ries LAG, Melbert D, Krapcho M, et al. (eds). SEER Cancer Statistics Review, 1975–2005, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2005/, based on November 2007 SEER data submission, posted to the SEER web site, 2008.
4. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2006. URL http://www.R-project.org.
5. Ries LAG, Melbert D, Krapcho M, et al. (eds). SEER Cancer Statistics Review, 1975–2005, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2005/, based on November 2007 SEER data submission, posted to the SEER web site, 2008.