AUTHOR CONTRIBUTIONS
Conception and design: Wenyi Wang, Sining Chen, Giovanni Parmigiani, Alison P. Klein
Financial support: Ralph H. Hruban, Alison P. Klein
Provision of study materials or patients: Kieran A. Brune, Ralph H. Hruban, Alison P. Klein
Collection and assembly of data: Kieran A. Brune, Alison P. Klein
Data analysis and interpretation: Wenyi Wang, Sining Chen, Giovanni Parmigiani, Alison P. Klein
Manuscript writing: Wenyi Wang, Sining Chen, Kieran A. Brune, Ralph H. Hruban, Giovanni Parmigiani, Alison P. Klein
Final approval of manuscript: Wenyi Wang, Sining Chen, Kieran A. Brune, Ralph H. Hruban, Giovanni Parmigiani, Alison P. Klein.
The rapid fatality of pancreatic cancer is, in large part, the result of an advanced stage of diagnosis for the majority of patients. Identification of individuals at high risk of developing pancreatic cancer is a first step towards the early detection of this disease. Individuals who may harbor a major pancreatic cancer susceptibility gene are one such high-risk group. The goal of this study was to develop and validate PancPRO, a Mendelian model for pancreatic cancer risk prediction in individuals with familial pancreatic cancer, to identify high-risk individuals.
PancPRO was built by extending the Bayesian modeling framework developed for BRCAPRO, trained using published data, and validated using independent prospective data on 961 families enrolled onto the National Familial Pancreas Tumor Registry, including 26 individuals who developed incident pancreatic cancer during follow-up.
We developed a risk prediction model, PancPRO, and free software for the estimation of pancreatic cancer susceptibility gene carrier probabilities and absolute pancreatic cancer risk. Model validation demonstrated an observed to predicted pancreatic cancer ratio of 0.83 (95% CI, 0.52 to 1.20) and high discriminatory ability, with an area under the receiver operating characteristic curve of 0.75 (95% CI, 0.68 to 0.81) for PancPRO.
PancPRO is the first risk prediction model for pancreatic cancer. When we validated our model using the largest registry of familial pancreatic cancer, our model provided accurate risk assessment. Our findings highlight the importance of detailed family history for clinical cancer risk assessment and demonstrate that accurate genetic risk assessment is possible even when the causative genes are not known.
Pancreatic cancer is the fourth leading cause of cancer death in the United States. In 2007, an estimated 37,170 new patients will be diagnosed, and 33,370 individuals will die from pancreatic cancer.1 Pancreatic cancer is attributed to smoking in 25% of patients, and 7% to 10% of patients have a family history of pancreatic cancer.2 Genetic factors, including germline mutations in the p16/CDKN2A,3 PRSS1,4,5 BRCA2,6–8 and STK11 genes,9 increase the risk. For example, BRCA2 germline mutations are found in 12% to 17% of patients with familial pancreatic cancer.7,8 Combined, these known genetic factors account for less than 20% of the observed familial aggregation, suggesting that additional susceptibility genes may exist. Segregation models support a dominant susceptibility gene carried by approximately seven of 1,000 individuals.10
In addition to gene identification studies, several screening trials aimed at identifying early pancreatic neoplasia are underway.11–18 For example, Canto et al13 screened 72 asymptomatic individuals with a strong family history of pancreatic cancer using endoscopic ultrasound and computed tomography scan and identified seven patients (10%) with pathologically confirmed pancreatic neoplasia. As screening programs such as this are developed, the identification of the appropriate population to screen is critical because the positive predictive value of a screening test is dependent on the disease frequency in the population screened.
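The dependence of positive predictive value on disease frequency follows directly from Bayes' rule. The short sketch below illustrates the point; the sensitivity, specificity, and prevalence values are hypothetical placeholders chosen for illustration and are not estimates from any screening study cited here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) by Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics (assumptions, not study results).
SENSITIVITY = 0.90
SPECIFICITY = 0.95

# The same test applied to populations with increasing disease frequency.
for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}")
```

With the test characteristics held fixed, the positive predictive value rises as the screened population is enriched for disease, which is why selecting the population to screen matters.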
Risk prediction models, such as the Gail model19 and BRCAPRO,20 have been used to help determine appropriate target patient populations. In the absence of risk modeling for pancreatic cancer, early detection studies have relied on crude predictors of risk, such as counts of affected relatives without regard for the number and age of unaffected family members. Accurate risk prediction models can also serve as useful surrogates for clinical genetic testing,21–23 as demonstrated by the Claus model21 before the availability of BRCA1/2 testing.
We developed PancPRO, a Mendelian risk prediction tool for pancreatic cancer. PancPRO is built on the BRCAPRO24 framework, the predictive performance of which is well documented.20,25 Using family history of pancreatic cancer, PancPRO estimates the probability that an individual carries a pancreatic cancer susceptibility gene and the future probability that an asymptomatic individual will develop pancreatic cancer. We validated the PancPRO model using data on 961 families enrolled onto the National Familial Pancreas Tumor Registry (NFPTR). In this article, we detail the development and validation of PancPRO and demonstrate that PancPRO provides far more accurate risk assessment than summary counts of affected relatives, highlighting the clinical utility of this model and the utility of obtaining accurate family histories.
The PancPRO model is built using a general Mendelian risk prediction approach and is grounded in human genetics and probability theory. This approach has been previously applied successfully to developing the BRCAPRO24 model. The underlying theory and its open-source software implementation are described in Chen et al.26 The application of this approach to this project can be summarized in the following four major steps: (1) specification of a genetic model for susceptibility mutations; (2) estimation of penetrance and prevalence associated with mutations; (3) use of Bayes’ rule to convert these into the probability of genotype given phenotypes; and (4) derivation of cancer risk from a weighted average of net penetrances for mutation carriers and noncarriers, with estimated carrier probabilities as weights. We extended the Chen et al26 algorithm to allow for half-siblings and extended pedigrees via the Elston-Stewart algorithm.27 Estimates from our previously reported segregation analysis of 287 families were used to define the underlying genetic model of autosomal dominant inheritance and to specify a susceptibility allele frequency of 0.0034 and a penetrance, by age 70 years, of 19% for mutation carriers and 0.3% for others10 (for details, see Klein et al10). This estimate of penetrance is the net cumulative probability, that is, the hypothetical distribution that would be obtained if all other death hazards were removed. Clinically, we wish to provide the absolute cumulative probability, which accounts for the chance of dying of other causes. This is also the probability that is directly observed when validating the model. We converted net to absolute probabilities using 2000 to 2002 Surveillance, Epidemiology, and End Results mortality estimates.28
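Steps 3 and 4 can be illustrated for a single individual considered in isolation. The sketch below is not the PancPRO algorithm, which evaluates whole pedigrees with age-specific penetrance; it only applies Bayes' rule and the weighted average to the allele frequency and age-70 net penetrances quoted above. Hardy-Weinberg proportions and the function names are assumptions made for this illustration.

```python
# Parameters quoted in the text (segregation analysis estimates).
ALLELE_FREQ = 0.0034          # susceptibility allele frequency
PEN_CARRIER_70 = 0.19         # net penetrance by age 70, carriers
PEN_NONCARRIER_70 = 0.003     # net penetrance by age 70, noncarriers


def prior_carrier_probability(q):
    """Probability of carrying at least one copy of a dominant allele.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    return 1.0 - (1.0 - q) ** 2


def posterior_carrier_probability(prior, affected_by_70):
    """Step 3: Bayes' rule, P(carrier | phenotype) for one person."""
    if affected_by_70:
        like_carrier, like_noncarrier = PEN_CARRIER_70, PEN_NONCARRIER_70
    else:
        like_carrier = 1.0 - PEN_CARRIER_70
        like_noncarrier = 1.0 - PEN_NONCARRIER_70
    numerator = prior * like_carrier
    return numerator / (numerator + (1.0 - prior) * like_noncarrier)


def net_risk_by_70(carrier_probability):
    """Step 4: penetrances averaged with carrier probability as weight."""
    return (carrier_probability * PEN_CARRIER_70
            + (1.0 - carrier_probability) * PEN_NONCARRIER_70)


prior = prior_carrier_probability(ALLELE_FREQ)
print(f"prior carrier probability: {prior:.4f}")
print(f"posterior if affected by 70: "
      f"{posterior_carrier_probability(prior, True):.3f}")
print(f"posterior if unaffected at 70: "
      f"{posterior_carrier_probability(prior, False):.4f}")
print(f"net risk by 70 for carrier probability 0.15: "
      f"{net_risk_by_70(0.15):.4f}")
```

The prior of roughly 0.0068 agrees with the figure of approximately seven carriers per 1,000 individuals. The last line is a net probability by age 70 and is therefore not comparable to interval-specific absolute risks, which also account for current age and competing mortality.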
PancPRO provides the probability that an individual (ie, counselee) carries a deleterious mutation in a pancreatic cancer susceptibility gene and, if unaffected, the counselee’s future probability of developing pancreatic cancer. Probability estimates are obtained from information on family history, including, for the counselee and each of his or her relatives, exact relation to the counselee, pancreatic cancer diagnosis (yes or no), age at diagnosis, and current age or age at last follow-up if unaffected. Software is open source and, on publication, will be made available free of charge via both the BayesMendel26 risk prediction package and the CancerGene29 counseling package.
We used independent data from the NFPTR at Johns Hopkins. This study was reviewed and approved by the Institutional Review Board of the Johns Hopkins Medical Institutions. Informed consent was obtained from study participants. The NFPTR is one of the largest registries of familial pancreatic cancer in the world and recruits patients from the following two sources: patients treated for pancreatic cancer at the Johns Hopkins Hospital are invited (by in-person visit or mail) to participate; and individuals with a personal or family history of pancreatic cancer are either self-referred through the Internet (http://pathology.jhu.edu/pancreas) or referred by their health care provider. Data collection methods have been described in detail elsewhere.10,30 In brief, information is obtained via a questionnaire on the patient’s first-degree relatives, grandparents, aunts, and uncles including dates of birth, dates of death (if deceased) or dates of last contact if living, and any cancer diagnoses. Questionnaires are completed by the patient or by their proxy if the patient is deceased. Responses are clarified via telephone if necessary. Enrolled families are contacted annually by mail to update health information on family members included in the questionnaire. Recruitment began on January 1, 1994 and has grown each year. Follow-up time was censored on January 1, 2005, 1 month after the annual follow-up cards were mailed to all families, or at the date of last contact for the family.
A total of 6,134 individuals in 961 families met the following inclusion criteria for the validation study: alive and clinically free of pancreatic cancer at baseline; prospective follow-up data available; and not included in our prior segregation analysis used for model building. Twenty-six individuals in 24 families developed incident pancreatic cancer. All 26 patients were diagnosed based on symptoms and not as a result of screening protocols. Individuals who were under evaluation for pancreatic cancer at the time of enrollment and who were found to have disease are considered baseline patient cases. Family members reported to have pancreatic cancer but with unknown age at onset (80 of 1,821 participants) were assumed to have an age at onset of 75 years, which is a conservative estimate given that the median age at onset in the United States is 72 years.
To confirm cancer diagnoses, we obtained pathology specimens (slides, blocks, and reports), medical records, and/or death records, when available. Diagnoses were confirmed in 64% of the prevalent patient cases and 77% of the incident patient cases; for the remainder, records were unavailable.
At baseline, each individual’s estimated probability of developing cancer during follow-up was calculated using PancPRO. Follow-up spanned from the date of enrollment to the potential end of follow-up for each individual, which was defined as the end of follow-up for the family unit regardless of the disease or vital status of individual members.
Model calibration was evaluated by comparing the expected number of incident cases to the observed number of incident cases, and model discrimination was evaluated using the receiver operating characteristic (ROC) curve. Each individual’s estimated pancreatic cancer probability over the follow-up interval was divided by the follow-up time to obtain an average annual risk. Bootstrap 95% CIs were calculated31 by resampling families to account for correlation between family members. For each bootstrap sample, we calculated the area under the ROC curve (AUC) for both PancPRO and baseline family history and tested the difference in mean AUC. The 95% CIs for the estimated ROC were assessed by fixing the true/false-positive rate and then determining the bootstrap variability of the corresponding false/true-positive rate. We chose false-positive rates of 45% and 10% and true-positive rates of 88% and 23%, obtaining four sets of intervals. All analyses were performed using R software.32
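The family-level resampling described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analyses; the percentile form of the interval, the function names, and the toy records are assumptions made for this example and are not study data.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: P(case score > control score), ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        return None  # AUC is undefined without both groups
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def family_bootstrap_auc_ci(records, n_boot=1000, seed=0):
    """Percentile 95% CI for the AUC, resampling whole families.

    records: iterable of (family_id, predicted_risk, developed_cancer).
    Drawing families, not individuals, keeps relatives together and so
    respects the correlation between family members.
    """
    rng = random.Random(seed)
    by_family = {}
    for family_id, risk, event in records:
        by_family.setdefault(family_id, []).append((risk, event))
    families = list(by_family)

    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(families, k=len(families))
        sample = [pair for fam in drawn for pair in by_family[fam]]
        value = auc([r for r, _ in sample], [e for _, e in sample])
        if value is not None:
            estimates.append(value)

    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper


# Toy records (hypothetical): family id, predicted risk, incident case flag.
toy_records = [
    ("F1", 0.020, 1), ("F1", 0.004, 0), ("F1", 0.006, 0),
    ("F2", 0.003, 0), ("F2", 0.005, 0),
    ("F3", 0.015, 1), ("F3", 0.010, 0),
    ("F4", 0.002, 0), ("F4", 0.008, 0),
    ("F5", 0.007, 1), ("F5", 0.009, 0),
]

point = auc([r for _, r, _ in toy_records], [e for _, _, e in toy_records])
print("AUC:", point)
print("95% CI:", family_bootstrap_auc_ci(toy_records))
```

Bootstrap samples that happen to contain no incident cases are skipped here because the AUC is undefined for them; with a cohort as large as the validation set this should rarely arise.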
Additionally, a Markov Chain Monte Carlo procedure was developed to test the robustness of PancPRO to the assumed allele frequency estimate and correlation between family members. This procedure yielded similar results (data not shown).
Table 1 illustrates how PancPRO provides high-resolution information to support clinical decisions by presenting risk estimates for several variations of the hypothetical pedigree in Figure 1. General population estimates of pancreatic cancer risk from Surveillance, Epidemiology, and End Results28 are provided for comparison. In the pedigree, the counselee’s estimated carrier probability is 15%, and her probability of developing pancreatic cancer between ages 60 and 75 years is 2.9%, which is approximately six times greater than the general population risk of 0.49% for an individual of the same age. As shown, the counselee’s father is unaffected at age 87 years, suggesting that he may not have a pancreatic cancer gene. Alternatively, if the father’s disease status and age are unknown, the carrier probability increases to 20% and cancer risk increases to 3.5% because the father is a potential carrier. The additional scenarios demonstrate how carrier probabilities and, therefore, cancer risks increase with a third diagnosis of pancreatic cancer in the family (ie, sister or daughter affected) or when a closer relative is diagnosed (ie, father v paternal aunt). In particular, PancPRO gives estimates of 89% for carrier probability and 15% for cancer risk when the counselee’s daughter is affected at age 32 years.
Table 2 presents an overview of the validation population, showing the entire at-risk population as well as the incident pancreatic cancer patients stratified by baseline family history. The mean age at onset among incident pancreatic cancer patients was 72.3 years (standard deviation [SD], 10.0 years), which is similar to the general US population. On average, at-risk individuals were observed for 3.1 years (SD, 2.3 years), and incident patients were diagnosed 2.3 years (SD, 2.2 years) after enrollment. More than 92% of the study population and 96% of the incident patients are non-Hispanic white, reflecting registry referral patterns.
Figure 2 displays risk predictions stratified by number of affected relatives at baseline, demonstrating two points. First, the estimated risk of developing cancer during follow-up was higher among individuals who developed pancreatic cancer than among individuals who remained disease free (Kolmogorov-Smirnov P < .01). Second, in each stratum, individuals who developed pancreatic cancer tended to have higher predicted probabilities than individuals who remained disease free (P = .01, .09, .006, and .005 for one, two, three, and > three affected relatives at baseline, respectively). This indicates that PancPRO has the ability to identify likely patient cases, even within strata, as a result of the added resolution provided by pedigree structure and detailed age information. Also, across strata, the estimated risk remained relatively consistent for disease-free individuals, but estimated risk increased for individuals who developed disease.
We also stratified analysis by age at enrollment and found significantly different distributions in risk between patient cases and non–patient cases in the group ≤ 65 years of age (P = .04) but not for the other age groups (P = .15 and .94 for 66 to 75 years and > 75 years, respectively). The decrease in discrimination with increasing age occurs because the sporadic pancreatic cancer rate increases with age. Because our model predicts inherited pancreatic cancer, some of these older patient cases with sporadic disease would not be identified at higher than population risk by our model.
Whether predicted risk discriminates between the individuals who developed disease and the disease-free individuals is quantified by a ROC curve (Fig 3), which displays the sensitivity and specificity for each cutoff of the estimated probability of developing cancer. The AUC is 0.75 (95% CI, 0.68 to 0.81). Also shown is the ROC curve based on the number of affected family members. Its AUC is 0.61 (95% CI, 0.51 to 0.71), which is significantly lower than that of PancPRO (P < .01).
PancPRO calibration yielded an overall observed-to-expected ratio of 0.83 (95% CI, 0.52 to 1.20). With three risk strata for the average annual risk (bottom 50%, low risk; 50% to 75%, moderate risk; and top 25%, high risk), the calibration was 0.80 (2 observed/2.50 expected; 95% CI, 0.0 to 2.10) for the low-risk group, 1.59 (11 observed/6.91 expected; 95% CI, 0.70 to 2.66) for the moderate-risk group, and 0.60 (13 observed/21.8 expected; 95% CI, 0.30 to 0.94) for the high-risk group.
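The stratum-specific and overall ratios follow from the observed and expected counts reported above, where the expected count in a group is the sum of the individual predicted probabilities. The short check below recomputes the ratios from those reported counts; the dictionary layout and function name are illustrative, and the bootstrap confidence intervals are not reproduced.

```python
# Observed and expected incident cases by risk stratum, as reported.
strata = {
    "low (bottom 50%)": (2, 2.50),
    "moderate (50% to 75%)": (11, 6.91),
    "high (top 25%)": (13, 21.8),
}


def observed_to_expected(observed, expected):
    """Calibration ratio; 1.0 indicates perfect calibration."""
    return observed / expected


for name, (observed, expected) in strata.items():
    ratio = observed_to_expected(observed, expected)
    print(f"{name}: O/E = {ratio:.2f}")

total_observed = sum(o for o, _ in strata.values())
total_expected = sum(e for _, e in strata.values())
overall = observed_to_expected(total_observed, total_expected)
print(f"overall: {total_observed} observed / {total_expected:.2f} expected, "
      f"O/E = {overall:.2f}")
```

A ratio below 1 means that fewer cases were observed than the model predicted, as in the high-risk stratum, and a ratio above 1 means the opposite.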
PancPRO is the first risk prediction model for familial pancreatic cancer and provides mutation carrier probability and absolute risk for a specified age interval. Our validation indicated that PancPRO provided accurate risk assessment, discriminating between individuals with and without incident pancreatic cancers. Although we have previously reported risk ratios of incident pancreatic cancer as a function of the number of affected family members,30,33 PancPRO can further discriminate between individuals at higher and lower risk with the same number of affected family members. PancPRO does this by using full pedigree data (information on affected and unaffected family members) and age of family members combined with knowledge of the genetic transmission of pancreatic cancer. However, because of our validation timeframe, we present single-year risk estimates. These small annual risks can translate into a substantial risk of developing a fatal cancer over an individual’s lifetime.
A number of factors, including gene carrier status, environmental exposures, and chance, contribute to the development of disease and the age at which disease develops. Therefore, prognostic models and carrier models have less inherent variability than disease onset prediction models. Yet PancPRO performs comparably to prognostic34 and carrier models,25,35 as well as the most successful disease onset prediction models.36 The inclusion of complete family history data in PancPRO may explain the strength of this model. Although the NFPTR represents a select study population, our validation used prospective data; therefore, the factors that motivate families to enter the registry should not impact the results of our validation. Furthermore, NFPTR participants often learn of the registry while seeking information on hereditary pancreatic cancer; therefore, they are likely similar to families seeking clinical risk assessment.
Our validation confirms that this is a well-formed model likely to have a positive impact on clinical practice.37 However, our ability to assess calibration, especially in subgroups, is limited by the rarity of pancreatic cancer, such that only 26 incident patient cases were observed in this large high-risk cohort. It is worth noting that the caveats that apply to the use of risk prediction models in general38 are also relevant here. When a model is used for selecting a subset of individuals or families by setting a threshold on risk, there is a possibility of both false negatives and false positives. The appropriate thresholds for clinical interventions should consider explicitly their associated risks and benefits, which may change with an individual’s personal values and circumstances. Although risk models can efficiently and objectively summarize relevant information for patients, decision making should take place in concert with a health professional.39
PancPRO estimates do not include patient-specific SEs. Although technically possible,22 risk prediction models rarely provide patient-specific SEs because a common difficulty within the risk prediction field is properly communicating these additional uncertainties to patients and health professionals.
Although clinical genetic testing for pancreatic cancer is currently limited, genetic counseling can still be of value.40 PancPRO can form the basis for cancer risk counseling and can guide the design of screening trials for early pancreatic cancer detection in asymptomatic individuals.14,15 For example, Canto et al12,13 successfully used endoscopic ultrasound in the screening of asymptomatic individuals with a family history of pancreatic cancer. In fact, our model offered the greatest discrimination among individuals aged 65 years and younger at baseline, the group most likely to undergo and benefit from early detection screening. Because PancPRO provides a quantitative assessment of risk, it can contribute to defining the high-risk population that would benefit most from investigational screening techniques.
PancPRO estimates risk using data on pancreatic cancer only. The validation data set did not exclude families with known genetic syndromes, nor did the segregation analysis from which the model estimates were derived. Aside from a handful of the validation families, clinical genetic testing for BRCA2 or p16 mutations was not performed, and only two of the 961 families presented with a family history or symptoms indicative of familial atypical multiple mole melanoma or Peutz-Jeghers syndrome. Collection of a complete family history, such as the data needed for the model, may alert clinicians to the presence of other cancer syndromes requiring additional investigation.
A natural next step in expanding PancPRO is to separately model the effects of mutations in BRCA2 or other known genes. Currently, precise estimates of the penetrance of pancreatic cancer in BRCA2 mutation carriers are not available, partly because most BRCA2 studies select for individuals who developed breast and/or ovarian cancer at an early age. Conversely, the NFPTR may under-represent BRCA2 families in which there is an excess of early-onset breast cancer because these families may already be included as part of high-risk breast cancer studies. Because the penetrance estimate derived in the segregation model is an approximation based on the combined penetrance of all pancreatic cancer susceptibility loci, we anticipate improvements in model accuracy as we identify and incorporate the direct effects of these loci. Similarly, risk factors, such as cigarette exposure, history of pancreatitis, and diabetes mellitus, may be included in PancPRO once estimates of risk are available for both gene carriers and noncarriers (ie, does smoking cause a two- to three-fold increase in pancreatic cancer risk in gene carriers as it does in sporadic patients, or is the genetic effect so powerful that smoking has a much more limited role?). Currently, PancPRO models a major gene responsible for strong familial clustering to distinguish between likely carriers and noncarriers of this major gene as a first but very important step in risk assessment.
In summary, we developed PancPRO, the first risk prediction model for pancreatic cancer, and successfully validated it using prospective (incident) pancreatic cancer data from one of the largest registries of familial pancreatic cancer. Our study highlights how detailed family history improves risk prediction. We hope PancPRO will be a useful tool to identify high-risk individuals for ongoing and future early detection trials.
Supported in part by the Specialized Programs of Research Excellence in Gastrointestinal Cancer Grant No. CA62924 from the National Cancer Institute, the Michael Rolfe Foundation, and Grant No. R01CA105090-01A1.
Presented in part at the 14th Annual Meeting of the International Genetic Epidemiology Society, October 23–24, 2005, Park City, UT; and the 55th Annual Meeting of the American Society of Human Genetics, October 26–29, 2005, Salt Lake City, UT.
AUTHORS’ DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST The authors indicated no potential conflicts of interest.