One of the challenges to implementing sensitivity analysis for exposure misclassification is the process of specifying the classification proportions (eg, sensitivity and specificity). The specification of these assignments is guided by three sources of information: estimates from validation studies, expert judgment, and numerical constraints given the data. The purpose of this teaching paper is to describe the process of using validation data and expert judgment to adjust a breast cancer odds ratio for misclassification of family breast cancer history. The parameterization of various point estimates and prior distributions for sensitivity and specificity was guided by external validation data and expert judgment. We used both nonprobabilistic and probabilistic sensitivity analyses to investigate the dependence of the odds ratio estimate on the classification error. With our assumptions, the odds ratios adjusted for family breast cancer history misclassification spanned a wider range than that portrayed by the conventional frequentist confidence interval.
A standard quantitative analysis of epidemiologic data implicitly assumes the exposure (risk marker, risk factor) classification proportions (eg, sensitivity and specificity) equal 1.0 (ie, perfect classification). For many studies, however, this assumption may not be justified. Epidemiologists are strongly encouraged to incorporate sensitivity analyses into the analysis for these situations.1–9
One of the challenges to implementing sensitivity analysis for exposure misclassification of a binary exposure variable is the process of specifying the sensitivity and specificity values. The difficulty lies in determining which values should be used and explaining why these values were used. The specification of these values is guided by three sources of information: estimates from validation studies, expert judgment, and numerical constraints given the data.10
These three sources of information can be used in both nonprobabilistic and probabilistic (Monte-Carlo) sensitivity analysis. When adjusting for exposure misclassification, nonprobabilistic sensitivity analysis11 uses multiple fixed values for the sensitivity and specificity proportions. In contrast, in probabilistic sensitivity analysis,5,7,11–15 an investigator specifies probability distributions for the classification proportions. Prior probabilities are not specified for the effect measure of interest or the exposure prevalence; thus the analysis corresponds to using noninformative priors for these parameters in Bayesian bias analysis.7,11,16–18
The goal of this teaching paper is to illustrate how to specify values of classification parameters for nonprobabilistic11 and probabilistic sensitivity analyses5,7,11–15 using two of the three sources of information: validation data and expert judgment. We will specify single-point estimates and probability distributions for classification parameters. Then we will use these estimates and distributions to adjust one odds ratio (OR) estimate for possible exposure misclassification.
For many types of cancer, an important predictor of a person’s cancer risk is an established family history of that cancer. While accurate reporting by affected relatives might be expected, validation studies have shown that self-reported history of cancer in family members can be inaccurate.19–21
Epidemiologic studies that rely on these self-reports of cancer in family members without adjustment for classification errors can provide inaccurate results and underestimates of the true uncertainty. Adjusting relative-risk estimates for systematic error under such circumstances (eg, exposure misclassification) has been strongly encouraged.1–8,11,22,23
We selected breast cancer as our example because it is prevalent and because a family history of breast cancer is an established predictor of breast cancer risk. We chose one case-control study24 that provided a 2 × 2 table of first-degree relative’s (FDR’s) breast cancer history and breast cancer risk. There were 316 exposed breast cancer cases, 1567 unexposed cases, 179 exposed noncases, and 1449 unexposed noncases, where exposure was any FDR’s breast cancer history. From these data, the calculated crude OR estimate associating FDR breast cancer history with breast cancer occurrence for women from Los Angeles County, California, was 1.63 (95% confidence limits: 1.34, 1.99). The OR adjusted for confounders was 1.68.
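The crude estimate can be reproduced from the four cell counts given above. The following is a minimal sketch; it assumes Wald-type limits on the log-odds scale, which the text does not state explicitly but which match the reported values.

```python
import math

# Observed 2 x 2 table (counts as reported in the text)
a, b = 316, 1567   # cases: exposed, unexposed
c, d = 179, 1449   # noncases: exposed, unexposed

crude_or = (a * d) / (b * c)

# Assumption: Wald-type standard error of ln(OR) from the cell counts
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96
lower = math.exp(math.log(crude_or) - z * se_log_or)
upper = math.exp(math.log(crude_or) + z * se_log_or)

print(round(crude_or, 2), round(lower, 2), round(upper, 2))  # 1.63 1.34 1.99
```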
The observed exposure measure was self-reported breast cancer history in any FDR – a parent, sibling or child – by the index subject. “Gold standard” measurements used to verify the breast cancer status in FDRs were verbal confirmation by the FDR, medical records, pathology reports, cancer registries, and/or death certificates. While these are labeled “gold standard,” they are themselves likely measured with some error. We defined sensitivity as the proportion of FDRs reported as having breast cancer among those who had breast cancer according to the gold-standard measurement, and specificity as the proportion of FDRs not reported as having breast cancer among those without breast cancer according to the gold-standard measurement at the time of the index subject’s interview.
With these criteria, we sought out articles that validated self-reported data on any FDR. Our approach was guided by a 2004 article by Murff and colleagues19 that summarized the results from validation studies that determined the accuracy of self-reported history of cancer in family members for colon, prostate, breast, endometrial, and ovarian cancers. The first author met with a research librarian for search-strategy assistance since medical subject headings change over time. In April 2008, after discussions with a librarian, AMJ performed a database literature search to find English-language articles that provided sensitivity and specificity values for classification of self-reported family breast cancer history. The following medical subject headings from PubMed were used: “sensitivity and specificity”, “breast neoplasms”, “reproducibility of results”, and “medical history taking”. A text-word search for “validation study” as well as the above terms was also performed. Article titles, abstracts, and text were reviewed for inclusion. Reference lists of identified articles were searched to identify additional studies.
We also performed a cited-reference search of the Murff and colleagues19 article to learn whether it was referenced in recently published studies. Studies that determined accuracy (eg, positive-predictive value) of family breast cancer history,25–29 expanded first-degree relatives to include aunts,30 did not distinguish between FDRs and second-degree relatives,31 validated bilateral breast cancer,32 or were a sub-study of a larger included validation study33 were not used. Five publications20,21,34–36 met our criteria.
We assumed the data from the five validation studies (Table 1) to be appropriate for adjusting the OR for misclassification. Using these data, we explored various scenarios for possible classification error. The scenarios involved differential classification error because the validation data (Table 1) indicated the classification processes were differential.
We specified single-point values as scenarios for possible classification proportions. Since Kerber and Slattery34 reported classification proportions for both cases and noncases (Table 1), we considered this validation study as one scenario (scenario 2, Table 2). Then we combined the noncase sensitivity and specificity values from Chang and colleagues20 with the breast cancer case sensitivity and specificity values from Verkooijen and colleagues35 and Ziogas and Anton-Culver36 for scenarios 3 and 4 (Table 2), respectively. Similarly, we combined the noncase classification proportions from Soegaard and colleagues21 with case classification proportions from Verkooijen and colleagues35 and Ziogas and Anton-Culver36 for scenarios 5 and 6 (Table 2), respectively. We also defined scenarios for the lower (scenario 7, Table 2) and upper (scenario 8, Table 2) extreme values from all five studies. Finally, we investigated a scenario within the ranges of validation data (scenario 9, Table 2) and other combinations from the validation data (scenarios 10 and 11, Table 2).
To assign probability distributions to the classification parameters, we examined each column of sensitivity and specificity data in Table 1 for cases and noncases separately. Although we assumed the ranges of validation data to be adequate for our probability distributions, we were not 100% confident in the distributions’ shapes. As a result, we constructed different distribution scenarios to examine how the OR adjusted for misclassification depended on the assumed classification error.
To allow each value within the range an equal probability of occurring, we began by specifying continuous uniform distributions informed by the lower and upper values of the validation data values (scenario 13, Table 3). Since the case and noncase classification proportions each had three values, triangular distributions were then used for both cases and noncases (scenarios 14 and 15, Table 3). That is, we specified triangular distributions using the lower and upper validation data values as the minimum and maximum, respectively, and the middle value (scenario 14) and average value (scenario 15) as the modes for each distribution.
We changed the upper limit to 1.00 (perfect sensitivity and specificity) in scenarios 13–15 (Table 3), because we cannot rule out the possibility that all individuals with and without breast cancer may be correctly classified.
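The distribution choices described above can be sketched as follows. The three validation values are hypothetical placeholders (Table 1 is not reproduced here), and the variable names are illustrative only; the sketch simply shows a uniform distribution and two triangular distributions whose modes are the middle and the average validation value, with the upper limit raised to 1.00.

```python
import random

rng = random.Random(2008)  # fixed seed so the sketch is reproducible

# Hypothetical validation values for ONE classification parameter
# (placeholders, not the values from Table 1)
validation_values = sorted([0.82, 0.90, 0.95])
low = validation_values[0]
middle = validation_values[1]
average = sum(validation_values) / len(validation_values)
high = 1.00  # upper limit raised to perfect classification

n_draws = 10_000
samples = {
    "uniform": [rng.uniform(low, high) for _ in range(n_draws)],
    # random.triangular takes its arguments in the order (low, high, mode)
    "triangular, mode = middle": [rng.triangular(low, high, middle) for _ in range(n_draws)],
    "triangular, mode = average": [rng.triangular(low, high, average) for _ in range(n_draws)],
}

for name, draws in samples.items():
    print(f"{name}: mean = {sum(draws) / len(draws):.3f}")
```

Every draw stays inside the specified range, so the choice among these shapes changes only how probability is spread between the lower limit and 1.00.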
Adjustment for misclassification may result in negative cell frequencies when certain combinations of observed data and classification proportions are used. However, negative cell frequencies are impossible. Therefore, combinations of values yielding negative corrected cell frequencies are impossible and should be excluded from the sensitivity analysis. In our sensitivity analyses, no combinations of values assigned to sensitivity and specificity resulted in adjusted-cell frequencies that were negative. Therefore, no values were excluded within the explored ranges of values.
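The numerical constraint can be made concrete. The sketch below assumes the standard back-calculation A = (a − (1 − Sp)·N) / (Se + Sp − 1) for the adjusted number exposed, with Se + Sp > 1; this is our reading of the usual method, not a reproduction of Table 4. Under that assumption, the adjusted cells are nonnegative only when sensitivity is at least the observed exposure proportion and specificity is at least one minus that proportion. The function names are illustrative.

```python
def admissible_bounds(exposed, total):
    """Smallest Se and Sp that keep both adjusted cells nonnegative."""
    p_obs = exposed / total  # observed (possibly misclassified) exposure proportion
    return {"min_sensitivity": p_obs, "min_specificity": 1 - p_obs}


def is_admissible(se, sp, exposed, total):
    """True when the adjusted exposed and unexposed counts are both nonnegative."""
    if se + sp <= 1:
        return False  # classification no better than chance; formula breaks down
    adjusted_exposed = (exposed - (1 - sp) * total) / (se + sp - 1)
    return 0 <= adjusted_exposed <= total


cases = admissible_bounds(316, 316 + 1567)
noncases = admissible_bounds(179, 179 + 1449)
print({k: round(v, 3) for k, v in cases.items()})
print({k: round(v, 3) for k, v in noncases.items()})
```

For these data the bounds are loose: specificity would have to fall below roughly 0.83 among cases or 0.89 among noncases before a negative cell appeared.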
For each of the 11 scenarios (Table 2), we calculated an OR adjusted for family breast cancer history misclassification (OR adjusted) using the exposure misclassification adjustment methods of Greenland and Lash.11 Briefly, we used the observed cell frequencies of data along with sensitivity and specificity values for cases and noncases (Table 1) to calculate a 2 × 2 table of cell frequencies adjusted for exposure misclassification and an odds ratio adjusted for exposure misclassification (Table 4).
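Because Table 4 is not reproduced here, the adjustment can be written out in its commonly used form. The notation is ours and is an assumption: a_1 and N_1 are the observed exposed count and total for cases, a_0 and N_0 the same for noncases, and Se_j and Sp_j the sensitivity and specificity in group j. The formula follows from solving a_j = Se_j A_j + (1 − Sp_j)(N_j − A_j) for the adjusted exposed count A_j.

```latex
\begin{aligned}
A_1 &= \frac{a_1 - (1 - Sp_1)\,N_1}{Se_1 + Sp_1 - 1}, & B_1 &= N_1 - A_1,\\
A_0 &= \frac{a_0 - (1 - Sp_0)\,N_0}{Se_0 + Sp_0 - 1}, & B_0 &= N_0 - A_0,\\
\mathrm{OR}_{\text{adjusted}} &= \frac{A_1\,B_0}{B_1\,A_0}.
\end{aligned}
```

For the study data, a_1 = 316, N_1 = 1883, a_0 = 179, and N_0 = 1628. Setting all four classification proportions to 1.0 returns the observed cells and hence the crude OR of 1.63.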
We employed probabilistic sensitivity analysis based on published methods.5,11,37 In short, we used equations in Table 4 to adjust the observed cell frequencies for exposure misclassification and substituted the probability distributions from Table 3 for the sensitivity and specificity values. We also included a correlation11,37 value of 0.80 between the sensitivities for cases and noncases and between the specificities for cases and noncases to prevent extreme differentiality on any particular simulation trial. As a last step, we incorporated random error to obtain an OR estimate adjusted for exposure misclassification and random error. Adjustment for random error requires specification of a random error distribution for the data-generating process.38 We used the following formula, exp[ln(OR adjusted) − z × SE], which assumes that random error is modeled by a standard normal deviate (z) and the standard error (SE) of the log OR calculated from the original (misclassified) cell frequencies.11,22,23
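The simulation steps above can be sketched with the Python standard library. Several pieces are assumptions rather than the authors' implementation: the triangular priors are hypothetical placeholders for Table 3, the adjustment uses the standard back-calculation formula, and the 0.80 correlation is induced with a Gaussian copula, which may differ from the correlation mechanism in the software the authors used.

```python
import math
import random
from statistics import NormalDist, median

rng = random.Random(7)
PHI = NormalDist().cdf

a, b, c, d = 316, 1567, 179, 1449             # observed cells from the text
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of ln(OR), misclassified data

# Hypothetical triangular priors as (low, mode, high); NOT the Table 3 values
SENS_CASE, SENS_NONCASE = (0.85, 0.92, 1.00), (0.80, 0.90, 1.00)
SPEC_CASE, SPEC_NONCASE = (0.93, 0.97, 1.00), (0.94, 0.98, 1.00)

def tri_inv(u, low, mode, high):
    """Inverse CDF of a triangular distribution, for u in (0, 1)."""
    if u < (mode - low) / (high - low):
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1 - u) * (high - low) * (high - mode))

def correlated_uniforms(rho):
    """Two uniform draws linked by a Gaussian copula (an assumed mechanism)."""
    z1 = rng.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    return PHI(z1), PHI(z2)

def adjusted_or(se1, sp1, se0, sp0):
    """OR from back-calculated cells; None if any adjusted cell is not positive."""
    n1, n0 = a + b, c + d
    A = (a - (1 - sp1) * n1) / (se1 + sp1 - 1)
    C = (c - (1 - sp0) * n0) / (se0 + sp0 - 1)
    B, D = n1 - A, n0 - C
    if min(A, B, C, D) <= 0:
        return None
    return (A * D) / (B * C)

def one_trial(rho=0.80):
    u_se1, u_se0 = correlated_uniforms(rho)   # sensitivities: cases, noncases
    u_sp1, u_sp0 = correlated_uniforms(rho)   # specificities: cases, noncases
    or_adj = adjusted_or(tri_inv(u_se1, *SENS_CASE), tri_inv(u_sp1, *SPEC_CASE),
                         tri_inv(u_se0, *SENS_NONCASE), tri_inv(u_sp0, *SPEC_NONCASE))
    if or_adj is None:
        return None
    # Add random error: exp[ln(OR adjusted) - z * SE], z a standard normal deviate
    or_total = math.exp(math.log(or_adj) - rng.gauss(0, 1) * se_log_or)
    return or_adj, or_total

trials = [t for t in (one_trial() for _ in range(5_000)) if t is not None]
print("median, misclassification only:", round(median(t[0] for t in trials), 2))
print("median, plus random error:", round(median(t[1] for t in trials), 2))
```

Drawing each pair of sensitivities (and each pair of specificities) from a shared copula keeps the case and noncase values close on any single trial, which is the stated purpose of the 0.80 correlation.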
For each scenario, we graphed a frequency (uncertainty) distribution of the odds ratio adjusted for exposure misclassification only and for exposure misclassification and random error. These frequency distributions are dependent on our assumptions for the classification proportions and random error parameters. We also calculated 95% uncertainty limits by taking the lower 2.5 and upper 97.5 percentiles of the frequency distribution. These percentiles provide the lower and upper limits for the odds ratio adjusted for our beliefs about the relative proportions of the exposure-classification values (ie, uncertainty-analysis-parameter values).8 Crystal Ball (version 7.3; Oracle, Redwood Shores, CA, USA) software was used to run 50,000 simulation trials for the four simulation experiments.
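Summarizing a set of simulated ORs into uncertainty limits takes only a few lines. The sketch below uses linearly interpolated percentiles, which is one common convention and not necessarily the one used by the authors' software; the input list is a stand-in, not simulation output, and the function names are illustrative.

```python
import math
from statistics import geometric_mean, median

def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in 0-100) of an already sorted list."""
    k = (len(sorted_values) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def summarize(adjusted_ors, crude_or=1.63):
    """Central tendency, 95% uncertainty limits, and share of trials above the crude OR."""
    xs = sorted(adjusted_ors)
    return {
        "geometric_mean": geometric_mean(xs),
        "median": median(xs),
        "lower_95": percentile(xs, 2.5),
        "upper_95": percentile(xs, 97.5),
        "share_above_crude": sum(x > crude_or for x in xs) / len(xs),
    }

# Stand-in values only; real input would be the simulated adjusted ORs
demo = [1.2 + 0.01 * i for i in range(201)]
summary = summarize(demo)
print({k: round(v, 3) for k, v in summary.items()})
```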
Table 2 presents the results of the nonprobabilistic sensitivity analyses. The OR adjusted for misclassification resulted in a wide range of values, assuming the OR adjusted for misclassification is the true value, our assumptions are correct, and no other systematic errors exist. Some combinations of classification proportions (scenarios 2, 7, and 11, Table 2) gave ORs adjusted for misclassification that were much greater than the crude OR of 1.63, other combinations resulted in ORs between 1 and the crude OR (scenarios 3–6, 8, and 9, Table 2), and one combination produced a protective effect (scenario 10, Table 2). These results demonstrate that differential classification error can cause error toward (scenarios 2, 7, and 11, Table 2), away from (scenarios 3–6, 8, and 9, Table 2), or past the null value of 1 (scenario 10, Table 2).39 Approximately nondifferential misclassification (scenario 9, Table 2) resulted in an OR adjusted for exposure misclassification that was less than the crude value.
The results of the probabilistic sensitivity analyses are found in Table 5 and Figures 1 and 2. The geometric means and medians are greater than the crude OR value of 1.63 for scenarios where classification was imperfect, and over half of the simulation trials resulted in ORs adjusted for exposure misclassification greater than the crude OR. The 95% uncertainty limits are wider than the conventional limits (1.34, 1.99). Compared to the conventional analysis (scenario 12, analysis b, Table 5), the ratio of the upper 95% uncertainty limit to the lower 95% uncertainty limit was largest for the uniform scenario (scenario 13, analysis b, Table 5). Minor changes in the modal values shifted the distribution of ORs adjusted for exposure misclassification further away from the crude OR for scenario 15 compared with scenario 14 because scenario 15 is slightly more differential than scenario 14.
We performed partial sensitivity analyses to adjust a breast cancer OR estimate for misclassification of family breast cancer history. In general, three sources10 of information are used to specify scenarios for sensitivity analysis: validation data (we found existing data in the literature20,21,34–36); expert judgment (we modified ranges of values from the validation studies based on our expert judgment of sensitivity and specificity for family history of breast cancer); and numerical constraints given the data (we were prepared to exclude values assigned to classification proportions that yielded negative cell frequencies). For all sensitivity analyses we further assumed that the OR estimate adjusted for exposure misclassification was not affected by other systematic errors.
We used both nonprobabilistic and probabilistic sensitivity analyses because they are complementary yet imperfect techniques. Since no likelihood (probability) is associated explicitly with each scenario in the nonprobabilistic sensitivity analysis, the results should not necessarily be viewed as having equal probability. The nonprobabilistic sensitivity analyses resulted in a wide range of ORs adjusted for exposure misclassification: from less than 1 to almost six times the crude OR value. Similar results were found using probabilistic sensitivity analyses.
As guided by the literature, classification errors were differential for all scenarios. It is well known that the effect of differential misclassification on study results is unpredictable. Both our nonprobabilistic and probabilistic sensitivity analysis results show the wide range of values that are possible. Importantly, approximately nondifferential misclassification resulted in an OR adjusted for misclassification that was less than the crude (Table 2, scenario 9). Thus, the sensitivity analysis results demonstrate the importance of quantitatively evaluating the effect of differential misclassification. Nevertheless, nondifferential misclassification only biases the expected value of an OR estimate toward the null value under very specific conditions.39
When available, internal validation data from the study of interest are the recommended data to inform the values used for sensitivity analysis, so long as the internal validation study itself was not biased by, for example, selection of subjects into the validation substudy. When such unbiased validation data are available, we specify sampling-error distributions for the classification probabilities observed in the validation substudy. Since we did not have internal validation data for the sensitivities and specificities from the study of interest,24 we could not use this approach.
We were able to find external validation data to inform the values assigned to classification proportions in our sensitivity analyses. The validation data, however, were not generated from the same population as that from the crude OR data. Therefore, these external validation data may not be generalizable across different populations. Further, the classification proportions were not calculated by first-degree relative status (eg, mother, sister, and daughter), which may differ by generation. Nonetheless, we know of no existing methodology that incorporates selection forces into the classification proportions for sensitivity analyses.
When only external validation data for the classification proportion estimates are available, it is difficult to know which of these estimates to use. Therefore, we varied our probability distributions by specifying several different distributions. In addition, it is not recommended to pool the results from multiple-validation studies or to use the variance of the pooled result to parameterize a distribution. Instead, it is usually better to use the range of classification proportion values to parameterize a probability distribution (eg, triangular) or to use the range of values to conduct a multidimensional bias analysis. Further, we did not specify a probability distribution for each classification probability reported in external validation studies (a complete sensitivity analysis that takes into account the uncertainty in the classification proportions is the best route for funded analyses). Rather, we used the reported classification proportions to construct one composite probability distribution for each scenario.
The specification of the shape and range of the probability distribution is often difficult in light of internal or external validation data. In this research, we specified one uniform and two triangular distributions out of an infinite number of possibilities. Other probability distributions that can be used include the trapezoidal, logit-normal, logit-logistic, and beta.11,14
When validation data are unavailable or inapplicable, investigators must assign values to the classification parameters based on expert judgment and numerical constraints given the data. This option, while perhaps suboptimal, has two advantages over conventional analyses that ignore quantitative estimates of uncertainty from classification errors. First, it emphasizes the absence of reliable validation data and identifies that absence as a research gap that should be a priority to fill. Second, conventional analyses implicitly treat the classification as perfect, and substituting expert judgment about actual classification errors for this often untenable assumption at least allows a quantitative assessment of the uncertainty arising from these errors.
The authors thank Dr Sander Greenland and the anonymous reviewers for helpful comments on an earlier draft. The authors report no conflicts of interest in this work. This study was supported in part by the Children’s Cancer Research Fund, Minneapolis, MN, USA (to AMJ).