Bayesian data mining methods have been used to evaluate drug safety signals from adverse event reporting systems and allow for evaluation of multiple endpoints that are not pre-specified. Their adaptation for use with longitudinal data such as administrative claims has not been previously evaluated or validated.
In this pilot study, we evaluated the feasibility of adapting data mining methods using the empirical Bayes Multi-Item Gamma Poisson Shrinker (MGPS) algorithm to longitudinal administrative claims data. The Medicare Current Beneficiary Survey (MCBS) was used to identify a cohort of Medicare enrollees who were exposed to cyclooxygenase-2 selective (coxib) or non-selective non-steroidal anti-inflammatory drugs (NS-NSAIDs) from 1999-2003. The empirical Bayes MGPS algorithm was used to simultaneously evaluate 259 outcomes associated with current use of coxibs vs. NS-NSAIDs while adjusting for key covariates and multiple comparisons. For comparison, a parallel analysis used traditional epidemiologic methods to evaluate the relationship between coxib vs. NS-NSAID use and acute myocardial infarction (AMI), with the goal of establishing the concurrent validity of the data mining approach.
Among 9431 Medicare beneficiaries using NSAIDs and considering all 259 possible outcomes, empirical Bayes MGPS identified an association between current celecoxib use and AMI (Empirical Bayes Geometric Mean ratio 1.91) but not other outcomes. Rofecoxib use was associated with acute cerebrovascular events (EBGM ratio 1.85) and several other diagnoses that likely represented indications for the drug. Results from the analyses using traditional epidemiologic methods were similar and indicated that the data mining results were valid.
Bayesian data mining methods appear useful to evaluate drug safety using administrative data. Further work will be needed to extend these findings to different types of drug exposures and to other claims databases.
The assessment of pharmaceutical safety after product licensure is of great interest to clinicians, patients, pharmaceutical companies, regulatory agencies, and policymakers. Recent and high-profile examples of drug withdrawals after recognition of safety problems have highlighted existing deficiencies in the current mechanisms by which medication safety is evaluated. The phase 3 studies required for drug approval are rarely powered to detect uncommon adverse events and lack generalizability with respect to the majority of people who eventually receive these medications. Unfortunately, relatively few tools are available to provide rapid detection of previously unrecognized or underappreciated safety signals.
In the U.S., the Adverse Event Reporting System (AERS) is an important mechanism by which hitherto unknown safety concerns are recognized. However, analyses of the voluntary reports submitted through this mechanism have a number of limitations. These include under-reporting, distortion due to reporting trends, biases such as the Weber effect (1), and lack of information on the total number of exposed persons, all of which preclude calculation of valid incidence rates. Despite these limitations, the AERS system is a useful resource that has added substantially to the evaluation of drug safety. There are various mechanisms by which AERS data can be analyzed, including qualitative review and more quantitative methods such as proportional reporting ratios (PRRs) and empirical Bayes methods. These quantitative disproportionality methods compare n, the number of observed reports of a drug-adverse event combination, with e, the number expected under an assumption of independence between drugs and events in the database.
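As a rough illustration of the disproportionality calculation just described, the following Python sketch computes the expected count e under independence, the ratio n/e, and a PRR from report counts. The function names and all counts are invented for illustration and are not taken from AERS.

```python
def expected_count(n_drug: int, n_event: int, n_total: int) -> float:
    """Expected number of drug-event reports if drug and event were independent."""
    return n_drug * n_event / n_total


def reporting_ratio(n_obs: int, n_drug: int, n_event: int, n_total: int) -> float:
    """Observed-to-expected ratio n/e."""
    return n_obs / expected_count(n_drug, n_event, n_total)


def prr(n_obs: int, n_drug: int, n_event: int, n_total: int) -> float:
    """Proportional reporting ratio: the event proportion among reports that
    mention the drug, divided by the event proportion among all other reports."""
    with_drug = n_obs / n_drug
    without_drug = (n_event - n_obs) / (n_total - n_drug)
    return with_drug / without_drug


# Hypothetical database: 10,000 reports, 200 mention the drug,
# 500 mention the event, and 20 mention both.
n, n_drug, n_event, n_total = 20, 200, 500, 10_000
e = expected_count(n_drug, n_event, n_total)
rr = reporting_ratio(n, n_drug, n_event, n_total)
p = prr(n, n_drug, n_event, n_total)
print(f"n={n}, e={e:.1f}, n/e={rr:.2f}, PRR={p:.2f}")
```

Neither ratio in this sketch carries any information about its own variability, which is the gap the shrinkage methods are meant to address.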
Analyses that use an empirical Bayes technique, the Multi-Item Gamma Poisson Shrinker (MGPS) algorithm, have advantages over alternative methods such as PRR. First, PRR has the disadvantage of being hard to interpret without simultaneously considering the significance of an associated chi-squared statistic; there is no chi-squared test used in conjunction with MGPS. Additionally, MGPS ‘shrinks’ the values of the Bayesian observed-to-expected ratios toward the null hypothesis value of 1 by an amount that depends on their statistical variability. MGPS produces an empirical Bayes geometric mean (EBGM) estimate with a surrounding confidence interval ([EB05, EB95]), which is designed to be resistant to the post-hoc selection fallacy caused by looking at many highly variable statistics. This results in the convenient property of being able to sort many different drug-event combinations in a single dimension for rankings and comparisons. A single ratio incorporates information both about the value of n/e and its variability. Finally, extensions of the method can also adjust for multiple covariates.
Another important statistical issue is the problem of multiple comparisons. When considering the many computed n/e values in a large database, it is natural to focus on the largest ratios. This is an example of post hoc selection, which is likely to select ratios biased toward large values, based on counts that happen to be large because of sampling variation. Bayesian shrinkage methods are designed to correct for this bias by shrinking estimates toward a prior distribution. This prior distribution is estimated from the ensemble of all (n, e) pairs. As an example of this issue, consider a disproportionality analysis of one drug-event combination having (n=3, e=0.03, n/e=100) with that of another combination having (n=50, e=5, n/e=10). Both ratios are likely to be statistically larger than their “true values”; the computation of how much to shrink their estimates depends on fitting a Bayesian model to the entire set of (n, e) pairs in the database. Depending on the results of the fit, it might be that the first estimate shrinks from 100 down to 5, whereas the more reliable second estimate only shrinks from 10 to 9 (2). Shrinkage will be the same for all pairs with the same n and e. Finally, MGPS can evaluate all outcomes simultaneously without requiring any to be specified in advance. Semi-automated software programs have been developed that provide rapid and visual implementation of this approach and provide an adjusted summary relative risk estimate.
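The shrinkage behavior in this example can be sketched with a deliberately simplified model. The code below is not the MGPS algorithm: MGPS estimates its prior (a mixture of gamma distributions) from all (n, e) pairs in the database, whereas this sketch assumes a single, fixed Gamma(0.5, 0.5) prior with mean 1, chosen only for illustration. Under that assumption, a Poisson count n with expectation λe gives a Gamma(α + n, β + e) posterior for λ, and the geometric mean of that posterior is exp(ψ(α + n) − ln(β + e)), where ψ is the digamma function.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x, so shift x upward first
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def ebgm_single_gamma(n: int, e: float, alpha: float, beta: float) -> float:
    """Geometric mean of the posterior Gamma(shape=alpha + n, rate=beta + e)."""
    return math.exp(digamma(alpha + n) - math.log(beta + e))


# ASSUMPTION: fixed illustrative prior, not one estimated from data as in MGPS.
ALPHA, BETA = 0.5, 0.5

for n, e in [(3, 0.03), (50, 5.0)]:
    shrunk = ebgm_single_gamma(n, e, ALPHA, BETA)
    print(f"n={n}, e={e}: raw n/e={n / e:.0f}, shrunken estimate={shrunk:.2f}")
```

With this assumed prior, the estimate based on 3 events is pulled much further toward 1 than the estimate based on 50 events, which is the qualitative pattern the text describes. The actual amount of shrinkage in MGPS depends on the prior fitted to the whole database.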
To date, use of Bayesian data mining methods has largely been restricted to evaluation of adverse event reports and clinical trial results. This type of data can be thought of as ‘packet’ data that does not place much importance on the element of time. An extension of these methods should theoretically be able to incorporate time-dependent exposures and varying durations of times at risk across patients, but this possibility has largely been unexplored, and the implementation of this approach has not been well-characterized.
We therefore conducted a pilot study with the primary goal of evaluating the feasibility and validity of adapting Bayesian data mining methods to analysis of longitudinal administrative claims data. As a framework within which to do this, we studied outcomes associated with cyclooxygenase-2 selective (coxib) non-steroidal anti-inflammatory drugs (NSAIDs) compared to use of non-selective non-steroidal anti-inflammatory drugs (NS-NSAID). For comparison with the results of the data mining analyses, we then conducted a parallel study using traditional epidemiologic methods to evaluate the consistency of results between the two approaches and to assess the likely validity of adapting Bayesian data mining algorithms for use with longitudinal data.
We obtained linked survey information, medical claims, and medication use data from the Medicare Current Beneficiary Survey (MCBS) for the years 1999-2003. We identified current NSAID exposure using the MCBS medication data and all medical events using the linked Medicare claims. After mapping the administrative claims data to a classification system that allows simultaneous consideration of all outcome events, we applied Bayesian data mining methods based on the empirical Bayes MGPS algorithm developed by DuMouchel (3-7) to evaluate the relationship between current exposure to celecoxib or rofecoxib compared to NS-NSAIDs and the occurrence of all possible outcome events. The MGPS algorithm can adjust for important factors, such as age, gender, and comorbidity score, that might confound these relationships and also can protect against inflated estimates of statistical significance resulting from multiple comparisons. As a separate but parallel analysis, we used traditional epidemiologic methods to evaluate the validity of the MGPS approach.
After institutional review board approval from the University of Alabama at Birmingham, we obtained the de-identified Cost and Use Files from the MCBS from 1999-2003. The MCBS is a rotating panel survey of institutionalized and community-dwelling Medicare beneficiaries that collects detailed information on demographics, insurance coverage, comorbidities, medical events, costs, and medication use. Most individuals remain in the panel three years. Data are collected via in-person interviews that occur in the participant’s home every four months. Medicare claims for each beneficiary are linked and provide a unique amalgam between survey data and insurance claims. For community dwelling beneficiaries, and at every interview, the MCBS obtains information on the names of the medications from medication bottles that the beneficiary is asked to produce and on the number of refills of each medication since the last MCBS interview. For institutionalized beneficiaries, medication information was obtained by once monthly review of medical records.
We identified all NSAID usage for each person during the study period. Persons who never used NSAIDs were excluded from analysis given a previous observation that non-NSAID users have a higher risk for mortality than NSAID users, likely due to channeling of sicker patients away from NSAIDs (8). NSAIDs were grouped into three categories: celecoxib, rofecoxib, and NS-NSAIDs. Because valdecoxib usage was minimal during the study period, we did not compute separate risk estimates for valdecoxib. Current NSAID exposure was defined as use of an NSAID in the same month as, or the month before, each outcome event. The public use MCBS data files were supplemented with more specific medication data obtained directly from CMS to allow for greater precision in defining current NSAID exposure.
We identified events of interest using the linked Medicare claims data, which contain information on diagnoses coded using the International Classification of Diseases, 9th Revision (ICD-9) system. Because no particular outcome diagnosis was pre-specified as being of particular interest for the data mining analysis, we needed a mechanism by which to group all possible ICD-9 codes into a manageable number of unique categories. Use of 5-digit ICD-9 codes to represent different events is statistically inefficient given that many clinically similar events have different 5-digit ICD-9 codes. Combining 5-digit codes into higher-level categories, such as 4-digit or 3-digit groups, does not fully address the problem because these higher-level categories sometimes group dissimilar types of events.
For that reason, we used the Clinical Classifications Software (CCS) developed by the Agency for Healthcare Research and Quality (AHRQ) to classify outcome events into 259 unique groups. The ICD-9 codes from each claim were mapped to the corresponding CCS groups, were identified as being the primary or a secondary diagnosis, and were classified as coming from an inpatient or outpatient setting. For the purposes of this analysis, we considered only those events that were primary diagnoses recorded in claims from an inpatient setting.
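The mapping and filtering step can be sketched as follows. This is a toy version under stated assumptions: the `Claim` fields, the month encoding, and the prefix-based lookup are hypothetical, and the lookup holds a single entry (ICD-9 410.x, acute myocardial infarction, assigned to CCS group 100). The real CCS tables map complete ICD-9 codes to all 259 groups.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# ASSUMPTION: toy prefix lookup with one entry; not the real CCS mapping file.
ICD9_PREFIX_TO_CCS = {"410": 100}


@dataclass(frozen=True)
class Claim:
    patient_id: str
    month: int         # months since start of follow-up (hypothetical encoding)
    icd9: str          # diagnosis code without the decimal point
    is_primary: bool   # True for the primary diagnosis on the claim
    setting: str       # "inpatient" or "outpatient"


def ccs_group(icd9: str) -> Optional[int]:
    """Return the CCS group for an ICD-9 code, or None if it is not in the toy table."""
    for prefix, group in ICD9_PREFIX_TO_CCS.items():
        if icd9.startswith(prefix):
            return group
    return None


def outcome_events(claims: Iterable[Claim]) -> List[Tuple[str, int, int]]:
    """Keep only primary inpatient diagnoses, as (patient_id, month, ccs_group)."""
    events = set()
    for c in claims:
        group = ccs_group(c.icd9)
        if group is not None and c.is_primary and c.setting == "inpatient":
            events.add((c.patient_id, c.month, group))
    return sorted(events)


claims = [
    Claim("p1", 3, "41071", True, "inpatient"),    # kept
    Claim("p1", 3, "41071", False, "inpatient"),   # secondary diagnosis, dropped
    Claim("p2", 5, "41001", True, "outpatient"),   # outpatient, dropped
]
print(outcome_events(claims))
```

Collecting events in a set means that several claims for the same patient, month, and CCS group count once, which matches the idea of unique group codes per month.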
We used a simple technique to convert the longitudinal record of each patient to a set of pseudo adverse event reports. Each month of observation for each patient was viewed as a separate report, the report consisting of a list of medical events (unique CCS group codes meeting the inpatient and primary diagnosis requirements) and a list of drugs (recorded as exposed in either the current or previous month). Unlike the spontaneous reporting scenario, months for which no medical event occurred (or for which no drug exposure was recorded for the corresponding two month window) for a patient still generated reports. Thus, the patients in this study generated a total of 243,916 monthly reports having a total of 7,037 inpatient primary diagnosis CCS group code events. The association between celecoxib and rofecoxib use compared to current NS-NSAID use was evaluated using the empirical Bayes geometric mean (EBGM) ratios produced by the MGPS method. The EBGM ratios are the estimated number compared to the expected number of adverse events. The expected value e is computed as the number of reports having the event in question multiplied by the overall proportion of reports having the exposure in question. If covariates are used, this computation is repeated separately for each covariate stratum, and the results summed across strata. EBGMs are smoothed (shrunk toward 1 and adjusted for covariates) values of the ratios n/e mentioned above. In the context of spontaneous report databases, these are called relative reporting ratios and are used for exploratory signaling of potential causal relationships. In order to evaluate the statistical significance of these EBGM ratios, we calculated the 5th percentile of the posterior distribution of the EBGM ratios for celecoxib and rofecoxib (EB05) and the 95th percentile of the posterior distribution of the EBGM ratios for NS-NSAIDs (EB95). 
If the lower and upper bounds of these respective confidence intervals are non-overlapping, the ratio of the 5th percentile for celecoxib or rofecoxib to the 95th percentile for NS-NSAIDs will exceed 1.00. We refer to this as the EB05/95 ratio and used it as an informal criterion: two reporting ratios were judged to be different if their respective confidence intervals did not overlap, that is, if the EB05/95 ratio exceeded 1.00. We adjusted for covariates of interest (i.e. age, gender, calendar year, and Charlson comorbidity score) by using them to compute the ‘e’ in the n/e ratio. The database of pseudo-reports was then analyzed as if it were a spontaneous database of reports such as AERS, using the WebVDME software (Lincoln Technologies, Waltham, MA). In order to preserve the semi-automated nature of the procedure and its software implementation, observation time was not censored after the occurrence of any particular event. This allowed us to consider multiple outcomes simultaneously rather than having to censor observation time based upon the occurrence of pre-specified events.
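The pseudo-report construction and the stratified expected count can be illustrated with a small sketch. The reports, stratum labels, and function names below are invented, and the code stops at the raw n and e; it does not reproduce the MGPS shrinkage step or the WebVDME implementation. Each tuple stands for one patient-month, including months with no event or no drug exposure.

```python
from collections import defaultdict

# One pseudo-report per patient-month: (covariate stratum, drugs, CCS events).
# ASSUMPTION: all values are made up for illustration.
reports = [
    ("age65-74", {"celecoxib"}, {100}),
    ("age65-74", {"celecoxib"}, set()),
    ("age65-74", {"ns_nsaid"}, set()),
    ("age65-74", set(), set()),
    ("age75+", {"celecoxib"}, {100}),
    ("age75+", {"ns_nsaid"}, {100}),
    ("age75+", {"ns_nsaid"}, set()),
    ("age75+", {"ns_nsaid"}, set()),
]


def observed_and_expected(reports, drug, event):
    """Observed count n and covariate-stratified expected count e."""
    totals = defaultdict(lambda: [0, 0, 0])  # stratum -> [reports, with drug, with event]
    n = 0
    for stratum, drugs, events in reports:
        counts = totals[stratum]
        counts[0] += 1
        counts[1] += int(drug in drugs)
        counts[2] += int(event in events)
        n += int(drug in drugs and event in events)
    # Per stratum: (reports with the event) x (proportion of reports with the drug),
    # then summed across strata.
    e = sum(with_event * with_drug / total
            for total, with_drug, with_event in totals.values())
    return n, e


def eb05_95_ratio(eb05_coxib: float, eb95_ns_nsaid: float) -> float:
    """Informal criterion: a value above 1 means the two intervals do not overlap."""
    return eb05_coxib / eb95_ns_nsaid


n, e = observed_and_expected(reports, "celecoxib", 100)
print(f"celecoxib / CCS 100: n={n}, e={e:.2f}, n/e={n / e:.2f}")
```

In the study, n and e for each drug-event pair feed the shrinkage model, and the EB05/95 ratio is then formed from the resulting posterior percentiles rather than from the raw counts.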
We next conducted a parallel analysis to establish the concurrent validity of the MGPS approach compared to traditional epidemiologic and statistical methods. Cox proportional hazards models were used to estimate the hazard ratios of AMI, comparing coxibs and NS-NSAIDs and adjusting for potential confounding variables at baseline. Both coxib and NS-NSAID exposures were time-dependent variables. We evaluated confounding effects of age, gender, race, education, income, body mass index, tobacco use, and comorbidities. Comorbidities were summarized using the Charlson comorbidity index (9). Confounding effects were adjusted by including the potential confounders in the final model if a >20% change in the estimated regression coefficients of coxib was observed. Although all potential confounders were evaluated based on their ability to modify the exposure-outcome relationship, only the Charlson comorbidity index had this property, so only age and the Charlson comorbidity index were included in the adjusted models. We performed this analysis using the CCS outcome definition, and we repeated it using a validated claims-based definition for AMI shown to have excellent positive predictive value in identifying confirmed events compared to a gold standard of medical record review (10). Because we pre-specified the outcome, we censored observation time at the first occurrence of AMI in the analysis using traditional methods.
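The change-in-estimate rule for confounder selection can be written down in a few lines. The sketch below assumes the coxib regression coefficients have already been estimated elsewhere (for example, from Cox models fitted with and without each candidate covariate). The coefficient values and function names are hypothetical and are not results from this study.

```python
def relative_change(base_coef: float, adjusted_coef: float) -> float:
    """Relative change in the exposure coefficient after adding a covariate."""
    return abs(adjusted_coef - base_coef) / abs(base_coef)


def select_confounders(base_coef: float, adjusted_coefs: dict, threshold: float = 0.20) -> list:
    """Return covariates whose inclusion changes the coefficient by more than the threshold."""
    return [name for name, coef in adjusted_coefs.items()
            if relative_change(base_coef, coef) > threshold]


# ASSUMPTION: invented log hazard ratios for coxib exposure.
base = 0.60                      # model without the candidate covariate
with_covariate = {
    "charlson_index": 0.45,      # 25% change: retained
    "gender": 0.58,              # about 3% change: dropped
    "tobacco_use": 0.63,         # 5% change: dropped
}
print(select_confounders(base, with_covariate))
```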
Characteristics of the Medicare beneficiaries that used NSAIDs at any time from 1999-2003 are presented in Table 1. The mean age of the cohort was 72 years, and a majority was Caucasian. Approximately one-quarter of the cohort described themselves as current smokers. The prevalence of cardiovascular risk factors including coronary artery disease, hypertension, diabetes, and hyperlipidemia was ten to twenty-four percent. One-third of the cohort used celecoxib during the study period, and slightly more than half used a NS-NSAID.
The results from the MGPS approach for celecoxib are shown in Table 2. As none of the 259 outcomes of interest was prespecified, for the sake of brevity and transparency we have displayed all those outcomes that approach or exceed conventional levels of statistical significance using an EB05/95 ratio threshold of > 0.85. As shown, the only outcome ‘significantly’ associated with current use of celecoxib was acute myocardial infarction (AMI) (CCS group 100). Other diagnoses for which ratios were > 0.85 included osteoarthritis and rehabilitation-related, followed by acute cerebrovascular disease and coronary atherosclerosis and other heart diseases.
For rofecoxib, rehabilitation, device-related complications, and osteoarthritis were the most significant events associated with current rofecoxib use (Table 3). Also significant was the result for events related to acute cerebrovascular disease. Additional events that approached statistical significance included non-specific chest pain and cardiac dysrhythmias. Based on clinical interest, we also took note of the MGPS-estimated association between rofecoxib and AMI. Based on 8 AMI events in the rofecoxib users, the EBGM risk ratio was 1.08, indicating a slightly increased risk among the rofecoxib users. The EB05/95 ratio was 0.50 and non-significant, indicating substantial overlap in the corresponding confidence intervals.
In our parallel analyses using traditional epidemiologic methods, we specifically focused on AMI as the outcome of interest in order to compare with the MGPS result of a significant association with celecoxib use and no significant association with rofecoxib. Table 4 describes the risk of AMI associated with current use of celecoxib and rofecoxib, relative to current use of a traditional NS-NSAID. In both the age-adjusted and the age + comorbidity adjusted analyses, there was a significant association between current use of celecoxib and AMI. Results were minimally changed when we re-defined AMIs using the validated claims algorithm (10). In contrast, there were no significant associations between rofecoxib use and AMI. Hazard ratios from these analyses were similar to the corresponding EBGM ratios previously described.
In this pilot study, we evaluated the feasibility of applying Bayesian data mining methods to longitudinal survey and administrative claims data. These methods have been most commonly applied to spontaneous adverse event reports such as those from the AERS database. As the main endpoint of this project, we successfully adapted these methods for use with administrative claims data and demonstrated concurrent validity with results from a separate analysis conducted using traditional epidemiologic and statistical methods. In contrast to the usual methods of analysis for observational data, however, the data mining approach did not require us to prespecify the outcomes of interest and identified several important associations between coxib use and cardiovascular and cerebrovascular events out of a total of 259 potential outcomes.
As anticipated from prior research, the data mining analysis identified several outcomes associated with celecoxib and rofecoxib. For celecoxib, the only association that exceeded an EB05/95 ratio of 1.0 was events associated with an AMI diagnosis. Other outcomes for which associations were of borderline significance included acute cerebrovascular disease and coronary atherosclerosis diagnoses, as well as osteoarthritis and rehabilitation care. Similar patterns were observed for rofecoxib, although the strong association with AMI was not observed. As is known from analysis of adverse event reports, significant associations resulting from data mining methods will not only reflect potential safety concerns but also disease indications for which the drug is prescribed. For that reason, content knowledge must be applied in order to differentiate these two possibilities. As we observed in our study, all significant and near-significant results either were for indications for the drug (e.g. osteoarthritis, rehabilitation diagnoses) or for ischemic vascular events.
Recognizing that the actual findings from this feasibility study are of somewhat lesser interest than its methodologic focus, they nevertheless deserve mention. Our finding that celecoxib was associated with an approximately two-fold increased risk of AMI has been reported in some (11-13) but not all (14, 15) studies. Although we observed a significant association between rofecoxib and stroke events, as has been found previously (16, 17), we did not observe a significantly increased risk of AMI with rofecoxib. This study was likely underpowered to establish a significant relationship.
The principal strength of our study lies in its uniqueness, as we are not aware of prior reports that have demonstrated success in adaptation of Bayesian data mining methods to longitudinal claims data. The promise that these methods could be applied in an automated way to perform routine signal detection to identify unrecognized adverse drug events soon after product launch using administrative data would be a substantial advance and would fill an important gap in postmarketing drug safety surveillance. We do not view these methods as ever replacing well-designed postmarketing RCTs or observational studies. Rather, we believe them to be complementary to traditional methods by providing a tool adapted to longitudinal data sources such as claims data by which to identify safety signals that need to be pursued.
Despite our initial results from this pilot project, we are cautious with respect to the broad applicability of these methods without further research regarding their validity, precision, and power. In evaluating such efforts it is always desirable to have a gold standard set of results to which data mining analyses can be compared. We intentionally chose to evaluate coxib safety given the recent furor and numerous studies published on this topic. However, given the lack of consensus on even this subject, well designed simulation studies, or a pooled analysis of randomized controlled trials data where randomization controls for both measured and unmeasured confounders, may be the optimal next steps to ensure the robustness of our adaptation of data mining methods.
We used the AHRQ’s CCS classification system to group similar ICD-9 codes together into 259 unique groups. The data mining analysis evaluated all of these outcomes simultaneously and did not require foreknowledge of which of the 259 groups were of particular interest. However, not all important events have specific enough ICD-9 codes to be useful for a claims-based analysis, much less fit into a well-defined CCS category. Moreover, the events of greatest interest in our analysis, and those most likely to represent true safety problems, were AMI and acute cerebrovascular disease. These conditions are relatively homogeneous with respect to the ICD-9 codes included in them and appeared as the primary diagnosis from a hospitalization. Different outcomes that are included in a more heterogeneous event group may be masked if they are rare compared to somewhat dissimilar events also included in that group. Nevertheless, it is possible to use a different event classification system that includes more or different groups; up to 750-1000 event groups is likely to be an upper limit, depending on the size of the data source.
We acknowledge a number of limitations of this study. First, neither pharmacy data from an administrative claims database, nor medication information collected during in-home interviews from the MCBS, accurately reflect actual medication taking behavior or precisely identify the start and end dates of drug use, and we lacked information on drug dose. Additionally, the sample size of the MCBS is relatively small, and this may have limited our ability to detect some important associations. Although we evaluated a number of potential confounders, and the MCBS collects data on covariates not routinely found in claims databases (e.g. race, BMI, smoking status, education), we recognize the possibility for residual confounding. However, of greater interest than our ability to answer content-related questions were our concordant findings between the Bayesian compared to traditional epidemiologic methods. We would expect the same sources of confounding to be operant using both methods, and our finding of concordant results between the two parallel methods is reassuring. Finally, we do not expect that this or any method will be adequate to detect significant increases in very rare “sentinel” events (e.g. Stevens-Johnson syndrome) when they occur, but these should nevertheless be pursued based on clinical relevance.
In summary, results from this pilot project demonstrated the feasibility of using Bayesian data mining methods to analyze administrative claims data. We also showed concurrent validity between the data mining results and traditional methods in the analysis of one particular outcome, AMI. These techniques appear to hold substantial promise to fill a large niche in the evaluation of drug safety for which the available tools for pharmacovigilance are few in number. However, despite these encouraging results, these approaches will require further validation before they can be recommended for widespread use.
This work was funded by the Agency for Healthcare Research and Quality (U18 HS10389-06S1). Some of the investigators (JRC, KGS) also receive support from the National Institutes of Health (AR053351, AR052361) and the Arthritis Foundation (JRC). There was no pharmaceutical support for this project.
JRC: grant support: Merck, Procter & Gamble, Lilly, Amgen, Novartis; consulting/honorarium: Merck, Procter & Gamble, Roche, Lilly
KGS: grant support: Amgen; consulting/honorarium: Amgen
HC, ED, MK, HY: grant support: Amgen