Although biologic treatments have excellent efficacy for many autoimmune diseases, safety concerns persist. Understanding the absolute and comparative risks of adverse events in patient and disease subpopulations is critical for optimal prescribing of biologics.
The Safety Assessment of Biologic Therapy collaborative was federally funded to provide robust estimates of rates and relative risks of adverse events among biologics users using data from national Medicaid and Medicare plus Medicaid dual-eligible programs, Tennessee Medicaid, Kaiser Permanente, and state pharmaceutical assistance programs supplementing New Jersey and Pennsylvania Medicare programs. This report describes the organizational structure of the collaborative, and the study population and methods.
This retrospective cohort study (1998–2007) examined risks of 7 classes of adverse events in relation to biologic treatments prescribed for 7 autoimmune diseases. Propensity scores were used to control for confounding and enabled pooling of individual level data across data systems while concealing personal health information. Cox proportional hazard modeling was used to analyze study hypotheses.
The cohort comprised 159,000 subjects with rheumatic diseases, 33,000 with psoriasis, and 46,000 with inflammatory bowel disease. This report summarizes demographic characteristics and drug exposures. Separate reports will provide outcome definitions and estimated hazard ratios for adverse events.
This comprehensive research will improve understanding of the safety of these treatments. The methods described may be useful to others planning similar evaluations.
The Safety Assessment of Biologic Therapy project was funded by the U.S. Agency for Healthcare Research and Quality and the U.S. Food and Drug Administration. It combined 4 large healthcare data systems with the aim of testing hypotheses relevant to 7 autoimmune diseases, 8 biologic agents, 7 classes of adverse events, and 6 vulnerable populations. The large number of analyses to be conducted demanded an efficient and valid approach to study design and data management and analysis, which we describe in this report. Separate reports will summarize the results for each of the 7 classes of adverse events.
This project was a collaboration of 5 research centers that participate in the Agency for Healthcare Research and Quality Centers for Education and Research in Therapeutics: the University of Alabama at Birmingham; the University of Pennsylvania; Vanderbilt University; Brigham and Women’s Hospital; and Kaiser Permanente Northern California (KPNC), a member of the HMO Research Network. Institutional Review Board approval was obtained at each of the study centers.
We studied 7 patient subgroups including rheumatoid arthritis (RA), juvenile idiopathic arthritis (JIA), psoriasis (PsO), psoriatic arthritis (PsA), inflammatory bowel disease (IBD), and ankylosing spondylitis (AS). Methods for JIA will be described in a separate report. The biologic agents under investigation were FDA-approved through 2008 and included 3 anti-tumor necrosis factor alpha (anti-TNF-α) agents and 5 other biologic agents. The study team subdivided into workgroups to study 7 classes of adverse events, with serious infections assigned for coordination to the University of Alabama at Birmingham; cardiovascular outcomes to Brigham and Women’s Hospital; malignancies to the University of Pennsylvania; hip and non-vertebral fractures and congenital malformations and pregnancy complications to Vanderbilt University; and interstitial lung disease and mortality to KPNC.
The methods described here were used for all patient subgroups except JIA. Similarly, one class of adverse events, congenital malformations and pregnancy complications, was addressed using a distinct study design. The methods used for JIA and congenital malformations and pregnancy complications will be reported in the future.
The project period for the study was 18 months, with the process of collaboration involving 2 face-to-face meetings, weekly teleconferences of the entire study team, weekly teleconferences of the Data Coordinating Committee, and monthly or more frequent teleconferences of workgroups.
The study included information from 1998 through 2007 from 4 data systems (as listed in Table 1) held in custody by 4 of the 5 collaborating institutions.
The University of Alabama at Birmingham held custody of national data from the Centers for Medicare and Medicaid Services. Medicaid is a federal-, state-, and county-funded program that provides means-tested health insurance for those with limited income and financial resources. Medicaid covers a wide range of health care services including clinic and hospital visits, prescription drugs, and other utilization. Among the groups of people served by Medicaid are U.S. citizens and resident aliens, including people with specific disabilities as well as low-income adults and their children. Medicare is a federally funded social health insurance program focused primarily on people age 65 or older, but also including some individuals under age 65 with specific disabilities, and people of all ages with end-stage renal disease. An individual’s financial resources play no role in determining Medicare eligibility. Some persons, known as Medicare dual eligibles, are eligible for both Medicaid and Medicare. The study population included Medicaid patients as well as Medicaid-Medicare dual-eligible patients. The national-level data files used for the study comprised Medicaid Analytic eXtract (MAX) files (2000–2005); Medicare inpatient and other institutional and non-institutional provider files (2000–2006); and Medicare Part D prescription drug event files (2006).
Vanderbilt University held custody of data from TennCare, the managed-care Medicaid agency in Tennessee that provides access to healthcare insurance to Medicaid-eligible patients and others who lack access to care. TennCare covered approximately 21% of the population of Tennessee in 2005. TennCare enrollees are primarily low-income: children, pregnant women, parents of minor children, and people who are elderly or have a disability. During the study period TennCare also covered persons who were uninsured and/or uninsurable, even if they did not otherwise qualify for Medicaid. Compared with the Tennessee population, females and non-white persons are overrepresented among enrollees.
Brigham and Women’s Hospital held custody of data from New Jersey’s Pharmaceutical Assistance to the Aged and Disabled program (PAAD) and Pennsylvania’s Pharmaceutical Assistance Contract for the Elderly (PACE) program.[3,4] These state-run pharmacy benefits programs pay for medications for low-income elderly residents who do not qualify for Medicaid.
The Division of Research at KPNC held custody of KPNC’s data. The health plan provides services to 3.2 million persons through the Kaiser Permanente Medical Care Program. The program, established in 1946, provides comprehensive, integrated care to one-third of the population in its Northern California service areas in the San Francisco Bay region, Sacramento, and less urbanized communities. It is a closed, staff model plan with capitated payment. Information from computerized clinical and patient data systems is used to provide care and manage utilization, not to process insurance claims. Compared with persons whose medical insurance is covered by other insurers, the Kaiser Permanente membership has greater racial diversity, lower mean income, lower college attainment, more obesity, and similar smoking prevalence. When compared with both uninsured and insured by others, Kaiser Permanente members have similar racial diversity, although with fewer Latinos; higher income; greater college attainment; similar obesity, and lower smoking prevalence.
This section provides an overview of key study design decisions. Later sections provide detailed methods, including operational definitions and the analytic approach used to estimate the relative risks. Operational definitions of adverse events were a key design issue, and are described in separate reports.
We made head-to-head comparisons and compared users of biologic agents to users of non-biologic regimens that would be appropriate for patients with active disease. The study hypotheses are listed in Table 2.
We used a retrospective, incident user cohort study design.  The study design and analysis plan were developed with the goal of obtaining valid, precise, and detailed estimates of the drug-event associations while protecting personal health information in accordance with the provisions of Federal, State, and KPNC Data Use Agreements. 
The first key design issue was minimizing confounding by indication. We used four approaches to obtain biologic-exposed and comparison groups balanced with respect to potential confounders. (1) During the design phase, we identified patients with contraindications to the therapy for exclusion from some analyses because they would differ substantially from other patients with respect to their likelihood of being treated and their risk of the adverse event under study. (2) We used comparison groups of patients adding or switching therapies consistent with the patient having active disease (Table 2). (3) We used propensity score matching and adjustment within each of the data systems to maximize covariate similarities between biologic and comparison groups. (4) Finally, we specified a priori subgroup analyses and sensitivity analyses to test key assumptions and to evaluate the potential bias of mis-specified operational definitions.
The second key design issue was study power. To maximize the power of the study, patients were allowed to contribute multiple treatment episodes, each with its own start date and baseline period for covariate assessment, provided the episode met eligibility criteria for the particular hypothesis under evaluation.
The third key design issue was the method for aggregating data and results across the 4 separate data systems. Sharing of raw person-level data across the study centers was not possible because of the provisions of each of the 4 Data Use Agreements. We therefore pooled individual-level data containing a very limited number of class variables while masking more detailed covariate information through the use of data system-specific propensity scores. To accomplish this, we used common data formats to create files that were shared across workgroups and collaborating institutions.
We used Cox proportional hazard regression models to estimate hazard ratios taking into account time on therapy. Given the large number of associations being tested, consistent definitions across the analyses were essential for efficient programming and workflow. We therefore standardized and automated data processing and analyses as much as possible while preserving the flexibility needed to analyze the data while considering the nature of the adverse event.
This section briefly describes the criteria used to determine whether a patient was eligible for at least one analysis. Additional eligibility criteria may have been applied at the time the patient was considered for any specific analysis.
Patients were provisionally eligible if they (1) met an operational definition for autoimmune disease, (2) had at least one dispensing of a biologic agent or comparison non-biologic regimen relevant to their autoimmune disease, as specified in Table 2, and (3) had an eligible baseline period, defined as a period of 12 months of observation in the data system preceding their first eligible dispensing of biologic agent or comparison non-biologic regimen. Detailed operational definitions for autoimmune diseases are available upon request. The operational definitions of autoimmune arthritides were age >16 years and ≥ 1 relevant inpatient or outpatient physician visit code per the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) for RA (714.x but not 714.3), PsA (696.0), or AS (720.0). PsO and IBD were defined without regard to age, with the relevant code for PsO being 696.1; and for IBD, 555 and 556. In most circumstances, a single diagnosis code was accepted because the therapies under study are used for patients with active and relatively severe disease, and because use of the therapies has strong face validity for ascertaining autoimmune disease. The diagnosis code may have been present at any position on the claim or clinical record from an inpatient or outpatient physician evaluation or management visit, and it may have been recorded in the computerized data at any time during the 12 months preceding the treatment episode start date. Patients with multiple autoimmune diseases were eligible for each disease unless specifically excluded. Persons aged ≤ 16 years on the first date of a relevant RA, PsA, or AS diagnosis code were defined as having JIA and not RA, PsA, or AS.
For some but not all analyses, we identified for possible exclusion those study subjects who used cyclophosphamide, cyclosporine, or tacrolimus because of the infrequent use of these therapies and the expectation that patients who used them would be dissimilar to patients using more standard therapies. Patients with organ transplantation, infection with human immunodeficiency virus, advanced kidney and liver disease, or recent cancer diagnoses were identified for exclusion from some analyses because their risk of adverse events would likely be substantially different than risk in the majority of patients without these conditions.
The following concepts were used for all biologic drugs and comparison regimens. For each biologic and comparison regimen, exposure was assessed on each day throughout the observation period. A treatment episode was defined as a period of use of a therapeutic regimen, either biologic or comparison (as detailed in Table 2) with or without concomitant therapy, but with a precisely defined start date, which we label, throughout this report, as the treatment episode start date. A single dispensing was sufficient to define a treatment episode. Our incident user design required that the patient had not met the operational definition for the same regimen in the baseline period, i.e., the 365 days preceding the start of the treatment episode.
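The incident user requirement can be illustrated with a minimal sketch. The function name, argument names, and the handling of the window boundaries are assumptions made for illustration; only the 365-day baseline window comes from the text.

```python
from datetime import date, timedelta

BASELINE_DAYS = 365  # baseline window length stated in the text


def is_incident_episode(candidate_start, prior_regimen_dates, enrollment_start):
    """Return True if a candidate treatment episode counts as new (incident) use.

    candidate_start     -- date on which the regimen definition was met
    prior_regimen_dates -- earlier dates on which the SAME regimen definition was met
    enrollment_start    -- first day the patient was observable in the data system
    (All three names are hypothetical.)
    """
    window_start = candidate_start - timedelta(days=BASELINE_DAYS)
    # The whole baseline window must be observable in the data system.
    if enrollment_start > window_start:
        return False
    # The same regimen must not have been used inside the baseline window.
    return not any(window_start <= d < candidate_start for d in prior_regimen_dates)
```

Under these assumptions, a patient who last met the regimen definition more than 365 days before the candidate start date would again qualify as a new user.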
A treatment episode ended when the patient (1) switched to a new regimen, biologic or comparison, that met a different operational definition listed in Table 2, or (2) did not have the agent dispensed again within 30 days after the supply of the drug ended. The pharmacy variable “days-supply” was used to estimate the intended duration of each dispensing, and the patient was given a 30-day grace period to refill the dispensing or undergo an infusion. Thus, duration of use was defined as the date of dispensing + days supply + 30-day grace period. When the days-supply variable was not recorded in the pharmacy data, we imputed the mode value for the medication within the specific data system (MAX, TennCare, PACE/PAAD, or KPNC). If the patient received a new dispensing of the medication before the days-supply was exhausted, the excess supply was carried over. If the patient was dispensed a new biologic drug before the supply of the preceding drug (non-biologic or a different biologic) had been exhausted, they were assumed to have switched therapy and were assigned a new treatment episode with a new baseline period. If they were dispensed a non-biologic drug before the supply of a biologic drug had been exhausted, they were assumed to have added a concomitant drug. All treatment episodes were mutually exclusive in time.
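A simplified sketch of the days-supply, carry-over, and grace-period rules for a single agent is shown below. It ignores switching and concomitant therapy, and it imputes a missing days-supply from the mode of the dispensings passed in, whereas the study used the mode for the medication within the data system. The function and variable names are hypothetical.

```python
from datetime import date, timedelta
from statistics import mode

GRACE_DAYS = 30  # grace period stated in the text


def first_episode_span(dispensings, grace_days=GRACE_DAYS):
    """Return (start, end) of the first continuous episode for one agent.

    dispensings -- list of (dispensing_date, days_supply or None)
    Assumption: at least one dispensing has a recorded days_supply.
    """
    recorded = [supply for _, supply in dispensings if supply is not None]
    imputed = mode(recorded)  # stand-in for the data-system-level mode
    ordered = sorted(dispensings, key=lambda d: d[0])

    start, supply = ordered[0]
    supply_end = start + timedelta(days=supply if supply is not None else imputed)

    for fill_date, supply in ordered[1:]:
        days = supply if supply is not None else imputed
        # No refill within the grace period: the episode has ended.
        if fill_date > supply_end + timedelta(days=grace_days):
            break
        # Early refill: excess supply carries over. Late refill inside the
        # grace period: the new supply is counted from the fill date.
        supply_end = max(supply_end, fill_date) + timedelta(days=days)

    return start, supply_end + timedelta(days=grace_days)
```

For example, 30-day fills on 1 January and 25 January 2005, a fill with missing days-supply on 15 March, and a fill on 1 August would yield one episode running from 1 January to 14 May 2005 under these assumptions; the August fill would start a separate episode.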
The initial cohort entry date was defined as the date when the patient first met an operational definition for an eligible treatment episode, either biologic or comparison. To ensure protection of personal health information, we re-coded all calendar dates referent to the initial cohort entry date thereby de-identifying all dates while preserving the precise temporal relationship of the variables.
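The date re-coding step amounts to replacing each calendar date with a signed day offset from the initial cohort entry date. A minimal sketch follows; the record layout and field names are hypothetical.

```python
from datetime import date


def to_relative_days(record, cohort_entry):
    """Replace every calendar date in a record with days since cohort entry.

    Non-date fields are passed through unchanged. Intervals between any two
    dates are preserved, but the calendar dates themselves are not recoverable
    without the cohort entry date.
    """
    return {
        name: (value - cohort_entry).days if isinstance(value, date) else value
        for name, value in record.items()
    }


example = {
    "baseline_start": date(2002, 3, 1),  # hypothetical field names
    "episode_start": date(2003, 3, 1),
    "event_date": date(2003, 4, 15),
    "sex": "F",
}
shifted = to_relative_days(example, cohort_entry=date(2003, 3, 1))
```

Here `shifted` holds offsets of -365, 0, and 45 days for the three dates, so the 45-day interval between episode start and event is kept intact.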
The biologic drugs under study are listed in Table 3. A new treatment episode of biologic drug began on the date of first dispensing. The expected duration of the treatment episode was based on the expected frequency of dispensing, as shown in the last column of the table.
Operational definitions for seven comparison non-biologic regimens were developed referent to the underlying autoimmune disease (Table 2). Comparison regimens involved (1) initiating new therapy, or (2) intensifying therapy by adding additional agents or switching to a new agent. A new treatment episode of a non-biologic regimen began on the date when the operational definition (Table 2) was met.
Study covariates were ascertained during the 365-day baseline period preceding each treatment episode.
Some covariates were shared across workgroups to facilitate analysis of vulnerable populations and to assess study validity, and we refer to these as “shared covariates”. They were used as stratifying variables for subgroup and sensitivity analyses. Shared covariates included data system identifier, calendar year of cohort entry, age (5-year categories), sex, race/ethnicity, urban/rural residence (census block), Charlson-Deyo comorbidity, and oral glucocorticoid exposure. For oral glucocorticoids, we calculated the average daily dose and the cumulative prednisone-equivalent dose over each 6-month interval in the baseline period and following the initial cohort entry date. Thus, oral glucocorticoid treatment was not mutually exclusive with other non-biologic regimens or with biologic treatments.
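The glucocorticoid calculation can be sketched as follows. Several choices here are assumptions rather than details from the study: the conversion factors are commonly cited prednisone equivalents, a 6-month interval is approximated as 182 days, and each dispensing is credited in full to the interval in which it was dispensed. Days are expressed relative to the initial cohort entry date (day 0), so negative interval indices fall in the baseline period.

```python
from collections import defaultdict

# Commonly cited prednisone-equivalent factors (assumption; the factors
# actually used by the study are not given in the text).
PREDNISONE_EQUIV = {
    "prednisone": 1.0,
    "prednisolone": 1.0,
    "methylprednisolone": 1.25,  # 4 mg is roughly 5 mg prednisone
    "hydrocortisone": 0.25,      # 20 mg is roughly 5 mg prednisone
}
INTERVAL_DAYS = 182  # assumed length of a "6-month" interval


def glucocorticoid_dose_by_interval(dispensings):
    """Summarize oral glucocorticoid exposure per 6-month interval.

    dispensings -- list of (relative_day, drug, strength_mg, quantity)
    Returns {interval_index: {"cumulative_mg": ..., "avg_daily_mg": ...}}.
    """
    cumulative = defaultdict(float)
    for rel_day, drug, strength_mg, quantity in dispensings:
        interval = rel_day // INTERVAL_DAYS  # floor division handles negative days
        cumulative[interval] += strength_mg * quantity * PREDNISONE_EQUIV[drug]
    return {
        interval: {"cumulative_mg": total, "avg_daily_mg": total / INTERVAL_DAYS}
        for interval, total in sorted(cumulative.items())
    }
```

A more faithful version would spread each dispensing over its days-supply so that a fill spanning an interval boundary contributes to both intervals.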
Other covariates could not be shared across the collaboration and were incorporated into the propensity score such that their values were known only to those who had custodial responsibility for the data. Variables included in the propensity score are listed in Table 4. Age, in 5-year groups, was a shared covariate; it was also in the propensity score as a continuous variable. The Charlson comorbidity index was a shared covariate; in addition, its specific elements were in the propensity score.
We computed propensity scores estimating the probability of initiating a biologic agent in contrast to the comparison regimen for each hypothesis specified in Table 2 and within strata defined by autoimmune disease. The propensity scores were computed separately for each data system, and the same propensity scores were used for every outcome.
To enable the analysis of acute events, we recomputed the propensity score when the patient initiated a new episode of the same treatment within the same analysis, but only when the gap between the end of the previous episode and the start of the new episode exceeded 12 months.
Propensity scores were calculated by each of the data custodians using a common logistic regression program. Predicted probabilities were plotted for visual inspection of overlapping distributions. The C statistic was used to estimate the goodness of fit.  Some analyses used propensity score matching to maximize comparability between users of biologic agents and users of comparison regimens. Individuals eligible to be matched were randomly ordered, and a 1:1 greedy matching strategy without replacement was used. After matching, balance on all covariates of interest was assessed by comparing descriptive characteristics between treatment groups for each propensity-score matched cohort. To enhance precision, some analyses used propensity score adjustment instead of propensity score matching. When this was done, analyses were restricted to treatment episodes that had propensity scores within the region of overlap between the scores of the two treatments being compared, and the propensity scores were classified as ordered categorical variables according to quintile.
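The matching and adjustment steps described above can be sketched in a few functions. This is an illustration only, not the common SAS program used by the collaborative: the function names are hypothetical, no caliper is applied because none is mentioned in the text, and quintiles are assigned by simple rank.

```python
import random


def greedy_match(treated, comparison, seed=0):
    """1:1 greedy nearest-neighbor matching without replacement.

    treated, comparison -- dicts mapping an episode id to its propensity score.
    Treated episodes are processed in random order, as described in the text.
    """
    order = list(treated)
    random.Random(seed).shuffle(order)
    available = dict(comparison)
    pairs = []
    for t_id in order:
        if not available:
            break
        # Closest remaining comparison episode on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - treated[t_id]))
        pairs.append((t_id, c_id))
        del available[c_id]  # without replacement
    return pairs


def restrict_to_overlap(treated, comparison):
    """Keep only episodes whose score lies in the region common to both groups."""
    low = max(min(treated.values()), min(comparison.values()))
    high = min(max(treated.values()), max(comparison.values()))

    def keep(scores):
        return {k: v for k, v in scores.items() if low <= v <= high}

    return keep(treated), keep(comparison)


def quintile_labels(scores):
    """Assign each episode a propensity score quintile (1 = lowest scores)."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {ep_id: (rank * 5) // n + 1 for rank, ep_id in enumerate(ranked)}
```

Greedy matching depends on processing order, which is why the random ordering (and, in this sketch, the seed) matters for reproducibility.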
The adverse events of interest included (1) serious infections (both hospitalized bacterial and hospitalized and outpatient opportunistic infections), (2) cardiovascular outcomes, (3) solid tumor and hematologic malignancies, (4) hip and non-vertebral fractures, (5) interstitial lung disease, and (6) mortality. Separate reports will provide operational definitions of each adverse event, additional details of the modeling approach specific to the adverse event, and analytic results.
At each of the four centers providing custodianship of the source data, analysts transformed local data to standardized variable names and formats using a common data dictionary. Before transfer, the data were analyzed to assess quality and comparability. Each center prepared 3 files for transfer of data to the workgroups responsible for combining and analyzing the data. File 1 contained the subject’s study identification number, data system, autoimmune disease, first date the patient was contraindicated for biological therapy (for potential exclusion or censoring), shared covariates (Table 4), and propensity scores (one for each pair of drug-exposure contrasts). File 2 contained information on exposure to biologic drugs and comparison regimens. File 3 contained the adverse events being investigated. Files were transferred from data custodians to workgroups using secure file transfer.
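The three-file layout lends itself to a simple record structure linked only by the study identification number. The sketch below is illustrative: all class names, field names, and example values are hypothetical, and dates are assumed to be stored as day offsets from the initial cohort entry date.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SubjectRecord:  # File 1: one row per subject
    study_id: str
    data_system: str
    autoimmune_disease: str
    contraindication_day: Optional[int]  # first contraindication, if any
    shared_covariates: Dict[str, object] = field(default_factory=dict)
    propensity_scores: Dict[str, float] = field(default_factory=dict)  # one per contrast


@dataclass
class ExposureRecord:  # File 2: one row per treatment episode
    study_id: str
    regimen: str
    start_day: int
    end_day: int


@dataclass
class OutcomeRecord:  # File 3: one row per adverse event
    study_id: str
    adverse_event: str
    event_day: int


def outcomes_during_exposure(
    exposures: List[ExposureRecord], outcomes: List[OutcomeRecord]
) -> List[Tuple[str, str, int]]:
    """Pair each outcome with the treatment episode in which it occurred."""
    hits = []
    for outcome in outcomes:
        for episode in exposures:
            same_subject = episode.study_id == outcome.study_id
            if same_subject and episode.start_day <= outcome.event_day <= episode.end_day:
                hits.append((episode.regimen, outcome.adverse_event, outcome.event_day))
    return hits
```

Because detailed covariates enter only through the propensity scores in the subject record, a workgroup holding all three files never sees the underlying person-level clinical detail.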
Following extensive discussion by the project team, a common data analysis program was provided by the Study Biostatistician and the Director of the Data Coordinating Center in collaboration with the data analysts at each center. The SAS® (Cary, North Carolina) program was distributed to each of the workgroups with the intention that it enable the flexibility needed to assure the most valid and efficient analysis for each of the adverse events under investigation, recognizing that the epidemiologic and clinical features of each event were important considerations for making analytic decisions.
Each of the 6 workgroups made analytic decisions about: (1) the specific hypotheses under study from among those listed in Table 2, (2) use of person or treatment episode as the unit of observation, (3) exclusions from the study population, (4) coding of biologic and comparison drug exposure with respect to timing, (5) outcome definitions and analysis of second occurrences of the same outcome in a single patient, (6) censoring rules with respect to switching or discontinuing therapy, (7) adjustment for oral glucocorticoids and shared covariates, (8) aggregating the analysis, or not, across autoimmune diseases, and (9) subgroup and sensitivity analyses.
Cox proportional hazards analysis was used to estimate the hazard ratio. For all analyses, entry into follow-up was defined as the initial cohort entry date (the day of the start of the first relevant treatment episode, either biologic or comparison). In all analyses, patients were censored on the earliest of the death date, the end of enrollment, or the end of the data. Additional censoring rules were applied at the discretion of the workgroups and included, for example, the end of the treatment episode, or the date a patient received a diagnosis or procedure code that created an exclusion. Patient follow-up continued throughout any hospitalizations that occurred during observation.
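The baseline censoring rule, which takes the earliest of death, end of enrollment, or end of the data, can be sketched as below. Days are offsets from the initial cohort entry date, the function and argument names are hypothetical, and workgroup-specific censoring rules are not modeled.

```python
def follow_up(entry_day, event_day, death_day, disenroll_day, data_end_day):
    """Return (days_at_risk, event_observed) for one subject.

    Each *_day argument is a day offset, or None if that event never occurred.
    data_end_day is assumed to be always available.
    """
    censor_day = min(
        day for day in (death_day, disenroll_day, data_end_day) if day is not None
    )
    # An outcome counts only if it occurs on or before the censoring day.
    if event_day is not None and entry_day <= event_day <= censor_day:
        return event_day - entry_day, True
    return censor_day - entry_day, False
```

When the outcome is death itself, the event day and the censoring day coincide, and this sketch counts the event rather than censoring it.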
Typically, a subject could contribute only a single outcome to any single analysis. However, in some analyses, the subject was allowed to contribute multiple outcomes, such as when the subject had two diagnoses of pneumonia recorded during two separate treatment episodes, with both being counted in a single analysis. Because patients could contribute one or more episodes of new use (each with an updated set of covariates), we treated each patient’s study number as a cluster and accounted for the resulting within-patient correlation by using the Huber–White ‘sandwich’ variance estimator to calculate robust standard errors for all estimates.[13, 14]
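For reference, the general form of a cluster-robust sandwich variance estimator for a Cox model is given below. The notation is an assumption introduced here, not taken from the study: $\widehat{I}$ is the observed information matrix of the partial likelihood, $r_i(\hat{\beta})$ is the score residual vector for treatment episode $i$, and $g$ indexes patients (clusters).

```latex
\widehat{V}_{\text{robust}}
  = \widehat{I}^{-1}
    \left( \sum_{g=1}^{G} U_g U_g^{\top} \right)
    \widehat{I}^{-1},
\qquad
U_g = \sum_{i \in g} r_i(\hat{\beta})
```

Summing the residuals within each patient before forming the outer products is what allows episodes from the same patient to be correlated; with one episode per patient the expression reduces to the usual unclustered robust estimator.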
Primary analyses included data from all data systems with the expectation that treatment effects would be similar across the different systems. Nonetheless, when feasible, we assessed whether the observed associations differed across data system by including an interaction term between the variable for data system and the treatment variable. Results of these tests for interaction were interpreted with consideration of both the magnitude of difference in hazard ratios across data systems and the level of statistical significance of the test for heterogeneity. When significant heterogeneity was observed, it was considered hypothesis generating and stratified analysis results were reported in addition to the pooled results.
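The test for heterogeneity can be written as a Cox model with a product term. The sketch below simplifies to two data systems and uses assumed notation: $T$ is the treatment indicator (1 = biologic, 0 = comparison), $S$ is a data system indicator, and $h_0(t)$ is the baseline hazard. Whether data system entered as a covariate or as a stratification factor is not stated in the text, so the covariate form shown is an assumption.

```latex
h(t \mid T, S) = h_0(t)\,\exp\!\left(\beta_T T + \beta_S S + \beta_{TS}\, T S\right)

\mathrm{HR}_{S=0} = e^{\beta_T},
\qquad
\mathrm{HR}_{S=1} = e^{\beta_T + \beta_{TS}},
\qquad
H_0:\ \beta_{TS} = 0
```

With four data systems there would be three indicator variables and three product terms, and the heterogeneity test would assess all three product-term coefficients jointly.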
When sample size permitted, we stratified the analyses for vulnerable populations defined by sex, age (with a focus on children and those aged ≥65 years), co-morbidity (defined by the Charlson-Deyo co-morbidity index), race/ethnicity, and rural/urban residence (defined by census block group). It should be noted that the specification of “vulnerable populations” or “priority populations” evolved from that proposed by the Agency for Healthcare Research and Quality at the start of the project.
Figure 1 shows the 3,170,997 persons ascertained as potential study subjects based on their receipt of one of the study drugs or comparison regimens. Of these, 59% had no eligible baseline period. Of those remaining potentially eligible, 19% had at least one code for one of the 5 autoimmune diseases of interest during the 12-month baseline period. The number of subjects eligible for at least one analysis was 239,806. The mean duration of observation, measured from the initial cohort entry date to the date of death, disenrollment, or the end of the data, was as follows: MAX, 1.9 years (standard deviation [SD] 1.5); KPNC, 3.0 years (SD 2.5); PACE, 3.3 years (SD 2.4); and TennCare, 3.1 years (SD 2.1).
RA was the most common (66% of subjects) underlying autoimmune disease (Table 5). The study population was diverse with respect to age, race/ethnicity, residence, and co-morbidity, primarily because of differences in eligibility (Table 5). The mean census block household income was as follows: MAX, $37,342 (SD $14,557); KPNC, $60,253 (SD $21,391); PACE, $44,921 (SD $19,579); and TennCare, $33,935 (SD $11,097). Episodes of use of the biologic drugs and comparison regimens of interest in the study are shown in Table 6. Etanercept and infliximab were the most commonly used biologic drugs, followed by adalimumab.
The Safety Assessment of Biologic Therapy project is an early application of research involving comprehensive analysis of drug safety and using multiple healthcare data systems. Collaborations such as this will increase in number and complexity to meet the needs of society for post-marketing evaluations of drug safety. Critical to the successful completion of the work have been (1) interdisciplinary expertise in medicine, epidemiology, biostatistics, and data processing and analysis, as well as deep experience with the data systems that were used, (2) agreed-upon terminology for discussing the study design and analytic approach, (3) use of a standardized data dictionary, and (4) data processing that preserved flexibility during modeling while streamlining a common approach to the analysis.
The primary innovation of the project was in developing a common data management and analysis framework to enable testing of large numbers of hypotheses across multiple data systems and in the context of Data Use Agreements that limited sharing of person-level data. While large computerized data systems are essential for drug safety research and represent real-world experience with the drugs, they have critical limitations. Most critical are (1) the lack of randomization of patients to intervention and comparison groups with consequent biases that can undermine the study aims; (2) incomplete information on the patient’s clinical history and past drug exposure, among other important patient-level factors; (3) interval censoring, with patients cycling into and out of enrollment, during which the researcher cannot observe important events; (4) short follow-up, with long-term exposure and outcomes with long latency being unobservable; and (5) use of algorithms that have not been validated and may not be accurate for predicting outcomes or critical aspects of the patient’s clinical history. Nevertheless, the data systems used for this study have relatively long follow-up, allowing greater capture of the patient’s history. We addressed confounding through use of propensity scores, restriction, adjustment, and careful selection of comparison groups. These limitations and the solutions we applied will be addressed in forthcoming papers specifically in connection with the adverse events that were at the heart of this research project.
This work was funded through a contract with the Agency for Healthcare Research and Quality (AHRQ) (1U 18H 17919) and by infrastructure support from the AHRQ Centers for Education & Research on Therapeutics.