Accurate measurement of antiretroviral adherence is essential for targeting and rigorously evaluating interventions to improve adherence and prevent viral resistance. Across diseases, medication adherence is an individual, complex, and dynamic human behavior that presents unique measurement challenges. Measurement of medication adherence is further complicated by the diversity of available measures, which have different utility in clinical and research settings. Limited understanding of how to optimize existing adherence measures has hindered progress in adherence research in both HIV and other diseases. Though self-report is the most widely used adherence measure and the most promising for use in clinical care and resource-limited settings, adherence researchers have yet to develop evidence-based standards for self-reported adherence. In addition, the use of objective measures, such as electronic drug monitoring or pill counts, is limited by poor understanding of the source and magnitude of error biasing these measures. To address these limitations, research is needed to evaluate methods of combining information from different measures. The goals of this review are to describe the state of the science of adherence measurement, to discuss the advantages and disadvantages of common adherence measurement methods, and to recommend directions for improving antiretroviral adherence measurement in research and clinical care.
Treatment adherence across diseases has been the focus of research for the past four decades, but interest in studying adherence has intensified during the era of combined antiretroviral therapy. However, the limitations of existing adherence measures have hindered progress in adherence research in both HIV and other diseases (1). The use of different adherence measures can lead to discrepancies in conclusions about adherence rates and predictors of adherence. As the field of antiretroviral adherence research has evolved, the question of how to optimize adherence measurement for both research and clinical care has emerged as a fundamental issue that must be addressed before potential solutions to the problem of poor antiretroviral adherence can be rigorously evaluated.
Optimizing adherence measurement in both clinical and research settings is crucial for several reasons. In clinical settings, measures must be efficient, practical, and inexpensive, and precision may be less important than accurately identifying patients in need of interventions. HIV providers are urged to screen for sub-optimal adherence with every patient at every visit (2), but this emphasis on adherence may have the unintended effect of promoting inaccurate self-reported adherence. Patients are most vulnerable to reporting bias, a form of social desirability bias, when they are reporting directly to health care providers from whom they may fear chastisement. Overestimated adherence rates can result in patient misclassification and lead to inaccurate targeting of adherence-improving interventions or delays in addressing adherence problems.
In addition to improving adherence reports in clinical settings, better adherence measures are needed for public health officials who rely on adherence prevalence rates and predictors of poor adherence to identify high-risk populations for adherence interventions. Better measures are also needed for the growing number of randomized trials testing the efficacy of different adherence-improving interventions. Adherence measurement challenges for investigators are different from those encountered by HIV providers. Lack of precision in adherence measures may be causing adherence differences between study arms in controlled trials to go undetected. Finally, antiretroviral adherence is crucially important in resource-limited settings where second-line medication options are limited or virtually nonexistent and sub-optimal adherence must be identified prior to the development of resistance (3).
Commonly used methods for measuring adherence include indirect measures, such as self-reports, electronic drug monitoring (EDM), pill counts, and pharmacy refill records, and direct measures, including detection of drugs or drug metabolites in plasma. Advantages, disadvantages, and key challenges of commonly used adherence measures are listed in the Table. While directly observed therapy is sometimes considered a method for measuring adherence, it is primarily evaluated as an intervention and is therefore beyond the scope of this review. Though each method has conceptual, empirical, and logistical advantages and disadvantages, the diversity of measurement methods has contributed to the complexity of the field. The goals of this review are to describe the current state of the science of adherence measurement, to discuss the advantages and disadvantages of commonly used adherence measurement methods, and to recommend future directions for improving the measurement of antiretroviral adherence.
Self-report is the most commonly used adherence measure in both clinical and research settings because it has low staff and respondent burden, is inexpensive and flexible, and takes very little time. In clinical settings, self-report allows for a discussion of reasons for missed doses and potential solutions. The validity of self-report for antiretroviral adherence has been demonstrated in two recent reviews. A systematic review of 77 studies employing various self-report adherence measures reported that self-reported adherence was significantly correlated with HIV viral load in 84% of recall periods (4). In a meta-analysis of 65 studies, despite significant heterogeneity in point estimates, the odds of having detectable HIV viral load were more than double in non-adherent patients compared to adherent patients (ORadj=2.31, 95%CI=1.99 - 2.68) (5).
Because of its numerous advantages, self-report is the most likely candidate for widespread use. However, there are several unresolved measurement issues regarding the psychometric characteristics of specific self-report items. Currently, numerous different adherence questions are used in clinical and research settings (4). Most commonly, respondents are asked to report the number of doses they missed during a specified recall period or to estimate their overall percent adherence on a visual analog scale. Response tasks may also include qualitative estimates of overall adherence, reporting the number of days of perfect adherence in the prior week, recalling when the respondent last missed a dose, or determining the number (or proportion) of doses (or pills) missed (or taken) over a specified recall period. Substantial variation also exists in the relevant time frame, with recall periods including the past one, three, seven, or thirty days (4). Response options may be Likert (6-8) or visual analog scales (9-11), and may be close-ended or open-ended. Some self-report measures consist of a single item, some are scales (12), and some questions assist the respondent with visual aids or anchors. Data suggest poor agreement between different self-report adherence measures (10;12-15). The range of self-report items used makes it difficult to compare results even among studies that all use self-report.
Another important unresolved methodological issue is how to mitigate the “ceiling effect,” or the tendency of self-reported adherence to be positively skewed. This bias increases the risk of patient misclassification and presents analytic challenges for investigators. Though often attributed to social desirability bias, the ceiling effect may be influenced by other factors such as question misinterpretation and poor recall (16;17). For example, if a respondent is asked about number of missed pills but answers about the number of missed doses, adherence would be overestimated. Alternatively, because recall for events remembered is more accurate than for those forgotten (18), patients may overestimate adherence due to differential recall. Because of the ceiling effect, self-report is often considered specific but not sensitive for diagnosing poor adherence (19). However, some data suggest that self-reported adherence is inaccurate even among patients who report missing doses, and should therefore not be considered 100% specific for poor adherence (20).
Several interview techniques may lessen the ceiling effect associated with self-reported adherence. It has become standard practice in clinical and research settings to introduce adherence questions with a statement that normalizes non-adherence by acknowledging the challenges of regular medication taking (21). Some research suggests that patients and research participants may be more candid about their behavior when answering questions anonymously on a computer screen instead of in a face-to-face interview. Computer Assisted Self-Interview (CASI), which can include an audio component that eliminates literacy requirements, has been shown to increase reporting of numerous socially undesirable or stigmatizing behaviors in various populations (22-24). Though CASI has been shown to be feasible in clinical settings (25) and has become popular for measuring adherence in research settings, data examining the effects of this method on response rates or validity are lacking.
Two potential directions for future investigation may provide insight into ways to improve self-reported adherence. The first is cognitive interviewing, which was developed through collaboration between survey methodologists and cognitive psychologists. Cognitive interviewing is the study of how targeted audiences interpret, mentally process, and form responses to survey questions, with an emphasis on potential biases in this complex process (26;27). In one study, cognitive interviews of adherence questions revealed that complex recall tasks and estimation of percentages were particularly challenging without the help of a trained interviewer. Since the investigators were planning to use a self-administered survey, the results of the cognitive interviews prompted them to change their questions to enable respondents to complete them on their own (28). With an enhanced understanding of how respondents interpret adherence questions and what strategies they use to recall information and formulate answers, better self-report questions that minimize response error can be developed and tested.
A second potential method for improving self-reported adherence uses Item Response Theory to combine information from multiple questions, or items, that measure a latent trait. Unlike Classical Measurement Theory, which relies on the demonstration of reliability and validity, does not separate sources of error, and combines items to create a composite “score,” Item Response Theory enhances precision by finely differentiating sources of error and evaluating each item's relationship to the construct of interest (29). Because of its advantages over Classical Measurement Theory, Item Response Theory is increasingly being applied in health research (29;30).
Before the adherence research community can converge on evidence-based standards for self-reported adherence, rigorous survey development and testing are needed. Fortunately, empirical testing of adherence questions is beginning to advance the science of self-reported adherence (6;14;31-33).
Electronic drug monitoring (EDM) has been used over the past several years to measure adherence in several diseases (34-37), and is frequently used by HIV researchers (38-40). EDM utilizes monitoring devices, such as the Medication Event Monitoring System (MEMS) cap, which is a pill bottle cap embedded with a microprocessor that records the time and date of each bottle opening as a presumptive dose. The cap stores the information until it is downloaded. Benefits of EDM include the ability to examine both patterns of adherence and detailed aspects of medication taking, such as dose interval adherence (39;41-43). EDM is often treated as the adherence gold standard, because it produces adherence rates with lower central tendencies and more variance than other measures and correlates more closely with HIV viral load than other individual measures (19;44).
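To illustrate how EDM timestamps support both dose-taking and dose-interval metrics, the following sketch computes two adherence percentages from a list of bottle-opening times. This is purely illustrative: the function name, the tolerance window, and the dosing parameters are assumptions for demonstration, not part of any MEMS software.

```python
from datetime import datetime, timedelta

def edm_adherence(openings, prescribed_interval_h=12, tolerance_h=2,
                  days_observed=30, doses_per_day=2):
    """Illustrative EDM-derived metrics (hypothetical parameters):
    percent of prescribed doses taken, and percent of successive
    dose intervals within a tolerance of the prescribed interval."""
    openings = sorted(openings)

    # Dose-taking adherence: presumptive doses / prescribed doses.
    expected = days_observed * doses_per_day
    taken_pct = 100.0 * len(openings) / expected

    # Dose-interval adherence: hours between successive openings,
    # counted as adherent when close enough to the prescribed interval.
    intervals = [(b - a).total_seconds() / 3600.0
                 for a, b in zip(openings, openings[1:])]
    within = [abs(iv - prescribed_interval_h) <= tolerance_h
              for iv in intervals]
    interval_pct = 100.0 * sum(within) / len(within) if within else 0.0
    return taken_pct, interval_pct
```

A patient who opens the bottle exactly every twelve hours would score 100% on both metrics; a patient who takes every dose but at erratic times would score 100% on the first and lower on the second, which is exactly the distinction EDM makes visible.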
As the use of EDM has increased, several potential limitations have been characterized (45). Underestimated adherence can result from not using the cap consistently, which was reported by 36% of respondents in one sample (46). “Pocket dosing,” or the act of removing more than one dose for each bottle opening and pocketing the extra doses to ingest at a later time, can also underestimate adherence. This has been documented in a substantial minority (37% and 41%) of participants in two studies (46;47). In contrast, “curiosity opening,” referring to participants opening the monitored pill bottle without removing any pills, can lead to overestimates of adherence and was reported by 26% of participants in one study (46).
For some participants, the monitor itself is an adherence-improving intervention (48). Recently presented data suggest that electronic monitoring induces an improvement in adherence or “Hawthorne effect” that is self-limiting after 40 days, even among those participants who report no such effect (49). If these data are replicated in other populations, solutions to the “Hawthorne effect” may lie in study design (a phase-in period to control for the 40-day effect) or data analysis (excluding the period of initial use). Lastly, concerns about selection bias have been raised because MEMS caps are large, bulky, preclude flexibility in medication taking, and are not appropriate for those who use adherence aids such as pillboxes. Studies that exclude pill box users may be biased towards persons who are less comfortable taking medications in public, less adherent, and less resourceful regarding medication adherence (50).
In addition to these participant-level variables, procedural, quality control, and data management issues can have profound implications for EDM adherence estimates. Examples of important procedural decisions include the optimal interval between study visits where data are downloaded, and the best way to keep track of medication changes between appointments (51). The most vexing data management challenge is correctly identifying periods of non-use that do not represent periods of non-adherence. These can include hospitalizations, incarcerations, periods of substance abuse treatment, times when someone else's pills are taken, or when pills are taken from a different container. To aid in data management decisions, it is recommended to “modify” downloaded EDM data using responses to survey questions or information from daily diaries kept by participants (47;52). In one study, the investigators report the numerous procedural, data management, and analytic challenges faced by the research team (46). Subjects' concerns about the accuracy of the electronic data led these investigators to conduct a small scale quality control check that demonstrated that MEMS caps did not record every bottle opening. Lastly, the high per person cost has made EDM feasible primarily for use in funded investigations.
Because of the high correlation between EDM adherence estimates and viral load and the ability to examine patterns of adherence over time and dose-interval adherence, EDM is currently the most promising objective adherence measure. Continued research on strategies to minimize measurement error will inform the establishment of empirically based guidelines for the use of electronic monitors to measure adherence.
Two pill-counting techniques for measuring adherence have been described. Announced pill counts take place at clinical appointments or scheduled research visits to which the patient brings their medication bottles. With the correct start date for the prescription, an adherence rate is simple to calculate, and these adherence rates have been shown to have moderate correlations with EDM and HIV viral load (19;53). However, announced pill counts can be inaccurate if patients empty pill containers without ingesting any pills (“pill dumping”), if the accurate start date for the pill supply cannot be determined, or if patients use multiple pill containers. These participant-level and methodological sources of error often lead pill counts to overestimate adherence (54;55). In contrast, unannounced pill counts used in research settings minimize the risk of pill dumping and have been shown in some studies to predict viral load slightly better than EDM (56;57).
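The announced pill count calculation described above reduces to simple arithmetic once the start date and initial supply are known. The sketch below is a hypothetical illustration (the function and parameter names are invented), and it assumes a single pill container and a correct start date, precisely the conditions the text notes are often violated.

```python
from datetime import date

def pill_count_adherence(dispensed, counted, start_date, count_date,
                         pills_per_day):
    """Illustrative announced pill count calculation: pills presumed
    taken (dispensed minus remaining) divided by pills expected to
    have been taken since the start date."""
    days = (count_date - start_date).days
    expected = days * pills_per_day
    if expected <= 0:
        raise ValueError("count date must follow start date")
    taken = dispensed - counted
    return 100.0 * taken / expected
```

Note that “pill dumping” inflates `taken` directly, and an uncertain `start_date` shifts `expected`, which is why both sources of error translate straight into overestimated adherence.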
Because of the associated methodological challenges and potential impact on the patient-provider relationship, pill counting is poorly suited to assessing adherence in clinical settings. Logistically, pill counting requires both that the patient bring the pill bottle to the visit and that there is sufficient time to count the pills. The social desirability of the clinical interaction, combined with foreknowledge of when the appointment will occur, may increase the risk of “pill dumping.” Most importantly, patients may perceive pill counting as threatening and suggestive of lack of trust in their self-reported adherence. An adherence measure that promotes an adversarial dynamic between provider and patient should be viewed as counterproductive (58;59).
Using pharmacy refill records to measure antiretroviral adherence is common among investigators in settings where medications are likely to be covered by a single payer, such as Medicaid, the Veterans Administration, a universal health care system, or managed care organizations (60-63). Using pharmacy refill data in community-based settings with multiple payers is more labor-intensive, but has been done successfully (64). Data from population-based pharmacy refill studies were instrumental in demonstrating the relationship between antiretroviral adherence and important HIV-related clinical outcomes such as disease progression, hospitalization, and mortality (65;66).
The underlying premise of this method is that if patients do not receive timely refills from the pharmacy, they are either missing doses (as measured by prolonged periods between refills) or not taking the medication at all. However, this premise is invalid if patients are obtaining medications in alternate ways (e.g. free samples, family members, or friends) or from other pharmacies. Adherence rates from pharmacy refill records are determined either by comparing actual to expected refill dates (67) or by identifying “medication gaps,” defined as periods of time during which the patient's supply of medication is assumed to have been exhausted (64).
This method of measuring adherence further relies on the major assumption that patients who receive timely pharmacy refills ingest their medications correctly. The validity of this assumption has been evaluated by examining the association between pharmacy refill adherence and biologic outcomes, and in several studies pharmacy refill adherence has been shown to correlate significantly with HIV viral load (61;68-70). In one study, pharmacy data were used to examine those who self-reported 100% adherence. Forty-one percent of this population was non-adherent by pharmacy refill records. The reclassified group had significantly poorer virologic response than those who were adherent by both self-report and pharmacy refill records (61). Another study found a significant linear trend of improved virologic response across ordered categories of pharmacy adherence. Specifically, 84% of individuals with pharmacy adherence ≥ 95% had undetectable HIV viral load whereas only 64% of individuals with pharmacy adherence between 90% and <95% achieved this clinical outcome (p=.001) (69).
The benefits of pharmacy refill records for measuring adherence include potential immunity to social desirability and reporting bias, the ability to obtain population-level data, and the absence of patient-level burden. The major disadvantage of this method is its limited feasibility, which makes it inappropriate for adherence assessments in clinical settings. Despite their limitations, pharmacy refill records should be considered an important and valid adherence measure in population-based research (71).
Monitoring of drug levels has been considered a direct, objective measure of medication adherence that is feasible in both clinical and research settings (52;72). Low drug levels have been associated with self-reported non-adherence (72;73) and virologic failure (74-77), but not all studies have shown a benefit to measuring serum drug levels. In one study an abnormally low drug level had a specificity of 88% for detecting adherence of 90% or less (75), but another study found that the addition of drug levels to a composite adherence measure did not improve the composite measure's ability to predict viral load (78).
Therapeutic drug monitoring is most severely limited by a lack of technological standardization. Procedures for sample collection, cross-validation of analytic procedures, and interpretation of results vary between settings (79-81). Further, factors other than adherence may affect drug levels, such as drug-drug interactions and diet. Lastly, serum drug levels only reflect adherence over the past 24 hours, and patients who are aware of a planned visit may ingest medication in anticipation of the test (82).
Several studies have shown that pharmacy refill and pill count adherence are lower than self-reported adherence and higher than adherence measured by EDM (9;19;44;53;56;83-87). This suggests that there is measurement error associated with each method, and that true adherence lies somewhere between the extremes of these diverse adherence estimates. Because of the weaknesses of each individual measure, it is likely that the best method of assessing adherence includes multiple measures. Techniques that combine information from multiple measures reduce the error associated with any single measure. Though using multiple measures is not feasible in routine clinical practice, in research settings investigators should be held to the most rigorous measurement standards. Though several authors have suggested that multiple adherence measures be used in research settings, data are lacking on how best to combine measures, how many measures to combine, or how many time points to include (88-90).
Liu et al developed the first composite adherence measure in 2001. This measure involved a hierarchy of individual measures, starting with EDM data, and then filling in missing data with pill counts, and if necessary, self-reports, to create a complete database. The composite score produced an adherence rate that was higher than EDM and lower than both pill counts and self-reports. Compared to each individual adherence measure, the composite score was the most highly correlated with virologic response (19).
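The hierarchical fallback logic of such a composite measure can be sketched as follows. This is a simplified illustration of the general approach, not a reproduction of Liu et al's procedure; the data structures (dicts mapping an observation period to an adherence percent, with `None` for missing data) and the function name are assumptions.

```python
def composite_adherence(edm, pill_count, self_report):
    """Illustrative hierarchical composite: for each observation
    period, use EDM adherence when available, otherwise fall back
    to pill count, then to self-report, yielding a complete record."""
    periods = set(edm) | set(pill_count) | set(self_report)
    composite = {}
    for p in sorted(periods):
        # Sources are tried in order of presumed accuracy.
        for source in (edm, pill_count, self_report):
            value = source.get(p)
            if value is not None:
                composite[p] = value
                break
    return composite
```

The ordering encodes the assumption that EDM is the least biased source, so less objective measures fill in only where EDM data are missing.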
In contrast, in an effort to improve the sensitivity of measures to detect non-adherence, some investigators have examined composite scores based on combining information from survey questions. One study combined questions about psychosocial, clinical and environmental characteristics that are associated with poor adherence and found that such a composite score had a sensitivity of 71% for detecting non-adherence (91). Another study found that a composite score based on questions about trust in physician, number of children, substance use, CD4+ T-cell count, and beliefs about antiretroviral medications, had a much higher sensitivity for predicting non-adherence than self-report (33). With rigorous development and testing, composite measures have the potential to improve ease and accuracy of adherence measurement.
Validity is an essential feature of psychometric instruments that is synonymous with truthfulness, authenticity, and accuracy (92). Traditionally, the major types of validity are content, criterion-related, and construct validity. Content validity evaluates how well the questions represent the domain they are intended to measure. Demonstrating content validity often involves having questions reviewed by experts, and is more challenging when the construct is multifaceted or poorly defined. To have content validity, adherence measures should represent the many distinct forms of medication adherence, including refill adherence, dosing interval adherence, dietary requirements, and pill quantity adherence (93).
Criterion-related validity refers to how well a measure correlates with an external criterion. Demonstrating this type of validity relies on the existence of an external criterion that accurately represents the construct of interest. Several investigators have chosen to demonstrate the validity of adherence measures by measuring the strength of their association with EDM (6;32;94). However, given the numerous sources of measurement error with EDM, it cannot be viewed as a perfect external criterion. Efforts to minimize measurement error associated with EDM will provide a more accurate external criterion against which to compare new adherence measures. While criterion-related validity is important, it provides limited information about how well adherence behaviors are actually being measured.
Lastly, construct validity reflects how well the measure matches theoretical expectations and can be demonstrated by examining expected relationships between variables, or by showing that measures are reliable indicators of an underlying construct. Most adherence measurement studies demonstrate construct validity of individual measures by assessing the strength of the association between the measure and HIV viral load. Though commonly reported, the relationship between adherence and viral load may be confounded by viral resistance, regimen potency, drug interactions, and individual differences in drug absorption and metabolism. Construct validity is often considered more important than criterion-related validity for complex behaviors, like medication adherence.
The three major types of validity were recently evaluated in a widely used measure of nicotine dependence, the Fagerstrom test for nicotine dependence (95;96). In this study, investigators examined content validity, or how well the questions represented the domain they were intended to measure, by compiling a list of published definitions of nicotine dependence and cigarette addiction and then comparing items in the Fagerstrom test against the published definitions. They found that the scale failed to assess several important domains of addictive behavior, including diminished effects with the same amount of tobacco, and unsuccessful efforts to cut down. For an adherence measure, this might be analogous to capturing missed doses but not examining dosing intervals. In both examples, lack of content validity puts the measure at risk of misclassifying respondents.
Continuing to use smoking as an example, criterion-related validity can be demonstrated by examining associations between the Fagerstrom test for nicotine dependence and an external variable that reflects nicotine dependence, for example, the number of cigarettes smoked per day or nicotine dependence measured with a different instrument. These evaluations are comparable to using EDM as an external criterion for adherence measures. In both cases, choosing an appropriate external criterion is crucial.
Evaluating expected theoretical relationships is the most common way to demonstrate construct validity. For the measurement of nicotine dependence, such theoretical relationships might include those between nicotine dependence measured by the Fagerstrom test and the following variables: likelihood of future abstinence, self-efficacy for quitting smoking, nicotine withdrawal symptoms, self-perceived nicotine dependence, and saliva cotinine (a nicotine metabolite). For adherence, theoretical relationships might include expected associations between poor adherence and patient level variables (such as depression, active drug use, or poor self-efficacy) or biologic variables (such as HIV viral load or the development of resistance).
Though this approach may not be directly applicable to all adherence measures, it is important to consider broader measurement issues when developing and testing adherence measures (12). In particular, investigators should incorporate approaches to demonstrating the three major types of validity. The skillful incorporation of these concepts will require interdisciplinary collaborations with social scientists who have expertise in measuring complex behaviors.
Adherence is difficult to measure because it is composed of several distinct behaviors. Component adherence behaviors include obtaining refills, ingesting the right number of pills, ingesting pills within an effective dosing interval, and ingesting them in accordance with dietary requirements. Often individual measures only gauge one aspect of adherence behavior, and this can threaten validity. This phenomenon, termed “construct under-representation,” occurs when a measure fails to assess important dimensions of the construct in question (92). While some adherence measures, such as EDM, provide the ability to measure several aspects of adherence, the data are not generally analyzed this way.
In addition to construct under-representation, EDM and other measures are vulnerable to another validity threat caused by measuring unrelated constructs. The term “construct irrelevant variance” is used when a measure contains excess variance due to unrelated constructs. For example, constructs that may be represented in adherence data but are unrelated to medication adherence include the “curiosity” that may cause a person to open a MEMS cap without taking a dose, “pocket dosing,” and susceptibility to the “Hawthorne effect” among participants for whom EDM itself is an adherence-improving intervention. Construct-irrelevant variance can result from a measure that is too broad, the method of administration, or a deficiency in reliability. Future measurement research should address these specific threats to validity.
Though much existing research has focused on the critical issue of medication adherence, the specific field of adherence measurement is still young. In particular, the best way for health care providers to assess individual patients' adherence is unclear, and the most rigorous way for researchers to measure adherence is unknown. Adherence measurement research has separate implications for investigators and clinicians. For adherence investigators, this work will inform adherence intervention trials by providing evidence on which to base key methodological decisions, such as which adherence measures to use and how to optimize data collection. For clinicians, examination of ways to improve self-reported adherence has the potential to decrease provider frustration, minimize adherence misclassification, and promote timely adherence interventions. Rigorous adherence measurement research will demand interdisciplinary collaborations between HIV researchers and social scientists. Improving measurement of antiretroviral adherence will allow the adherence community to embark on the development and evaluation of adherence-improving interventions with standardized and empirically tested adherence measures.
Funding: This work was funded partly by a Montefiore Medical Center Health Services Faculty Scholar Award to Dr. Berg, and also by a Center for AIDS Research grant (P30 AI51519) awarded to the Albert Einstein College of Medicine of Yeshiva University by the National Institutes of Health.