Evidence-driven decisions have become a standard for health interventions, policy, and programs. While randomized controlled trials (RCTs) are encouraged for public health interventions, there are limitations with RCTs as the gold standard of evidence for HIV interventions. We developed a novel system of evaluating evidence for assessing HIV preventive interventions termed the Highest Attainable Standard of Evidence (HASTE).
The HASTE system focuses on triangulation of three distinct categories of evidence: efficacy data, implementation data, and plausibility. We conducted systematic reviews, including experimental and observational data, to assess all available interventions for men who have sex with men (MSM). We collected implementation and programmatic data using a global electronic consultation, Internet searches, and in-person consultations. We assessed plausibility with expert analyses of both biological and public health evidence.
HASTE includes four grades of evidence: Strong (Grade 1), Conditional (Grade 2), Insufficient (Grade 3), and Inappropriate (Grade 4). We used the HASTE system to evaluate the evidence for HIV interventions for MSM in low- and middle-income countries. Several differences emerged in the strength of recommendation with the use of the HASTE system, including strong recommendations for voluntary counseling and testing and for structural interventions.
The HASTE system addresses a need for an evidence evaluation tool that is specific for HIV interventions and facilitates an evaluation of biomedical, behavioral, and structural approaches using the highest standard of attainable evidence. HASTE represents a tool that balances scientific integrity and practicality in assessing the quality of evidence of preventive interventions targeting the most-at-risk populations for HIV.
Evidence-based decision-making has become a global standard for health interventions, policy, and programs. In the human immunodeficiency virus (HIV) arena, this methodology has been a welcome trend. While randomized controlled trials (RCTs) with HIV endpoints are encouraged for public health interventions, there are limitations with using RCT as the gold standard of evidence for HIV interventions.1–3 These limitations include the very high cost of efficacy trials, the relative scarcity of populations with sufficient incidence in which to mount trials, and the ethical imperative to compare experimental treatments with ever more potent control conditions, which can mitigate the ability to assess efficacy. In addition, some interventions have never been formally evaluated with RCTs but have demonstrated significant impact from implementation and observational research, making RCTs now either unfeasible or ethically challenging. An example of this limitation is the use of needle and syringe exchange programs for injecting drug users (IDUs), for which many would argue the window of opportunity to conduct an RCT has long passed.4
Within the realm of clinical medicine, evidence-based medicine (EBM) is now considered the basis by which to define standards of clinical care. Defining packages of clinical services has been predicated on systematic reviews of individually randomized double-blinded placebo-controlled trials of medications and/or services for patients with varying clinical conditions. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system has been widely endorsed as the most effective method with which to grade the current state of evidence for a variety of clinical interventions.5,6 The GRADE system was designed for individual-level clinical interventions in which the traditional hierarchy of evidence is applied. Specifically, the highest-quality evidence is derived from double-blinded placebo-controlled RCTs, followed by unblinded RCTs, prospective cohort studies, case-control studies, clinical case series, and consensus among experts. Additional weight is given to appropriately executed systematic reviews and meta-analyses of studies with little heterogeneity among participants, methods, and results.
To further standardize the presentation of evidence in clinical interventions and meta-analyses, criteria including the Consolidated Standards of Reporting Trials (CONSORT) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) were developed, which have facilitated the grading of evidence.7,8 Given the nature of individual-level clinical interventions, the GRADE system has facilitated the development of clinical practice guidelines and other clinical practice tools to promote the practice of EBM.9 The GRADE system is also relevant, as it integrates the potential for a separation between quality of evidence and strength of recommendations based on extenuating circumstances, such as cost-efficacy, risk-benefit, and contextual factors.10
While there is general acceptance of the use of the GRADE system in clinical medicine, there has been no widely accepted standard for grading public health interventions, although a series of different algorithms and hierarchies of evidence have been proposed.1,11–13 Varying algorithms of evaluating public health interventions are also in use by the International Agency for Research on Cancer, the United States Preventive Services Task Force (USPSTF), the newly revived Canadian Task Force on Preventive Health Care (CTFPHC), and the National Institute for Health and Clinical Excellence in the United Kingdom, among others. While Hill's criteria for causality, including strength of the relationship, dose-response, temporality, experimental evidence, analogy, and biologic plausibility, still apply to public health interventions, it is more difficult to demonstrate efficacy using traditional evaluation strategies.14 One reason for this difficulty is that public health interventions tend to be context-specific, including geographic and socioeconomic contexts, and are generally multifaceted. Moreover, primary prevention strategies targeting at-risk populations may be subject to the prevention paradox first described by Rose.15 The prevention paradox describes a situation in which an effective population-level public health intervention may provide only little benefit at the individual level while being significant at the population level, thereby complicating the measurement of effectiveness or efficacy.16
There is an emerging awareness of the significant roles the key or most-at-risk populations (MARPS) may play in HIV epidemics. Many country epidemics began as concentrated epidemics among MARPS including gay men and other men who have sex with men (MSM), sex workers, and IDUs, and then transitioned to more generalized epidemics. The role of MARPS in concentrated epidemics is relatively uncontroversial. However, in generalized epidemic settings, the initial presentation of numerous HIV epidemics was among MSM.3,4 With the emergence of generalized epidemics, the role of these three MARPS and other country-specific MARPS, such as truckers, internally displaced people, and victims of gender-based violence, has been given less attention. However, there is a growing evidence base of disproportionate risk among MARPS including sex workers, MSM, and IDUs in these settings.17,18 Recent assessments of global HIV prevention suggest that few HIV/acquired immunodeficiency syndrome (AIDS) prevention, treatment, and care programs include targeted programming for these populations.19 Given this lack of funding, infections continue to increase in the context of slowing epidemics in the general population and, with the exception of IDUs, there has been limited progress in defining the optimal package of services for MARPS in low- and middle-income settings.
In response to this lack of progress, our multi-disciplinary team worked to develop a novel system of evaluating evidence for HIV interventions aimed at decreasing HIV risk specifically among MSM. We propose the term “Highest Attainable Standard of Evidence” (HASTE). HASTE was initially used to define a package of services for preventing HIV infection among MSM in low- and middle-income countries (LMIC). Other derivatives of the GRADE system have been suggested, including by Tang et al.13 When stigma affecting MARPS is compounded with the difficulties and limitations of RCTs evaluating public health interventions with biological endpoints, the evidence base for any HIV intervention supporting these vulnerable populations is limited. Thus, while our initial intent was to use the GRADE methodology to evaluate individual interventions, it became clear that the system required modifications to include assessment of what we termed HASTE. HASTE deliberately echoes the language of human rights conventions on the right to health, which accept that what can realistically be attained in resource-constrained environments can still serve as life-saving aspirational goals.20,21 The aspiration was to develop a system that balances scientific integrity with the need to make recommendations for prevention programming and policies for an understudied and underserved population.
The HASTE system proposed in this article is specific to HIV interventions among MSM and focuses on the triangulation of three main characteristics: efficacy data, implementation data, and biological and public health plausibility, although it may be adaptable to other public health areas and disciplines. The inclusion of a review of efficacy data is a common denominator across all systems evaluating levels of evidence. However, the response to preventing HIV infection has differed compared with most other clinical conditions in that implementation of these preventive measures is generally managed by civil society and not-for-profit nongovernmental institutions rather than health-care facilities. Although RCT data from interventions are often unattainable, these interventions constitute a predominant majority of HIV prevention services in LMIC. Thus, our group considered capturing these implementation data that are crucial in informing the optimal package of services for MSM.
Hill's criteria for causality remain the most relevant set of determinants of whether a risk factor causes disease or an intervention causes prevention or mitigation of a disease. One of these criteria is plausibility, and our group, like others before us, considered this criterion vital.13 However, in considering the public health plausibility of an intervention, we assessed whether a preventive intervention with limited demonstrated efficacy in preventing HIV as a biological endpoint was within the causal pathway of another intervention that does have demonstrable effectiveness in preventing HIV infection. We will use the example of the role of voluntary counseling and testing to illustrate the importance of public health plausibility for the HASTE system (Figure 1).
To review the evidence for effective interventions to prevent HIV among MSM, we used a process first proposed by the Centers for Disease Control and Prevention's (CDC's) Community Preventive Services Task Force when developing an evidence-based guide to public health interventions.10 This process included forming a multidisciplinary team, developing an approach to assessing the evidence for interventions, selecting which interventions would be included in the review, developing systematic search strategies for each intervention, implementing these search strategies, assessing and summarizing the quality of evidence using a standardized abstraction tool, and then translating this evidence into a set of recommendations for including interventions into a combination HIV prevention/interventions package. Finally, through the global consultation for implementation data describing interventions supporting the needs of MSM in LMIC, we considered information and data outside the realm of effectiveness, and have identified and summarized research gaps (Figure 2). Two reviewers independently abstracted data from the identified reports using a standardized abstraction tool. Conclusions were presented to a multidisciplinary group for review before final adoption of the recommendation.
We used a scoping review as defined by Arksey and O'Malley to determine individual components of the package of preventive services for MSM.22 In brief, the scoping methodology is focused on mapping literature relevant to HIV preventive interventions for MSM without assessing the quality of included studies. These reviews included the development of a specific search protocol with selection criteria, but did not include tracking the numbers of included studies. We used sensitive search terms to facilitate the development of an overview of the state of prevention sciences to help facilitate decisions on which prevention methods were to be further assessed. Once we defined the potential components of the package of services, we completed an umbrella review of systematic reviews if systematic reviews of the efficacy of individual components had been previously completed, such as assessing the efficacy of behavioral interventions to decrease HIV risk among MSM.23 However, each individual article used in the systematic reviews was collected to assess specific inclusion and exclusion criteria.
We completed novel systematic reviews with defined search protocols when no previous systematic reviews were available for individual components of HIV prevention for MSM. When quantitative outcomes were available, such as changes in self-reported unprotected anal intercourse secondary to targeted interventions, we completed a meta-analysis. If meta-analysis was not possible, either because there was only one study or because study designs varied significantly between studies, relevant data were abstracted with the aforementioned tool, conclusions on efficacy were made by three different investigators, and results were compared until consensus was reached.
We used several different methods to harness implementation data.24 The majority of small implementers with limited scope do not have websites to post these data, but they do produce digital reports of their programs for their funders. As such, an electronic global consultation was completed in October 2009. Letters requesting information on epidemiology, rights contexts, and programming for MSM were sent out using dedicated listservs in Asia, Africa, Latin America and the Caribbean, and Eastern Europe. These letters were also sent out by key funders of related initiatives including the American Foundation for AIDS Research (known as amfAR) with its MSM Initiative to its grantees, the MSM Global Forum for HIV, and key United Nations (UN) agencies including UN Development Programme (UNDP) and UNAIDS. In addition, we contacted key informants in 28 countries requesting information specific to their country. In total, we retrieved implementation data from 68 LMIC pertaining to MSM. To attain implementation data from larger implementers, we searched Google and Bing using similar keywords to what was used for the systematic searches. In addition, we individually searched the websites of large international HIV prevention implementers known to provide services for MARPS for further documentation of programs and outcomes.
Separate from this search, a global consultation was held after the umbrella review, separate systematic reviews, and further review of electronic documents describing implementation data. This consultation included representation from several UN agencies including the UNDP, UNAIDS, World Health Organization, and the UN Population Fund, in addition to key informants from 15 countries where a member-checking and results-validation session was completed. We presented the package of services using the framework of biomedical, behavioral, and structural interventions along with the evaluation approach. As this consultation was held early in the process of developing this set of recommendations, the feedback on the process and content of this package facilitated appropriate changes being made to the package of recommendations and evaluation system. One of the key changes was the need to review the evidence supporting conversion or reparative therapies, whose focus is to try to change sexual orientation, as these interventions form the base for many interventions targeting MSM in LMIC with a focus on regions in the Middle East and North Africa.25
While there is no common evaluation framework used by HIV implementers large or small, our group used the Reach, Efficacy, Adoption, Implementation, and Maintenance (RE-AIM) framework to assess the relative importance of key interventions.26 The framework has five key components: programmatic reach; efficacy or effectiveness; and measures of adoption, implementation, and maintenance. The RE-AIM framework is well established for non-randomized processes and program evaluations.27–29
Reach functions as a measure of program scale and can be measured in several ways, including the number or percentage of the target audience being reached. Because the denominator or true population size of MSM is rarely known, programs describe the absolute number of people reached as the most common characterization of scale. Generalizability of the population reached in the particular program is also included under reach.
Efficacy or effectiveness is measured by assessing the changes in a set of appropriate outcomes, the impact on the quality of life of participants, and any potential adverse effects of the program. Adoption is measured by assessing the proportion of those targeted by the intervention who actually participated in the intervention. It can also be considered as a measure of acceptability of this program with impact on ultimate program uptake. Implementation-related issues include a feasibility assessment of the program with key components, including a proprietary assessment, program evaluability, resource needs for the program (e.g., fixed and marginal costs), and legality. The assessment of program implementation also evaluates whether the program was consistently delivered episodically or in different geographic regions.
Finally, maintenance of the program represents sustainability of the programmatic outcomes generally assessed with cutoffs of either six months or one year. The non-randomized study or program data abstraction tool focused on extracting quantitative and qualitative data from programmatic studies under each of the realms of RE-AIM. While the abstraction tool and framework for evaluation included the potential for assessing all components of RE-AIM, most programmatic descriptions did not report each of these elements.
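To make the RE-AIM measures described above concrete, the following is a minimal sketch in Python of how an abstracted program record might be represented. It is an illustration only and is not the abstraction tool used in this study; the names `ReAimRecord`, `reach`, and `adoption` are hypothetical. It assumes that reach is reported as a percentage only when a population size estimate exists, and that adoption is the proportion of those targeted who participated.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class ReAimRecord:
    """One program description abstracted under the five RE-AIM realms.

    Hypothetical structure: None means the program did not report that realm.
    """
    reach: Optional[str] = None
    efficacy: Optional[str] = None
    adoption: Optional[str] = None
    implementation: Optional[str] = None
    maintenance: Optional[str] = None

    def missing(self) -> List[str]:
        # List the realms that the programmatic description left unreported.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


def reach(people_reached: int, population_size: Optional[int] = None) -> Tuple[float, str]:
    """Report reach as a percentage when the denominator is known,
    otherwise as the absolute number of people reached."""
    if population_size is None:
        return people_reached, "people"
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return 100 * people_reached / population_size, "percent"


def adoption(participants: int, people_targeted: int) -> float:
    """Proportion of those targeted by the intervention who participated."""
    if people_targeted <= 0:
        raise ValueError("people_targeted must be positive")
    return participants / people_targeted
```

Because the true population size of MSM is rarely known, `reach(500)` falls back to the absolute count, and `ReAimRecord(reach="500 people").missing()` lists the four realms a report of that kind would leave unassessed.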
We applied the biological plausibility criterion to assess biomedical interventions using current levels of knowledge of biological causal pathways. We assessed public health plausibility using conceptual frameworks of HIV risk among MSM developed by members of the team akin to the analytic frameworks used by the Guide to Community Preventive Services.30
The aforementioned global consultation meeting served as an interim validation and review of the recommendations suggested in the package of services. A separate process for ensuring the appropriateness, akin to the GRADE values and preferences process, of the conclusions included peer review by HIV/AIDS prevention experts not connected with the study.6 The external reviewers included both academic experts in HIV/AIDS epidemiology and targeted interventions for MSM, as well as experts in implementation of interventions for MSM in high-, middle-, and low-income settings.
The HASTE process for HIV/AIDS interventions was modified from several tools used to evaluate public health interventions, including ones developed by USPSTF, CTFPHC, and GRADE, to evaluate the quality of evidence of HIV interventions for MSM. The HASTE system gives the highest weight of efficacy evidence to RCTs, systematic reviews, and meta-analyses of RCT studies where attainable. The differentiating factor of this system is that other experimental studies, such as non-randomized controlled studies and pre- and post-assessments of interventions, were also included alongside RCTs pending an assessment using the RE-AIM framework.
This approach generated four grades of evidence for HASTE—Strong (Grade 1), Conditional (Grade 2), Insufficient (Grade 3), and Inappropriate (Grade 4)—depending on the amount of efficacy trial and implementation data available for a given intervention (Figure 3). Grade 1 (i.e., strong) recommendations are given when there are available efficacy or implementation data that the benefit of an intervention clearly outweighs the potential risks, or there are data that the intervention addresses a known causal risk factor of HIV risk among MSM using the approach of assessing plausibility.
Grade 2 (i.e., conditional) recommendations are given when there is potential efficacy for an intervention, but further research or implementation data are required to make a strong recommendation. Conditional recommendations may be given when there is not a large body of implementation data and limited quality of experimental research evaluating a particular intervention. Specifically, for Grade 2a (i.e., probable) interventions, there may be limited or no efficacy data, but there are plausibility and consistent implementation data highlighting the importance of this intervention. For Grade 2b (i.e., possible) interventions, there are limited or inconsistent efficacy data, plausibility, but only limited implementation data from non-randomized studies or programmatic data. The most common difference between interventions receiving probable or possible recommendations is the amount of non-randomized data available for assessment using the RE-AIM framework. In addition, for some interventions such as circumcision, Grade 2b recommendations were given if the intervention had the potential to be beneficial for a small subset of MSM, such as those who have high insertivity ratios and multiple female partners.31 Separately, for some interventions, Grade 2c (i.e., pending) recommendations are given when there are ongoing efficacy trials among MSM.
Grade 3 (i.e., insufficient) is applied when there is inconsistent evidence for the efficacy of a particular intervention, the causal pathway is unclear or not well defined, and there are limited non-randomized study or programmatic data about a particular intervention. Grade 4 (i.e., inappropriate) is given when there is evidence of no efficacy or effectiveness of an intervention, where there are efficacy or consistent implementation data suggesting harm, where risks outweigh potential benefits, or where there is consensus from implementation data that this is an inappropriate intervention.
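The grade definitions above can be summarized as a rough decision sketch in Python. This is a simplification under stated assumptions and not the procedure used by the authors: HASTE grades were assigned through expert review and consensus, not by an algorithm. The names `Data`, `Assessment`, and `haste_grade` are hypothetical, the boolean fields stand in for reviewer judgments, and the order in which the rules are checked is an assumption, since the text does not specify precedence.

```python
from dataclasses import dataclass
from enum import Enum


class Data(Enum):
    NONE = "none"
    LIMITED = "limited or inconsistent"
    CONSISTENT = "consistent"


@dataclass
class Assessment:
    efficacy: Data                      # state of efficacy data
    implementation: Data                # state of implementation data
    plausible: bool                     # biological or public health plausibility
    benefit_outweighs_risk: bool = False        # reviewer judgment
    addresses_causal_risk_factor: bool = False  # reviewer judgment via plausibility
    harmful_or_ineffective: bool = False        # harm, no efficacy, or consensus against
    trial_ongoing: bool = False                 # efficacy trial among MSM in progress


def haste_grade(a: Assessment) -> str:
    # Assumed precedence: evidence of harm or no effect overrides everything else.
    if a.harmful_or_ineffective:
        return "4"
    has_data = a.efficacy is not Data.NONE or a.implementation is not Data.NONE
    if (a.benefit_outweighs_risk and has_data) or a.addresses_causal_risk_factor:
        return "1"
    if a.trial_ongoing:
        return "2c"
    if a.plausible and a.implementation is Data.CONSISTENT:
        return "2a"
    if a.plausible and a.implementation is Data.LIMITED:
        return "2b"
    # Unclear causal pathway or too little data of any kind.
    return "3"
```

For example, an intervention with no efficacy data but with plausibility and consistent implementation data, `Assessment(Data.NONE, Data.CONSISTENT, True)`, maps to Grade 2a in this sketch.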
We used the HASTE system to evaluate the evidence for individual HIV-related interventions for MSM in LMIC. The results of this process are presented in Figure 4, although the complete information and rationale for selection are available in the World Bank report by Beyrer et al.32 Figure 4 also presents an analysis of which interventions are supported by RCTs in both high-income countries and LMIC, as defined by the World Bank Atlas Method.33 The only interventions evaluated using RCTs in LMIC are individualized risk-reduction counseling, brief client-focused counseling, community-level interventions, and preexposure prophylaxis. Of these interventions, only a single HIV intervention, preexposure prophylaxis, has been evaluated in an RCT among MSM in a generalized HIV epidemic. Thus, these are the only interventions that would be awarded four points as a starting point based on type of evidence under the GRADE system, whereas all observational studies would be given a starting value of two. Moreover, only the oral chemoprophylaxis study included an objective outcome of HIV seroincidence, involved blinding, and had high retention rates; thus, the others would lose between two and three points based on quality of evidence. Directness (generalizability) of all studies but oral chemoprophylaxis is limited given that it was the only one with efficacy completed in an LMIC. Finally, effect size as graded by having a measure of association greater than two or five would also be a limitation, as no HIV intervention has been shown to decrease HIV incidence with this magnitude. With this scoring framework, where a numerical value of four or higher is associated with a high strength of recommendation (three with medium, two with low, and one or lower with a very low-strength recommendation), only oral chemoprophylaxis would receive a strong recommendation for MSM in LMIC.
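The point-based comparison described above can be expressed as a short Python sketch. It encodes only the simplified scoring summarized in this section (a starting value of four for RCTs and two for observational studies, deductions for quality of evidence, and the stated cutoffs) and should not be read as the full GRADE procedure. The function names and the single `quality_deduction` parameter are hypothetical.

```python
def starting_points(is_rct: bool) -> int:
    # RCTs start at four points; observational studies start at two.
    return 4 if is_rct else 2


def strength_of_recommendation(points: int) -> str:
    # Cutoffs as stated in the text: >=4 high, 3 medium, 2 low, <=1 very low.
    if points >= 4:
        return "high"
    if points == 3:
        return "medium"
    if points == 2:
        return "low"
    return "very low"


def score(is_rct: bool, quality_deduction: int = 0) -> str:
    # Subtract points lost for quality of evidence from the starting value.
    return strength_of_recommendation(starting_points(is_rct) - quality_deduction)
```

Under these assumptions, an RCT that loses two or three points for quality of evidence ends at a low or very low strength of recommendation, which is why only the oral chemoprophylaxis trial would retain a strong recommendation under this kind of scoring.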
Several differences emerge in the strength of recommendation with the use of the HASTE system compared with a classical RCT-driven evaluation system. These differences were strong recommendations for voluntary counseling and testing and for structural interventions. An example of the abbreviated HASTE analysis for voluntary counseling and testing is shown in Figure 1. Structural-level risk interventions among MSM have not been evaluated with an RCT, partly because of the complexity of the study designs required to characterize the efficacy and effectiveness of these interventions, and partly due to logistical and cost considerations. However, there are convincing public health plausibility and implementation data that highlight the importance of these interventions to facilitate the development and delivery of HIV services for MSM in LMIC.32
Among MSM in LMIC, there is a confluence of high HIV-related risk practices at the individual level with structural barriers to HIV preventive services including stigma, discrimination, and criminalization. Drivers of HIV risk among MSM occur at multiple levels including the individual, social, and sexual networks; community; and public policy realms. Together, these factors have resulted in a disproportionate burden of HIV disease among MSM wherever studied, including generalized epidemics.18 The imperative to mount evidence-based and comprehensive responses to this disproportionate risk is urgent, as more than three decades into the epidemic most LMIC have no dedicated services for MSM. The initial systems for evaluating evidence to define recommendations, including the GRADE system, were initially designed for individual-level clinical interventions. Moreover, the available systems in the literature evaluating public health interventions did not include detailed assessments of implementation data, which comprise a large component of the evidence base in the field of HIV/AIDS prevention. There is significant variability in organizations focused on implementing HIV prevention service delivery, ranging from small community-based organizations to large not-for-profit institutions active in a number of countries spanning multiple continents. Many of the smaller community-based organizations serving the needs of MARPS tend to be led by members of these same populations that recognize the vulnerable state and disproportionate burden of HIV risk and disease among their peers.
Given limited resources, human and financial resources are focused on service delivery rather than sophisticated evaluation strategies. Larger HIV prevention service implementers often evaluate interventions using process indicators of HIV prevention rather than the measurement of biological endpoints. Moreover, HIV implementers rarely submit the results of their programs for publication in peer-reviewed journals. This publication dearth is likely because smaller-scale implementers lack the inherent technical capacity to do so and because larger implementers do not allocate resources to this activity. The HASTE system presented in this article was developed to be able to comprehensively assess all forms of evidence relating to HIV prevention for MARPS and was not developed as a tool to evaluate evidence relating to all public health interventions. Rather, this tool was used to develop recommendations for combination HIV preventive interventions for MSM in LMIC. The full HASTE analysis supporting Figure 4 is included in the World Bank report, “The Global HIV Epidemics Among Men Who Have Sex with Men.”32 Figure 4 summarizes the interventions assessed, demonstrating that few of these interventions have been tested with RCT, although many are considered the standard of care rendering RCT level of evidence unattainable. Even fewer of these interventions have been tested in LMIC, where the risk environment is often heightened because of stigma and discrimination. Consequently, if RCT-level evidence is held as the standard by which one can make strong assertions of evidence efficacy, there would be few interventions that would receive a strong recommendation. As such, we believe that the HASTE tool can also be used to define an appropriate package of HIV services for other vulnerable populations including sex workers, IDUs, victims of gender-based violence, and internally displaced or other migrant populations.
There were clear limitations to the HASTE system presented in this article. For one, there is no common evaluation framework for HIV/AIDS implementers, and the evaluation of these programs tends to be specific to the identified needs of the funders focusing on process rather than outcome indicators. While we used the RE-AIM system that has been previously used to evaluate health sector programming, a standardized system of evaluating and reporting non-randomized research studies was suggested by CDC and termed Transparent Reporting of Evaluations with Non-Randomized Designs (TREND).34 However, this system has not yet been widely adopted by HIV implementing organizations. Standardizing the evaluation and reporting of HIV interventions would likely increase the availability of evidence in the peer-reviewed literature. In addition, when including implementation data from a series of organizations, there is inconsistency in the quality of these data, given that the primary purpose of these programs is to provide services rather than to conduct research.
To compensate for these limitations, the HASTE system combines implementation data with efficacy data and plausibility analysis. Observational studies and analysis of implementation data do not allow for analysis at the level of the individual, thereby making interpretation subject to ecological fallacy. Additionally, reproducibility of the HASTE system for MSM in LMIC may be difficult, as it involves retrieving implementation data that are not all indexed in the public domain. Significant effort would be required to retrieve these data. Finally, the HASTE system purposefully does not assign a numerical score to different components of the assessment, which may be perceived as being less objective than score-based systems such as GRADE. The development team decided that the assignment of abstract numerical values to evidence attributes is itself a subjective judgment. However, we believe that despite these limitations, the system presented represents a tool that balances scientific integrity and practicality in assessing the quality of evidence of HIV/AIDS preventive interventions targeting MARPS for HIV.
In the context of a slowing HIV pandemic, HIV incidence and prevalence continue to increase among MARPS. HIV interventions for these populations have been inconsistently implemented, which is likely secondary to inadequate political motivation to address vulnerable populations, insufficient targeted funding, and the lack of a means to define an optimal package of services in resource-constrained settings. We have developed the HASTE system to define an appropriate package of HIV services including biomedical, behavioral, and structural approaches using the highest standard of attainable data. This system can be used to advocate for a package of HIV prevention, treatment, and care services for MARPS in low- and middle-income settings, allowing advocacy for the appropriate scale-up of these services in response to evidence-based need. This study relied on publicly available documents and was, therefore, exempt from Institutional Review Board determination.
This study was a component of a project funded by the Global AIDS Monitoring and Evaluation Team of the World Bank. The World Bank did not have any role in the study design, data collection and analysis, decision to publish, or preparation of the article.