Intimate partner violence (IPV) screening remains controversial. Major medical organizations mandate screening, whereas the U.S. Preventive Services Task Force (USPSTF) cautions that there is insufficient evidence to recommend for or against screening. An effective IPV screening program must include a screening tool with sound psychometric properties. A systematic review was conducted to summarize IPV screening tools tested in healthcare settings, providing a discussion of existing psychometric data and an assessment of study quality.
Three published literature databases were searched from their inception through December 2007; this search was augmented with a bibliography search and expert consultation. The review was conducted from the end of 2007 through 2008. Eligible studies were English-language publications describing the psychometric testing of an IPV screening tool in a healthcare setting. Study quality was judged using USPSTF criteria for diagnostic studies.
Of 210 potentially eligible studies, 33 met inclusion criteria. The most studied tools were the Hurt, Insult, Threaten, and Scream (HITS, sensitivity 30%–100%, specificity 86%–99%); the Woman Abuse Screening Tool (WAST, sensitivity 47%, specificity 96%); the Partner Violence Screen (PVS, sensitivity 35%–71%, specificity 80%–94%); and the Abuse Assessment Screen (AAS, sensitivity 93%–94%, specificity 55%–99%). Internal reliability (HITS, WAST); test–retest reliability (AAS); concurrent validity (HITS, WAST); discriminant validity (WAST); and predictive validity (PVS) were also assessed. Overall study quality was fair to good.
No single IPV screening tool had well-established psychometric properties. Even the most common tools were evaluated in only a small number of studies. Sensitivities and specificities varied widely within and between screening tools. Further testing and validation are critically needed.
Intimate partner violence (IPV) is a major public health problem associated with adverse health consequences for victims.1–3 Healthcare settings represent important sites for IPV screening and intervention. In 2004, however, the U.S. Preventive Services Task Force (USPSTF) concluded that there was “insufficient evidence to recommend for or against routine screening of women for IPV.”4 This recommendation reflects limited empirical data about the potential negative impacts of screening and about effective interventions that decrease IPV. Conducting rigorous research is critical to determine the potential negative impacts of screening and to establish effective interventions. In order to conduct this research, however, investigators need psychometrically sound IPV screening tools.
Clinicians also should be aware of the psychometric properties of empirically tested IPV screening tools. Despite the USPSTF recommendation, most major medical organizations (including the American Medical Association [AMA], the American Academy of Pediatrics [AAP], the American Academy of Family Physicians, the American College of Obstetricians and Gynecologists, and the American College of Emergency Physicians) recommend routine IPV screening as a part of standard patient care.5–8 With their recommendation for routine IPV screening, leaders of the AMA and the AAP acknowledged that the state of the art for measuring behavioral-health outcomes is relatively undeveloped, but they cautioned that waiting for empirical evidence of improved outcomes jeopardizes the health of millions of victims.8
Within the past 5 years, researchers have developed and tested a wide variety of IPV screening tools. Comprehensive reviews of IPV screening tools, however, are limited, and there has been no synthesis of the psychometric data from existing tools.9–11 In 2002, Fogarty et al.9 summarized IPV screening tools, based on a search of studies published between 1966 and 2001; much of the extant research was published subsequent to their review. Additionally, the CDC recently conducted a systematic review and published a compilation of IPV screening instruments for healthcare providers.12 The CDC publication included a table of published and unpublished screening tools, and it contained the instruments themselves. Neither the reviews to date, nor the CDC publication, however, discussed the strength of the published psychometric data or evaluated study quality. Therefore, the current review was designed to accomplish these objectives through systematically summarizing IPV screening tools tested in healthcare settings.
For the current review, IPV was defined as physical, sexual, or emotional abuse or battering (including fear and coercive control) between intimate partners. For inclusion, studies had to (1) determine the psychometric properties of IPV screening questions; (2) test the IPV screening tool in a medical setting such as internal medicine, family practice, obstetrics–gynecology, the emergency department, or pediatrics; (3) be written in English; and (4) be published in a peer-reviewed journal. The IPV screening questions could be part of a larger screening questionnaire provided that the authors tested and reported the psychometric properties of the IPV questions specifically.
Studies focusing on the following subjects were excluded: (1) elder abuse or child abuse; (2) IPV perpetration; (3) assessment of different screening methods (such as verbal versus written); (4) IPV prevalence; and (5) IPV severity or frequency using longer, established tools intended for research (including the Conflict Tactics Scale [CTS], the Index of Spousal Abuse [ISA], the Composite Abuse Scale [CAS], and the Abuse Behavior Inventory [ABI]).
Three published literature databases (MEDLINE via PubMed, CINAHL Plus, and PsycINFO) were searched from their start through December 2007. The following search terms were used: (domestic violence OR intimate partner violence OR spouse abuse OR battered women) AND (questionnaires OR measure OR instrument OR screening). The names of identified screening questionnaires (such as the Abuse Assessment Screen [AAS]) also were used as search terms. The reference sections of all included studies and related review articles were searched for potentially relevant articles.
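The search strategy described above can be sketched as a boolean query assembled from two groups of terms. This is only an illustration: the grouping of the terms into a topic group and a method group, the quoting of multi-word phrases, and the helper names `or_group` and `build_query` are assumptions, and each database may require its own syntax.

```python
# Illustrative sketch only: the term lists come from the text, but the
# grouping, quoting, and helper names are assumptions, not the authors' code.
topic_terms = [
    "domestic violence",
    "intimate partner violence",
    "spouse abuse",
    "battered women",
]
method_terms = ["questionnaires", "measure", "instrument", "screening"]


def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(topic, method):
    """Require one topic term AND one method term."""
    return or_group(topic) + " AND " + or_group(method)


query = build_query(topic_terms, method_terms)
print(query)
```

Keeping the two term groups in separate lists makes it simple to append the names of individual screening tools as additional search terms.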
Data extraction and synthesis were conducted from the end of 2007 through 2008. The initial literature search yielded a total of 2420 articles in PubMed, 1218 articles in CINAHL Plus, and 868 articles in PsycINFO. Eight additional articles were located through the IPV screening tool name-based searches. Titles of articles were reviewed to screen for eligibility and duplication among online databases. Because the initial search was purposefully broad, many titles reflected studies that were not relevant. Abstracts of the articles were examined if eligibility was not evident from the title alone.
After completing the initial screen for eligible articles and eliminating duplicates across databases, 210 potentially eligible articles remained. These articles were then abstracted using a pre-specified form to record relevant study content and to determine whether the study met inclusion criteria. Final review narrowed the initial set of 210 articles down to 33 articles13–45 that met all the inclusion criteria. Reasons for exclusion are detailed in Figure 1.
The quality of each of the remaining 33 articles was evaluated based on a 14-point scale developed for this systematic review. Items on the quality scale were derived from standards used by the USPSTF for diagnostic studies and from previously published work46,47 evaluating the quality of observational studies. Specifically, the following USPSTF criteria for evaluating the internal validity of diagnostic accuracy studies were applied: credible reference standard (CTS, ISA, CAS, ABI) performed regardless of screening test results; spectrum of IPV risk for participants; and sample size. Three additional factors also were considered: (1) external validity/generalizability (including number of study sites and provision of demographic and socioeconomic status [SES] data); (2) study description of consenting versus nonconsenting patients; and (3) appropriate description and conduct of statistics. Inter-reviewer agreement was high overall (Pearson correlation r = 0.77). Papers with scores of 13–14 were considered excellent, 10–12 good, 7–9 fair, and ≤6 poor.
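The score bands at the end of the preceding paragraph can be expressed as a small lookup. This is a minimal sketch: the function name `quality_category` is hypothetical, and the assumption that valid scores are whole numbers from 0 to 14 is ours, since the text states only that the scale has 14 points.

```python
def quality_category(score: int) -> str:
    """Map a quality score to the rating bands given in the text.

    Assumes integer scores from 0 to 14 (the lower bound is an assumption).
    """
    if not 0 <= score <= 14:
        raise ValueError("score must be between 0 and 14")
    if score >= 13:
        return "excellent"
    if score >= 10:
        return "good"
    if score >= 7:
        return "fair"
    return "poor"


# Scores at the boundary of each band
for example_score in (14, 12, 9, 6):
    print(example_score, quality_category(example_score))
```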
The most studied IPV screening tools were the Hurt, Insult, Threaten, and Scream (HITS),13–15,24,43 the Woman Abuse Screening Tool/Woman Abuse Screening Tool-Short Form (WAST/WAST-SF),15–17,25,26,44 the Partner Violence Screen (PVS),22–26,44 and the AAS.30,35–37 These screening instruments are summarized in Table 1, which includes the specific questions and scoring for each screening tool, demographics of the populations on whom the screening tool has been tested, and a summary of the screening tools’ psychometric properties.
Initial development and testing of the four-item HITS involved family physicians and family practice offices, although the screening tool since has been evaluated in diverse outpatient settings. Two24,43 of the five studies13–15,24,43 investigating the psychometric properties of the HITS enrolled men, and one investigated a Spanish-language version.13 Four studies13,14,24,43 tested the sensitivity and specificity of the HITS. Sensitivities varied widely by population and were lower in men than in women. Internal reliability and concurrent validity also were tested and found to be acceptable.13–15,43
Like the HITS, the eight-item WAST was originally developed for family physicians, but subsequently it has been tested in the emergency department. The WAST has been evaluated in Spanish-speaking patients.17 A two-item short-form version uses the first two questions, which ask general relationship questions as opposed to specific questions about violence. Only one study tested the sensitivity and specificity of the eight-item WAST;44 two studies tested the WAST-SF in combination with other screens and/or physical signs;25,26 and one study compared the eight-item version to the short form.17 Two studies16,17 found that the WAST has good internal reliability. One study16 documented acceptable concurrent validity, and one study17 found that the WAST differentiated abused and non-abused women.
The three-item PVS was developed as a brief instrument for the emergency department. The authors conducted the primary development and testing of the tool exclusively with women, although Mills et al.24 later tested the instrument with men. Three studies22,24,44 assessed the sensitivity and specificity of the PVS, reporting a wide range of sensitivities. Two additional studies25,26 examined the sensitivity and specificity of an “augmented” PVS. Houry et al.23 established the predictive validity of the PVS plus three additional questions. The authors found that women positive for IPV on the initial augmented PVS were 11 times more likely to report having experienced physical abuse at a 4-month follow-up assessment than women who were negative on the initial screen.
The five-item AAS was created to detect abuse perpetrated against pregnant women. The screening tool has been tested predominantly with young, poor women. Two36,37 of four studies30,35–37 evaluating the AAS enrolled women in countries other than the U.S. (Brazil and Sri Lanka). Two studies30,37 calculated the sensitivity and specificity of the complete AAS; a third36 evaluated the sensitivity and specificity of the pregnancy question only. Test–retest reliability was acceptable in one study.37
See Appendix A, available online at www.ajpm-online.net, for a summary of the content and quality of the 33 included studies.13–45 The 33 articles evaluated a total of 21 IPV screening tools. This number reflects the fact that some sets of IPV screening questions were tested in multiple papers. For example, five papers studied the psychometric properties of the HITS.13–15,24,43
The majority of studies were categorized as either fair (15) or good (14). Two studies were rated as excellent, and two were rated as poor.
Of the 21 IPV screening tools, 16 assessed physical violence and five did not (Women’s Experiences with Battering [WEB]18; one-item screening tool by Peralta et al.20; SAFE-T31; two-item screening tool by Webster et al.39; and five-item screening tool by Zink et al.42). Seventy-one percent (15/21) of screening tools assessed threats or fear. Only approximately half (11/21) asked respondents about emotional abuse. Finally, just one third (7/21) included items about sexual abuse.
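As a quick arithmetic check, the proportions in this paragraph can be recomputed from the reported counts. The sketch below assumes that all four counts share the denominator of 21 screening tools; the names `TOTAL_TOOLS`, `counts`, and `percent` are illustrative only.

```python
# Counts of screening tools covering each abuse domain, as reported in the text.
# Assumption: every proportion uses the same denominator of 21 tools.
TOTAL_TOOLS = 21
counts = {
    "physical violence": 16,
    "threats or fear": 15,
    "emotional abuse": 11,
    "sexual abuse": 7,
}


def percent(count, total=TOTAL_TOOLS):
    """Return the share of tools as a whole-number percentage."""
    return round(100 * count / total)


percentages = {domain: percent(n) for domain, n in counts.items()}
for domain, pct in percentages.items():
    print(f"{domain}: {counts[domain]}/{TOTAL_TOOLS} = {pct}%")
```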
The time period about which screening tools inquired ranged from current to ever. For example, the Ongoing Violence Assessment Tool and the Ongoing Abuse Screen asked about abuse occurring at the present time, whereas the HITS asked about the past 12 months. Some screening tools, such as the WAST, asked patients whether they had ever been abused.
Of the 21 sets of IPV questions, the mean number of items was 4.2 (range 1–11, SD=2.8), with only four (WEB, WAST, Partner Abuse Interview, and the PVS plus three additional questions of Houry et al.23) containing more than five questions.18,21,23 Four screening tools used a single item to screen for IPV.20,38,40,45 The single items performed inconsistently in their ability to identify IPV victims.
Two studies24,43 tested IPV screening tools with exclusively male populations. Shakil et al.43 determined that the HITS had acceptable sensitivity (88%) and specificity (97%) in men recruited from an ambulatory care clinic, an HIV clinic, and an emergency department. In contrast, Mills et al.24 found significantly lower sensitivities of the HITS (30%–46%) and the PVS (35%–46%) in a population of predominantly African-American men.
Authors of a 1968 WHO report, The Principles and Practice of Screening for Disease, commented that “in theory, screening is an admirable method of combating disease…in practice, there are snags.”48 The current review highlights a number of “snags” that preclude drawing definitive conclusions about the effectiveness of IPV screening tools tested in healthcare settings. First, even the most common screening tools (the HITS, the WAST, the PVS, and the AAS) were evaluated in only a small number of studies (three to six) in healthcare settings. Consequently, all of the included IPV screening tools need additional reliability and validity testing. For example, test–retest reliability of the HITS, the WAST, and the PVS has not been studied. No studies reported the internal reliability of the PVS. One study documented the discriminant validity of the WAST, but further validation in other populations would be helpful.
Second, there is a lack of consensus about the most appropriate comparison measure for testing the sensitivity and specificity of IPV screening tools. Traditionally, sensitivity and specificity are determined by comparing a screening test to a gold standard. Because of the complexity of IPV, no gold standard exists, and decisions about the most appropriate comparison measure are conceptually difficult. This lack of consensus limits the synthesis of data across studies and makes it difficult to determine the value of any one IPV screening tool.
Finally, in part because of the variability in comparison measures, each of the four screening tools tested in three or more papers (the HITS, the WAST, the PVS, and the AAS) had sensitivities and specificities that varied widely. For example, the sensitivities of the PVS ranged from 35% to 71%. A reported sensitivity of 35% is concerning because most screening tests maximize sensitivity to avoid missing affected patients; maximum sensitivity should be the goal for IPV screening tools also.
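To make the sensitivity and specificity figures concrete, the sketch below computes both from a 2×2 table that compares a screening tool with a comparison measure. The counts are hypothetical and chosen only to reproduce a sensitivity of 35%; they are not taken from any included study, and the class name `TwoByTwo` is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screening result cross-classified against a comparison measure."""
    true_pos: int   # screen positive, comparison positive
    false_neg: int  # screen negative, comparison positive
    true_neg: int   # screen negative, comparison negative
    false_pos: int  # screen positive, comparison negative

    def sensitivity(self) -> float:
        """Share of comparison-positive patients whom the screen detects."""
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        """Share of comparison-negative patients whom the screen clears."""
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts (not from any study in this review)
table = TwoByTwo(true_pos=35, false_neg=65, true_neg=80, false_pos=20)
print(f"sensitivity = {table.sensitivity():.0%}")
print(f"specificity = {table.specificity():.0%}")
```

With these hypothetical counts, 65 of 100 patients identified by the comparison measure would screen negative, which illustrates why a sensitivity of 35% is a concern for a screening tool.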
In addition to having sound psychometric properties, IPV screening tools used in healthcare settings ideally should be brief, comprehensive, and tested in diverse populations. Of the most studied IPV screening tools, the three-item PVS is the shortest, and the eight-item WAST is the longest. The HITS has a scoring system that may take several minutes to calculate. Thus, the HITS and the WAST may be difficult to implement in a busy clinical practice.
Individual providers must determine the optimal balance between brevity and comprehensiveness. Inquiring about different forms of abuse may be important for a number of reasons. First, emotional abuse often precedes physical abuse, so detection of emotional abuse allows for early intervention.50 Second, sexually abused women are at higher risk for adverse health outcomes than physically or emotionally abused women.50 Finally, some abusive relationships involve only threats and coercive control tactics.18
The WAST and the AAS conceptualized IPV most broadly, including physical, emotional, and sexual violence as well as threats/fear. The AAS, however, was the only screening tool that asked specifically about abuse during pregnancy and therefore potentially represents an important screening tool for obstetric populations. The HITS included questions about physical abuse, emotional abuse, and threats, but excluded sexual abuse. The PVS used a narrower underlying definition of IPV, asking only about physical violence and safety.
Two papers24,43 tested the PVS and/or the HITS exclusively on men. Recent literature documents that rates of female-perpetrated violence are high, and the screening of men for victimization has increased.51,52 It is unclear whether IPV screening tools, such as the PVS, that were originally designed to screen women are the most appropriate tools for men. The etiology of violence may be different in situations in which women are violent.53 If this is the case, then screening questions likewise may need to be adjusted. Also, given social desirability bias, male patients may respond to brief IPV screening questions differently than female patients. Continued study in this area is clearly warranted.
The findings of this review should be interpreted in light of several limitations. First, despite attempts to conduct a systematic search, it is possible that relevant papers were missed. Searching multiple databases and bibliographies and seeking expert opinion likely minimized exclusion of eligible papers.
Second, determining paper eligibility and assessing study quality are inherently subject to bias. In order to address this potential bias, eligibility and study quality were determined independently by two reviewers, and disagreements were handled through consensus with a third reviewer. Third, IPV screening tools tested in mental health settings were excluded because these settings were felt to be qualitatively different from other healthcare settings. Separate reviews of IPV screening tools used in mental health settings would be helpful.
Intimate partner violence is a prevalent public health problem requiring urgent attention from researchers and clinicians. Both clinical practice and research are hindered by the lack of comprehensive evaluation of the psychometric properties of existing IPV screening tools. Many of the current screening tools are promising, but further testing and validation in diverse populations using a universally accepted comparison measure is critically needed.
Dr. Megan Bair-Merritt had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. She is funded by a Career Development Award (K23HD057180) sponsored by the National Institute of Child Health and Human Development.
No financial disclosures were reported by the authors of this paper.