The diagnosis of bipolar disorder is based on a review of symptoms and potential medical explanations for those symptoms, as there is no biological marker for the disorder. In clinical practice, symptoms are frequently reviewed in an unstructured manner. When practitioners do not use structured diagnostic tools, however, as many as half of comorbid conditions go undetected (
Zimmerman & Mattia, 1999). Furthermore, many practitioners report that they do not routinely screen for bipolar disorder even among people with a history of major depression, many of whom would meet the diagnostic criteria for bipolar disorder (
Brickman, LoPicollo, & Johnson, 2002). Due to informal or poor screening, the average time between onset of symptoms and formal diagnosis is more than seven years (
Lish, Dime-Meenan, Whybrow, Price, & Hirschfeld, 1994;
Mantere, Suiminen, Leppamaki, Arvilommi, & Isometsa, 2004). Improper diagnosis has serious repercussions because antidepressant treatment without mood-stabilizing medication can trigger iatrogenic mania (
Ghaemi et al., 2001).
Several semistructured interviews have been developed to assess bipolar disorder in adults. The two most commonly used measures are the Structured Clinical Interview for
DSM-IV (SCID) and the Schedule for Affective Disorders and Schizophrenia (SADS). We will not focus here on the Composite International Diagnostic Interview (CIDI;
Robins et al., 1988), which has been developed and used mostly in epidemiological surveys (e.g.,
Kessler & Zhao, 1999). Briefly, there is some evidence that the CIDI may systematically underdiagnose bipolar disorder (e.g.,
Kessler, Rubinow, Holmes, Abelson, & Zhao, 1997), but more recent work has since validated it against the SCID (
Kessler et al., 2006). The SCID and the SADS both provide interview probes, symptom thresholds, and information about exclusion criteria (i.e., medical or pharmacological conditions that may induce mania). They differ, however, in the criteria they were designed to assess. The SCID is designed to help assess diagnoses according to the
DSM-IV, whereas the SADS is designed to assess diagnoses according to the Research Diagnostic Criteria (RDC). RDC criteria are stricter in that psychotic symptoms are more likely to yield a diagnosis of schizoaffective disorder than would be applied in the
DSM-IV criteria; within the
DSM-IV criteria, psychotic symptoms must be present for at least two weeks outside of a mood episode to be considered evidence of schizoaffective disorder. Further details about these measures are provided next. We begin by describing the measures and their psychometric characteristics for assessing bipolar I disorder. We then turn toward some specific issues that complicate the assessment of milder forms of bipolar disorder. Table 1 summarizes some of the well-supported measures for the diagnosis of bipolar disorder.
Table 1. Summary of Validated Bipolar Disorder Assessment Tools for Diagnosis
The SCID (
Spitzer, Williams, Gibbon, & First, 1992) is recommended as a routine part of clinical intake procedures. The SCID is a semistructured interview that is divided into modules to cover different diagnoses. The modular design allows for the interview to be easily tailored to capture relevant diagnoses for a given research or clinical situation. Each SCID module contains probes to cover each of the core symptoms, and interviewers can use clinical judgment in gathering supplemental information if probes do not provide sufficient information for reliable symptom assessment. A clinician’s version is available through American Psychiatric Publishing (
First, Spitzer, Gibbon, & Williams, 1997). The SCID, and more specifically its bipolar disorder module, demonstrated good interrater reliability both in a large international multisite trial (
Williams et al., 1992) and in at least 10 other major trials (
Rogers, Jackson, & Cashel, 2001). In patient samples, reliability for current and lifetime diagnoses of bipolar disorder has been adequate to excellent, ranging from .64 to .92; establishing reliability for the SCID in community samples is more difficult due to low base rates of the disorder (
Williams et al., 1992). Compared to other structured interviews, including the Diagnostic Interview Schedule (DIS) and the Composite International Diagnostic Interview (CIDI), and to clinicians not using a structured interview, diagnoses of bipolar disorder based on the SCID appear substantially more reliable. Results of one study indicated that the percentage of agreements with the gold standard was higher for the SCID than for standard clinician interviews (
Basco et al., 2000). In a sample of twins, SCID-based diagnoses of bipolar disorder yielded monozygotic and dizygotic concordance rates similar to those reported in traditional twin studies using standard diagnostic interviews (
Kieseppä et al., 2004).
The SADS (
Endicott & Spitzer, 1978) was designed to assess a broad range of Axis I diagnoses. For each diagnosis, the probes focus on the symptoms for the most recent episode and then capture a broad overview of past episodes. The reliability and validity of the SADS have been established across 21 studies (see
Rogers, Jackson, & Cashel, 2001, for a review). The SADS has demonstrated good to excellent reliability for both symptoms and diagnoses (
Andreasen et al., 1981). Specifically, mania diagnoses have achieved good interrater reliability and good test–retest reliability over 5 to 10 years among adults (
Coryell et al., 1995;
Rice et al., 1986). SADS diagnoses of bipolar disorder correlate robustly with other measures of mania (
Secunda et al., 1985), and the SADS appears to validly capture diagnoses across different cultural and ethnic groups within the United States (
Vernon & Roberts, 1982).
Diagnostic Assessment of Bipolar II Disorder in Adults
Hypomania is unique among DSM syndromes in that by definition it does not cause any functional impairment. Perhaps because of this quality, the presence of at least one major depressive episode is also required for a diagnosis of bipolar II disorder. This presents a unique diagnostic challenge: the hypomanic episodes that separate bipolar II disorder from unipolar depression are by definition of only limited severity, making this a hard diagnosis to detect reliably. Complicating this picture, there are important disagreements in the field regarding the best criteria for hypomanic episodes. For instance, current DSM criteria require three or four symptoms, in addition to elevated or irritable mood, lasting at least four days. In contrast, RDC criteria require only three symptoms lasting two days. Given this uncertainty and the relative lack of severity of hypomania, it is not surprising that the accurate assessment of bipolar II disorder is more difficult to achieve than that of bipolar I disorder.
Given that hypomania is almost always accompanied by less distress than depressive episodes, one might be tempted to focus on detecting depression. There is evidence, however, that the diagnosis of hypomania (and hence, bipolar II disorder) is important above and beyond the detection of depression. Diagnoses of bipolar II disorder are accompanied by increased mood lability (
Akiskal et al., 1995) and a family history of bipolar II disorder (
Rice et al., 1986). In addition, at least three studies have demonstrated that people with bipolar II disorder are at a higher risk for suicide than are those with bipolar I disorder or unipolar depression (
Dunner, 1996). It is possible that the low mood of depression, combined with the impulsivity of hypomania, may be especially likely to lead to suicide attempts. In addition to suicide risk, the misdiagnosis of bipolar II disorder can have harmful pharmacological implications. The prescription of antidepressants, which is likely if bipolar II disorder is misdiagnosed as unipolar depression, may cause or exacerbate manic symptoms (
Ghaemi et al., 2001). Thus, identification of bipolar II disorder may be pivotal in administering effective treatments.
The above-described difficulties in assessing hypomanic symptoms have manifested in low reliability for the SADS in detecting bipolar II disorder (
Andreasen et al., 1981), even when interviewers rate the same tapes (
Keller et al., 1981). Some research groups have achieved better estimates, however (
Simpson et al., 2002;
Spitzer & Endicott, 1978). Beyond the inconsistent estimates of interrater reliability, test–retest reliability over six months to two years likewise has been low for bipolar II disorder and cyclothymic disorder alike (Andreasen et al.;
Rice et al., 1986). In one study, only 40% of participants with bipolar II disorder according to the SADS at baseline experienced any manic or hypomanic episodes over the ensuing 10 years (
Coryell et al., 1995). This lack of ability to accurately detect bipolar II disorder is not limited to the SADS. In one study, a SCID interview missed one third of bipolar II cases identified by expert clinical interview (
Dunner & Tay, 1993;
Simpson et al., 2002). In sum, the best available diagnostic interviews are limited in their psychometric characteristics for the diagnosis of bipolar II disorder.
These difficulties have led some researchers to suggest that interviews aimed at detecting bipolar II disorder should start with questions about behavioral activation and increases in goal-directed behaviors rather than mood (
Akiskal & Benazzi, 2005). Although promising, such approaches have not yet been fully validated.
In sum, a set of issues mars diagnosis of bipolar II disorder. Persons who meet criteria for bipolar II disorder may be at high risk for suicidality, and they may experience a worsening of manic symptoms if prescribed antidepressants. On the other hand, available tools do not detect bipolar II disorder reliably. Thus a major goal for ongoing research is to develop ways to reliably capture diagnoses of bipolar II disorder.
Self-Report Measures
The most reliable and valid way to obtain a diagnosis of bipolar disorder is through a structured interview with a trained clinician (
Akiskal, 2002). Nonetheless, given the time commitment involved in conducting structured interviews, several self-report measures have been developed to help clinicians identify persons most likely to meet criteria for bipolar disorders. It should be emphasized that these measures do not provide diagnostic accuracy; rather, they may help identify people who warrant more careful diagnostic interviews.
The General Behavior Inventory (GBI) was designed to cover the core symptoms of bipolar disorder, including both depressive and manic symptoms (
Depue et al., 1981). Different versions range from 52 to 73 items (e.g.,
Depue et al., 1981;
Depue & Klein, 1988;
Mallon, Klein, Bornstein, & Slater, 1986). Items on each version assess symptom intensity, duration, and frequency on a scale ranging from 1 (“never or hardly ever”) to 4 (“very often or almost constantly”). Although the GBI has the most robust psychometric properties of the available self-report screeners, the multiple versions make generalizations regarding psychometric properties difficult.
The full 73-item version of the GBI has demonstrated excellent internal consistency and adequate test–retest reliability. It has demonstrated sensitivity to bipolar disorder of approximately 75% and specificity greater than 97% (
Depue & Klein, 1988;
Depue et al., 1989;
Klein, Dickstein, Taylor, & Harding, 1989;
Mallon et al., 1986) in clinical and nonclinical samples. Cutoff scores, however, have not been consistent across studies, further limiting the generalizability of the scale. At present, the GBI appears to be a useful screening tool for bipolar disorder, but future research to establish norms and cutoffs would increase its utility.
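Because cutoff scores have varied across studies, it may help to see how sensitivity and specificity shift with the cutoff. The following Python sketch is illustrative only: the scores, diagnoses, and cutoffs are invented, the function name `sensitivity_specificity` is hypothetical, and nothing here reproduces GBI scoring rules or published data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float],
    has_disorder: Sequence[bool],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating scores >= cutoff as a positive screen."""
    if len(scores) != len(has_disorder):
        raise ValueError("scores and has_disorder must have the same length")
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_disorder):
        screened_positive = score >= cutoff
        if diagnosed and screened_positive:
            tp += 1  # case correctly flagged
        elif diagnosed:
            fn += 1  # case missed by the screener
        elif screened_positive:
            fp += 1  # non-case incorrectly flagged
        else:
            tn += 1  # non-case correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: 4 interview-confirmed cases followed by 6 non-cases.
scores = [30, 22, 18, 12, 14, 5, 7, 11, 3, 16]
has_disorder = [True, True, True, True, False, False, False, False, False, False]

for cutoff in (10, 15):
    sens, spec = sensitivity_specificity(scores, has_disorder, cutoff)
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data set, lowering the cutoff raises sensitivity at the cost of specificity, which is one reason results obtained with different cutoffs are hard to compare.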
Another screening tool is the Mood Disorder Questionnaire (MDQ;
Hirschfeld et al., 2000). The first 13 items of the MDQ ask about the
DSM-IV manic symptoms using a yes–no format. To achieve a positive screen, seven items must be endorsed. Additional items assess if the identified symptoms co-occurred and caused at least moderate impairment. The MDQ has attained adequate internal consistency (
Hirschfeld et al., 2000;
Isometsä et al., 2003), fair one-month test–retest reliability, and fair sensitivity (.73 to .90) in distinguishing between bipolar and unipolar disorder in clinical samples (
Weber Rouget et al., 2005). In addition, at least one recent study has demonstrated that high MDQ scores are associated with greater impairment and suicidal ideation in a primary care setting (
Das et al., 2005). Nonetheless, specificity has been low in some studies (.47 to .90;
Hirschfeld et al., 2000,
2003;
Isometsä et al., 2003;
Miller et al., 2004;
Weber Rouget et al., 2005) and the sensitivity in a community sample was only .28 (
Hirschfeld et al., 2003).
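The MDQ screening rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring program: it assumes that a positive screen requires all three conditions (at least seven of the 13 symptom items endorsed, symptoms co-occurring, and at least moderate impairment), and the function name `mdq_positive_screen` and the impairment labels are hypothetical choices made for this example.

```python
from typing import Sequence

SYMPTOM_ITEM_COUNT = 13  # yes/no symptom items, as described in the text
SYMPTOM_THRESHOLD = 7    # number of items that must be endorsed

# Assumed ordinal coding of the impairment item; the labels are illustrative.
IMPAIRMENT_LEVELS = {"none": 0, "minor": 1, "moderate": 2, "serious": 3}


def mdq_positive_screen(
    symptoms: Sequence[bool],
    co_occurred: bool,
    impairment: str,
) -> bool:
    """Return True if the responses meet the assumed positive-screen rule."""
    if len(symptoms) != SYMPTOM_ITEM_COUNT:
        raise ValueError(f"expected {SYMPTOM_ITEM_COUNT} symptom responses")
    if impairment not in IMPAIRMENT_LEVELS:
        raise ValueError(f"unknown impairment level: {impairment!r}")
    enough_symptoms = sum(bool(s) for s in symptoms) >= SYMPTOM_THRESHOLD
    impaired = IMPAIRMENT_LEVELS[impairment] >= IMPAIRMENT_LEVELS["moderate"]
    return enough_symptoms and bool(co_occurred) and impaired


# Hypothetical respondent: 8 symptoms endorsed, co-occurring, moderate impairment.
responses = [True] * 8 + [False] * 5
print(mdq_positive_screen(responses, co_occurred=True, impairment="moderate"))
```

Because many of the symptom items describe experiences that are common in community samples, the co-occurrence and impairment conditions carry much of the burden of keeping false positives down under this rule.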
A review of the content of MDQ items may help clarify why the scale has achieved better performance in inpatient settings than in community settings. Several of the items appear to capture common experiences in community samples. For example, in one study, as many as 90% of college students endorsed items such as “Have you ever had a time when you were not your usual self and you felt much more self-confident than usual?” (
Miller, Johnson, & Carver, 2008). These items may be less commonly endorsed by persons with schizophrenia and other severe psychopathology, explaining why the scale may appear more beneficial in an inpatient setting than in a community sample. Hence, the MDQ may be a useful tool in clinical settings to screen for bipolar disorder among those with severe psychopathology, but may be less helpful in community settings.
Other scales appear helpful in nonclinical samples, but do not have enough data regarding their usefulness as screening tools in clinical settings. The Hypomanic Personality Scale (HPS;
Eckblad & Chapman, 1986) predicted the development of manic episodes at 13-year follow-up in undergraduates (
Kwapil et al., 2000). To date, the HPS has only been studied in one clinical sample, achieving a positive predictive value of .82 and a negative predictive value of .67, and achieving a point-biserial correlation of .56 with bipolar I diagnosis (
Kwapil, 2008). The Bipolar Spectrum Diagnostic Scale (
Ghaemi et al., 2005) and the Mood Spectrum Self-Reports (
Dell’Osso et al., 2002) have only been examined in a single study each, and two Hypomania Checklists (
Angst et al., 2005;
Hantouche et al., 2006) have only been examined in Europe and China (e.g.,
Meyer et al., 2007;
Vieta et al., 2007). The Temperament Evaluation of Memphis, Pisa, Paris, and San Diego—Autoquestionnaire version (TEMPS-A;
Akiskal & Akiskal, 2005) is a measure of temperament rather than manic or hypomanic episodes per se. Although the four-factor structure that includes dysthymic, cyclothymic, hyperthymic, and irritable temperaments has been examined in several countries and languages and psychometrically validated in clinical populations, research has not directly established the usefulness of this measure as a screen for bipolar spectrum disorders (e.g.,
Akiskal et al., 2005;
Karam et al., 2007;
Kesebir et al., 2005;
Matsumoto et al., 2005;
Mendlowicz, Jean-Louis, Kelsoe, & Akiskal, 2005;
Sandor et al., 2006;
Vazquez et al., 2007). At least one study, however, has demonstrated that the cyclothymic subscale of the TEMPS-A can prospectively predict bipolar spectrum diagnoses among clinically depressed children and adolescents over a two-year period (
Kochman et al., 2005). Although initial studies indicate that these scales demonstrate good psychometric properties, more research is needed to determine their usefulness as screening measures.
Summary of Assessment Tools for Diagnosis
Overall, the SCID and the SADS are the most common means of diagnosing bipolar disorder in adults. Although they have excellent psychometric characteristics for the assessment of bipolar I disorder, they fare less well in assessing bipolar II disorder. This may be due to issues related to the definition of hypomania.
As a diagnostic screening tool, the scale with the best support is the GBI, as it has consistently demonstrated sensitivity of approximately .75 and specificity above .97. Readers should be cautious, however, because multiple versions of the scale exist, and cutoffs for a positive screen have not been firmly established. The MDQ has been helpful in clinical populations, but suffers from poor discriminatory power in community settings. Other promising scales require more psychometric development. When using self-report scales as screening tools, several broader issues must be kept in mind. First, the usefulness of a screening tool will vary depending on the prevalence of a disorder in the population of interest (
Phelps & Ghaemi, 2006). Second, few studies provide direct comparisons of psychometric characteristics of the different measures. Third, there are several ways to report on a screener’s usefulness, including sensitivity and specificity, positive and negative predictive values, area under the curve, and point-biserial correlations with diagnosis (
Kraemer, 1992). Not all studies on the detection of bipolar disorder report all of these results, limiting the ability to compare studies or measures. Furthermore, sensitivity and specificity are commonly reported, but these indices may be dependent on sample characteristics. Fourth, authors have often modified the diagnostic interviews used as a reference standard to capture milder forms of bipolar spectrum disorder, yet limited information about these modifications is available. Each of these issues makes comparisons between measures complex.
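The first of these issues, the dependence of a screener's usefulness on prevalence, can be made concrete with Bayes' rule. The sketch below takes the approximate GBI figures cited above (sensitivity .75, specificity .97) as inputs; the prevalence values are assumptions chosen only for illustration, and the function name `predictive_values` is hypothetical.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float,
    specificity: float,
    prevalence: float,
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screener applied at a given prevalence."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)  # P(disorder | positive screen)
    npv = true_neg / (true_neg + false_neg)  # P(no disorder | negative screen)
    return ppv, npv


# Assumed prevalences, e.g. a community sample versus enriched clinical samples.
for prevalence in (0.01, 0.10, 0.30):
    ppv, npv = predictive_values(0.75, 0.97, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

With these inputs, the positive predictive value is roughly .20 at 1% prevalence and roughly .74 at 10% prevalence, so the same screener yields mostly false positives in a low-prevalence setting even though its sensitivity and specificity are unchanged.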