The number of Spanish-speaking individuals and immigrants in the United States has risen dramatically and is projected to continue to rise. The availability of appropriately translated and validated measurement instruments, such as the Beck Depression Inventory, is a priority for researchers and clinicians in the U.S. and Mexico, where the first edition of the BDI is still prominently used. The purpose of this study was to pilot a Mexican adaptation of the BDI-II and report initial psychometric characteristics. Two samples were used: students from across Mexico and community adults from Mexico City. Results indicated that the translation was easily understood by most individuals, had adequate internal consistency, and a three-factor structure (negative attitude, performance difficulties, and somatic elements) had the best fit. Implications for use with Mexican-origin Spanish speakers are discussed.
El número de Hispano-hablantes en los Estados Unidos ha aumentado dramáticamente y va a seguir aumentando. El desarrollo de instrumentos de medición, como el inventario de depresión de Beck (IDB), debidamente traducido y validado, es una prioridad para los investigadores y los clínicos en los Estados Unidos. Esta es también una prioridad en México, como la primera edición del IDB es todavía prominentemente utilizada. El propósito de este estudio fue desarrollar una traducción mexicana del IDB-II e informar de las características psicométricas. Dos muestras fueron utilizados: estudiantes de todo México y adultos de la comunidad en el Distrito Federal. Los resultados indicaron que la traducción se entendió fácilmente por la mayoría de los individuos, tenía una consistencia interna adecuada, y una estructura de tres factores (actitud negativa, dificultades de rendimiento, y elementos somáticos) tuvo un mejor ajuste. Implicaciones para el uso con Hispano-hablantes de origen mexicano se discuten.
Hispanic individuals are an increasingly important segment of the United States population. In 2010, there were 50.5 million Hispanic individuals in the U.S., and they were the largest growing ethnic group, accounting for more than half of the total U.S. population growth between 2000 and 2009, with an increase of 15.2 million individuals.1 Sixty-three percent of Hispanic individuals in the U.S. are of Mexican descent, and 78.5% speak a language other than English at home.1-3 Furthermore, around 30% of the nation's immigrants (both legal and illegal) are from Mexico, totaling 12 million.4-5
The burden of illness for this significant population can be greater than that of other groups, even after controlling for variables such as socioeconomic status.6 For example, Hispanic individuals are more likely to have a current depressive episode than non-Hispanic whites (4.0% vs. 3.1%).7 Studies have shown 12-month prevalence rates of 8.6% for Hispanic individuals, although this varies widely depending on country of origin or generational status.6,8-9 Despite these rates of illness, there are disparities in mental health visits, treatment, and expenditure.6
Given the disproportionate burden of illness, it is important to have appropriate measurement to help assess and treat depression in this population. To address this, researchers have translated several measures of depression into Spanish, including the second edition of the Beck Depression Inventory (BDI-II). The BDI-II is one of the most commonly used measures of depression and has demonstrated strong psychometric properties in a variety of settings and populations.10-14 There are currently two published Spanish translations of the BDI-II: one by the Psychological Corporation and another by Jesús Sanz and colleagues in Spain.
The Psychological Corporation's translation was created by an international team of psychologists, with an aim to “eliminate cultural influences that may bias an individual's responses.”15 However, there was no formal publication of the translation's methodology, its psychometric properties, or normative data for Spanish-speaking populations. Subsequently, two studies have used this translation and provided psychometric properties for two samples: bilingual undergraduates and Spanish-speaking patients receiving hemodialysis for end-stage renal disease.15-16
These two studies specifically assessed reliability, factorial validity, and possible language effects with subsamples of bilingual individuals. The authors reported high internal consistency in both samples (α = .91 and .92). They also reported adequate 1-week test-retest reliability (ICC = .86) in their undergraduate sample.16 They reported good fit with all the indices used in the undergraduate sample using Beck and colleagues’ undergraduate structure as a criterion.11 However, there was only adequate fit with three of five fit indices when using Arnau and colleagues’ medical structure as criterion.10 In both studies, only one factor structure was tested and there were no significant differences between English and Spanish versions in their sample of bilingual individuals.
A second translation of the BDI-II was created and validated by Sanz and colleagues at the Universidad Complutense de Madrid. Three Spanish psychologists translated the BDI-II, and their versions were compared for discrepancies. A single reconciled version was then sent to a bilingual psychologist in the U.S. for back-translation into English. After resolving discrepancies and pilot testing, a final version was created with 25 items, including four from the BDI-IA that were removed from the BDI-II. Norms and psychometric data were provided for undergraduates, a community sample from Madrid, and a clinical population.17-19 Internal consistency for the measure was strong in all three samples (αs = .87 - .89). The authors reported a single factor solution using principal components analysis (PCA) for all three samples. They also reported that the translation had adequate criterion validity in their undergraduate and clinical samples using diagnoses of depression based on interviews.17-18 Convergent and discriminant validity were also demonstrated in the clinical sample using Spanish translations of the MCMI-II depression scale and STAI trait anxiety scale.
In addition to these BDI-II translations, the first edition of the BDI has previously been translated for use in Mexico.20 It was translated into Spanish and reviewed by 10 experts to create a consensus version, which was piloted before being administered to student, nonclinical, and clinical samples in Mexico City. The study established adequate internal consistency and convergent validity, and proposed a three-factor structure based on exploratory factor analysis.
Despite the reviewed literature, there are issues that need to be addressed before the BDI-II is used with confidence in the U.S. and Mexico. First, no normative data have been reported for the U.S. translation, the studies using this translation sampled U.S. undergraduates and hemodialysis patients, and the translation from Spain was validated with samples from Spain. It is important to extend the research base to individuals in Mexico and individuals from the community to determine whether the BDI-II is appropriate for use in Mexico and with Mexican-origin Spanish-speakers in the U.S., including immigrants.
Second, in personal communication with Penley and colleagues, one of the translators for the U.S. translation noted that a multinational team was used to translate with the specific goal of reducing cultural influence.15 However, the experience of depression is nested within culture and context. For example, regions have different dialects that express the same concept in different words. A growing body of literature has demonstrated that there are differences between Spanish-speaking regions that continue to change based on immigration status and socioeconomic status.21-25 Common differences between regional varieties of Spanish include combining English and Spanish words, phonological variants, use of subject personal pronouns, and differences in future tense.21,26 Although differences in written and spoken language between regions do not necessarily equate with “complete misunderstandings,” they may lead to misunderstandings that influence the carefully validated psychometric characteristics of measures, the most important being the validity of the measure.27-29
Furthermore, 57% of Hispanic individuals under the age of 25 receive a high school education in comparison to the 88.4% of non-Hispanic whites.30 In Mexico, 63.3% of the population has received a middle school education or below, with the mean educational level being fourth grade.31 Given these facts, it is critical that educational attainment be taken into account when adapting and validating the Spanish BDI-II for use in the U.S. and Mexico. One way to improve understanding for those with low education is to increase readability, which is operationalized as the number of syllables in words and the number of words per sentence.32 Not only does the specific language of a cultural context need to be taken into account, but simple and clear wording needs to be used so that all participants can understand the questionnaire and depressive symptomatology can be measured accurately.
Third, different factor structures have been reported in the literature, including a one-factor structure,17 two different two-factor structures,11-12 and a three-factor structure.13 These structures are discussed in detail in the methods section. It is unclear which factor structure best describes the BDI-II in Spanish-speaking individuals as they have not been explicitly compared in previous studies.
The purpose of this study was to develop and pilot an adaptation of the BDI-II for use with Spanish-speaking individuals of Mexican origin. There were three main goals of this study. First, this adaptation addressed issues with regional language and readability previously discussed. Second, this study extended knowledge of the BDI-II's psychometric characteristics to Spanish-speaking individuals of Mexican origin. Third, there is no evidence for which factor structure best reflects the latent structure of the BDI-II in Spanish-speakers; this study explored latent structure with PCA and compared different structures using confirmatory factor analysis (CFA) techniques.
The study consisted of two samples: a student sample and a community sample. The former consisted of 420 medical students from 35 different hospitals across Mexico contacted via the Universidad Nacional Autónoma de México's (UNAM) Department of Anesthesiology. Twenty-nine did not complete all measures and were removed from subsequent analyses, leaving 391, of whom 60.4% were female and 39.6% were male. Ages ranged from 24 to 39 (mean = 28.62). The majority were single (68%), with the remaining being either married (31%) or widowed (1%).
The community sample consisted of Mexican residents living in Mexico City and the surrounding area. A total of 220 individuals in different areas of the city completed questionnaires. Of the original sample, 15 were discarded due to incomplete information, leaving 205, of whom 57.1% were female and 42.4% were male. Ages ranged from 16 to 70 (mean = 29.94). With regard to education, 10% had middle school or less, 37% had some or completed high school, 50% had some or completed college, and 3% had some post-graduate education. The majority were single (71.2%), with the rest being married (19.5%), widowed (1.0%), or divorced or separated (3.0%).
In addition to the BDI-II, participants in the student sample were given a demographic questionnaire, multidimensional coping styles questionnaire, a life stress checklist, and the Hospital Anxiety and Depression Scale (HADS) as part of a separate study. Only results from the BDI-II and HADS were used for the study.
The Beck Depression Inventory – Second Edition is a 21-item measure of depression that was revised to include DSM-IV symptoms of depression—which are equivalent to DSM-5 symptoms—and different cognitive symptoms of depression.11,33 Individuals may rank their responses to items on a 0-3 scale and total scores can range from 0-63 with the following cut-offs: 0-13, minimally depressed; 14-19, mildly depressed; 20-28, moderately depressed; and 29-63, severely depressed.
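The scoring rule described above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not part of the BDI-II materials; only the 0-3 item range, the 21-item length, and the four cut-off bands come from the text.

```python
from typing import Sequence

# Severity bands as stated in the text: (lower bound, upper bound, label).
BDI_II_BANDS = [
    (0, 13, "minimal"),
    (14, 19, "mild"),
    (20, 28, "moderate"),
    (29, 63, "severe"),
]


def bdi_ii_total(item_scores: Sequence[int]) -> int:
    """Sum the 21 item responses, each scored 0-3."""
    if len(item_scores) != 21:
        raise ValueError("the BDI-II has 21 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored 0, 1, 2, or 3")
    return sum(item_scores)


def bdi_ii_severity(total: int) -> str:
    """Map a total score (0-63) to the severity band given in the text."""
    for low, high, label in BDI_II_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("total must be between 0 and 63")
```

Under this sketch, a respondent with a total of 9 (close to the sample means reported later) would fall in the minimal band.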
The English BDI-II was translated by seven separate translators. These translators were bilingual psychologists and psychology doctoral students from UNAM. One of the authors (I.R.-L.) reviewed the translations and led a consensus discussion to resolve differences and select unambiguous wording for a single translation. Discussion ceased when there was 80% agreement between the seven translators. This Spanish translation was back-translated to English by a separate bilingual psychologist. Discrepancies between the original and back-translated BDI-II were resolved by I. R.-L. through consensus discussion. The corrected Spanish BDI-II was then piloted with a small group of undergraduate students (n = 17). No difficulties were noted by the pilot group and this final BDI-II was used for all subsequent analyses.
This adaptation was compared with the translation currently available in the U.S. and several differences were noted. For example, on item 17 (irritability) the U.S. version uses the phrase “estoy irritado” and the Mexican adaptation uses the phrase “estoy irritable.” In common usage, “estoy irritado” may be misinterpreted as having skin irritation and may not tap into diagnostic criteria. The Mexican adaptation also included words at a lower reading level, such as “inútil” instead of “inservible” for item 14 (worthlessness). The Fernandez-Huerta readability level of our adaptation is 80, which is considered very readable/easy, compared to the extant U.S. translation, which is 68 and considered moderate/normal.
The Hospital Anxiety and Depression Scale is a 14-item self-report measure of depression and anxiety.34 There are two 7-item scales, with each of the items corresponding with a four-point Likert-type scale, with the highest possible score for each scale being 21. The psychometric properties of the scale have been demonstrated to be adequate in multiple populations.34-35 This study used the Spanish translation provided by López-Alvarenga and colleagues.36 The Spanish HADS has been used by medical outpatients and students with adequate internal consistency (αs > .80), test-retest reliability (ICCs > .80), sensitivity and specificity to MDD and GAD (> .75), convergent and divergent validity (rs > .70), and two-factor structure.36-39
The study was approved by the institutional review board at UNAM. Student participants were contacted as part of a larger psychological intervention study tailored for medical students.40 They were given information about the study through UNAM's Department of Medicine and Postgraduate studies. The sample primarily came from the Instituto Nacional de Ciencias Medicas y Nutricion Salvador Zubiran in Mexico City, although students from 34 other hospitals also participated. Participants that consented to participate would then arrive at UNAM's central campus and complete a packet of questionnaires in a group setting.
For the community sample, individuals from different public parks and plazas in Mexico City were approached to participate in the study over a two-month period. They were told that the study would involve questions about their mood over the past two weeks, including the current date. Interested individuals who were residents of Mexico were read an informed consent form and asked if they wanted to participate. If so, they initialed the informed consent and were given the BDI-II.
In both samples, lists with contact information for psychological resources were made available and no identifying information was collected. Participation in both studies was completely voluntary and there was no compensation.
Data were screened for assumptions (e.g., normality, homogeneity of variance, and outliers). After calculating basic psychometrics for each sample, a subset of data from both samples was randomly selected to explore with PCA since the BDI-II has not been tested with Mexican samples. Oblimin rotation was used to ease interpretability. CFA with a separate subset of data was performed to test structural equivalence of our adaptation in this sample and determine the most appropriate factor structure. A separate subsample was used to reduce the inflated estimates of model fit, caused by exploitation of error variance, that occur when PCA and CFA are performed with the same sample. Multiple factor structures have been presented in the English literature: a one-factor structure,17-19 two different two-factor structures,11-12 and a three-factor structure.13 The two-factor models tend to reflect cognitive and somatic symptoms, with affective symptoms oscillating between factors depending on the sample.41 To account for this, Osman and colleagues posited a three-factor model which has received support in different samples.42 In this study, five different factor models were tested: Sanz and colleagues’ one-factor model, Beck and colleagues’ two-factor model (Factor I: Items 1-14, 17, 21; Factor II: 15-16, 18-20), Dozois and colleagues’ two-factor model (Factor I: 1-3, 5-9, 13-14; Factor II: 4, 10-12, 15-21), Osman and colleagues’ three-factor model (Factor I: 1-3, 5-9, 14; Factor II: 4, 12-13, 15, 17, 19-20; Factor III: 10-11, 16, 18, 21), and the model derived from our PCA. Although most of the published models have been validated on student samples, they have also been shown to function in community samples and are generally combined to indicate a nonclinical sample model.42
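The split-sample design and the published item-to-factor assignments can be written down as a short Python sketch. This is only an illustration: the helper names, the seed, and the use of Python's `random` module are assumptions, and treating the Sanz model as all 21 items on one factor is our reading of "one-factor model". The subset sizes of 150 and 446 are those reported in the results.

```python
import random


def expand(*parts):
    """Turn (start, end) ranges and single item numbers into a sorted list."""
    items = []
    for part in parts:
        if isinstance(part, tuple):
            items.extend(range(part[0], part[1] + 1))
        else:
            items.append(part)
    return sorted(items)


# Item assignments copied from the text; the dictionary keys are illustrative labels.
MODELS = {
    "Sanz (one factor)": {"I": expand((1, 21))},
    "Beck (two factors)": {
        "I": expand((1, 14), 17, 21),
        "II": expand((15, 16), (18, 20)),
    },
    "Dozois (two factors)": {
        "I": expand((1, 3), (5, 9), (13, 14)),
        "II": expand(4, (10, 12), (15, 21)),
    },
    "Osman (three factors)": {
        "I": expand((1, 3), (5, 9), 14),
        "II": expand(4, (12, 13), 15, 17, (19, 20)),
        "III": expand((10, 11), 16, 18, 21),
    },
}


def covers_all_items(model):
    """True if every item 1-21 is assigned to exactly one factor."""
    assigned = sorted(i for items in model.values() for i in items)
    return assigned == list(range(1, 22))


def split_sample(ids, n_explore, seed=0):
    """Randomly split participant ids into non-overlapping PCA and CFA subsets."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_explore], shuffled[n_explore:]


pca_ids, cfa_ids = split_sample(range(596), n_explore=150)
```

Keeping the two subsets disjoint is the point of the design: a structure found by PCA in one subset is then tested by CFA on cases that played no part in finding it.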
Since the individual items of the BDI-II are ordinal, robust weighted least squares estimation (WLSMV) is the preferred estimation technique and was employed.43 WLSMV yields multiple indices of fit, including the chi-square test of model fit, the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA). Although the use of cut-off conventions for approximate fit indices is disputed and can vary depending on factors such as model complexity, the following cut-offs were used for this study: CFI and TLI ≥ .95 and RMSEA ≤ .06.43
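As a small illustration of how these cut-offs are applied, the Python sketch below checks a set of fit indices against them. The function name and the example values are hypothetical; they are not results from this study, which are reported in Table 3.

```python
def meets_cutoffs(cfi: float, tli: float, rmsea: float) -> dict:
    """Compare approximate fit indices with the cut-offs used in this study."""
    return {
        "CFI": cfi >= 0.95,     # comparative fit index, higher is better
        "TLI": tli >= 0.95,     # Tucker-Lewis index, higher is better
        "RMSEA": rmsea <= 0.06,  # root mean square error of approximation, lower is better
    }


def n_acceptable(cfi: float, tli: float, rmsea: float) -> int:
    """Count how many of the three approximate fit indices are acceptable."""
    return sum(meets_cutoffs(cfi, tli, rmsea).values())


# Hypothetical values for illustration only.
example = meets_cutoffs(cfi=0.96, tli=0.95, rmsea=0.05)
```

A model that is described as acceptable on all three approximate fit statistics would return `True` for every key, whereas a model acceptable on two would return one `False`.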
At the scale level, there were no notable outliers, and skewness and kurtosis were within acceptable limits, using the convention that skewness > 3 or kurtosis > 10 indicates a problematic departure from normality.43 Linear computation techniques were used to estimate responses for individuals with two or fewer items missing on the BDI-II. In the student sample, no individuals were removed for missing more than two items.
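The screening convention can be sketched in Python as follows. This is a minimal illustration using population moments; the text does not say whether raw or excess kurtosis was used, so the sketch assumes excess kurtosis, and the function names are hypothetical.

```python
from statistics import fmean


def _central_moment(xs, order):
    """Population central moment of the given order."""
    mean = fmean(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness; assumes the data are not constant."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


def within_limits(xs, max_skew=3.0, max_kurtosis=10.0):
    """Apply the |skewness| <= 3 and |kurtosis| <= 10 convention to total scores."""
    return abs(skewness(xs)) <= max_skew and abs(excess_kurtosis(xs)) <= max_kurtosis
```

Because depression totals in nonclinical samples are typically right-skewed, these lenient limits flag only severe departures from normality.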
Students had a mean score of 9.31 (SD = 7.84). The internal consistency of the measure with this sample was high (α = .92). Item means, standard deviations, and item-total correlations are reported in Table 1. Seventy-eight percent of the sample was classified as minimally depressed, 13% was classified as mildly depressed, 5% as moderately depressed, and 4% as severely depressed. Students’ mean scores on the HADS depression scale were 4.19 (SD = 3.54) and their scores on the anxiety scale were 6.34 (SD = 3.74). There were large correlations between the BDI-II total score and depression scale (r = .65, p < .001) and anxiety scale (r = .71, p < .001), which reflected convergent validity. There were gender effects, with females having higher scores (10.14 vs. 8.03; t(374) = 2.74, p = .007, d = .28). Total scores did not have a statistically significant association with age.
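For readers less familiar with the internal consistency statistic reported here, Cronbach's alpha can be computed from an item-by-respondent score matrix as in the following Python sketch. The function name and the toy data are illustrative assumptions; this is the standard formula rather than the authors' analysis script.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three items that move together perfectly across four respondents.
toy = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

With perfectly consistent items, as in the toy data, alpha reaches its maximum of 1; the values of .92 and .87 reported for the two samples indicate that the 21 items covary strongly but not perfectly.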
Individuals in the community who responded to the BDI-II had a mean score of 9.82 (SD = 7.70). Item means, standard deviations, and item-total correlations are reported in Table 1. The measure also demonstrated a high level of internal consistency (α = .87). Seventy six percent were classified as minimally depressed, 12% were classified as mildly depressed, 9% as moderately depressed, and 3% as severely depressed. Comparisons between genders indicated a similar gender effect with females having higher scores (10.85 vs. 8.51; t(201) = 2.28, p = .02, d = .32). No statistically significant relationship with age was observed.
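The effect size d reported for the gender comparisons is the mean difference divided by a pooled standard deviation. The Python sketch below shows the usual calculation from summary statistics; the function names are hypothetical, and because the group SDs and group sizes are not reported in the text, the example uses made-up values.

```python
from math import sqrt


def pooled_sd(sd1, sd2, n1, n2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d: standardized difference between two group means."""
    return (mean1 - mean2) / pooled_sd(sd1, sd2, n1, n2)


# Made-up summary statistics for illustration only.
d_example = cohens_d(mean1=10.0, mean2=8.0, sd1=2.0, sd2=2.0, n1=50, n2=50)
```

As a rough consistency check on the reported values, the student difference of 10.14 - 8.03 = 2.11 with d = .28 implies a pooled SD of about 7.5, which is close to the full-sample SD of 7.84.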
The two samples were combined because no large differences were observed between samples, and to increase power and reduce restriction of range from the education of the student sample. Student and community samples can be combined for a nonclinical sample.42 A subsample of 150 individuals was randomly selected for PCA, which generated six factors with eigenvalues greater than one. Scree plot analysis indicated that a two-factor solution was parsimonious. The factor loadings from the oblimin structure matrix are reported in Table 2. The first factor may be described as “depressed mood and motor complaints” and the second factor may be described as “negative cognitions.”
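The eigenvalue-greater-than-one rule mentioned above can be illustrated with a short Python sketch. It assumes NumPy is available and uses simulated data with two underlying dimensions, not the study data; the function name and all simulation settings are hypothetical.

```python
import numpy as np


def pca_eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)  # columns are items
    return np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order


def kaiser_count(eigenvalues):
    """Number of components retained by the eigenvalue > 1 rule."""
    return int(np.sum(eigenvalues > 1.0))


# Simulated example: 150 respondents, six items driven by two latent dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 2))
noise = 0.5 * rng.normal(size=(150, 6))
items = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3) + noise

eigenvalues = pca_eigenvalues(items)
n_retained = kaiser_count(eigenvalues)
```

The eigenvalues of a correlation matrix sum to the number of items, so a value above one means a component explains more variance than a single item. The rule tends to retain more components than are interpretable, which is consistent with the scree plot here favoring fewer components than the six that exceeded one.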
CFA was performed with the remaining 446 individuals in the combined sample and results for the different models are reported in Table 3. The chi-square significance tests for all models were statistically significant. The two-factor Beck et al. and three-factor Osman et al. models were acceptable using all three approximate fit statistics. The two-factor Dozois et al. model was acceptable using two fit statistics, and the two-factor model derived from PCA and the one-factor Sanz et al. model were not adequate using these cut-offs. The three-factor Osman et al. model appeared to have the best fit, but the fit statistics for the Osman et al., Beck et al., and Dozois et al. models were similar.
The purpose of this study was to explore the psychometric characteristics of an adaptation of the BDI-II for use with a Mexican, Spanish-speaking population. In the current study, a translation was created using multiple translators who were familiar with mental health and popular terminology for depression in Mexico. After multiple revisions and piloting, a tentative final version was administered to a student and community sample. Data indicate that this translation of the BDI-II shows promise for use with this population. In both samples, overall internal consistency was adequate (αs = .87 - .92). This is consistent with the alphas found in multiple English samples.
With regard to content and factorial validity, the item relationships occur in a similar fashion as they do in English samples. First, the item total means for our two samples (Ms = 9.31 and 9.82) are consistent with the ones found in various non-clinical samples, which can range from 9 to 12.11-12,16 After analyzing multiple indices of fit, Osman and colleagues’ three-factor model demonstrated the best fit in both samples. This factor model is also supported in English-speaking samples, indicating that item responses reflect cognitive symptoms, performance difficulties, and somatic complaints.42 Theoretically, this model may have better fit because it separates affective items that do not have strong loadings on cognitive or somatic factors.41
Although the three-factor model had the best fit, the two-factor Beck et al. and Dozois et al. models remain viable alternatives. This result may be indicative of a characteristic of the measure. The BDI-II was not designed to be used as having distinct factors or subscales. This may explain why no clear factor structure has emerged in the literature on the English BDI-II. Since the National Institute of Mental Health (NIMH) has called for the increased use of factors in research and practice, individuals interested in measuring components of depression may be better served by using a measure designed for this purpose, such as the Inventory of Depression and Anxiety Symptoms.44-47
Although there was adequate convergent validity, as demonstrated by a large, statistically significant correlation with the HADS depression scale, there were issues with discriminant validity. Our scale's total score had a stronger correlation with the HADS anxiety scale than with the depression scale. This may have multiple explanations. One is that in our sample, the BDI-II measures symptoms that the HADS classifies as anxiety. For example, four of the items in HADS are related to psychomotor agitation and retardation and somatic malaise, which are also measured by the BDI-II. Another explanation is that generalized anxiety symptoms group together with depression, which may reflect a common etiology.48 Several diagnostic systems centered around the common etiology of negative valence disorders have been proposed.48-49 Thus, observed statistics may reflect natural relationships in the data. A third explanation is that depression and anxiety may cluster together very strongly in Spanish speaking populations. Several studies with the Spanish BDI, HADS, and other measures have found high correlations (> .60) between depression and anxiety.36,37,49-50 An alternative explanation is that the HADS subscales themselves are highly related, and that some researchers recommend the use of the HADS as an overall measure of distress, although others have supported its use as a measure of two separate but related constructs.35,51
This study has strengths and limitations worth mentioning. One strength is that it provides important information for clinicians and researchers in Mexico who plan to use the BDI-II. To date, only the BDI has been validated for Mexican populations, which explains why it is frequently used.20 This study provides provisional data indicating this translation of the BDI-II has adequate psychometrics for use in Mexico. This information is also important for clinicians and researchers in the U.S. who work with individuals of Mexican descent. A number of factors, including language, acculturation, and SES, may influence whether individuals have more in common with individuals in their host culture or their culture of origin. Mental health professionals may now refer to these data in conjunction with available research on bilingual individuals to determine the appropriateness of the BDI-II.16
Another strength of this study is that it is the first to compare the adequacy of multiple factor structures with a Spanish translation using CFA techniques. Previous studies have only tested a single model without alternate hypotheses, or used simpler PCA techniques. These data indicate that Osman and colleagues’ three-factor model had the best fit in both our samples, although other models also had adequate fit according to some fit indices. This highlights that although the BDI-II has components, it may not be the most appropriate instrument for obtaining precise measures of these components. Another strength is that the student sample consisted of students from across the country, as opposed to being drawn from a single university.
A limitation of this study is the use of convenience sampling. As such, the psychometric properties reported may not be robust. The effects of convenience sampling may be mitigated by recruiting students from 35 different hospitals across the country, rather than from a single university, and by sampling in multiple public parks at different times of day; however, these effects should still be considered. Another limitation is the lack of measures to supplement validity estimates from the HADS. Also, because this adaptation was designed to be regionally appropriate, it may not be directly comparable with other Spanish translations of the BDI-II.
A final limitation is that the BDI-II was developed to measure clinical phenomena, and this pilot study did not use a clinical sample. Symptoms may cluster together differently when disorders are present, or when the severity of symptoms is greater. Furthermore, clinical norms and diagnostic accuracy statistics (e.g., positive predictive power, sensitivity) are important for clinical decision making, such as clinic screening or triage. However, clinical samples are nested within communities, and the data from this pilot study serve as a helpful norm reference.
Another consideration is that there are alternative validation methods that can be implemented. For example, one can compare differences between English and Spanish forms in a bilingual sample, or use a dual-language split-half methodology, which involves two forms in which alternate halves are in English and Spanish.52 Given that this study was conducted in a monolingual environment, use of these techniques was precluded.
Future studies may focus on addressing these weaknesses as well as extending research to a clinical sample and publishing diagnostic accuracy statistics. Studies could incorporate this translation in epidemiological studies to take advantage of representative sampling procedures. Researchers could also use different measures for validation purposes, keeping in mind that the new measure has demonstrated appropriate characteristics for use with Mexican-origin Spanish-speakers. Ultimately, future studies will want to extend research to clinical samples to see if this measure works equally well with the many Spanish-speakers in immediate need of appropriate assessment and intervention.
This research was supported, in part, by a NIMHD-funded MHIRT fellowship (T37 MD003405). Student data were collected as part of Areli Reséndiz's doctoral dissertation. Pamela Pérez helped with data collection of the student population and some edits to this manuscript.
González, D. A., Reséndiz, A., & Reyes-Lagunes, I. (2015). Adaptation of the BDI-II in Mexico. Salud Mental, 38, 237-244.
David Andrés González, University of North Texas & South Texas Veterans Health Care System.
Areli Reséndiz, Universidad Nacional Autónoma de México.
Isabel Reyes-Lagunes, Universidad Nacional Autónoma de México.