Body checking may be an important behavioral consequence of body image disturbance. Despite the importance of body checking, few measurements of this construct exist, particularly for males. This study describes the development and validation of the Male Body Checking Questionnaire (MBCQ).
Convergent and divergent validity, factor structure, and reliability were tested in three separate samples of men and women.
Factor analyses suggested a reliable four-factor structure with evidence of a higher order global checking factor for men, but not women. The MBCQ demonstrated good concurrent and divergent validity. Short-term test-retest reliability was good with high internal consistency across time.
Interpretation of psychometrics and recommendations for subsequent research are discussed. The MBCQ is likely to be an appropriate tool for investigating body image-based pathology in males.
Body checking can be defined as repetitive attempts to evaluate or scrutinize one’s appearance. Existing theoretical and clinical models of this construct assume that these phenomena derive from an overvaluation of shape or weight,1,2 which is believed to be a core aspect of eating disorder psychopathology.3,4 Body checking also appears in other forms of psychopathology, most notably body dysmorphic disorder (BDD)5,6 and a subtype of BDD labeled muscle dysmorphia (MD).7,8 These behaviors are expressed in many different ways and may include mirror checking, pinching one’s fat or feeling one’s muscles, gauging body size, shape, or weight by clothing or through other standardized measures (e.g., tape measure, weighing, etc.), reassurance seeking (e.g., asking a partner about evidence for change in weight or shape), social comparison (e.g., comparing oneself to a friend or peer), or spot checking (e.g., looking for bone protrusions, etc.; Refs. 4,6,7,9).
The available research supports these conceptualizations of body checking; specifically, body checking is significantly correlated with overvaluation of shape and weight. Evidence of this relationship exists in obese women with binge eating disorder (BED),1 obese men and women presenting for bariatric surgery,10 men and women seeking treatment for weight loss,11 nonclinical university women,9 and mixed nonclinical and clinical samples of university women including those diagnosed with an eating disorder.12 However, the strength of the relationship between shape concern (SC) and weight concern (WC) and body checking is not universal. Mountford et al.13 found that body checking varies across eating disorder diagnoses; those diagnosed with BED or anorexia nervosa (AN) reported less overall checking behavior than those with bulimia nervosa (BN) or eating disorder not otherwise specified.
Negative affect is likely to play an important role in body checking by both driving and being a consequence of the behavior. Haase et al.14 proposed that the relationship between SC/WCs and body checking is mediated by social physique anxiety. In this model, body checking cognitions lead to body checking behavior in the presence of social physique anxiety, suggesting negative affect is a necessary precursor to clinically significant repetitive body checking behavior. Negative affect may also be a consequence of body checking. Shafran et al.15 found that in a sample of nonclinical women, those who were asked to increase their body checking behavior during a mirror task reported more feelings of fatness and decreased affect after body checking. However, these group differences disappeared by 1-h follow-up.
As the literature investigating body checking behavior expands, the need for reliable and valid measures of body checking continues to increase. The most widely used measure of body checking, the Body Checking Questionnaire (BCQ),9 is a 23-item scale anchored in the core symptom presentation found among women with body image concerns. Reas et al.9 identified three subscales: overall appearance, specific body parts, and idiosyncratic checking. The contents of these items, however, are almost exclusively specific to the stereotypically female experience of body image disturbance. In particular, the BCQ measures the tendency to check female “hot spots” such as thighs or appearance related to fatness and body weight. It is relatively well accepted now that men and boys are likely to experience disturbances in appearance that differ from those of their female peers. For instance, men are much more likely to report a desire to be both lean and muscular, which in most cases involves an increase in muscle mass and a reduction of subcutaneous body fat.16 However, there is evidence that existing anthropometric status may lead to differentially valued changes in physical characteristics, such that men with more body fat may desire reduction in body fat over gains in muscle mass or vice versa.17 Given that men and boys are more likely to value lean muscularity over low weight, the behavioral expressions of overvalued shape and weight are likely to reflect more interest in aspects of appearance associated with this idealized male body type.
The purpose of this study is to report the development and validation of the Male Body Checking Questionnaire (MBCQ), which includes items more theoretically tied to existing research on male body image disturbance. In particular, the MBCQ items incorporate features common to the idealized male form, such as musculature in the upper body, reduced subcutaneous body fat, particularly around the waist, and the shape or feeling of specific muscles. Our goal was to develop a measure that could be used in conjunction with the BCQ, so we designed the MBCQ to have a similar format and item structure.
The purpose of the first study was to develop an initial item pool and test the factorial, convergent, and divergent validity of items with acceptable psychometric properties. The original item pool consisted of 23 items developed by the authors TH and SD. The format was based on the BCQ,9 which asks participants to rate the frequency of a behavior on a five-point Likert-type scale from 1 = never to 5 = very often. Content validity was established by asking five body image researchers/clinicians who were familiar with male body image disturbance to rate each item along domains of clarity, relevance, and appropriateness (scale of 1 = do not agree to 7 = strongly agree). Four of the items scored below a five in at least one domain and were dropped from the item pool. The remaining 19 items all had mean scores in each domain above 6.0.a
A total of 342 participants (men = 196, women = 146) were recruited from an undergraduate student population and received course credit for their participation. On average, participants were 19.76 (SD = 2.78) years old and self-reported their race/ethnicity as white or European American (n = 182, 53.3%), Asian, Hawaiian, or Other Pacific Islander (n = 76, 22.2%), Hispanic or Latino (n = 58, 17.0%), and African American or Black (n = 26, 7.6%). The average body mass index (BMI; kg/m²) for men was 24.74 (SD = 3.71) and for women was 22.8 (SD = 3.45).
As described earlier, the MBCQ is a 19-item measure of male body checking behavior that utilizes the same format as the BCQ. We selected the measures below to provide assessment of convergent and divergent validity.
The BCQ was selected as a measure of divergent validity for the MBCQ because it measures stereotypically female body checking behavior. Different groups have reliably produced a three-factor structure in both mixed gender and female-only samples.1,9,10,12 The coefficient alphas among undergraduate populations range from 0.66 to 0.92 with the idiosyncratic subscale showing the lowest internal consistency across studies.9,12,14 The consistent replication of a three-factor solution using principal component analysis (PCA) and confirmatory factor analysis (CFA) suggests good factorial validity. In addition, the BCQ has reportedly good 1–2 week test-retest reliability in undergraduate populations for the subscales and overall sum score (r = .84).9,12 Furthermore, the overall sum score and subfactor sum scores have moderate to high correlations with theoretically related constructs including overvaluation of shape and weight, eating disorder symptoms, body checking cognitions, and physique anxiety.9,10,12,14,18
The Eating Disorder Examination-Questionnaire (EDE-Q)19 is a 41-item measure that has a Global and four subscale scores, which include restraint (R), eating concern (EC), weight concern (WC), and shape concern (SC). Psychometric evaluation of this measure in undergraduate populations indicates test-retest reliability for the scales ranging from .81 to .94 with coefficient alpha levels ranging from 0.78 to 0.93 at each time point.20 Similar results have been found in community samples examining the 1-year stability of these scales with Pearson’s r ranging from .81 to .94.21 The behavioral items of the EDE-Q have lower test-retest reliability,20,21 ranging from .57 to .70 for short-term reliability coefficients. These items include binge eating (subjective and objective) and compensatory behavior. Normative data collected on young women between the ages of 18 and 22 indicate means for R subscale of 1.29 (SD = 1.41), EC subscale of 0.87 (SD = 1.13), WC subscale of 1.89 (SD = 1.60), SC subscale of 2.29 (SD = 1.68), and Global score of 1.59 (SD = 1.32).22 Similar norms have been reported for undergraduate populations.23 No male norms exist for the EDE-Q.
The Eating Disorder Inventory (EDI) is a 64-item measure developed to assess the core pathology of women with AN and BN.24 The Perfectionism scale of the EDI (EDI-P) assesses the extent to which one believes only high standards are acceptable for both self and others. Garner et al.24 report good internal consistency and construct validity with evidence that perfectionism can discriminate between women with and without an eating disorder diagnosis. Evaluation of factorial invariance suggests that the EDI-P measures a similar construct in both genders.25
The Muscle Dysmorphic Disorder Inventory (MDDI)26 is a 13-item measure of stereotypically male body image disturbance based on the proposed diagnostic criteria for MD. Items are scaled on a five-point Likert-type scale from 0 = never to 4 = always and factor analyses indicate a three-factor structure. These subscales are desire for size (DFS), appearance intolerance (AI), and functional impairment (FI). Hildebrandt et al.26 reported test-retest reliability ranging from .81 to .87 and coefficient alphas ranging from 0.77 to 0.85 in several samples of men and women.
To explore the factor structure of MBCQ, we conducted exploratory factor analysis (EFA) using a maximum likelihood (ML) estimation extraction method with Promax rotations and PCA with Kaiser normalization and Promax rotations. The appropriateness of EFA versus PCA for examining factor structure has been an issue of considerable debate (e.g., Ref. 27). The primary difference between EFA and PCA is the values placed along the diagonal of the correlation matrix. In PCA, the diagonal is always 1.0 for every cell. In comparison, EFA uses a communality which is iteratively estimated and substituted along the diagonal of the correlation matrix.28 We report on both EFA and PCA because they offer different advantages in exploring the properties of a given scale in development. For instance, PCA provides the most appropriate method for data reduction, whereas EFA is best for exploring latent variables that underlie the measured variables.29 Promax rotations were chosen because this method allows for the identified latent variables to be correlated.
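The difference between the two diagonals can be illustrated with a minimal sketch. It assumes a toy four-item correlation matrix (not MBCQ data) and uses iterated principal-axis factoring as a simple stand-in for the ML extraction reported here; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_eigenvalues(R):
    """Eigenvalues of the correlation matrix with 1.0 kept on the diagonal (PCA)."""
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def initial_communalities(R):
    """Squared multiple correlations, a common starting estimate of communality."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def principal_axis(R, n_factors=1, n_iter=50):
    """Iteratively substitute communalities on the diagonal (illustrative EFA)."""
    h2 = initial_communalities(R)
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)            # communalities replace the 1.0s
        vals, vecs = np.linalg.eigh(R_reduced)     # eigenvalues in ascending order
        order = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
        h2 = np.sum(loadings ** 2, axis=1)         # updated communality estimates
    return np.sort(vals)[::-1], h2

# Toy example: four items that all correlate .50 with one another.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)

pca_vals = pca_eigenvalues(R)
efa_vals, h2 = principal_axis(R, n_factors=1)
print("PCA eigenvalues:", np.round(pca_vals, 3))
print("Reduced-matrix eigenvalues:", np.round(efa_vals, 3))
print("Estimated communalities:", np.round(h2, 3))
```

In this toy case the PCA eigenvalues include each item's unique variance, whereas the reduced matrix retains only the variance the items share, which is why the two methods can disagree about item-level coefficients even when they agree on the number of dimensions.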
To determine the appropriate number of factors to extract, goodness-of-fit statistics were used to evaluate successively complex (i.e., more factors) EFA models. As suggested by Browne and Cudeck,30 the simplest model with a root-mean-square error of approximation (RMSEA) < 0.05 was used as a cutoff for determining the final model. The EFA model results were compared to PCAs where factors were extracted with an eigenvalue above 1.0, and these results were confirmed with both a parallel analysis and examination of scree plots.
The pattern coefficients for EFAs with ML are reported in Table 1 for the total sample and also separately for men and women. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was 0.742 for the entire sample, 0.769 for men only, and 0.722 for women only. Bartlett’s test of sphericity was statistically significant for the total sample [χ2(171) = 3315.64, p < .001], men only [χ2(171) = 2368.51, p < .001], and women only [χ2(171) = 1267.75, p < .001]. Goodness-of-fit assessment indicated that a five-factor solution provided an acceptable fit to the total sample [RMSEA = 0.082 (90% CI = 0.062–0.102)]; however, there were differences in the simple factor solutions when conducted separately for men and women. The results suggest that a less complex four-factor solution provided a good fit [RMSEA = 0.039 (90% CI = 0.023–0.055)] for men, but a more complex five-factor solution was needed to achieve a good fit [RMSEA = 0.041 (90% CI = 0.016–0.066)] for women. Final solutions explained between 64.79% and 72.43% of the variance (see Table 1). These results were supported by both examination of eigenvalues (all >1.0) and examination of the scree plots for each analysis. Parallel analyses,31 which compare the actual eigenvalues to null eigenvalues generated from a set of random correlation matrices (in this case 50), supported these models. Eigenvalues that are greater than the average of the null eigenvalues generated from the random set are retained.
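The retention rule used in parallel analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented for this study: it assumes normally distributed random data, eigenvalues of the full (unreduced) correlation matrix, and simulated two-factor responses in place of MBCQ data. The function name and the simulated loadings are hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sets=50, seed=0):
    """Compare observed eigenvalues with the mean eigenvalues of random data."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    null = np.empty((n_sets, n_items))
    for i in range(n_sets):
        # Random data with the same number of respondents and items.
        random_data = rng.standard_normal((n_obs, n_items))
        random_corr = np.corrcoef(random_data, rowvar=False)
        null[i] = np.sort(np.linalg.eigvalsh(random_corr))[::-1]
    mean_null = null.mean(axis=0)
    exceeds = observed > mean_null
    # Retain leading eigenvalues up to the first one that fails to exceed the null mean.
    n_retained = n_items if exceeds.all() else int(np.argmin(exceeds))
    return observed, mean_null, n_retained

# Simulated responses: six items, three loading on each of two factors (assumed values).
rng = np.random.default_rng(1)
n_obs = 300
factors = rng.standard_normal((n_obs, 2))
pattern = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
items = factors @ pattern.T + 0.6 * rng.standard_normal((n_obs, 6))

observed, mean_null, n_retained = parallel_analysis(items, n_sets=50)
print("Observed eigenvalues:", np.round(observed, 2))
print("Mean null eigenvalues:", np.round(mean_null, 2))
print("Retained:", n_retained)
```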
Table 2 reports the results of PCAs for the same groups, and comparison between the PCA and EFA results suggested a similar number of composite and latent variables but with some significant instability in the pattern coefficients for total sample and women-only PCAs. In particular, the differences in item factor relationships suggested that MBCQ did not have an interpretable structure when females were included in the analysis. The results of the men’s EFA and PCAs, however, suggested some degree of item and factor/composite stability.
In the men’s data, items 1–5 and 14–15 all had pattern coefficients and communalities above 0.4 and had the strongest relationship with factor/composite 1. This factor/composite was labeled the Global Muscle Checking (GMC) subscale, because it appeared to measure global aspects of muscle checking, including items that tap both the leanness/reduced body fat and the increased size aspects of muscle appearance. Items 11–13 had the strongest relationship with factor/composite 2, labeled the Chest and Shoulder Checking (CSC) subscale, which appeared to measure the frequency of checking the musculature of one’s chest and shoulders. The third factor/composite, labeled the Other-Comparative Checking (OCC) subscale, had the strongest relationships with items 6–9 and appeared to measure the tendency for one to check his appearance through comparison to others or through enlisting the help of others in the body checking process. The fourth factor/composite, the Body Testing (BT) subscale, had the strongest relationships with items 10 and 16–19 and appeared to measure the idiosyncratic forms of body checking that specifically involved some form of objective evaluation or manipulation of one’s body to assess muscle appearance.
Despite the replicability of this structure in the analyses of the men’s responses, there was some evidence that certain items were not explained well by this structure. In particular, items 8 and 10 had low communalities (<0.4), although item 8 had a higher communality in the PCA where extracted communalities were reported. In the EFA, the highest pattern coefficient for item 9 was still below typical standards for item evaluation (>|.4|) reported in the literature (e.g., Ref. 28). Finally, items 18 and 14 had appreciable cross-loadings on separate factors/composites, suggesting that they may be related to other aspects of body checking. To determine whether a simpler structure could be estimated, we systematically eliminated these items. We generated an EFA and PCA for each potential combination (a total of 20 potential combinations) of removed items. Across analyses, the identified factors remained intact except for the merging of factors/composites 2 and 3 when any combination involving the elimination of items 8, 10, and 14 was estimated. Table 3 reports the most interpretable structure and item collection from our analyses, a solution in which item 8 was deleted.
Table 4 summarizes the relationship between MBCQ subscales and associated measures. Composite MBCQ subscale scores were used in the correlational analyses; we also calculated factor scores and correlated these with the additional measures, and the correlations were very similar. The BT subscale was most strongly correlated with eating disorder and MD psychopathology. A dummy variable reflecting men who reported evidence of binge eating (SBE or OBE) and/or purging behavior (n = 15) was created and a logistic regression analysis conducted. The regression including all four MBCQ subscales correctly classified 12/15 (80%) of the men with eating disordered behavior and only misclassified 7/181 (3.4%). This was true even when accounting for BCQ subscales. However, only the parameter estimates for the CSC subscale (exp(β) = 1.63, 95% CI = 1.19–2.07, p < .01) and BT subscale (exp(β) = 1.88, 95% CI = 1.43–2.33, p < .01) were significant. These findings suggested that the MBCQ (a) could be useful in screening for eating disorder pathology, (b) provides additive predictive value not accounted for by the BCQ, and (c) has some degree of discriminant validity (correctly classifying men with vs. without eating disorder behavior).
The purpose of this study was to (a) test the factor structure of the MBCQ in a separate sample of undergraduate men and (b) evaluate the possibility of a higher order body checking factor.
Five hundred and forty-nine undergraduate men were recruited from a university subject pool and given course credit for their participation in a larger study examining the relationship between body checking and extreme weight and shape control behaviors (Walker et al., Manuscript under review). Average age was 18.98 (SD = 1.59) and BMI was 24.66 (SD = 4.35). Most participants self-identified as White or European American (n = 378, 68.9%), followed by African American or Black (n = 48, 8.7%), Asian or Pacific Islander (n = 42, 7.7%), Hispanic/Latino (n = 32, 5.8%), and others (n = 2, 0.4%). Only data from the MBCQ are used and reported here.
Confirmatory factor analyses (CFA; see Ref. 32) were used to test the factor structure of the MBCQ. This methodology allows a predetermined data structure to be tested against competing models and allows for testing the presence of higher order factors. Given the higher order structure identified in the original BCQ,9 we hypothesized a similar factor in the MBCQ. Models were compared using χ2 tests, Comparative Fit Index (CFI; scale 0–1.0, >0.95 is good fit),33 Tucker-Lewis Index (TLI; scale 0–1.0, >0.95 is good fit),34 Akaike Information Criterion (AIC; no scale, lower is better fit),35 Bayesian Information Criterion (BIC; no scale, lower is better fit),36 standardized root mean square residual (SRMR; <0.08 is good fit),33 and RMSEA (<0.05 is good fit).33 Modification indices (MIs) were also used to guide refinement of the hypothesized model, including examining the possibility of cross-loading items and significant item covariance in the model.
The hypothesized model, based on the EFA and PCAs conducted in Study I, was a four-factor model with correlated factors. Item 8 was included during data collection, but the loading between item 8 and factor 3 was set to zero in the hypothesized model. The results of this model are reported in Table 5. The chi-square test [χ2(147) = 1723.103, p < .001] indicated a poor fit to the data, which was also supported by other measures of goodness-of-fit. Examination of the MIs suggested adding the covariance between items 8 and 9 and the loadings of these items on factor 4, the covariance between items 14 and 15, and the cross-loading of item 14 on factor 1. Because the variability of these items (8, 9, 14) appeared to be poorly explained by the model, and because they had correlated measurement error and/or significant cross-loadings, we estimated a second CFA dropping these items. The results of this new model suggested a much better fit to the data and retention of the original factor structure. Because of the tendency for a three-factor solution to result from exclusion of certain items (factors 2 and 3 combine) in our earlier EFA/PCAs, we compared our final four-factor model to a competing three-factor model where factors 2 and 3 were merged. The results suggested an inadequate fit to the data: [χ2(100) = 1086.731, p < .001], CFI = 0.825, TLI = 0.790, AIC = 22320.795, BIC = 22475.886, SRMR = 0.060, RMSEA 90% CI = 0.127–0.141. Comparing these values to the final model, the four-factor model clearly provided a better fit to the data.
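As a rough check on how the RMSEA relates to the χ2 statistic, the point estimate can be recovered from the χ2 value, its degrees of freedom, and the sample size. The sketch below assumes the common formula with N − 1 in the denominator (some software uses N instead) and applies it to the three-factor comparison model reported above; the function name is hypothetical.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    # Assumes the N - 1 convention; values below zero are truncated at zero.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Three-factor comparison model, Study II sample of 549 men.
three_factor = rmsea(1086.731, 100, 549)
print(round(three_factor, 3))  # 0.134, inside the reported 90% CI of 0.127-0.141
```

Because the numerator grows with the excess of χ2 over its degrees of freedom, a model whose χ2 is many times its degrees of freedom cannot reach the <0.05 criterion in a sample of this size.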
A higher order CFA was also calculated; the model was estimated by freeing the loadings between the four first-order factors and the hypothesized second-order factor and fixing the variance of the higher order factor to zero. The estimates of the factor loadings between the first-order (GMC = 0.960, R2 = .923; CSC = 0.871, R2 = .759; OCC = 0.781, R2 = .611; BT = 0.858, R2 = .736) and second-order factors were all statistically significant. The overall fit of the model was also very good, [χ2(100) = 389.643, p < .001], CFI = 0.978, TLI = 0.953, AIC = 21023.707, BIC = 21178.799, SRMR = 0.011, RMSEA 90% CI = 0.009–0.023, suggesting that a higher order latent body checking variable drives the variability in responses to the MBCQ items.
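For standardized solutions, the R2 for each first-order factor is the square of its loading on the second-order factor, that is, the proportion of that factor's variance attributed to the higher order factor. The short sketch below, which assumes the reported loadings are standardized, reproduces the reported R2 values to within the rounding of the published loadings.

```python
# Reported second-order loadings and R2 values (OCC = Other-Comparative Checking).
loadings = {"GMC": 0.960, "CSC": 0.871, "OCC": 0.781, "BT": 0.858}
reported_r2 = {"GMC": 0.923, "CSC": 0.759, "OCC": 0.611, "BT": 0.736}

implied_r2 = {name: loading ** 2 for name, loading in loadings.items()}
for name in loadings:
    # Small discrepancies reflect rounding of the loadings to three decimals.
    print(name, round(implied_r2[name], 3), reported_r2[name])
```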
The final study was conducted to evaluate the 2-week test-retest reliability of the MBCQ items, subscales, and composite score.
Twenty-seven male undergraduates were recruited as part of a larger study. On average, participants were 18.3 (SD = 0.67) years old, 69.5 (SD = 3.46) inches tall, and weighed 167.4 (SD = 25.36) pounds. The majority of participants reported that they were Caucasian/White (81.5%, n = 22), with the remaining participants identifying as African American/Black (7.4%, n = 2), Asian (7.4%, n = 2), and Hispanic (3.7%, n = 1). The average time between the first and second administration of the MBCQ was 12.3 (SD = 1.9) days.
Test-retest reliability was evaluated by calculation of Pearson correlations between time-1 and time-2 individual item scores, subscale composite scores, and global composite scores.
MBCQ items were significantly correlated at time-1 and time-2, as were the global and subscale scores. The data indicate that MBCQ scores are stable over short durations. Alpha levels were high at both time points (α = 0.94 and 0.93, respectively). The test-retest reliabilities for the overall sum (r = .841) and for the GMC (r = .793), CSC (r = .693), OCC (r = .714), and BT (r = .677) subscales were in the acceptable to good range.
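The two reliability statistics reported here can be computed as in the following minimal sketch. The data are simulated (27 hypothetical respondents answering 10 hypothetical items on two occasions) and are not MBCQ responses; the function names and simulation settings are assumptions made only for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the sum score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def retest_correlation(time1_scores, time2_scores):
    """Pearson correlation between scores from two administrations."""
    return float(np.corrcoef(time1_scores, time2_scores)[0, 1])

# Simulated data: a stable trait plus occasion-specific noise on each item.
rng = np.random.default_rng(0)
n_respondents, n_items = 27, 10
trait = rng.standard_normal(n_respondents)
time1 = trait[:, None] + 0.7 * rng.standard_normal((n_respondents, n_items))
time2 = trait[:, None] + 0.7 * rng.standard_normal((n_respondents, n_items))

alpha_t1 = cronbach_alpha(time1)
alpha_t2 = cronbach_alpha(time2)
r_total = retest_correlation(time1.sum(axis=1), time2.sum(axis=1))
print("alpha at time 1 and time 2:", round(alpha_t1, 3), round(alpha_t2, 3))
print("test-retest r for the sum score:", round(r_total, 3))
```

Alpha describes consistency among items within a single administration, whereas the test-retest correlation describes the stability of scores across administrations, so the two can diverge when item content is homogeneous but the behavior itself fluctuates.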
The results of the three studies suggest that the MBCQ has four correlated subfactors that measure different aspects of body checking behavior. The presence of a higher order factor suggests that body checking is largely driven by a common underlying construct. As evidenced by the marginal interpretability of EFAs/PCAs in the female-only data set, the utility of using the MBCQ with women is questionable. Rather, these data suggest that the unique phenomena of body checking may be strongly influenced by an individual’s gender and associated with differentially valued aspects of appearance (i.e., thinness vs. musculature). The relationship between MBCQ scales and related constructs suggests both convergent and divergent validity. The MBCQ was positively correlated with perfectionism, eating disorder psychopathology, and muscle dysmorphic disorder symptoms. However, there was a differential relationship between subfactors and these forms of pathology, with the Body Testing (BT) subscale emerging as the best discriminator between those with and without eating disordered behavior.
Body checking is an overt behavior and its measurement is subject to variability, as with most behavioral indicators of psychopathology (e.g., binge eating). However, the MBCQ and its subscales appear to have good internal consistency and moderate to good stability over short durations (1–2 weeks). More comprehensive studies of the MBCQ’s reliability are warranted, particularly if this measure is to be used in assessing symptomatic change over time. It is possible that the measurement properties of different items vary and this would consequently reduce the viability of the MBCQ to assess treatment outcome. The stability of the scale’s measurement properties is essential to accurately measuring change.
The theoretical importance of the MBCQ subfactors will also be an important target for future investigations. Research on male body image disturbance continues to suggest that males desire muscularity and its associated attributes, such as social dominance, physical health, and better job-related performance.37,38 The GMC subscale may capture a general tendency for men to evaluate physical features believed to be associated with these stereotypically masculine attributes. The CSC and OCC subfactors appear to measure the comparison of self to others or reassurance seeking from others, with a focus on chest and shoulders. Evidence suggesting gender specificity in body-based social comparison offers some support to this construct.39 Unlike in females, social comparison in males may not lead to dissatisfaction as often.40,41 It is possible that body-based social comparison processes in men have a more diverse range of emotional consequences. Finally, the BT subscale includes very specific forms of body checking and may be a better measure of psychopathology than the other subscales. The BT subscale items reflect more idiosyncratic and ritualistic forms of body checking. It is possible that this form of checking is indicative of the obsessive-compulsive psychopathology believed to be shared between eating disorders and MD.17 Thus, it is not surprising that this scale better identifies those with specific types of symptoms (e.g., ED or MD symptoms) than the other MBCQ subscales.
The results of these studies highlight the importance of gender in body image assessment and theory. Although there is evidence that the BCQ is reliable and valid in mixed-gender samples (e.g., Ref. 1), the results of this study suggest that these behaviors may not have been adequately assessed in prior research, which failed to measure forms of body checking common to men. The scientific and clinical utility of the BCQ and similar measures in male populations is an area in need of further investigation. It appears that there are male-specific forms of body checking that are significantly correlated with other important constructs (e.g., MD and eating pathology). Assessing these stereotypically male forms of body checking in male populations is likely to improve methodology and ultimately provide a better theoretical understanding of these behavioral phenomena in males. The discriminant value (behavioral eating disorder symptoms vs. no symptoms) of the MBCQ subscales in Study I suggests a clear value for gender-specific assessment of body checking behavior.
Although the MBCQ did not have good factorial validity in female undergraduates, there may be certain subpopulations of women for whom the items included in the MBCQ represent valid sources of measurement. For instance, women who prefer masculine features, pursue a masculine appearance, or pursue a masculine ideal may be more prone to engage in the body checking behaviors assessed in the MBCQ. The question remains, however, whether a universal and gender-neutral measure of body checking is required. The presence of such a measure may increase the flexibility of assessing this construct in mixed-gender samples, but such a measure may also miss some important aspects of specific forms of checking that appear to be common to men only or women only.
The described studies have a number of limitations, most notably the limited age and education level of study participants. Although the results suggest cross-sample replicability, the psychometric properties need to be tested in mixed-gender community and clinical samples to determine their generalizability. Furthermore, the presence of a two-item factor (see final Model in Table 5) is not desirable as traditional standards suggest minimums of 3–5 items for factor stability and reliability.29 Thus, the use of this scale independently is not recommended, although its contribution to a composite score seems warranted given that the inclusion of this factor yielded better CFA results. The remaining subscales (i.e., GMC, CSC, and BT) will likely perform better when used or analyzed independently. Finally, the test-retest reliabilities of the subscales and composite scores were acceptable, but varied from moderate to good, and may improve with the addition of more items.
Future research on body checking will benefit from inclusion of male-specific forms of body checking. The MBCQ represents the first attempt to measure this behavior for men. However, more psychometric evaluation is needed both for the MBCQ and for the general construct of body checking. Undoubtedly affect, either as a precursor to and/or consequence of body checking behavior, plays a significant role in this behavior. However, affect is not overtly measured in either BCQ or MBCQ, despite evidence of its relevance to body checking.14,15 Furthermore, the ability of the existing body checking measures to assess change in behavior over time is unknown, as is the predictive validity of these measures for relevant negative outcomes. Finally, the relationship between self-report and behavioral measures of body checking needs to be examined to increase our confidence that these self-report assessments are measuring actual behavior.
Supported by K23 024034-01A1 from NIDA.
aFinal MBCQ available from first author by request.