This study examines whether content overlap artificially inflates estimates of the associations of emotional disorders with neuroticism and whether disorder-specificity of prediction exists. We demonstrated a statistical approach for testing the validity of hypothesized facets of neuroticism. In a sample of 627 adolescents, we identified six facets of neuroticism, one intermediate facet, and a general neuroticism factor (GNF). Only the GNF and the depression facet were significantly associated with depressive symptomatology. The GNF and all facets significantly predicted anxiety symptomatology. This study offers a new statistical approach for addressing content overlap, testing for disorder-specific prediction, and identifying facets of a broad personality trait, while indicating that content overlap does not largely explain the associations of neuroticism with psychopathology.
An examination of neuroticism (N) and its ubiquitous association with most (if not all) forms of mental distress leads one to question whether item overlap of neuroticism with measures of anxiety and depression has artificially inflated estimates of these associations (see Ormel, Rosmalen & Farmer, 2004). Though addressing this problem might seem relatively straightforward via the deletion of potentially overlapping items, Nicholls, Licht, and Pearl (1982) criticized this approach and argued that the issue of item overlap actually presents a thorny dilemma to researchers. On the one hand, correlations between these measures might be artificially inflated due to overlapping items. On the other hand, Nicholls et al. argued that removing items potentially decreases the construct validity of the N measure. That is, if the potentially overlapping items tap what is theorized to be a facet of the construct, removing these items from one of the measures makes that measure less useful for examining the correlation between the two constructs.
In addition to the potential item overlap of N, Claridge and Davis (2001) raise a second problem in using N to predict emotional disorders concerning the non-specific nature of the association between N and emotional disorders. That is, though elevations on N allow us to predict an increased risk of developing an emotional disorder, they do not allow for predictions of the specific form of disorder an individual is likely to develop. Claridge and Davis advocate for the identification of “the various expressions of N” that “have variance unique to themselves” - that is, particular facets of N that have more specific power in predicting disorders. As an example of a disorder-specific risk factor, Claridge and Davis offer schizotypy as a predictor of schizophrenia. Schizotypy likely overlaps with N but provides a more specific predictive pathway between vulnerability and disorder. Following the logic of Claridge and Davis, deleting N items that potentially overlap with the emotional disorder being predicted might not only create the problems discussed by Nicholls et al. (1982) but might also hamper the important search for disorder-specific predictors.
The current study tests the extent to which item overlap accounts for the associations of N with both depression and anxiety. This study does so without removing overlapping items from measures and thereby also permits tests of likely disorder specific predictors. We begin with a brief overview of neuroticism, followed by a review of past findings concerning the associations of neuroticism with depressive and anxiety disorders (referred to collectively below as the emotional disorders). Next, we present the argument that neuroticism has item overlap with measures of emotional disorders. Finally, we present the results from a structural equation modeling approach to addressing item overlap without removing overlapping items that also provides a statistical basis for testing likely disorder specific predictors and for identifying facets of broad personality traits.
N has been defined as a relatively enduring palette of characteristics associated with a proclivity for experiencing negative mood states and sensitivity to stress (e.g., Eysenck, 1970; Costa & McCrae, 1987). According to Costa and McCrae (1992), N can also be broken down into six lower-order facets (anxiety, depression, anger, self-consciousness, vulnerability, immoderation/impulsivity). Though different measures of N vary with respect to the number and nature of the facets, there is nearly universal agreement that N encompasses the propensity to experience anxiety and depression (e.g., Eysenck & Eysenck, 1975; Goldberg, 1981; Goldberg, 1999). Not surprisingly, N has been shown to predict increased risk for the development of depressive disorders. For example, longitudinal studies have demonstrated that N predicts the overlap. That is, Ormel and colleagues noted that the content of some N items was very similar to items on measures of depression and anxiety (or emotional distress), but was lacking specific qualifiers of frequency, intensity, and duration. They argued that N is merely a measure of an individual's typical degree of distress, and therefore, the association of N with emotional distress may simply tell us that individuals who report feeling depressed and anxious will continue to report feeling depressed and anxious in the future (also see Claridge & Davis, 2001).
If there is significant item overlap between N and the emotional disorders that N putatively predicts, then predicting the disorders from their potentially tautological items might demonstrate little more than the stability of those items (Nicholls, Licht, & Pearl, 1982). This overlap is especially salient for specific facets of N that are named for the disorder with which they putatively overlap (e.g., depression and anxiety facets). For the present study, content analysis of the depression and anxiety facets involved a qualitative, semantic comparison of N items to DSM-IV criteria for the relevant disorder. For example, particular items that load on the depression facet of N (such as “dislike myself” and “have a low opinion of myself”) could be interpreted as paraphrases of the “feelings of worthlessness” criterion for a major depressive episode or the “low self-esteem” criterion for dysthymic disorder. Similarly, items from the anxiety facet of N (such as “worry about things” and “fear for the worst”) are similar to the “excessive worry” symptom of generalized anxiety disorder (for further examples, see Table 1). It is important to note that the problem highlighted by Nicholls et al. (1982) did not define item overlap as precise agreement of the items between two measures. Specifically, Nicholls et al. clarified that the N items and DSM-IV symptom criteria in question need not be worded exactly the same; as long as some qualitative overlap is perceived, then the argument that item overlap may be artificially inflating correlations has viability (Nicholls et al., 1982). Given that several items for the depression and anxiety facets on this and other N scales appear to be synonyms for symptoms of their associated disorders, the idea that there is content overlap between these variables is supported by our content analysis.
The present study will address both the problem of N nonspecificity and that of item overlap between N facets and their associated pathologies, by examining the specific predictive power of facets of N in comparison to that of the general N factor. That is, implementing a novel statistical approach as an alternative to deleting potentially overlapping items, the present analyses will compare a series of structural equation models (SEMs) that include both potentially overlapping facets of N (POFs) and non-overlapping facets of N (NOFs), a general N factor (GNF), and the emotional disorders. N is represented hierarchically in these models with the GNF representing the variance of N that is common to all items. These hierarchical models will be used to estimate the contribution of a POF (either the depression or anxiety facet, depending on the analysis) in comparison with both the NOFs and the GNF, without the methodological problems associated with deleting items. If the POFs of N do entirely or even largely account for N's relationship with emotional disorders, interpretations of these relationships become conceptually problematic. More specifically, when the POFs show the strongest predictive power and the GNF shows only weak predictive power (if any), one could achieve specific prediction for purely applied purposes but N's relationship with emotional disorders may demonstrate little more than a tautology. On the other hand, if the POFs add to the GNF in accounting for N's relationship with their respective emotional disorders but the GNF is the strongest predictor, it will constitute initial evidence for disorder-specific prediction while also allowing one to rule out tautology as the sole explanation for N's relationships with those disorders. Finally, if the POFs do not add to a strongly predictive GNF, we could rule out tautology as an explanation for N's relationships but we would have no evidence for specificity.
As noted earlier, there is not general agreement about the facets of N. Indeed, the field has not yet reached consensus regarding the facets of any of the Big 5 trait dimensions. Of course, a preliminary criterion for a proposed facet is that it correlates with the general factor underlying the domain that it is hypothesized to be a facet of. A further criterion that might be used in deciding whether a proposed facet should be retained in a consensual model of the facets of a trait is incremental validity. If a hypothesized facet not only correlates with the general factor underlying the domain but also has incremental validity (above and beyond the general factor it is associated with) in predicting a meaningful outcome, this would constitute evidence that it is useful to recognize that facet as a subcomponent of its trait dimension. In contrast, if many tests of the incremental validity of a hypothesized facet all fail, then parsimony would dictate that the items comprising that hypothesized facet be regarded simply as indicators of the general factor associated with the trait rather than as indicators of a meaningful subcomponent of the trait. The hierarchical factor model therefore also provides a useful method for identifying facets that are meaningful to distinguish from their general factors.
Thus, the purpose of this study was to apply a specific statistical model - the hierarchical factor model (e.g., McDonald, 1999; Zinbarg, Revelle, Yovel & Li, 2005) - to determine the unique effects of the POFs (the depression and anxiety facets) and the GNF of N in accounting for the established relationship between N and the emotional disorders without deleting potentially overlapping items. In so doing, we aim to avoid compromising the validity of the constructs and hampering the search for disorder-specific prediction that might ensue from the removal of overlapping items. An additional aim of this study was to provide initial evidence identifying facets that should be recognized in a consensual model of the subcomponents of N. In so doing, we hope to demonstrate the utility of the hierarchical factor model as a general tool for testing the incremental validity of the facet-level constituents of broad personality traits. (In contrast, a higher-order factor model would be much more useful for testing whether a hypothesized facet correlates with a general factor.)
Participants for the present study were recruited through their local high schools, as part of an ongoing longitudinal study on personality, cognitive, biological, and life stress risk factors for emotional disorders in late adolescents (e.g., Sutton et al., 2008; Zinbarg et al., 2008). All high school juniors in two local high schools (in suburban Chicago and Los Angeles) were initially invited to participate. Over three years, 1,976 invited participants completed the Neuroticism scale of the Eysenck Personality Questionnaire-Revised (EPQ-R; S. B. Eysenck, Eysenck, & Barrett, 1985). To increase the number of participants at high risk for developing an emotional disorder, students scoring in the upper tertile were then over-selected as potential participants for the present study. This behavioral high risk design was intended to overcome statistical power problems associated with the low base rates of individual disorders in community samples. A total of 627 participants were invited, gave consent, and completed baseline assessments. All participants were paid $40 for their participation in the present study. In the present sample, 59% scored in the upper tertile on the EPQ-R N scale, 23% scored in the middle tertile, and 18% scored in the bottom tertile.
The total study sample consisted of 195 males and 432 females. At the time of baseline assessment, the participants were 15 - 18 years old (M = 16.9, SD = .39). The racial makeup of the total sample was as follows: Caucasian, n = 302 (48.2%); Hispanic/Latin American, n = 96 (15.3%); African-American, n = 82 (13.1%); more than one ethnicity, n = 82 (13.1%); other, n = 34 (5.4%); Asian-American, n = 27 (4.3%); Pacific Islander, n = 4 (0.6%).
Participants were administered the Structured Clinical Interview for DSM-IV Axis I Disorders, Non-Patient Edition (SCID-I/NP; First et al., 2002) to assess lifetime Axis I diagnoses. Qualifications for administering this interview included the completion of extensive training procedures (described in more detail below). Interviewers were postdoctoral fellows, doctoral students in clinical psychology, and full-time research assistants who had completed a bachelor's degree in psychology. In addition, participants completed a battery of questionnaires, either immediately following the interview or during a scheduled meeting shortly thereafter.
The EPQ-R N scale (Eysenck et al., 1985) was used as the initial screening questionnaire for the present study. This consisted of 22 items in a yes-no format. Scores on this scale can range from 0 to 22, with higher scores indicative of a higher level of neuroticism. Coefficient alpha in the present study was .79. The extensive construct validity evidence for this instrument is reported in the EPQ-R manual (Eysenck & Eysenck, 1975).
The IPIP NEO-PI-R N scale (Goldberg, 1999) is a 60-item self-report scale, similar to the N scale of the NEO-PI-R (McCrae & Costa, 1992). Participants were asked to rate IPIP NEO-PI-R N items on a 5-point scale (1 = “Very Inaccurate”, 5 = “Very Accurate”). The IPIP NEO-PI-R N scale includes 6 subscales, each comprising 10 items. All six IPIP NEO-PI-R N subscales have been reported to yield substantial internal consistency (α > .77) and to correlate with their corresponding NEO-PI-R subscale (r > .70). Coefficient alpha in the current study was .95, and coefficient omega-hierarchical (ωh) was estimated to equal .81.
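To make the distinction between coefficient alpha and ωh concrete, the following minimal sketch computes ωh from the standardized loadings of an orthogonal hierarchical solution, using the general form described by Zinbarg et al. (2005): the squared sum of the general-factor loadings divided by the total variance of the unit-weighted scale score. The function name and the loadings are hypothetical and serve only as an illustration; they are not estimates from the present data.

```python
def omega_hierarchical(general, groups):
    """Omega-hierarchical from standardized, orthogonal hierarchical loadings.

    general: one general-factor loading per item.
    groups:  one list per group factor, each with one loading per item
             (0.0 where an item does not load on that group factor).
    """
    n_items = len(general)
    # Variance of the total score attributable to the general factor.
    general_part = sum(general) ** 2
    # Variance attributable to each group factor.
    group_part = sum(sum(g) ** 2 for g in groups)
    # With standardized items and orthogonal factors,
    # uniqueness = 1 - communality for each item.
    unique_part = sum(
        1.0 - general[i] ** 2 - sum(g[i] ** 2 for g in groups)
        for i in range(n_items)
    )
    return general_part / (general_part + group_part + unique_part)


# Hypothetical four-item example with two group factors.
general_loadings = [0.6, 0.6, 0.6, 0.6]
group_loadings = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
print(round(omega_hierarchical(general_loadings, group_loadings), 3))
```

Because group-factor variance enters only the denominator, ωh falls below alpha whenever the group factors account for a meaningful share of the total score variance, which is consistent with the pattern reported above (.81 versus .95).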
The SCID-I/NP (First et al., 2002) uses DSM-IV (American Psychiatric Association, 2000) criteria in a semi-structured interview to facilitate reliable diagnoses of current and lifetime Axis I disorders. All interviewers administering the SCID-I/NP had completed extensive training that included didactics, self-study, and role plays. Interviewers in training were also required to match diagnoses to a gold standard diagnostic rating determined by the principal investigators in three of five consecutive audio recordings of interviews before being approved to conduct interviews independently. All SCID-I/NP diagnoses were presented at weekly supervision and consensus meetings led by doctoral-level supervisors. To maintain consistency across sites, cases which represented the most difficult differential diagnostic decisions were presented at weekly teleconferences, and supervisors periodically participated by telephone in supervision meetings conducted at the other site.
Inter-rater reliability for categorical DSM-IV diagnoses was assessed by having trained interviewers observe live assessments on a subset of 69 cases. At the end of each assessment, the interviewer left the interview room and the reliability assessor was given an opportunity to ask follow-up questions to clarify any information that was unclear. Cohen's (1960) kappa was good to acceptable when aggregated across all disorders (κ = .82), as well as for the individual disorders - including major depressive disorder (κ = .83), social phobia (κ = .65), generalized anxiety disorder (κ = .85), and obsessive compulsive disorder (κ = .85) - for which the diagnosis was assigned in three or more cases by either the primary interviewer or the reliability assessor. Among the present sample, the numbers of participants meeting diagnostic criteria for a current Axis I disorder are as follows: major depression (n = 24), dysthymia (n = 7), generalized anxiety disorder (n = 19), social phobia (n = 46), specific phobia (n = 28), panic disorder (n = 2), obsessive-compulsive disorder (n = 7), and posttraumatic stress disorder (n = 5).
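As an illustration of the agreement statistic reported above, the following minimal sketch computes Cohen's (1960) kappa for two raters who each assign a diagnosis as present (1) or absent (0). The function name and the ratings are hypothetical; they are not the study's reliability data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgments of the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for ten cases (1 = diagnosis assigned, 0 = not assigned).
primary_interviewer = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
reliability_assessor = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(primary_interviewer, reliability_assessor), 2))
```

In this example the raters agree on 80% of cases, yet kappa is only about .52, because with a low base rate most of that agreement would be expected by chance.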
Upon completion of each SCID-I/NP assessment, interviewers not only assigned categorical DSM diagnoses but also assigned a clinical severity rating (CSR) to each current diagnosis for the past month. The CSR was based on the number and frequency of presenting symptoms, as well as the reported distress and impairment of the participant (Di Nardo & Barlow, 1988). Ratings were made on a 1 to 8 Likert scale, using the following descriptions: scores of 1 and 2 indicated that at least some DSM symptoms had been present in the past month but that impairment and distress were sub-clinical, a score of 3 indicated that the symptoms were of possible clinical significance, and scores of 4 or above indicated that symptoms were clearly associated with clinically significant distress or impairment in the past month. Intra-class correlation coefficients of CSRs from the same 69 SCID-I/NP interviews ranged from .74 (current major depressive disorder, Specific Phobia-Natural subtype) to .97 (OCD, PTSD). The two CSR variables used in the current study are the average CSRs across all (a) depressive disorders (major depressive disorder, dysthymia, depressive disorder not-otherwise-specified) and (b) anxiety disorders (social phobia, panic disorder, obsessive-compulsive disorder, posttraumatic stress disorder, specific phobia, generalized anxiety disorder, anxiety disorder not-otherwise-specified).
The Inventory to Diagnose Depression (IDD; Zimmerman, Coryell, Corenthal, & Wilson, 1986) is a 22-item questionnaire that assesses major depressive disorder symptomatology via self-report. Zimmerman et al. (1996) demonstrated significant convergent validity between the IDD and traditional measures of depression, such as the Beck Depression Inventory (Beck, Ward, Mendelson, Mock & Erbaugh, 1961) and the Hamilton Rating Scale for Depression (Hamilton, 1967). The IDD has also been shown to yield alpha reliability estimates of 0.90 or higher (e.g., Zimmerman et al., 1986). Internal consistency for the present sample was estimated using Cronbach's alpha, indicating high reliability for this measure (α = .88).
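For readers less familiar with the internal consistency estimates reported for this and the other questionnaires, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix. The function name and the scores are hypothetical and are not drawn from the present sample.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the unit-weighted total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses of four participants to two items.
responses = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises as the items covary more strongly relative to their individual variances; it reaches 1.0 only when every item orders respondents identically with equal spread.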
Participants were also administered the Mood and Anxiety Symptom Questionnaire (MASQ; Watson, Weber, Assenheimer, Clark, et al., 1995) as an additional measure of emotional symptomatology. Each of the 90 items of the MASQ is measured on a 5-point Likert scale, and the MASQ contains the following five subscales: General Distress, General Anxiety, General Depression, Anxious Arousal, and Anhedonic Depression. All of the above subscales have demonstrated strong internal consistency (α > .80) in student, adult, and patient samples (Watson et al., 1995). Additionally, the Anxious Arousal and Anhedonic Depression subscales have demonstrated adequate convergent and discriminant validity (Watson et al., 1995). For the data reported here, only the General Depression (GenDep), Anhedonic Depression (AnDep), General Anxiety (GenAnx), and Anxious Arousal (AA) subscales were used. Internal consistency for these four subscales in the present sample was estimated with Cronbach's alpha and found to be high (GenDep = .92, AnDep = .90, GenAnx = .85, AA = .89).
Participants in the current study completed the self-consciousness (SC) subscale of the Social Phobia Scale (SPS; Mattick & Peters, 1988) in order to provide a measure of social anxiety symptoms. Individuals rated all 13 items on a 5-point Likert scale, with all items describing situations that involve being observed or evaluated by others (e.g., public speaking, eating or writing in public). Using factor analyses and diagnostic group profile analyses, Zinbarg and Barlow (1996) found this subscale to be an excellent marker of social anxiety. Internal consistency for the SPS SC subscale has been estimated to be high with Cronbach's alpha (α = .92; Zinbarg & Barlow, 1996), and was similarly strong in the present sample (α = .89).
To assess specific fears of animals, heights, blood, and injury, participants were administered a total of 10 items comprising three subscales of the Fear Survey Schedule-II (FSS-II; Geer, 1965) identified by Zinbarg and Barlow (1996). The 50-item FSS-II scale describes objects, events, or situations that may elicit fear in some individuals. Using factor analyses and diagnostic group profile analyses, Zinbarg and Barlow (1996) found these subscales to be excellent markers of specific fears. For the present data, internal consistency for a composite of these items was strong (α = .82).
The SFQ is a 22-item scale, which was adapted from the Albany Panic and Phobia Questionnaire (APPQ; Rapee, Craske, & Barlow, 1995). Half of the items in the SFQ were designed to assess interoceptive fears, and the other half were designed to assess agoraphobic fears. The interoceptive and agoraphobic subscales of the SFQ have been found to correlate highly with one another, r = .78 (Zinbarg & Barlow, 1996). Internal consistency for the SFQ scale in the present sample was estimated with Cronbach's alpha and found to be strong (α = .89).
An initial confirmatory factor analysis (CFA) measurement model was estimated to evaluate the latent structure of the IPIP-NEO-PI-R N scale. To the best of our knowledge, no previous CFA (or exploratory factor analysis) has been completed on this scale for either adolescent or adult samples. Thus the following analyses provide insight into the structure of the scale in adolescents, and whether it conforms to a hierarchical model similar to the model hypothesized by the creators of the NEO-PI-R, which was based on adult participants (Costa & McCrae, 1992). Four measurement models were initially compared: a one-factor model, a six-factor orthogonal model, a six-factor oblique model, and a hierarchical model with six group factors and one general factor for which all factors were constrained to be orthogonal. The Mplus SEM program (Muthen & Muthen, 1998-2007) was used with maximum likelihood estimation. Factor models with 6 group factors were included because the IPIP-NEO (Goldberg, 1999), as well as the NEO-PI-R (Costa & McCrae, 1985), are hypothesized to consist of 6 facets: anxiety, anger, depression, self-consciousness, immoderation/impulsivity, and vulnerability. The best fitting measurement model was the hierarchical model, which yielded fit indices of χ2 (1649) = 3998.82, p < .001, comparative fit index (CFI) = .82, root mean square error of approximation (RMSEA) = .05 [90% confidence interval (CI): .05-.054], and standardized root mean square residual (SRMR) = .06 (fit indices using the Mplus categorical item approach to CFA with weighted least squares mean and variance adjusted estimation were similar; CFI = .85, RMSEA = .09, SRMR = .06). The hierarchical model provided a significantly better fit than both the 1 factor model [χ2(61) difference = 2707.42, p < .001] and the 6 factor-orthogonal model [χ2(60) difference = 2114.02, p < .001].
Although researchers have cautioned against the rigid adherence to conventional cutoff values for fit indices (Marsh, Wen, & Hau, 2004), the CFI (.82) for the hierarchical model is not close to conventional standards (> .90) for what is considered a good fitting model. As a result, modifications were made to the model. The sample was split in half and, based on the modification indices in the first sub-sample, changes were made to the measurement model that were then tested with the second sub-sample. First, 21 items were deleted based on correlated errors, leaving 39 items in the measure. Second, three items were moved to different facets based on their high factor loadings on these other factors. Thus, “Panic easily” was moved from vulnerability to anxiety, “Have frequent mood swings” was moved from depression to anger, and “Am comfortable in unfamiliar situations” was moved from self-consciousness to vulnerability. Finally, modification indices suggested significant residual correlations among the items loading on the anxiety, self-consciousness, and vulnerability group factors. To address this problem, a mid-level factor representing a broad anxiety facet of N was inserted into the model. All items loading on the anxiety, self-consciousness, and vulnerability facets also loaded on the broad anxiety facet (BroadAnx). BroadAnx represents a group factor located at a mid-level between the facet level and the general factor. It consists of the shared variance among the anxiety, self-consciousness, and vulnerability facets (including the content that is specific to these items) that is not shared with the other three facets. This model was then tested on the other half of the sample, yielding fit indices of χ2 (648) = 1105.05, p < .001, CFI = .89, RMSEA = .05 (90% CI: .04-.05), and SRMR = .05. The fit indices for the model on the entire sample are χ2 (648) = 1488.66, p < .001, CFI = .90, RMSEA = .05 (90% CI: .04-.05), and SRMR = .05.
The factor loadings for each item in the full sample on both the group and general factors are shown in Table 2.
A measurement model using CFA was then estimated using maximum likelihood to evaluate a unidimensional model of the latent structure of depressive symptomatology (DEP-SXS) as measured by the IDD, the MASQ General Depression (MASQ-GD) and Anhedonic Depression (MASQ-AD) subscales, and the average CSR for current major depressive disorder, dysthymia, and depressive disorder-not otherwise specified (Dep-CSR). A correlation table displaying the correlations among the DEP-SXS variable and the N facets is shown in Table 3. Participants without a diagnosis received a score of '0' for the Dep-CSR variable. Factor loadings for each of these measures on DEP-SXS were as follows: Dep-CSR = .16, IDD = .36, MASQ-GD = .73, MASQ-AD = .45. This model had satisfactory fit indices of χ2 (2) = 8.07, p < .05, CFI = .99, RMSEA = .07 (90% CI: .03-.13), and SRMR = .02.
A measurement model using CFA was then estimated using maximum likelihood to evaluate a unidimensional model of the latent structure of anxiety symptomatology (ANX-SXS) as measured by the MASQ General Anxiety (MASQ-GA) and Anxious Arousal (MASQ-AA) subscales, the SPS, FSS, SFQ, and the average CSR for all current anxiety disorders and anxiety disorder-not otherwise specified (Anx-CSR). A correlation table displaying the correlations among the ANX-SXS variable and the N facets is shown in Table 3. Again, participants without an anxiety disorder diagnosis received a score of '0' on the Anx-CSR variable. Factor loadings for each of these measures on ANX-SXS are as follows: Anx-CSR = .21, MASQ-GA = .34, MASQ-AA = .29, SPS = .55, FSS = .60, SFQ = .66. This model had satisfactory to excellent fit indices of χ2 (8) = 59.05, p < .001, CFI = .96, RMSEA = .1 (90% CI: .08-.13), and SRMR = .04.
We completed a series of planned comparisons among six structural models to assess the contribution of the potentially overlapping facet (POF), the non-overlapping facets (NOFs), and the general N factor (GNF). In analyses for which DEP-SXS was the criterion variable, the depression facet was the POF and the other six facets were the NOFs. In analyses for which ANX-SXS was the criterion variable, the anxiety facet was the POF and the other facets were the NOFs. A structural model displaying all possible measured pathways is shown in Figure 1. The general model in Figure 1, as well as each of the 6 structural models described below, applied to both DEP-SXS and ANX-SXS, although the DEP-SXS analyses and ANX-SXS analyses were done separately.
Model 1: Full Pathway Model. In the full pathway model all paths (a-h) were estimated and allowed to vary. In this model the POF, NOFs, and the GNF predicted the symptomatology of interest.
Model 2: Facets Only Model. In the facets only model, only the facet regression weights (paths a-g) were estimated. This included both the POF and the NOFs.
Model 3: GNF Only Model. In the GNF only model, only the GNF regression weight was estimated (h). This model examined the predictive power of the GNF only.
Model 4: Equal Facets Model. In the equal facets model, all pathways were estimated (a-h), but all the facet pathways (a-g) were constrained to be equal to one another.
Model 5: Unconstrained POF Model. In the unconstrained POF model, all pathways were once again estimated (a-h). The NOFs (b-g) were constrained to be equal to one another, while the POF (a) and the GNF (h) were free to vary.
Model 6: GNF Plus Specific Pathway Model. In the GNF plus specific pathway model, only the POF (a) and the GNF (h) regression paths were estimated.
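The six models differ only in which of the eight regression paths (a-h) are estimated and which are constrained to be equal, so the degrees of freedom for any nested comparison equal the difference in the number of freely estimated path parameters. The following minimal sketch encodes the models described above and counts those parameters. The dictionary layout, labels, and function names are illustrative assumptions and do not reproduce the Mplus syntax used in the study.

```python
# Paths a-g are the seven facet regression weights (a = POF, b-g = NOFs);
# path h is the general neuroticism factor (GNF) regression weight.
FACETS = list("abcdefg")
NOFS = list("bcdefg")

# Each model maps an estimated path to an equality-constraint label.
# Paths sharing a label are constrained equal; omitted paths are fixed to zero.
MODELS = {
    "full": {**{p: p for p in FACETS}, "h": "h"},
    "facets_only": {p: p for p in FACETS},
    "gnf_only": {"h": "h"},
    "equal_facets": {**{p: "facet" for p in FACETS}, "h": "h"},
    "unconstrained_pof": {"a": "a", **{p: "nof" for p in NOFS}, "h": "h"},
    "gnf_plus_specific": {"a": "a", "h": "h"},
}


def free_paths(name):
    """Number of freely estimated regression parameters in a model."""
    return len(set(MODELS[name].values()))


def df_difference(less_restricted, more_restricted):
    """Degrees of freedom for the chi-square difference between nested models."""
    return free_paths(less_restricted) - free_paths(more_restricted)


for pair in [
    ("full", "facets_only"),
    ("full", "gnf_only"),
    ("full", "equal_facets"),
    ("unconstrained_pof", "equal_facets"),
    ("full", "gnf_plus_specific"),
]:
    print(pair, df_difference(*pair))
```

Under this encoding, the full pathway model has eight free paths, and the five planned comparisons have 1, 7, 6, 1, and 6 degrees of freedom, respectively.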
A series of planned comparisons were completed among the models to determine the predictive power of the POF, the NOFs, and the GNF. The comparisons were conducted in sequential order, with the decision to carry out each subsequent set of comparisons dependent upon the results of the prior set of comparisons. The first two comparisons were used to determine if both the facets and GNF predicted DEP-SXS. The next two comparisons tested whether the POF had more predictive power than the NOFs. The last comparison tested whether the NOFs had any predictive power above and beyond the POF.
The first set of comparisons was done to determine if both the facets and the GNF contributed to predicting DEP-SXS. The full pathway model provided a better fit than the facets only model, χ2(1) difference = 462.38, p < .001, indicating that the GNF added to the prediction of DEP-SXS. The full pathway model also provided a significantly better fit than the GNF only model, χ2(7) difference = 108.39, p < .01. This indicated that one or more of the facets added to the prediction of DEP-SXS.
Next, we tested whether all of the facets had equal predictive power, or if the POF had more predictive power compared to the NOFs. First, the full pathway model was compared to the equal facets model. The full pathway model provided a significantly better fit than the equal facets model, χ2(6) difference = 80.96, p < .001, indicating that the facets did not have equal predictive power. Because this result suggests that not all facets have equal predictive power for DEP-SXS, we then tested whether the POF had more predictive power than the NOFs by comparing the unconstrained POF model to the equal facets model. This comparison was significant, χ2(1) difference = 70.48, p < .001. The regression weights (rw) for the unconstrained POF model showed that the POF (rw = .34) had more predictive power than the NOFs (rw = .01).
Next, we tested if the NOFs had any predictive power for DEP-SXS. We did this by comparing the full pathway model to the GNF plus specific pathway model. The two models were not significantly different from one another, χ2(6) difference = 11.22, ns. Thus, these data failed to provide evidence demonstrating that the NOFs explained additional variance in the prediction of DEP-SXS above and beyond the POF and the GNF. The GNF plus specific pathway model was therefore chosen as our final model, given that it was more parsimonious than the full pathway model, without being associated with a significant decrement in fit. We then ran an additional model, which was identical to the GNF plus specific pathway model, except that the GNF and depression facet pathways were constrained to be equal. The GNF plus specific pathway model was a significantly better fit than this last model, χ2(1) difference = 83.52, p < .001, indicating that the GNF was a significantly stronger predictor than the depression facet. The GNF plus specific pathway model of the prediction of DEP-SXS is shown in Figure 2.
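The model comparisons above can be checked by referring each chi-square difference to a chi-square distribution with the stated degrees of freedom. The following minimal sketch does this with the standard library only, assuming the reported differences are ordinary likelihood ratio tests between nested models under maximum likelihood; the function name is hypothetical, and the sketch is not the software used in the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x non-negative")
    if x == 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values of the regularized upper incomplete gamma Q(a, z).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Chi-square differences reported above for the prediction of DEP-SXS.
comparisons = [
    ("full vs. facets only", 462.38, 1),
    ("full vs. GNF only", 108.39, 7),
    ("full vs. equal facets", 80.96, 6),
    ("unconstrained POF vs. equal facets", 70.48, 1),
    ("full vs. GNF plus specific pathway", 11.22, 6),
]
for label, chi2_diff, df in comparisons:
    print(f"{label}: chi2({df}) = {chi2_diff}, p = {chi2_sf(chi2_diff, df):.4g}")
```

For the last comparison the tail probability is roughly .08, which is consistent with the non-significant result reported for the NOFs.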
We planned a similar series of comparisons among the models to determine the predictive power of the POF, the NOFs, and the GNF when predicting ANX-SXS. Again, the comparisons were conducted in sequential order, with the decision to carry out each subsequent set of comparisons dependent upon the results of the prior set of comparisons.
The first set of comparisons was done to determine if both the facets and the GNF contributed to the prediction of ANX-SXS. The comparison of the full pathway model to the facets only model was significant, χ2(1) difference = 259.20, p < .001, indicating that the GNF added to the prediction of ANX-SXS. The full pathway model also provided a significantly better fit than the GNF only model, χ2(7) difference = 17.01, p < .05. This indicated that one or more of the facets added to the prediction of ANX-SXS.
We then tested whether all of the facets had equal predictive power. If the full pathway model had been a significantly better fit than the equal facets model, we would then have tested whether the POF was a better predictor than the NOFs. However, the full pathway model did not significantly differ from the equal facets model, χ2(6) difference = 6.31, ns. Therefore, we did not have any evidence indicating that any one facet, including the POF, had more predictive power than any other facet and no further comparisons were completed. The equal facets model was therefore chosen as our final model for the prediction of ANX-SXS, given that it is more parsimonious than the full pathway model without being associated with a significant decrement in fit. We then ran an additional model constraining the GNF and all facets to be equal. The equal facets model was a significantly better fit than this last model, χ2(1) difference = 172.44, p < .001. This indicates that the GNF (rw = .70) is a significantly stronger predictor than the facets (rws = .06). The equal facets model of the prediction of ANX-SXS is displayed in Figure 3. While none of the facets predicted ANX-SXS more than any other facet, it is interesting that only two of the facets had regression weights that were significantly greater than zero in the full pathway model. Self-consciousness had a regression weight of .14 (95% CI: .03-.24) and BroadAnx had a regression weight of .11 (95% CI: .001-.21).
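To make the link between the reported confidence intervals and significance concrete, the short sketch below backs out an approximate standard error and z ratio for the two facets. It assumes the intervals are symmetric normal-theory (Wald) intervals, which the text does not state; if they were obtained another way (for example, by bootstrapping), the numbers are only rough. The function name `wald_from_ci` is hypothetical.

```python
def wald_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back out an approximate standard error and z ratio from a 95% CI.

    Assumes a symmetric normal-theory interval: estimate +/- z_crit * SE.
    """
    se = (upper - lower) / (2 * z_crit)
    return se, estimate / se

# Regression weights and 95% CIs as reported for the full pathway model
reported = {
    "Self-consciousness": (0.14, 0.03, 0.24),
    "BroadAnx": (0.11, 0.001, 0.21),
}

for facet, (rw, lo, hi) in reported.items():
    se, z = wald_from_ci(rw, lo, hi)
    print(f"{facet}: rw = {rw:.2f}, approx SE = {se:.3f}, approx z = {z:.2f}")
```

Under this assumption both ratios exceed 1.96, which is consistent with intervals that exclude zero, although the BroadAnx interval does so only narrowly.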
The present study addressed both the problem of neuroticism (N) nonspecificity and that of item overlap between N facets and their associated pathologies by implementing a novel statistical approach as an alternative to deleting potentially overlapping items. The former problem relates to the ubiquity of N predicting various sorts of mental distress and pathology, resulting in the need to identify disorder-specific risk factors within N (Claridge & Davis, 2001). Our structural equation modeling technique also allowed us to test the convergent and discriminant validity of the facets of N as described by Costa and McCrae (1985). The results of these analyses demonstrate that the POFs do not entirely, or even largely, account for the relationship between N and the emotional disorders. Rather, the GNF, which represents variance common to all the facets including the NOFs, largely accounted for associations with both depressive and anxiety symptomatology. These results lead us to conclude that the relationships between N and the emotional disorders are not due entirely to content overlap. Results also demonstrated the utility of the depression facet of N in specifically predicting depressive symptomatology, above and beyond the GNF. Given that the depression facet was not more strongly predictive than the other facets in predicting anxious symptomatology, the pattern of results across both types of symptoms demonstrates the incremental, convergent and discriminant validity of the depression facet.
For depressive symptomatology, the GNF plus specific pathway model was chosen as the final model, because it was the most parsimonious of the best fitting models. These results support a model in which the cross-sectional relationship of N with depressive symptomatology is largely accounted for by the GNF, with a smaller contribution from the depression facet. Based on these results, we conclude that the cross-sectional relationship of N with depressive symptomatology is not an artifactual one carried entirely, or even largely, by an overlapping depression facet, but is instead largely driven by the GNF.
The depression analyses did show that the predictive power of the depression facet was non-trivial and significantly greater than that of any other facet. By contrast, the depression facet did not make a significantly stronger contribution than the other facets to the prediction of anxious symptoms. This pattern demonstrates the incremental, convergent and discriminant validity of this specific facet and is evidence of a disorder-specific factor of the type that has been called for by Claridge and Davis (2001). This result may have specific implications for past and future studies that examine the relationship between N and depression. First, studies that do not discriminate between the effects of the depression facet and the GNF may slightly overestimate the relationship between N and depression. Future research might incorporate methods, such as the one demonstrated in this study, that assess the relationship between N and depression while taking into account the potentially overlapping nature of the depression facet. Second, if a study aims to predict depressive symptomatology, the depression facet may be a more specific predictor than broad N.
For depressive symptomatology, the NOFs did not seem to be particularly useful, contributing nearly zero predictive power. This is intriguing, because some of the NOFs might be hypothesized to share a stronger relationship with depression. For example, depressive and anxiety disorders are often correlated, so it might be expected that the anxiety facet would have a significant relationship with depressive symptomatology above the GNF. Because this was not the case (the regression weight for the anxiety facet in the full pathway model predicting depression was .02), we might conclude that the GNF (variance common to all facets) is important in linking depressive symptoms and disorders to the anxiety facet. In other words, that which is specific to the anxiety facet of N does not appear to contribute to depressive symptoms.
For the prediction of anxiety symptomatology, the equal facets model was accepted as the final model, because it was the most parsimonious of the best fitting models. However, it also must be noted that the regression weights for the facets are very small (see Figure 3). Therefore, it appears that the GNF largely accounts for the relationship of N with anxiety symptomatology, and that this relationship is not an artifactual relationship carried entirely, or even largely, by the anxiety facet. Although we did find excellent fit indices for a unifactorial measurement model of anxiety, it is possible that the measures which comprised this model contain specific, reliable variance that correlates with other N facets (e.g., SPS and the self-consciousness facet). This might explain why predictive power was shared among the facets, without a single facet carrying more of the weight. When the regression weights of the facets were not constrained to be equal, we did find that two facets, self-consciousness and BroadAnx, had regression weights that were significantly greater than zero. We may have simply lacked adequate statistical power to demonstrate that either or both of these facets were stronger predictors than the other facets. What is clear is that neither the narrow nor the broad anxiety facet appears to pose a consequential problem for studies examining the relationship between N and anxiety symptomatology. That is, just as with the analyses of depressive symptomatology, the association of N with anxiety symptomatology was largely driven by the GNF. Additional research is necessary to conclude with confidence that any of the facets have specific predictive power in analyses of anxiety symptomatology.
It is also worth noting that our results support a hierarchical model of the IPIP NEO-PI-R N scale in adolescents, similar to the structure proposed by Goldberg (1999) and to that proposed by Costa and McCrae (1992) for the NEO-PI-R N scale. However, our CFA results led us to make changes to the scale as it was presented in the IPIP. These changes included deleting 21 items, moving 3 items from one facet to another, adding an intermediate breadth factor (BroadAnx), and having certain items load only on the general factor. Because of the improved fit indices found in the current study (with the addition of these alterations), future studies examining an adolescent population may wish to use our altered version of the IPIP-NEO-PI-R-N scale, though these studies should keep in mind that our sample included a disproportionate number of high-risk participants. It also must be kept in mind that results may differ by the selection of facets. Thus, it is an open empirical question as to whether similar results would be obtained using a model of the facets of N other than the one underlying our altered version of the IPIP-NEO-PI-R-N scale. Additionally, we should acknowledge that, despite our careful analyses to partition the variance into facet, group, and general factors and to distinguish the POFs from the NOFs, our measurement of these variables is not perfect. The structural equation models used in these analyses are subject to the same limitations as structural equation modeling in general (Little, Lindenberger, & Nesselroade, 1999; Tomarken & Waller, 2003). For example, it could be that items differ in the degree to which they overlap with disorders, as opposed to some items having overlap and others not. If so, no method, including ours, could separate the potentially overlapping variance from the non-overlapping variance.
A limitation of the current study is the use of a cross-sectional design. Although our figures depict directional relationships (N predicting depressive and anxiety symptomatology), it is also known that emotional disorders affect N. Thus, our results are also consistent with a scar hypothesis, according to which psychopathology can have lasting effects on personality (Widiger & Trull, 1992). Similarly, our results are consistent with a negative reporting bias, such that current emotional disorder symptomatology may cause one to rate his or her personality more negatively (Widiger & Trull, 1992). However, the majority of research on this topic suggests the direction modeled in the current study does exist even if a path in the opposite direction also exists (e.g., Clark et al., 1994; Kendler et al., 1993; Klein, Durbin, Shankman, & Santiago, 2002). In addition, we are collecting follow-up data, which will further illuminate the above explanations in the present sample. Prospective analyses could indeed produce a different pattern of results from that of the present study.
A further limitation is the use of a non-clinical sample. It is important for future research to replicate these findings in a sample more saturated with participants experiencing clinically significant DSM diagnoses. This would allow for the models to be tested at the level of the depressive and anxiety disorders, as opposed to depressive and anxiety symptomatology.
Finally, the present study does not offer an explanation as to why N is so strongly related to depressive and anxiety symptomatology. We made the assumption that N and the emotional disorders are different constructs, though the present results do not shed light on exactly what N is. Future studies are needed to explore the underlying mechanisms that relate N to the emotional disorders.
To summarize, the GNF was the best predictor of both depressive and anxiety symptomatology, indicating that past associations were not based entirely or largely on content overlap between N and the emotional disorder criteria. However, the depression facet was a significant (albeit small) predictor of depression above and beyond the other facets and the GNF. Thus, research linking N to depression may slightly overestimate the magnitude of the GNF unless researchers account for the potentially overlapping content between the depression facet of N and depression. Further research is necessary to determine whether the potentially overlapping association involving the anxiety facet of N needs to be accounted for when estimating the relationship between N and anxiety.
This research was supported by National Institute of Mental Health Grants R01 MH65651 to Richard Zinbarg and Susan Mineka (NU) and R01 MH65652 to Michelle Craske (UCLA). Richard Zinbarg was also supported by the Patricia M Nielsen Research Chair of the Family Institute at Northwestern University.
The original EPQ Neuroticism Scale consists of 24 items. The item referring to suicidality was omitted to avoid potential ethical conflicts. The item “Do you worry about your health” was also omitted because preliminary analyses indicated that it failed to load on any factor.