Rasch analysis of the data from the ten-item scale using RUMM2020 showed a lack of fit to the Rasch model, with a significant Item-Trait Interaction total chi-square (chi-square = 82.8, df = 40; p < .001). The mean residual for items was -0.50 with a standard deviation (SD) of 1.575; given adequate fit to the model, the SD would be expected to be much closer to 1. The mean residual for persons was -0.287 with a SD of 0.855, indicating no serious misfit among the respondents in the sample.
Initially, the pattern of thresholds was examined to see whether disordering might be affecting fit. In the current example all thresholds were ordered (Figure ). The threshold distances vary across items (see the varying lengths of category one across items), supporting the use of the partial credit model for the analysis of this scale. A log likelihood ratio test confirmed that the partial credit model was appropriate (p < 0.001).
Two items initially showed misfit to model expectations: Item 8 (I have felt sad or miserable) and Item 5 (I have felt scared or panicky for no very good reason) (see Table ). Item 8 showed a Fit Residual value of -3.275 and a chi-square probability value of 0.002, less than the Bonferroni adjusted alpha value of .005 (0.05/10), indicating significant deviation from the model expectation. The negative Fit Residual value suggests a high level of discrimination, as shown by the ICC for the item, where the observed responses are steeper than the expected curve (Figure ). Thus responses from the lowest group (low levels of depression) are below what is expected by the model, and those from the highest group (high levels of depression) are above model expectation. A high negative residual of this kind is usually associated with dependency and a high item-total correlation, signifying redundancy of the item.
Removal of Item 8 led to an improvement in fit to the model, with a non-significant (Bonferroni adjusted) Item-Trait Interaction total chi-square (chi-square = 60.2, df = 36, p = 0.007). The Residual mean value for items became -0.47 with a standard deviation (SD) of 0.909, showing much better fit to the model. Individual person fit statistics showed that no respondents had residuals outside the acceptable range for the 9-item solution. Following the removal of Item 8, individual item fit statistics were again reviewed. Item 5, which initially showed misfit to the model, now showed a response pattern consistent with model expectation and was therefore retained.
In the 9-item solution the possibility of item bias (differential item functioning, DIF) was explored for the age of the mother, the educational level of the mother, and the age of the child, using a Bonferroni adjusted p value of 0.003 (0.05/18). Just one item, Item 7 (I have been so unhappy that I have had difficulty sleeping), recorded a significant probability value (below the adjusted alpha value), showing some degree of uniform DIF for age of child (see Figure ). Inspection of the DIF graph suggests that, at equal levels of depression, mothers with very young babies (6 to 12 weeks) are less likely to endorse this item. As DIF is a breach of unidimensionality, this item was also deleted. This gave a non-significant (Bonferroni adjusted) Item-Trait Interaction total chi-square (chi-square = 53.8, df = 32, p = 0.009). The Residual mean value for items was -0.467 with a standard deviation (SD) of 0.850, showing fit to the model. No DIF remained in this 8-item scale (EPDS-8), and all items showed fit to model expectations.
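The reported Item-Trait Interaction probabilities can be checked by hand. The following is a minimal sketch, not part of the RUMM2020 output: it computes the upper-tail chi-square probability with the closed form that exists for even degrees of freedom and compares it with a Bonferroni adjusted alpha. Dividing 0.05 by the number of items for the total chi-square is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate, for even df only.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Bonferroni adjusted significance level for n_tests comparisons."""
    return alpha / n_tests


# (label, chi-square, df, number of items) as reported in the text.
solutions = [
    ("10 items", 82.8, 40, 10),
    ("9 items", 60.2, 36, 9),
    ("8 items", 53.8, 32, 8),
]

results = {}
for label, chi2, df, n_items in solutions:
    p = chi2_sf_even_df(chi2, df)
    # Assumption: the total chi-square is judged against 0.05 / n_items.
    significant = p < bonferroni_alpha(n_items)
    results[label] = (p, significant)
    print(f"{label}: p = {p:.4f}, significant = {significant}")

# Adjusted alpha used for the DIF analysis (0.05 / 18).
print(f"DIF alpha = {bonferroni_alpha(18):.4f}")
```

Under these assumptions only the 10-item solution is flagged as significant, which matches the pattern described in the text.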
Figure shows the distributions of persons and item thresholds of the revised scale, with persons on the upper part of the graph and the item thresholds on the lower part. The mean person location value of -2.465 suggests that the respondents were well below the average of the scale. However, for a screening instrument this is not necessarily of great concern, as the cut point for a clinical case is the key issue. The Person Separation Index (PSI) was 0.804, which indicates that the scale has adequate person separation reliability.
A principal component analysis of the residuals revealed a first residual factor accounting for 1.8% of the total variance in the data, or 22% of the variance in the residuals. Two sets of items were identified, loading positively and negatively, respectively, on the first residual component. A paired t-test indicated that neither of these two sets gave a person estimate significantly different from the other (p = 0.14), and the effect size of the difference was 0.08. Consequently, the assumption of local independence is upheld, and the EPDS-8 can be considered to be a unidimensional scale.
To determine cut points on this revised 8-item scale, individuals were first classified according to the original 10-item EPDS cut points [1]. This allowed each person to be identified as not depressed (range 0–9), as having minor depression (range 10–12), or as having more major depression (range 13 or more). For minor depression, a cut point of 8 or more on the EPDS-8 maximised the kappa (0.9), identifying 95% of those classified as such by the original 10-item scale. This cut point also identified 96.7% of those identified as not depressed by the original scale. For major depression, a cut point of 9 or more on the EPDS-8 identified all those so classified by the original scale and 91.9% of those without major depression, but the kappa (0.71) was lower than that for a cut point of 10 or more (0.86), which identified 97.2% of those classified as having major depression on the original scale and 96.8% of those without major depression.
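The cut point search described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run for the study: each person is dichotomised on the original scale at a reference cut point (10 or more for minor depression is assumed here), the EPDS-8 is dichotomised at each candidate cut point, and the candidate with the largest Cohen's kappa is kept. The score pairs are hypothetical, as are the function names.

```python
def cohen_kappa_2x2(tp: int, fn: int, fp: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 agreement table."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    return (observed - expected) / (1 - expected)


def evaluate_cut(pairs, reference_cut: int, short_cut: int) -> dict:
    """Agreement between the original scale and one EPDS-8 cut point.

    pairs holds (original EPDS score, EPDS-8 score) for each person.
    """
    tp = fn = fp = tn = 0
    for original, short in pairs:
        case = original >= reference_cut  # classification by original scale
        flagged = short >= short_cut      # classification by short scale
        if case and flagged:
            tp += 1
        elif case:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {
        "kappa": cohen_kappa_2x2(tp, fn, fp, tn),
        "sensitivity": tp / (tp + fn),  # share of original cases identified
        "specificity": tn / (fp + tn),  # share of non-cases identified
    }


def best_cut(pairs, reference_cut: int, candidates) -> int:
    """Candidate EPDS-8 cut point with the largest kappa."""
    return max(
        candidates,
        key=lambda cut: evaluate_cut(pairs, reference_cut, cut)["kappa"],
    )


# Hypothetical (original, EPDS-8) score pairs, for illustration only.
pairs = [
    (3, 2), (5, 4), (7, 5), (8, 6), (9, 7), (9, 8),
    (10, 8), (11, 8), (12, 9), (13, 10), (15, 12), (18, 14),
]

chosen = best_cut(pairs, reference_cut=10, candidates=range(6, 11))
print(chosen, evaluate_cut(pairs, 10, chosen))
```

The sensitivity and specificity returned alongside kappa correspond to the percentages of cases and non-cases "identified" in the text, so the trade-off between candidate cut points can be inspected directly.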
Figure shows the distribution of scores on the EPDS-8 for each group classified using the original EPDS. The cut points of 8 or more for minor depression and 10 or more for major depression (shown as the horizontal lines on the graph) clearly separate cases with no evidence of depression, as defined by the original scale, from those with minor and major depression (Kruskal-Wallis: chi-square = 179.1; df = 2; p < .001).
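For readers unfamiliar with the Kruskal-Wallis test, the sketch below shows how the statistic is obtained from pooled ranks, with the usual tie correction. It is a minimal illustration using hypothetical EPDS-8 scores, not the study data, and the function name is an assumption. With three groups (df = 2) the chi-square approximation for the p value reduces to exp(-H/2).

```python
import math
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, df) for the Kruskal-Wallis test, corrected for ties."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied blocks.
    ties = Counter(pooled)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical EPDS-8 scores by original classification.
not_depressed = [2, 3, 4, 5, 6, 7]
minor = [8, 8, 9, 9]
major = [10, 12, 14]

h, df = kruskal_wallis([not_depressed, minor, major])
p = math.exp(-h / 2)  # chi-square upper tail, valid for df = 2 only
print(f"H = {h:.2f}, df = {df}, p = {p:.4f}")

# The reported statistic (179.1 on 2 df) gives a vanishingly small p.
p_reported = math.exp(-179.1 / 2)
```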
The results of the above analysis suggest that an eight-item version of the scale would be more psychometrically robust: it is free of the item bias caused by the influence of baby age on Item 7, and it omits Item 8, which showed misfit to the model. It also has high levels of agreement with the original case identification. The scale is approximately linear only for raw scores of 4 to 20 (from a possible range of 0–24 on the EPDS-8).