International Journal of Methods in Psychiatric Research
Int J Methods Psychiatr Res. 2010 March; 19(1): 1–17.
PMCID: PMC2896722

Measuring disability across cultures — the psychometric properties of the WHODAS II in older people from seven low- and middle-income countries. The 10/66 Dementia Research Group population-based survey


We evaluated the psychometric properties of the 12-item interviewer-administered screener version of the World Health Organization Disability Assessment Schedule – version II (WHODAS II) among older people living in seven low- and middle-income countries. Principal component analysis (PCA), confirmatory factor analysis (CFA) and Mokken analyses were carried out to test for unidimensionality, hierarchical structure, and measurement invariance across 10/66 Dementia Research Group sites.

PCA generated a one-factor solution in most sites. In CFA, the two-factor solution generated in Dominican Republic fitted better for all sites other than rural China. The two factors were not easily interpretable, and may have been an artefact of differing item difficulties. Strong internal consistency and high factor loadings for the one-factor solution supported unidimensionality. Furthermore, the WHODAS II was found to be a ‘strong’ Mokken scale. Measurement invariance was supported by the similarity of factor loadings across sites, and by the high between-site correlations in item difficulties.

The Mokken results strongly support the view that the WHODAS II 12-item screener is a unidimensional and hierarchical scale conforming to item response theory (IRT) principles, at least at the monotone homogeneity model level. More work is needed to assess the generalizability of our findings to different populations. Copyright © 2010 John Wiley & Sons, Ltd.

Keywords: disability, elderly, developing countries, WHODAS II, psychometric properties


The number of people aged 60 years and over is estimated to grow worldwide to two billion by 2050 (United Nations, 2007). By 2025, approximately three-quarters will be living in low- and middle-income countries (LAMIC) (United Nations, 2007). The accompanying health transition will see a rapid increase in the burden arising from age-related chronic non-communicable diseases, mainly because of their contribution to years lived with disability (Murray and Lopez, 1996). There is a need for convenient, feasible and valid measures of disability, for population health surveys, health service evaluation and routine clinical practice. The applicability of such measures to older people, and to those living in LAMIC will also be of increasing relevance.

The World Health Organization's (WHO's) International Classification of Functioning, Disability and Health (ICF) defines disability as ‘the negative aspects of the interaction between an individual (with a health condition) and that individual's contextual factors (personal and environmental factors)’ (WHO, 2001a). Interactions include: impairments (affecting the body); activity limitations (affecting actions or behaviour); participation restrictions (affecting experience of life). The WHO Disability Assessment Schedule, version II (WHODAS II) was developed to be consistent with the ICF classification, and to identify the consequences of any type of disorder that has an impact on everyday functioning, treating all disorders at parity when determining level of functioning (Chopra et al., 2004). It was developed, field-tested and validated in 16 languages in 14 different countries (WHO, 2001b).

The psychometric properties of the 36-item WHODAS II have been explored in a variety of clinical populations including those with stroke (Posl et al., 2007), inflammatory arthritis (Baron et al., 2008), back pain (Chwastiak and Von Korff, 2003), ankylosing spondylitis (van Tubergen et al., 2003), systemic sclerosis (Hudson et al., 2008), acquired hearing loss (Chisolm et al., 2005), psychosis (Chopra et al., 2004; McKibbin et al., 2004), depression (Chwastiak and Von Korff, 2003), and among mental health service users (Chavez et al., 2005). It performed well in all of these contexts, with high internal consistency, moderate to good test-retest reliability, and good concurrent validity against indicators of disease severity, disease-specific and other generic disability assessments. It seems to be at least as responsive as other generic measures to clinical change in anxiety (Perini et al., 2006), depression (Chwastiak and Von Korff, 2003; Mogga et al., 2006) and back pain (Chwastiak and Von Korff, 2003).

The 36-item WHODAS II has previously been used in two population-based studies, the World Mental Health (WMH) Surveys of adults across 16 countries (Von Korff et al., 2008) and the Kwangju community survey of physical and psychiatric morbidity among older adults in South Korea (Kim et al., 2005). Findings from these surveys again supported internal consistency and concurrent validity; in the WMH surveys WHODAS II scores were consistently correlated with the Sheehan Disability Scale (Von Korff et al., 2008). The Sheehan Disability Scale (SDS) is a self-report measure of mental health-related functional impairment in primary care settings. Patients are asked to indicate on a 10-point visual analogue scale how much their symptoms interfere with three domains of life (work, social life, and family life). The SDS has shown high internal consistency reliability and good construct validity (Leon et al., 1997). In Korea, physical health, depression and cognitive function explained 40% of the variance in WHODAS II scores, and effects of socio-demographic variables were no longer apparent after controlling for these health outcomes.

The WHO website reports a clear unidimensional structure for the 36-item WHODAS II with very high loadings of all six domain scores on a Global Disability latent variable (WHO, 2001b). For the WMH survey, confirmatory factor analysis (CFA) provided only relatively weak support for unidimensionality in the four countries in which this was carried out (Von Korff et al., 2008). However, the domain subscale scores all loaded >0.40 on a Global Disability latent variable and, to this extent, the utility of a Global Disability score was supported.

The shorter 12-item ‘screener’ version of the WHODAS II has been little used to date (Norton et al., 2004; Rehm et al., 1999). This is surprising, since it takes only five minutes to administer, and covers all six domains of the full WHODAS II (Rehm et al., 1999). In the WHO pilot studies, the correlation between the score from the screener and the score of the full WHODAS II was 0.95, meaning that the screener explained more than 90% of the total variation of the full 36-item WHODAS II (0.95 squared is approximately 0.90) (Rehm et al., 1999). CFA indicated a unidimensional scale with good classical scaling properties. However, its lack of compatibility with item response theory (IRT) was considered to limit its cross-cultural applicability (Rehm et al., 1999). Accordingly, five of the 12 items were subsequently replaced with others from the 36-item version to improve its IRT characteristics, resulting in the version currently approved by the WHO. Changes were made in the following domains: (1) understanding and communication - items addressing difficulties in understanding and remembering were replaced with items on difficulties in concentrating and learning, (2) self-care - the item on difficulties with feeding was replaced by an item on difficulties with dressing, and (3) participation in society - items addressing difficulties in carrying out plans and in living in dignity were replaced with items addressing difficulties in joining community activities and with being emotionally affected.

Our aim in the current analysis was to explore the psychometric properties of the 12-item screener version of the WHODAS II in an epidemiological survey of older adults across a wide variety of LAMIC settings. Specifically, we wished to examine whether the 12-item WHODAS II would meet criteria for measurement invariance across cultures, a requirement for international comparative research, as well as to assess its factor structure and unidimensionality. The 10/66 Dementia Research Group (10/66 DRG) studies in 11 sites in seven LAMIC provide an opportunity to address these aims, at the same time redressing the imbalance in previous research towards younger participants in high income countries.



A secondary analysis of data from the 10/66 DRG surveys of representative samples of older people in seven developing countries (urban sites in Cuba, Dominican Republic and Venezuela, and rural and urban sites in Mexico, Peru, China and India) was carried out. Full details of the study protocol can be found elsewhere (Llibre Rodriguez et al., 2008; Prince et al., 2008a). Briefly, a cross-sectional one-phase survey was carried out in geographically defined catchment areas. All residents aged 65 years and over were included in the survey and an informant was also interviewed. The sample size for each country was between 2000 and 3000 participants. All studies were approved by local ethical committees and by the King's College London ethical committee.

Disability assessment

A copy of the 12-item interviewer-administered WHODAS II screener assessment is provided in the Appendix [this and the full 36-item version are also available on the WHODAS II website (WHO, 2001b)]. Both versions cover six domains encompassing: understanding and communicating with the world; moving and getting around; self care; getting along with people; life activities; and participation in society. Scores for each question range from zero (no difficulty) to four (extreme difficulty/cannot do). The standardized global score ranges from zero (non-disabled) to 100 (maximum disability).
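As an illustration of the score range described above, the following is a minimal sketch of one way a 0 to 100 score could be obtained from twelve items coded 0 to 4. It assumes a simple sum rescaled by the maximum possible total (48); the text does not state the standardization algorithm actually used, and the function name `whodas12_simple_score` is hypothetical.

```python
def whodas12_simple_score(items):
    """Rescale twelve item responses (each coded 0-4) to a 0-100 score.

    Assumption: simple sum scoring, i.e. 100 * (sum of items) / 48.
    This is an illustrative rule, not necessarily the one used in the survey.
    """
    if len(items) != 12:
        raise ValueError("expected 12 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in items):
        raise ValueError("each response must be an integer from 0 to 4")
    max_total = 12 * 4  # highest possible raw sum
    return 100.0 * sum(items) / max_total


print(whodas12_simple_score([0] * 12))  # 0.0   (no difficulty on any item)
print(whodas12_simple_score([1] * 12))  # 25.0
print(whodas12_simple_score([4] * 12))  # 100.0 (extreme difficulty on every item)
```

Under this assumed rule, a respondent with no difficulty on any item scores zero, which is the group that drives the zero-inflation discussed later in the paper.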

Statistical analysis

For each site, we estimated the mean WHODAS II global disability score, the proportion with non-zero scores, and the 90th centile. We also calculated the mean age, and the proportion of participants reporting three or more chronic limiting physical impairments.

Principal component analysis (PCA) of WHODAS II items was based on the covariance matrix of polychoric correlations, as used for the analysis of ordinal variables (Joreskog and Sorbom, 1993; Joreskog, 1994). The cut-off used to assume that an item loaded on a given factor was 0.60. A varimax rotation was carried out with an eigenvalue of one as the initial extraction criterion (Castro-Costa et al., 2008). Given the a priori hypothesis of unidimensionality (Rehm et al., 1999), we then tested and compared between sites the goodness-of-fit of a one-factor solution, using CFA.

CFA and Mokken scaling analysis can be used to assess both the dimensionality of a scale, and its ability to measure the same latent trait in the same way across different groups, in our case older people from different countries and cultures. This property has been referred to as psychometric equivalence, distinct from functional and conceptual equivalence, which are a sine qua non of cross-cultural comparability. CFA models contain parameters that are (a) fixed to a certain value, (b) constrained to be equal to other parameters, and (c) free to take on any unknown value (Castro-Costa et al., 2008). In testing for psychometric invariance, two models are fitted and then compared for goodness-of-fit: one in which the factor loadings are unconstrained, that is, estimated separately for all sites, and a second in which they are constrained to be equal across sites, the null hypothesis being that items load to a similar extent on the same latent trait or traits across sites. Markedly superior fit of the first model would challenge the hypothesis of measurement invariance. Chi square statistics can be used to evaluate the absolute fit for each model tested in CFA, but in large data sets trivial differences in goodness-of-fit may be highly statistically significant. Therefore it is recommended to use other absolute and relative indices: Akaike's Information Criterion (AIC: Akaike, 1987) - the lower the AIC value, the better the fit of the model (Burnham and Anderson, 1998); the Tucker-Lewis Index (TLI: Tucker and Lewis, 1973) - values near 1.0 indicate good fit and those greater than 0.90 are considered satisfactory (Dunn et al., 1993; Marsh et al., 1996); and the root mean square error of approximation (RMSEA) (Browne, 1990) - values of less than 0.05 indicate close fit and 0.05 to 0.08 reasonable fit for the model.
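The three indices named above can be sketched from a model chi-square as follows. This is a minimal illustration using commonly cited textbook forms, not the output of the software used in the study: the RMSEA form with N - 1 in the denominator and the chi-square-based AIC are assumptions (software packages differ in these details), and all numbers in the usage lines are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (one common form, using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis Index: compares chi2/df of the fitted model with a null model."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def aic(chi2, n_free_params):
    """Chi-square-based AIC: model chi-square plus twice the number of free parameters."""
    return chi2 + 2 * n_free_params


# Hypothetical values, chosen only to show the arithmetic.
print(round(rmsea(chi2=540.0, df=54, n=2001), 3))  # 0.067
print(round(tli(540.0, 54, 6600.0, 66), 3))        # 0.909
print(aic(540.0, 24))                              # 588.0
```

With these made-up inputs the RMSEA would fall in the 'reasonable fit' band and the TLI just above the 0.90 threshold quoted in the text.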

Mokken scaling involves the application of a non-parametric item response model (Mokken, 1971) to measure the hierarchical properties of items in a scale, assessing if the items can be ordered by degree of difficulty, so that any individual who endorses a particular item will also endorse all the items ranked lower in difficulty. Three basic assumptions are required for a monotone homogeneity model (MHM): (1) unidimensionality (one latent variable summarizes the variation in the item scores in the questionnaire), (2) local independence (after conditioning on the position on the latent trait, the item scores are statistically independent), and (3) monotonicity (for all items the probability of a positive response increases monotonically with increasing values of the latent trait). These assumptions being met, an individual's position on the latent trait can conveniently be estimated as the rank of the highest item in the hierarchy that they endorse, or their total number of positive responses (Dijkstra et al., 1999). Double monotonicity models (DMMs) require in addition that for any value of the latent trait, the probability of a positive response decreases with the difficulty of the item. This means that the order of item difficulties remains invariant over all values of the latent trait and thus, that the item response function curves do not intersect (Sijtsma et al., 2008; Van der Ark et al., 2007). To assess single monotonicity, we estimated Loevinger coefficients for each item (Hi) and for the whole scale (H), where values between 0.3 and 0.4 suggest weak scalability, values between 0.4 and 0.5 moderate, and values above 0.5 strong scalability.
We also tested formally for violations of monotonicity (using the Stata loevH monotonicity command) and non-intersection (using the Stata loevH nipmatrix command) between pairs of items (minimum violation 0.03, alpha = 0.05), using overall criteria values as an indication of the likelihood of assumption violation; <40 ‘satisfactory’, 40 to 79 ‘questionable violation’, 80 and over ‘strongly suggesting an assumption violation’ (Molenaar and Sijtsma, 2000).
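To make the scale coefficient H concrete, the sketch below computes Loevinger's H for dichotomous (0/1) items as the ratio of summed observed inter-item covariances to the summed maximum covariances attainable given the item marginals. It is a simplified illustration only: the analysis reported here was polytomous and run with the Stata loevH program, and the function name `loevinger_h` and the toy data are hypothetical.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale H for dichotomous items.

    data: list of respondent rows, each a list of 0/1 item scores.
    H = sum of observed covariances / sum of maximum possible covariances,
    taken over all item pairs.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[i] for row in data) / n for i in range(k)]  # endorsement rates
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n  # joint endorsement rate
        cov_sum += p_ij - p[i] * p[j]
        # The joint rate cannot exceed the smaller marginal (perfect Guttman pattern).
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum


# A perfect hierarchy: anyone endorsing a harder item endorses all easier ones.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# The same data plus one respondent who breaks the hierarchy.
with_error = guttman + [[0, 1, 0]]

h_perfect = loevinger_h(guttman)
h_error = loevinger_h(with_error)
print(round(h_perfect, 3), round(h_error, 3))  # 1.0 0.5
```

In the toy data a single inconsistent respondent out of five lowers H from 1.0 to 0.5, which would sit at the boundary between 'moderate' and 'strong' scalability under the thresholds given above.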

To establish the agreement between the difficulties (the proportions of people falling above each of the four cut-points) for each of the 12 items, we formed the correlation matrix separately for each cut-point and then pooled them using Fisher's hyperbolic arctangent transformation - performing the correlation on all four cut-points simultaneously would fail to take account of the fact that difficulties within items must be non-decreasing. Overall agreement between the 11 sites was assessed with an intraclass correlation coefficient.
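The pooling step described above can be sketched as follows: each correlation is mapped through the hyperbolic arctangent, the transformed values are averaged, and the mean is mapped back with the hyperbolic tangent. Equal weighting of the four cut-points is an assumption made here for illustration (the text does not state the weights), and the input correlations are hypothetical.

```python
import math


def pool_correlations(rs):
    """Pool correlations via Fisher's z: mean of atanh(r), back-transformed with tanh.

    Assumption: an unweighted mean. Every r must lie strictly between -1 and 1.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical correlations between two sites at the four cut-points.
per_cutpoint = [0.91, 0.88, 0.84, 0.79]
pooled = pool_correlations(per_cutpoint)
print(round(pooled, 3))
```

Averaging on the transformed scale rather than averaging the raw correlations reduces the distortion caused by the bounded, skewed sampling distribution of r near 1.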

Descriptive statistics and PCA were conducted with STATA 10.0 (Stata Corporation, 2007). The Mokken model was calculated in STATA 10.0 after downloading the LoevH add-on program. The intraclass correlation coefficient was calculated in SPSS 15.0 (SPSS Inc., 2005), CFA was carried out with AMOS 5 (SPSS Inc., 2003), and Fisher's correlation was calculated in R (R Development Core Team, 2007).


Sample characteristics

The 10/66 Release 1.7 dataset was used in this analysis. Response rates for the survey were high, varying from 72% in urban India to 95% in Dominican Republic (Llibre Rodriguez et al., 2008).

The 12 items contributing to the WHODAS global disability score were highly internally consistent with Cronbach's alpha varying between 0.90 and 0.97 by site (Table 1). There was considerable variation in the distribution of WHODAS II scores between sites (Figure 1 and Table 1). However, most of this was accounted for by two outlier sites, urban China with a very low proportion of non-zero scores (22.2%) and rural India, with a very high proportion (97.7%). There were also marked compositional differences between the sites in their age distributions (younger mean age in rural China, and in the Indian sites), and in the proportion of respondents reporting three or more limiting physical impairments (high proportions in Dominican Republic and Venezuela, and low proportions in Cuba, rural Peru, rural China and urban India).

Table 1
The 10/66 DRG release 1.7 – socio-demographic characteristics and the WHODAS II mean total scores (n = 14 991)
Figure 1
Distribution of WHODAS II scores by study site (box plot): o - outlier (more than one and a half box lengths above the 75th centile); x - extreme value (more than three box lengths above the 75th centile).

Principal component analyses (PCA)

The PCA gave rise to a one-factor solution in most countries, with the exception of Cuba, the Dominican Republic, rural China and rural India, where a two-factor solution was generated. The second factor varied between centres, constituting the getting around, participation in society, self care and life activities domains in Cuba; the self care and getting along with people domains in the Dominican Republic; the getting around and life activities domains and the learning item (understanding and concentration) in rural India and the standing (getting around), household tasks (life activities) and learning (understanding and concentration) items in rural China. In sites where a one-factor solution was found, this single factor explained between 69.1% and 91.1% of the variance with eigenvalues ranging from 8.29 to 10.94. In sites that generated a two-factor solution (Cuba, Dominican Republic, rural China and rural India) the first factor explained between 43.7% and 60.2% of the variance (eigenvalues between 7.69 and 9.76) and the second factor 28% to 40.4% (eigenvalues between 1.02 and 1.11) (Table 2). When data were pooled from all sites a one-factor solution emerged, accounting for 75.1% of the variance (eigenvalue = 9.01).
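The eigenvalues and percentages quoted for the one-factor solutions can be related by a short calculation. The sketch assumes an unrotated component extracted from a 12-item correlation matrix, so that total variance equals the number of items; it does not apply to the varimax-rotated two-factor figures. The function name `proportion_explained` is hypothetical.

```python
def proportion_explained(eigenvalue, n_items=12):
    """Share of total variance for one component of a correlation-matrix PCA.

    Assumption: each standardized item contributes variance 1, so the
    total variance equals the number of items.
    """
    return eigenvalue / n_items


for ev in (8.29, 10.94, 9.01):
    print(ev, round(100 * proportion_explained(ev), 1))
# 8.29 69.1
# 10.94 91.2
# 9.01 75.1
```

The first and last values reproduce the 69.1% and 75.1% reported above; the middle one gives 91.2% rather than the reported 91.1%, a difference consistent with the eigenvalue having been rounded to two decimals.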

Table 2
Factor structure and loadings from principal component analysis based on polychoric correlation matrix for WHODAS II

Confirmatory factor analysis (CFA)

We next tested the goodness-of-fit, in each site, of the one-factor solution, and the different two-factor solutions arising from the PCA, using CFA. Upon inspection of the four two-factor solutions, it was clear that the solution arising from the Dominican Republic fitted best across all sites, other than in rural China where the locally generated two-factor solution provided a better fit. The best fitting (Dominican Republic) two-factor solution was then compared with the one-factor solution (Table 3). The one-factor solution provided a moderately poor fit, with RMSEA varying between 0.13 and 0.24 and TLI varying between 0.49 and 0.84. The two-factor solution provided a better fit in all sites with lower RMSEA (0.09 to 0.23), higher TLI (0.57 to 0.90) and lower AICs. Although the fit of the two-factor solution was generally better, TLI only exceeded 0.90 in one site, and was between 0.85 and 0.90 in a further five sites, all in Latin America.

Table 3
Goodness-of-fit within centres between 10/66 sites

Next we tested for measurement invariance for the one-factor (Table 4) and best-fitting two-factor solution (Table 5) across all sites with loadings not constrained (model 1) and constrained (model 2). In each case, there was little difference in the goodness-of-fit of the constrained and unconstrained models, although according to the AIC the unconstrained models fitted somewhat better [one-factor solution - AIC (unconstrained) = 23455.6, AIC (constrained) = 25855.4; two-factor solution - AIC (unconstrained) = 14946.3, AIC (constrained) = 17645.4]. Again, the overall fit of constrained and unconstrained models was much better for the two-factor solution than for the one-factor solution.

Table 4
Confirmatory factor analysis - one-factor solution without constraints, and goodness-of-fit parameters with and without constraints
Table 5
Confirmatory factor analysis – two-factor solution without constraints, and goodness-of-fit parameters with and without constraints

Mokken analysis

Item and scale Loevinger H coefficients were estimated separately for each site, using a polytomous Mokken analysis. There was robust evidence that the WHODAS II and its 12 constituent items conformed to a ‘strong’ Mokken scale (Table 6). The item scalability coefficients from the Mokken analysis exceeded 0.40 for all items in all sites, and exceeded 0.50 in most cases. The coefficient H values for the scale as a whole varied between 0.52 and 0.81 by site. There were only two statistically significant violations of monotonicity, learning a new task in rural China and walking a kilometre in rural India. There were, however, a number of violations with respect to non-intersection (double monotonicity), all of which were statistically significant, and several of which were linked with overall criteria values >80, strongly suggesting an assumption violation. Certain items and sites seemed particularly implicated. Intersection violations were most apparent in Cuba, Dominican Republic, rural China, and rural India. With respect to items, intersection violations were most apparent for walking, standing, and learning a new task. The item difficulties (for the no difficulty/some difficulty threshold) across the 11 sites are summarized in Table 6. Lower item difficulties (easy items) suggest a high probability of item endorsement by those with low scores on the trait. Standing (followed by walking) had the lowest item difficulty overall, while maintaining friendship and dressing had the highest. The rank order of item difficulty was similar in all sites. The correlation coefficients for the set of WHODAS II item difficulties between pairs of 10/66 sites ranged from 0.70 to 0.98, other than those that included rural China, which ranged between 0.43 and 0.61 (Table 7).
Closer inspection of the pattern of item difficulties by site (Table 6) suggested some possibility of differential item functioning for the household responsibilities and learning a new task, both of which had relatively high levels of endorsement, particularly at the higher thresholds of difficulty.

Table 6
Polytomous Mokken analysis: Loevinger H values and item difficulties by 10/66 sites
Table 7
Between site correlation coefficients for WHODAS II after Fisher's hyperbolic arctangent transformation

Overall, the intraclass correlation coefficient for consistency of item difficulty (location) across all 11 sites was 0.64 (95% confidence interval, 0.44-0.84).


Little research has been carried out into the psychometric properties of the 12-item ‘screening’ version of WHODAS II. The original version was found to have unfavourable hierarchical scaling properties (Rehm et al., 1999), but the revised version (WHO, 2001b), which we have used, has not previously been tested. The cross-cultural measurement properties of the WHODAS II have not previously been investigated in depth. In the modified 36-item version of the WHODAS II used in the WMH Survey, CFA (of a one-factor solution) could only be attempted in four countries (US, Israel, New Zealand and Ukraine) because of sample size limitations (Von Korff et al., 2008). In a subsequent publication from the European Study of the Epidemiology of Mental Disorders (ESEMED) network of European WMH survey sites, cross-cultural measurement invariance between Mediterranean and non-Mediterranean countries was formally assessed using CFA (Buist-Bouwman et al., 2008). However, because of the filter questions used in the WMH version of the WHODAS II, IRT analyses were not applied to these data sets. Since filter questions were not used in the WHODAS II screener, we were not restricted in this way. Another strength of the current analysis was the sampling of older people, who were sparsely represented in the WHODAS development (Rehm et al., 1999) and WMH survey data sets. Older people have a much higher prevalence of cognitive, physical and mental impairments, reflected in the much higher proportion of non-zero scores for the WHODAS II and its items. This strength is also a limitation, in that our findings may not be safely generalized to younger populations.

Is the WHODAS II a unidimensional scale?

In common with many other studies (Chisolm et al., 2005; Chwastiak and Von Korff, 2003; McKibbin et al., 2004) we found that the WHODAS II was a highly internally consistent scale, with Cronbach's alpha ranging from 0.90 to 0.97 by site. Rehm reports an excellent fit for a one-factor solution for his original version of the 12-item WHODAS II, with average factor loadings for the six domains onto a ‘global disability’ trait of 0.81 or 0.83 (Rehm et al., 1999). A similar factor structure was reported from the ESEMED survey (Buist-Bouwman et al., 2008), using a modified version of the full 36-item WHODAS II, particularly when the frequency items (not included in the 12-item WHODAS II) were omitted. However, Von Korff et al. (2008) in their analysis of the WMH survey data found that factor loadings at the dimension level were consistently lower, and concluded that ‘the fit statistics did not support the hypothesis that the five modified WHODAS domains form a unidimensional latent variable of Global Disability’. Our findings are more consistent with those of Von Korff. However, as with the WMH analysis, we found that, for all sites other than rural India, factor loadings for the one-factor solution all exceeded 0.40; hence, based on classical test theory, a meaningful unidimensional scale could be constructed by summing items. Our a priori hypothesis was for a one-factor solution, as supported by previous research. However, two-factor solutions emerged from PCA in Cuba, Dominican Republic, rural China and rural India, and one of these, from the Dominican Republic, clearly fitted much better across all sites in the subsequent CFA than did the one-factor solution. This grouped the getting around, life activities, understanding and communication, and participation in society domains in the first dominant factor and the self care and getting along with people domains in the second factor.
The underlying constructs were not easily interpretable, other than on the basis of the item difficulty of the associated items. The generation of artefactual ‘difficulty’ factors from otherwise unidimensional scales is well recognized (Gillespie et al., 1987; Hattie, 1985). Our Mokken scale analysis gave further strong support for the WHODAS II as a unidimensional hierarchical scale (see later).

Does the WHODAS II conform to IRT principles?

The hierarchality of the original version of the WHODAS II screener was tested using Mokken scaling analysis, the conclusion being that it did not conform to IRT principles given only moderate scalability with significant non-intersection violation (Rehm et al., 1999). Our Mokken analysis of the revised WHODAS II screener suggests that it does conform to IRT principles, at least at the level of a MHM. Item scalability coefficients were all positive and easily exceeded the threshold of 0.3, generally accepted as signifying that items meet MHM assumptions. Coefficient H, a weighted mean of item coefficients, exceeded 0.50 in all sites, suggesting that the WHODAS II screener is a ‘strong’ Mokken scale capable of ordering individuals on the latent disability trait. Consistent with these findings, monotonicity diagnostics were satisfactory for all items in nearly all sites. It is not so clear that the revised WHODAS II screener meets the more stringent criteria for a DMM, in which the rank ordering of item difficulties is invariant across all levels of the trait. While there were a number of non-intersection violations these were few with respect to the number of active pairs (n = 7040), and the largest amongst them were close to the 0.03 threshold. Also, no non-intersection violations, or only ‘questionable’ violations were noted in five of the 11 sites. Measurement properties could perhaps be improved with respect to DMM by omitting items one (difficulty standing) and seven (difficulty walking), but to do so would remove the entire ‘getting around’ domain from the scale. It should be noted that non-intersection is not among the fundamental IRT assumptions (unidimensionality, local independence and monotonicity), and the majority of IRT models do not imply invariant item ordering (Sijtsma and Hemker, 2000). 
However, overall, there is strong evidence that the current revised version of the 12-item WHODAS II screener is a hierarchical, unidimensional scale conforming to IRT principles, and as such, the sum of its scores can be taken as a measure of the underlying trait.

Measurement invariance

The marked similarity of the goodness-of-fit of the one-factor and two-factor CFA models when loadings were constrained (to be equal across sites) or not constrained (estimated freely in each site) strongly supports measurement invariance with respect to a common underlying factor structure and factor loadings. Buist-Bouwman et al. (2008) have previously reported measurement invariance for the full version of the WHODAS II, between Mediterranean and non-Mediterranean countries for all but one item (embarrassment) not included in the 12-item WHODAS II screener. Mokken analysis constitutes a less conservative test of measurement invariance than parametric IRT models. Nevertheless, Mokken gives information about the dimensionality of the data and the properties of the items, with an intuitive relationship between individual measurement values and item difficulty and discrimination values. ‘Dressing’ and ‘maintaining friendship’ were, consistently, the most difficult items (only likely to be endorsed by those with the highest scores on the trait), while ‘getting around’ (walking and standing) had the lowest item difficulties (likely to be endorsed by those with relatively low scores on the trait). The high inter-site correlations and overall intraclass correlation for WHODAS II item difficulties across sites support measurement invariance with respect to common hierarchical scaling properties and common item difficulties between diverse countries and cultures. The exception, rural China, seemed to have been accounted for by the unusually high proportion in that site reporting at least severe difficulty with learning a new task (19.2%) and with household responsibilities (7.7%). This suggests the possibility of some degree of culturally determined differential item functioning with respect to these two items. In rural China a high proportion (73.2%) (Prince et al., 2008b) of older people lived with their family.
It was noted in the course of our field research that all older people, regardless of functional ability, were given support with personal care and core activities of daily living, and were not expected to perform any more complex instrumental activities of daily living (cooking, shopping, managing household budgets). Hence taking on household responsibilities and learning new tasks would be seen as exceptional activities and, hence, possibly more challenging than in other sites, even for those with little disability.

Distributional properties in different populations

Others have remarked upon the skewed and zero-inflated character of the WHODAS II distribution, and on the large distributional differences between survey populations from different countries (Buist-Bouwman et al., 2008; Rehm et al., 1999; Von Korff et al., 2008). Zero-inflation seems not to be accounted for by the insensitive filter questions in the WMH survey version of the WHODAS II, since the phenomenon was also evident in our samples, where no filter questions were used. Furthermore, our data suggest that variation in the extent of zero-inflation may account for much of the between-site variance in WHODAS II scores. Standardizing WHODAS II distributions by dichotomizing at the 90th centile for each population has been proposed (Von Korff et al., 2008). However, this has the disadvantage both of loss of measurement precision, and loss of the ability to model and explore country differences. We would propose zero-inflated negative binomial regression as the appropriate model for dealing with both overdispersion and zero-inflation, including variation in zero-inflation between samples in cross-cultural research. This model allows for ‘excess zeros’ in count models under the assumption that the population is characterized by two groups, one where members always have zero counts, and one where members have zero or positive counts. The likelihood of being a certain zero is estimated using a logit specification (an effect of country here could be interpreted as a culturally determined propensity for ‘nay-saying’), while the counts in the second group are estimated using a negative binomial specification.
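The two-group structure of the proposed model can be illustrated with its probability mass function. The sketch below is not a regression fit: it assumes a negative binomial parameterized by mean `mu` and dispersion `alpha` (variance mu + alpha * mu^2) and a fixed probability `pi_zero` of belonging to the always-zero group, whereas in the regression setting these would depend on covariates such as country. All names and parameter values are hypothetical.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative binomial pmf with mean mu and dispersion alpha (assumed parameterization)."""
    r = 1.0 / alpha        # size parameter
    p = r / (r + mu)       # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def zinb_pmf(k, mu, alpha, pi_zero):
    """Zero-inflated NB: a 'certain zero' with probability pi_zero, otherwise an NB count."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, alpha)
    return pi_zero + base if k == 0 else base


# Hypothetical parameters: 30% certain zeros, mean count 5 among the rest.
mu, alpha, pi_zero = 5.0, 0.5, 0.3
p_zero_nb = nb_pmf(0, mu, alpha)
p_zero_zinb = zinb_pmf(0, mu, alpha, pi_zero)
total = sum(zinb_pmf(k, mu, alpha, pi_zero) for k in range(500))
mean = sum(k * zinb_pmf(k, mu, alpha, pi_zero) for k in range(500))
print(round(p_zero_nb, 3), round(p_zero_zinb, 3))  # 0.082 0.357
```

With these illustrative values the share of zeros rises from about 8% under the plain negative binomial to about 36% under the zero-inflated version, while the overall mean falls to (1 - pi_zero) * mu = 3.5.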


In this analysis, we assessed the psychometric properties of the 12-item WHODAS II in large and representative community samples of older people in 11 sites across seven LAMIC. Exploratory factor analysis gave rise to a one-factor solution in seven of the 11 sites studied. While CFA demonstrated that a two-factor solution fitted better, the underlying constructs were not easily interpretable, other than on the basis of the difficulty of the associated items. Mokken scale analysis gave strong support for the WHODAS II as a unidimensional hierarchical scale conforming to IRT principles, at least at the monotone homogeneity model level. Measurement invariance was demonstrated both for item calibrations, and for underlying factor structures and factor loadings. These are all highly desirable properties for a brief disability assessment to be used in cross-cultural comparative research. The brief 12-item version of the WHODAS II has been little used since its development, perhaps because of a more negative report on an earlier version of the scale (Rehm et al., 1999). More work is needed to assess the generalizability of our findings to younger samples, and to confirm the reliability and validity of the new version of the WHODAS II screener. The demonstration of robust cross-cultural measurement properties for the WHODAS II screener opens the way for further analyses to explore explanations for observed differences in the distribution of scores between sites. The effect of varying degrees of zero-inflation will need to be accounted for, and the influence of both compositional factors (age, gender, education, socio-economic circumstances, physical, mental and cognitive health) and contextual differences explored.


The 10/66 Dementia Research Group population-based surveys were funded by: The Wellcome Trust (UK) (GR066133); the World Health Organization; the US Alzheimer's Association (IIRG-04-1286); and the Fondo Nacional de Ciencia Y Tecnologia, Consejo de Desarrollo Cientifico Y Humanistico, Universidad Central de Venezuela (Venezuela).


Table A1

The 12-item WHODAS II

Item | Question | Response options
H1 | How do you rate your overall health in the past 30 days? | Very good; Good; Moderate; Bad; Very bad
In the last 30 days, how much difficulty did you have in:
S1 | Standing for long periods such as 30 minutes? | None; Mild; Moderate; Severe; Extreme/can't do
S2 | Taking care of your household responsibilities? | None; Mild; Moderate; Severe; Extreme/can't do
S3 | Learning a new task, for example, learning how to get to a new place? | None; Mild; Moderate; Severe; Extreme/can't do
S4 | How much of a problem did you have joining in community activities (for example, festivities, religious or other activities) in the same way as anyone else can? | None; Mild; Moderate; Severe; Extreme/can't do
S5 | How much have you been emotionally affected by your health problems? | Not at all; Mildly; Moderately; Severely; Extremely
S6 | Concentrating on doing something for 10 minutes? | None; Mild; Moderate; Severe; Extreme/can't do
S7 | Walking a long distance such as a kilometre (or equivalent)? | None; Mild; Moderate; Severe; Extreme/can't do
S8 | Washing your whole body? | None; Mild; Moderate; Severe; Extreme/can't do
S9 | Getting dressed? | None; Mild; Moderate; Severe; Extreme/can't do
S10 | Dealing with people you do not know? | None; Mild; Moderate; Severe; Extreme/can't do
S11 | Maintaining a friendship? | None; Mild; Moderate; Severe; Extreme/can't do
S12 | Your day to day work? | None; Mild; Moderate; Severe; Extreme/can't do


  • Akaike H. Factor Analysis and AIC. Psychometrika. 1987;52:317–332.
  • Baron M, Schieir O, Hudson M, Steele R, Kolahi S, Berkson L, Couture F, Fitzcharles MA, Gagne M, Garfield B, Gutkowski A, Kang H, Kapusta M, Ligier S, Mathieu JP, Menard H, Starr M, Stein M, Zummer M. The clinimetric properties of the World Health Organization Disability Assessment Schedule II in early inflammatory arthritis. Arthritis & Rheumatism. 2008;59(3):382–390. DOI: 10.1002/art.23314. [PubMed]
  • Browne M. MUTMUM PC: User's Guide. Ohio State University; 1990.
  • Buist-Bouwman MA, Ormel J, De Graaf R, Vilagut G, Alonso J, Van Sonderen E, Vollebergh WA. Psychometric properties of the World Health Organization Disability Assessment Schedule used in the European Study of the Epidemiology of Mental Disorders. International Journal of Methods in Psychiatric Research. 2008;17(4):185–197. DOI: 10.1002/mpr.261. [PubMed]
  • Burnham KP, Anderson DR. Model Selection and Inference: A Practical Information-theoretic Approach. Springer-Verlag; 1998.
  • Castro-Costa E, Dewey M, Stewart R, Banerjee S, Huppert F, Mendonca-Lima C, Bula C, Reisches F, Wancata J, Ritchie K, Tsolaki M, Mateos R, Prince M. Ascertaining late-life depressive symptoms in Europe: An evaluation of the survey version of the EURO-D scale in 10 nations. The SHARE project. International Journal of Methods in Psychiatric Research. 2008;17(1):12–29. DOI: 10.1002/mpr.236. [PubMed]
  • Chavez LM, Canino G, Negron G, Shrout PE, Matias-Carrelo LE, Aguilar-Gaxiola S, Hoppe S. Psychometric properties of the Spanish version of two mental health outcome measures: World Health Organization Disability Assessment Schedule II and Lehman's Quality Of Life Interview. Mental Health Services Research. 2005;7(3):145–59. [PubMed]
  • Chisolm TH, Abrams HB, McArdle R, Wilson RH, Doyle PJ. The WHO-DAS II: Psychometric properties in the measurement of functional health status in adults with acquired hearing loss. Trends in Amplification. 2005;9(3):111–126. [PubMed]
  • Chopra PK, Couper JW, Herrman H. The assessment of patients with long-term psychotic disorders: Application of the WHO Disability Assessment Schedule II. Australian and New Zealand Journal of Psychiatry. 2004;38(9):753–759. [PubMed]
  • Chwastiak LA, Von Korff M. Disability in depression and back pain: Evaluation of the World Health Organization Disability Assessment Schedule (WHO DAS II) in a primary care setting. Journal of Clinical Epidemiology. 2003;56(6):507–514. DOI: 10.1016/S0895-4356(03)00051-9. [PubMed]
  • Dijkstra A, Buist G, Moorer P, Dassen T. Construct validity of the Nursing Care Dependency Scale. Journal of Clinical Nursing. 1999;8(4):380–388. [PubMed]
  • Dunn G, Everitt B, Pickles A. Modelling Covariances and Latent Variables using EQS. Chapman & Hall; 1993.
  • Gillespie M, Tenvergert E, Kingma J. Using Mokken scale analysis to develop unidimensional scales. Do the six abortion items in the NORC GSS form one or two scales? Quality & Quantity. 1987;21:393–408.
  • Hattie J. Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement. 1985;9:139–164.
  • Hudson M, Steele R, Taillefer S, Baron M, Canadian SR. Quality of life in systemic sclerosis: psychometric properties of the World Health Organization Disability Assessment Schedule II. Arthritis & Rheumatism. 2008;59(2):270–278. DOI: 10.1002/art.23343. [PubMed]
  • Joreskog K, Sorbom D. New Features in PRELIS 2. Chicago University Press; 1993.
  • Joreskog K. On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika. 1994;59(3):381–389.
  • Kim JM, Stewart R, Glozier N, Prince M, Kim SW, Yang SJ, Shin IS, Yoon JS. Physical health, depression and cognitive function as correlates of disability in an older Korean population. International Journal of Geriatric Psychiatry. 2005;20(2):160–167. DOI: 10.1002/gps.1266. [PubMed]
  • Leon AC, Olfson M, Portera L, Farber L, Sheehan DV. Assessing psychiatric impairment in primary care with the Sheehan Disability Scale. International Journal of Psychiatry in Medicine. 1997;27(2):93–105. [PubMed]
  • Llibre Rodriguez JJ, Ferri CP, Acosta D, Guerra M, Huang Y, Jacob KS, Krishnamoorthy ES, Salas A, Sosa AL, Acosta I, Dewey ME, Gaona C, Jotheeswaran AT, Li S, Rodriguez D, Rodriguez G, Kumar PS, Valhuerdi A, Prince M, 10/66 Dementia Research Group. Prevalence of dementia in Latin America, India, and China: a population-based cross-sectional survey. Lancet. 2008;372(9637):464–474. DOI: 10.1016/S0140-6736(08)61002-8. [PMC free article] [PubMed]
  • Marsh H, Balla J, Hau K. An evaluation of incremental fit indices: A clarification of mathematical and empirical properties. In: Marcoulides G, Schumacker R, editors. Advanced Structural Equation Modelling: Issues and Techniques. Lawrence Erlbaum Associates; 1996. pp. 315–355.
  • McKibbin C, Patterson TL, Jeste DV. Assessing disability in older patients with schizophrenia: results from the WHODAS-II. Journal of Nervous and Mental Disease. 2004;192(6):405–413. DOI: 10.1097/01.nmd.0000130133.32276.83. [PubMed]
  • Mogga S, Prince M, Alem A, Kebede D, Stewart R, Glozier N, Hotopf M. Outcome of major depression in Ethiopia: population-based study. British Journal of Psychiatry. 2006;189:241–246. DOI: 10.1192/bjp.bp.105.013417. [PubMed]
  • Mokken R. Theory and Procedure of Scale Analysis. Mouton & Co; 1971.
  • Molenaar AW, Sijtsma K. User's Manual MSP5 for Windows. iecProGAMMA; 2000.
  • Murray CJL, Lopez AD. The Global Burden of Disease: A Comprehensive Assessment of Mortality and Disability from Diseases, Injuries, and Risk Factors in 1990 and Projected to 2020. Harvard University Press; 1996.
  • Norton J, de Roquefeuil G, Benjamins A, Boulenger JP, Mann A. Psychiatric morbidity, disability and service use amongst primary care attenders in France. European Psychiatry. 2004;19(3):164–167. DOI: 10.1016/j.eurpsy.2003.11.003. [PubMed]
  • Perini SJ, Slade T, Andrews G. Generic effectiveness measures: Sensitivity to symptom change in anxiety disorders. Journal of Affective Disorders. 2006;90(2–3):123–130. [PubMed]
  • Posl M, Cieza A, Stucki G. Psychometric properties of the WHODASII in rehabilitation patients. Quality of Life Research. 2007;16(9):1521–1531. DOI: 10.1007/s11136-007-9259-4. [PubMed]
  • Prince MJ, de Rodriguez JL, Noriega L, Lopez A, Acosta D, Albanese E, Arizaga R, Copeland JR, Dewey M, Ferri CP, Guerra M, Huang Y, Jacob KS, Krishnamoorthy ES, McKeigue P, Sousa R, Stewart RJ, Salas A, Sosa AL, Uwakwe R, 10/66 Dementia Research Group. The 10/66 Dementia Research Group's fully operationalised DSM-IV dementia computerized diagnostic algorithm, compared with the 10/66 dementia algorithm and a clinician diagnosis: A population validation study. BMC Public Health. 2008a;8:219. DOI: 10.1186/1471-2458-8-219. [PMC free article] [PubMed]
  • Prince M, Acosta D, Albanese E, Arizaga R, Ferri CP, Guerra M, Huang Y, Jacob KS, Jimenez-Velazquez IZ, Rodriguez JL, Salas A, Sosa AL, Sousa R, Uwakwe R, van der Poel R, Williams J, Wortmann M. Ageing and dementia in low and middle income countries – using research to engage with public and policy makers. International Review of Psychiatry. 2008b;20(4):332–343. DOI: 10.1080/09540260802094712. [PMC free article] [PubMed]
  • R Development Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2007.
  • Rehm J, Ustun TB, Saxena S. On the development and psychometric testing of the WHO screening instrument to assess disablement in the general population. International Journal of Methods in Psychiatric Research. 1999;8:110–122.
  • Sijtsma K, Emons WH, Bouwmeester S, Nyklicek I, Roorda LD. Nonparametric IRT analysis of Quality-of-Life Scales and its application to the World Health Organization Quality-of-Life Scale (WHOQOL-Bref). Quality of Life Research. 2008;17(2):275–290. DOI: 10.1007/s11136-007-9281-6. [PMC free article] [PubMed]
  • Sijtsma K, Hemker BT. A taxonomy of IRT models for ordering persons and items using simple sum scores. Journal of Educational and Behavioral Statistics. 2000;25(4):391–415. DOI: 10.3102/10769986025004391.
  • SPSS Inc. Amos 5.0. SPSS Inc; 2003.
  • SPSS Inc. SPSS for Windows, Rel. 15.0. SPSS Inc; 2005.
  • Stata Corporation. Stata Statistical Software: Release 10.0. Stata Corporation; 2007.
  • Tucker L, Lewis C. A reliability coefficient for maximum likelihood factor analysis. Psychometrika. 1973;38:1–10.
  • United Nations. World Population Prospects: The 2006 Revision, Highlights. Department of Economic and Social Affairs, Population Division, United Nations; 2007.
  • Van der Ark LA, Croon MA, Sijtsma K. Possibilities and challenges in Mokken scale analysis using marginal models. In: Shigemasu K, Okada A, Imaizumi T, Hoshino T, editors. New Trends in Psychometrics. Universal Academic Press; 2007.
  • van Tubergen A, Landewe R, Heuft-Dorenbosch L, Spoorenberg A, Van der Heijde D, Van Der Tempel H, Van Der Linden S. Assessment of disability with the World Health Organisation Disability Assessment Schedule II in patients with ankylosing spondylitis. Annals of the Rheumatic Diseases. 2003;62(2):140–145. DOI: 10.1136/ard.62.2.140. [PMC free article] [PubMed]
  • Von Korff M, Crane PK, Alonso J, Vilagut G, Angermeyer MC, Bruffaerts R, de Girolamo G, Gureje O, De Graaf R, Huang Y, Iwata N, Karam EG, Kovess V, Lara C, Levinson D, Posada-Villa J, Scott KM, Ormel J. Modified WHODAS-II provides valid measure of global disability but filter items increased skewness. Journal of Clinical Epidemiology. 2008;61(11):1132–1143. DOI: 10.1016/j.jclinepi.2007.12.009. [PubMed]
  • World Health Organization (WHO). International Classification of Functioning, Disability and Health (ICF). WHO; 2001a.
  • World Health Organization (WHO). WHODAS II Disability Assessment Schedule. 2001b. Accessed 7 July 2008.
