Work engagement is a positive work-related state of fulfillment characterized by vigor, dedication, and absorption. Previous studies have operationalized the construct through development of the Utrecht Work Engagement Scale. Apart from the original three-factor 17-item version of the instrument (UWES-17), there exists a nine-item shortened revised version (UWES-9).
The current study explored the psychometric properties of the Chinese version of the Utrecht Work Engagement Scale in terms of factorial validity, scale reliability, descriptive statistics, and construct validity.
A cross-sectional questionnaire survey was conducted in 2009 among 992 workers from over 30 elderly service units in Hong Kong.
Confirmatory factor analyses revealed a better fit for the three-factor model of the UWES-9 than for either the UWES-17 or the one-factor model of the UWES-9. The three factors showed acceptable internal consistency and strong correlations with factors in the original versions. Engagement was negatively associated with perceived stress and burnout and positively associated with age and holistic care climate.
The UWES-9 demonstrates adequate psychometric properties, supporting its use in future research in the Chinese context.
Work engagement; Validity; Reliability; Chinese
Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales.
The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories.
The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that successive response categories for all items were not located in the expected order.
This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
quality of life; school children; Iran; Rasch model
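The category-ordering check described in this abstract can be illustrated with a minimal sketch of the rating scale model, in which every item shares one set of category thresholds. The function names and all numeric values below are hypothetical illustrations, not the study's software or estimates; five response categories imply four thresholds.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Return P(X = 0..m) for one item under the rating scale model.

    theta: person measure (logits); delta: item difficulty (logits);
    taus: the m category thresholds shared by all items.
    """
    # Category k has log-numerator sum_{j<=k} (theta - delta - tau_j);
    # category 0 has an empty sum, i.e. 0.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_ordered(taus):
    """True when each threshold is strictly higher than the one before it."""
    return all(lo < hi for lo, hi in zip(taus, taus[1:]))

# Hypothetical thresholds (not study estimates): the third falls below the second.
example_taus = [-1.2, -0.3, -0.5, 1.1]
probs = rsm_category_probs(theta=0.0, delta=-0.2, taus=example_taus)
```

When `thresholds_ordered` returns False, at least one category is never the most probable response at any trait level, which is the kind of disordering the abstract reports.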
Aims: To examine associations of leisure time physical activity and physical strenuousness of work with physical functioning 28 years later.
Methods: A cohort (n = 902) of metal industry employees was studied for exercise and housework activity in 1973 and 1978, and for BMI, current smoking, strenuousness of work, grip strength, and chronic diseases in 1973. Of the 670 survivors in 2000, 529 (79%) responded to all studied items in a follow up questionnaire including the SF-36 Physical Functioning (PF) scale. Belonging to the lower quartile of the PF scale denoted poor functioning.
Results: Vigorous exercise and housework activity were inversely associated with poor PF 28 years later in both white-collar and blue-collar workers. Engaging in activities of any intensity was similarly associated among the blue-collar workers. In a multiple logistic regression model including as independent variables age, sex, occupational class, the number of chronic diseases, vigorous leisure time physical activity, BMI, physical work strenuousness, and smoking (all measured at baseline), the risk of poor PF at follow up was decreased by vigorous leisure time physical activity and increased by high physical strenuousness of work, high BMI, and smoking. The effect of work strenuousness was mainly due to that among the blue-collar group. Allowing for baseline grip strength did not materially alter the results.
Conclusion: Vigorous leisure time physical activity decreased the risk of poor physical functioning as perceived considerably later in life, while high work strenuousness, smoking, and overweight increased it. Among blue-collar workers a beneficial association was observed with all leisure time activity, including that of lower intensity.
In clinical ophthalmology as in other fields, measuring patient-reported outcomes imposes a burden on patients. To decrease that burden, we used item-response theory (IRT) to develop and test a short version of the National Eye Institute's Visual Function Questionnaire (VFQ).
We analyzed VFQ data from 276 adults in Japan. Most of them had glaucoma, cataract, or macular degeneration. Their visual acuity (Snellen fraction) averaged 20/120 (range: 20/13 to 20/2000) for the better eye, and 20/200 (range: 20/13 to 20/2000) for the worse eye. We used a polytomous IRT model, the Generalized Partial Credit Model as implemented in software for parameter scaling of rating data (PARSCALE). To select items for inclusion in the short version we examined each item's location on the latent-trait continuum, its slope, and its frequency of missing data. We also ensured representation of all 7 domains that are important in Japan. To examine the characteristics of the resulting scale, we computed its test information (an index of precision that can vary with the value of the latent trait), and carried out validation testing.
From 32 of the original VFQ items, we selected 11. The scale comprising those 11 items (the VFQ-J11) had test information greater than 9 for values of the latent trait between −2.0 and +0.8. The item thresholds were well-targeted for patients with vision problems. Scores on the VFQ-J11 correlated strongly and in the expected direction with measures of visual field and corrected visual acuity. As expected for a valid measure, those scores also improved by a large amount (almost one standard deviation) after cataract surgery.
This 11-item instrument can provide reliable and valid data on visual functioning in patients with ophthalmic problems. It is expected to be less of a burden on respondents, while it maintains good psychometric properties.
To evaluate existing measures of health numeracy using Item Response Theory (IRT).
A cross-sectional study was conducted. Participants completed assessments of health numeracy measures including the Lipkus Expanded Health Numeracy Scale (Lipkus), and the Medical Data Interpretation Test (MDIT). The Lipkus and MDIT were scaled with IRT utilizing the 2-parameter logistic model.
Three-hundred and fifty-nine (359) participants were surveyed. Classical test theory parameters and IRT scaling parameters of the numeracy measures found most items to be at least moderately discriminating. Modified versions of the Lipkus and MDIT were scaled after eliminating items with low discrimination, high difficulty parameters, and poor model fit. The modified versions demonstrated a good range of discrimination and difficulty as indicated by the Test Information Functions.
An IRT analysis of the Lipkus and MDIT indicates that both health numeracy scales discriminate well across a range of ability.
Health numeracy skills are needed in order for patients to successfully participate in their medical care. The accurate assessment of health numeracy may help health care providers to tailor patient education interventions to the patient’s level of understanding and ability. Item response theory scaling methods can be used to evaluate the discrimination and difficulty of individual items as well as the overall assessment.
Item Response Theory; Numeracy; Health Literacy; Measurement
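The roles of discrimination and difficulty in the 2-parameter logistic model can be sketched as follows. The screening cutoffs (`min_a`, `max_b`) are hypothetical placeholders; the abstract does not state the thresholds the authors used, and model fit is not checked here.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response: a is discrimination, b is difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def flag_items(items, min_a=0.5, max_b=2.5):
    """Flag items by hypothetical cutoffs; items maps name -> (a, b)."""
    flags = {}
    for name, (a, b) in items.items():
        reasons = []
        if a < min_a:
            reasons.append("low discrimination")
        if b > max_b:
            reasons.append("high difficulty")
        if reasons:
            flags[name] = reasons
    return flags
```

A respondent whose ability equals an item's difficulty has a probability of exactly 0.5 of answering it correctly, whatever the discrimination.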
Engaged employees are an asset to any organization. They are instrumental in ensuring good commercial outcomes through continuous innovation and incremental improvement. A health care facility is similar to a regular work setting in many ways. A health care provider and a patient have roles akin to a team leader and a team member/stakeholder, respectively. Hence it can be argued that the concept of employee engagement can be applied to patients in health care settings in order to improve health outcomes.
Patient engagement data were collected using a survey instrument from a primary care clinic in the northern Indian state of Punjab. Canonical correlation equations were formulated to identify combinations which were strongly related to each other. In addition, the cause-effect relationship between patient engagement and patient-perceived health outcomes was described using structural equation modeling.
Canonical correlation analysis showed that the first set of canonical variables had a fairly strong relationship, i.e., a magnitude > 0.80 at the 95% confidence level, for five dimensions of patient engagement. Structural equation modeling analysis yielded a β ≥ 0.10 and a Student’s t statistic ≥ 2.96 for these five dimensions. The threshold Student’s t statistic was 1.99. Hence it was found that the β values were significant at the 95% confidence level for all census regions.
A scaled, reliable survey instrument was developed to measure patient engagement. Better patient engagement is associated with better patient-perceived health outcomes. This study provides preliminary evidence that patient engagement has a causal relationship with patient-perceived health outcomes.
patient engagement; health outcomes; communication; provider effectiveness; patient incentive
Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods.
item response theory; health outcomes; differential item functioning; computer adaptive testing
Occupational health service (OHS) for small-scale enterprises (SSEs) is still limited in many countries. Both Japan and the Netherlands have universal OHS systems for all employees. The objective of this survey was to examine the activities of occupational physicians (OPs) in the two countries for SSEs and to investigate their proposals for the improvement of service.
Questionnaires on types and sizes of the industries they serve, allocation of service hours (current and desired), sources of information for occupational health activities etc. were mailed in 2006 to 461 and 335 Japanese and Dutch OPs, respectively, who have served in small- and medium-scale enterprises. In practice, 107 Japanese (23%) and 106 Dutch physicians (32%) replied, respectively.
Results and Conclusions
Total service time per month was longer for OPs in the Netherlands than for OPs in Japan. Japanese OPs spent more hours on health and safety meetings, worksite rounds, and prevention of overwork-induced ill health (14–16% each). Dutch OPs spent many more hours on the guidance of absent workers (48%). Thus, service conditions were not the same for OPs in the two countries. Nevertheless, both groups of OPs unanimously considered that employers are the key persons for the improvement of OHS, especially in SSEs, and that their education is important for better OHS. The conclusions should be taken as preliminary, however, due to study limitations including low response rates in both groups of physicians.
Education; Employer; Occupational physician; Occupational health services; Small-scale enterprises
The 17-item Hamilton Rating Scale for Depression (HRSD17) and the Montgomery-Åsberg Depression Rating Scale (MADRS) are two widely used clinician-rated symptom scales. A 6-item version of the HRSD (HRSD6) was created by Bech to address the psychometric limitations of the HRSD17. The psychometric properties of these measures were compared using classical test theory (CTT) and item response theory (IRT) methods. IRT methods were used to equate total scores on any two scales. Data from two distinctly different outpatient studies of nonpsychotic major depression, a 12-month study of highly treatment-resistant patients (n=233) and an 8-week acute phase drug treatment trial (n=985), were used to check the robustness of results.
MADRS and HRSD6 items generally contributed more to the measurement of depression than HRSD17 items as shown by higher item-total correlations and higher IRT slope parameters. The MADRS and HRSD6 were unifactorial while the HRSD17 contained 2 factors. The MADRS showed about twice the precision in estimating depression as either the HRSD17 or HRSD6 for average severity of depression. An HRSD17 of 7 corresponded to an 8 or 9 on the MADRS and 4 on the HRSD6.
The MADRS would be superior to the HRSD17 in the conduct of clinical trials.
MADRS; HRSD; item response theory; classical test theory; psychometrics
Relatively recently in Japan, immature-type depression, frequently classified in the bipolar II spectrum, has increased among workers in their twenties to forties. This study explored whether affective temperaments moderate the relationship between work-related stressors and depressive symptoms among this age group. In July 2004, self-administered questionnaires were distributed to all employees of a Japanese company. Eight hundred seventy-four employees (63%) returned the questionnaires, with 728 completed. Questionnaires included the 12-item General Health Questionnaire for assessing depressive symptoms, the Temperament Evaluation of Memphis, Pisa, Paris, and San Diego-Autoquestionnaire version for assessing affective temperaments, the Effort-Reward Imbalance Questionnaire to assess work-related stressors and overcommitment, and questions regarding individual attributes and employment characteristics. Multivariate logistic regression analysis showed that affective temperaments moderated the relationship between work-related stressors and depressive symptoms. Effort (OR = 1.078), which represents job demands and/or obligations imposed on employees, and the upper tertile of overcommitment (OR = 1.589), which represents hyperadaptation to the workplace, were risk factors for depressive symptoms. Additionally, the results for cyclothymic (OR = 11.404) and anxious temperaments (OR = 1.589) suggested that depressive symptoms among this age group may be related to immature-type depression.
This study applied Item Response Theory (IRT) and Computer Adaptive Test (CAT) methodologies to develop a prototype function and disability assessment instrument for use in aging research. Herein, we report on the development of the CAT version of the Late-Life Function & Disability instrument (Late-Life FDI) and evaluate its psychometric properties.
We employed confirmatory factor analysis, IRT methods, validation, and computer simulation analyses of data collected from 671 older adults residing in residential care facilities. We compared accuracy, precision, and sensitivity to change of scores from CAT versions of two Late-Life FDI scales with scores from the fixed-form instrument. Score estimates from the prototype CAT versus the original instrument were compared in a sample of 40 older adults.
Distinct function and disability domains were identified within the Late-Life FDI item bank and used to construct two prototype CAT scales. Using retrospective data, scores from computer simulations of the prototype CAT scales were highly correlated with scores from the original instrument. The results of computer simulation, accuracy, precision, and sensitivity to change of the CATs closely approximated those of the fixed-form scales, especially for the 10- or 15-item CAT versions. In the prospective study each CAT was administered in less than 3 minutes and CAT scores were highly correlated with scores generated from the original instrument.
CAT scores of the Late-Life FDI were highly comparable to those obtained from the full-length instrument with a small loss in accuracy, precision, and sensitivity to change.
outcome assessment (Health Care); geriatrics; rehabilitation
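The logic of a fixed-length computer adaptive test can be sketched as below. This is only an illustration under simplifying assumptions: it uses a dichotomous 2PL model, maximum-information item selection, and a grid-based expected a posteriori (EAP) estimate with a standard normal prior. The abstract does not specify the Late-Life FDI CAT's model, selection rule, or estimator, and all names here are hypothetical.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_estimate(responses, grid):
    """Posterior mean of theta on a grid, standard normal prior.

    responses: list of ((a, b), x) with x equal to 0 or 1.
    """
    posterior = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)
        for (a, b), x in responses:
            p = p_endorse(theta, a, b)
            weight *= p if x == 1 else (1.0 - p)
        posterior.append(weight)
    total = sum(posterior)
    return sum(t * w for t, w in zip(grid, posterior)) / total

def run_cat(bank, answer, max_items=10):
    """Administer up to max_items, always picking the most informative item left.

    bank: dict name -> (a, b); answer: callable name -> 0 or 1.
    """
    grid = [g / 10.0 for g in range(-40, 41)]
    remaining = dict(bank)
    responses = []
    theta = 0.0  # start at the prior mean
    while remaining and len(responses) < max_items:
        name = max(remaining, key=lambda n: information(theta, *remaining[n]))
        params = remaining.pop(name)
        responses.append((params, answer(name)))
        theta = eap_estimate(responses, grid)
    return theta, len(responses)
```

Comparing `run_cat` scores with scores from the full item set on stored responses is the kind of retrospective simulation the abstract describes.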
OBJECTIVES—To explore the association between the prevalence of hypertension in a Japanese working population and job strain (a combination of low control over work and high psychological demands), and to estimate this association in different sociodemographic strata.
METHODS—From a multicentre community based cohort study of Japanese people, sex specific cross sectional analyses were performed on 3187 men and 3400 women under 65 years of age, all of whom were actively engaged in various occupations throughout Japan. The baseline period was 1992-4. The association between job characteristics—measured with a Japanese version of the Karasek demand-control questionnaire—and the prevalence of hypertension defined by blood pressure and from clinical diagnoses were examined. Adjustments were made for possible confounders. The analyses were repeated for stratified categories of occupational class, educational attainment, and age group.
RESULTS—In men, the level of job strain (the ratio of psychological job demands to job control) correlated with the prevalence of hypertension. In a multiple logistic regression model, job strain was significantly related to hypertension (odds ratio 1.18; 95% confidence interval 1.05 to 1.32), after adjustment for age, employment (white collar v blue collar), marital status, family history of hypertension, cigarette smoking, alcohol intake, physical activity, and body mass index. The stratified analyses showed significant excess risks in the subordinate groups compared with managers, blue collar workers, less educated workers, and the older age groups. This association was not significant in women. Multiple linear regression analyses, with systolic and diastolic blood pressures as dependent variables, did not show any significant association.
CONCLUSIONS—The findings provided limited proof that job strain is related to hypertension in Japanese working men. Older men in a lower social class may be more vulnerable to the hypertensive effects of job strain.
Keywords: hypertension; stress; psychological; work
The measurement of empathy is important in the assessment of physician competence and patient outcomes. The prevailing view is that female physicians have higher empathy scores compared with male physicians. In Japan, the number of female physicians has increased rapidly in the past ten years. In this study, we focused on female Japanese physicians and addressed factors that were associated with their empathic engagement in patient care.
The Jefferson Scale of Empathy (JSE) was translated into Japanese by using the back-translation procedure, and was administered to 285 female Japanese physicians. We designed this study to examine the psychometrics of the JSE and group differences among female Japanese physicians.
The item-total score correlations of the JSE were all positive and statistically significant, ranging from .20 to .54, with a median of .41. The Cronbach’s coefficient alpha was .81. Female physicians who were practicing in “people-oriented” specialties obtained a significantly higher mean empathy score than their counterparts in “procedure-” or “technology-oriented” specialties. In addition, physicians who reported living with their parents in an extended family or living close to their parents, scored higher on the JSE than those who were living alone or in a nuclear family.
Our results provide support for the measurement property and reliability of the JSE in a sample of female Japanese physicians. The observed group differences associated with specialties and living arrangement may have implications for sustaining empathy. In addition, recognizing these factors that reinforce physicians’ empathy may help physicians to avoid career burnout.
Empathy; Female physicians; Career development
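The two reliability statistics reported in this abstract can be computed as in the sketch below. Whether the authors' item-total correlations excluded each item from the total is not stated, so the `corrected` flag is an assumption added for illustration, as are the function names.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_total_correlations(rows, corrected=False):
    """Correlate each item with the total score.

    With corrected=True the item is removed from the total before correlating.
    """
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [r[i] for r in rows]
        total = [sum(r) - (r[i] if corrected else 0) for r in rows]
        result.append(pearson(item, total))
    return result
```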
Item Response Theory (IRT) is increasingly applied in health research to combine information from multiple-item responses. IRT posits that a person's susceptibility to a symptom is driven by the interaction of the characteristics of the symptom and person. This article describes the statistical background of incorporating IRT into a multilevel framework and extends this approach to longitudinal health outcomes, where the self-report method is used to construct a multi-item scale.
A secondary analysis of data from 2 descriptive longitudinal studies is performed. The data include 21 symptoms reported across time by 350 women with breast cancer. A 3-level hierarchical linear model (HLM) was used for the analysis. Level 1 models the item responses, consisting of symptom presence or absence. Level 2 models the trajectory of each individual, representing change over time of the IRT-created latent variable symptom experience. Level 3 explains that trajectory using person-specific characteristics such as age and location of care. The purpose of the analysis is to examine if older and younger women with breast cancer differ in their symptom experience trajectory after controlling for location of care.
Fatigue and pain were the most prevalent symptoms. The symptom experience of women with breast cancer was found to improve over time. Neither age nor location of care was significantly associated with the symptom experience trajectory.
Embedding IRT into an HLM framework produces several benefits. The example provided demonstrates benefits through the creation of a latent symptom experience variable that can be used either as an outcome or as a covariate in another model, examining the latent symptom experience trajectory and its relationship with covariates at the individual level, and managing symptom nonresponse.
cancer symptoms; hierarchical linear model; Item Response Theory; women with breast cancer
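One way to write the three-level structure described in this abstract is sketched below. The Rasch-type form at level 1, the linear time trend at level 2, and all symbol names are assumptions made for illustration; the abstract does not give the authors' exact specification. Here i indexes symptoms, t occasions, and j women.

```latex
\begin{align}
% Level 1: presence or absence of symptom i at occasion t for woman j
\log\frac{P(Y_{itj}=1)}{1-P(Y_{itj}=1)} &= \theta_{tj} - \delta_i \\
% Level 2: trajectory of the latent symptom experience within each woman
\theta_{tj} &= \pi_{0j} + \pi_{1j}\,\mathrm{time}_{tj} + e_{tj} \\
% Level 3: person-specific characteristics explain intercept and slope
\pi_{0j} &= \gamma_{00} + \gamma_{01}\,\mathrm{age}_j + \gamma_{02}\,\mathrm{site}_j + u_{0j} \\
\pi_{1j} &= \gamma_{10} + \gamma_{11}\,\mathrm{age}_j + \gamma_{12}\,\mathrm{site}_j + u_{1j}
\end{align}
```

In this notation, the question of whether older and younger women differ in trajectory after controlling for location of care corresponds to the coefficient \gamma_{11}.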
Recent work using classical test theory (CTT) and item response theory (IRT) has found that the self-report (QIDS-SR16) and clinician-rated (QIDS-C16) versions of the 16-item Quick Inventory of Depressive Symptomatology were generally comparable in outpatients with nonpsychotic major depressive disorder (MDD). This report extends this comparison to a less well-educated, more treatment-resistant sample that included more ethnic/racial minorities using IRT and selected classical test analyses.
The QIDS-SR16 and QIDS-C16 were obtained in a sample of 441 outpatients with nonpsychotic MDD seen in the public sector in the Texas Medication Algorithm Project (TMAP). The Samejima graded response IRT model was used to compare the QIDS-SR16 and QIDS-C16.
The nine symptom domains in the QIDS-SR16 and QIDS-C16 related well to overall depression. The slopes of the item response functions (a), which index the strength of relationship between overall depression and each symptom, were extremely similar with the two measures. Likewise, the CTT and IRT indices of symptom frequency (item means and locations of the item response functions, bi) were also similar with these two measures. For example, sad mood and difficulty with concentration/decision making were highly related to the overall depression severity with both the QIDS-C16 and QIDS-SR16. Likewise, sleeping difficulties were commonly reported, even though they were not as strongly related to overall magnitude of depression.
In this less educated, socially disadvantaged sample, differences between the QIDS-C16 and QIDS-SR16 were minor. The QIDS-SR16 is a satisfactory substitute for the more time-consuming QIDS-C16 in a broad range of adult, nonpsychotic, depressed outpatients.
Quick Inventory of Depressive Symptomatology; Inventory of Depressive Symptomatology; Item response theory; Samejima graded response model; depressive symptoms
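The slope and location parameters discussed in this abstract enter the Samejima graded response model as in the minimal sketch below. The function name and any values passed to it are illustrative, not the study's estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    a: slope (strength of relation to the latent trait);
    thresholds: ordered locations b_1 < ... < b_m of the boundary curves.
    """
    # Boundary curves P(X >= k) for k = 1..m, padded with 1 (k = 0) and 0 (k = m + 1).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent boundary curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

A larger slope makes the category probabilities change more sharply with the trait, while lower thresholds mean the symptom is endorsed at milder overall severity.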
The objective of the present study is to describe the item response theory (IRT) analysis of the National Institutes of Health (NIH) Patient Reported Outcomes Measurement Information System (PROMIS®) pediatric parent proxy-report item banks and the measurement properties of the new PROMIS® Parent Proxy Report Scales for ages 8–17 years.
Parent proxy-report items were written to parallel the pediatric self-report items. Test forms containing the items were completed by 1,548 parent–child pairs. CCFA and IRT analyses of scale dimensionality and item local dependence, and IRT analyses of differential item functioning were conducted.
Parent proxy-report item banks were developed and IRT parameters are provided. The recommended unidimensional short forms for the PROMIS® Parent Proxy Report Scales are item sets that are subsets of the pediatric self-report short forms, setting aside items for which parent responses exhibit local dependence. Parent proxy-report demonstrated moderate to low agreement with pediatric self-report.
The study provides initial calibrations of the PROMIS® parent proxy-report item banks and the creation of the PROMIS® Parent Proxy-Report Scales. It is anticipated that these new scales will have application for pediatric populations in which pediatric self-report is not feasible.
PROMIS®; Parent proxy report; Item response theory
To examine the self-reported level of work ability among female employees and the relationship between work ability and demographic characteristics, physical health, mental health, and various psychosocial and organizational work environment factors.
Participants were 597 female employees with an average age of 43 years from urban and rural areas in Norway. Trained personnel performed a structured interview to measure demographic variables, physical health, and characteristics of the working environment. Mental health was assessed using the 25-item version of the Hopkins Symptoms Checklist (HSCL-25). Work ability was assessed using a question from the Graded Reduced Work Ability Scale.
Of the 597 female employees, 8.9% reported an extremely or very reduced ability to work. Twenty-four percent reported poor physical health and 21.9% reported mental distress (≥ 1.55 HSCL-25 cut-off). Women who reported moderately reduced work ability and those who reported severely reduced work ability did not differ greatly. Moderately reduced work ability increased with age and was associated with physical and mental health. Severely reduced work ability was strongly associated only with physical health and with unskilled occupation. Of eight work environment variables, only three yielded significant associations with work ability, and these associations disappeared after adjustment in the multivariate analysis.
Results indicate that ageing, in addition to poor self-reported physical health and unskilled work, were the strongest factors associated with reduced work ability among female employees. Impact of work environment in general was visible only in univariate analysis.
Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of various subscales to discriminate among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT and a method to link scores to the original PANSS.
Baseline PANSS scores from 7,187 patients with Schizophrenia or Schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials were obtained. Option characteristic curves (OCCs) and Item Characteristic Curves (ICCs) were constructed to examine the probability of rating each of seven options within each of 30 PANSS items as a function of subscale severity, and summed-score linking was applied to items selected for the Mini-PANSS.
The majority of items forming the Positive and Negative subscales (i.e. 19 items) performed very well and discriminated better along symptom severity than the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven out of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed score linking and linear interpolation were able to produce a translation table for comparing total subscale scores of the Mini-PANSS to total subscale scores on the original PANSS. Results show scores on the subscales of the Mini-PANSS can be linked to scores on the original PANSS subscales, with very little bias.
The study demonstrated the utility of non-parametric IRT in examining the item properties of the PANSS and in selecting items for an abbreviated PANSS scale. The comparisons between the 30-item PANSS and the Mini-PANSS revealed that the shorter version is comparable to the 30-item PANSS, but when applying IRT, the Mini-PANSS is also a good indicator of illness severity.
There is growing interest in the use of item response theory (IRT) for creation of measures of health-related quality of life (HRQOL). A first step in IRT modeling is development of item banks. Our aim is to describe the value of including librarians, and to describe processes used by librarians, in the creation of such banks.
Working collaboratively with PROMIS researchers at the University of Pittsburgh, a team of librarians designed and implemented comprehensive literature searches in a selected set of information resources, for the purpose of identifying existing measures of patient-reported emotional distress.
A step-by-step search protocol developed by librarians produced a set of 525 key words and controlled vocabulary terms for use in search statements in 3 bibliographic databases. These searches produced 6169 literature citations, allowing investigators to add 444 measurement scales to their item banks.
Inclusion of librarians on the Pittsburgh PROMIS research team allowed investigators to create large initial item banks, increasing the likelihood that the banks would attain high measurement precision during subsequent psychometric analyses. In addition, a comprehensive literature search protocol was developed that can now serve as a guide for other investigators in the creation of IRT item banks.
databases as topic; outcome assessment(health care); librarians; interdisciplinary communication
The Quick Disability of the Arm, Shoulder, and Hand (QuickDASH) questionnaire is a region-specific, self-administered questionnaire, which consists of a disability/symptom (QuickDASH-DS) scale, and the same two optional modules, the work (DASH-W) and the sport/music (DASH-SM) modules, as the DASH. After the Japanese version of DASH (DASH-JSSH) was cross-culturally adapted and developed, we made the Japanese version of QuickDASH (QuickDASH-JSSH) by extracting 11 out of 30 items of the DASH-JSSH regarding disability/symptoms. The purpose of this study was to test the reliability, validity, and responsiveness of QuickDASH-JSSH.
A series of 72 patients with upper extremity disorders completed the QuickDASH-JSSH, the 36-Item Short-Form Health Survey (SF-36), and the Visual Analog Scale (VAS) for pain. Thirty-eight of the patients were reassessed for test–retest reliability 1 or 2 weeks later. Reliability was investigated by the reproducibility and internal consistency. To analyze the validity, a principal component analysis and the correlation coefficients between the QuickDASH-JSSH and the SF-36 were obtained. The responsiveness was examined by calculating the standardized response mean (SRM; mean change/SD) and effect size (mean change/SD of baseline value) after carpal tunnel release of the 17 patients with carpal tunnel syndrome.
Cronbach’s alpha coefficient in the QuickDASH-DS was 0.88. The intraclass correlation coefficient (ICC) for the same was 0.82. The unidimensionality of the QuickDASH-DS was confirmed. The correlation coefficients between the QuickDASH-DS and the DASH-DS, DASH-W, and DASH-SM were 0.92, 0.81, and 0.76, respectively. The correlation coefficients between the QuickDASH-DS score and the subscales of the SF-36 ranged from −0.29 to −0.73. The correlation coefficient between the QuickDASH-DS score and the VAS for pain was 0.52. The SRM/effect size of QuickDASH-DS was −0.54/−0.37, which indicated moderate sensitivity.
The Japanese version of QuickDASH has equivalent evaluation capacities to the original QuickDASH.
Large scale research projects in behaviour genetics and genetic epidemiology are often based on questionnaire or interview data. Typically, a number of items is presented to a number of subjects, the subjects’ sum scores on the items are computed, and the variance of sum scores is decomposed into a number of variance components. This paper discusses several disadvantages of the approach of analysing sum scores, such as the attenuation of correlations amongst sum scores due to their unreliability. It is shown that the framework of Item Response Theory (IRT) offers a solution to most of these problems. We argue that an IRT approach in combination with Markov chain Monte Carlo (MCMC) estimation provides a flexible and efficient framework for modelling behavioural phenotypes. Next, we use data simulation to illustrate the potentially huge bias in estimating variance components on the basis of sum scores. We then apply the IRT approach with an analysis of attention problems in young adult twins where the variance decomposition model is extended with an IRT measurement model. We show that when estimating an IRT measurement model and a variance decomposition model simultaneously, the estimate for the heritability of attention problems increases from 40% (based on sum scores) to 73%.
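The attenuation of correlations mentioned above can be illustrated with the classical test theory relation between true and observed correlations (Spearman's attenuation formula). This is only a sketch of that textbook relation, not the IRT/MCMC approach the paper proposes; the function names and the reliability values are assumptions chosen for illustration.

```python
import math


def attenuated_correlation(true_corr, reliability_x, reliability_y):
    """Observed correlation between two unreliable sum scores."""
    return true_corr * math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(observed_corr, reliability_x, reliability_y):
    """Spearman's correction: estimate of the correlation between true scores."""
    return observed_corr / math.sqrt(reliability_x * reliability_y)


# Illustrative values: a true correlation of 0.6 measured with
# sum scores whose reliabilities are 0.8 and 0.5.
observed = attenuated_correlation(0.6, 0.8, 0.5)
recovered = disattenuated_correlation(observed, 0.8, 0.5)
print(f"observed = {observed:.3f}, corrected = {recovered:.3f}")
```

The lower the reliability of either sum score, the further the observed correlation shrinks towards zero.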
Item response theory; MCMC; Bayesian statistics; Measurement; Attention problems; Sum scores
Although the Autobiographical Memory Test (AMT) is widely used, its psychometric properties have rarely been investigated. This paper utilises data gathered from a 10-item written version of the AMT, completed by 5792 adolescents participating in the Avon Longitudinal Study of Parents and Children, to examine the psychometric properties of the measure. The results show that the scale derived from responses to the AMT operates well over a wide range of scores, consistent with the aim of deriving a continuous measure of over-general memory. There was strong evidence of group differences in terms of gender, low negative mood, and IQ, and these were in agreement when comparing an item response theory (IRT) approach with that based on a sum score. One advantage of the IRT model is the ability to assess and consequently allow for differential item functioning. This additional analysis showed evidence of response bias for both gender and mood, resulting in attenuation in the mean differences in AMT across these groups. Implications of the findings for the use of the AMT measure in different samples are discussed.
Avon Longitudinal Study of Parents and Children; ALSPAC; Autobiographical Memory Test; AMT; Graded response model; Differential item functioning; Mood congruence
This study used Item Response Theory (IRT) to examine the psychometric properties of a health-related quality-of-life measure, the WHOQOL-BREF.
This investigation was a descriptive-analytic study. Subjects were 370 undergraduate students of nursing and midwifery who were selected from Tabriz University of Medical Sciences. All participants were asked to complete the Farsi version of the WHOQOL-BREF. Samejima's graded response model was used for the analyses.
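As a rough illustration of how a graded response model assigns probabilities to ordered response categories, the sketch below uses the standard logistic form: the probability of responding in category k or higher is a logistic function of a(θ − b_k), and each category probability is the difference between adjacent cumulative probabilities. The function name and the parameter values are hypothetical; they are not estimates from this study.

```python
import math


def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in ascending order; an item with m thresholds
    has m + 1 ordered response categories.
    """
    cumulative = [1.0]  # P(response >= lowest category) is always 1
    for threshold in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - threshold))))
    cumulative.append(0.0)  # nothing lies above the highest category
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical five-category item (four thresholds), respondent at theta = 0.
probabilities = grm_category_probabilities(
    theta=0.0, discrimination=1.2, thresholds=[-2.0, -1.0, 1.0, 2.0]
)
print([round(p, 3) for p in probabilities])
```

A larger discrimination parameter makes the category probabilities change more sharply with the trait level, which is why low to moderate discrimination limits how much information an item provides.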
The results revealed that the discrimination parameters for all items in the four scales were low to moderate. The threshold parameters showed adequate representation of the relevant traits from low to the mean trait level. With the exception of items 15, 18, 24 and 26, all other items showed low item information function values, and thus relatively high reliability from low trait levels to moderate levels.
The results of this study indicate that although there was general support for the psychometric properties of the WHOQOL-BREF from an IRT perspective, this measure can be further improved. IRT analyses provided useful measurement information and proved to be a better methodological approach for enhancing our knowledge of the functionality of the WHOQOL-BREF.
Item response theory; Psychometrics; Quality of life
Health numeracy is an important factor in how well people make decisions based on medical risk information. However, in many countries, including Japan, numeracy studies have been limited.
To fill this gap, we evaluated health numeracy levels in a sample of Japanese adults by translating two well-known scales that objectively measure basic understanding of math and probability: the 3-item numeracy scale developed by Schwartz and colleagues (the Schwartz scale) and its expanded version, the 11-item numeracy scale developed by Lipkus and colleagues (the Lipkus scale).
Participants’ performance (n = 300) on the scales was much higher than in the original studies conducted in the United States (80% average item-wise correct response rate for Schwartz-J, and 87% for Lipkus-J). This high performance resulted in a ceiling effect on the distributions of both scores, which made it difficult to apply parametric statistical analysis and limited the interpretation of statistical results. Nevertheless, the data provided some evidence for the reliability and validity of these scales. The reliability of the Japanese versions (Schwartz-J and Lipkus-J) was comparable to the original in terms of internal consistency (Cronbach’s α = 0.53 for Schwartz-J and 0.72 for Lipkus-J). Convergent validity was suggested by positive correlations with an existing Japanese health literacy measure (the Test for Ability to Interpret Medical Information developed by Takahashi and colleagues) that contains some items relevant to numeracy. Furthermore, as shown in previous studies, health numeracy was still associated with framing bias: individuals whose Lipkus-J performance was below the median were significantly influenced by how probability was framed when they rated surgical risks. A significant association was also found using Schwartz-J, which consisted of only three items.
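For readers unfamiliar with the internal consistency index reported above, Cronbach's α can be computed from a respondents-by-items score matrix as k/(k − 1) × (1 − sum of item variances / variance of total scores). The sketch below is a minimal illustration of that standard formula; the function name and the small matrix of correct/incorrect answers are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers to a 3-item scale (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Because α depends on the number of items as well as on their intercorrelations, a short scale such as a 3-item measure will tend to yield a lower value than a longer scale built from similar items.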
Despite relatively high levels of health numeracy according to these scales, numeracy measures are still important determinants underlying susceptibility to framing bias. This suggests that it is important in Japan to identify individuals with low numeracy skills so that risk information can be presented in a way that enables them to correctly understand it. Further investigation is required on effective numeracy measures for such an intervention in Japan.
Risk communications; Patient empowerment; Patient education; Risk perception; Decision making
► It is important to maximise the precision of personality measurement in adolescents. ► We apply item response theory (IRT) to the NEO-FFI in an adolescent sample. ► IRT was used to assess item validity and highlight poorly performing indicators. ► Removing poor items reduced measurement error without compromising validity. ► IRT analysis can be used to develop personality measures ensuring item validity.
The present study applied item response theory (IRT) to the NEO Five-Factor Inventory (NEO-FFI) completed by a community-based sample of adolescents. The results revealed that many of these personality items may not be discriminating well, with some traits demonstrating greater reliability than others. Furthermore, the threshold values highlighted that the majority of the items had skewed responses, suggesting a limited utility of some response categories. Generally, removing poorly discriminating items did not harm external validity, suggesting that IRT reduces measurement error and increases reliability without compromising validity.
Personality; Adolescents; Item response theory; External validity