1.  Psychometric Properties of Self-Report Concussion Scales and Checklists 
Journal of Athletic Training  2012;47(2):221-223.
Alla S, Sullivan SJ, Hale L, McCrory P. Self-report scales/checklists for the measurement of concussion symptoms: a systematic review. Br J Sports Med. 2009;43 (suppl 1):i3–i12.
Clinical Question:
Which self-report symptom scales or checklists are psychometrically sound for clinical use to assess sport-related concussion?
Data Sources:
Articles available in full text, published from the establishment of each database through December 2008, were identified from PubMed, Medline, CINAHL, Scopus, Web of Science, SPORTDiscus, PsycINFO, and AMED. Search terms included brain concussion, signs or symptoms, and athletic injuries, combined with the AND Boolean operator, and searches were limited to studies published in English. The authors also hand searched the reference lists of retrieved articles. Additional searches of books, conference proceedings, theses, and the Web sites of commercial scales were conducted, when needed, to obtain further information about the development and psychometric properties of scales identified in articles meeting the inclusion criteria.
Study Selection:
Articles were included if they identified all the items on the scale and the article was either an original research report describing the use of scales in the evaluation of concussion symptoms or a review article that discussed the use or development of concussion symptom scales. Only articles published in English and available in full text were included.
Data Extraction:
From each study, the following information was extracted by the primary author using a standardized protocol: study design, publication year, participant characteristics, reliability of the scale, and details of the scale or checklist, including name, number of items, time of measurement, format, mode of report, data analysis, scoring, and psychometric properties. The quality of the included studies was assessed using 16 items from the Downs and Black checklist, covering reporting, internal validity, and external validity.
Main Results:
The initial database search identified 421 articles. After 131 duplicate articles were removed, 290 articles remained and were added to 17 articles found during the hand search, for a total of 307 articles; of those, 295 were available in full text. Sixty articles met the inclusion criteria and were used in the systematic review. The quality of the included studies ranged from 9 to 15 points out of a maximum quality score of 17. The included articles were published between 1995 and 2008 and included a collective total of 5864 concussed athletes and 5032 nonconcussed controls, most of whom participated in American football. The majority of the studies were descriptive studies monitoring the resolution of concussive self-report symptoms compared with either a preseason baseline or healthy control group, with a smaller number of studies (n = 8) investigating the development of a scale.
The authors initially identified 20 scales that were used among the 60 included articles. Further review revealed that 14 scales were variations of the Pittsburgh Steelers postconcussion scale (the Post-Concussion Scale, Post-Concussion Scale: Revised, Post-Concussion Scale: ImPACT, Post-Concussion Symptom Scale: Vienna, Graded Symptom Checklist [GSC], Head Injury Scale, McGill ACE Post-Concussion Symptoms Scale, and CogState Sport Symptom Checklist), leaving 6 core scales, which the authors discussed further. The 6 core scales were the Pittsburgh Steelers Post-Concussion Scale (17 items), Post-Concussion Symptom Assessment Questionnaire (10 items), Concussion Resolution Index postconcussion questionnaire (15 items), Signs and Symptoms Checklist (34 items), Sport Concussion Assessment Tool (SCAT) postconcussion symptom scale (25 items), and Concussion Symptom Inventory (12 items). Each of the 6 core scales includes symptoms associated with sport-related concussion; however, the number of items on each scale varied. A 7-point Likert scale was used on most scales, with a smaller number using a dichotomous (yes/no) classification.
Only 7 of the 20 scales had published psychometric properties, and only 1 scale, the Concussion Symptom Inventory, was empirically driven (Rasch analysis), with development of the scale occurring before its clinical use. Internal consistency (Cronbach α) was reported for the Post-Concussion Scale (.87), Post-Concussion Scale: ImPACT 22-item (.88–.94), Head Injury Scale 9-item (.78), and Head Injury Scale 16-item (.84). Test-retest reliability has been reported only for the Post-Concussion Scale (Spearman r = .55) and the Post-Concussion Scale: ImPACT 21-item (Pearson r = .65). With respect to validity, the SCAT postconcussion scale has demonstrated face and content validity, the Post-Concussion Scale: ImPACT 22-item and Head Injury Scale 9-item have reported construct validity, and the Head Injury Scale 9-item and 16-item have published factorial validity.
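The internal consistency values quoted above are Cronbach α coefficients. As a rough illustration (not code from the review; the item scores below are invented), α can be computed from raw item responses as the proportion of total-score variance attributable to item covariation:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item-score columns (one list per item,
    one entry per respondent)."""
    k = len(items)
    n = len(items[0])

    def pvar(xs):
        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(pvar(col) for col in items) / pvar(totals))
```

Two perfectly parallel items give α = 1.0; values such as the Post-Concussion Scale's .87 indicate strong but imperfect covariation among items.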
Sensitivity and specificity have been reported only with the GSC (0.89 and 1.0, respectively) and the Post-Concussion Scale: ImPACT 21-item when combined with the neurocognitive component of ImPACT (0.819 and 0.849, respectively). Meaningful change scores were reported for the Post-Concussion Scale (14.8 points), Post-Concussion Scale: ImPACT 22-item (6.8 points), and Post-Concussion Scale: ImPACT 21-item (standard error of the difference = 7.17; 80% confidence interval = 9.18).
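For readers less familiar with these metrics, a minimal sketch of how sensitivity, specificity, and an 80%-confidence meaningful-change flag are computed. The confusion-matrix counts below are invented; only the 7.17-point standard error of the difference is taken from the text above:

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def meaningful_change(score_change, sed, z=1.28):
    """Flag a symptom-score change larger than z * standard error of the
    difference; z = 1.28 corresponds to an 80% confidence interval."""
    return abs(score_change) > z * sed
```

With the 21-item ImPACT scale's standard error of the difference of 7.17, changes beyond roughly 9.18 points (1.28 × 7.17) would be flagged as meaningful, matching the confidence interval quoted above.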
Numerous scales exist for measuring the number and severity of concussion-related symptoms, with most evolving from the neuropsychology literature pertaining to head-injured populations. However, very few of these were created in a systematic manner that follows scale development processes and have published psychometric properties. Clinicians need to understand these limitations when choosing and using a symptom scale for inclusion in a concussion assessment battery. Future authors should assess the underlying constructs and measurement properties of currently available scales and use the ever-increasing prospective data pools of concussed athlete information to develop scales following appropriate, systematic processes.
PMCID: PMC3418135  PMID: 22488289
mild traumatic brain injuries; evaluation; reliability; validity; sensitivity; specificity
2.  Limitations of True Score Variance to Measure Discriminating Power: Psychometric Simulation Study 
Journal of Abnormal Psychology  2010;119(2):300-306.
Demonstrating a specific cognitive deficit usually involves comparing patients’ performance on two or more tests. The psychometric confound occurs if the psychometric properties of these tests lead patients to show greater cognitive deficits in one domain. One way to avoid the psychometric confound is to use tests with a similar level of discriminating power, which in classic psychometric theory is a test’s ability to index true individual differences. One suggested way to measure discriminating power is to calculate true score variance (Chapman & Chapman, 1978). Despite the centrality of these formulations, there has been no systematic examination of the relationship between the observable property of true score variance and the latent property of discriminating power. We simulated administrations of free-response and forced-choice tests by creating different replicable ability scores for two groups, across a wide range of psychometric properties (i.e., difficulty, reliability, observed variance, and number of items), and computing an ideal index of discriminating power. Simulation results indicated that true score variance had only a limited ability to predict discriminating power (explaining about 10% of the variance in replicable ability scores). Furthermore, this predictive ability varied across tests spanning wide ranges of psychometric variables, such as difficulty, observed variance, reliability, and number of items. Discriminating power depends upon a complicated interaction of psychometric properties that is not well estimated solely by a test’s true score variance.
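The classic decomposition behind this study can be sketched in a few lines: an observed score is replicable ability plus measurement error, reliability is the share of observed variance that is true variance, and Chapman and Chapman's index is reliability times observed variance. A toy simulation, with all parameters invented for illustration:

```python
import random

def variance(xs):
    """Sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_observed(n, true_sd, error_sd, seed=0):
    """Observed score = replicable ability + measurement error."""
    rng = random.Random(seed)
    true = [rng.gauss(0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0, error_sd) for t in true]
    return true, observed

true, observed = simulate_observed(20000, true_sd=1.0, error_sd=1.0)
reliability = variance(true) / variance(observed)      # about 0.5 here
true_score_variance = reliability * variance(observed)  # Chapman & Chapman's index
```

When the error variance equals the true variance, as in this toy run, reliability lands near 0.5; the study's point is that this index alone does not pin down discriminating power.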
PMCID: PMC2869469  PMID: 20455603
3.  Developing a Short Form of Benton’s Judgment of Line Orientation Test: An Item Response Theory Approach 
The Clinical Neuropsychologist  2011;25(4):670-684.
The Judgment of Line Orientation (JLO) test was developed to be, in Arthur Benton’s words, “as pure a measure of one aspect of spatial thinking, as could be conceived.” The JLO test has been widely used in neuropsychological practice for decades. The test has high test-retest reliability (Franzen, 2000), as well as good neuropsychological construct validity as shown through neuroanatomical localization studies (Tranel, Vianna, Manzel, Damasio, & Grabowski, 2009). Despite its popularity and strong psychometric properties, the full-length version of the test (30 items) has been criticized as being unnecessarily long (Strauss, Sherman, & Spreen, 2006). There have been many attempts at developing short forms; however, these forms have been limited in their ability to estimate scores accurately. Taking advantage of a large sample of JLO performances from 524 neurological patients with focal brain lesions, we used techniques from Item Response Theory (IRT) to estimate each item’s difficulty and power to discriminate among various levels of ability. A random item IRT model was used to estimate the influence of item stimulus properties as predictors of item difficulty. These results were used to optimize the selection of items for a shorter method of administration that maintained comparability with the full form while using significantly fewer items. The effectiveness of this method was replicated in a second sample of 82 healthy elderly participants. The findings should help broaden the clinical utility of the JLO and enhance its diagnostic applications.
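IRT characterizes each item by a difficulty and a discrimination parameter. A minimal two-parameter logistic (2PL) item characteristic curve, one common choice (the study's random-item model is more elaborate, and the parameter values here are invented):

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL IRT model: probability that a person of ability `theta`
    passes an item with discrimination `a` and difficulty `b`.
    At theta == b the probability is exactly 0.5, which is how item
    difficulty is located on the ability scale."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

Higher `a` makes the curve steeper around `b`, so the item separates adjacent ability levels more sharply; that is the "power to discriminate" estimated per item above.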
PMCID: PMC3094715  PMID: 21469016
4.  Measurement precision of the disability for back pain scale-by applying Rasch analysis 
The Oswestry Disability Index (ODI) is widely used for patients with back pain. However, few studies have examined its psychometric properties using modern measurement theory. The purpose of this study was to investigate the psychometric properties of the ODI in patients with back pain using Rasch analysis.
A total of 408 patients with back pain participated in this cross-sectional study. Patients were recruited from the orthopedic, neurosurgery, rehabilitation departments and pain clinic of two hospitals. Rasch analysis was used to examine the Chinese version of ODI 2.1 for unidimensionality, item difficulty, category function, differential item functioning, and test information.
The fit statistics showed that the 10 items of the ODI fit the model’s expectations for a unidimensional scale. The ODI measured different levels of functional limitation without skewing toward the lower or higher levels of disability. No significant ceiling or floor effects, or gaps among the items, were found. Reliability was high, and the test information curve demonstrated precise estimation of dysfunction.
Our results showed that the ODI is a unidimensional questionnaire with high reliability. The ODI can precisely estimate the level of dysfunction, and the item difficulty of the ODI matches the person ability. For clinical application, using logits scores could precisely represent the disability level, and using the item difficulty could help clinicians design progressive programs for patients with back pain.
PMCID: PMC3717282  PMID: 23866814
Back pain; Rasch analysis; Oswestry disability index; Functional measure; Disability
5.  Item response theory analysis of cognitive tests in people with dementia: a systematic review 
BMC Psychiatry  2014;14:47.
Performance on psychometric tests is key to diagnosing dementia and monitoring its treatment. Results are often reported as a total score, but there is additional information in the individual items of tests, which vary in their difficulty and discriminatory value. Item difficulty refers to the ability level at which the probability of responding correctly is 50%. Discrimination is an index of how well an item can differentiate between patients of varying levels of severity. Item response theory (IRT) analysis can use this information to examine and refine measures of cognitive functioning. This systematic review aimed to identify all published literature which had applied IRT to instruments assessing global cognitive function in people with dementia.
A systematic review was carried out across Medline, Embase, PsycINFO, and CINAHL. Search terms relating to IRT and dementia were combined to find all IRT analyses of global functioning scales in dementia.
Of the 384 articles identified, four studies met the inclusion criteria, comprising a total of 2,920 people with dementia from six centers in two countries. These studies used three cognitive tests (MMSE, ADAS-Cog, BIMCT) and three IRT methods (item characteristic curve analysis, Samejima’s graded response model, and the 2-parameter model). Memory items were the most difficult. Naming the date in the MMSE and the memory items, specifically word recall, of the ADAS-Cog were the most discriminatory.
Four published studies were identified which used IRT on global cognitive tests in people with dementia. This technique increased the interpretative power of the cognitive scales, and could be used to provide clinicians with key items from a larger test battery which would have high predictive value. There is need for further studies using IRT in a wider range of tests involving people with dementia of different etiology and severity.
PMCID: PMC3931670  PMID: 24552237
Item response theory; Dementia; Psychometrics; Cognition; Alzheimer disease; MMSE; Systematic review
6.  Better assessment of physical function: item improvement is neglected but essential 
Arthritis Research & Therapy  2009;11(6):R191.
Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process for developing the Patient Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank.
The process was stepwise: we searched extensively to identify extant Physical Function items and then classified and selectively reduced the item pool. We evaluated retained items for content, clarity, relevance and comprehension, reading level, and translation ease through expert review, patient surveys, focus groups, and cognitive interviews. We then assessed items using classic test theory and IRT, applied confirmatory factor analyses, and estimated item parameters with graded response modeling. For comparison, we retained the 20 items of the Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) and the 10 items of the SF-36 Physical Function scale (PF-10). Subjects were from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and a national Internet sample of 21,133 subjects.
We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. Results showed that the clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were more relevant and important versus complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than did comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality with one-, two-, three-, and four-factor models having comparable model fits. Correlations between factors in the test data sets were > 0.90.
Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases in diverse clinical groups, including those at the extremes of physical functioning, and in different administration modes.
PMCID: PMC3003539  PMID: 20015354
7.  Assessment of health-related quality of life in arthritis: conceptualization and development of five item banks using item response theory 
Modern psychometric methods based on item response theory (IRT) can be used to develop adaptive measures of health-related quality of life (HRQL). Adaptive assessment requires an item bank for each domain of HRQL. The purpose of this study was to develop item banks for five domains of HRQL relevant to arthritis.
About 1,400 items were drawn from published questionnaires or developed from focus groups and individual interviews and classified into 19 domains of HRQL. We selected the following 5 domains relevant to arthritis and related conditions: Daily Activities, Walking, Handling Objects, Pain or Discomfort, and Feelings. Based on conceptual criteria and pilot testing, 219 items were selected for further testing. A questionnaire was mailed to patients from two hospital-based clinics and a stratified random community sample. Dimensionality of the domains was assessed through factor analysis. Items were analyzed with the Generalized Partial Credit Model as implemented in Parscale. We used graphical methods and a chi-square test to assess item fit. Differential item functioning was investigated using logistic regression.
Data were obtained from 888 individuals with arthritis. The five domains were sufficiently unidimensional for an IRT-based analysis. Thirty-one items were deleted due to lack of fit or differential item functioning. Daily Activities had the narrowest range for the item location parameter (-2.24 to 0.55) and Handling Objects had the widest range (-1.70 to 2.27). The mean (median) slope parameter for the items ranged from 1.15 (1.07) in Feelings to 1.73 (1.75) in Walking. The final item banks are comprised of 31–45 items each.
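The Generalized Partial Credit Model used for calibration gives each polytomous item a slope and a set of step (location) parameters, with category probabilities built from cumulative logits. An illustrative implementation; the slope and step values in the test are invented, not taken from the study:

```python
import math

def gpcm_probs(theta, slope, steps):
    """Category probabilities under the Generalized Partial Credit
    Model for a respondent at ability `theta`. `steps` holds one step
    parameter per category boundary; category 0 contributes no step
    term, so its cumulative logit is 0."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + slope * (theta - b))
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Probabilities always sum to 1, and higher ability shifts mass toward the top category; steeper slopes (like the 1.73 reported for Walking) concentrate that shift over a narrower ability range.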
We have developed IRT-based item banks to measure HRQL in 5 domains relevant to arthritis. The items in the final item banks provide adequate psychometric information for a wide range of functional levels in each domain.
PMCID: PMC1550394  PMID: 16749932
8.  Bifactor and Item Response Theory Analyses of Interviewer Report Scales of Cognitive Impairment in Schizophrenia 
Psychological assessment  2011;23(1):245-261.
We conducted psychometric analyses of two interview-based measures of cognitive deficits: the 21-item Clinical Global Impression of Cognition in Schizophrenia (CGI-CogS; Ventura et al., 2008), and the 20-item Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006), which were administered on two occasions to a sample of people with schizophrenia. Traditional psychometrics, bifactor analysis, and item response theory (IRT) methods were used to explore item functioning and dimensionality and to compare the instruments. Despite containing similar item content, responses to the CGI-CogS demonstrated superior psychometric properties (e.g., higher item intercorrelations, better spread of ratings across response categories) relative to the SCoRS. We argue that these differences arise mainly from the differential use of prompts and how the items are phrased and scored. Bifactor analysis demonstrated that although both measures capture a broad range of cognitive functioning (e.g., working memory, social cognition), the common variance on each is overwhelmingly explained by a single general factor. IRT analyses of the combined pool of 41 items showed that measurement precision is peaked in the mild to moderate range of cognitive impairment. Finally, simulated adaptive testing revealed that only about 10 to 12 items are necessary to achieve latent trait level estimates with reasonably small standard errors for most individuals. This suggests that these interview-based measures of cognitive deficits could be shortened without loss of measurement precision.
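Simulated adaptive testing of the kind reported typically administers, at each step, the remaining item with maximum Fisher information at the current ability estimate; that is why a handful of well-targeted items can stand in for a 41-item pool. A minimal sketch under a 2PL model, with invented item parameters:

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def next_item(theta_hat, items, administered):
    """Return the index of the unadministered (a, b) item with maximum
    Fisher information a^2 * p * (1 - p) at the ability estimate."""
    best, best_info = None, -1.0
    for idx, (a, b) in enumerate(items):
        if idx in administered:
            continue
        p = p2pl(theta_hat, a, b)
        info = a * a * p * (1.0 - p)
        if info > best_info:
            best, best_info = idx, info
    return best
```

In a full CAT loop, the ability estimate is updated after each response and the selection repeats until the standard error falls below a target.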
PMCID: PMC3183749  PMID: 21381848
item response theory; CGI-CogS; SCoRS; schizophrenia and cognitive deficits; computerized adaptive testing
9.  The AMC Linear Disability Score project in a population requiring residential care: psychometric properties 
Currently there is considerable interest in the flexible framework offered by item banks for measuring patient-relevant outcomes, including functional status. However, few item banks have been developed to quantify functional status as expressed by the ability to perform activities of daily life.
This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items.
Following the analysis, 79 items remained in the item bank. The other 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in the response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at the item level (26 items).
It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status.
PMCID: PMC514531  PMID: 15291958
10.  The five item Barthel index 
OBJECTIVES—Routine data collection is now considered mandatory. Therefore, staff-rated clinical scales that consist of multiple items should have the minimum number of items necessary for rigorous measurement. This study explores the possibility of developing a short form Barthel index, suitable for use in clinical trials, epidemiological studies, and audit, that satisfies criteria for rigorous measurement and is psychometrically equivalent to the 10 item instrument.
METHODS—Data were analysed from 844 consecutive admissions to a neurological rehabilitation unit in London. Random half samples were generated. Short forms were developed in one sample (n=419), by selecting items with the best measurement properties, and tested in the other (n=418). For each of the 10 items of the BI, item total correlations and effect sizes were computed and rank ordered. The best items were defined as those with the lowest cross product of these rank orderings. The acceptability, reliability, validity, and responsiveness of three short form BIs (five, four, and three item) were determined and compared with the 10 item BI. Agreement between scores generated by short forms and 10 item BI was determined using intraclass correlation coefficients and the method of Bland and Altman.
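The selection rule described can be sketched as follows: rank the items by item-total correlation and by effect size, then prefer items whose two ranks have the lowest cross-product. The values below are invented; a Pearson correlation helper is included because item-total correlations are ordinary correlations of item score with total score:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def best_items(item_total_rs, effect_sizes):
    """Order item indices by the cross-product of their rank orderings
    (rank 1 = highest value); lowest cross-product first, i.e. the
    items with the best combined measurement properties lead."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: -vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    r1, r2 = ranks(item_total_rs), ranks(effect_sizes)
    return sorted(range(len(r1)), key=lambda i: r1[i] * r2[i])
```

Taking the first five indices from this ordering mirrors how the five-item short form would be assembled in the development half-sample.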
RESULTS—The five best items in this sample were transfers, bathing, toilet use, stairs, and mobility. Of the three short forms examined, the five item BI had the best measurement properties and was psychometrically equivalent to the 10 item BI. Agreement between scores generated by the two measures for individual patients was excellent (ICC=0.90) but not identical (limits of agreement=1.84±3.84).
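The agreement statistics can be illustrated briefly. The sketch below returns the mean difference between two measures and 95% limits of agreement (mean ± 1.96 SD of the paired differences, per Bland and Altman; the study itself quotes limits in a mean ± SD form). The paired scores are invented:

```python
def bland_altman(scores_a, scores_b):
    """Mean difference and 95% limits of agreement between two
    measures taken on the same patients."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = (sum((d - mean) ** 2 for d in diffs) / (n - 1)) ** 0.5
    return mean, (mean - 1.96 * sd, mean + 1.96 * sd)
```

A high ICC with a nonzero mean difference, as reported here, means the two forms rank patients almost identically while their scores are not interchangeable point for point.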
CONCLUSIONS—The five item short form BI may be a suitable outcome measure for group comparison studies in comparable samples. Further evaluations are needed. Results demonstrate a fundamental difference between assessment and measurement and the importance of incorporating psychometric methods in the development and evaluation of health measures.

PMCID: PMC1737527  PMID: 11459898
11.  The grounded psychometric development and initial validation of the Health Literacy Questionnaire (HLQ) 
BMC Public Health  2013;13:658.
Health literacy has become an increasingly important concept in public health. We sought to develop a comprehensive measure of health literacy capable of diagnosing health literacy needs across individuals and organisations by utilizing perspectives from the general population, patients, practitioners and policymakers.
Using a validity-driven approach we undertook grounded consultations (workshops and interviews) to identify broad conceptually distinct domains. Questionnaire items were developed directly from the consultation data following a strict process aiming to capture the full range of experiences of people currently engaged in healthcare through to people in the general population. Psychometric analyses included confirmatory factor analysis (CFA) and item response theory. Cognitive interviews were used to ensure questions were understood as intended. Items were initially tested in a calibration sample from community health, home care and hospital settings (N=634) and then in a replication sample (N=405) comprising recent emergency department attendees.
Initially, 91 items were generated across 6 scales with agree/disagree response options and 5 scales with difficulty-in-undertaking-tasks response options. Cognitive testing revealed that most items were well understood and only some minor re-wording was required. Psychometric testing of the calibration sample identified 34 poorly performing or conceptually redundant items, which were removed, resulting in 10 scales. These were then tested in a replication sample and refined to yield 9 final scales comprising 44 items. A 9-factor CFA model was fitted to these items with no cross-loadings or correlated residuals allowed. Given the very restricted nature of the model, the fit was quite satisfactory: χ² (WLSMV, 866 d.f.) = 2927, p < 0.001, CFI = 0.936, TLI = 0.930, RMSEA = 0.076, and WRMR = 1.698. Final scales included: Feeling understood and supported by healthcare providers; Having sufficient information to manage my health; Actively managing my health; Social support for health; Appraisal of health information; Ability to actively engage with healthcare providers; Navigating the healthcare system; Ability to find good health information; and Understand health information well enough to know what to do.
The HLQ covers 9 conceptually distinct areas of health literacy to assess the needs and challenges of a wide range of people and organisations. Given the validity-driven approach, the HLQ is likely to be useful in surveys, intervention evaluation, and studies of the needs and capabilities of individuals.
PMCID: PMC3718659  PMID: 23855504
Health literacy; Measurement; Assessment; Health competencies; Psychometrics; HLQ
12.  Psychometric Properties of Reverse-Scored Items on the CES-D in a Sample of Ethnically Diverse Older Adults 
Psychological assessment  2011;23(2):558-562.
Reverse-scored items on assessment scales increase cognitive processing demands, and may therefore lead to measurement problems for older adult respondents.
To examine possible psychometric inadequacies of reverse-scored items on the Center for Epidemiologic Studies Depression Scale (CES-D) when used to assess ethnically diverse older adults.
Using baseline data from a gerontologic clinical trial (n=460), we tested the hypotheses that the reversed items on the CES-D: (a) are less reliable than non-reversed items, (b) disproportionately lead to intra-individually atypical responses that are psychometrically problematic, and (c) evidence improved measurement properties when an imputation procedure based on the scale mean is used to replace atypical responses.
In general, the results supported the hypotheses. Relative to non-reversed CES-D items, the four reversed items were less internally consistent, were associated with lower item-scale correlations, and were more often answered atypically at an intra-individual level. Further, the atypical responses were negatively correlated with responses to psychometrically sound non-reversed items that had similar content. The use of imputation to replace atypical responses enhanced the predictive validity of the set of reverse-scored items.
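Before computing item-scale statistics, reverse-worded items must be recoded so that higher values mean more symptoms; the item-scale correlations reported above are correlations of each item with the rest of the scale. A minimal sketch (CES-D items are scored 0-3; the responses below are invented):

```python
def reverse_score(responses, max_score=3, min_score=0):
    """Recode a reverse-worded item onto the same direction as the
    other items (CES-D items run 0-3)."""
    return [max_score + min_score - r for r in responses]

def item_rest_correlation(item, other_items):
    """Correlation of one item with the sum of the remaining items
    (an item-scale correlation)."""
    rest = [sum(vals) for vals in zip(*other_items)]
    n = len(item)
    mi, mr = sum(item) / n, sum(rest) / n
    cov = sum((a - mi) * (b - mr) for a, b in zip(item, rest))
    si = sum((a - mi) ** 2 for a in item) ** 0.5
    sr = sum((b - mr) ** 2 for b in rest) ** 0.5
    return cov / (si * sr)
```

A reverse-worded item answered consistently shows a strongly positive item-rest correlation once recoded; the atypical responses described above are those that stay low even after recoding.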
Among older adult respondents, reverse-scored items are associated with measurement difficulties. It is recommended that appropriate correction procedures, such as item re-administration or statistical imputation, be applied to reduce these difficulties.
PMCID: PMC3115428  PMID: 21319906
CES-D; depression; reversed item format; older adults
13.  Development of A Promis Item Bank to Measure Pain Interference 
Pain  2010;150(1):173-182.
This paper describes the psychometric properties of the PROMIS Pain Interference (PROMIS-PI) item bank. An initial candidate item pool (n = 644) was developed and evaluated based on review of existing instruments, interviews with patients, and consultation with pain experts. From this pool, a candidate item bank of 56 items was selected, and responses to the items were collected from large community and clinical samples. A total of 14,848 participants responded to all or a subset of candidate items. The responses were calibrated using an item response theory (IRT) model. A final 41-item bank was evaluated with respect to IRT assumptions, model fit, differential item functioning (DIF), precision, and construct and concurrent validity. Items of the revised bank had good fit to the IRT model (CFI and NNFI/TLI ranged from 0.974 to 0.997), and the data were strongly unidimensional (e.g., ratio of first to second eigenvalue = 35). Nine items exhibited statistically significant DIF. However, adjusting for DIF had little practical impact on score estimates, and the items were retained without modifying scoring. Scores provided substantial information across levels of pain; for scores in the T-score range 50-80, reliability was equivalent to 0.96 to 0.99. Patterns of correlations with other health outcomes supported the construct validity of the item bank. The scores discriminated among persons with different numbers of chronic conditions, disabling conditions, levels of self-reported health, and pain intensity (p < 0.0001). The results indicated that the PROMIS-PI items constitute a psychometrically sound bank. Computerized adaptive testing and short forms are available.
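The reliability figures quoted for T-score ranges follow directly from the IRT standard error: on the T metric (population SD of 10, as assumed by PROMIS-style scoring), reliability at a score level is 1 − SE²/SD², so standard errors between 1 and 2 points correspond to reliabilities between 0.99 and 0.96. A one-line sketch:

```python
def reliability_from_se(se, metric_sd=10.0):
    """IRT reliability at a score level: 1 - SE^2 / SD^2, using the
    T-score SD of 10 assumed here for the PROMIS-style metric."""
    return 1.0 - (se / metric_sd) ** 2
```

This is why IRT reports can quote a reliability band across a score range rather than a single coefficient: the standard error, and hence the reliability, varies with the score level.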
PMCID: PMC2916053  PMID: 20554116
Quality-of-life outcomes; quality-of-life measurement; pain
14.  Adaptive Short Forms for Outpatient Rehabilitation Outcome Assessment 
To develop outpatient adaptive short forms (ASFs) for the Activity Measure for Post-Acute Care (AM-PAC) item bank for use in outpatient therapy settings.
A convenience sample of 11,809 adults with spine, lower extremity, upper extremity and miscellaneous orthopedic impairments who received outpatient rehabilitation in one of 127 outpatient rehabilitation clinics in the US. We identified optimal items for use in developing outpatient ASFs based on the Basic Mobility and Daily Activities domains of the AM-PAC item bank. Patient scores were derived from the AM-PAC computerized adaptive testing (CAT) program. Items were selected for inclusion on the ASFs based on functional content, range of item coverage, measurement precision, item exposure rate, and data collection burden.
Two outpatient ASFs were developed: 1) an 18-item Basic Mobility ASF and 2) a 15-item Daily Activities ASF, derived from the same item bank used to develop the AM-PAC-CAT. Both ASFs achieved acceptable psychometric properties.
In outpatient PAC settings where CAT outcome applications are not currently feasible, IRT-derived ASFs provide an efficient means of monitoring patients’ functional outcomes. The development of ASF functional outcome instruments linked by a common, calibrated item bank has the potential to create a bridge to outcome monitoring across PAC settings and can make the eventual transition from ASFs to CAT applications easier and more acceptable to the rehabilitation community.
PMCID: PMC3947754  PMID: 18806511
Outcomes Assessment; Rehabilitation; Item Response Theory; Physical Functioning
NeuroRehabilitation  2009;24(1):75-85.
Neuropsychological evaluations conducted in the United States and abroad commonly include the use of tests translated from English to Spanish. The use of translated naming tests for evaluating predominantly Spanish-speaking patients has recently been challenged on the grounds that translating test items may compromise a test’s construct validity. The Texas Spanish Naming Test (TNT) has been developed in Spanish specifically for use with Spanish speakers; however, it is unlikely that patients from diverse Spanish-speaking geographical regions will perform uniformly on a naming test. The present study evaluated and compared the internal consistency and the patterns of item difficulty and discrimination for the TNT and two commonly used translated naming tests in three countries (i.e., United States, Colombia, Spain). Two hundred fifty-two subjects (126 demented, 116 nondemented) across the three countries were administered the TNT, the Modified Boston Naming Test-Spanish (MBNT-S), and the naming subtest from the CERAD. The TNT demonstrated internal consistency superior to its counterparts, a more favorable item-difficulty pattern than the CERAD naming test, and a more favorable item-discrimination pattern than the MBNT-S across countries. Overall, all three Spanish naming tests differentiated nondemented and moderately demented individuals, but the results suggest that the items of the TNT are the most appropriate for use with Spanish speakers. Preliminary normative data for the three tests examined in each country are provided.
PMCID: PMC2666471  PMID: 19208960
16.  Development and Reliability of a Clinician-rated Instrument to Evaluate Function in Individuals with Shoulder Pain: A Preliminary Study 
Background and Purpose
Subacromial impingement syndrome (SIS) is a common and disabling shoulder condition. Interventions are often evaluated with patient-rated outcome measures. The purpose of this study was to develop a simple clinician-rated measure to detect difficulties in the execution of movement-related tasks among patients with SIS.
The steps in the scale development included a review of the clinical literature of shoulder pain to identify condition-specific questionnaires, pilot testing, clinical testing and scale construction. Twenty-one eligible items from thirteen questionnaires were extracted and included in a pilot test. All items were scored on a five-point ordinal scale ranging from 1 (no difficulty) to 5 (cannot perform). Fourteen items were excluded after pilot testing because of difficulties in standardization or other practical considerations. The remaining seven items were included in a clinical test-retest study with outpatients at a hospital. Of these, four were excluded because of psychometric reasons. From the remaining three items, a measure named Shoulder Activity Scale (summed score ranging from 3 to 15) was developed.
A total of 33 men and 30 women were included in the clinical study; age range 27–80 years. The intraclass correlation coefficient results for inter-rater reliability and test-retest reliability were 0.80 (95% CI = 0.51–0.90) and 0.74 (95% CI = 0.58–0.84), respectively. The standard error of measurement and minimal detectable change were 1.19 and 3.32, respectively. The scale was linked to the International Classification of Functioning, Disability and Health second level categories lifting and carrying objects (d430), dressing (d540), hand and arm use (d445) and control of voluntary movement (b760).
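The SEM and MDC figures above follow from the standard formulas SEM = SD·sqrt(1 − ICC) and MDC95 = 1.96·sqrt(2)·SEM. A minimal sketch, assuming a between-test SD of about 2.33 back-solved from the reported SEM of 1.19 (the small gap to the published MDC of 3.32 is presumably rounding):

```python
import math

def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence (test-retest design)."""
    return 1.96 * math.sqrt(2.0) * sem_value

# Assumed SD of about 2.33, back-solved from the reported SEM of 1.19
# at the reported test-retest ICC of 0.74.
s = sem(2.33, 0.74)
print(round(s, 2), round(mdc95(s), 2))  # → 1.19 3.29
```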
The Shoulder Activity Scale showed acceptable reliability in a sample of hospital outpatients, rated by clinicians experienced in shoulder rehabilitation. The validity of the scale should be investigated in future studies before it is applied in routine practice. © 2013 The Authors. Physiotherapy Research International published by John Wiley & Sons Ltd.
PMCID: PMC4286020  PMID: 23716317
17.  Patient reports of health outcome for adults living with sickle cell disease: development and testing of the ASCQ-Me item banks 
Providers and patients have called for improved understanding of the health care requirements of adults with sickle cell disease (SCD) and have identified the need for a systematic, reliable and valid method to document the patient-reported outcomes (PRO) of adult SCD care. To address this need, the Adult Sickle Cell Quality of Life Measurement System (ASCQ-Me) was designed to complement the Patient Reported Outcome Measurement Information System (PROMIS®). Here we describe methods and results of the psychometric evaluation of ASCQ-Me item banks (IBs).
At seven geographically dispersed clinics within the US, 556 patients responded to questions generated to assess the cognitive, emotional, physical and social impacts of SCD. We evaluated the construct validity of the hypothesized domains using exploratory factor analysis (EFA), parallel analysis (PA), and bi-factor analysis (Item Response Theory Graded Response Model, IRT-GRM). We used the IRT-GRM and the Wald method to identify bias in responses across gender and age. We used IRT and Cronbach’s alpha coefficient to evaluate the reliability of the IBs and then tested the ability of summary scores based on IRT calibrations to discriminate among tertiles of respondents defined by SCD severity.
Of the original 140 questions tested, we eliminated 48 that either did not form clean factors or provided biased measurement across subgroups defined by age and gender. Via EFA and PA, we identified three subfactors within physical impact: sleep, pain and stiffness impacts. Analysis of the resulting six item sets (sleep, pain, stiffness, cognitive, emotional and social impacts of SCD) supported their essential unidimensionality. With the exception of the cognitive impact IB, these item sets also were highly reliable across a broad range of values and highly significantly related to SCD disease severity.
ASCQ-Me pain, sleep, stiffness, emotional and social SCD impact IBs demonstrated exceptional measurement properties using modern and classical psychometric methods of evaluation. Further development of the cognitive impact IB is required to improve its sensitivity to differences in SCD disease severity. Future research will evaluate the sensitivity of the ASCQ-Me IBs to change in SCD disease severity over time due to health interventions.
PMCID: PMC4243820  PMID: 25146160
ASCQ-Me; Computer adaptive testing; CAT; Item response theory; IRT; Patient-reported outcomes; PROs; Sickle cell disease; Validity
18.  Development of the Two Stage Rapid Estimate of Adult Literacy in Dentistry (TS-REALD) 
This work proposes a revision of the 30-item Rapid Estimate of Adult Literacy in Dentistry (REALD-30) into a more efficient and easier-to-use two-stage scale. Using a sample of 1,405 individuals (primarily women) enrolled in a Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), the present work applies principles of item response theory and multi-stage testing to revise the REALD-30 into a two-stage test of oral health literacy, named the Two-Stage REALD (TS-REALD), which maximizes score precision at various levels of participant ability. Based on the participant’s score on the 5-item first stage (i.e., the routing test), one of three second-stage tests is administered: a 4-item Low Literacy test, a 6-item Average Literacy test, or a 3-item High Literacy test. The reliability of TS-REALD scores is greater than .85 across a wide range of ability. The TS-REALD was found to be predictive of the perceived impact of oral conditions on well-being, after controlling for educational level, overall health, dental health, and a general health literacy measure. While containing approximately one-third of the items of the original scale, the TS-REALD was found to maintain similar psychometric qualities.
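The two-stage logic described above can be sketched as a simple routing function; the cutoff scores below are illustrative placeholders, not the published TS-REALD thresholds:

```python
def route(stage1_score: int) -> str:
    """Choose the second-stage form from the 5-item routing-test score.
    Cutoffs are illustrative placeholders, not published TS-REALD values."""
    if stage1_score <= 1:
        return "low (4 items)"
    if stage1_score <= 3:
        return "average (6 items)"
    return "high (3 items)"

print(route(0), route(2), route(5))
```

The design choice is the same as in multi-stage testing generally: a short routing test locates the respondent coarsely on the ability scale, and the second stage concentrates measurement precision in that region.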
PMCID: PMC3165105  PMID: 21592170
Dental Health Literacy; Dental Care; Oral Health Quality of Life; Health Literacy; Psychometrics
19.  Differential Item Functioning of the Boston Naming Test in Cognitively Normal African American and Caucasian Older Adults 
Scores on the Boston Naming Test (BNT) are frequently lower for African American than for Caucasian adults. Although demographically based norms can mitigate the impact of this discrepancy on the likelihood of erroneous diagnostic impressions, a growing consensus suggests that group norms do not sufficiently address or advance our understanding of the underlying psychometric and sociocultural factors that lead to between-group score discrepancies. Using item response theory and methods to detect differential item functioning (DIF), the current investigation moves beyond comparisons of the summed total score to examine whether the conditional probability of responding correctly to individual BNT items differs between African American and Caucasian adults. Participants included 670 adults age 52 and older who took part in Mayo's Older Americans and Older African Americans Normative Studies. Under a 2-parameter logistic IRT framework and after correction for the false discovery rate, 12 items were shown to demonstrate DIF. Six of these 12 items (“dominoes,” “escalator,” “muzzle,” “latch,” “tripod,” and “palette”) were also identified in additional analyses using hierarchical logistic regression models and represent the strongest evidence for race/ethnicity-based DIF. These findings afford a finer characterization of the psychometric properties of the BNT and expand our understanding of between-group performance differences.
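Under the 2-parameter logistic framework used here, uniform DIF amounts to the same item having a different difficulty (b) across groups at equal ability. A minimal sketch with hypothetical parameters (not the BNT calibrations):

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item showing uniform DIF: at equal ability, the item is
# easier for one group (lower b) than for the other.
theta = 0.0
p_group1 = p_correct(theta, a=1.2, b=-0.5)
p_group2 = p_correct(theta, a=1.2, b=0.3)
print(round(p_group1, 2), round(p_group2, 2))  # → 0.65 0.41
```

The point of DIF analysis is exactly this comparison: matched on ability, members of the two groups should have equal success probabilities on an unbiased item.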
PMCID: PMC2835360  PMID: 19570311
Boston Naming Test; Item response theory; Differential item functioning; Ethnicity; Race; Bias
20.  Iranian Effective Clinical Nurse Instructor evaluation tool: Development and psychometric testing 
Clinical education is the heart of any nursing education program, and effective clinical instructors are needed to graduate qualified nurses. There is a well-developed body of knowledge about the effectiveness of clinical teaching and instructors. However, this knowledge has not yet been translated into a context-based evaluation tool for measuring the effectiveness of Iranian clinical nursing instructors. The purpose of this study is to describe the development and psychometric testing of an instrument to evaluate the characteristics of an effective Iranian clinical nurse instructor.
Materials and Methods:
Following a thorough review of the Iranian literature and expert consultation, 83 statements about the characteristics that make clinical nurse instructors effective were extracted. In the next phase, the psychometric properties of the instrument were established by examining content validity, face validity, and internal consistency. Content validity was assessed based on the comments of an expert panel of 10 nursing faculty members. During this phase, 30 items were omitted or merged. Face validity was established based on the advice of 10 nursing students and 10 nursing faculty members. Finally, in the pilot test, data from 168 completed questionnaires were gathered and analyzed by exploratory factor analysis to reduce the items and identify the factor structure of the instrument.
Through the subsequent analyses, 31 of the 83 items were merged or omitted. The 52 retained items were divided into four subscales: student-centric behaviors, clinical performance, planning ability, and personality traits. The Cronbach's alpha of the inventory was 0.96, with values for the individual domains ranging from 0.87 to 0.94.
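Cronbach's alpha, as reported above, is computed from the item variances and the variance of the total score. A self-contained sketch on simulated data (the data, item count, and loadings are illustrative, not the Iranian instrument's responses):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total).
    items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Simulated 8-item scale driven by one trait (illustrative data only).
rng = np.random.default_rng(1)
trait = rng.normal(size=500)
data = trait[:, None] + 0.5 * rng.normal(size=(500, 8))
alpha = cronbach_alpha(data)
print(alpha > 0.9)
```

Strongly inter-correlated items, as here, drive alpha toward the 0.9+ range reported for the full inventory.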
Iranian Effective Clinical Nurse Instructor evaluation tool has acceptable psychometric properties and can be used in evaluating the effectiveness of clinical nursing instructors.
PMCID: PMC4020021  PMID: 24834081
Clinical education; effective instructor; evaluation tool; Iran; nursing
21.  The Health and Functioning ICF-60: Development and Psychometric Properties 
This paper describes the development and psychometric properties of the Health and Functioning ICF-60 (HF-ICF-60) measure, based on the World Health Organization (WHO) ‘International Classification of Functioning, Disability and Health: ICF’ (2001). The aims of the present study were to test the psychometric properties of the HF-ICF-60, developed as a measure responsive to change in functioning through changes in health and nutritional status and as a prospective measure for monitoring the health and nutritional status of populations, and to explore its relationship with quality-of-life measures such as the WHO WHOQOL-BREF quality of life assessment in relation to non-communicable diseases.
The HF-ICF-60 measure consists of 60 items selected from the ICF by an expert panel: 18 items covering Body Functions and 21 items covering Activities and Participation, rated on five-point scales, and 21 items covering Environmental Factors (seven Individual Environmental Factors and 14 Societal Environmental Factors), rated on nine-point scales. The measure was administered to a Russian nationally representative sample within the Russian National Population Quality of Life, Health and Nutrition Survey in 2004 (n = 9807) and 2005 (n = 9560), as part of two waves of the Russian Longitudinal Monitoring Survey (RLMS). The statistical analyses were carried out using both classical and modern psychometric methods: factor analysis and methods based on Item Response Theory, respectively.
The HF-ICF-60 questionnaire is a new measure derived directly from the ICF and covers the ICF components as follows: Body Functions, Activities and Participation, and Environmental Factors (Individual Environmental Factors and Societal Environmental Factors). The results from the factor analyses (both Exploratory Factor Analyses and Confirmatory Factor Analyses) show good support for the proposed structure together with an overall higher-order factor for each scale of the measure. The measure has good reliability and validity, and sensitivity to change in the health and nutritional status of respondents over time. Normative values were developed for the Russian adult population.
The HF-ICF-60 has shown good psychometric properties in the two waves of the nationally representative RLMS, which provided considerable support to using the HF-ICF-60 data as the normative health and functioning values for the Russian population. Similarly, the administration of the WHOQOL-BREF in the same two waves of the nationally representative RLMS has allowed the normative quality of life values for the Russian population to be obtained. Therefore, the objective assessment of health and functioning of the HF-ICF-60 could be mapped onto the subjective evaluation of quality of life of the WHOQOL-BREF to increase the potential usefulness of the surveys in relation to non-communicable diseases. © 2014 The Authors. Clinical Psychology & Psychotherapy. Published by John Wiley & Sons, Ltd.
Key Practitioner Message
The HF-ICF-60 offers a new perspective in measuring change in functioning through changes in lifestyle and diet.
The HF-ICF-60 can be combined with the WHOQOL-BREF to map the objective assessment of health and functioning onto the subjective evaluation of quality of life.
Combined use of the HF-ICF-60 and the WHOQOL-BREF can be especially useful for national and global monitoring and surveillance of implementation of measures to reduce risk factors of non-communicable diseases and to promote healthy lifestyles and healthy diets.
PMCID: PMC4232882  PMID: 24931300
ICF; Health and Functioning; Nutrition; Quality of Life; Psychometrics; Population Surveys
22.  Rasch Analysis of the Fullerton Advanced Balance (FAB) Scale 
Physiotherapy Canada  2011;63(1):115-125.
Purpose: This cross-sectional study explores the psychometric properties and dimensionality of the Fullerton Advanced Balance (FAB) Scale, a multi-item balance test for higher-functioning older adults.
Methods: Participants (n=480) were community-dwelling adults able to ambulate independently. Data gathering consisted of survey and balance performance assessment. Psychometric properties were assessed using Rasch analysis.
Results: Mean age of participants was 76.4 (SD=7.1) years. Mean FAB Scale scores were 24.7/40 (SD=7.5). Analyses for scale dimensionality showed that 9 of the 10 items fit a unidimensional measure of balance. Item 10 (Reactive Postural Control) did not fit the model. The reliability of the scale to separate persons was 0.81 out of 1.00; the reliability of the scale to separate items in terms of their difficulty was 0.99 out of 1.00. Cronbach's alpha for a 10-item model was 0.805. Items of differing difficulties formed a useful ordinal hierarchy for scaling patterns of expected balance ability scoring for a normative population.
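The person- and item-separation reliabilities reported above are, in Rasch terms, the ratio of true variance to observed variance of the estimated measures. A minimal sketch (the variance figures are illustrative, chosen to reproduce the 0.81 person reliability):

```python
def separation_reliability(observed_var: float, mean_se_sq: float) -> float:
    """Rasch separation reliability: true variance / observed variance,
    where true variance = observed variance - mean squared standard error."""
    return (observed_var - mean_se_sq) / observed_var

# Illustrative variances chosen to reproduce the 0.81 person reliability.
print(round(separation_reliability(1.0, 0.19), 2))  # → 0.81
```

The near-1.00 item reliability reported above simply reflects item difficulties being estimated far more precisely (from 480 responses each) than person measures (from 10 items each).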
Conclusion: The FAB Scale appears to be a reliable and valid tool to assess balance function in higher-functioning older adults. The test was found to discriminate among participants of varying balance abilities. Further exploration of concurrent validity of Rasch-generated expected item scoring patterns should be undertaken to determine the test's diagnostic and prescriptive utility.
PMCID: PMC3024205  PMID: 22210989
aged; balance; fall risk assessment tool; falls; psychometrics; FAB Scale; aînés; chutes; équilibre; outil d'évaluation du risque de chute; psychométrie
23.  Development and preliminary psychometric testing of a new OA pain measure – an OARSI/OMERACT initiative 
Osteoarthritis and Cartilage  2008;16(4):409-414.
To evaluate the measurement properties of a new osteoarthritis (OA) pain measure.
The new tool, comprising 12 questions on constant vs intermittent pain, was administered by phone to 100 subjects aged 40+ years with hip or knee OA, followed by three global hip/knee questions, the Western Ontario and McMaster Universities (WOMAC) pain subscale, the symptom subscales of the Hip Disability and OA Outcome Score (HOOS) or Knee Injury and OA Outcome Score (KOOS), and the limitation dimension of the Late Life Function and Disability Instrument (LLFDI). Test-retest reliability was assessed by re-administration after 48–96 h. Item response distributions, inter-item correlations, item-total correlations and Cronbach's alpha were assessed. Principal component analysis was performed and test-retest reliability was assessed by the intra-class correlation coefficient (ICC).
There was a good distribution of response options across all items. The mean intensity was higher for intermittent than for constant pain, indicating subjects could distinguish the two concepts. Inter-item correlations ranged from 0.37 to 0.76, indicating no item redundancy. One item, predictability of pain, was removed from subsequent analyses because its correlations with other items and item-total correlations were low. The 11-item scale had a corrected inter-item correlation range of 0.54–0.81, with a Cronbach's alpha of 0.93 for the combined sample. Principal components analysis demonstrated factorial complexity, so scoring was based on summing the individual items. Test-retest reliability was excellent (ICC 0.85). The measure was significantly correlated with each of the other measures [Spearman correlations −0.60 (KOOS symptoms) to 0.81 (WOMAC pain scale)], except the LLFDI, where correlations were low.
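An ICC of the kind used above can be computed directly from a two-way ANOVA decomposition. The sketch below implements ICC(2,1) (two-way random, absolute agreement, single measures) and checks it against the classic Shrout and Fleiss (1979) six-target, four-judge example data, not this study's ratings:

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random, absolute agreement, single measures.
    scores: subjects x raters (or test occasions)."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Classic Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]])
print(round(icc_2_1(ratings), 2))  # → 0.29
```

Which ICC form is appropriate depends on the design (raters fixed vs random, agreement vs consistency); published test-retest studies such as this one should state the form used.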
Preliminary psychometric testing suggests this OA pain measure is reliable and valid.
PMCID: PMC3268063  PMID: 18381179
Osteoarthritis; Hip; Knee; Pain; Outcome measure; Validation; Instrument development
24.  Development and assessment of floor and ceiling items for the PROMIS physical function item bank 
Arthritis Research & Therapy  2013;15(5):R144.
Disability and Physical Function (PF) outcome assessment has had limited ability to measure functional status at the floor (very poor functional abilities) or the ceiling (very high functional abilities). We sought to identify, develop and evaluate new floor and ceiling items to enable broader and more precise assessment of PF outcomes for the NIH Patient-Reported-Outcomes Measurement Information System (PROMIS).
We conducted two cross-sectional studies using NIH PROMIS item improvement protocols with expert review, participant survey and focus group methods. In Study 1, respondents with low PF abilities evaluated new floor items, and those with high PF abilities evaluated new ceiling items for clarity, importance and relevance. In Study 2, we compared difficulty ratings of new floor items by low functioning respondents and ceiling items by high functioning respondents to reference PROMIS PF-10 items. We used frequencies, percentages, means and standard deviations to analyze the data.
In Study 1, low (n = 84) and high (n = 90) functioning respondents were mostly White, women, around 70 years old, with some college education, and had disability scores of 0.62 and 0.30, respectively. More than 90% of the 31 new floor and 31 new ceiling items were rated as clear, important and relevant, leaving 26 ceiling and 30 floor items for Study 2. Low (n = 246) and high (n = 637) functioning Study 2 respondents were mostly White, women, around 70 years old, with some college education, and had Health Assessment Questionnaire (HAQ) scores of 1.62 and 0.003, respectively. Compared to the difficulty ratings of the reference items, ceiling items were rated from 10% to more than 40% more difficult to do, and floor items from about 12% to nearly 90% less difficult to do.
These new floor and ceiling items considerably extend the measurable range of physical function at either extreme. They will help improve instrument performance in populations with broad functional ranges and in those concentrated at one or the other extreme of functioning. Optimal use of these new items will be assisted by computerized adaptive testing (CAT), reducing questionnaire burden and ensuring that items are administered to appropriate individuals.
PMCID: PMC3978724  PMID: 24286166
25.  A functional difficulty and functional pain instrument for hip and knee osteoarthritis 
Arthritis Research & Therapy  2009;11(4):R107.
The objectives of this study were to develop a functional outcome instrument for hip and knee osteoarthritis research (OA-FUNCTION-CAT) using item response theory (IRT) and computer adaptive test (CAT) methods and to assess its psychometric performance compared to the current standard in the field.
We conducted an extensive literature review, focus groups, and cognitive testing to guide the construction of an item bank consisting of 125 functional activities commonly affected by hip and knee osteoarthritis. We recruited a convenience sample of 328 adults with confirmed hip and/or knee osteoarthritis. Subjects reported their degree of functional difficulty and functional pain in performing each activity in the item bank and completed the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Confirmatory factor analyses were conducted to assess scale uni-dimensionality, and IRT methods were used to calibrate the items and examine the fit of the data. We assessed the performance of OA-FUNCTION-CATs of different lengths relative to the full item bank and WOMAC using CAT simulation analyses.
Confirmatory factor analyses revealed distinct functional difficulty and functional pain domains. Descriptive statistics for scores from 5-, 10-, and 15-item CATs were similar to those for the full item bank. The 10-item OA-FUNCTION-CAT scales demonstrated a high degree of accuracy compared with the item bank (r = 0.96 and 0.89, respectively). Compared to the WOMAC, both scales covered a broader score range and demonstrated a higher degree of precision at the ceiling and reliability across the range of scores.
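A CAT of the kind simulated above typically administers, at each step, the not-yet-asked item with maximum Fisher information at the current ability estimate. A minimal 2PL sketch with a hypothetical four-item bank (the parameters are invented, not OA-FUNCTION-CAT calibrations):

```python
import math

def info_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1-p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical calibrated bank: (discrimination a, difficulty b) pairs.
bank = [(1.0, -2.0), (1.5, -0.5), (2.0, 0.0), (1.2, 1.5)]

def next_item(theta: float, administered: set) -> int:
    """Pick the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta, *bank[i]))

print(next_item(0.0, set()))   # the a=2.0, b=0.0 item is most informative at 0
print(next_item(0.0, {2}))
```

Because each administered item is the most informative one available, short CATs approximate the full bank closely, which is why the 10-item simulations above correlate 0.89-0.96 with full-bank scores.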
The OA-FUNCTION-CAT provided superior reliability throughout the score range and improved breadth and precision at the ceiling compared with the WOMAC. Further research is needed to assess whether these improvements carry over into superior ability to measure change.
PMCID: PMC2745788  PMID: 19589168
