Related Articles

1.  Reference values for generic instruments used in routine outcome monitoring: the Leiden routine outcome monitoring study 
BMC Psychiatry  2012;12:203.
Introduction
The Brief Symptom Inventory (BSI), Mood & Anxiety Symptom Questionnaire-30 (MASQ-D30), Short Form Health Survey 36 (SF-36), and Dimensional Assessment of Personality Pathology-Short Form (DAPP-SF) are generic instruments that can be used in Routine Outcome Monitoring (ROM) of patients with common mental disorders. We aimed to generate reference values usually encountered in ‘healthy’ and ‘psychiatrically ill’ populations to facilitate correct interpretation of ROM results.
Methods
We included the following specific reference populations: 1294 subjects from the general population (ROM reference group) recruited through general practitioners, and 5269 psychiatric outpatients diagnosed with mood, anxiety, or somatoform (MAS) disorders (ROM patient group). The outermost 5% of observations were used to define limits for one-sided reference intervals (95th percentiles for BSI, MASQ-D30 and DAPP-SF, and 5th percentiles for SF-36 subscales). Internal consistency and Receiver Operating Characteristics (ROC) analyses were performed.
Results
Mean age for the ROM reference group was 40.3 years (SD=12.6) and 37.7 years (SD=12.0) for the ROM patient group. The proportion of females was 62.8% and 64.6%, respectively. The mean for cut-off values of healthy individuals was 0.82 for the BSI subscales, 23 for the three MASQ-D30 subscales, 45 for the SF-36 subscales, and 3.1 for the DAPP-SF subscales. Discriminative power of the BSI, MASQ-D30 and SF-36 was good, but it was poor for the DAPP-SF. For all instruments, the internal consistency of the subscales ranged from adequate to excellent.
Discussion and conclusion
Reference values for the clinical interpretation were provided for the BSI, MASQ-D30, SF-36, and DAPP-SF. Clinical information aided by ROM data may represent the best means to appraise the clinical state of psychiatric outpatients.
doi:10.1186/1471-244X-12-203
PMCID: PMC3551660  PMID: 23171272
Reference values; Routine outcome monitoring; Questionnaires; Mood disorders; Anxiety disorders; Somatoform disorders
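The one-sided reference limits described in the Methods of entry 1 (95th percentile where higher scores are worse, 5th percentile where lower scores are worse) can be sketched as follows. This is only an illustration: the function name, the `higher_is_worse` flag, and the sample scores are assumptions, and the percentile interpolation method used by the study is not stated in the abstract.

```python
from statistics import quantiles

def one_sided_reference_limit(scores, higher_is_worse=True):
    """Return the limit that cuts off the outermost 5% of a reference sample.

    Hypothetical helper: uses the 95th percentile for symptom-type scales
    and the 5th percentile for functioning-type scales.
    """
    # n=20 yields cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(scores, n=20, method="inclusive")
    return cuts[-1] if higher_is_worse else cuts[0]

# Made-up reference-group scores (not data from the study).
reference_scores = list(range(1, 101))

upper_limit = one_sided_reference_limit(reference_scores, higher_is_worse=True)
lower_limit = one_sided_reference_limit(reference_scores, higher_is_worse=False)
print(upper_limit, lower_limit)
```

A patient score beyond the relevant limit would then fall outside the range seen in 95% of the reference group.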
2.  Validation of the Mood and Anxiety Symptom Questionnaire in Korean Adolescents 
Psychiatry Investigation  2014;12(2):218-226.
Objective
The tripartite model categorizes symptoms of depression and anxiety into three groups: 1) non-specific general distress that is shared between depression and anxiety, 2) depression-specific symptoms that include low positive affect and loss of interest, and 3) anxiety-specific symptoms that include somatic arousal. The Mood and Anxiety Symptoms Questionnaire (MASQ) was developed to measure these three factors of depression and anxiety. The purpose of the present study was to test the psychometric properties of the Korean version of the MASQ (K-MASQ) in adolescents.
Methods
Community-dwelling adolescents (n=933) were randomly assigned to two groups. Exploratory factor analysis and confirmatory factor analysis were conducted in each group to identify the factor structure of the K-MASQ. The reliability and validity of the K-MASQ were also evaluated.
Results
Our results support the three-factor structure of the K-MASQ in adolescents. However, we found that the specific items of each factor differed from those of the original MASQ. That is, the depression-specific factor was only related to low positive affect and not loss of interest, and the anxiety-specific factor included more items related to general somatic symptoms of anxiety. The reliability and validity of the K-MASQ were found to be satisfactory.
Conclusion
The K-MASQ supports the tripartite model of depression and anxiety and has satisfactory reliability and validity among Korean adolescents. The K-MASQ can be used to distinguish unique symptoms of depression and anxiety in Korean adolescents.
doi:10.4306/pi.2015.12.2.218
PMCID: PMC4390593  PMID: 25866523
Anxiety; Depression; Assessment; Adolescent; Mood and Anxiety Symptom Questionnaire; Korea
3.  Screening for depressive disorders using the MASQ anhedonic depression scale: A receiver-operator characteristic analysis 
Psychological assessment  2010;22(3):702-710.
The present study examined the utility of the anhedonic depression scale from the Mood and Anxiety Symptoms Questionnaire (MASQ-AD) as a way to screen for depressive disorders. Using receiver-operator characteristic analysis, the sensitivity and specificity of the full 22-item MASQ-AD scale, as well as the 8- and 14-item subscales, were examined in relation to both current and lifetime DSM-IV depressive disorder diagnoses in two nonpatient samples. As a means of comparison, the sensitivity and specificity of a measure of a relevant personality dimension, neuroticism, were also examined. Results from both samples support the clinical utility of the MASQ-AD scale as a means of screening for depressive disorders. Findings were strongest for the MASQ-AD 8-item subscale and when predicting current depression status. Furthermore, the MASQ-AD 8-item subscale outperformed the neuroticism measure under certain conditions. The overall usefulness of the MASQ-AD scale as a screening device is discussed, as well as possible cutoff scores for use in research.
doi:10.1037/a0019915
PMCID: PMC2992834  PMID: 20822283
depressive disorders; anhedonic depression; Mood and Anxiety Symptoms Questionnaire; receiver-operator characteristic analysis; screening
4.  Clinical utility of the Mood and Anxiety Symptom Questionnaire (MASQ) in a sample of young help-seekers 
BMC Psychiatry  2007;7:50.
Background
The overlap between Depression and Anxiety has led some researchers to conclude that they are manifestations of a broad, non-specific neurotic disorder. However, others believe that they can be distinguished despite sharing symptoms of general distress. The Tripartite Model of Affect proposes an anxiety-specific, a depression-specific and a shared symptoms factor. Watson and Clark developed the Mood and Anxiety Symptom Questionnaire (MASQ) to specifically measure these Tripartite constructs. Early research showed that the MASQ distinguished between dimensions of Depression and Anxiety in non-clinical samples. However, two recent studies have cautioned that the MASQ may show limited validity in clinical populations. The present study investigated the clinical utility of the MASQ in a clinical sample of adolescents and young adults.
Methods
A total of 204 young people consecutively referred to a specialist public mental health service in Melbourne, Australia, were approached, and 150 consented to participate. Of these, 136 participants completed both a diagnostic interview and the MASQ.
Results
The majority of the sample rated for an Axis-I disorder, with Mood and Anxiety disorders most prevalent. The disorder-specific scales of the MASQ significantly discriminated Anxiety (61.0%) and Mood Disorders (72.8%); however, the predictive accuracy for presence of Anxiety Disorders was very low (29.8%). From ROC analyses, a cut-off of 76 was proposed for the depression scale to indicate 'caseness' for Mood Disorders. The resulting sensitivity/specificity was superior to that of the CES-D.
Conclusion
It was concluded that the depression-specific scale of the MASQ showed good clinical utility, but that the anxiety-specific scale showed poor discriminant validity.
doi:10.1186/1471-244X-7-50
PMCID: PMC2151061  PMID: 17868477
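Entry 4 reports a ROC-derived cut-off of 76 on the MASQ depression scale and evaluates it by sensitivity and specificity. A minimal sketch of that evaluation step is shown below. The scores and diagnoses are invented for illustration, and treating a score at or above the cut-off as a positive screen is an assumption, since the abstract does not say how ties at the cut-off were handled.

```python
def sensitivity_specificity(scores, has_disorder, cutoff):
    """Sensitivity and specificity of 'score >= cutoff' against a diagnosis."""
    tp = fn = tn = fp = 0
    for score, diagnosed in zip(scores, has_disorder):
        screened_positive = score >= cutoff  # assumed decision rule
        if diagnosed and screened_positive:
            tp += 1
        elif diagnosed:
            fn += 1
        elif screened_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scale scores and interview diagnoses (not study data).
scores = [60, 70, 76, 80, 90, 50, 65, 78, 72, 85]
has_disorder = [False, False, True, True, True,
                False, False, False, True, True]

sens, spec = sensitivity_specificity(scores, has_disorder, cutoff=76)
print(sens, spec)
```

Repeating this calculation over every candidate cut-off gives the points of a ROC curve, from which a cut-off such as 76 can be chosen.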
5.  An accurate and efficient identification of children with psychosocial problems by means of computerized adaptive testing 
Background
Questionnaires used by health services to identify children with psychosocial problems are often rather short. The psychometric properties of such short questionnaires are mostly less than needed for an accurate distinction between children with and without problems. We aimed to assess whether a short Computerized Adaptive Test (CAT) can overcome the weaknesses of short written questionnaires when identifying children with psychosocial problems.
Method
We used a Dutch national data set obtained from parents of children invited for a routine health examination by Preventive Child Healthcare with 205 items on behavioral and emotional problems (n = 2,041, response 84%). In a random subsample we determined which items met the requirements of an Item Response Theory (IRT) model to a sufficient degree. Using those items, item parameters necessary for a CAT were calculated and a cut-off point was defined. In the remaining subsample we determined the validity and efficiency of a Computerized Adaptive Test using simulation techniques, with current treatment status and a clinical score on the Total Problem Scale (TPS) of the Child Behavior Checklist as criteria.
Results
Out of 205 items available 190 sufficiently met the criteria of the underlying IRT model. For 90% of the children a score above or below cut-off point could be determined with 95% accuracy. The mean number of items needed to achieve this was 12. Sensitivity and specificity with the TPS as a criterion were 0.89 and 0.91, respectively.
Conclusion
An IRT-based CAT is a very promising option for the identification of psychosocial problems in children, as it can lead to an efficient, yet high-quality identification. The results of our simulation study need to be replicated in a real-life administration of this CAT.
doi:10.1186/1471-2288-11-111
PMCID: PMC3199909  PMID: 21816055
6.  Validation of Computerized Adaptive Testing in an Outpatient Non-academic Setting: the VOCATIONS Trial 
Objective
Computerized adaptive tests (CAT) provide an alternative to fixed-length assessments for diagnostic screening and severity measurement of psychiatric disorders. We sought to cross-sectionally validate a suite of computerized adaptive tests for mental health (CAT-MH) in a community psychiatric sample.
Methods
145 adult psychiatric outpatients and controls were prospectively evaluated with CAT for depression, mania and anxiety symptoms, compared to gold-standard psychiatric assessments including: Structured Clinical Interview for DSM IV-TR (SCID), Hamilton Rating Scale for Depression (HAM-D25), Patient Health Questionnaire (PHQ-9), Center for Epidemiologic Studies Depression Scale (CES-D), and Global Assessment of Functioning (GAF).
Results
Sensitivity and specificity for the computerized adaptive diagnostic test for depression (CAD-MDD) were .96 and .64, respectively (.96 and 1.00 for major depression versus controls). CAT for depression severity (CAT-DI) correlated well to standard depression scales HAM-D25 (r=.79), PHQ-9 (r=.90), CES-D (r=.90) and had OR=27.88 for current SCID major depressive disorder diagnosis across its range. CAT for anxiety severity (CAT-ANX) correlated to HAM-D25 (r=.73), PHQ-9 (r=.78), CES-D (r=.81), and had OR=11.52 for current SCID generalized anxiety disorder diagnosis across its range. CAT for mania severity (CAT-MANIA) did not correlate well to HAM-D25 (r=.31), PHQ-9 (r=.37), CES-D (r=.39), but had an OR=11.56 for a current SCID bipolar diagnosis across its range. Participants found the CAT-MH suite of tests acceptable and easy to use, averaging 51.7 items and 9.4 minutes to complete the full battery.
Conclusions
Compared to current gold-standard diagnostic and assessment measures, CAT-MH provides an effective, rapidly-administered assessment of psychiatric symptoms.
doi:10.1176/appi.ps.201400390
PMCID: PMC4910384  PMID: 26030317
7.  Assessment of self-reported negative affect in the NIH Toolbox 
Psychiatry research  2012;206(1):88-97.
We report on the selection of self-report measures for inclusion in the NIH Toolbox that are suitable for assessing the full range of negative affect including sadness, fear, and anger. The Toolbox is intended to serve as a “core battery” of assessment tools for cognition, sensation, motor function, and emotional health that will help to overcome the lack of consistency in measures used across epidemiological, observational, and intervention studies. A secondary goal of the NIH Toolbox is the identification of measures that are flexible, efficient, and precise, an agenda best fulfilled by the use of item banks calibrated with models from item response theory (IRT) and suitable for adaptive testing. Results from a sample of 1,763 respondents supported use of the adult and pediatric item banks for emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®) as a starting point for capturing the full range of negative affect in healthy individuals. Content coverage for the adult Toolbox was also enhanced by the development of a scale for somatic arousal using items from the Mood and Anxiety Symptom Questionnaire (MASQ) and scales for hostility and physical aggression using items from the Buss-Perry Aggression Questionnaire (BPAQ).
doi:10.1016/j.psychres.2012.09.034
PMCID: PMC3561498  PMID: 23083918
sadness; fear; anger; item response theory; measurement
8.  A factor analytic investigation of the Tripartite model of affect in a clinical sample of young Australians 
BMC Psychiatry  2008;8:79.
Background
The Mood and Anxiety Symptom Questionnaire (MASQ) was designed to specifically measure the Tripartite model of affect and is proposed to offer a delineation between the core components of anxiety and depression. Factor analytic data from adult clinical samples has shown mixed results; however no studies employing confirmatory factor analysis (CFA) have supported the predicted structure of distinct Depression, Anxiety and General Distress factors. The Tripartite model has not been validated in a clinical sample of older adolescents and young adults. The aim of the present study was to examine the validity of the Tripartite model using scale-level data from the MASQ and correlational and confirmatory factor analysis techniques.
Methods
137 young people (M = 17.78, SD = 2.63) referred to a specialist mental health service for adolescents and young adults completed the MASQ and diagnostic interview.
Results
All MASQ scales were highly inter-correlated, with the lowest correlation between the depression- and anxiety-specific scales (r = .59). This pattern of correlations was observed for all participants rating for an Axis-I disorder but not for participants without a current disorder (r = .18). Confirmatory factor analyses were conducted to evaluate the model fit of a number of solutions. The predicted Tripartite structure was not supported. A 2-factor model demonstrated superior model fit and parsimony compared to 1- or 3-factor models. These broad factors represented Depression and Anxiety and were highly correlated (r = .88).
Conclusion
The present data lend support to the notion that the Tripartite model does not adequately explain the relationship between anxiety and depression in all clinical populations. Indeed, in the present study this model was found to be inappropriate for a help-seeking community sample of older adolescents and young adults.
doi:10.1186/1471-244X-8-79
PMCID: PMC2561028  PMID: 18799017
9.  Using Computerized Adaptive Testing to Reduce the Burden of Mental Health Assessment 
Objective
This study investigated the combination of item response theory and computerized adaptive testing (CAT) for psychiatric measurement as a means of reducing the burden of research and clinical assessments.
Methods
Data were from 800 participants in outpatient treatment for a mood or anxiety disorder; they completed 616 items of the 626-item Mood and Anxiety Spectrum Scales (MASS) at two times. The first administration was used to design and evaluate a CAT version of the MASS by using post hoc simulation. The second confirmed the functioning of CAT in live testing.
Results
Tests of competing models based on item response theory supported the scale’s bifactor structure, consisting of a primary dimension and four group factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Both simulated and live CAT showed a 95% average reduction (585 items) in items administered (24 and 30 items, respectively) compared with administration of the full MASS. The correlation between scores on the full MASS and the CAT version was .93. For the mood disorder subscale, differences in scores between two groups of depressed patients—one with bipolar disorder and one without—on the full scale and on the CAT showed effect sizes of .63 (p<.003) and 1.19 (p<.001) standard deviation units, respectively, indicating better discriminant validity for CAT.
Conclusions
Instead of using small fixed-length tests, clinicians can create item banks with a large item pool, and a small set of the items most relevant for a given individual can be administered with no loss of information, yielding a dramatic reduction in administration time and patient and clinician burden.
doi:10.1176/appi.ps.59.4.361
PMCID: PMC2916927  PMID: 18378832
10.  Development of a Computerized Adaptive Test for Depression 
Archives of general psychiatry  2012;69(11):1104-1112.
Context
Unlike other areas of medicine, psychiatry is almost entirely dependent on patient report to assess the presence and severity of disease; therefore, it is particularly crucial that we find both more accurate and efficient means of obtaining that report.
Objective
To develop a computerized adaptive test (CAT) for depression, called the Computerized Adaptive Test–Depression Inventory (CAT-DI), that decreases patient and clinician burden and increases measurement precision.
Design
Case-control study.
Setting
A psychiatric clinic and community mental health center.
Participants
A total of 1614 individuals with and without minor and major depression were recruited for study.
Main Outcome Measures
The focus of this study was the development of the CAT-DI. The 24-item Hamilton Rating Scale for Depression, Patient Health Questionnaire 9, and the Center for Epidemiologic Studies Depression Scale were used to study the convergent validity of the new measure, and the Structured Clinical Interview for DSM-IV was used to obtain diagnostic classifications of minor and major depressive disorder.
Results
A mean of 12 items per study participant was required to achieve a 0.3 SE in the depression severity estimate and maintain a correlation of r=0.95 with the total 389-item test score. Using empirically derived thresholds based on a mixture of normal distributions, we found a sensitivity of 0.92 and a specificity of 0.88 for the classification of major depressive disorder in a sample consisting of depressed patients and healthy controls. Correlations on the order of r=0.8 were found with the other clinician and self-rating scale scores. The CAT-DI provided excellent discrimination throughout the entire depressive severity continuum (minor and major depression), whereas the traditional scales did so primarily at the extremes (eg, major depression).
Conclusions
Traditional measurement fixes the number of items administered and allows measurement uncertainty to vary. In contrast, a CAT fixes measurement uncertainty and allows the number of items to vary. The result is a significant reduction in the number of items needed to measure depression and increased precision of measurement.
doi:10.1001/archgenpsychiatry.2012.14
PMCID: PMC3551289  PMID: 23117634
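The Conclusions of entry 10 contrast fixed-length tests with a CAT, which fixes the measurement uncertainty and lets the number of items vary. The sketch below illustrates that stopping rule only. It is not the CAT-DI algorithm: it assumes a unidimensional two-parameter logistic (2PL) model, a standard normal prior, grid-based posterior estimation, and a randomly generated item bank, all of which are illustrative choices.

```python
import math
import random

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (discrimination a, location b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior_mean_sd(responses, grid):
    """Posterior mean and SD of theta on a grid, standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for (a, b), endorsed in responses:
            p = p_endorse(t, a, b)
            w *= p if endorsed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, target_se=0.3):
    """Administer items until the posterior SD drops to target_se."""
    grid = [i / 10.0 for i in range(-40, 41)]
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # prior mean and SD before any item is given
    while remaining and se > target_se:
        # Pick the unused item with maximum Fisher information at theta.
        def info(item):
            a, b = item
            p = p_endorse(theta, a, b)
            return a * a * p * (1.0 - p)
        item = max(remaining, key=info)
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = posterior_mean_sd(responses, grid)
    return theta, se, len(responses)

# Hypothetical item bank and simulated respondent (true theta = 0.5).
rng = random.Random(0)
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-3.0, 3.0)) for _ in range(200)]
theta_hat, se, n_items = run_cat(
    bank, lambda item: rng.random() < p_endorse(0.5, *item))
print(theta_hat, se, n_items)
```

The number of items administered is an outcome of the run, not an input, which is the design difference the abstract describes.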
11.  Psychometric Properties of the PROMIS® Pediatric Scales: Precision, Stability, and Comparison of Different Scoring and Administration Options 
Objectives
The objectives of the present study are to investigate the precision of static (fixed-length) short forms versus computerized adaptive testing (CAT) administration, response pattern scoring versus summed score conversion, and test-retest reliability (stability) of the Patient Reported Outcomes Measurement Information System (PROMIS®) pediatric self-report scales measuring the latent constructs of depressive symptoms, anxiety, anger, pain interference, peer relationships, fatigue, mobility, upper extremity functioning and asthma impact with polytomous items.
Methods
Participants (N = 331) between the ages of 8 and 17 were recruited from outpatient general pediatrics and subspecialty clinics. Of the 331 participants, 137 were diagnosed with asthma. Three scores based on item response theory (IRT) were computed for each respondent: CAT response pattern expected a posteriori estimates, short form response pattern expected a posteriori estimates, and short form summed score expected a posteriori estimates. Scores were also compared between participants with and without asthma. To examine test-retest reliability, 54 children were selected for retesting approximately two weeks after the first assessment.
Results
A short CAT (maximum 12 items with a standard error of 0.4) was found, on average, to be less precise than the static short forms. The CAT appears to have limited usefulness over and above what can be accomplished with existing static short forms (8–10 items). Stability of the scale scores over a two week period was generally supported.
Conclusions
The study provides further information on the psychometric properties of the PROMIS pediatric scales and extends the previous IRT analyses to include precision estimates of dynamic versus static administration, test-retest reliability, and validity of administration across groups. Both the positive and negative aspects of using CAT vs. short forms are highlighted.
doi:10.1007/s11136-013-0544-0
PMCID: PMC4312615  PMID: 24085345
PROMIS; pediatrics; self-report; patient reported outcomes; item response theory; computerized adaptive testing
12.  Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms 
Purpose
Short-form patient-reported outcome measures are popular because they minimize patient burden. We assessed the efficiency of static short forms and computer adaptive testing (CAT) using data from the Patient-Reported Outcomes Measurement Information System (PROMIS) project.
Methods
We evaluated the 28-item PROMIS depressive symptoms bank. We used post hoc simulations based on the PROMIS calibration sample to compare several short-form selection strategies and the PROMIS CAT to the total item bank score.
Results
Compared with full-bank scores, all short forms and CAT produced highly correlated scores, but CAT outperformed each static short form in almost all criteria. However, short-form selection strategies performed only marginally worse than CAT. The performance gap observed in static forms was reduced by using a two-stage branching test format.
Conclusions
Using several polytomous items in a calibrated unidimensional bank to measure depressive symptoms yielded a CAT that provided marginally superior efficiency compared to static short forms. The efficiency of a two-stage semi-adaptive testing strategy was so close to CAT that it warrants further consideration and study.
doi:10.1007/s11136-009-9560-5
PMCID: PMC2832176  PMID: 19941077
Computer adaptive testing; PROMIS; Item response theory; Short form; Two-stage testing
13.  Development of a Computerized Adaptive Test to Assess Health-related Quality of Life in Adults with Asthma 
The Journal of Asthma  2011;49(2):190-200.
Objective
The purpose of this research was to calibrate an item bank for a computerized adaptive test (CAT) of asthma impact on health-related quality of life (HRQOL), test CAT versions of varying lengths, conduct preliminary validity testing, and evaluate item bank readability.
Methods
Asthma Impact Survey (AIS) bank items that passed focus group, cognitive testing, and clinical and psychometric reviews were administered to adults with varied levels of asthma control. Adults self-reporting asthma (N=1106) completed an Internet survey including 88 AIS items, the Asthma Control Test (ACT), and other HRQOL outcome measures. Data were analyzed using classical and modern psychometric methods, real-data CAT simulations, and known groups validity testing.
Results
A bi-factor model with a general factor (asthma impact) and several group factors (cognitive function, fatigue, mental health, physical function, role function, sexual function, self-consciousness/stigma, sleep, and social function) was tested. Loadings on the general factor were above 0.5 and were substantially larger than group factor loadings, and fit statistics were acceptable. Item functioning for most items and fit to the model was acceptable. CAT simulations demonstrated several options for administration and stopping rules. AIS distinguished between respondents with differing levels of asthma control.
Conclusions
The new 50-item AIS item bank demonstrated favorable psychometric characteristics, preliminary evidence of validity, and accessibility at moderate reading levels. Developing item banks for CAT can improve the precise, efficient, and comprehensive monitoring of asthma outcomes, and may facilitate patient-centered care.
doi:10.3109/02770903.2011.633674
PMCID: PMC3320653  PMID: 22115275
asthma control; Asthma Impact Survey; item response theory; patient-reported outcome; health-related quality of life
14.  Construct Validation of a Multidimensional Computerized Adaptive Test for Fatigue in Rheumatoid Arthritis 
PLoS ONE  2015;10(12):e0145008.
Objective
Multidimensional computerized adaptive testing enables precise measurements of patient-reported outcomes at an individual level across different dimensions. This study examined the construct validity of a multidimensional computerized adaptive test (CAT) for fatigue in rheumatoid arthritis (RA).
Methods
The ‘CAT Fatigue RA’ was constructed based on a previously calibrated item bank. It contains 196 items and three dimensions: ‘severity’, ‘impact’ and ‘variability’ of fatigue. The CAT was administered to 166 patients with RA. They also completed a traditional, multidimensional fatigue questionnaire (BRAF-MDQ) and the SF-36 in order to examine the CAT’s construct validity. A priori criterion for construct validity was that 75% of the correlations between the CAT dimensions and the subscales of the other questionnaires were as expected. Furthermore, comprehensive use of the item bank, measurement precision and score distribution were investigated.
Results
The a priori criterion for construct validity was supported for two of the three CAT dimensions (severity and impact but not for variability). For severity and impact, 87% of the correlations with the subscales of the well-established questionnaires were as expected but for variability, 53% of the hypothesised relations were found. Eighty-nine percent of the items were selected between one and 137 times for CAT administrations. Measurement precision was excellent for the severity and impact dimensions, with more than 90% of the CAT administrations reaching a standard error below 0.32. The variability dimension showed good measurement precision with 90% of the CAT administrations reaching a standard error below 0.44. No floor- or ceiling-effects were found for the three dimensions.
Conclusion
The CAT Fatigue RA showed good construct validity and excellent measurement precision on the dimensions severity and impact. The dimension variability had less ideal measurement characteristics, pointing to the need to recalibrate the CAT item bank with a two-dimensional model, solely consisting of severity and impact.
doi:10.1371/journal.pone.0145008
PMCID: PMC4692469  PMID: 26710104
15.  The PROMIS Physical Function Item Bank Was Calibrated to a Standardized Metric and Shown to Improve Measurement Efficiency 
Journal of clinical epidemiology  2014;67(5):516-526.
Objective
To document the development and psychometric evaluation of the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function (PF) item bank and static instruments.
Study Design and Setting
Items were evaluated using qualitative and quantitative methods. 16,065 adults answered item subsets (n>2,200/item) on the Internet, with over-sampling of the chronically ill. Classical test and item response theory (IRT) methods were used to evaluate 149 PROMIS PF items plus 10 SF-36 and 20 HAQ-DI items. A graded response model was used to estimate item parameters, which were normed to a mean of 50 (SD=10) in a US general population sample.
Results
The final bank consists of 124 PROMIS items covering upper, central, and lower extremity functions and IADL. In simulations, a 10-item Computerized Adaptive Test (CAT) eliminated floor and decreased ceiling effects, achieving higher measurement precision than any comparable-length static tool across four standard deviations of the measurement range. Improved psychometric properties transferred to the CAT’s superior ability to identify differences between age and disease groups.
Conclusion
The item bank provides a common metric and can improve the measurement of PF by facilitating the standardization of PRO measures and implementation of CATs for more efficient PF assessments over a larger range.
doi:10.1016/j.jclinepi.2013.10.024
PMCID: PMC4465404  PMID: 24698295
Item Response Theory; Computer Adaptive Test; physical function; health status; questionnaire
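Entry 15 states that item parameters were normed to a mean of 50 (SD=10) in a US general population sample. One common way to express a score on such a metric is a linear rescaling of the latent trait estimate, sketched below. The function name and the reference mean and SD arguments are assumptions for illustration; the abstract does not describe the exact transformation used.

```python
def to_t_score(theta, ref_mean=0.0, ref_sd=1.0):
    """Rescale a latent trait estimate so the reference population
    has mean 50 and SD 10 (hypothetical helper)."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd

# A respondent 1.5 reference-population SDs above the mean.
print(to_t_score(1.5))
```

On this metric a score of 40 lies one reference-population standard deviation below the mean.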
16.  Validation of the depression item bank from the Patient-Reported Outcomes Measurement Information System (PROMIS®) in a three-month observational study 
The Patient-Reported Outcomes Measurement Information System (PROMIS®) is an NIH Roadmap initiative devoted to developing better measurement tools for assessing constructs relevant to the clinical investigation and treatment of all diseases—constructs such as pain, fatigue, emotional distress, sleep, physical functioning, and social participation. Following creation of item banks for these constructs, our priority has been to validate them, most often in short-term observational studies. We report here on a three-month prospective observational study with depressed outpatients in the early stages of a new treatment episode (with assessments at intake, one-month follow-up, and three-month follow-up). The protocol was designed to compare the psychometric properties of the PROMIS depression item bank (administered as a computerized adaptive test, CAT) with two legacy self-report instruments: the Center for Epidemiological Studies Depression scale (CESD; Radloff, 1977) and the Patient Health Questionnaire (PHQ-9; Spitzer et al., 1999). PROMIS depression demonstrated strong convergent validity with the CESD and the PHQ-9 (with correlations in a range from .72 to .84 across all time points), as well as responsiveness to change when characterizing symptom severity in a clinical outpatient sample. Identification of patients as “recovered” varied across the measures, with the PHQ-9 being the most conservative. The use of calibrations based on models from item response theory (IRT) provides advantages for PROMIS depression both psychometrically (creating the possibility of adaptive testing, providing a broader effective range of measurement, and generating greater precision) and practically (these psychometric advantages can be achieved with fewer items—a median of 4 items administered by CAT—resulting in less patient burden).
doi:10.1016/j.jpsychires.2014.05.010
PMCID: PMC4096965  PMID: 24931848
depression; item response theory; measurement; self-report; patient-reported outcomes
17.  Validation of a computer-adaptive test to evaluate generic health-related quality of life 
Background
Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce test length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument.
Methods
Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires.
Results
396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (interquartile range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (interquartile range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions.
Conclusions
Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings.
doi:10.1186/1477-7525-8-147
PMCID: PMC3022567  PMID: 21129169
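The Item Exposure Rate reported in the abstract above can be illustrated with a minimal sketch. It assumes that IER means the share of respondents who were shown a given item, which the abstract does not spell out; the item ids and administration logs are hypothetical.

```python
from collections import Counter


def item_exposure_rates(administrations, pool):
    """Return {item: share of respondents who were shown that item}."""
    n_respondents = len(administrations)
    counts = Counter()
    for items_seen in administrations:
        counts.update(set(items_seen))  # count each item once per respondent
    return {item: counts[item] / n_respondents for item in pool}


# Hypothetical logs: one list of administered item ids per respondent.
pool = ["q1", "q2", "q3", "q4"]
logs = [["q1", "q2"], ["q1", "q3"], ["q1", "q2"], ["q2", "q3"]]

rates = item_exposure_rates(logs, pool)
# Share of pool items whose exposure rate exceeds 5%, the summary the abstract reports.
share_over_5pct = sum(r > 0.05 for r in rates.values()) / len(pool)
```

An item with an exposure rate of 100% would be shown to every respondent, so the absence of such items suggests the test adapts from the first question onward.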
18.  COMPUTER-ADAPTIVE BALANCE TESTING IMPROVES DISCRIMINATION BETWEEN COMMUNITY-DWELLING ELDERLY FALLERS AND NON-FALLERS 
Objective
To build an item response theory based computer-adaptive balance test (CAT) from three traditional, fixed-form balance measures: Berg Balance Scale (BBS), Performance-Oriented Mobility Assessment (POMA), and Dynamic Gait Index (DGI); and examine whether CAT psychometric performance exceeded that of individual measures.
Design
Secondary analysis combining two existing datasets.
Setting
Community-based.
Participants
187 community-dwelling older adults, 65 years or older, mean age 75.2±6.8 years, 69% female.
Interventions
Not applicable.
Main Outcome Measure(s)
BBS, POMA, and DGI items were compiled into an initial 38-item bank. Rasch Partial Credit Model was used for final item bank calibration. CAT simulations were conducted to identify the ideal CAT. CAT score accuracy, reliability, floor and ceiling effects, and validity were examined. Floor and ceiling effects and validity of CAT and individual measures were compared.
Results
A 23-item bank met model expectations. A 10-item CAT was selected, showing very strong association with full item bank scores (r=0.97), and good overall reliability (0.78). Reliability was better in low- to mid-balance ranges due to better item targeting to balance ability, compared with highest balance ranges. No floor effect was noted. CAT ceiling effect (11.2%) was significantly lower than POMA (40.1%) and DGI (40.3%) ceiling effects (p<0.0001 per comparison). The CAT outperformed individual measures, being the only test to discriminate between fallers and non-fallers (p=0.0068), and strongest predictor of self-reported function.
Conclusions
The balance CAT showed excellent accuracy, good overall reliability, and excellent validity compared with individual measures, being the only measure to discriminate between fallers and non-fallers. Prospective examination, particularly in low-functioning elderly and clinical populations with balance deficits, is recommended. Development of an improved CAT based on an expanded item bank containing higher difficulty items is also recommended.
doi:10.1016/j.apmr.2014.03.013
PMCID: PMC4090089  PMID: 24685388
computer-adaptive testing; postural balance; aged
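Floor and ceiling effects such as those compared in the abstract above are commonly summarised as the percentage of respondents at the lowest or highest attainable score. The sketch below assumes that definition, which the abstract does not state; the scores and the 0-56 range are hypothetical illustration data.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return (floor %, ceiling %) for a list of total scores."""
    n = len(scores)
    floor = 100.0 * sum(s == min_score for s in scores) / n
    ceiling = 100.0 * sum(s == max_score for s in scores) / n
    return floor, ceiling


# Hypothetical totals on a fixed-form scale scored from 0 to 56.
scores = [56, 50, 56, 41, 56, 33, 48, 56, 52, 45]
floor_pct, ceiling_pct = floor_ceiling(scores, 0, 56)
```

A large ceiling percentage means the instrument cannot separate respondents at the top of the range, which is the motivation the abstract gives for adding higher difficulty items.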
19.  A Web-Based Computerized Adaptive Testing (CAT) to Assess Patient Perception in Hospitalization 
Background
Many hospitals have adopted mobile nursing carts that can be easily rolled up to a patient’s bedside to access charts and help nurses perform their rounds. However, few papers have reported data regarding the use of wireless computers on wheels (COW) at patients’ bedsides to collect questionnaire-based information on patients’ perceptions of hospitalization at discharge from the hospital.
Objective
The purpose of this study was to evaluate the relative efficiency of computerized adaptive testing (CAT) and the precision of CAT-based measures of perceptions of hospitalized patients, as compared with those of nonadaptive testing (NAT). An Excel module of our CAT multicategory assessment is provided as an example.
Method
A total of 200 patients who were discharged from the hospital responded to the CAT-based 18-item inpatient perception questionnaire on COW. The number of questions administered was recorded and the responses were calibrated using the Rasch model. They were compared with those from NAT to show the advantage of CAT over NAT.
Results
Patient measures derived from CAT and NAT were highly correlated (r = 0.98) and their measurement precisions were not statistically different (P = .14). CAT required fewer questions than NAT (an efficiency gain of 42%), suggesting a reduced burden for patients. There were no significant differences between groups in terms of gender and other demographic characteristics.
Conclusions
CAT-based administration of surveys of patient perception substantially reduced patient burden without compromising the precision of measuring patients’ perceptions of hospitalization. The Excel module of animation-CAT on the wireless COW that we developed is recommended for use in hospitals.
doi:10.2196/jmir.1785
PMCID: PMC3222179  PMID: 21844001
Computerized adaptive testing; computer on wheels; classic test theory; IRT; item response theory; nonadaptive testing
20.  Computerized adaptive testing of population psychological distress: simulation-based evaluation of GHQ-30 
Purpose
Goldberg’s General Health Questionnaire (GHQ) items are frequently used to assess psychological distress but no study to date has investigated the GHQ-30’s potential for adaptive administration. In computerized adaptive testing (CAT) items are matched optimally to the targeted distress level of respondents instead of relying on fixed-length versions of instruments. We therefore calibrate GHQ-30 items and report a simulation study exploring the potential of this instrument for adaptive administration in a longitudinal setting.
Methods
GHQ-30 responses of 3445 participants with 2 completed assessments (baseline, 7-year follow-up) in the UK Health and Lifestyle Survey were calibrated using item response theory. Our simulation study evaluated the efficiency of CAT administration of the items, cross-sectionally and longitudinally, with different estimators, item selection methods, and measurement precision criteria.
Results
To yield accurate distress measurements (marginal reliability at least 0.90) nearly all GHQ-30 items need to be administered to most survey respondents in general population samples. When lower accuracy is permissible (marginal reliability of 0.80), adaptive administration saves approximately 2/3 of the items. For longitudinal applications, change scores based on the complete set of GHQ-30 items correlate highly with change scores from adaptive administrations.
Conclusions
The rationale for CAT-GHQ-30 is only supported when the required marginal reliability is lower than 0.9, which is most likely to be the case in cross-sectional and longitudinal studies assessing mean changes in populations. Precise measurement of psychological distress at the individual level can be achieved, but requires the deployment of all 30 items.
doi:10.1007/s00127-015-1157-4
PMCID: PMC4889635  PMID: 26687370
Computerized adaptive testing; Item response theory; Bifactor model; Measurement invariance; General Health Questionnaire
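The reliability thresholds in the abstract above can be related to a standard error of measurement. The sketch below assumes one common convention, reliability = 1 - SE^2 / trait variance with the latent trait scaled to unit variance; this convention is an assumption of the example and may differ from the definition of marginal reliability used in the study.

```python
import math


def se_for_reliability(reliability, trait_variance=1.0):
    """Standard error implied by a target reliability (assumed convention)."""
    return math.sqrt(trait_variance * (1.0 - reliability))


def reliability_for_se(se, trait_variance=1.0):
    """Reliability implied by a standard error (assumed convention)."""
    return 1.0 - se ** 2 / trait_variance


se_at_090 = se_for_reliability(0.90)  # roughly 0.32 on a unit-variance scale
se_at_080 = se_for_reliability(0.80)  # roughly 0.45 on a unit-variance scale
```

Under this convention, moving from a reliability of 0.80 to 0.90 halves the permitted error variance (0.20 to 0.10), which is consistent with many more items being needed at the stricter threshold.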
21.  Development of the CAT-ANX: A Computerized Adaptive Test for Anxiety 
The American journal of psychiatry  2014;171(2):187-194.
Objective
The authors developed a computerized adaptive test for anxiety that decreases patient and clinician burden and increases measurement precision.
Method
A total of 1,614 individuals with and without generalized anxiety disorder from a psychiatric clinic and community mental health center were recruited. The focus of the present study was the development of the Computerized Adaptive Testing–Anxiety Inventory (CAT-ANX). The Structured Clinical Interview for DSM-IV was used to obtain diagnostic classifications of generalized anxiety disorder and major depressive disorder.
Results
An average of 12 items per subject was required to achieve a 0.3 standard error in the anxiety severity estimate and maintain a correlation of 0.94 with the total 431-item test score. CAT-ANX scores were strongly related to the probability of a generalized anxiety disorder diagnosis. Using both the Computerized Adaptive Testing–Depression Inventory and the CAT-ANX, comorbid major depressive disorder and generalized anxiety disorder can be accurately predicted.
Conclusions
Traditional measurement fixes the number of items but allows measurement uncertainty to vary. Computerized adaptive testing fixes measurement uncertainty and allows the number and content of items to vary, leading to a dramatic decrease in the number of items required for a fixed level of measurement uncertainty. Potential applications for inexpensive, efficient, and accurate screening of anxiety in primary care settings, clinical trials, psychiatric epidemiology, molecular genetics, children, and other cultures are discussed.
doi:10.1176/appi.ajp.2013.13020178
PMCID: PMC4052830  PMID: 23929270
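The idea in the abstract above, fixing measurement uncertainty and letting the number of items vary, can be sketched as a small simulation. This is not the CAT-ANX algorithm: it swaps in a dichotomous Rasch model, a standard normal prior, and a hypothetical bank of 431 evenly spaced item difficulties, all of which are assumptions of the example. Only the 0.3 standard error target and the bank size come from the abstract.

```python
import math
import random


def p_endorse(theta, b):
    """Rasch (1PL) probability of endorsing an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap(responses, grid):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised N(0, 1) prior
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(true_theta, bank, se_target=0.3, seed=0):
    """Administer items until the posterior SD drops to se_target."""
    rng = random.Random(seed)
    grid = [i / 10.0 for i in range(-40, 41)]
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # prior mean and SD before any item is given
    while remaining and se > se_target:
        # Under the Rasch model the most informative item is the one
        # whose difficulty is closest to the current estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        x = rng.random() < p_endorse(true_theta, b)  # simulated response
        responses.append((b, x))
        theta, se = eap(responses, grid)
    return theta, se, len(responses)


# Hypothetical bank: 431 difficulties evenly spaced from -3 to 3.
bank = [-3.0 + 6.0 * i / 430 for i in range(431)]
theta_hat, se_hat, n_items = run_cat(true_theta=1.0, bank=bank)
```

The stopping rule is the `se > se_target` condition: test length is an outcome rather than an input. The number of items this sketch needs depends on the assumed model and bank, so it should not be expected to reproduce the 12-item average reported for the CAT-ANX.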
22.  Comparing CESD-10, PHQ-9, and PROMIS Depression Instruments in Individuals with Multiple Sclerosis 
Rehabilitation psychology  2014;59(2):220-229.
Purpose
This study evaluated psychometric properties of the Patient Health Questionnaire-9 (PHQ-9), the Center for Epidemiological Studies Depression Scale-10 (CESD-10), and the eight-item PROMIS Depression Short Form (PROMIS-D-8; 8b short form) in a sample of individuals living with multiple sclerosis (MS).
Research Method
Data were collected by a self-reported mailed survey of a community sample of people living with MS (n=455). Factor structure, inter-item reliability, convergent/discriminant validity and assignment to categories of depression severity were examined.
Results
A one factor, confirmatory factor analytic model had adequate fit for all instruments. Scores on the depression scales were more highly correlated with one another than with scores on measures of pain, sleep disturbance, and fatigue. The CESD-10 categorized about 37% of participants as having significant depressive symptoms. At least moderate depression was indicated for 24% of participants by PHQ-9. PROMIS-D-8 identified 19% of participants as having at least moderate depressive symptoms and about 7% having at least moderately-severe depression. None of the examined scales had ceiling effects, but the PROMIS-D-8 had a floor effect.
Conclusions
Overall, scores on all three scales demonstrated essential unidimensionality and had acceptable inter-item reliability and convergent/discriminant validity. Researchers and clinicians can choose any of these scales to measure depressive symptoms in individuals living with MS. The PHQ-9 offers validated cutoff scores for diagnosing clinical depression. The PROMIS-D-8 measure minimizes the impact of somatic features on the assessment of depression and allows for flexible administration, including Computerized Adaptive Testing (CAT). The CESD-10 measures two aspects of depression, depressed mood and lack of positive affect, while still providing an interpretable total score.
doi:10.1037/a0035919
PMCID: PMC4059037  PMID: 24661030
depression; multiple sclerosis; CESD-10; PHQ-9; PROMIS
23.  The Work Disability Functional Assessment Battery (WD-FAB): Feasibility and Psychometric Properties 
Objectives
To assess the feasibility and psychometric properties of eight scales covering two domains of the newly developed Work Disability Functional Assessment Battery (WD-FAB): physical function (PF) and behavioral health (BH) function.
Design
Cross-sectional.
Setting
Community.
Participants
Adults unable to work due to a physical (n=497) or mental (n=476) disability.
Interventions
None.
Main Outcome Measures
Each disability group responded to a survey consisting of the relevant WD-FAB scales and existing measures of established validity. The WD-FAB scales were evaluated with regard to data quality (score distribution; percent “I don’t know” responses), efficiency of administration (number of items required to achieve reliability criterion; time required to complete the scale) by computerized adaptive testing (CAT), and measurement accuracy as tested by person fit. Construct validity was assessed by examining both convergent and discriminant correlations between the WD-FAB scales and scores on same-domain and cross-domain established measures.
Results
Data quality was good and CAT efficiency was high across both WD-FAB domains. Measurement accuracy was very good for the PF scales; BH scales demonstrated more variability. Construct validity correlations, both convergent and divergent, between all WD-FAB scales and established measures were in the expected direction and range of magnitude.
Conclusions
The data quality, CAT efficiency, person fit and construct validity of the WD-FAB scales were well supported and suggest that the WD-FAB could be used to assess physical and behavioral health function related to work disability. Variation in scale performance suggests the need for future work on item replenishment and refinement, particularly regarding the Self-Efficacy scale.
doi:10.1016/j.apmr.2014.11.025
PMCID: PMC4762370  PMID: 25528263
Validation Studies; Disability Evaluation; US Social Security Administration; Outcomes Assessment; Psychometrics
24.  Electronic Quality of Life Assessment Using Computer-Adaptive Testing 
Background
Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult.
Objective
Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL.
Methods
We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met.
Results
The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items.
Conclusions
Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items.
doi:10.2196/jmir.6053
PMCID: PMC5065679  PMID: 27694100
25.  Development and psychometric properties of the client’s assessment of treatment scale for supported accommodation (CAT-SA) 
BMC Psychiatry  2016;16:43.
Background
Patient-Reported Outcome Measures (PROMs) are important for evaluating mental health services. Yet, no specific PROM exists for the large and diverse mental health supported accommodation sector. We aimed to produce and validate a PROM specifically for supported accommodation services, by adapting the Client’s Assessment of Treatment Scale (CAT) and assessing its psychometric properties in a large sample.
Methods
Focus groups with service users in the three main types of mental health supported accommodation services in the United Kingdom (residential care, supported housing and floating outreach) were conducted to adapt the contents of the original CAT items and assess the acceptability of the modified scale (CAT-SA). The CAT-SA was then administered in a survey to service users across England. Internal consistency was assessed using Cronbach’s alpha. Convergent validity was tested through correlations with subjective quality of life and satisfaction with accommodation, as measured by the Manchester Short Assessment of Quality of Life (MANSA).
Results
All seven original items of the CAT were regarded as relevant to appraisals of mental health supported accommodation services, with only slight modifications to the wording required. In the survey, data were obtained from 618 clients. The internal consistency of the CAT-SA items was 0.89. Mean CAT-SA scores were correlated with the specific accommodation item on the MANSA (rs = 0.37, p < .001).
Conclusions
The content of the CAT-SA has relevance to service users living in mental health supported accommodation. The findings from our large survey show that the CAT-SA is acceptable across different types of supported accommodation and suggest good psychometric properties. The CAT-SA appears to be a valid and easy-to-use PROM for service users in mental health supported accommodation services.
doi:10.1186/s12888-016-0755-3
PMCID: PMC4766675  PMID: 26911904
Patient Reported Outcome; Supported Accommodation; Treatment Satisfaction; Mental Health
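The internal consistency statistic used in the abstract above, Cronbach's alpha, can be computed from a respondents-by-items score table. The sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the three-item response data are hypothetical and are not taken from the CAT-SA survey.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (equal lengths)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items.
rows = [[3, 4, 3], [2, 2, 3], [4, 5, 5], [1, 2, 1], [3, 3, 4]]
alpha = cronbach_alpha(rows)
```

Alpha rises when items vary together, because the variance of the total score then grows faster than the sum of the individual item variances.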
