On March 14, 2002, METRIC researchers convened 17 experts in Houston to debate some difficult measurement topics. The expert panel consisted of seven national measurement experts (Drs. Lee Sechrest, Jack Clark, Robert DeVellis, Jay Magaziner, Colleen McHorney, Evelyn Perloff, and Stephen Zyzanski). In addition, the panel included 10 METRIC measurement experts (Drs. Carol Ashton, Karon Cook, Marvella Ford, P. Adam Kelly, Robert Morgan, Kimberly O'Malley, A. Lynn Snow, Paul Swank, Nelda Wray, and Mary York).
The measurement topics covered at that meeting were ones we had found to present significant problems for clinicians and researchers in their efforts to improve public health. The ensuing discussions by the expert panel led to the development of the articles included in this supplement. We believe that measurement is the foundation on which health decisions are made, both inside and outside the VA health care system, yet clinicians and researchers frequently receive no training that equips them to address difficult measurement issues. The articles in this supplement are relevant not only to measurement issues involving the VA system and the veteran population but also to health services research in general. These issues fall into three major categories:
- (1) Measurement validity within a specific context or setting;
- (2) Identifying populations of interest and appreciating measurement challenges associated with these populations; and
- (3) Consideration of alternative methodologies to measure the same construct.
Measurement Validity within a Specific Context or Setting
The first three articles discuss the importance of, and barriers to, ensuring the validity of findings. The goal of the first manuscript, “Validity of Measures Is No Simple Matter,” by Lee Sechrest, is to promote a better understanding of the nature of measurement, the special problems posed by measurement in the social sciences, and the inevitable limitations on inferences in science (so that results of any sort are not overinterpreted). Sechrest proposes that judgments about the degree of measurement validity can be guided by careful consideration of the measurement process, the measurement context, and the purpose of the measurement. The manuscript highlights the difficulty of measurement, using blood pressure as an example, and offers suggestions for promoting good measurement practice.
The second manuscript, “Integrating Validity Theory with Use of Measurement Instruments in Clinical Settings,” by Kelly, O'Malley, Kallen, and Ford, challenges the rationales investigators often give for their decisions to use specific measurement instruments in clinical settings. The authors offer a step-by-step, three-level decision rubric that focuses on key considerations in assessing validity. At each level, specific questions are posed and solutions grounded in validity theory are suggested. This manuscript will help investigators in clinical settings organize and focus their evaluation of, and justification for, specific measurement instruments.
ICD codes are now used in many applications, including research, reimbursement, identification of medical errors, and denoting causes of death on death certificates. Because ICD-9-CM and ICD-10-CM codes are widely applied to “measure” diagnoses, their accuracy is of considerable interest. Understanding sources of error in the application of these codes is critical to the evaluation of their usefulness and limitations. In the third article, “Measuring Diagnoses: ICD Code Accuracy,” O'Malley, Cook, Price, Raiford Wildes, Hurdle, and Ashton conceptualize diagnostic coding as a measurement problem. The authors summarize the process for assigning diagnostic codes, identify sources of errors in the process, summarize research related to code accuracy, and review methods for quantifying the accuracy of diagnostic codes. By understanding potential sources of errors in the process of assigning diagnostic codes, code users (e.g., researchers, clinicians, payers) can make better decisions about the usefulness of the codes in various applications.
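The accuracy metrics commonly used in such code-validation studies can be illustrated with a minimal sketch. This is not taken from the article itself: the data are invented, and comparing assigned codes against a chart-review "gold standard" via sensitivity, specificity, and positive predictive value is simply one standard approach of the kind the authors review.

```python
# Hedged sketch: quantifying diagnostic-code accuracy against a chart-review
# reference standard. All records and values below are hypothetical.

def accuracy_stats(coded, gold):
    """coded, gold: parallel lists of booleans -- whether each record carries
    the diagnosis according to the ICD code vs. according to chart review."""
    tp = sum(c and g for c, g in zip(coded, gold))          # true positives
    fp = sum(c and not g for c, g in zip(coded, gold))      # false positives
    fn = sum(g and not c for c, g in zip(coded, gold))      # false negatives
    tn = sum(not c and not g for c, g in zip(coded, gold))  # true negatives
    return {
        "sensitivity": tp / (tp + fn),  # true cases that received the code
        "specificity": tn / (tn + fp),  # non-cases correctly left uncoded
        "ppv": tp / (tp + fp),          # coded records that are true cases
    }

# Toy example: 10 records, codes vs. chart review
coded = [True, True, True, False, False, True, False, False, False, False]
gold  = [True, True, False, True, False, True, False, False, False, False]
stats = accuracy_stats(coded, gold)
```

Note that sensitivity and positive predictive value answer different questions (how many true cases were coded vs. how many coded records are true cases); which matters more depends on the application, as the article discusses.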
Identifying Populations of Interest and Appreciating Measurement Challenges Associated with these Populations
The next two articles address issues involved in identifying populations of interest. The fourth manuscript, “Measurement Issues in Health Disparities Research,” by Ramírez, Ford, Stewart, and Teresi, provides an overview of issues in using measurement instruments with diverse populations. Members of minority groups have higher rates of morbidity and mortality than their nonminority counterparts for almost all categories of disease. Racial and ethnic health disparities may stem from comorbidity, access to care, attitudes and perceptions, or disease etiology. However, in order to conduct comparative studies in a meaningful manner, investigators must face the challenge of addressing cross-cultural measurement equivalence, including issues of validity and reliability.
In turn, valid and reliable measures of race and ethnicity are needed to ensure accurate assessment of disease prevalence and incidence, and of appropriate health services utilization, in different population groups. Thus, the purpose of the fifth manuscript, “Conceptualizing and Categorizing Race and Ethnicity in Health Services Research,” by Ford and Kelly, is threefold. First, this manuscript provides an overview of different methods currently employed to assess the constructs of race and ethnicity. Second, the authors illustrate consistent standards for measuring these constructs. Third, the authors conclude with suggestions for measurement methods that would improve future research.
Consideration of Alternative Methodologies to Measure the Same Construct
The final set of articles examines the properties and use of two important measurement methodologies. In “Proxies and Other External Raters: Methodological Considerations,” Snow, Cook, Lin, Morgan, and Magaziner review past research on proxy reports and examine ways to increase the reliability and validity of these types of reports. Many of the constructs health services researchers attempt to measure have no single objective gold standard (e.g., quality of life, pain). Thus, the accrual of validity evidence for a health services research assessment instrument is often a daunting task. The difficulty of this task is increased further when it is not possible to obtain a self-report from the patient, or when the self-report is suspect. In such cases, the use of reports from external raters (e.g., family members, clinicians) is a necessary strategy for establishing evidence of validity. The authors differentiate between external rater reports gathered for the purpose of substituting for self-report and those gathered to supplement the self-report. The authors explain why appropriate use of externally rated data requires careful consideration of the nature of the data and how they will be analyzed and interpreted.
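One routine piece of evidence when evaluating proxy reports is chance-corrected agreement between patient and proxy responses. As a hypothetical sketch (the data are invented, and Cohen's kappa is just one common agreement statistic, not necessarily the one the authors emphasize):

```python
# Hedged sketch: Cohen's kappa between patient self-reports and proxy
# reports on a binary item. Responses below are hypothetical.

def cohens_kappa(a, b):
    """Chance-corrected agreement for two binary (0/1) rating vectors."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa1 = sum(a) / n                             # rater A's "yes" rate
    pb1 = sum(b) / n                             # rater B's "yes" rate
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)       # agreement expected by chance
    return (po - pe) / (1 - pe)

patient = [1, 1, 0, 1, 0, 0, 1, 0]
proxy   = [1, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(patient, proxy)
```

High raw agreement can still yield modest kappa when chance agreement is high, which is one reason chance correction matters when deciding whether a proxy can substitute for, rather than merely supplement, self-report.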
In the seventh and final article, “Dynamic Assessment of Health Outcomes: Time to Let the CAT Out of the Bag?” Cook, O'Malley, and Roddey discuss computerized adaptive testing (CAT) in the assessment of patient-reported outcomes. CAT-based measures are “dynamic” in that the set of items presented to a patient is individualized based on continually updated estimations of the patient's level of the trait being measured. This tailored approach to assessment returns increased measurement precision without increased response burden. The article describes what CAT is (and is not) and discusses its promise and its problems.
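The select-administer-update loop that makes CAT "dynamic" can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the one-parameter (Rasch) IRT model, the coarse grid-search trait estimator, the nearest-difficulty item-selection rule, and the item bank are all simplifying assumptions made here for clarity.

```python
# Hedged sketch of a CAT loop under a Rasch (1-parameter logistic) IRT model:
# after each response, the trait estimate is updated and the next item chosen
# is the unasked one most informative at (closest in difficulty to) that
# estimate. Item bank and simulated responses are invented for illustration.
import math

def p_endorse(theta, b):
    """Rasch probability of endorsing an item of difficulty b at trait theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(answered, grid=None):
    """Maximum-likelihood trait estimate over a coarse grid [-4, 4]."""
    grid = grid or [g / 10.0 for g in range(-40, 41)]
    def loglik(t):
        return sum(math.log(p_endorse(t, b)) if y
                   else math.log(1.0 - p_endorse(t, b))
                   for b, y in answered)
    return max(grid, key=loglik)

def run_cat(bank, respond, n_items=3):
    """bank: {item_id: difficulty}; respond(item_id) -> 0/1 response."""
    theta, answered, remaining = 0.0, [], dict(bank)
    for _ in range(n_items):
        # pick the unasked item whose difficulty is nearest the current estimate
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        b = remaining.pop(item)
        answered.append((b, respond(item)))
        theta = estimate_theta(answered)   # update after every response
    return theta

bank = {"q1": -1.0, "q2": 0.0, "q3": 1.0, "q4": 2.0}
responses = {"q1": 1, "q2": 1, "q3": 0, "q4": 0}  # simulated respondent
theta_hat = run_cat(bank, lambda item: responses[item])
```

The point of the sketch is the structure, not the numbers: each response sharpens the trait estimate, so later items are targeted where they carry the most information, which is how CAT gains precision without lengthening the test.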