Pain is a difficult outcome to measure due to its multifaceted and subjective nature. The need for selecting proper outcome measures is high because of the increasing demand for scientifically valid demonstrations of treatment efficacy. This article discusses some basic topics in the measurement of pain outcomes and addresses issues such as statistical versus clinical significance, daily home data collection, appropriate length of outcome measurement packets, and the possibility of objective pain measurements. This article also reviews some of the more commonly used tools for measuring pain and pain-related disability. By selecting the proper tools and employing them correctly, we can obtain highly reliable and valid measures of pain outcomes in research and clinical care.
Pain is a complex and subjective experience that poses a number of measurement challenges. However, in the current culture of evidence-based medicine, it is important that clinicians and researchers use sensitive and accurate pain outcome measures. There is currently no valid and reliable method of objectively quantifying an individual’s experience of pain. Therefore, we rely mainly on self-report measures to determine the impact of pain. Despite the challenges that pain measurement presents, a number of tools and approaches can be employed to collect useful pain estimates. This article discusses some important considerations in selecting and using pain outcome measures for clinical and research purposes. We also review some of the commonly used pain measurement tools, provide a few additional recommendations for preparing a pain measurement procedure, and discuss the future of objective pain measurement. For a more comprehensive look at pain measurement, we refer readers to the excellent 2001 text by Turk and Melzack, Handbook of Pain Assessment.
Outcome measures provide a metric by which to gauge changes in the experience of pain; however, human interpretation is always needed to determine whether a meaningful change has occurred. Generally, a treatment should demonstrate both a statistically significant effect and a clinically significant effect. These two types of significance are complementary and answer different questions, and one cannot be inferred from the other. In a basic sense, statistical significance answers the question, “Is this a real effect?” and clinical significance answers the question, “Is this an important effect?”
Statistical and clinical significance are often confused when a P value is misinterpreted as an indicator of a treatment’s strength. The P value is an extremely common parameter that indicates how likely it would be to observe a treatment effect at least as large as the one seen if the treatment truly had no effect (ie, purely by chance); it does not indicate the treatment’s efficacy. If a statistical test showed drug A to be superior to drug B at reducing pain with P = 0.05, a difference at least this large would be expected in only 5% of studies if the two drugs were truly equivalent. At P = 0.0001, such a difference would be expected in only 0.01% of studies under equivalence, and we can be much more confident that the drug is having a true impact on the outcome. However, smaller P values say nothing about the actual strength of the effect. Many factors have a drastic impact on P values. For example, as sample size goes up, P values go down, even if the strength of the effect never changes. This phenomenon can lead very large studies (1000–2000 patients) to show statistically significant results even when the changes in outcomes are very small and meaningless to patients and clinicians.
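The dependence of P on sample size can be illustrated with a short sketch. It assumes a two-group comparison of mean pain scores analyzed with a two-sided z test (a normal approximation to the t test) and a fixed standardized effect size d = 0.1, for example a 0.2-point difference on a 0–10 scale with a standard deviation of 2. The function name and numbers are illustrative assumptions, not taken from any cited study.

```python
import math

def two_sided_p(effect_size_d: float, n_per_group: int) -> float:
    """Two-sided P value for a two-group z test (normal approximation).

    With equal group sizes n and a standardized mean difference d,
    the test statistic is z = d * sqrt(n / 2).
    """
    z = effect_size_d * math.sqrt(n_per_group / 2)
    # Two-sided tail probability of the standard normal: P(|Z| >= z)
    return math.erfc(abs(z) / math.sqrt(2))

# Same small effect, increasing sample size
for n in (50, 200, 1000, 2000):
    print(n, round(two_sided_p(0.1, n), 4))
```

Under these assumptions the P value is well above 0.05 with 50 patients per group but below 0.05 with 1000 per group, although the underlying effect (0.2 points on a 0–10 scale) never changes and is far smaller than commonly proposed thresholds for clinical importance.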
To circumvent this problem of interpretability, readers should rely more on the effect size. For clinical trials, effect size is generally expressed in clinically relevant terms, such as the reduction of pain scores that occurs when switching from placebo to the active drug. To determine clinical significance, the clinician or researcher must first choose a metric (eg, percent reduction of pain), and then choose a cutoff that indicates a clinically meaningful change (eg, 30% reduction of pain).
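As a minimal sketch of expressing effect size in clinically relevant terms, the function below takes hypothetical per-patient pain reductions (baseline minus follow-up, in NRS points) for a placebo group and an active-drug group. It reports both the raw mean difference in points and a standardized mean difference (Cohen's d with a pooled standard deviation). The data and function name are illustrative assumptions.

```python
from statistics import mean, variance

def effect_sizes(placebo_reductions, active_reductions):
    """Return (mean difference in points, Cohen's d) for two groups.

    Inputs are per-patient pain reductions (baseline minus follow-up).
    Cohen's d divides the mean difference by the pooled sample SD.
    """
    n1, n2 = len(placebo_reductions), len(active_reductions)
    diff = mean(active_reductions) - mean(placebo_reductions)
    pooled_var = ((n1 - 1) * variance(placebo_reductions)
                  + (n2 - 1) * variance(active_reductions)) / (n1 + n2 - 2)
    return diff, diff / pooled_var ** 0.5

placebo = [1, 2, 1, 2]  # hypothetical NRS-point reductions
active = [3, 4, 3, 4]
print(effect_sizes(placebo, active))
```

The raw difference in points is what a clinician would compare against a clinical-significance cutoff; the standardized value is mainly useful for comparing effects across instruments with different scales.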
When deciding on a cutoff value for clinical significance, we must determine the minimal amount of change in pain that would be valuable and important to patients. Many different approaches to determining this minimal clinically important difference have been proposed. A well-researched cutoff method suggests that a 30% reduction of pain can be considered clinically significant. This level corresponds with a “much improved” or “very much improved” response from patients on a global impression of change, or a 2-point reduction on an 11-point (0–10) pain intensity numerical rating scale (these scales are discussed in more depth later in the article). Farrar et al. offer several other conversions of this standard, and Kvien et al. [5•] offer other methods for determining clinical significance. For example, the PASS (Patient Acceptable Symptom State) method has been used to determine that, for many chronic pain conditions, a reduction in daily pain of 35 mm on a 100-mm visual analogue scale indicates a satisfactory result for the patient. A recent consensus statement by the IMMPACT (Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials) group provides additional recommendations on detecting clinically important changes [6••].
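The 30% and 2-point cutoffs can be applied to individual patients as in the following sketch. It assumes 0–10 NRS scores, evaluates each criterion separately, and uses hypothetical function names; the thresholds are parameters so that other cutoffs (such as one derived with the PASS method) could be substituted.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percent reduction in pain from baseline (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("percent reduction is undefined for a baseline of 0")
    return 100.0 * (baseline - follow_up) / baseline

def is_responder(baseline, follow_up, pct_cutoff=30.0, point_cutoff=2.0):
    """Classify one patient against both cutoffs on a 0-10 NRS.

    Returns a dict so the two criteria can be reported separately.
    """
    return {
        "percent_criterion": percent_reduction(baseline, follow_up) >= pct_cutoff,
        "point_criterion": (baseline - follow_up) >= point_cutoff,
    }

print(is_responder(8, 5))  # 37.5% and a 3-point reduction
print(is_responder(3, 2))  # 33% reduction but only 1 point
```

The second example shows that the two criteria can disagree when baseline pain is low, which is one reason to specify in advance which criterion defines a responder.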
If a treatment is tested on a small number of patients, obtaining statistical significance may be impossible, even if a large clinical effect is observed. One method for dealing with very small sample sizes is a crossover design, in which each patient receives all treatment conditions. Using a crossover design instead of a parallel design (in which each patient receives only one treatment condition) can reduce the required sample size by up to 90%. The required sample size may be reduced further by collecting the outcome measure multiple times at baseline and within each treatment condition.
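The sample-size advantage of crossover designs can be sketched with standard normal-approximation formulas. The sketch assumes a continuous outcome with between-patient SD σ, a target difference Δ, two-sided α = 0.05, and 80% power. It also assumes a two-period crossover analyzed on within-patient differences whose variance is 2σ²(1 − ρ), where ρ is the within-patient correlation between conditions; period and carryover effects are ignored. Under these assumptions a crossover needs about (1 − ρ)/2 of the parallel trial's total sample, so ρ = 0.8 gives roughly a 90% reduction. Function names and input values are illustrative.

```python
import math
from statistics import NormalDist

def _z_sum(alpha: float, power: float) -> float:
    """z_(1 - alpha/2) + z_(power) for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)

def parallel_total_n(delta, sigma, alpha=0.05, power=0.80):
    """Total patients for a two-arm parallel trial with equal arms."""
    per_group = 2 * sigma ** 2 * _z_sum(alpha, power) ** 2 / delta ** 2
    return 2 * math.ceil(per_group)

def crossover_total_n(delta, sigma, rho, alpha=0.05, power=0.80):
    """Total patients for an AB/BA crossover, using the variance of
    within-patient differences, 2 * sigma^2 * (1 - rho)."""
    var_diff = 2 * sigma ** 2 * (1 - rho)
    return math.ceil(var_diff * _z_sum(alpha, power) ** 2 / delta ** 2)

# Detect a 1-point difference on a 0-10 NRS with SD 2 (hypothetical values)
print(parallel_total_n(1.0, 2.0))            # parallel design
print(crossover_total_n(1.0, 2.0, rho=0.8))  # crossover design
```

Averaging several assessments per condition lowers the within-patient variance further, which this simplified sketch does not model.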
When daily measurements are used in patients with chronic pain, these assessments usually have to be self-reported in a natural environment. Collecting these real-world measures poses a number of challenges, such as the lack of control over whether measures are completed and poor control over the assessment environment and circumstances. Data collection methods such as paper diaries and daily phone calls are widely used, but each has significant drawbacks. Paper diaries are prone to backfilling (entries completed later from memory, leading to inaccurate retrospective guesses), and phone calls are labor-intensive for study staff and may be disruptive to the patient. A newer approach to collecting home-based outcome measures involves the use of handheld computers. New units can be bought for under $100, and free software, such as the Experiential Sampling Program, is available for administering the measures. However, these units may be hard for elderly or impaired patients to operate. Another drawback of handheld computers is the possibility that the units will be lost, stolen, or damaged. To avoid data loss, it is recommended that data be frequently downloaded to the study site’s computers, either in person or via Internet-based transfer services. Wright further discusses important data considerations in biomedical research.
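One practical advantage of electronic diaries is that each entry can carry a device timestamp. Entries made well after the day they describe can then be flagged rather than silently accepted, as backfilled paper-diary entries would be. The sketch below assumes a hypothetical record layout (the day the rating refers to and the time it was saved) and a hypothetical grace period. It does not reflect the behavior of any specific diary software.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta

@dataclass
class DiaryEntry:
    rating_day: date      # day the pain rating describes
    entered_at: datetime  # device timestamp when the rating was saved
    pain_nrs: int         # 0-10 rating

def flag_late_entries(entries, grace_hours=6):
    """Return entries saved more than `grace_hours` after the end of the
    day they describe (possible backfilling)."""
    late = []
    for e in entries:
        end_of_day = datetime.combine(e.rating_day, datetime.min.time()) + timedelta(days=1)
        if e.entered_at > end_of_day + timedelta(hours=grace_hours):
            late.append(e)
    return late

entries = [
    DiaryEntry(date(2024, 3, 1), datetime(2024, 3, 1, 21, 0), 6),
    DiaryEntry(date(2024, 3, 2), datetime(2024, 3, 5, 9, 30), 4),  # saved 3 days late
]
print(flag_late_entries(entries))
```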
Because resources and time are always limited, we are forced to make decisions on which outcomes to include in our measurements. In some cases, a simple measure of pain intensity may be the most logical primary outcome variable. In other cases, a general indicator of work or social functioning may be more clinically relevant. Pain clinicians will recognize cases in which an individual is profoundly disabled by seemingly low pain intensity, and cases in which an individual maintains a productive and fulfilling lifestyle despite reporting a high degree of pain. Some interventions may have little impact on pain intensity scores but may benefit mood, motivation, and functioning. Therefore, one of the most important decisions to make in testing a new treatment is determining which outcomes are most clinically relevant. We now review a few of the available pain outcome measures, which range from simple and narrowly defined to large and multidimensional. Each has its proper place in measuring pain outcomes. We also refer readers to the IMMPACT recommendation on a core set of outcome measures.
In the busy clinical setting, pain measures must be simple, quick to administer, and easily understood by the patient. Unidimensional scales fill this role by providing fast (often one-item) measures of pain that can be administered repeatedly with minimal administrative effort. One commonly used unidimensional tool is the numerical rating scale (NRS). Although variations exist, the instrument typically consists of scores from 0 to 10 (or 0–100), with the lowest score anchored as “no pain” and the highest as “worst pain imaginable.” The NRS has the advantage that it can be administered verbally, so the patient does not need to be able to write or mark a form.
As an alternative to the NRS, a visual analogue scale (VAS) may be used. The patient makes a mark anywhere along a 10-cm line to indicate current pain intensity, and the distance from the “no pain” end is measured in millimeters to yield a 101-point (0–100) scale. Slide-rule-like devices have been developed to assist in the scoring process. The VAS provides a high degree of resolution and is probably the most sensitive single-item measure for clinical pain research. Another alternative is the verbal rating scale (VRS), which is sometimes used for individuals who have trouble translating their pain experience into a number. The numbers are replaced by verbal descriptors, such as no pain, mild pain, moderate pain, and severe pain. This type of measure has several statistical drawbacks (its few categories are ordinal and not necessarily equally spaced, which limits sensitivity and the range of appropriate analyses) and is usually used only when patient characteristics require it.
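Scoring a paper VAS is simple arithmetic: measure the distance from the “no pain” anchor to the patient's mark. The sketch below assumes the mark has already been measured in millimeters and rounds to the nearest whole point to give the 0–100 score. As an assumption about handling printing variation, it rescales when a printed line is not exactly 100 mm long. It also shows a VRS coded as ordered integers, which preserves order but not equal spacing. Function and variable names are illustrative.

```python
def vas_score(mark_mm: float, line_length_mm: float = 100.0) -> int:
    """Convert a VAS mark position (mm from the 'no pain' end) to 0-100.

    Rescales in case a printed line is not exactly 100 mm long.
    """
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must lie on the line")
    return round(100 * mark_mm / line_length_mm)

# VRS categories coded as ordered integers (ordinal, not interval, data)
VRS_CODES = {"no pain": 0, "mild pain": 1, "moderate pain": 2, "severe pain": 3}

print(vas_score(47.0))                       # 47 on the 101-point scale
print(vas_score(47.0, line_length_mm=98.0))  # rescaled for a 98-mm printout
print(VRS_CODES["moderate pain"])
```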
Attempts at capturing pain improvement more broadly can also be achieved with single-item measures. For example, the PGIC (Patient Global Impression of Change) scale provides a single, general estimate of improvement. The typical PGIC asks patients to rate their current status as: 1) very much improved, 2) much improved, 3) minimally improved, 4) no change, 5) minimally worse, 6) much worse, or 7) very much worse. This measure has the advantage of being applicable to a wide variety of conditions and treatments, but it lacks the sensitivity required for some statistical analyses.
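Because the PGIC is a single ordinal item, it is often dichotomized for analysis. The sketch below codes the seven PGIC categories (1 = very much improved through 7 = very much worse). It applies one common responder definition, “much improved” or “very much improved,” which is the global-impression anchor associated with the 30% cutoff. The names are illustrative.

```python
from enum import IntEnum

class PGIC(IntEnum):
    VERY_MUCH_IMPROVED = 1
    MUCH_IMPROVED = 2
    MINIMALLY_IMPROVED = 3
    NO_CHANGE = 4
    MINIMALLY_WORSE = 5
    MUCH_WORSE = 6
    VERY_MUCH_WORSE = 7

def pgic_responder(rating: PGIC) -> bool:
    """True for 'much improved' or 'very much improved'."""
    return rating <= PGIC.MUCH_IMPROVED

ratings = [PGIC(2), PGIC(3), PGIC(1), PGIC(4)]  # hypothetical responses
rate = sum(pgic_responder(r) for r in ratings) / len(ratings)
print(f"responder rate: {rate:.0%}")
```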
In many situations, a simple, one-item instrument is not sufficient to truly capture pain or quality of life. Many comprehensive measures of pain exist. These instruments typically measure several dimensions of pain, with differing combinations of (among other things): pain intensity, quality, affect, interference with functioning, and effects on general quality of life. By assessing the pain experience in a more complex way, these scales may circumvent the commonly observed lack of association between pain intensity and disability.
The short-form McGill Pain Questionnaire (SF-MPQ) is a well-validated measure with extensive clinical research use. Patients rate their pain using 15 descriptors in total (11 sensory and 4 affective): sensory terms (eg, sharp or stabbing) and affective terms (eg, sickening or fearful). Each item is rated on a 4-point scale that ranges from none to severe. The SF-MPQ also has a single VAS item for pain intensity and a VRS for rating the overall pain experience.
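A common way to score the SF-MPQ descriptor items is to code each rating 0–3 (none, mild, moderate, severe) and sum the sensory and affective descriptors into subscale and total scores. The sketch below assumes that coding and the standard descriptor list. It does not score the VAS or overall-intensity items, and the function name is hypothetical.

```python
SENSORY = ["throbbing", "shooting", "stabbing", "sharp", "cramping", "gnawing",
           "hot-burning", "aching", "heavy", "tender", "splitting"]
AFFECTIVE = ["tiring-exhausting", "sickening", "fearful", "punishing-cruel"]
RATING = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}

def score_sf_mpq(responses: dict) -> dict:
    """Sum 0-3 ratings into sensory (0-33), affective (0-12), total (0-45).

    `responses` maps each descriptor to 'none', 'mild', 'moderate',
    or 'severe'; a missing descriptor raises KeyError rather than being
    silently treated as 0.
    """
    sensory = sum(RATING[responses[d]] for d in SENSORY)
    affective = sum(RATING[responses[d]] for d in AFFECTIVE)
    return {"sensory": sensory, "affective": affective, "total": sensory + affective}

example = {d: "mild" for d in SENSORY + AFFECTIVE}
example["sharp"] = "severe"
example["fearful"] = "moderate"
print(score_sf_mpq(example))
```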
The Brief Pain Inventory short form (BPI-SF) captures two broad pain domains: 1) the sensory intensity of pain, and 2) the degree to which pain interferes with different areas of life. The 17-item scale also captures pain location, pain medication use, and response to treatments.
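The two BPI-SF domains are commonly summarized as a severity score (the mean of the worst, least, average, and current pain ratings) and an interference score (the mean of seven 0–10 interference ratings). The sketch below follows that convention. As assumptions about missing-data handling, it requires all four severity items and computes the interference mean only when at least four of the seven items are answered. Item keys and the function name are hypothetical.

```python
from statistics import mean

SEVERITY_ITEMS = ["worst", "least", "average", "now"]
INTERFERENCE_ITEMS = ["general_activity", "mood", "walking", "normal_work",
                      "relations", "sleep", "enjoyment_of_life"]

def score_bpi(responses: dict, min_interference_items: int = 4) -> dict:
    """Return mean severity and mean interference (both 0-10).

    Missing items are absent from `responses` (or set to None).
    A score is None when too few items were answered.
    """
    sev = [responses[k] for k in SEVERITY_ITEMS if responses.get(k) is not None]
    intf = [responses[k] for k in INTERFERENCE_ITEMS if responses.get(k) is not None]
    return {
        "severity": mean(sev) if len(sev) == len(SEVERITY_ITEMS) else None,
        "interference": mean(intf) if len(intf) >= min_interference_items else None,
    }

patient = {"worst": 8, "least": 2, "average": 5, "now": 4,
           "general_activity": 6, "mood": 5, "walking": 7, "sleep": 4}
print(score_bpi(patient))
```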
The West Haven-Yale Multidimensional Pain Inventory (WHYMPI) is a comprehensive pain outcomes measure that contains 52 items and 12 subscales. Some of the subscales include perceived interference of pain in a variety of areas, responses from significant others, pain severity, perceived life control, affect, and participation in various work, social, and personal activities. Items are assessed on a 7-point scale. The scale can also classify patients into clinically useful profiles: dysfunctional, interpersonally distressed, and adaptive copers.
The Treatment Outcomes of Pain Survey (TOPS) is one of the longest and most comprehensive instruments for patients with chronic pain. The survey is an extension of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36), an instrument designed to measure quality of life and function in a variety of patient populations. The survey also contains items from the WHYMPI, BPI, the Oswestry Disability Questionnaire, and items that assess coping style, fear avoidance beliefs, substance abuse, patient satisfaction with treatment, and demographic variables. The full scale contains 120 items, and there is a 61-item follow-up version. The following scores are obtained from the TOPS: pain symptoms, functional limitations, perceived family/social disability, objective family/social disability, patient satisfaction, fear avoidance, passive coping, solicitous responses, work limitations, and life control. The questionnaire may be burdensome to some patients because of its length.
Behavioral scales are often used in noncommunicative patients for whom direct assessment of pain self-report is not possible. These tools often measure facial or bodily movements as proxies for pain. Li et al. offer a critical review of the tools available for adult patients. Hummel and van Dijk offer a similar review for nonverbal infants. These scales are clinically necessary in some cases, but are generally not acceptable as outcomes for clinical trial reporting.
There is no established maximum length for an outcome packet. In general, as a questionnaire packet increases in length, there is an increasing risk that the patient will lose interest. Scales placed later in the packet may yield incomplete or unreliable data. An individual’s ability to persist through a questionnaire depends on a number of individual and environmental factors (eg, attention span, interest in the scale, dedication to the project, incentives, outside distractions, or item complexity). Conservatively, questionnaire packets should be short enough that most individuals can complete them in under 25 minutes. Such comprehensive packets may also be combined with more frequently administered single-item measures to balance depth of information with temporal resolution.
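A rough completion-time budget can help keep a packet under the 25-minute guideline. The sketch below sums item counts and multiplies by an assumed average time per item. The 10-second figure is a placeholder assumption, not a published norm, and should be replaced with timings from pilot testing in the target population. Item counts are those given in this article for the BPI-SF, WHYMPI, and TOPS.

```python
def packet_minutes(item_counts: dict, seconds_per_item: float = 10.0) -> float:
    """Estimated completion time in minutes for a questionnaire packet.

    seconds_per_item is an assumed average, not a validated value.
    """
    return sum(item_counts.values()) * seconds_per_item / 60.0

packet = {"BPI-SF": 17, "WHYMPI": 52}
print(round(packet_minutes(packet), 1))  # two instruments
packet["TOPS"] = 120
print(round(packet_minutes(packet), 1))  # adding the full TOPS
```

Under this assumed rate, adding the full TOPS pushes the packet past 25 minutes, which is where pairing a shorter core packet with frequent single-item measures becomes attractive.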
The field of pain management would benefit enormously from an objective, physiologic marker of pain. Several physiologic variables have been measured for this purpose, such as skin conductance and heart rate. In general, however, these markers do not correlate strongly enough with pain to warrant their use as a surrogate measure of pain [21-23]. Pain can exist in the absence of changes in these measures, and these measures can fluctuate drastically with no change in pain. These peripheral measures indicate general autonomic activity, which can be influenced by many factors other than pain, such as other forms of arousal. Also, treatments may directly affect these physiologic variables, further reducing their validity as a pain-specific measure. Work still continues in this area, with tests of more sophisticated measurement approaches [24,25] and biomarkers of pain intensity.
Because pain is more than just the peripheral and spinal transmission of nociceptive information, many new attempts at objectively measuring pain have focused on the brain. These neurologic markers of pain attempt to encompass the large number of emotional, situational, and attentional factors that can accentuate or attenuate the pain experience. The pursuit of a neuroimaging approach to measuring pain has intensified with better technology and increases in spatial and temporal resolution. Several brain regions show pain-related activations, and some degree of pain intensity encoding has been described. Methods such as magnetoencephalography, functional MRI, and positron emission tomography are used to explore the supraspinal neural correlates of pain. However, no neuroimaging technique has been established as a reliable method of measuring pain. The technology has not yet advanced to the point where we can accurately classify someone as being in pain based purely on a scan; we are further still from judging the amount of pain change based on changes in neural signals.
Even if a measure of acute pain is developed, it does not necessarily follow that such a measure could accurately measure chronic pain. The way an individual interprets the pain, or learns to cope with its unpredictability, can greatly moderate the way pain affects his or her life. Given the complex interplay between sensory, affective, and cognitive components of pain, it is unlikely that an objective measure will soon emerge that captures the experience of pain in a clinically useful way.
Many physical functioning and performance tests, such as range-of-motion testing, exist and have been used as a proxy for objective pain measurement. Examples of standardized performance/functioning tests for chronic pain include the loaded forward-reach test for chronic back pain, the timed “Up & Go” test for osteoarthritis, and grip strength for rheumatoid arthritis. In general, these performance tests only modestly predict self-reported pain, with correlations rarely exceeding 0.30 [32-36]. These results suggest that pain is just one component of physical performance, and other factors, such as fear of pain, may heavily affect performance scores [37,38]. Therefore, although clinic-based tests of functioning can complement self-reported pain measures in chronic conditions, they are not useful as a substitute for pain reports.
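The modest size of these correlations is easier to see as shared variance. Squaring a correlation coefficient gives the proportion of variance in one measure that is linearly associated with the other, so at a correlation of 0.30:

```latex
r = 0.30 \quad\Longrightarrow\quad r^{2} = (0.30)^{2} = 0.09
```

That is, a performance test correlating at 0.30 with self-reported pain shares only about 9% of its variance with pain, leaving more than 90% attributable to other factors.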
Despite the difficulty inherent to measuring pain, there are a number of accepted tools for tracking pain-related treatment outcomes. The proper use of these tools can allow clinicians and researchers to demonstrate both statistically and clinically significant treatment effects. These instruments range from quick, one-item assessments of pain intensity, to long surveys that tap into multiple dimensions of the pain experience and overall functioning. Until more objective physiologic/neurologic measurement techniques are perfected, clinicians who study pain will rely on the careful use of established self-report pain measures.
No potential conflicts of interest relevant to this article have been reported.
Papers of particular interest, published recently, have been highlighted as:
• Of importance
•• Of major importance