Injuries of the spine and back are classified as the most frequent cause of limited activity among people younger than 45 years [1]. Approximately 10% of the adult population has neck pain at any one time [3], and 80% of the population will experience low back pain (LBP) at some time in their lives [4]. Five to 10% of the workforce is off work annually because of LBP. Indeed, LBP is second only to headache among the leading causes of pain. Approximately 80–90% of LBP is mechanical (non-organic musculoskeletal dysfunction) in origin [5]. Patients with mechanical spinal pain often seek and receive spinal manipulation by chiropractic, osteopathic and allopathic clinicians, physical therapists or other health care professionals [6].
Health care professionals have utilized spinal palpatory diagnostic procedures and manual manipulative treatment for several millennia to treat back injury and pain [7]. Along with the history of illness and physical exam, examiners utilize specific spinal palpatory diagnostic tests in order to identify spinal neuro-musculoskeletal dysfunction. Spinal neuro-musculoskeletal dysfunction refers to an alteration of spinal joint position, motion characteristics and/or related palpable paraspinal soft tissue changes. The scientific committee of the International Federation of Manual Medicine has stated: "beneficial outcomes and effectiveness of spinal manipulative procedures rely on appropriate and skilled treatment that is based on an accurate diagnosis, which in turn depends upon the accuracy of the palpatory procedures used" [9].
Spinal palpatory procedures have been described in journals [10] and textbooks [13]. The most common are static palpation of anatomical landmarks for symmetry; palpation of spinal vertebral joints before, during and after active and passive motion tests; and spinal and paraspinal soft tissue palpatory assessment for abnormalities or altered sensitivity.
Several narrative reviews of the literature on the validity and reliability of spinal palpatory diagnostic procedures have been published [21]. However, most reviews are discipline-specific, despite the fact that similar spinal palpatory procedures are used across disciplines. Only two systematic reviews of spinal palpatory validity studies have been published [29]. One was a limited review of chiropractic literature on palpatory diagnostic procedures for the lumbar-pelvic spine [29], and the other concentrates on validity studies at the sacroiliac joint [30]. An annotated bibliography [31] and a systematic review of the primary reliability research studies published between 1971 and 2001 are in progress.
Validity and reliability are often used interchangeably, but the concepts are quite different. Validity is the accuracy of a measurement of the true state of a phenomenon [32], while reliability measures the concordance, consistency or repeatability of outcomes [25]. Even a measurement that is consistent and reliable is not necessarily valid (e.g., an arrow may consistently hit the same area of the target but never hit the bull's-eye).
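The distinction can be illustrated numerically. The following is a minimal sketch, not a method from the studies reviewed here: it assumes a quantity whose true value is known, and the readings, the variable names and the function `bias_and_spread` are hypothetical. The spread of repeated readings stands in for reliability, and the mean error against the true value stands in for validity.

```python
import statistics


def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for a set of repeated measurements.

    bias   : mean measurement minus the true value (a validity indicator)
    spread : sample standard deviation of the measurements
             (a repeatability, i.e. reliability, indicator)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


# Hypothetical repeated readings of a quantity whose true value is 10.0.
true_value = 10.0
consistent_but_off = [12.1, 12.0, 11.9, 12.0, 12.0]    # tight cluster, wrong place
scattered_but_centred = [8.0, 12.0, 9.0, 11.0, 10.0]   # wide scatter, right on average

bias_a, spread_a = bias_and_spread(consistent_but_off, true_value)
bias_b, spread_b = bias_and_spread(scattered_but_centred, true_value)

print(f"consistent but off:    bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"scattered but centred: bias={bias_b:.2f}, spread={spread_b:.2f}")
```

The first set of readings corresponds to the arrow that always lands in the same place away from the bull's-eye: the spread is small, so the readings are repeatable, yet the bias is large, so they are not accurate.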
There are various types of validity studies. The concept of validity differs in qualitative and quantitative research [32]. Though it can be argued that palpatory diagnostic procedures are subjective and therefore qualitative, investigators in the field believe they can measure a physiological phenomenon that can be detected by objective means. They maintain that studies addressing the validity of spinal palpatory diagnostic tests are quantitative studies. The types of quantitative validity studies can be distinguished as follows: face validity, construct validity, criterion validity and content validity.
Face validity is the extent to which a test appears to measure what it is supposed to measure. In other words, it asks whether the proposed test seems to provide a reasonable measure of the concept it is intended to measure. For example, spinal vertebral joint motion palpation tests, which aim to detect the presence of hypomobility, have face validity because they seem to be reasonable measures of the concept they are intended to measure [33]. Face validity studies have been criticized for being subjective, intuitive and unsubstantiated. Troyanovich and Harrison [33] pointed out that, in spite of the common perception or belief that motion tests are valid and reliable for assessing the presence or absence of restricted vertebral motion, there was no evidence to support this concept. Thus, palpatory vertebral motion diagnostic tests are prime examples of tests accepted on face validity.
Construct validity is the extent to which a test identifies the concept or trait that is being measured. A construct is a hypothetical or conceptual idea that may be used to label or explain observed phenomena [34]. For example, taking a dysfunctional vertebral joint as the concept, a test demonstrating the ability to identify the presence or absence of that concept or its related components is said to have construct validity. Feinstein describes construct validity as an appraisal of the effectiveness with which a measure does its job in describing an existing or established construct; i.e., does the measure behave the way one would predict on the basis of the concept it represents? For example, Jull et al. [35] compared cervical spinal static palpation to diagnostic nerve blocks with anesthesia. The construct is that tenderness upon provocative palpation is related to local nerve irritation and nerve conductivity. After a local anesthetic nerve block of the related spinal segments, the identified tender spots no longer elicited a pain response. Thus, they demonstrated a high degree of correlation between the palpatory test that identified a tender spot and the ability of the anesthesia to reverse the results of the provocative test. Therefore, the pain provocative palpatory tests used were demonstrated to have high construct validity.
Construct validity, however, is an artificial framework that is not directly observable [27]. To establish construct validity of a test or measure, the researcher must determine the extent to which the measure correlates with other measures designed to measure the same thing and whether the measure behaves as expected. Construct validity studies do not measure the same phenomena that palpatory procedures are designed to measure (i.e., resistance to digital pressure or motion), but similar phenomena that are believed to be related to the palpable phenomena. Many construct validity studies on diagnostic spinal palpatory tests compare a test's results to another measurement of abnormal physiology in the same region. Studies using thermography [36], electromyography [37], and coronary angiography [38] fall into this category.
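One simple way to express "the extent to which the measure correlates with other measures" is a correlation coefficient. The sketch below is illustrative only: the Pearson coefficient is an assumed choice rather than the statistic used in the cited studies, and the palpation grades, instrument readings and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, one pair of values per spinal segment examined:
# an examiner's palpatory grade (0 = normal .. 3 = marked abnormality)
# and an instrument reading from the same region (arbitrary units).
palpation_grades = [0, 1, 1, 2, 3, 3]
instrument_readings = [0.2, 0.5, 0.4, 0.9, 1.4, 1.3]

r = pearson_r(palpation_grades, instrument_readings)
print(f"correlation between palpation and instrument: r = {r:.2f}")
```

A high coefficient would indicate only that the two measures vary together. As noted above, the instrument records a related phenomenon, not the palpated phenomenon itself.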
There are other examples of construct validity studies using instruments to measure skin temperature, electrical skin resistance and/or gross range of motion to discern a dysfunctional vertebral segment. These measurements are then compared to those obtained by another examiner who uses one or several palpatory procedures that assess resistance to joint motion or paraspinal soft tissue abnormalities to help discern a dysfunctional vertebral segment. Alternatively, one examiner uses pain provocation and the other uses the palpatory sense of motion restriction to assess for a dysfunctional vertebral segment.
Criterion validity measures the extent to which an intervention allows a researcher to predict behavioral or pathological outcomes. Criterion validity studies, therefore, do not measure the phenomenon being palpated, but attempt to correlate the findings of a palpatory procedure with another measurable outcome (e.g., diagnosed visceral disease). For example, Beal [39] and Tarr [40] studied the ability of physicians using spinal palpatory procedures to identify, or predict, which patients had visceral disease related to the spinal findings of altered structure, motion and/or soft tissue.
Content validity is the extent to which a measure adequately and comprehensively measures what it claims to be measuring. Although Troyanovich and Harrison [41] consider face and content validity as synonymous, there is an important distinction: content validity studies employ a reference standard.
A reference standard (also called a "gold standard") is a measure accepted by consensus of content experts as the best available for determining the presence or absence of a particular phenomenon. When there is no perfect reference standard, as in the measurement of a patient's sense of pain provocation (e.g., on pressing a "tender point" or "trigger point"), pragmatic criteria can be used as a reference standard [42]. The visual analog pain scale has been used as a pragmatic reference standard for palpatory pain provocation tests.
Ideally, content validity studies attempt to compare a test with a reference standard of the same phenomenon as that which is being palpated, i.e., palpable abnormalities in structure, motion and soft tissue. The Chiropractic Mercy Center Consensus Conference held in January 1993 identified and rated the value of various measurement instruments related to spinal joint functional assessment that could be used as reference standards [43]. Based on their critical review of the literature, Troyanovich and Harrison [44] suggested postural assessment instruments and radiographic measurement as valid, reliable and clinically useful objective measurement tools to help identify dysfunctional spinal vertebral joints.
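Comparing a palpatory test with a reference standard is commonly summarized in a two-by-two table of agreement. The following is a minimal sketch under stated assumptions: sensitivity and specificity are assumed here as the summary measures and are not prescribed by the sources cited above, and the findings and the function name `compare_with_reference` are hypothetical.

```python
def compare_with_reference(test_findings, reference_findings):
    """Tabulate a dichotomous test against a reference standard.

    Both arguments are equal-length sequences of booleans, one entry per
    examined segment: True means "dysfunction present".
    Returns the four cell counts plus sensitivity and specificity
    (None when the corresponding denominator is zero).
    """
    if len(test_findings) != len(reference_findings):
        raise ValueError("sequences must have the same length")
    tp = fp = fn = tn = 0
    for test_positive, reference_positive in zip(test_findings, reference_findings):
        if test_positive and reference_positive:
            tp += 1          # test and reference both positive
        elif test_positive:
            fp += 1          # test positive, reference negative
        elif reference_positive:
            fn += 1          # test negative, reference positive
        else:
            tn += 1          # test and reference both negative
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical findings for eight spinal segments.
palpation = [True, True, False, True, False, False, True, False]
reference = [True, False, False, True, True, False, True, False]

result = compare_with_reference(palpation, reference)
print(result)
```

The reference findings are treated as correct by definition, so the resulting figures are only as trustworthy as the reference standard chosen.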
Based on this brief review, it appears that construct and criterion validity studies do not measure the phenomenon being palpated. Instead they attempt to correlate the findings of a palpatory procedure with another measurable outcome. On the other hand, content validity studies measure the same phenomenon as that which is being palpated. Given how important it is to know whether the diagnostic tests used in palpatory exams are valid, we conducted a systematic review to assess the content validity of spinal palpatory tests used to identify spinal neuro-musculoskeletal dysfunction.