Our earlier study demonstrated that ICD-9 codes and other data from the administrative database (ADB) effectively identified somatization. To further develop this simple screening method, we hypothesized that, in a new and different population, ADB screening would identify somatizing patients by increasing number of visits, female gender, and a greater percentage of ICD-9 primary diagnosis codes in the musculoskeletal, nervous, gastrointestinal, and ill-defined body systems; we labeled these codes as having “somatization potential.”
Using a prospective observational design in a staff model HMO, we evaluated 1364 patients aged 18–65 years who had 8 or more visits in each of the two years before the study. Clinician raters applied a reliable method of medical chart review to identify patients meeting criteria for somatization. We randomly selected 2/3 of the patients for the derivation set (N=901) and used logistic regression to evaluate the contribution of potential ADB correlates of a somatization diagnosis (age, gender, all encounters, primary diagnosis codes [ICD-9], revenue codes, and charges). The resulting prediction rule was then applied to the remaining 1/3 of subjects, the validation set (N=463).
Patients averaged 47.1 years of age and 12.8 visits per year, and 71.6% were female; 319 had somatization. Age, visits, and somatization potential were associated with clinician-rated somatization, with a c-statistic of 0.719 [95% CI: 0.679, 0.760] in the derivation set and 0.679 [95% CI: 0.625, 0.734] in the validation set.
These data support our earlier findings that selected ICD-9 diagnoses in the ADB predict somatization, suggesting their potential for identifying a common, costly, and usually unrecognized problem. The demonstrated stability of ICD-9 codes for identifying somatization indicates that the next step in research should be taken, as we outline here.
Patients with somatization are common. They are defined as having physical symptoms with little or no documented basis in underlying organic disease; when organic disease exists, the symptoms are inconsistent with or out of proportion to it [1–3]. The prevalence of primary care patients with one or more somatizing symptoms is 33% or higher in outpatient settings [1, 4], while the prevalence of patients with three or more concurrent somatic symptoms (multisomatoform disorder) is estimated to be 8% [5, 6]. Although the field has begun to define effective, cost-neutral primary care treatment, somatizing patients seldom receive it because they are not recognized by clinicians or health care systems [7–9]. The treatment they do receive often creates safety and cost problems, as ill-advised laboratory testing and ‘trial treatments’ lead to iatrogenic complications and substantially increased costs [10–14].
To improve system-wide identification, our research team and Barsky et al. reported that administrative databases (ADB) can provide useful information for identifying somatizing patients. The performance of such ADB screening programs needs replication in different populations before they can be incorporated into system-wide efforts to improve identification (and treatment) of somatization.
Using data collected to identify subjects for a randomized controlled trial (RCT) to treat somatization in primary care, we report in this paper a test of our earlier ADB screener. The initial screening study demonstrated that selected ICD-9 primary diagnosis codes identified somatization: all codes in the musculoskeletal, nervous, gastrointestinal, and ill-defined body systems, which we labeled as having “somatization potential.” Based on results from that study, we hypothesized in the study reported here that greater somatization potential, in addition to female gender and greater health care utilization, would identify somatizing patients. We defined somatization as one or more unexplained somatic symptoms identified by clinician raters, to test the screening method’s capacity to identify patients with any somatization. We avoided validating ADB screeners against the somatoform definitions (full and abridged) of the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV), because DSM-IV diagnoses miss >75% of somatizing primary care patients with high utilization and significant impairment [18, 19].
We employed a prospective observational design to study patients in a large HMO in Lansing, MI and obtained the Institutional Review Board (IRB) approvals from the HMO and from Michigan State University to conduct the study.
From all the plan’s enrolled patients in the ADB (constructed for clinical and administrative purposes only), we selected (in May 2000) all patients 18–65 years old who had at least 8 visits in each of the two years before the study. We selected this high-utilizing population because somatization is most prevalent in such populations and because they are the logical targets for cost-effective intervention. We did not exclude subjects with ICD-9 codes for depression and anxiety because such codes are recorded erratically, if at all, in primary care. We selected patients who had been members of the same health care system for the previous two years to generate a comprehensive list of diagnoses, recognizing that this approach precluded identifying somatization in new patients and in patients using multiple health care systems. Visits were defined as all outpatient clinical encounters with physicians, nurse practitioners, or physician assistants; visits to nurses (e.g., for blood pressure checks or allergy shots) were not included.
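The selection rule above can be sketched as a simple filter. This is a minimal illustration only, assuming a hypothetical per-patient extract (`Enrollee`) whose yearly counts already exclude nurse-only visits; the class, field, and function names are not from the study's ADB.

```python
from dataclasses import dataclass


@dataclass
class Enrollee:
    """Hypothetical ADB extract for one continuously enrolled patient."""
    age: int
    visits_year_1: int  # MD/NP/PA visits in the first of the two prior years
    visits_year_2: int  # MD/NP/PA visits in the second prior year


def is_high_utilizer(p: Enrollee, min_visits: int = 8) -> bool:
    """Aged 18-65 with at least `min_visits` visits in EACH prior year."""
    return 18 <= p.age <= 65 and min(p.visits_year_1, p.visits_year_2) >= min_visits


cohort = [Enrollee(47, 12, 9), Enrollee(47, 20, 5), Enrollee(70, 10, 10)]
selected = [p for p in cohort if is_high_utilizer(p)]
print(len(selected))  # 1: the second patient fails year 2, the third is over 65
```

Taking the minimum of the two yearly counts, rather than their sum, enforces the "each of the two years" requirement: a patient with 20 visits one year and 5 the next is excluded.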
Gender – Gender was collected from the ADB and coded with males as the referent.
Age – Age was collected from the ADB and coded as a continuous variable.
Number of Visits – The total number of visits to all physician, nurse practitioner, or physician assistant providers across the two years was collected from the ADB and coded as a continuous variable.
Somatization Potential – Somatization potential was derived from the ADB by examining ICD-9 primary diagnosis codes during the preceding year in the following ranges: 320–389 (nervous system); 520–579 (gastrointestinal system); 710–739 (musculoskeletal system); and 780–799 (ill-defined body systems), as detailed in Appendix A. Somatization potential was measured as the percentage of all diagnoses that fell in these ranges, capturing both organic and potentially somatizing diagnoses. Post hoc, we explored other definitions of somatization potential and found no benefit over our a priori definition; e.g., adding codes from other body systems added nothing, likely because of the greater proportions of organic diseases in these systems. Definitions and data for these exploratory evaluations are available from the authors on request. Post hoc, we also removed all ICD-9 codes referring to depression or anxiety in testing somatization potential.
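As an illustration, somatization potential for one patient can be computed from that patient's list of primary diagnosis codes. The helper names are hypothetical, and the sketch assumes dotted ICD-9 codes (e.g., `724.2`); it also assumes that supplementary V- and E-codes count in the denominator but never fall in the four ranges, which the text does not specify.

```python
# Hypothetical helpers; ranges are the four body systems listed in the text.
SOMATIZATION_RANGES = [
    (320, 389),  # nervous system and sense organs
    (520, 579),  # gastrointestinal (digestive) system
    (710, 739),  # musculoskeletal system
    (780, 799),  # symptoms, signs, and ill-defined conditions
]


def in_somatization_range(code: str) -> bool:
    """True if a dotted ICD-9 code's 3-digit category lies in one of the ranges.

    V- and E-codes start with a letter and are never counted as hits.
    """
    code = code.strip()
    if not code or not code[0].isdigit():
        return False
    category = int(code.split(".")[0])
    return any(lo <= category <= hi for lo, hi in SOMATIZATION_RANGES)


def somatization_potential(codes: list[str]) -> float:
    """Percent (0-100) of all primary diagnosis codes falling in the ranges."""
    if not codes:
        return 0.0
    hits = sum(in_somatization_range(c) for c in codes)
    return 100.0 * hits / len(codes)


visits = ["724.2", "401.9", "789.0", "V70.0", "346.9"]
print(somatization_potential(visits))  # 60.0 (3 of 5 codes in range)
```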
Only predictors available on the ADB were used in this study.
In contrast to most studies, which use structured patient interviews, this study used a reliable, clinician-conducted method for rating patients’ medical records to identify somatization symptoms. Five clinicians (3 senior internal medicine residents, one primary care nurse practitioner faculty member, and one general medicine faculty member) were trained to rate these patients’ medical charts. All patient charts were rated; each clinician rated a randomly selected sample of the paper charts of the high utilizers identified from the ADB, establishing the criterion standard ratings of somatization for this study. To offset diagnostic and suspicion biases, the raters were blinded to the hypothesis and predictors, and they did not use gender, age, number of visits, or specific symptom complexes in their diagnostic evaluations. Instead, each patient’s medical record was evaluated entirely for laboratory and other objective data indicating the presence or absence of organic disease. Using a reliable procedure, raters rated visits to any primary care provider in the clinic (physician, physician assistant, nurse practitioner) over a one-year period; all data in the paper medical record, including laboratory results and non-primary care visits, were available to inform ratings. Applying a scoring algorithm to these ratings, we identified patients whose primary problem was rated as somatization, defined for this study (to be most comprehensive, guided by the least stringent definition in DSM-IV) as no documented organic disease to explain one or more symptoms of at least 6 months’ duration. We found that patients classified as somatizing had at least 50% of their primary care visits in the previous year designated as involving physical symptoms that could not be explained by organic disease.
Symptom syndromes identified in charts by patients’ physicians (e.g., irritable bowel syndrome, fibromyalgia, chronic fatigue syndrome) were evaluated by our raters according to the above criteria, rather than treating the chart diagnosis as evidence of medically unexplained symptoms. Inter-rater agreement on characterizing symptoms as documented organic, documented somatization, or undocumented ranged from 92% to 96%.
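Percent agreement, the inter-rater statistic reported above, is the share of items that two raters placed in the same category. A minimal sketch follows; the category labels come from the text, but the function name and the ratings are invented for illustration.

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Percent of items that two raters placed in the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must classify the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical ratings of five symptoms by two raters.
a = ["organic", "somatization", "undocumented", "somatization", "organic"]
b = ["organic", "somatization", "somatization", "somatization", "organic"]
print(percent_agreement(a, b))  # 80.0
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa would be a separate calculation.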
A logistic regression model for somatization status was developed on a randomly selected derivation dataset comprising a 2/3 sample of patients (N=901). The remaining 1/3 of the sample (463 patients) served as the validation dataset. Patient characteristics in the derivation and validation datasets were statistically equivalent. We tested age, gender, past visits, and somatization potential as predictors of somatization status in the derivation sample. From the final model, we estimated the c-statistic, or area under the curve (AUC), from an empirical receiver operating characteristic (ROC) curve analysis. The c-statistic measures the ability of the model to discriminate somatizing from non-somatizing patients; a c-statistic of 0.68 or greater was deemed acceptable. Sensitivity, specificity, and positive/negative predictive values were computed from the final model in the derivation data. The final model was then prospectively validated in the validation dataset, where the c-statistic was also reported. We also tested the goodness of fit of the estimated model to the data using the Hosmer-Lemeshow test.
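The two model-performance checks named above can be sketched in plain Python. `c_statistic` computes the empirical AUC as the share of (case, non-case) pairs in which the case receives the higher predicted probability, with ties counting one half; this equals the area under the empirical ROC curve. `hosmer_lemeshow` uses the common decile-of-risk grouping with `groups - 2` degrees of freedom. The function names, the grouping choice, and the toy data are assumptions for illustration, not the study's software.

```python
import math


def c_statistic(probs, labels):
    """Empirical AUC over all (case, non-case) pairs; ties count one half."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    k = df // 2
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


def hosmer_lemeshow(probs, labels, groups=10):
    """Hosmer-Lemeshow statistic over risk-ordered groups, with its p-value."""
    if (groups - 2) % 2:
        raise ValueError("this sketch needs an even df, i.e. an even number of groups")
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        denom = expected * (1.0 - expected / n_g)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)


print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is O(cases × non-cases), which is still fast for a derivation set of about 900 patients; larger datasets would normally use a rank-based formula instead.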
The parent RCT initially identified 1646 patients and eliminated 246 because they had obvious organic diseases. We fully rated the remaining 1400 paper medical records and eliminated 36 additional subjects because their medical records were incomplete. The remaining 1364 patients are the subjects of this research. As Table 1 shows, patients averaged 47.1 years of age and 12.8 visits per year, and 71.6% were female; 319/1646 (19.4% prevalence) of these high utilizers met the criterion standard for somatization. Somatizers differed from non-somatizers in age (p<.003), gender (p<.002), total visits (p<.001), and somatization potential (p<.001).
Table 2 (left side) shows the percentage of all patients who had somatization potential diagnoses in each of the four body systems. For example, 21.2% of patients (n=289) had gastrointestinal system diagnoses, and a higher proportion of somatizing than nonsomatizing patients had these diagnoses (25.7% versus 19.8%, p=0.02). Table 2 (right side) also shows the mean percentage that somatization potential diagnoses represented among all diagnoses. For example, among patients with GI diagnoses, these diagnoses accounted for a mean of 18.9% of all diagnoses among somatizers and 19.1% among nonsomatizers (p=0.94). The mean percentages of musculoskeletal and ill-defined body system diagnoses among all diagnoses differed significantly between the two groups (p=0.002 and p=0.004, respectively).
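A randomly selected two-thirds sample of 901 patients formed the derivation set for constructing a logistic model of the correlates of somatization. The remaining patients had characteristics statistically similar to those of the derivation sample. The logistic model using information from the ADB contained age, age squared, gender, the number of total visits, and somatization potential (Table 3); there were no significant interactions. The probability of somatization π is estimated by:

$$\ln\frac{\pi}{1-\pi} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{age}^2 + \beta_3\,\mathrm{gender} + \beta_4\,\mathrm{visits} + \beta_5\,\mathrm{SP} \qquad (2)$$

where gender is coded with males as the referent, visits is the number of total visits, SP is somatization potential, and the coefficient estimates β0, …, β5 are given in Table 3.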
The c-statistic was 0.719 [95% CI: 0.679, 0.760]. In this model, the higher-order (age squared) term was statistically significant (p<0.01), although its magnitude was small. It captures an important feature of the relationship between somatization and age: the probability of somatization first increased with age, peaked at age 38, and then decreased.
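The peak age follows from the quadratic age terms. Writing the age part of the logit as $\beta_1\,\mathrm{age} + \beta_2\,\mathrm{age}^2$ (symbol names assumed here; the numerical estimates appear in Table 3), a short derivation:

```latex
\frac{\partial}{\partial\,\mathrm{age}}\,\ln\frac{\pi}{1-\pi}
  = \beta_1 + 2\beta_2\,\mathrm{age} = 0
  \quad\Longrightarrow\quad
  \mathrm{age}^{*} = -\frac{\beta_1}{2\beta_2}
```

Because the logit is strictly increasing in π, π peaks at the same age, and the turning point is a maximum only when β₂ < 0. The reported peak at age 38 therefore corresponds to −β₁/(2β₂) ≈ 38 with a negative age-squared coefficient.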
When we explored restricting the dataset to subjects 20–55 years of age, to better match the 2001 dataset, the c-statistic was unchanged (0.72) and, as expected, age and age squared were no longer statistically significant in the prediction equation (p=.09 and p=.06, respectively); gender was still not significant. However, this age restriction does not make the new sample comparable to the 2001 sample in age distribution or in the association between age and somatization.
When we computed the c-statistic in the validation dataset using the parameters estimated for Model (2), we found, as expected, a smaller c-statistic of 0.679 [95% CI: 0.625, 0.734]. Applying Model (1), the previously published model, to the validation dataset gave very similar predictive power (c-statistic 0.70). Applying Model (2) to the original dataset also gave a similar c-statistic of 0.76. Sensitivity and specificity estimates based on Model (2) in the derivation dataset are presented in Table 4. To illustrate, a cutoff of 0.3 in Model (2) means that a patient is predicted to be a somatizer if the model's estimated probability π is 0.3 or greater. At this cutoff, the sensitivity and specificity of Model (2) were 46.5% and 82.5%, respectively, and the corresponding positive and negative predictive values were 38.9% and 86.5%, based on a prevalence of 19.4%.
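The predictive values above follow from sensitivity, specificity, and prevalence by Bayes' theorem. A minimal sketch (the function name is hypothetical) reproduces the reported 38.9% and 86.5% to within rounding of the published sensitivity and specificity:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' theorem: (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positives (as proportions)
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    fn = (1 - sensitivity) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(0.465, 0.825, 0.194)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")  # PPV=0.390, NPV=0.865
```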
In a post hoc analysis, we dropped all patients with a primary ICD-9 code indicating depression or anxiety before study enrollment. In the derivation dataset, 103 patients had a primary diagnosis of depression or anxiety. When we re-estimated the model in equation (2) without them, the resulting c-statistic was 0.706, and all estimates were similar to those in Table 3.
In a cohort in which 19.4% of patients met reliable clinician-rater criteria for somatization, age, number of visits, and somatization potential were associated with clinician-rated somatization, with a c-statistic of 0.72 in the derivation set and 0.68 in the validation set. C-statistics of 0.68 and higher have been considered to indicate good discriminative ability.
There is encouraging clinical comparability between the prediction equations in the current and previous study. As in the previous study, the likelihood of somatization increases with increasing percentages of all ICD-9 diagnosis codes in the nervous (320–389), gastrointestinal (520–579), musculoskeletal (710–739), and ill-defined (780–799) body systems – those with “somatization potential.” We suspect the greater stability of the current prediction rule (c-statistic drop of 0.04 between the derivation and validation set) compared to the previous study (drop of 0.12 between the derivation and validation set) reflects the more reliable clinician rating of our criterion definition of somatization in the current study.
There are sociodemographic differences between the prediction equations in the current and previous studies. The new inclusion of age squared and the exclusion of female gender in the current prediction rule suggest that the research team has not yet determined the optimal sociodemographic variables and associated weights for a prediction equation usable across health care systems. This task will require further research in sociodemographically heterogeneous populations. In addition to providing generalizable estimates of sociodemographic predictors, such research can rigorously test the key assumption that diversely trained clinicians practicing across a range of health care systems code potential somatization and other diagnoses comparably in their ADBs.
There are other potential limitations of ADB screening. We studied a select, high-utilizing group who are likely to be more severely affected than many somatizing patients in primary care; therefore, our data do not apply to screening for less severe somatization. Nevertheless, our screening does identify those primary care patients likely to be most in need of treatment.
At the other end of the severity spectrum, the present study population differs from populations identified through structured interviews for diagnosing DSM-IV somatoform disorders. Of the 206 somatizing subjects identified for the RCT from the population reported here, another report demonstrated that 124 (60.2%) had a nonsomatoform DSM-IV diagnosis of any type, mostly anxiety and depression. However, only 9 (4.4%) had any full DSM-IV somatoform diagnosis, and only 39 (18.9%) had abridged somatization disorder; i.e., the remaining 158 (76.7%) had neither a full nor an abridged (minor) DSM-IV somatoform diagnosis. Therefore, the psychologically distressed, high-utilizing primary care somatizing population we report does not resemble a population with DSM-IV somatoform disorders and more closely reflects patients with DSM-IV anxiety and depression. This is consistent with other findings that most primary care patients with depression and anxiety present, often exclusively, with medically unexplained symptoms [23–31]. We suggest that DSM-IV somatoform diagnoses reflect the even more severe disease seen in specialty settings and are far less applicable in primary care [18, 19]. We believe the population reported here represents a typical high-utilizing primary care population with somatization of the type for which one would most beneficially screen and target for treatment. Screening for major or minor DSM-IV disorders would miss over 75% of psychologically distressed, high-utilizing primary care patients with unexplained symptoms.
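The "over 75%" figure comes from simple arithmetic on the 206 RCT subjects. The sketch below treats the full, abridged, and neither groups as mutually exclusive, as the word "remaining" in the text implies:

```python
# Breakdown of the 206 somatizing RCT subjects by DSM-IV somatoform status.
total = 206
full_somatoform = 9   # any full DSM-IV somatoform diagnosis
abridged = 39         # abridged somatization disorder
neither = total - full_somatoform - abridged

for label, n in [("full", full_somatoform), ("abridged", abridged), ("neither", neither)]:
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
# full: 9 (4.4%)
# abridged: 39 (18.9%)
# neither: 158 (76.7%)
```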
Further, we required continuous enrollment for 2 years in one system, which would preclude ADB screening in patients new to the system or using multiple systems. We also note the potential limitation of using retrospective chart review to establish the criterion standard for somatization. However reliable our raters may have been, the reliability and validity of the chart itself depend on how aggressively physicians sought organic diseases, how completely they recorded their findings and patients’ symptoms, how consistent they were over time, and how similar their recording behavior was to that of other physicians. Nevertheless, our chart rating procedure does overcome many shortcomings of reliability and validity that result from using structured clinical interviews to identify DSM-IV somatization [18, 19]. The problems with present definitions of somatization (ICD-9 as well as DSM-IV) and recommended changes are currently being vigorously debated [2, 18, 33].
At this point, used alone, our results are insufficient to support wide-scale screening. For example, from Table 4, an HMO might choose the 0.30 cutoff to identify approximately one-half (46.5%) of its somatizing patients. But although the positive predictive value doubles from 19.4% (the prevalence, or pre-test probability) to 38.9%, this post-test probability is insufficient for wide-scale screening because the screen identifies more false than true positives.
The cutoff point for ADB screening can be adjusted to be highly sensitive (e.g., 90%), which makes this approach an ideal first-stage screener for identifying at-risk somatizers. Identifying just the somatization cases more accurately, however, will require second-stage screening. Rost et al. recently identified a modification of the 15-item Patient Health Questionnaire (PHQ-15), called the PHQ-15-R (revised), which has a sensitivity and specificity for identifying somatization of .99 and .98, respectively. As a second stage of screening, applying the PHQ-15-R to the ADB screen-positives (first stage) would remove nearly all false positives and retain almost all somatizers; e.g., at the 0.30 cut point, two-stage screening would identify nearly half of all somatizing patients and include almost no false positives. These subjects could then be referred for treatment of somatization. While the PHQ-15-R could be administered to all subjects in the first place, the need for a questionnaire would impede simple, inexpensive population-based screening. Rigorous testing of this two-stage screening will first be required, however, to demonstrate satisfactory positive predictive values. Two-stage screening could then be used by any system that employs the ICD-9 diagnosis codes required for first-stage screening.
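The two-stage argument can be made concrete per 1000 high utilizers, using the 0.30-cutoff sensitivity and specificity from the derivation data and the reported PHQ-15-R values. This is a hedged sketch: it assumes, without verification, that the questionnaire keeps its reported accuracy among ADB screen-positives (conditional independence of the two stages), and the function name is hypothetical.

```python
def two_stage_per_1000(prev, sens1, spec1, sens2, spec2, n=1000):
    """Expected true and false positives after each screening stage.

    Assumes (unverified) that the second-stage test keeps its reported
    sensitivity and specificity among first-stage screen-positives.
    """
    cases, noncases = n * prev, n * (1 - prev)
    tp1, fp1 = cases * sens1, noncases * (1 - spec1)  # stage 1: ADB screen
    tp2, fp2 = tp1 * sens2, fp1 * (1 - spec2)        # stage 2: questionnaire
    return {
        "stage1": (tp1, fp1, tp1 / (tp1 + fp1)),
        "stage2": (tp2, fp2, tp2 / (tp2 + fp2)),
        "overall_sensitivity": sens1 * sens2,
    }


result = two_stage_per_1000(prev=0.194, sens1=0.465, spec1=0.825, sens2=0.99, spec2=0.98)
for stage in ("stage1", "stage2"):
    tp, fp, ppv = result[stage]
    print(f"{stage}: {tp:.0f} true vs {fp:.0f} false positives, PPV {ppv:.2f}")
print(f"overall sensitivity {result['overall_sensitivity']:.2f}")
# stage1: 90 true vs 141 false positives, PPV 0.39
# stage2: 89 true vs 3 false positives, PPV 0.97
# overall sensitivity 0.46
```

Under these assumptions, the first stage alone yields more false than true positives, while the second stage removes nearly all of them and retains almost half of all somatizers.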
We conclude that the somatization potential of primary diagnosis ICD-9 codes from the ADB, combined with utilization characteristics, can help identify somatizing patients. We have corroborated this finding in a new data set of representative primary care patients using a reliable criterion standard definition of somatization. To most effectively screen for somatization on a population basis, however, will require combining administrative database screening with more precise second-stage screening – which should be the next phase of study.
This work was funded in part by NIMH grant MH50799 and AHRQ grant HS 14206.
In Appendix A, ICD-9 body systems numbered 6, 9, 13, and 16 are the four systems with somatization potential.