To identify core symptoms that discriminate premenstrual syndrome (PMS) in prospective daily diary ratings and determine the association of these symptoms with functional impairment.
The study analyzed prospective daily symptom ratings and functional impairment data provided by 1081 women who requested PMS treatment at an academic medical center. The data were obtained before any treatment procedures. A random-split sample design provided separate developmental and validation datasets. Logistic regression was used to identify a reduced set of symptoms that best discriminated PMS. The results were validated in a separate dataset. Optimal cutoff points in the symptom scores were identified for clinical use.
Statistical modeling identified 6 symptoms that discriminated PMS and not PMS as well as 17 symptoms in daily diary ratings. The identified core symptoms included anxiety/tension, mood swings, aches, appetite/food cravings, cramps, and decreased interest in activities. The area under the curve (AUC) was 0.84 in both models. The sums of the premenstrual symptom scores also discriminated PMS and not PMS and correctly classified 84%–86% of the cases.
Six symptoms rated in daily diaries discriminate between PMS and not PMS among women seeking treatment and are significantly associated with functional impairment. The findings suggest that the burden of daily diaries to confirm PMS can be reduced to a smaller number of symptoms that distinguish the patients who meet this requirement. Results also support the concept that a clinical diagnosis of PMS can be developed around a core symptom group.
More than 20% of menstruating women experience premenstrual syndrome (PMS) to a degree that warrants clinical treatment,1,2 yet there is no widely accepted diagnosis of the disorder. There are ongoing efforts to develop uniform diagnostic criteria,3–6 but primary care clinicians who typically manage PMS patients are faced with assessing a broad array of nonspecific symptoms that have been associated with PMS. Diverse diagnostic guidelines and limited clinician time frequently result in a poor diagnosis, which in turn leads to inadequate or inappropriate treatment of a chronic problem that extends over women's reproductive years.7
The numerous symptoms that are linked with PMS are a major impediment to its diagnosis. Considerable evidence shows that many symptoms are entrained to the menstrual cycle,8 but their broad inclusion in diagnostic assessments produces noise that reduces diagnostic accuracy. Furthermore, the inclusion of many symptoms in daily diaries, which are the primary tool in diagnosing PMS, is a considerable burden to both clinicians and patients. However, it is not known which of the many symptoms that are associated with PMS best discriminate the likelihood of the disorder.
There are appreciable differences among the current diagnostic approaches. The guidelines for PMS offered by the American College of Obstetricians and Gynecologists (ACOG) list 6 affective and 4 somatic symptoms.9 At least 1 symptom must meet the defined cyclic pattern linked to the menstrual cycle, be confirmed by prospective daily ratings for one or more menstrual cycles, and be associated with identifiable dysfunction or impairment. Whether or not these criteria discriminate a clinically significant condition has not been demonstrated.
The criteria most frequently used in recent studies of premenstrual symptoms are those of the American Psychiatric Association (APA). These criteria define a severe form of PMS termed premenstrual dysphoric disorder (PMDD) and are described in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV).10 PMDD requires 5 of 11 listed symptoms, including at least 1 mood symptom, that are severe premenstrually and remit after menses; severe functional impairment; symptoms not the result of another disorder; and prospective confirmation of the symptoms for at least two consecutive menstrual cycles. These stringent criteria identify a severe, dysphoric form of PMS and are met by approximately 5%–8% of reproductive-age women, a much smaller number than those who seek treatment for PMS.1,7 In contrast to the specified criteria for PMDD, the World Health Organization's (WHO) International Classification of Diseases (ICD-10) broadly classifies premenstrual tension syndrome (PMTS) as a gynecological disorder but lists no symptoms and no specific criteria to guide a diagnosis.11
The purpose of this study was to identify a small group of symptoms that discriminate women who meet the criteria for prospectively confirmed PMS from those who do not. The study participants were women who believed they had PMS and sought treatment for the symptoms. We hypothesized that a small number of symptoms could predict the likelihood of PMS. We also hypothesized that the identified core symptoms are strongly associated with impairment in family relationships, work, or social activities.
We reviewed the study records of all women who enrolled for PMS treatment in three clinical trials that were supported by the National Institutes of Health (NIH) at an academic medical center between 1998 and 2007.12–14 A total of 1400 women enrolled for PMS treatment. Of these women, 319 women did not have a complete daily diary; 1081 women completed a daily diary for at least 1 month in the untreated screen period and were included in the study. Comparisons of demographic and clinical characteristics between the study sample and the group with no daily diary showed no statistically significant differences. The study was approved by the Institutional Review Board of the University of Pennsylvania.
Characteristics of the participants included ages 18–45 years, regular menstrual cycles of 22–35 days for at least 6 months, and persistent premenstrual symptoms for at least 1 year. At enrollment in the screen period, the participants reported no serious medical problems or current mental disorders. Exclusions at enrollment included major axis 1 psychiatric diagnosis; alcohol or substance abuse within the past year; history of psychosis or bipolar disorder; current use of psychotropic medications or any current prescription, over-the-counter (OTC), herbal, or nonmedical therapies for PMS; pregnancy; breastfeeding; hysterectomy; symptomatic endometriosis; irregular menstrual cycles; and any serious or unstable medical illness.
The study sample included all women with a complete daily diary for at least one untreated menstrual cycle (n=1081). A random-split sample design of the data provided a developmental dataset (n=541) and a validation dataset (n=540). Modeling was performed using the developmental dataset, and selected models were fit to the validation dataset in order to objectively evaluate model performance.
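The random-split design can be sketched as follows. This is a minimal illustration only: the article does not describe how the randomization was implemented, so the function name `random_split`, the seed, and the use of integer case IDs are assumptions.

```python
import random


def random_split(ids, seed=0):
    """Shuffle case IDs and split them into two near-equal halves."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2    # with an odd total, the extra case goes to the first half
    return shuffled[:half], shuffled[half:]


# 1081 hypothetical case IDs, matching the sample size reported in the text
development, validation = random_split(range(1081), seed=2024)
print(len(development), len(validation))  # 541 540
```

Fixing the split before any modeling is what allows the validation half to serve as an independent check on the models derived from the developmental half.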
Power calculations before the study assumed that the prevalence of true PMS in the population of women who seek clinical treatment was approximately 37%. Given that we evaluated 17 predictors (Penn Daily Symptom Report [DSR] symptoms), we required a minimum of 170 cases of true PMS based on the rule of 10 events per candidate predictor to ensure sufficient data for precise estimates of model specificity.15 Assuming that 50% of the women would meet criteria for PMS at enrollment and, of these, 25% would be false positive, the calculations indicated that 460 subjects were required in the developmental sample to provide statistical power of 90% with alpha at 0.05. Another 460 subjects were required for the validation sample, for a total of 920 subjects in the study.
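The events-per-predictor arithmetic can be checked with a few lines. This is only a consistency check on the figures quoted above, not a reproduction of the power calculation itself, which the text does not detail; dividing the required events by the assumed prevalence is our assumption about how the numbers relate.

```python
import math

n_predictors = 17           # candidate DSR symptoms
events_per_predictor = 10   # rule of thumb cited in the text
prevalence = 0.37           # assumed share of true PMS among treatment seekers

min_events = n_predictors * events_per_predictor   # cases of true PMS needed
min_sample = math.ceil(min_events / prevalence)    # women needed to yield that many cases
print(min_events, min_sample)  # 170 460
```

Under these assumptions the required number of events (170) and the developmental sample size (460) are mutually consistent.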
PMS symptom scores for the analysis were obtained from the Penn DSR, a validated daily diary that lists 17 PMS symptoms.16 Each symptom was rated daily on a 5-point scale from 0 (none) to 4 (severe): irritability/anger, mood swings, depression, anxiety/tension, feeling out of control, feeling worthless/guilty, decreased interest in usual activities, poor coordination, insomnia, difficulty concentrating/confusion, fatigue, aches, headache, cramps, breast tenderness, swelling/bloating, food cravings/increased appetite.
Premenstrual scores were obtained for each menstrual cycle in the same manner as reported elsewhere.12–14,17,18 Each symptom score was summed for the 6 days before menses, and the symptom scores were summed for a total premenstrual score.12–14,19 Postmenstrual scores were calculated by summing each symptom score for cycle days 5–10 (day 1 was the first day of menstrual bleeding) and summing the symptom scores for a total postmenstrual score. Missing days in a symptom score were treated by carrying forward the data from the previous day. Only 11% of the sample had a missing value in the symptom ratings, and the majority of these were missing only 1 day. The second screen cycle DSR was selected for the analysis because it was the first complete and untreated menstrual cycle after enrollment in the study; the DSR in the first screen cycle was used if it was the only available diary.
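The scoring rules above can be sketched as follows. The data layout (one list of daily ratings per symptom, with `None` for a missing day and index 0 as cycle day 1) and the names `carry_forward`, `window_sum`, and `total_score` are assumptions made for illustration; the two-symptom diary is invented toy data.

```python
def carry_forward(ratings):
    """Fill missing days (None) with the previous day's rating."""
    filled = []
    for rating in ratings:
        if rating is None and filled:
            rating = filled[-1]
        filled.append(rating)
    return filled


def window_sum(ratings, start, stop):
    """Sum one symptom's ratings over days[start:stop] after filling gaps."""
    window = carry_forward(ratings)[start:stop]
    if None in window:
        raise ValueError("no earlier rating to carry forward")
    return sum(window)


def total_score(diary, start, stop):
    """Sum the per-symptom window scores across all symptoms in the diary."""
    return sum(window_sum(r, start, stop) for r in diary.values())


# Toy 28-day cycle for two symptoms; index 0 is cycle day 1 (first day of bleeding)
diary = {
    "mood_swings": [2, 1, 1, 0, 0, 1, 0, 0, 1, 0] + [0] * 12 + [2, 3, None, 3, 4, 4],
    "cramps":      [3, 2, 1, 0, 0, 0, 0, 0, 0, 0] + [0] * 12 + [0, 1, 1, 2, 2, 3],
}
cycle_length = 28  # the next menses begins on the day after index 27

premenstrual = total_score(diary, cycle_length - 6, cycle_length)  # 6 days before menses
postmenstrual = total_score(diary, 4, 10)                          # cycle days 5-10
print(premenstrual, postmenstrual)  # 28 2
```

The missing mood swings rating on the third premenstrual day is filled with the previous day's value of 3 before summing. A missing first day has no earlier value to carry forward, so this sketch raises an error rather than guess.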
Functional impairment was rated in each menstrual cycle on a 5-point scale (0–4: none, mild, moderate, a lot, severe) in the domains of work, family life, and social activities. Clinical and demographic descriptors were obtained from the self-report questionnaire completed at enrollment.
We divided the cases in the developmental sample into two groups, termed PMS and not PMS, to establish a PMS group for the analysis. Because there is no consensus definition of PMS, we employed our previously reported criteria for PMS that have demonstrated reliability and validity in clinical trials.12–15,20 These criteria were applied to the DSR scores in an untreated menstrual cycle as follows: a total premenstrual DSR score >80, a total postmenstrual score >40, a 50% or greater difference between the postmenstrual and premenstrual scores, and a score ≥2 (moderate to severe) on at least one functional impairment item.
All cases with a complete DSR (n=1081) were divided by a random split to provide the developmental dataset (n=541) and a validation dataset (n=540). The cases were then identified as having or not having PMS as defined. In the developmental dataset, there were 353 women with PMS (65.2%) and 188 women who did not meet the PMS criteria.
Logistic regression models were used to identify the symptoms that were independently associated with PMS. We first fit the full model of 17 DSR symptoms to provide a basis for comparison with a reduced set of core symptoms. To reduce the number of symptoms in the 17-symptom model, a backward elimination strategy removed symptoms with large p values until all remaining symptoms had p values <0.10. To reduce the potential for overfitting and selection bias, the model selection process was repeated in a bootstrap resampling procedure.21 It has been shown that this procedure also provides an effective measure of internal validation for predictive logistic models.22 The procedure was conducted by drawing 1000 samples with replacement from the developmental half of the original dataset. In each bootstrap sample, the same logistic regression and variable selection methods were repeated as described above, and the symptoms with p<0.10 were identified. The percentage of times each symptom met the criterion to remain in the model was accumulated over the 1000 bootstrap analyses. The results show the 6 items that remained in the model in more than 50% of the bootstrap samples; the other 11 items remained in <50% of the bootstrap models. Further models were then assessed using standard model fit criteria and compared with the fit characteristics of the full 17-symptom model.
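The bootstrap tally of how often each symptom is retained can be sketched as below. The resampling and counting follow the description above, but the selector is a deliberately simple, hypothetical stand-in: `toy_select` keeps a symptom when its group means differ by a fixed gap, whereas the study used backward elimination in a logistic regression at p<0.10. The toy data and all names are assumptions.

```python
import random
from collections import Counter


def bootstrap_inclusion(rows, select, n_boot=1000, seed=0):
    """Share of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]  # draw n cases with replacement
        counts.update(set(select(sample)))
    return {name: count / n_boot for name, count in counts.items()}


def toy_select(sample, min_gap=1.0):
    """Hypothetical stand-in for backward elimination: keep a symptom when its
    mean score differs between the two groups by at least min_gap."""
    pms = [scores for label, scores in sample if label]
    other = [scores for label, scores in sample if not label]
    if not pms or not other:
        return []
    kept = []
    for name in pms[0]:
        gap = (sum(s[name] for s in pms) / len(pms)
               - sum(s[name] for s in other) / len(other))
        if abs(gap) >= min_gap:
            kept.append(name)
    return kept


# Toy cases: (has_pms, symptom scores); mood_swings separates the groups, headache does not
rows = [(i % 2 == 1, {"mood_swings": 3 if i % 2 else 1, "headache": i % 3})
        for i in range(40)]
inclusion = bootstrap_inclusion(rows, toy_select, n_boot=200, seed=1)
core = sorted(name for name, share in inclusion.items() if share > 0.5)
print(core)
```

Passing the selector in as a function keeps the resampling logic separate from the model-fitting step, so the same tally would work with a real backward-elimination routine in place of `toy_select`.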
We also fit models with summed symptom scores to provide a single unweighted predictor score as a more clinically relevant approach. The models obtained from summed symptom scores were compared to a weighted summary based on logistic model predictions that were obtained from single symptom scores using standard model criteria. External validation was conducted. Models fit to the developmental data were applied to the validation dataset to assess the true performance of the derived prognostic models in the population. Model criteria were computed to determine the degree to which the model based on the developmental data discriminated cases of PMS in the validation data.
Optimal cutoff points for predicting the probability of PMS were computed from the summed symptom score models. The selection of the cutoff point was determined by the maximum percent of subjects correctly classified as having PMS in the developmental dataset. Given a particular cutoff score, sensitivity and specificity were evaluated. The same cutoff points were applied to the validation data, and validation sensitivity and specificity were compared to the corresponding values in the developmental data.
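The cutoff search described above can be sketched as follows. The rule that a score at or above the cutoff predicts PMS, the tie-breaking toward the lowest cutoff, and the toy scores are assumptions for illustration.

```python
def classify_stats(scores, has_pms, cutoff):
    """Accuracy, sensitivity, and specificity when score >= cutoff predicts PMS."""
    pairs = list(zip(scores, has_pms))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)       # PMS cases flagged
    tn = sum(1 for s, y in pairs if not y and s < cutoff)    # non-cases not flagged
    pos = sum(1 for _, y in pairs if y)
    neg = len(pairs) - pos
    return (tp + tn) / len(pairs), tp / pos, tn / neg


def best_cutoff(scores, has_pms):
    """Cutoff with the highest share correctly classified (lowest cutoff wins ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: classify_stats(scores, has_pms, c)[0])


# Toy developmental data: summed premenstrual scores and PMS status
dev_scores = [10, 20, 25, 30, 35, 40, 45, 50]
dev_pms = [False, False, False, True, False, True, True, True]

cutoff = best_cutoff(dev_scores, dev_pms)
accuracy, sensitivity, specificity = classify_stats(dev_scores, dev_pms, cutoff)
print(cutoff, accuracy, sensitivity, specificity)  # 30 0.875 1.0 0.75

# The cutoff is then applied unchanged to separate (toy) validation data
val_scores = [12, 28, 31, 33, 44, 52]
val_pms = [False, False, True, False, True, True]
print(classify_stats(val_scores, val_pms, cutoff))
```

Choosing the cutoff on the developmental data and only evaluating it on the validation data mirrors the comparison of sensitivity and specificity across the two datasets.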
Demographic characteristics were compared between groups with chi-square tests. Symptom scores were initially examined using Pearson correlation coefficients to identify pairwise correlations and Cronbach's alpha coefficient to estimate individual item associations and the overall internal consistency of the symptom scores. Statistical tests were 2-sided, with p≤0.05 considered significant, with the exception of the bootstrapping validation procedures as described.
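For reference, Cronbach's alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the function name and toy ratings are assumptions, not study data.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per symptom, aligned so position i is the same woman."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]       # each woman's summed score
    item_variance = sum(variance(col) for col in items)    # sample variance per symptom
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Toy ratings for three symptoms across five women
items = [
    [4, 10, 15, 20, 22],
    [5, 9, 14, 18, 24],
    [3, 11, 13, 21, 20],
]
print(round(cronbach_alpha(items), 3))
```

Values close to 1 indicate that the items rise and fall together, which is the sense in which the text later reports high internal consistency for the 17 symptoms.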
The mean age in the developmental sample was 33.2 years (standard deviation [SD] 6.86); 78% were employed; 83% had education or training beyond high school; 59% were white, 23% were African American, and 18% were of other or unknown racial status. Comparisons of the demographic variables between the PMS and not PMS groups showed no significant differences, with the exception of race, which was more likely to be white in the PMS group. Comparisons of the variables between the developmental dataset and the validation dataset showed no significant differences.
Table 1 shows the mean premenstrual scores for the 17 DSR symptoms. All symptom scores were significantly higher in the PMS group than in the not PMS group. Affective symptoms (mood swings, anxiety/tension, irritability, feeling out of control) were the most severe symptoms on average and had the largest mean difference between PMS and not PMS groups. Physical symptoms, such as breast tenderness, headaches, and cramps, had the lowest mean scores on average and the least difference between the PMS and not PMS groups. The symptoms had high internal consistency (Cronbach's alpha=0.95). Pairwise correlations of the symptoms were moderate to high. Among the highest correlations were mood swings, irritability, depression, and hopelessness; each was correlated with feeling out of control at r>0.70. Other high correlations included mood swings with anxiety and irritability, depression with hopelessness, and poor concentration with poor coordination, all at r>0.70.
Six of the 17 daily-recorded symptoms met the retention criterion (p<0.10) as predictors of PMS after adjusting for all other symptoms in the model. The retained predictors of PMS were appetite/food cravings (p=0.003), decreased interest in activities (p=0.013), mood swings (p=0.016), cramps (p=0.010), aches (p=0.051), and increased anxiety/tension (p=0.055). The odds ratios (ORs) for these symptoms (Table 2) indicate that the likelihood of PMS increased 5%–8% with each unit increase in the symptom score after adjustment for all other variables in the model.
The internal validation procedures (bootstrap analysis) were applied to the 17-symptom model, and the same 6 symptoms were identified as statistically significant predictors of PMS. Table 2 shows the 6 significant symptoms as confirmed in the validation analysis: anxiety, aches, mood swings, food cravings, no interest in activities, and cramps (each at p<0.01). These 6 symptoms identified PMS as well as 17 symptoms, as indicated by the model criteria scores, which were nearly identical in the 17-symptom and 6-symptom models (Table 2).
Sadness/depression was not a significant predictor of PMS in either the 17-symptom or the 6-symptom model. However, because depression is a clinically important symptom, we examined a 7-symptom model that retained the depression symptom. The results of the 7-symptom model were nearly identical to the results for the 6-symptom model (Table 2), and inclusion of the depression symptom added nothing further to the discrimination of PMS.
Irritability was reported by most participants regardless of their PMS status and did not discriminate between PMS and not PMS (OR 1.01, confidence interval [CI] 0.94-1.08, p=0.83). Irritability was also highly correlated with mood swings (r=0.84) and anxiety/tension (r=0.84), symptoms that proved to be stronger independent predictors of PMS among these treatment-seeking women.
We then compared the performance of a summary score (the total premenstrual symptom score), rather than the logistic regression of single symptom score, as a more efficient approach for clinical practice. The results showed that the summary premenstrual score for 6 symptoms identified PMS as well as the logistic regression scores for single symptoms that are shown in Table 2 (area under the curve [AUC]=0.83). The summary premenstrual score for the 17 symptoms also performed as well as the 17-item logistic model for single symptoms (AUC=0.84). These results indicate that little information was lost by summing the individual symptom scores directly instead of using weights derived via logistic regression.
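The AUC of an unweighted summary score can be computed directly from its definition: the probability that a randomly chosen PMS case has a higher score than a randomly chosen non-case, with ties counted as one half. The sketch below uses invented toy data, and the function name `auc` is an assumption.

```python
def auc(scores, has_pms):
    """Probability that a random PMS case outscores a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, has_pms) if y]
    neg = [s for s, y in zip(scores, has_pms) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy cases: two symptom scores per woman, summed without weights
cases = [(4, 6), (9, 11), (12, 13), (14, 16), (17, 18), (19, 21), (22, 23), (24, 26)]
has_pms = [False, False, False, True, False, True, True, True]

summed = [sum(case) for case in cases]
print(auc(summed, has_pms))  # 0.9375
```

Because the AUC depends only on how the scores rank the cases, a plain sum can perform as well as a weighted logistic prediction whenever the two produce nearly the same ordering.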
Figure 1 depicts the receiver operating characteristic (ROC) curves for the summary score derived from all 17 symptoms and the summary score derived from the 6 symptoms. As Figure 1 shows, the discriminative ability of the 6-symptom score is nearly identical to that obtained using 17 symptoms (p=0.557).
Table 3 shows cutoff points in DSR scores that can be used to discriminate PMS in prospective symptom ratings. We reviewed the full range of the total premenstrual DSR scores and show selected cutoff points for the greatest number of cases correctly classified. Supporting the hypothesis of the study, the results show that classifications for 6 symptoms consistently perform as well as the classifications for 17 symptoms. We have used the cutoff point of 80 in the 17-symptom model in many PMS clinical trials.12–14,16 The present study shows that this cutoff point correctly classifies 86% of the cases, whereas 84% of the cases are correctly classified with a cutoff point of 33 in the 6-symptom model. Selecting a classification cutoff point balances the specificity and sensitivity of the measure and varies with the objectives of its use (higher specificity may underidentify PMS, and higher sensitivity may overidentify PMS). The cutoff points provided in Table 3 are in the approximate midrange of specificity for this measure.
The models created in the developmental dataset were fit to the validation dataset. The results indicate that these models are replicable and not simply overfitting the idiosyncrasies of these particular data. The nearly identical AUC and Brier scores in the two datasets indicate there was no degradation in discrimination of PMS. (Validation results are shown in Table 2 in italics [v]).
Approximately 95% of all participants reported some level of impairment in at least one of the assessed domains. PMS was reported to interfere most often in the family domain, followed by the social and work domains. Ratings of severe impairment (ratings of 3 or 4) were reported by 56% for the family domain, 51% for the social domain, and 47% for the work domain among all participants in the developmental dataset (Table 1). Estimates of the association of symptoms with impairment indicated that the likelihood of severe impairment increased approximately 2% with each unit increase in the summed 17-symptom score (OR 1.02, 95% CI 1.01-1.02, p<0.001) and approximately 3% with each unit increase in the summed 6-symptom score (OR 1.03, 95% CI 1.03-1.04, p<0.001).
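Per-unit odds ratios compound multiplicatively in a logistic model, so a small per-unit OR can correspond to a sizable change over a larger score difference. The sketch below applies this to the two ORs quoted above; the 10-point difference is an arbitrary illustration, not a figure from the study.

```python
def odds_multiplier(or_per_unit, units):
    """Factor by which the odds change for a given increase in the score."""
    return or_per_unit ** units


print(round(odds_multiplier(1.02, 10), 2))  # 17-symptom score, 10 points higher -> 1.22
print(round(odds_multiplier(1.03, 10), 2))  # 6-symptom score, 10 points higher -> 1.34
```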
Three major findings deserve emphasis. First, 6 symptoms discriminated PMS as well as 17 symptoms rated in daily diaries by women who were seeking treatment for PMS. These parsimonious results clearly indicate that a small number of prospectively rated symptoms can identify the likelihood of PMS in women who seek treatment. With >90% power to detect the core symptoms, the likelihood that the excluded symptoms were false negative (type II error) results is low. Identifying a small group of symptoms that discriminate PMS among women seeking treatment also supports the concept that a clinical diagnosis for PMS might be developed around a core symptom group.
Second, the symptoms that discriminated PMS represent three domains that are widely believed to describe the syndrome and provide further evidence that PMS is a multifactorial disorder that encompasses both emotional and physical symptoms. The strongest independent predictors of PMS were mood swings and anxiety/tension, both in the emotional symptom domain and long considered predominant PMS symptoms.23–25 A recent community-based survey of premenstrual symptoms reported by women in Europe and Latin America also reported that mood swings was one of the most prevalent and severe symptoms and the leading emotional symptom experienced by these women.23 In the behavioral domain, decreased interest in activities and appetite changes/food cravings were independent predictors of PMS; aches and cramps were the independent predictors of PMS in the physical domain.
Although irritability was among the most frequently reported and severe symptoms in this study, it was not a discriminator of PMS. Irritability has long been considered a cardinal PMS symptom and is among the most responsive symptoms to selective serotonin reuptake inhibitor (SSRI) treatment for the disorder.23,24,26 Nearly all women reported irritability, however, and it was also highly correlated with half of the other PMS symptoms, which precluded its ability to discriminate PMS. These findings suggest that irritability is part of the condition but is a common symptom that does not specifically define PMS.
The depression symptom also did not discriminate PMS in these data. We further examined depression in a 7-symptom model, but its addition did not alter the results of the 6-symptom model and did not add to the prediction of PMS. This suggests that depressed mood does not have a primary role in pure PMS. Depressive symptoms are among the core symptoms of PMDD as listed in the DSM-IV, although whether depressed mood is a core component of PMDD is also not clearly demonstrated.
Although the present study was not designed to compare PMS and PMDD, observations indicated a large overlap in the daily symptom ratings: 47% (255 of 541) of the PMS group also met criteria for PMDD as defined in a single menstrual cycle. As expected, the total symptom scores were higher in the PMDD group (more symptoms were required). Mood swings and decreased interest were even more predominant in the PMDD group, whereas aches were slightly higher in the PMS group. Other studies are needed to determine if there are differences in the primary symptoms of PMS and PMDD.
Third, nearly all participants reported some level of impairment in at least one domain when they sought treatment, and impairment ratings were strongly associated with the DSR scores. A previous study that evaluated criteria to identify PMS indicated that the respondents experienced reduced work productivity and quality of life regardless of the severity criteria of PMS.5 The present findings demonstrate the increase in impairment with increasing symptom severity and add further support to the evidence that PMS is strongly associated with diminished functioning in relationships and the normal activities of daily life.5,27
The sensitivity analysis showed that use of 6 symptoms classified the PMS cases nearly identically to use of 17 symptoms and further supported the hypothesis that a small number of symptoms rated daily can discriminate PMS as well as a longer symptom list. The classification results provide a balance between specificity and sensitivity and allow the clinician or researcher to select more or less restrictive cutoff points that best meet the intended objectives in their use. The classifications in the present study appear to be strong, although we know of no other studies that have identified the specificity and sensitivity of symptom scores to discriminate either PMS or PMDD. Whether correct classifications can be increased to an even higher level than the 84%–86% achieved in this study remains an important question.
Several other limitations can be considered. There is no demonstrated gold standard definition of PMS. The criteria that were used to define PMS in this study have previously demonstrated reliability and validity, but the use of other criteria might yield different results.5 The data represent generally healthy women who seek medical treatment for premenstrual symptoms and may not encompass the entire heterogeneous PMS population, particularly women with other physical or psychiatric disorders that commonly have premenstrual exacerbations. The study identified a parsimonious number of symptoms that discriminate between PMS and not PMS that can be used to evaluate women who believe they have the disorder. However, the study was not designed to provide a validated daily diary or a diagnosis of PMS, and further studies that address these objectives are needed. It also remains for other studies to identify the associations of the identified core symptoms with treatment response, which is the essential measure of their clinical utility.
The strengths of the study include prospective DSRs that were completed before any treatment interventions, appropriate statistical power, and rigorous analysis that demonstrated notable consistency in the findings and evidence that the results are not idiosyncratic to the study sample. The findings indicate that 6 symptoms can discriminate PMS as well as 17 symptoms when prospectively rated in daily diaries to confirm PMS. The findings suggest that the burden of daily diaries could be reduced by using a smaller number of symptoms and also suggest that a clinical diagnosis for PMS might be developed around a core symptom group. Further studies are needed to construct and validate a brief daily diary and to develop the criteria for a widely accepted diagnosis of PMS.
This work was supported by grants from the National Institutes of Health; the Eunice Kennedy Shriver National Institute of Child Health and Human Development: RO1 HD018633, and the National Cancer Institute: T32 CA93283.
During the last 3 years, E.W.F. received research support (issued to the University of Pennsylvania) from Wyeth, Xanodyne Pharmaceuticals, and Forest Research Institute and honoraria from or served as a consultant to Wyeth, Pherin Pharmaceuticals, Bayer Health Care, and Forest Research Institute. During the past 12 months, K.R. received honoraria and served as a consultant or on an advisory board to Pfizer, Inc., and received research grants (issued to the University of Pennsylvania) from Bristol-Myers Squibb, Epix Pharmaceuticals, Renaissance Pharmaceuticals, Inc. (PgxHealth LLC), Pamlab, Pfizer, Inc., and Wyeth. M.D.S., S.M.H., J.M.L., and H.L. have no competing financial interests.