There are many studies based on self-reported menstrual cycle length, yet little is known about the validity of this measure. The authors used data collected in 1990 from 352 women born in Chicago, Illinois, aged 37–39 years. Women reported their usual cycle length and behavioral and reproductive characteristics at study enrollment and then completed daily menstrual diaries for up to 6 months. The authors compared this observed cycle length (geometric mean) with the reported length by using kappa coefficients. To assess systematic effects, they performed linear regression of the difference between reported and observed cycle length. Agreement between observed and reported cycle length was moderate. The crude overall kappa coefficient was 0.33; the kappa adjusted for within-woman sampling variability was 0.45 (95% confidence interval: 0.36, 0.55). On average, women overestimated their cycle length by 0.7 days (95% confidence interval: 0.3, 1.0). Reporting by sexually active women and women with a history of infertility was more accurate. Parity, body mass index, prior medical evaluation for irregular cycles, and exercise were all associated with systematic reporting differences. Studies that rely on self-reported cycle length could be prone to artifactual findings because of systematic covariate effects on reporting.
Menstrual cycle length is a noninvasive clinical marker of reproductive function (1). Additionally, it has been used to assess the reproductive effects of environmental and occupational exposures such as organic solvents (2, 3), organochlorines (4–7), and pesticides (8). Menstrual cycle length has also been investigated as a predictor of health outcomes including breast cancer (9, 10) and cardiovascular disease risk factors (11). Several of these studies assessed cycle length through the participant's self-report. This method of data collection is susceptible to misclassification and systematic error. Prospective records of menstrual cycle length can be more accurate, but they require much more intensive participation by respondents, with daily recording of bleeding information for an extended period of time. A balance is needed between data accuracy and participant burden. In striking this balance, it would be helpful to know how well self-reported cycle length corresponds to prospectively collected menstrual cycle length.
We carried out an analysis to describe the accuracy of menstrual cycle length reporting. We sought to identify demographic, behavioral, and reproductive characteristics associated with accuracy and any characteristics that have systematic effects on the reporting of cycle length.
The study population was recruited from the daughters of women who participated in a randomized trial of diethylstilbestrol (DES) treatment during pregnancy in Chicago, Illinois, from 1950 to 1952. Enrolled in this trial were 2,162 women, 516 of whom were excluded because of loss of pregnancy, relocation out of the area, or failure to comply with the treatment regimen. The remaining 1,646 women gave birth to 805 females. Of these daughters, 542 (67 percent) completed a telephone interview in early 1990, which collected information on demographic, anthropometric, medical, reproductive, and menstrual characteristics. Eligibility for a prospective study of menstrual cycles was determined during this interview. Women were ineligible for the prospective study of menstrual cycle function if they were currently pregnant, breastfeeding, or using oral contraceptives or other exogenous hormones or had undergone a hysterectomy. Of 453 eligible women, 432 (95 percent) agreed to participate.
Those who agreed were sent a prospective menstrual diary divided into weeks, and each week's data were mailed back at the end of the week. Participation in the diary part of the study lasted for 6 months or until November 25, 1990, whichever occurred first. Fifty women were excluded from the study because they did not return enough cards to determine the length of at least one full menstrual cycle. In addition, data were excluded for women who became ineligible before contributing one menstrual cycle (n = 16), had no bleeding during their participation (n = 2), had only spotting (n = 2), or had more days of bleeding than no bleeding (n = 2), leaving 360 women in the diary study (79 percent of eligible women).
In the telephone interview, the women were asked, “How long is your menstrual cycle, on average? In other words, how many days are there from the first day of one menstrual period to the first day of the next period?” Five women did not answer this question and were excluded. One woman reported a usual cycle length of 120 days, which was almost three times her observed cycle length; we assumed this report was an error, and she was excluded.
We defined menstrual cycle length in the prospective study as the number of days from the first day of one menses up to the first day of the next. The beginning of a menses was the first of 2 consecutive days of bleeding, at least one of which had to be more intense than spotting. It should be noted that whether or not a premenstrual day of spotting is included as menses should not bias the estimate of mean cycle length (either the reported mean or the observed prospective mean). Inclusion of an initial spotting day would lengthen one cycle by a day and shorten the preceding cycle by a day, whereas excluding a spotting day would shorten one cycle and lengthen the other; in either scenario, the mean cycle length is unchanged. We required the first day of menses to be preceded by at least 3 days without bleeding; if information on any of the preceding 3 days was missing, day 1 for that cycle could not be calculated. Women began the daily diary study upon receipt of the mailed diary, which could have occurred at any time during the menstrual cycle. Women were asked to record their last menstrual period at the time of diary receipt. Thus, the length of the first cycle in our study was calculated as the time from a woman's reported last menstrual period until the first menses in the prospective study. We investigated whether these initial cycles were comparable in length to other prospective cycle lengths by computing a median cycle length for each woman and assessing how close the first cycle was to the median.
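The onset rule described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the diary coding (0 = no bleeding, 1 = spotting, 2 = bleeding heavier than spotting, `None` = missing) and the function names are hypothetical, and the sketch assumes that spotting counts as bleeding both for the 2 consecutive bleeding days and for the 3 preceding bleed-free days.

```python
from typing import List, Optional

# Hypothetical diary codes; None marks a day with no diary entry.
NO_BLEED, SPOTTING, BLEEDING = 0, 1, 2


def menses_onsets(diary: List[Optional[int]]) -> List[int]:
    """Return the indices of diary days that qualify as day 1 of a menses."""
    onsets = []
    # A candidate needs 3 preceding days and 1 following day in the diary.
    for day in range(3, len(diary) - 1):
        today, tomorrow = diary[day], diary[day + 1]
        if today is None or tomorrow is None:
            continue
        # Two consecutive days of bleeding (spotting counts as bleeding here).
        if today == NO_BLEED or tomorrow == NO_BLEED:
            continue
        # At least one of the two days must be heavier than spotting.
        if max(today, tomorrow) < BLEEDING:
            continue
        # The 3 preceding days must all be recorded and free of bleeding.
        prior = diary[day - 3:day]
        if any(p is None or p != NO_BLEED for p in prior):
            continue
        onsets.append(day)
    return onsets


def cycle_lengths(onsets: List[int]) -> List[int]:
    """Days from the first day of one menses up to the first day of the next."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]
```

With this coding, a spotting day followed by a heavier day starts a menses on the spotting day, and a missing entry in the 3 days before a bleed leaves that onset undetermined.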
If 7 or more consecutive days of data were missing, starting 15 or more days after the first day of the last menses, then that cycle was considered missing. Otherwise, these days were presumed to be nonbleeding days.
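A rough sketch of this missing-data rule follows. It takes the rule literally (a run of missing days counts only if the run itself begins 15 or more days after the onset), which is one possible reading; the function names and the use of `None` for a missing diary day are assumptions made for illustration.

```python
from typing import List, Optional


def cycle_is_missing(diary: List[Optional[int]], onset: int) -> bool:
    """True if a run of >= 7 missing days begins >= 15 days after the onset."""
    run_start = None
    for day in range(onset, len(diary)):
        if diary[day] is None:
            if run_start is None:
                run_start = day  # first day of a new run of missing entries
            run_length = day - run_start + 1
            if run_length >= 7 and run_start - onset >= 15:
                return True
        else:
            run_start = None  # a recorded day ends the current run
    return False


def presume_nonbleeding(diary: List[Optional[int]]) -> List[int]:
    """Treat remaining missing days as nonbleeding days (coded 0 here)."""
    return [0 if entry is None else entry for entry in diary]
```

For simplicity the scan runs to the end of the diary rather than stopping at the next menses.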
On the basis of these criteria, two additional women did not contribute at least one identifiable menstrual cycle. Thus, 352 women remained in our analyses.
Data were collected on demographic factors, menstrual and reproductive history, and lifestyle factors during the telephone interview. The perceived stress level of the participant was assessed through the short version of the Cohen perceived stress scale (12): “During the past month, how often have you felt that things were going your way, Never, Almost Never, Sometimes, Fairly Often or Very Often?”; “During the past month, how often have you felt confident about your ability to handle your personal problems?”; “During the past month, how often have you felt difficulties were piling up so high that you could not overcome them?”; and “During the past month, how often have you felt that you were unable to control the important things in your life?” Responses to these four questions were combined to yield a summary measure ranging from 0 to 16. We classified this scale for analysis into low (0–2), medium (>2–6), and high (>6) categories.
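The scoring can be illustrated with a short sketch. The text gives only the 0 to 16 range and the three categories; the 0 to 4 coding of the response options and the reverse scoring of the two positively worded questions are assumptions here (this is the usual convention for the scale, but the text does not state it), and all names are hypothetical.

```python
# Assumed coding of the five response options (0 to 4).
RESPONSES = {
    "Never": 0,
    "Almost Never": 1,
    "Sometimes": 2,
    "Fairly Often": 3,
    "Very Often": 4,
}


def stress_score(going_your_way: str, confident: str,
                 piling_up: str, no_control: str) -> int:
    """Sum four items; the two positively worded items are reverse scored (assumed)."""
    positive = (going_your_way, confident)
    negative = (piling_up, no_control)
    return (sum(4 - RESPONSES[r] for r in positive)
            + sum(RESPONSES[r] for r in negative))


def stress_category(score: int) -> str:
    """Categories from the text: low (0-2), medium (>2-6), high (>6)."""
    if score <= 2:
        return "low"
    if score <= 6:
        return "medium"
    return "high"
```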
For each woman, we averaged the natural logarithms of the observed menstrual cycle lengths and exponentiated this average to obtain the geometric mean. We assessed agreement between self-reported menstrual cycle length and observed geometric mean menstrual cycle length through weighted kappa coefficients. We divided cycle length into 14 categories: <23.5 days, 1-day increments from 23.5 to <35.5, and ≥35.5.
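These steps can be sketched as follows. The geometric mean and the 14 categories follow the description above; the weighting scheme for kappa is not stated in the text, so linear disagreement weights are assumed here, and the function names are illustrative only.

```python
import math
from typing import Sequence


def geometric_mean(lengths: Sequence[float]) -> float:
    """Average the natural logs of the cycle lengths, then exponentiate."""
    return math.exp(sum(math.log(x) for x in lengths) / len(lengths))


def category(length: float) -> int:
    """Map a cycle length in days to one of the 14 categories (0..13)."""
    if length < 23.5:
        return 0
    if length >= 35.5:
        return 13
    return int(math.floor(length - 23.5)) + 1  # 1-day bins from 23.5 to <35.5


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 14) -> float:
    """Weighted kappa with disagreement weights proportional to |i - j| (assumed)."""
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]  # observed joint proportions
    for i, j in zip(a, b):
        obs[i][j] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    d_obs = sum(abs(i - j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1.0 - d_obs / d_exp
```

Under this binning a reported length of 28 days falls in the 27.5 to <28.5 category, so a woman whose observed geometric mean is 27.4 days lands one category lower.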
The difference between the reported cycle length and the observed geometric mean cycle length (in days) was calculated for each woman. To identify factors associated with systematic under- or overreporting of cycle length, this difference was modeled with linear regression including the inverse square of the standard error of the mean for each woman as the weight. The 10 women for whom only one cycle of prospective data was available were removed from the analysis. Four women contributed two cycles of exactly the same length. For each of those women, we treated her standard deviation as 1 to avoid infinite weights. To build the regression model, all covariates were assessed as individual predictors. Those that were significant at p ≤ 0.2 were entered into the model simultaneously. We then built a predictive model by backward elimination. Those variables that remained important (p < 0.1) were retained in the final model.
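The weighting scheme can be illustrated with a simplified sketch. It assumes the standard error of the mean is computed from the raw cycle lengths, and it fits a single covariate by weighted least squares rather than the multivariable model with backward elimination described above; the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def inverse_variance_weight(lengths: Sequence[float]) -> float:
    """Weight = 1 / SEM^2, where SEM = SD / sqrt(number of cycles)."""
    n = len(lengths)
    if n < 2:
        raise ValueError("women with a single cycle were removed from this analysis")
    sd = statistics.stdev(lengths)
    if sd == 0:
        sd = 1.0  # identical cycle lengths: set SD to 1 to avoid an infinite weight
    sem = sd / math.sqrt(n)
    return 1.0 / sem ** 2


def wls_fit(x: Sequence[float], y: Sequence[float],
            w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares for y = intercept + slope * x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope
```

Here `y` would hold each woman's reported minus observed cycle length and `x` a covariate such as a 0/1 indicator, so the slope is the difference in systematic reporting between the two groups.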
To assess the effects of sampling variability on accuracy, we performed a simulation. For each woman, we drew a random sample of cycles from her hypothesized true distribution, using her reported cycle length as her true mean and her observed standard deviation in the prospective study as an estimate of her variability. The number of cycles drawn for each woman was equal to the number she actually contributed to the prospective study. The kappa derived from these idealized simulations was used to standardize the corresponding empirical kappa to adjust for sampling variability in estimating the empirical averages.
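One replicate of such a simulation might look like the sketch below. The text does not specify the shape of the hypothesized true distribution, so a normal distribution is assumed here, with draws floored at 1 day so that logarithms stay defined; both choices, and the function name, are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_geometric_means(
    women: Sequence[Tuple[float, float, int]], rng: random.Random
) -> List[float]:
    """One simulated geometric mean per woman.

    Each woman is (reported_length, observed_sd, n_cycles): the reported length
    acts as her true mean, and she contributes as many simulated cycles as she
    contributed observed cycles.
    """
    simulated = []
    for reported, sd, n_cycles in women:
        # Normal draws are an assumption; the floor at 1 day keeps log() valid.
        draws = [max(rng.gauss(reported, sd), 1.0) for _ in range(n_cycles)]
        simulated.append(math.exp(sum(math.log(d) for d in draws) / n_cycles))
    return simulated
```

Each replicate's simulated means would then be categorized and compared with the reported lengths by kappa, and the kappas averaged over replicates.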
The analyses for this paper were generated with SAS software, version 9.1 of the SAS System for Windows (SAS Institute, Inc., Cary, North Carolina).
The 352 participants in this study contributed 1,970 menstrual cycles. The mean number of cycles contributed by each woman was 6 (median = 6; range, 1–8). Ninety percent of the women contributed more than 3 cycles. Most of the women were aged 38 years, most were of White race, and most had at least some college education. Of the 1,970 cycles, 287 began before diary receipt and so were only partially prospectively observed. These cycles were slightly longer overall (geometric mean of 27.7 days compared with 27.4 days without these cycles), but in the woman-specific analysis they were not significantly more often above the woman-specific median than below it; therefore, we retained these cycles in subsequent analyses.
The distribution of reported cycle length was right skewed, with peaks at 28 and 30 days (figure 1). The observed distribution showed a similar right skew, but the most frequent cycle lengths were 26, 27, and 28 days. Thus, the distribution of reported cycle length was shifted toward longer lengths compared with the observed distribution. When reported and observed cycle lengths were compared within women (figure 2), those reporting very short cycles (<25 days) underestimated their cycle length, while those reporting cycle lengths greater than 35 days overestimated their cycle length.
The overall kappa coefficient for agreement between reported and observed cycle length was 0.33. Because the reported length served as the mean for the simulated cycles, the simulation mimicked what we would expect to find if the women were perfectly accurate in their reporting and we sampled the observed number of cycles for each woman. The overall kappa comparing the simulated and the reported cycle lengths was 0.73, averaged over 1,000 simulations. Because this kappa coefficient was less than 1 (perfect agreement), we concluded that within-woman sampling variability limits the ability to detect “perfect” reporting. We therefore adjusted our overall kappa estimate from the prospective study by dividing it by the kappa obtained from the simulation. When we adjusted in this way for sampling variability, the kappa was 0.45 (95 percent confidence interval: 0.36, 0.55).
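As a worked check of this adjustment, dividing the crude kappa by the simulation kappa reproduces the reported point estimate. The subscripted symbols are labels introduced here for illustration; the text does not describe how the confidence interval was derived, so only the point estimate is shown.

```latex
\[
\kappa_{\text{adjusted}}
  = \frac{\kappa_{\text{observed}}}{\kappa_{\text{simulated}}}
  = \frac{0.33}{0.73}
  \approx 0.45
\]
```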
Table 1 shows reporting accuracy for specific subgroups of the sample. The kappa coefficient was low for women with incomes of $15,000–$29,999, whereas the kappa coefficients for women with both higher and lower incomes were much higher. Women who were sexually active and women with a history of infertility were more accurate in their reporting of cycle length. The accuracy among infertile women became more pronounced when a respondent reported she had sought care for difficulty in conceiving. There was some decrease in accuracy among the most active exercisers (>6 hours per week), but this group was small. Women in the highest perceived stress category were less accurate reporters.
Overall, participants overreported their cycle length by 0.7 days (95 percent confidence interval: 0.3, 1.0). Because women could be estimating a “usual” length that is closer to an arithmetic mean, we also examined arithmetic means instead of geometric means. The arithmetic mean was 27.8 days, and the within-woman average difference was 0.43 days (95 percent confidence interval: 0.05, 0.82); thus, the overreporting persisted.
Linear regression of the difference between reported and observed cycle lengths identified variables that predicted differential reporting errors. Factors related to systematic reporting effects were prior medical evaluation for irregular cycles, recent changes in the duration of menstrual bleeding, parity, body mass index, and exercise (table 2). Our adjusted analysis did not indicate which group was the most (or least) biased, since the beta coefficients were relative comparisons and did not identify the group closest to zero (no bias). However, relative differences between subgroups are shown in table 2. For example, among women with a prior medical evaluation for irregular cycles, the estimated difference (reported − observed) was 0.6 days less than the estimated difference for women with no such evaluation. This systematic reporting difference between subgroups was statistically significant (p = 0.01, two sided).
To provide examples of how reporting differences could affect potential study findings, we examined cycle length by infertility status, smoking, and body mass index (table 3). In our prospective data, women with a history of infertility had a significantly longer cycle length compared with those without such a history. This difference was less apparent when the same women were compared by using their self-reported cycle length. Body mass index showed a similar pattern. For smoking status, there was no detectable difference in the observed cycle length between current, former, and never smokers. However, the reported cycle length indicated an association, with shorter cycles for current smokers. These patterns indicated that differential reporting can either obscure associations present in the prospective data or suggest associations not found in the prospective data.
Reported menstrual cycle length showed moderate agreement with observed cycle length. The reported length slightly overestimated the observed cycle length in this study sample. This difference was not explained by the difference between the arithmetic mean and the geometric mean. It is possible that digit preference explains some of this overestimation because women tended to report 28 and 30 days as their usual cycle length, while the overall observed geometric mean of the woman-specific means was approximately 27 days. Digit preference is unlikely to be the only explanation for the shift, however, because, as the scatter plot shows (figure 2), overestimation occurred at all but the shortest reported menstrual cycle lengths.
Our data support the use of the first partially observed cycle in prospective studies. In our sample, these cycles were not significantly different from the woman's cycle length pattern in the prospective study (as defined by the median). These first cycles were slightly longer than the average of the purely prospective cycles. This finding would be expected based on length-biased sampling: the day on which a woman received her diary was a random event, but the longer a cycle, the higher the probability of receiving the diary during that cycle.
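The length-biased sampling argument can be illustrated numerically. In this sketch, which uses made-up cycle lengths rather than study data, a day is chosen uniformly at random and the cycle containing it is recorded; longer cycles cover more days, so they are selected more often than their share of cycles. Function names are hypothetical.

```python
import random
from typing import Sequence


def mean_length(cycles: Sequence[int]) -> float:
    """Ordinary mean cycle length, with each cycle counted once."""
    return sum(cycles) / len(cycles)


def length_biased_mean(cycles: Sequence[int]) -> float:
    """Expected length of the cycle containing a uniformly random day.

    A cycle of length L is picked with probability L / sum(L), so the
    expectation is sum(L^2) / sum(L).
    """
    return sum(c * c for c in cycles) / sum(cycles)


def cycle_containing_random_day(cycles: Sequence[int], rng: random.Random) -> int:
    """Pick a random day and return the length of the cycle it falls in."""
    day = rng.randrange(sum(cycles))
    for length in cycles:
        if day < length:
            return length
        day -= length
    raise AssertionError("unreachable: day always falls inside some cycle")
```

For a hypothetical woman alternating between 25-day and 35-day cycles, the ordinary mean is 30 days, while the cycle in progress on a random day averages about 30.8 days.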
We could not assess the effects of age on reporting accuracy and systematic reporting effects. Our population was derived from a randomized trial of DES treatment during pregnancy; thus, all of the offspring were approximately the same age, which may limit the generalizability of our results. The potential effect of age on reporting is unclear. One previous study found no difference in reporting accuracy among women <30 years of age compared with women ≥30 years of age (13). In contrast, other studies found older women to be more accurate in their reporting (14, 15).
Although DES exposure did not predict reporting accuracy or systematic reporting differences, this population as a whole may be more aware of their reproductive characteristics because of their association with the DES trial. It is possible that the women in our study were the most active and vigilant participants from the DES trial. If so, reporting by women in our study may be better than that in a random sample of the population. Our study therefore may represent a best-case scenario for reporting of menstrual cycle length.
The factors that women consider when quantifying their menstrual cycle length are unclear. They may be influenced by the most recent cycle, which is probably the easiest to remember even if it is not the most representative. Furthermore, as women grow older, their cycle length shortens (16); thus, older women may be reporting a lifetime estimate of cycle length that does not reflect their more recent (shorter) cycle lengths. Additionally, some women may count only nonbleeding days (the time from the end of a menses until the beginning of the subsequent menses) when estimating “cycle length.”
Previous studies have suggested that women's self-reported cycle length is inaccurate. A World Health Organization study found that women in developing countries had difficulty accurately predicting the onset of their next bleeding episode, most often overestimating the length of the bleeding-free interval (17). Retrospective self-report of menstrual cycle changes by women aged 45–55 years has been reported to be insensitive when compared with prospective daily diary data (18). In a US-based study, the cycle length women reported at enrollment was slightly longer than the cycle length observed after participating in prospective daily recording (13). In a second US study, 38 percent of the sample had an absolute difference of more than 2 days between their average actual cycle length and their estimated length (14). However, eligibility for both of these US studies was limited to women with a cycle length of 21–35 days, and the second study also required “regular” menstrual cycles of a “consistent” length. Finally, a recent study of women in New York evaluated reporting accuracy in more detail (15). The authors found that 43 percent of women reported usual cycle lengths more than 2 days different from their prospective mean length and concluded that there is sizable measurement error in self-reported cycle length.
We found that women who were married, or unmarried but sexually active, were more accurate in their cycle length reporting compared with unmarried and celibate women. Women who are sexually active may be more aware of their menstrual cycle length in order to avoid or encourage conception. Our result agrees with that of Small et al. (15), who found that married women were more likely to be within 2 days of their mean cycle length compared with single women, but it disagrees with that of Creinin et al. (14), who found no differences by marital status. Women with a history of infertility also showed better agreement between their observed and reported cycle length, potentially as a result of heightened self-awareness in response to their condition. We did not find any previous studies assessing reporting accuracy with respect to a history of infertility. Unlike Small et al. (15), we did not find an important effect of cycle length variability on reporting, which could be due to the smaller range of variability in our study. We did not find an effect of parity on reporting accuracy, but we did observe an effect of parity on systematic differences in reporting. Previous studies are inconclusive, with one finding less accuracy among parous women (13) and two finding no effect (14, 15). In agreement with previous studies, we found no effect of education or body mass index on reporting accuracy (14, 15).
Our investigation of systematic effects of covariates on reporting suggests that studies using reported cycle length as the outcome of interest may be susceptible to artifact. For example, a study might find that women with a history of irregular menstrual cycles report shorter cycles than those without such a history. This is particularly a concern if the observed cycle length differences are half a day or less, which is approximately the magnitude of the systematic reporting differences we detected.
In sum, when asking women about their cycle length, researchers should keep in mind that reproductive and behavioral characteristics may affect reporting behavior.
This research was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences. The authors thank Dr. Matthew Longnecker and Dr. Olga Basso for their insightful comments on this manuscript.
Conflict of interest: none declared.