We assessed the reproducibility over time of a wide spectrum of biomarkers potentially involved in etiological pathways of cancer development among healthy premenopausal and postmenopausal women. Results from our study suggest that a single measure of most plasma carotenoids and fatty acids, resistin, soluble leptin receptor, bioactive somatolactogens, and some MMPs sufficiently represents average levels over a period of at least several years, and thus can reliably be used to investigate exposure-disease relationships over at least a short-term follow-up. We observed fair or nearly fair reproducibility for plasma and urinary enterolactone, urinary isothiocyanates, and plasma melatonin, but poor reproducibility for the majority of the plasma and urinary phytoestrogens and the plasma and urinary stress hormones. The poor reliability of these biomarkers will attenuate risk estimates and reduce the statistical power of prospective studies evaluating biomarker-disease relationships; thus, these biomarkers should not be used unless other factors that account for their variability are taken into consideration or higher ICCs are documented in populations with greater between-person variation (
22).
Biochemical indicators of dietary intake provide a measure of individual nutrient status that takes into account genetic, metabolic, and lifestyle factors (e.g., physical activity) as well as the intake of other nutrients (
23). This may be an advantage if biochemical status is of primary interest, but may be a disadvantage if dietary intake is the variable that might ultimately be modified. More importantly, biochemical measures are objective and are not affected by the limitations of dietary assessment methods, such as recall bias and some sources of measurement error. Isoflavones (mostly found in soy products) and lignans (present in grain-products, flaxseed, nuts and legumes) are the two main groups of phytoestrogens. Genistein and daidzein are the main phytoestrogens derived from the diet, while equol is a breakdown product of daidzein formed by intestinal bacteria (
24). Enterolactone is a mammalian lignan formed in the proximal colon via the conversion of plant lignans by intestinal microflora (
24). Cruciferous vegetables (e.g., broccoli, cauliflower, kale) are rich sources of glucosinolates that are metabolized to form isothiocyanates. With the exception of enterolactone, the ICCs of these compounds in our study were poor. This may be explained by low intake in our population, episodic intake over time, or rapid metabolism, any of which could lead to fluctuating levels of some of these biomarkers and thus to low ICCs (
25). Similar to our findings, plasma enterolactone ICCs of 0.48–0.66 over 5-week to 3-year periods have been observed in two other studies (
26,
27) and a weighted kappa statistic indicated good agreement over 8 days for urinary enterolactone (0.74) (
28). Further, low ICCs over 3 years for serum daidzein, genistein and equol (≤0.30) were observed in the New York University Women’s Health Study (
27) and only fair agreement was reported for urinary daidzein and genistein over an 8-day period (weighted kappa statistics 0.29 and 0.36, respectively) (
28).
Carotenoids are natural pigments found in fruits and vegetables and serological levels are reflective of fruit and vegetable intake (
29). In the Women’s Healthy Eating and Living Study, the reliability coefficients for plasma carotenoids over four years were fair to good, ranging from 0.47 to 0.66, but were lower than those we observed (
30). One long-term study reported that the difference between carotenoid levels measured 15 years apart did not exceed 26% (
31). An important difference between our study and prior studies is that we did not take into account plasma cholesterol or triglycerides, both of which are correlated with plasma carotenoid levels (
23). Nonetheless, these findings and our own collectively suggest that plasma measures of carotenoids are an excellent way to evaluate long-term exposure.
Blood fatty acid levels are often utilized as indicators of dietary fat consumption. Circulating fatty acid levels are tightly regulated; thus, between-person variability is low relative to within-person variability, as suggested by low between- and within-person CVs for some individual fatty acids (e.g., octanoic acid=0.01% and 0.005%, respectively). Although numerous studies have evaluated the validity and reproducibility of fatty acid biomarker levels against dietary records or food frequency questionnaires (e.g., ref. (
32)), data regarding the reproducibility of plasma fatty acid levels among adults are scarce and, to our knowledge, no studies have evaluated their stability over time. Overall, we found slightly higher ICCs for the individual fatty acids compared with the summed values. This is likely due to a decrease in between-person variability, which lowers the ICCs for the summed fatty acids, as well as the fact that individual and summed fatty acids are expressed as the percentage of total fatty acids, which more tightly constrains the between-person variation of fatty acids that make up a large proportion of the total. Whether ICCs would be similar using measurements of red blood cell fatty acids needs to be explored.
Assessing blood levels of vitamin D provides a better, more integrated measure of vitamin D status than dietary intake data alone given that sun exposure is a major contributor to vitamin D status. Two forms of vitamin D can be measured easily in human plasma: 25(OH)D, the major circulating form of the steroid hormone vitamin D, and 1,25(OH)2D, the bioactive form whose levels are tightly regulated (
33). Given the homeostatic regulation of 1,25(OH)2D, 25(OH)D is considered a better measure of overall vitamin D status (
33). Both metabolites had good reproducibility in our study.
MMPs are zinc-dependent endopeptidases involved in the degradation of the extracellular matrix and regulation of growth factors (
34). In one study, serum MMP1 (ICC = 0.88) and MMP9 (ICC = 0.63) were strongly correlated for up to two years, while the ICC for MMP3 was <0.55 (
35). This is somewhat in contrast to our results, where the reproducibility was poor for MMP9, although this may be due to differences in study design, as Linkov et al. included both pre- and postmenopausal women while our analysis of MMPs was limited to the latter group.
A limitation of the prolactin immunoassay used in prior studies is that it measures multiple forms of prolactin, which have different biological activities (
36). In contrast, the Nb2 lymphoma cell bioassay we utilized is a sensitive measure of overall somatolactogenic activity in plasma (
37). The assay measures the activity of both prolactin and growth hormone combined (
37), which may capture a more biologically relevant measure of prolactin that may be more strongly associated with cancer risk. The ICC for plasma BSL (ICC = 0.63) was higher than that observed for plasma prolactin as assessed by the immunoassay (ICC = 0.45) in the same dataset (
38).
Previous studies of plasma cortisol assessed stress hormone reproducibility over a much shorter time period, in some cases over the course of hours, and findings differed from our results. In two small studies assessing reproducibility over 1–4 hours, high intraindividual consistency was observed for plasma cortisol (ICCs=0.64–0.83) and norepinephrine (ICC=0.82) (
39,
40). In one study among 31 men over a six-week period, the authors concluded that the circadian profile of cortisol was highly reproducible; however, no ICCs were reported (
41). The low ICC for cortisol observed in our study likely reflects the diurnal nature of this hormone and its response to various endogenous and exogenous stimuli (
42). We found improvement in the ICCs for total but not free cortisol following averaging across the follicular and luteal phases. Interestingly, the ICC for cortisol decreased after adjustment for time of day of blood draw. In general, adjustment can improve ICCs if the factor explains a portion of the within-person variability (
20) but in our case, time of day explained a portion of the between-person variability.
Melatonin is secreted during the dark phase of the light-dark cycle, following a circadian rhythm of ~24 hours (
43). Urinary 6-sulphatoxymelatonin (aMT6s) is the major metabolite of melatonin measured in urine, and first-morning aMT6s levels correlate well with plasma melatonin levels measured during the previous night, reflecting pineal function (
44). However, serum melatonin has a very short half-life and is rapidly metabolized, mainly in the liver. We have previously reported an ICC of 0.72 (95%CI 0.65–0.82) for urinary aMT6s over a 3-year period among premenopausal women (
4). Based on our current findings, premenopausal plasma melatonin had a much lower ICC of 0.32, suggesting that blood levels are not a good measure of melatonin over time among younger women, possibly because one plasma sample does not reflect the nocturnal/circadian pattern of melatonin. Nonetheless, the ICC among postmenopausal women was much higher (0.63). For melatonin, we observed lower within- versus between-person CVs among postmenopausal women, while within- and between-person CVs were similar among premenopausal women. This is likely due to the fact that circadian variation of melatonin is much larger in younger compared with older women (
45).
To our knowledge, there are no other studies of the reproducibility of resistin or soluble leptin receptor over time. The excellent ICCs we observed suggest that these analytes are relatively stable over one year within postmenopausal women not using hormones.
The reproducibility of a biomarker is of particular relevance for epidemiological studies in which we often have only one biologic sample to measure exposure over a long period of time. The ICC is a good measure of reproducibility because it takes into account both between- and within-person variability. An ICC ≥0.40 indicates that a single measurement of the biomarker can reasonably represent long-term levels and that the analyte level is relatively stable within individuals over time. This is indicative of relatively low within-person and/or high between-person variation over time. In contrast, a low ICC (<0.40) is suggestive of poor reproducibility and limited stability of the analyte over time. A low ICC may be attributed to high within-person variability and/or low between-person variability, and will result in the attenuation of the relationship between exposure and disease (
46). The majority of the analytes in the current analysis displayed fair to excellent ICCs. Overall, this level of reproducibility is similar to that found for other biological variables such as blood pressure (ICC=0.60–0.64) (
47) and serum cholesterol (ICC=0.65) (
48), exposures considered to be reasonably well-measured and which are consistent predictors of disease in epidemiologic studies.
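As an illustration of how the ICC combines between- and within-person variability, the following is a minimal sketch that estimates the ICC from repeated measurements using a one-way random-effects ANOVA. It assumes a balanced design (the same number of samples per person); the function name `icc_oneway` and the example values are hypothetical, and the estimator shown is a textbook one that is not necessarily the method used in this study.

```python
from statistics import mean


def icc_oneway(measurements):
    """Estimate the ICC from a balanced set of repeated measurements.

    measurements: list of per-person lists, each holding the same number
    of repeated values (e.g., log-transformed analyte levels).
    Returns between-person variance / (between- + within-person variance).
    """
    n = len(measurements)                      # number of people
    k = len(measurements[0]) if n else 0       # samples per person
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >=2 people, each with the same number (>=2) of samples")

    grand_mean = mean(v for row in measurements for v in row)
    person_means = [mean(row) for row in measurements]

    # Mean squares from the one-way ANOVA table.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(measurements, person_means) for v in row
    ) / (n * (k - 1))

    # Variance components; a negative between-person estimate is truncated at 0.
    var_between = max((ms_between - ms_within) / k, 0.0)
    var_within = ms_within

    total = var_between + var_within
    if total == 0:
        raise ValueError("no variability in the data; ICC is undefined")
    return var_between / total


# Hypothetical data: three people, two samples each.
example = [[1.0, 1.2], [2.0, 2.1], [3.1, 2.9]]
print(round(icc_oneway(example), 2))
```

When every person's repeated values are identical, all variation is between-person and the estimate is 1; when people differ only within themselves, it falls to 0.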
Biomarker levels can be influenced by various factors including inherent individual factors (e.g., BMI, metabolism), as well as collection and laboratory procedures (e.g., date/time of blood draw); however, adjustment for various potential covariates did not substantially change any of the ICCs in our study except for cortisol and plasma melatonin.
Measurement error correction is one method to account for variability over time and to minimize its impact on effect estimates (
20). These methods use data from a reproducibility study to estimate the true relative risk given the observed relative risk and ICC (
2). Where ICCs are modest and only one analyte measurement is available, investigators can correct relative risks or correlation coefficients and their confidence intervals for random within-person variation to account for the attenuation introduced by this type of error (
2). For example, in our previous study of plasma prolactin concentrations and risk of breast cancer, correcting for within-person variability increased the relative risk comparing the median of the top versus the bottom prolactin quartile from 1.3 to 1.7 (
49). In contrast, where ICCs are high, measurement error correction will have little effect on the final estimate.
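The effect of such a correction can be sketched with the simplest form of regression dilution adjustment, in which the observed log relative risk is assumed to equal the true log relative risk multiplied by the ICC. This is an assumption of the sketch (classical random within-person error, a single exposure measurement, no uncertainty in the ICC itself); the function name `correct_relative_risk` and the ICC values below are hypothetical and are not taken from the cited analyses.

```python
import math


def correct_relative_risk(rr_observed, icc):
    """De-attenuate a relative risk for random within-person variation.

    Assumes log(RR_observed) = ICC * log(RR_true), so
    RR_true = exp(log(RR_observed) / ICC).
    """
    if rr_observed <= 0:
        raise ValueError("relative risk must be positive")
    if not 0 < icc <= 1:
        raise ValueError("ICC must be in (0, 1]")
    return math.exp(math.log(rr_observed) / icc)


# Hypothetical ICCs applied to an observed relative risk of 1.3.
for icc in (0.5, 0.9):
    print(icc, round(correct_relative_risk(1.3, icc), 2))
```

With a hypothetical ICC of 0.5 the corrected estimate moves noticeably away from the null, whereas with an ICC of 0.9 it barely changes, which mirrors the contrast drawn above between modest and high ICCs. A full analysis would also propagate the uncertainty in the ICC into the confidence interval, which this sketch does not do.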
To our knowledge, ours is the largest study assessing the reproducibility of multiple plasma and urine biomarkers. A potential limitation of our study was the delay in the processing of the samples given that NHS participants live across the entire US. Nevertheless, we have previously shown that delayed processing up to 48 hours did not affect the stability of various biochemical markers measured in blood (e.g., ref. (
50)) or urine (
4), and we confirmed the stability of the biomarkers included here prior to assessing within-person variability. The within-person variance incorporates laboratory variability such that high CVs can artificially lower the ICC. However, the CVs for most of the analytes in this study were excellent, suggesting that they had limited impact on the ICCs. Whether or not these results apply equally to both pre- and postmenopausal women warrants further study. We could not evaluate the ICC for free epinephrine; however, this may be due to the fact that we had only a spot urine sample rather than 24-hour samples. Those analytes with borderline ICCs but large confidence intervals deserve further evaluation with larger sample sizes when considering their inclusion in epidemiologic studies.
In conclusion, we found that a single measurement of most plasma carotenoids, fatty acids, and MMPs can reliably represent long-term levels. Plasma melatonin, urinary isothiocyanates and both urinary and plasma enterolactone had borderline to modest ICCs. In contrast, the low ICCs for most of the plasma and urinary phytoestrogens and plasma stress hormones indicate that these are not useful biomarkers, at least in this population. For this reason, they should not be employed in epidemiological studies until the source of their variability is further investigated, or unless higher ICCs are documented in populations with greater between-person variation. More importantly, these data suggest that for those analytes with moderate to high ICCs, one exposure assessment in longitudinal studies is sufficient for use in studies of exposure-disease relationships. Where ICCs are modest, the reproducibility data can be employed for measurement error correction to better estimate the magnitude of associations.
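The point made above about laboratory variability lowering the ICC can be illustrated with a small sketch. It assumes log-normally distributed analyte levels, for which the variance on the natural-log scale equals ln(1 + CV²), and it treats assay error as an additional, independent component of the within-person variance. The function names and the CV values are hypothetical and serve only to show the direction of the effect.

```python
import math


def log_variance_from_cv(cv):
    """Log-scale variance of a log-normal quantity with coefficient of variation cv (as a fraction)."""
    if cv < 0:
        raise ValueError("CV must be non-negative")
    return math.log(1.0 + cv ** 2)


def icc_with_assay_error(cv_between, cv_within_bio, cv_assay):
    """ICC when assay error adds to biological within-person variance."""
    var_between = log_variance_from_cv(cv_between)
    if var_between == 0:
        raise ValueError("between-person CV must be positive")
    var_within = log_variance_from_cv(cv_within_bio) + log_variance_from_cv(cv_assay)
    return var_between / (var_between + var_within)


# Hypothetical CVs: same biology, increasingly noisy assay.
for cv_assay in (0.0, 0.05, 0.20):
    print(cv_assay, round(icc_with_assay_error(0.40, 0.20, cv_assay), 2))
```

Under these assumptions a small assay CV leaves the ICC almost unchanged, while a large one pulls it down even though the underlying biology is identical.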