Temporal variability of biomarkers should be evaluated prior to their use in epidemiologic studies.
We evaluated the reproducibility, using intraclass correlation coefficients (ICCs), of 77 plasma and 9 urinary biomarkers over 1–3 years among premenopausal (n=40) and postmenopausal (n=35–70) participants from the Nurses’ Health Study (NHS) and NHSII.
Plasma and urinary stress hormones and melatonin were measured among premenopausal women while melatonin and the remaining biomarkers were measured in postmenopausal women. ICCs were good to excellent for plasma carotenoids (0.73–0.88), vitamin D analytes (0.56–0.72), bioactive somatolactogens (0.62), soluble leptin receptor (0.82), resistin (0.74), and postmenopausal melatonin (0.63). Reproducibility was lower for some of the plasma fatty acids (0.38–0.72), matrix metalloproteinases (0.07–0.91), and premenopausal melatonin (0.44). The ICCs for plasma and urinary phytoestrogens were poor (≤0.09) except for enterolactone (plasma=0.44, urinary=0.52). ICCs for the stress hormones among premenopausal women ranged from 0 (plasma cortisol) to 0.45 (urinary dopamine).
Our results indicate that for the majority of these markers, a single measurement can reliably estimate average levels over a 1–3 year period in epidemiological studies. Where ICCs were fair to good, reproducibility data can be used for measurement error correction. Analytes with poor ICCs should only be used in settings with multiple samples per subject or in populations where ICCs have been shown to be higher.
This paper summarizes the feasibility of the use of more than 80 biomarkers in epidemiologic studies where only one biospecimen is available to represent longer-term exposure.
Currently, many epidemiological studies are integrating biological markers (i.e., biomarkers) as objective indicators of exposure when assessing disease relationships. Common sources of error and bias in biomarker studies include issues related to specimen collection and storage, laboratory assay error, and within-person variability over time (reviewed in (1)). For both cost and logistical reasons, most epidemiologic studies have collected only one biological (e.g., blood, urine) specimen from each participant; however, in studies of diseases with long latency periods such as cancer, it is critical to determine whether a single biomarker measurement accurately reflects an individual’s long-term exposure. While the lack of temporal stability plays an important role in interpreting risk estimates and may attenuate effect estimates when prospectively evaluating the association between a biomarker and disease risk (2), it has not been evaluated for many biomarkers. When the variability is moderate, error correction methods can use estimates of variability to correct measures of association (2).
Studies to assess within-person variability over time are necessary to determine whether a single measurement available in most epidemiological studies is reflective of long-term exposure. To address this, within the Nurses’ Health Study (NHS) and NHSII, we examined the reproducibility of 86 individual plasma and urinary markers of exposure potentially involved in the development of cancer among 40 premenopausal and 75 postmenopausal women over a one to three year period. The decision to measure biomarkers by menopausal status was determined by the structure of the population within which we ultimately planned to use the biomarker.
The NHS was established in 1976 among 121,700 US female registered nurses, ages 30 to 55 years, and the NHSII was established in 1989 among 116,430 female registered nurses, ages 25 to 42 years. All women completed an initial questionnaire and have been followed biennially by mailed questionnaire to update exposure status and disease diagnoses. Data have been collected on numerous risk factors including parity, hormone use, tubal ligation, and family history of cancer.
From 1989–1990, 32,826 NHS participants (ages 43–70 years) provided blood samples and completed a short questionnaire (3). Briefly, women arranged to have their blood drawn in heparin tubes and shipped with an icepack, via overnight courier, to our laboratory, where it was processed into plasma, red blood cells and white blood cells and archived in liquid nitrogen freezers.
Three hundred ninety NHS participants who gave a first blood sample in 1989–1990 were asked to collect two additional samples over the following 2 to 3 years, once in 1991 and again in 1992, using the same collection protocol and materials as in the initial collection. The women were postmenopausal, had not used postmenopausal hormones for at least three months, and had no previous diagnosis of cancer (except nonmelanoma skin cancer); these criteria were applied at each sample collection. Women were considered postmenopausal if they had no menses for at least 12 months, had a bilateral oophorectomy or, for women who had a hysterectomy without a bilateral oophorectomy, were at least 54 years of age if they were current smokers and at least 56 years if they were nonsmokers (these are the ages when natural menopause had occurred in 90% of the cohort). Of the 390 women, 186 (48%) sent two additional samples. A random sample of these women (n=40–75 per analyte) who had all three samples drawn between 6 a.m. and 12 p.m. was sent for biomarker analysis. Ninety-four percent of the samples were collected after a fast of at least 8 h.
In 2000–2001, all women who gave blood in 1990, were alive, did not participate in giving extra samples in the collection described above, and had no history of cancer diagnosis were asked to provide a second blood sample as well as a first urine sample; 18,473 women participated. The collection protocols were the same as the first collection, except that women were also asked to collect a first morning urine sample on the day of the blood draw. In 2003, 2,005 of these women also provided a second urine sample, using the same protocol and materials for blood and urine collection.
We collected data on age and height on the 1976 study questionnaire. Menopausal status and smoking status have been asked on each biennial study questionnaire, and menopausal status was also asked on the questionnaire completed at the time of blood collection. Data on current weight also were obtained at each blood collection (for one woman who did not complete the questionnaire at the first blood collection, the weight reported on her 1990 study questionnaire was used). BMI (kg/m2) was used as the measure of adiposity in these analyses. At each collection, we asked women if they currently used steroids, antidepressants, or medications for a thyroid disorder or to list all medications used in the previous week.
Between 1996 and 1999, 29,611 NHSII cohort members who were cancer-free and between the ages of 32 and 52 years provided blood and urine samples. Of these, 18,521 were premenopausal participants who provided blood and urine samples timed within the menstrual cycle; the women had not used oral contraceptives, been pregnant, or breastfed within six months. Blood samples were drawn on the 3rd to 5th day of their menstrual cycle (follicular), placed in a refrigerator for 8–24 hours by the participant, and then the plasma was aliquoted into a labeled cryotube included in the kit. This plasma was frozen by the woman until the second (luteal) blood collection. They also provided luteal phase blood and urine samples collected 7–9 days before the anticipated start of their next cycle. Samples were shipped after luteal collection, via overnight courier with an ice-pack, to our laboratory where the blood was processed as in the NHS collection and, along with the urine, was aliquoted (4).
Among the premenopausal women who provided samples timed within the menstrual cycle, a random sample of those who were not planning to be pregnant or lactating over the next 3 years were invited to provide two additional sets of timed follicular and luteal samples (blood and urine) over the following 2 to 3 years, in 1999 and 2000. Details have been published previously (5). Briefly, second and third collection kits, with materials identical to the first kit, were mailed to women who returned the first kit without being reminded and who remained eligible to participate. Of the 412 women invited, 304 (74%) provided a second set of samples and 236 (57%) sent a third set.
For each collection, women completed a questionnaire recording their weight, menstrual cycle length, and first day of the menstrual cycle during which the blood and urine samples were collected, and the time, date, and number of hours since the last meal for each sample. In addition, women returned a postcard recording the first day of their next menstrual cycle, allowing us to back-calculate the luteal day of the collection.
Of the 236 women with collections for 3 different menstrual cycles over a 2- to 3-year period, 40 to 45 women with luteal samples collected between 3 and 11 days before the start of her next menstrual cycle were selected for the reproducibility studies. These women are a subset of the 113 women in the previously published paper on reproducibility of plasma hormones (5) and urinary estrogen metabolites (6).
All blood and urine specimens, from both cohorts, have been archived in liquid nitrogen freezers (≤−130°C) from the time samples were received in our lab (i.e., 24 hours after collection) until samples were assayed for the various biomarkers. The study was approved by the Committee on the Use of Human Subjects in Research at Harvard School of Public Health and Brigham and Women’s Hospital.
All plasma or urine samples from a single woman were assayed together; samples were ordered randomly and labeled such that the laboratories could not identify samples from the same woman. All the assays to assess the plasma and urinary biomarkers have been previously described (7–15). Briefly, plasma and urinary phytoestrogens were assayed by liquid chromatography tandem mass spectrometry (LC/MS) (7). The serum assay for bioactive somatolactogens (BSL) was performed using the Nb2 rat lymphoma cell bioassay, which assays both prolactin and growth hormone (8). Plasma melatonin was assayed with a radioimmunoassay (RIA) kit that uses a double-antibody RIA based on the Kennaway G280 anti-melatonin antibody (9). Plasma 25-hydroxyvitamin D and 1,25-dihydroxyvitamin D were assayed by RIA (10, 11). Plasma levels of specific carotenoids (α-carotene, β-carotene, β-cryptoxanthin, lutein/zeaxanthin, total lycopene), retinol, and tocopherols were measured using HPLC (12). Plasma and urine stress hormones were assayed by LC/MS (13). Plasma matrix metalloproteinases (MMPs) were assayed by bead-based sandwich immunoassays using color-coded microspheres as the solid support for the capture antibody (16). Sensitivity, reliability, and accuracy are similar to those observed with standard microtiter ELISA procedures. Urinary and plasma isothiocyanates were measured by high-performance liquid chromatography (14). Forty-five plasma fatty acids were quantitated by gas chromatography (15). These were summed to create total values for saturated, monounsaturated, polyunsaturated omega-3, polyunsaturated omega-6, and trans fats (see Supplemental Table). Resistin and soluble leptin receptor were measured using commercially available ELISAs (Millipore, MA and Diagnostic Systems Laboratories, Inc, TX, respectively) (17, 18).
The number of participants and the number of draws assayed for each analyte are summarized in the tables. Some biomarkers were measured at only 2 of the 3 time points because of financial constraints and the hypothesized stability of the biomarker. Some assays could not be conducted on all the samples sent to the assay lab because of low sample volume or technical difficulties with the assay (n=4–12 for 7 biomarkers). We could not calculate an ICC for urine free epinephrine because 97 of the samples were below the limit of detection.
We used the natural log-transformed value of each analyte in our analyses because the transformed values were more normally distributed. We identified and excluded statistical outliers using the extreme studentized deviate many-outlier procedure (19). This resulted in the removal of 1–7 values for 7 plasma biomarkers and 4 urine biomarkers. We calculated medians and 5th and 95th percentiles on the natural log scale, and exponentiated the values back to the original scale. Urinary biomarkers were adjusted for creatinine (units/weight creatinine) to account for the potential difference in urine concentration in our spot-urine samples.
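The log-scale summary statistics and creatinine adjustment described above can be illustrated with a minimal sketch. The function names, the example values, and the percentile interpolation rule (`method="inclusive"`) are assumptions made for illustration; the source does not specify them, and the outlier-removal procedure (19) is not reproduced here.

```python
import math
from statistics import median, quantiles


def log_scale_summary(values):
    """Median and 5th/95th percentiles computed on the natural-log scale,
    then exponentiated back to the original units."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    # quantiles(..., n=20) returns 19 cut points: the first is the 5th
    # percentile and the last is the 95th. The interpolation rule is an
    # assumption; the source does not state which one was used.
    cuts = quantiles(logs, n=20, method="inclusive")
    return {
        "median": math.exp(median(logs)),
        "p5": math.exp(cuts[0]),
        "p95": math.exp(cuts[-1]),
    }


def creatinine_adjusted(analyte_conc, creatinine_conc):
    """Express a urinary analyte per unit weight of creatinine to account
    for differences in urine concentration between spot samples."""
    if creatinine_conc <= 0:
        raise ValueError("creatinine concentration must be positive")
    return analyte_conc / creatinine_conc


# Hypothetical analyte concentrations (illustrative numbers only)
summary = log_scale_summary([1.0, 2.0, 8.0, 4.0])
```

With an even number of observations, the back-transformed median is the geometric mean of the two middle values rather than their arithmetic mean, which is one practical consequence of summarizing on the log scale.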
For each analyte we examined the assay reproducibility of blinded quality control (QC) replicates using the coefficient of variation (CV), a commonly used statistic to describe laboratory technical error, and determined the effect of delayed sample processing on analyte concentrations (all analytes in this study were stable with delayed processing; data not shown). The CV was determined by estimating the standard deviation of the QC values, dividing by the mean of these values and multiplying by 100.
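As a minimal sketch of the CV calculation just described (standard deviation of the QC values divided by their mean, multiplied by 100), assuming the sample (n−1) standard deviation, which the source does not specify; the function name and replicate values are hypothetical:

```python
from statistics import mean, stdev


def coefficient_of_variation(qc_values):
    """Return the CV (%) of replicate QC measurements: SD / mean * 100."""
    if len(qc_values) < 2:
        raise ValueError("need at least two QC replicates")
    # stdev() is the sample standard deviation (n - 1 denominator);
    # this choice is an assumption of the sketch.
    return stdev(qc_values) / mean(qc_values) * 100.0


# Hypothetical blinded QC replicate concentrations (illustrative only)
qc_replicates = [10.0, 11.0, 9.0, 10.0]
cv_percent = coefficient_of_variation(qc_replicates)
```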
Between-person and within-person variances were estimated from repeated participant sample measurements using a random effects model, with participant ID as the random variable. We assessed reproducibility over a 1 to 2 year period (draws 1 and 2) for the plasma carotenoids, phytoestrogens, resistin, soluble leptin receptor and fatty acids, as well as urinary phytoestrogens and isothiocyanates, and over a 2 to 3 year period for the urinary cortisol, catecholamines and plasma BSL (draws 1 and 3), as well as plasma vitamin D and cortisol (draws 1, 2 and 3). To assess reproducibility, we calculated intraclass correlation coefficients (ICCs) by dividing the between-person variance by the sum of the within- and between-person variances; 95% confidence intervals (CI) also were calculated (20). An ICC < 0.40 indicates poor reproducibility, 0.40–0.75 indicates fair to good reproducibility, and ≥ 0.75 indicates excellent reproducibility (21). Using a mixed model, we adjusted for factors that could potentially affect reproducibility by including the following variables, as fixed effects, in the mixed model: age at blood draw (continuous), fasting status of draw (≥8 hours since last meal versus more recent intake), body mass index (BMI; continuous, kg/m2), and time at blood draw (1 pm-8 pm, 9 pm-5 am, 6 am-12 pm) for the plasma analytes, and age at urine collection (continuous), BMI (continuous, kg/m2), and first morning urine (yes/no) for the urine analytes. The between- and within-person CVs were determined by taking the square root of the between- and within-person variance components from the random effects mixed model on the ln-transformed scale, with approximate estimates derived by the delta method (21).
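The ICC definition above (between-person variance divided by the sum of within- and between-person variances) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the one-way ANOVA (method-of-moments) estimator for balanced, unadjusted data instead of mixed-model software, truncates a negative between-person estimate at zero, and does not compute confidence intervals or covariate adjustment. All function names and example values are hypothetical.

```python
import math


def variance_components(data):
    """Estimate (between, within) person variances.

    data: one list per participant of ln-transformed measurements,
    each with the same number k >= 2 of repeated draws.
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 participants with equal numbers (>= 2) of draws")
    means = [sum(row) / k for row in data]
    grand = sum(means) / n
    # Between-person and within-person mean squares (one-way ANOVA)
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (n * (k - 1))
    between = max((msb - msw) / k, 0.0)  # truncate negative estimates at 0
    return between, msw


def icc(data):
    """ICC = between-person variance / (between + within variance)."""
    between, within = variance_components(data)
    total = between + within
    return between / total if total > 0 else 0.0


def classify(icc_value):
    """Categories as defined in the text: < 0.40 poor, 0.40 to < 0.75
    fair to good, >= 0.75 excellent."""
    if icc_value < 0.40:
        return "poor"
    if icc_value < 0.75:
        return "fair to good"
    return "excellent"


def approx_cv_percent(variance_on_ln_scale):
    """Approximate CV (%) as the square root of an ln-scale variance
    component, following the approximation described in the text."""
    return 100.0 * math.sqrt(variance_on_ln_scale)


# Hypothetical ln-transformed values: three participants, two draws each
example = [[1.0, 1.2], [2.0, 2.2], [3.0, 2.8]]
example_icc = icc(example)
```

In this toy example the participants differ far more from one another than each does from herself across draws, so the estimate falls in the "excellent" category; shrinking the spread between participants while keeping the same within-person scatter would lower it.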
The CVs, medians, 5th–95th percentile ranges, and ICCs for the dietary biomarkers, stress hormones, and other cancer-related biomarkers assayed are summarized in Tables 1–3, respectively. Overall, the CVs were <15% except for plasma 1,25-dihydroxyvitamin D, equol, MMPs 2 and 7, and BSL (CVs ranged from 17.5 (MMP7) to 37.7 (equol)). All biomarker values were in the expected ranges for healthy individuals.
The ICCs for the plasma carotenoids were relatively high (ICC ≥ 0.73; range 0.73–0.88) indicating good to excellent reproducibility, while the ICCs were low for the urinary isothiocyanates (ICC = 0.18) and for most of the plasma and urinary phytoestrogens (ICC ≤ 0.40 and ≤ 0.09, respectively) (Table 1). However, both plasma and urinary enterolactone had fair to good ICCs of 0.44 and 0.52. Reproducibility of plasma 25-hydroxyvitamin D was excellent (ICC = 0.75), while that of 1,25-dihydroxyvitamin D was slightly lower (ICC = 0.59). In general, the ICCs for the individual plasma fatty acids (median ICC = 0.57; range 0.00–0.87) used to create the summed variables were higher than the summed values (e.g., octanoic acid and heneicosanoic acid ICCs = 0.84 and 0.56, respectively, while the ICC of the summed saturated fatty acids was 0.36) (see Supplemental Table for ICCs of the 46 individual plasma fatty acids).
The ICCs for urinary and plasma free cortisol were poor (ICC = 0.25 and 0.00–0.09, respectively) (Table 2). Further, both plasma follicular and luteal total cortisol ICCs were poor (ICC = 0.25 and 0.38, respectively); however, averaging the follicular and luteal measurements improved the ICC (ICC = 0.41). The ICCs for the two urinary catecholamines, norepinephrine and dopamine, were poor to fair (ICC = 0.38 and 0.44, respectively).
The ICCs for MMP1, MMP2, and MMP7 were high (ICC ≥ 0.70), while the ICC was good for MMP3 (ICC = 0.52) and poor for MMP9 (ICC = 0.07) (Table 3). The ICC for the plasma BSL was good (ICC = 0.62), while the ICC for plasma melatonin was fair for premenopausal women (ICC = 0.44) but higher for postmenopausal women (ICC = 0.63). Both plasma resistin and soluble leptin receptor had excellent ICCs (ICC = 0.74 and 0.82, respectively).
In general, adjustment for age at blood draw, fasting status, BMI, and time at blood draw for the plasma analytes, and for first morning urine (yes/no), age at urine collection, and BMI for the urine analytes, did not notably change most of the final ICCs (data not shown). However, adjustment for time at blood draw did change ICCs for mean total plasma cortisol (unadjusted ICC=0.41 versus adjusted ICC=0.26) and plasma melatonin in premenopausal women (unadjusted ICC=0.44 vs. adjusted ICC=0.32). No variable was clearly associated with the change in the ICC for plasma MMP3.
We also reported the within- and between-person CVs for all the analytes to help determine the source of poor ICCs (Tables 1–3). For example, for cortisol, the within-person CVs were much higher for the individual phases of the menstrual cycle (e.g., luteal, follicular values) than the within-person CV after averaging the values (Table 2). Among postmenopausal women, the within-person CV for melatonin was much lower than the between-person CV (57.9 vs. 77.4) (Table 3), but this was not the case among premenopausal women, for whom the within- and between-person CVs were similar (88.2 vs. 80.0), leading to a lower ICC.
We assessed the reproducibility over time for a wide spectrum of biomarkers potentially involved in etiological pathways of cancer development among healthy premenopausal and postmenopausal women. Results from our study suggest that a single measure of most plasma carotenoids and fatty acids, resistin, soluble leptin receptor, bioactive somatolactogens, and some MMPs sufficiently represents average levels over a period of at least several years, and thus can reliably be used as a valid marker to investigate exposure-disease relationships over at least a short-term follow-up. We observed fair or nearly fair reproducibility for plasma and urinary enterolactone, urinary isothiocyanates and plasma melatonin, but poor reproducibility for the majority of the plasma and urinary phytoestrogens and plasma and urine stress hormones. The poor reliability of these biomarkers will attenuate risk estimates and reduce the statistical power of prospective studies evaluating biomarker-disease relationships; thus, these biomarkers should not be used unless other factors that account for their variability are taken into consideration or higher ICCs are documented in populations with greater between-person variation (22).
Biochemical indicators of dietary intake provide a measure of individual nutrient status that takes into account genetic, metabolic, and lifestyle factors (e.g., physical activity) as well as the intake of other nutrients (23). This may be an advantage if biochemical status is of primary interest, but may be a disadvantage if dietary intake is the variable that might ultimately be modified. More importantly, biochemical measures are objective and are not affected by the limitations of dietary assessment methods, such as recall bias and some sources of measurement error. Isoflavones (mostly found in soy products) and lignans (present in grain products, flaxseed, nuts and legumes) are the two main groups of phytoestrogens. Genistein and daidzein are the main phytoestrogens derived from the diet, while equol is a breakdown product of daidzein formed by intestinal bacteria (24). Enterolactone is a mammalian lignan formed in the proximal colon via the conversion of plant lignans by intestinal microflora (24). Cruciferous vegetables (e.g., broccoli, cauliflower, kale) are rich sources of glucosinolates that are metabolized to form isothiocyanates. With the exception of enterolactone, the ICCs of these compounds in our study were poor. This may be explained by low intake in our population, episodic intake over time, or rapid metabolism, any of which could lead to fluctuating levels of some of these biomarkers and thus to low ICCs (25). Similar to our findings, plasma enterolactone ICCs of 0.48–0.66 over 5-week to 3-year periods have been observed in two other studies (26, 27), and a weighted kappa statistic indicated good agreement over 8 days for urinary enterolactone (0.74) (28).
Further, low ICCs over 3 years for serum daidzein, genistein and equol (≤0.30) were observed in the New York University Women’s Health Study (27) and only fair agreement was reported for urinary daidzein and genistein over an 8-day period (weighted kappa statistics 0.29 and 0.36, respectively) (28).
Carotenoids are natural pigments found in fruits and vegetables, and serological levels are reflective of fruit and vegetable intake (29). In the Women’s Healthy Eating and Living Study, the reliability of plasma carotenoids over four years was fair to good (range 0.47 to 0.66) but lower than what we reported (30). One long-term study reported that the difference between carotenoid levels measured 15 years apart did not exceed 26% (31). A significant difference between our study and prior studies is that we did not take into account plasma cholesterol or triglycerides, both of which are correlated with plasma carotenoid levels (23). Nonetheless, these findings and our own collectively suggest that plasma measures of carotenoids are an excellent way to evaluate long-term exposure.
Blood fatty acid levels are often utilized as indicators of dietary fat consumption. Circulating fatty acid levels are tightly regulated, thus between-person variability is low relative to within-person variability as suggested by low between- and within-person CVs for some individual fatty acids (e.g., octanoic acid=0.01% and 0.005%, respectively). Although numerous studies have evaluated the validity and reproducibility between fatty acid biomarker levels and dietary records or food frequency questionnaires (e.g., ref. (32)), data regarding the reproducibility of plasma fatty acid levels among adults are scarce and to our knowledge, there are no studies that have evaluated their stability over time. Overall, we found slightly higher ICCs for the individual fatty acids compared with the summed values. This is likely due to a decrease in between-person variability, which lowers the ICCs for the summed fatty acids, as well as the fact that individual and summed fatty acids are expressed as the percentage of total fatty acids, which therefore more tightly constrains the between-person variation of fatty acids that are a large proportion of the total. Whether ICCs would be similar using measurements of red blood cell fatty acids needs to be explored.
Assessing blood levels of vitamin D provides a better, more integrated measure of vitamin D status than dietary intake data alone given that sun exposure is a major contributor to vitamin D status. Two forms of vitamin D can be measured easily in human plasma: 25(OH)D, the major circulating form of the steroid hormone vitamin D, and 1,25(OH)2D, the bioactive form whose levels are tightly regulated (33). Given the homeostatic regulation of 1,25(OH)2D, 25(OH)D is considered a better measure of overall vitamin D status (33). Both metabolites had good reproducibility in our study.
MMPs are zinc-dependent endopeptidases involved in the degradation of the extracellular matrix and regulation of growth factors (34). In one study, serum MMP1 (ICC = 0.88) and MMP9 (ICC = 0.63) were strongly correlated for up to two years, while the ICC for MMP3 was <0.55 (35). This is somewhat in contrast to our results, in which the reproducibility was poor for MMP9, although this may be due to differences in study design, as Linkov et al. included both pre- and postmenopausal women while our analysis of MMPs was limited to the latter group.
A limitation of the prolactin immunoassay used in prior studies is that it measures multiple forms of prolactin, which have different biological activities (36). In contrast, the Nb2 lymphoma cell bioassay we utilized is a sensitive measure of overall somatolactogenic activity in plasma (37). The assay measures the activity of both prolactin and growth hormone combined (37), which may capture a more biologically relevant measure of prolactin that may be more strongly associated with cancer risk. The ICC for plasma BSL (ICC = 0.63) was higher than that observed for plasma prolactin as assessed by the immunoassay (ICC = 0.45) in the same dataset (38).
Previous studies of plasma cortisol assessed stress hormone reproducibility over a much shorter time period, in some cases over the course of hours, and findings differed from our results. In two small studies assessing reproducibility over 1–4 hours, high intraindividual consistency was observed for plasma cortisol (ICCs=0.64–0.83) and norepinephrine (ICC=0.82) (39, 40). In one study among 31 men over a six-week period, the authors concluded that the circadian profile of cortisol was highly reproducible; however, no ICCs were reported (41). The low ICC for cortisol observed in our study likely reflects the diurnal nature of this hormone and its response to various endogenous and exogenous stimuli (42). We found improvement in the ICCs for total but not free cortisol following averaging across the follicular and luteal phases. Interestingly, the ICC for cortisol decreased after adjustment for time of day of blood draw. In general, adjustment can improve ICCs if the factor explains a portion of the within-person variability (20), but in our case, time of day explained a portion of the between-person variability.
Melatonin is secreted during the dark phase of the light-dark cycle, following a circadian rhythm of ~24 hours (43). Urinary 6-sulphatoxymelatonin (aMT6s) is the major metabolite of melatonin measured in urine, and first morning aMT6s levels correlate well with plasma melatonin levels measured during the previous night, reflecting pineal function (44). However, serum melatonin has a very short half-life and is rapidly metabolized, mainly in the liver. We have previously reported an ICC of 0.72 (95%CI 0.65–0.82) for urinary aMT6s over a 3-year period among premenopausal women (4). Based on our current findings, premenopausal plasma melatonin had a much lower ICC of 0.32, suggesting that blood levels are not a good measure of melatonin over time among younger women, which may be because one plasma sample does not reflect the nocturnal/circadian pattern of melatonin. Nonetheless, the ICC among postmenopausal women was much higher (0.63). For melatonin, we observed lower within- versus between-person CVs among postmenopausal women, while the within- and between-person CVs were similar among premenopausal women. This is likely due to the fact that circadian variation of melatonin is much larger in younger compared with older women (45).
To our knowledge, there are no other studies of the reproducibility of resistin or soluble leptin receptor over time. The excellent ICCs we observed suggest that these analytes are relatively stable within postmenopausal women not using hormones over one year.
The reproducibility of a biomarker is of particular relevance for epidemiological studies in which we often have only one biologic sample to measure exposure over a long period of time. The ICC is a good measure of reproducibility because it takes into account both between- and within-person variability. An ICC ≥0.40 indicates that a single measurement of the biomarker can reasonably represent long-term levels and that the analyte level is relatively stable within individuals over time. This is indicative of relatively low within-person and/or high between-person variation over time. In contrast, a low ICC (<0.40) is suggestive of poor reproducibility and limited stability of the analyte over time. A low ICC may be attributed to high within-person variability and/or low between-person variability, and will result in the attenuation of the relationship between exposure and disease (46). The majority of the analytes in the current analysis displayed fair to excellent ICCs. Overall, this level of reproducibility is similar to that found for other biological variables such as blood pressure (ICC=0.60–0.64) (47) and serum cholesterol (ICC=0.65) (48), exposures considered to be reasonably well-measured and which are consistent predictors of disease in epidemiologic studies.
Biomarker levels can be influenced by various factors including inherent individual factors (e.g., BMI, metabolism), as well as collection and laboratory procedures (e.g., date/time of blood draw); however, adjustment for various potential covariates did not substantially change any of the ICCs in our study except for cortisol and plasma melatonin.
Measurement error correction is one method to account for variability over time and to minimize its impact on effect estimates (20). These methods use data from a reproducibility study to estimate the true relative risk given the observed relative risk and ICC (2). Where ICCs are modest and only one analyte measurement is available, investigators can correct relative risks or correlation coefficients and their confidence intervals for random within-person variation to account for the attenuation introduced by this type of error (2). For example, in our previous study of plasma prolactin concentrations and risk of breast cancer, correcting for within-person variability increased the relative risk comparing the median of the top versus the bottom prolactin quartile from 1.3 to 1.7 (49). In contrast, where ICCs are high, measurement error correction will have little effect on the final estimate.
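To illustrate how an ICC can be used to de-attenuate an observed relative risk, the sketch below applies the simple univariate regression-dilution approximation, in which the log relative risk is divided by the ICC. This is an assumption chosen for illustration and is not necessarily the exact procedure of the cited methods (2, 20), which also correct confidence intervals; the function name and the numerical inputs are hypothetical and are not results from this study.

```python
import math


def deattenuated_relative_risk(observed_rr, icc):
    """Approximate the 'true' relative risk from an observed one by
    dividing the log relative risk by the ICC (a simple regression-
    dilution correction for random within-person variation)."""
    if observed_rr <= 0:
        raise ValueError("relative risk must be positive")
    if not 0 < icc <= 1:
        raise ValueError("ICC must be in (0, 1]")
    return math.exp(math.log(observed_rr) / icc)


# Illustrative inputs only: observed RR of 1.5 with an ICC of 0.5
corrected = deattenuated_relative_risk(1.5, 0.5)

# With a perfectly reproducible marker (ICC = 1) nothing changes
unchanged = deattenuated_relative_risk(1.5, 1.0)
```

The closer the ICC is to 1, the smaller the correction, which is consistent with the observation that measurement error correction has little effect where ICCs are high.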
To our knowledge, ours is the largest study assessing the reproducibility of multiple plasma and urine biomarkers. A potential limitation of our study was the delay in the processing of the samples given that NHS participants live across the entire US. Nevertheless, we have previously shown that delayed processing up to 48 hours did not affect the stability of various biochemical markers measured in blood (e.g., ref. (50)) or urine (4), and we confirmed the stability of the biomarkers included here prior to assessing within-person variability. The within-person variance incorporates laboratory variability such that high CVs can artificially lower the ICC. However, the CVs for most of the analytes in this study were excellent, suggesting that they had limited impact on the ICCs. Whether or not these results apply equally to both pre- and postmenopausal women warrants further study. We could not evaluate the ICC for free epinephrine; however, this may be due to the fact that we had only a spot urine sample rather than 24 hour samples. Those analytes with borderline ICCs but large confidence intervals deserve further evaluation with larger sample sizes when considering their inclusion in epidemiologic studies.
In conclusion, we found that a single measurement of most plasma carotenoids, fatty acids, and MMPs can reliably represent long term levels over time. Plasma melatonin, urinary isothiocyanates and both urinary and plasma enterolactone had borderline to modest ICCs. In contrast, the low ICCs for most of the plasma and urinary phytoestrogens and plasma stress hormones indicate that these are not useful biomarkers, at least in this population. For this reason, they should not be employed in epidemiological studies until the source of their variability is further investigated, or unless higher ICCs are documented in populations with greater between-person variation. More importantly, these data suggest that for those analytes with moderate to high ICCs, one exposure assessment in longitudinal studies is sufficient for use in studies of exposure-disease relationships. Where ICCs are modest, the reproducibility data can be employed for measurement error correction to better estimate the magnitude of associations.
The authors thank Candice Ishikawa for her help with the laboratory database, as well as the study participants of the Nurses’ Health Studies for their dedication to this study and their contribution to this research. This research was supported by Research Grants CA105009, CA50385, CA49449, and P01 CA87969 from the National Cancer Institute. J.K. is a Research Fellow of the Canadian Cancer Society supported through an award from the National Cancer Institute of Canada.