Population-based studies exploring hypothalamic–pituitary–adrenal (HPA) axis activity as a potential explanatory link between stress-related and metabolic disorders have been hampered by their failure to incorporate reliable measures of chronic cortisol exposure. The purpose of this review is to summarize the current literature on the reliability of HPA axis measures and to discuss the feasibility of performing them in population-based studies. We identified articles through PubMed using search terms related to cortisol, HPA axis, adrenal imaging, and reliability. The diurnal salivary cortisol curve (generated from multiple salivary samples collected from awakening to midnight) and 11 p.m. salivary cortisol had the highest between-visit reliabilities (r = 0.63–0.84 and 0.78, respectively). The cortisol awakening response and dexamethasone-suppressed cortisol had the next highest between-visit reliabilities (r = 0.33–0.67 and 0.42–0.66, respectively). Based on our own data, the inter-reader reliability (rs) of adrenal gland volume from non-contrast CT was 0.67–0.71 for the left and 0.47–0.70 for the right adrenal gland. Although a single 8 a.m. salivary cortisol is one of the easiest measures to perform, it had the lowest between-visit reliability (R = 0.18–0.47). Based on the current literature, multiple salivary cortisol measures sampled across the diurnal curve (including awakening cortisol), dexamethasone-suppressed cortisol, and adrenal gland volume are measures of HPA axis tone with similar between-visit reliabilities that likely reflect chronic cortisol burden and are feasible to perform in population-based studies.
Allostasis is the central process employed by mammals to maintain homeostasis threatened by various forms of stress. It involves a series of dynamic actions through which a variety of neuroendocrine hormones, immune factors, and autonomic nervous system mediators are triggered [1, 2]. When an organism is burdened by cumulative stress, its allostatic load (i.e., a hypothetical measure of cumulative stress) increases, resulting in wear and tear from excessive exposure to the catabolic properties of glucocorticoids, stress peptides, and pro-inflammatory cytokines. This burden taxes metabolic systems and can influence the development of insulin resistance, cardiovascular disease, osteoporosis, and other disorders. One of the major contributors to allostatic load is cortisol exposure, which is regulated by the hypothalamic–pituitary–adrenal (HPA) axis.
The HPA axis is a tightly regulated system that represents one of the body’s response mechanisms to acute and chronic physiological or psychological stress. In response to such stressors (Fig. 1), the HPA axis is activated: the hypothalamus releases corticotrophin-releasing hormone (CRH), which stimulates the anterior pituitary gland to release adrenocorticotrophic hormone (ACTH). ACTH in turn stimulates release of cortisol from the adrenal glands, which triggers a cascade of physiological events. Once the stressor has resolved, the response is terminated through a negative feedback loop in which cortisol suppresses further release of ACTH and CRH. Chronic stress can impair this negative feedback component of the stress response. Population-based studies often attempt to measure the contribution of the HPA axis to metabolic outcomes.
Numerous environmental and genetic factors can increase an individual’s exposure to cortisol. For example, a growing body of evidence shows that clinical depression and depressive symptoms (states associated with hypercortisolism) predict the development of cardiovascular disease and type 2 diabetes. Although much of this association is explained by obesity-promoting health behaviors, these behaviors do not explain all of the observed associations. Evidence is also accumulating that various forms of chronic stress promote the development of metabolic dysfunction. It has been proposed that an increased allostatic load, primarily mediated by excess cortisol burden, may result in the metabolic dysfunction documented in patients with depression and other conditions that chronically elevate cortisol exposure.
Population-based studies have been hampered in exploring a neuroendocrine link between these conditions because they have not incorporated reliable measures of chronic cortisol exposure that would permit quantification of the metabolic burden imposed by cortisol production. One major problem in selecting and interpreting cortisol measurements in epidemiological studies is that most existing measures reflect cortisol exposure over a short period and may not reliably quantify the allostatic load imposed by chronic cortisol exposure. Another problem is that the test-retest reliability of cortisol procedures varies widely, and only a few procedures are useful for epidemiological studies. In addition, certain measures of HPA axis tone, such as overnight and 24-h urine free cortisol, laboratory-based stress tests, and measurement of the hypothalamic and pituitary hormones corticotrophin-releasing hormone (CRH) and adrenocorticotrophic hormone (ACTH), respectively, are cumbersome to perform in population-based studies. In particular, CRH and ACTH are labile and require immediate laboratory processing to avoid sample degradation. They are also pulsatile, so a one-time measurement is unlikely to reflect diurnal activity, which is best captured by blood sampling at intervals of 1 h or less over 24 h. Hence, their measurement is not feasible in field settings.
The purpose of this review article is two-fold. First, we summarize the current literature on the reliability of several measures of HPA axis tone, including one-time salivary cortisol measurement, the cortisol awakening response, multiple salivary cortisol samples collected from awakening to bedtime, dexamethasone-suppressed cortisol, and adrenal gland volume. Second, we discuss the feasibility and the pros and cons of performing these measures in large, population-based studies. Measures of HPA axis responsivity (e.g., the ACTH and CRH stimulation tests, the Trier Social Stress Test, insulin-induced hypoglycemia), which may not assess chronic cortisol burden or are too invasive to use in non-clinical field settings, are not discussed.
A search strategy was developed for MEDLINE using PubMed, with a combination of controlled vocabulary (MeSH terms) and keyword terms and phrases to capture the concepts of cortisol, HPA axis, adrenal imaging, and reliability (see Appendix). We limited the search to human studies and English-language articles published through June 2010, and excluded review articles. Search terms for salivary cortisol were included because salivary cortisol is currently used in population-based studies and is considered a feasible and non-invasive measure of cortisol. We identified 3,516 articles and reviewed their titles and abstracts. Our primary inclusion criterion was that a study reported reliability data (see below) on repeated measures of HPA axis function separated by at least 24 h. Articles were excluded (n = 3,497) if they did not compare repeated measures of HPA axis function, used stimulation testing other than the dexamethasone suppression test (i.e., CRH and ACTH testing, mental stress testing), were treatment studies of hypo- or hypercortisolism, assessed brain imaging, or were not relevant to the objectives of the review. Brain imaging studies of the pituitary gland and other structures were excluded because brain imaging is expensive and requires significant radiation exposure for population-based studies. We did, however, include articles that assessed adrenal gland volume, even though its assessment is also expensive and accompanied by radiation exposure. Because many population-based studies perform body CT and MRI scans to measure coronary artery calcium and intra-abdominal fat, adrenal gland volume can feasibly be assessed simultaneously without significant additional participant burden in field settings.
We identified 19 articles that assessed repeated measures of one-time salivary cortisol, the cortisol awakening response, multiple salivary cortisol samples collected from awakening to bedtime, dexamethasone-suppressed cortisol, and adrenal gland volume. These articles are summarized in the results section of this review.
Assessing the reliability of a biological measure determines the extent to which results agree when measured using the same approach at different time points or using different approaches (e.g., different observers). In epidemiological studies, we attempt to measure variability between study participants; however, there can also be variability within study participants due to (1) variability of the laboratory measurement method or (2) variability in participant behaviors and physiology. Intra-observer variability is variability in a given laboratory measurement performed on the same participant at different points in time. Inter-observer variability is variability in a given laboratory measurement conducted on the same sample by different technicians. In this review, we focus primarily on intra-observer variability/reliability. There are several methods of assessing the test-retest reliability of continuous biological measures. Most studies examining the reliability of HPA axis measures have used the intraclass (ICC or R), Pearson’s linear (r), or Spearman’s rank (rs) correlation coefficient. The ICC is a true measure of agreement that combines information on the correlation and on systematic differences between readings. Although Pearson’s r is often used as the main measure of reliability, it is sensitive to the range of values as well as to outliers. Spearman’s rs is less influenced by outliers, but neither measure accounts for systematic biases between repeated measures. Throughout the review, we indicate which reliability measure each summarized study used.
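To make the difference between these coefficients concrete, the following minimal Python sketch computes Pearson’s r, Spearman’s rs, and an ICC on the same hypothetical two-visit data. The function names, the cortisol values, and the choice of the one-way random-effects ICC(1,1) form are illustrative assumptions (studies differ in which ICC form they report). Because every participant’s second value is shifted upward by the same amount, r and rs are both 1.0, whereas the ICC is pulled down by the systematic difference between visits.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def icc_oneway(visits):
    """One-way random-effects ICC(1,1); visits[i] holds participant i's k repeat values."""
    n, k = len(visits), len(visits[0])
    grand = mean(v for row in visits for v in row)
    row_means = [mean(row) for row in visits]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((v - m) ** 2 for row, m in zip(visits, row_means) for v in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical cortisol values (nmol/L): every participant reads 4 units higher at visit 2
visit1 = [5.0, 7.0, 9.0, 11.0, 13.0]
visit2 = [v + 4.0 for v in visit1]
print(round(pearson_r(visit1, visit2), 3))                  # 1.0
print(round(spearman_rs(visit1, visit2), 3))                # 1.0
print(round(icc_oneway(list(zip(visit1, visit2))), 3))      # 0.429
```

In practice one would use an established statistics package and report which ICC form was chosen; the sketch only shows why a high r or rs does not by itself imply agreement between visits.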
Measurement of salivary cortisol has the advantage of being easy to perform in large studies in the free-living state. It has several other advantages: it is (1) non-invasive, (2) amenable to timed sample collections in the free-living state without the need for medical personnel, (3) stable at room temperature for at least 1 week, so samples can be mailed back to the investigator, (4) a measure of free, or physiologically active, cortisol, and (5) strongly correlated with free cortisol measured in plasma and serum [8–11]. Salivary cortisol levels are also stable following repeated cycles of freezing and thawing, and cortisol in centrifuged saliva samples may be stored at 5°C for 3 months or at −20°C and −80°C for at least 1 year. Because cortisol measured in saliva is free rather than bound, it is less subject to variation from factors that affect cortisol-binding globulin (CBG), the primary transport protein for serum cortisol. The plasma CBG concentration is altered by oral contraceptive use, pregnancy, severe illness, and liver disease, so serum cortisol measurements can be misleading under these conditions. Although it is possible to measure unbound cortisol in serum, the test is expensive and, unlike salivary cortisol, requires venipuncture.
These virtues have led investigators to consider collecting salivary cortisol samples to characterize HPA axis tone. However, there are several limitations to using salivary cortisol measurements to assess HPA axis tone, including (1) contamination of the salivary collection device by over-the-counter hydrocortisone creams and ointments, salivary blood, or consumption of low-pH substances (which can artificially raise cortisol levels), (2) non-compliance with the recommended sample collection time, (3) insufficient saliva collection, and (4) the effect of smoking (current smoking is associated with higher levels than non- and former smoking) [10, 13–15]. Additionally, several medications (e.g., anti-depressants; oral, nasal, topical, and ophthalmic corticosteroids; alpha, beta, and cholinergic receptor antagonists) have the potential to affect salivary cortisol levels; however, few behaviorally oriented studies have comprehensively documented medication use or focused on the impact of various medication classes on salivary cortisol levels.
Despite these limitations, salivary cortisol still represents a practical approach to assessing the HPA axis in large, population-based studies in which collection of repeated blood samples is impractical or infeasible. The issue often faced in designing epidemiological studies is how many salivary cortisol samples should be collected to properly characterize HPA axis tone. Cortisol is a pulsatile hormone with large fluctuations in blood that reflect both stressed and non-stressed states. Does a single salivary measurement per 24-h period reflect cortisol exposure over the whole 24-h period, or only at the moment of collection? Does acquiring multiple samples throughout the day reflect cortisol burden over days, weeks, or months, or merely cortisol burden during the 24-h sampling period? This conundrum is not limited to salivary cortisol; it is an unanswered question for every measure of HPA axis tone. Indeed, while using a single measurement is appealing, prior studies have suggested that a single salivary measurement does not have adequate between-visit reliability, which would limit its use in longitudinal studies attempting to characterize an individual’s chronic HPA axis tone.
As with many other hormones, cortisol has a circadian variation, characterized by a peak in the early morning hours (8 a.m.) and a nadir in the late evening [8, 13]. An 8 a.m. or late-night (i.e., 11 p.m. or midnight) salivary cortisol sample is often collected under the assumption that individuals with a high cortisol burden will have elevated cortisol levels at both the peak and the nadir of the circadian rhythm. Thus, such measurements may represent options for assessing chronic HPA axis tone in population-based studies. Table 1 summarizes 3 studies that have examined the between-visit reliability of a single 8 a.m. salivary cortisol. Among 20 healthy males, the between-visit reliability (R) of 8 a.m. salivary cortisol collected 1–5 weeks apart was 0.18. In another study of 116 women without Major Depressive Disorder, the between-visit reliability (rs) of 8 a.m. salivary cortisol samples collected 6 months apart was 0.41 (P < 0.001). Similarly, the between-visit reliability (R) of repeated morning salivary cortisol samples in 20 healthy volunteers was 0.47, indicating poor reproducibility, with 53% of the variance explained by intra-individual differences. We collected quality control data on the between-visit repeatability of one-time 8 a.m. salivary cortisol in a subset of 146 male and female participants in the Atherosclerosis Risk In Communities (ARIC) Carotid MRI Study (see Table 2). Among 48 repeat-visit quality control replicates with sampling separated by an average of 81.7 days, the intraclass correlation coefficient (R) was low (0.27), suggesting that one-time morning salivary cortisol sampling is not an appropriate surrogate marker of chronic cortisol burden over weeks to months. These findings were similar to those of Coste et al.
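The variance interpretation used above follows from the usual variance-components definition of the ICC, which we assume here: R is the between-person share of the total variance, so its complement is the within-person (intra-individual) share.

```latex
R = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}},
\qquad
1 - R = \frac{\sigma^2_{\text{within}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}
```

For R = 0.47, 1 − R = 0.53, i.e., 53% within-person variance. By the same arithmetic, R = 0.27 in the ARIC replicates and R = 0.18 in the study of healthy males correspond to 73% and 82% within-person variance, respectively.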
Because morning cortisol may be affected by several extrinsic factors related to awakening, an 11 p.m. salivary cortisol might be an alternative for one-time salivary cortisol collection. Based on the normal diurnal secretion of cortisol, levels should reach a nadir between 2300 and 0000 h. In the same study of 20 healthy volunteers discussed above, the intraclass correlation coefficient (R) for repeated measures of 11 p.m. salivary cortisol was 0.78, indicating that 22% of the variance was explained by intra-individual differences. Although the reliability of the 11 p.m. salivary cortisol was better than that of the 8 a.m. salivary cortisol in this population, repeated samples were likely separated by only days, so it is unclear whether similar reproducibility would be found for collections separated by weeks to months. Finally, in a study of individuals with subclinical hypercortisolism (as might be seen in individuals with depressive and metabolic disorders), the 11 p.m. salivary cortisol had a lower sensitivity for detecting subclinical hypercortisolism than the dexamethasone-suppressed cortisol (see below) or 24-h urine free cortisol (UFC). This suggests that it might also be less sensitive in detecting subtle degrees of subclinical hypercortisolism in a relatively healthy population, such as individuals enrolled in longitudinal epidemiological studies. Thus, we do not recommend the use of single samples.
An alternative approach to collecting one-time salivary cortisol samples is to have participants collect several samples over the course of 12 h to measure cortisol exposure over a longer time period. An advantage to this approach is that it incorporates the awakening cortisol response (see below) as well as cortisol secretion throughout the day.
There is a well-established, pronounced cortisol awakening response (CAR): both salivary and serum cortisol increase by 50–70% during the first 30 min after awakening and remain elevated for about 60 min. The CAR is thought to reflect the adrenal capacity to respond to stress and can be exploited to capture subtle differences in HPA axis tone as a function of exposure to chronic stress. Although the CAR is affected by other factors, such as sex (women have a greater awakening response than men) and oral contraceptive use (users have a smaller awakening response than non-users), the salivary cortisol response to awakening has a higher intra-individual stability than a single morning salivary cortisol or salivary cortisol measured at predefined times.
A prior study summarized data on the salivary cortisol awakening response in three populations: 42 children (mean age 11.2 ± 2 years), 70 young adults (mean age 26.5 ± 6.3 years), and 40 elderly individuals (mean age 70.4 ± 5.7 years). In the children, in whom the CAR area under the curve (AUC) was characterized by samples collected 0, 10, 20, and 30 min after awakening, the intra-individual correlation (r) between repeated measures of the CAR AUC collected 1 day apart over 3 days ranged from 0.39 to 0.67 (all P < 0.05). In the elderly, in whom the CAR AUC was characterized by samples collected 0, 15, 30, and 60 min after awakening, the intra-individual correlation (r) for CAR AUC collected 1 day apart was 0.58 (P < 0.05). In the young adults, awakening salivary cortisol samples were collected at 0, 15, 30, and 60 min after awakening on 3 occasions separated by 1 week, and the intra-individual correlation (r) ranged from 0.42 to 0.65 (all P < 0.05). These data suggest that 16–45% of the variance in the CAR AUC was explained by intra-individual differences, indicating moderate to high stability of AUC cortisol levels across days to weeks. Similar findings were observed in a population of 42 healthy volunteers (76% women, mean age 35 years), in whom the CAR AUC was characterized by samples collected at 0, 15, 30, and 45 min after awakening on two consecutive days. In this study, the intra-individual correlation (r) between repeated measures of the salivary CAR AUC was 0.34–0.52 (all P < 0.05).
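The studies above summarize the CAR as an area under the curve, but the formula is not stated here. A common pair of definitions, assumed in the minimal Python sketch below, is the trapezoidal AUC with respect to ground (AUCg, total cortisol output over the sampling window) and the AUC with respect to increase (AUCi, the rise above the waking level). The function names and sample values are hypothetical; the example also computes the percent rise at +30 min, which here happens to fall within the 50–70% range described above.

```python
def auc_ground(times_min, values):
    """AUC with respect to ground (AUCg) by the trapezoidal rule."""
    if len(times_min) != len(values) or len(values) < 2:
        raise ValueError("need at least two paired times and values")
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times_min, times_min[1:], values, values[1:])
    )

def auc_increase(times_min, values):
    """AUC with respect to increase (AUCi): AUCg minus the area under the waking level."""
    return auc_ground(times_min, values) - values[0] * (times_min[-1] - times_min[0])

def percent_rise(waking, later):
    """Percent change from the waking sample to a later sample."""
    return 100.0 * (later - waking) / waking

# Hypothetical CAR samples (nmol/L) at 0, 15, 30 and 60 min after awakening
times = [0, 15, 30, 60]
cortisol = [10.0, 14.0, 16.0, 12.0]
print(auc_ground(times, cortisol))              # 825.0 (nmol/L x min)
print(auc_increase(times, cortisol))            # 225.0
print(percent_rise(cortisol[0], cortisol[2]))   # 60.0 (rise at +30 min)
```

Because AUCg depends on both the waking level and the rise, whereas AUCi isolates the rise, studies that report only a "CAR AUC" may not be directly comparable.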
For the CAR to be informative, subjects must collect samples at the specified times after awakening. Electronic monitoring of the salivary cortisol collection devices should be used to detect deviations from the protocol and to maximize compliance and the accuracy of documented sample times.
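As an illustration of how electronically recorded collection times could be screened, the following Python sketch compares each recorded time with its scheduled time relative to awakening. The 0/15/30/60-min schedule mirrors the protocols described above, but the 10-min tolerance, the function name, and the timestamps are hypothetical assumptions, not values taken from any cited study.

```python
from datetime import datetime, timedelta

# Hypothetical protocol: samples at 0, 15, 30 and 60 min after awakening
PROTOCOL_OFFSETS_MIN = [0, 15, 30, 60]
# Hypothetical tolerance; the review does not specify an acceptable window
TOLERANCE = timedelta(minutes=10)

def check_compliance(wake_time, sample_times, offsets=PROTOCOL_OFFSETS_MIN, tolerance=TOLERANCE):
    """Compare electronically recorded sample times with the scheduled times.

    Returns one (offset_min, deviation_min, compliant) tuple per scheduled sample.
    """
    if len(sample_times) != len(offsets):
        raise ValueError("expected one recorded time per scheduled sample")
    results = []
    for offset, actual in zip(offsets, sample_times):
        scheduled = wake_time + timedelta(minutes=offset)
        deviation = actual - scheduled
        results.append((offset, deviation.total_seconds() / 60, abs(deviation) <= tolerance))
    return results

wake = datetime(2010, 6, 1, 6, 30)
recorded = [
    datetime(2010, 6, 1, 6, 32),
    datetime(2010, 6, 1, 6, 46),
    datetime(2010, 6, 1, 7, 15),   # 15 min late for the +30 min sample
    datetime(2010, 6, 1, 7, 31),
]
for offset, dev, ok in check_compliance(wake, recorded):
    print(f"+{offset} min: {dev:+.0f} min {'OK' if ok else 'FLAG'}")
```

The check is only as good as the recorded waking time: if a participant misreports when they woke, sample timing relative to true awakening cannot be verified this way.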
Another multiple-sample technique is to measure cortisol across all or part of the 24-h circadian cycle. Table 1 summarizes 3 studies that have examined the between-visit reliability of measuring salivary cortisol in this manner. In the study summarized above, in addition to collecting awakening salivary cortisol samples, the authors collected 4 additional samples 3, 6, 9, and 12 h after awakening on two consecutive days. For this analysis, the CAR was excluded, and the curve was characterized using the 0, 3, 6, 9, and 12 h post-awakening samples. The intra-individual correlation (r) between repeated measures of the diurnal cortisol curve collected 1 day apart was 0.647 (P < 0.0001). Another study assessed the diurnal cortisol profile in 50 older adults by collecting salivary cortisol samples at awakening, 30 min post-awakening, 5 p.m., and 9 p.m. on two consecutive days. When the awakening sample (as opposed to the 30-min post-awakening sample) was used as the anchor, the between-visit reliability (rs) of the diurnal cortisol slope was 0.63 (P < 0.05). The predicted test-retest reliability (rs) of samples collected over 2 and 3 days was 0.78 and 0.84, respectively, suggesting that at least 2 days of sample collection are necessary to adequately characterize the diurnal curve. These studies were conducted in small samples.
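The predicted multi-day reliabilities are consistent with the Spearman–Brown prophecy formula, which gives the expected reliability of an average of k parallel measurements from the single-measurement reliability r as k·r / (1 + (k − 1)·r). We assume, but the text does not state, that this is how the predicted values were obtained. The Python sketch below (function names hypothetical) applies it to the reported single-day rs = 0.63; it yields about 0.77 for 2 days and 0.84 for 3 days, close to the reported 0.78 and 0.84 (the small difference at 2 days may reflect rounding of the single-day value).

```python
def spearman_brown(r_single, k):
    """Predicted reliability of the mean of k parallel measurements."""
    return k * r_single / (1 + (k - 1) * r_single)

def days_needed(r_single, target):
    """Smallest number of sampling days whose predicted reliability reaches target."""
    if not (0 < r_single < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = 1
    while spearman_brown(r_single, k) < target:
        k += 1
    return k

for days in (1, 2, 3):
    print(days, round(spearman_brown(0.63, days), 2))   # 0.63, 0.77, 0.84
print(days_needed(0.63, 0.80))                          # 3
```

The formula assumes the days are parallel measurements (same true score, independent errors); systematic weekday and weekend differences in cortisol would violate that assumption.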
Although collecting multiple timed saliva samples might be cumbersome in a population-based study, it has already been performed successfully in 1,000 participants in the Multi-Ethnic Study of Atherosclerosis (MESA). A potential disadvantage of having study participants collect several salivary cortisol samples, particularly timed ones, is that compliance might be reduced; however, MESA participants were compliant with 6 salivary cortisol collections per day over 3 consecutive weekdays. In MESA, salivary cortisol samples were collected in 936 men and women at the following times: directly upon waking; 30 min after waking; 10:00 a.m.; 12:00 p.m. or before lunch, whichever was earlier; 6:00 p.m. or before dinner, whichever was earlier; and at bedtime. This daily protocol was repeated on each of three successive weekdays. The intra-individual correlation (r) between mean salivary values across diurnal curves over the 3 days ranged from 0.65 to 0.72. Another advantage of collecting multiple salivary cortisol samples from awakening to bedtime is that multilevel (mixed or hierarchical linear) statistical models can be used to simultaneously account for within- and between-individual differences in cortisol measures.
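One common way to write such a model, shown here as an assumed illustration rather than the specification used in MESA, is a two-level random-intercept model for sample j of participant i, with C_ij the salivary cortisol value and t_ij the time since waking (cortisol is often log-transformed because its distribution is skewed):

```latex
\log C_{ij} = \beta_0 + \beta_1 t_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here β0 and β1 are the average log cortisol at waking and the average diurnal slope, u_i is participant i’s persistent deviation from the average (between-individual differences), and ε_ij is sample-to-sample and day-to-day fluctuation (within-individual differences). The ratio σ_u² / (σ_u² + σ_ε²) is an ICC. Models of real diurnal data typically add terms for the awakening response and random slopes, which are omitted here for brevity.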
Very few studies have evaluated the between-visit repeatability of integrated daily salivary cortisol over longer time intervals. A small study of 28 children found reproducibility in repeated cortisol measurements at different time points throughout the day, as well as in cortisol area under the curve, for measurements separated by a median interval of 1 year; however, the specific correlation between the repeated measures was not reported. Further studies are needed to determine the repeatability of integrated salivary cortisol over longer time intervals (i.e., weeks to months).
Is it possible to derive useful information from fewer cortisol measurements? Kraemer et al. suggest that 2 salivary cortisol samples collected at awakening and 9 p.m. may adequately characterize the diurnal cortisol slope. In their study of 50 adults, the slope from awakening to 9 p.m. correlated strongly with the slope determined from all 5 sample collections (rs = 0.954; 95% CI: 0.530–0.974). When the cortisol slope was characterized by fewer than 5 samples, slopes that included the awakening and 9 p.m. salivary cortisols (regardless of the other samples chosen) were more strongly correlated with the slope calculated from all samples (rs = 0.957–0.976; all P < 0.05) than those that did not include the 9 p.m. sample (rs = 0.281–0.601; P values ranging from non-significant to < 0.05). These findings need to be confirmed in a larger, population-based sample of men and women of various ethnicities.
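To show what is being compared, the Python sketch below computes, for one hypothetical participant, the least-squares slope of cortisol on hours since waking using all samples and the two-point slope from the waking and 9 p.m. samples only. The sampling times, values, and function names are assumptions, and the exact slope definition used by Kraemer et al. is not given here.

```python
from statistics import mean

def ols_slope(hours, values):
    """Least-squares slope of cortisol on hours since waking (all samples)."""
    mx, my = mean(hours), mean(values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(hours, values))
    sxx = sum((x - mx) ** 2 for x in hours)
    return sxy / sxx

def two_point_slope(hours, values):
    """Slope from the first (waking) and last (evening) samples only."""
    return (values[-1] - values[0]) / (hours[-1] - hours[0])

# Hypothetical day: waking at 7 a.m., samples at 0, 3, 6, 10 and 14 h (9 p.m.)
hours = [0.0, 3.0, 6.0, 10.0, 14.0]
cortisol = [15.0, 11.0, 8.0, 5.0, 3.0]   # nmol/L
print(round(ols_slope(hours, cortisol), 3))        # -0.846
print(round(two_point_slope(hours, cortisol), 3))  # -0.857
```

Across a cohort, one would compute both slopes for every participant and correlate them (e.g., with Spearman’s rs), which is the comparison Kraemer et al. report.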
A prior study of 22 healthy adults failed to find an association between the CAR and integrated 12-h salivary cortisol measured every 15 min from 9 a.m. to 9 p.m. A more recent study by Edwards et al., however, suggests that the CAR correlates with total daily cortisol secretion. In their study of 42 healthy men and women, there was a significant positive association between the CAR area under the curve and 12-h mean diurnal cortisol on two separate days (r = 0.595 and r = 0.660; both P < 0.0001), indicating that individuals who secreted more cortisol within the first 45 min after awakening secreted more cortisol throughout the day. These data suggest that the CAR may predict cortisol levels throughout the remainder of the day; however, additional data from larger studies are needed to confirm this observation.
The dexamethasone suppression test is widely used in clinical endocrinology in the investigation of Cushing’s syndrome. Individuals with clinical and/or subclinical hypercortisolism fail to adequately suppress serum cortisol levels at 8 a.m. in response to 1 mg of oral dexamethasone administered at 11 p.m. the night before. In the research setting, doses of dexamethasone below 1 mg (e.g., 0.5 or 0.25 mg) have been shown to allow detection of subtle degrees of increased HPA axis tone [29, 30], as might be present in community-dwelling participants in a non-clinical cohort. In both non-clinical and clinical cohorts, reduced negative feedback sensitivity is thought to reflect the influence of chronic stress on the HPA axis. We have previously shown, in 20 healthy African-American women (mean age 32 ± 8 years) without affective illness or any form of Cushing’s or pseudo-Cushing’s syndrome, that 8 a.m. salivary cortisol levels following administration of 0.5 mg of dexamethasone correlated moderately to strongly with daily salivary cortisol measurements collected at several time points after awakening (0800, 0845, 1030, 1600, 2000, and 2300 h; rs = 0.44–0.77; all P < 0.05). Inadequate suppression of cortisol following dexamethasone administration indicates impaired negative feedback; this defect increases systemic cortisol exposure, resulting in greater cortisol burden. These data suggest that the dexamethasone suppression test and multiple measurements of salivary cortisol throughout the day may provide similar information regarding cortisol burden, although our findings should be confirmed in a larger cohort.
Although the dexamethasone suppression test is, like stimulation tests, a pharmacological challenge, it is more feasible to perform in population-based studies than other challenge tests (e.g., CRH and ACTH stimulation tests) because (1) the medication (dexamethasone) can be taken orally at home by the participant, as opposed to the intravenous administration of ACTH and CRH, and (2) it involves only one sample collection, which can be blood or saliva (see below).
Dexamethasone-suppressed cortisol is stable over time, even with changing metabolic conditions (Table 1). Yanovski et al. studied the effects of weight loss on dexamethasone-suppressed cortisol in obese binge and non-binge eaters who lost 16–17% of their body weight and found that this HPA axis measure did not change significantly from baseline to the 12-week return visit. Thus, dexamethasone-suppressed cortisol was stable in the setting of an intervention that did not specifically target HPA axis tone. Even though personality traits and affective states are known to affect HPA axis tone, in a small study of 13 women with Borderline Personality Disorder with or without Post-Traumatic Stress Disorder, the cortisol response to dexamethasone suppression was stable over 1 year; repeated post-dexamethasone cortisol levels were correlated at 8 a.m. (r = 0.42; P = 0.076) and at 4 p.m. (r = 0.56; P = 0.023). In a large study that examined the repeatability of dexamethasone-suppressed cortisol in 164 healthy elderly individuals without endocrine or affective disorders, there was inter-person variability but intra-person stability of the response to the dexamethasone suppression test over 2.5 years. Inter-person variability is thought to reflect differences in cortisol burden across individuals. In this study, plasma cortisol following two dexamethasone suppression tests separated by 2.5 years was strongly correlated (rs = 0.66; P < 0.001), and this correlation was not affected by age.
Disadvantages of this test are the requirement for medication administration and a timed collection of serum cortisol; an advantage for its use in the field setting is that it can be performed with collection of salivary cortisol, negating the need for a blood draw. Although one small study of 29 individuals found that salivary cortisol was more variable and less repeatable than serum cortisol following dexamethasone administration, those authors and others found that the fractional suppression of serum and salivary cortisol in response to dexamethasone was similar [34–36]. In addition, there was a strong correlation between post-dexamethasone serum and salivary cortisol measurements [35, 36]. In a large study of 250 psychiatric inpatients undergoing a 1 mg dexamethasone suppression test, the correlation (r) between post-dexamethasone serum and salivary cortisol was 0.89. Therefore, measuring salivary instead of serum cortisol is likely justifiable to minimize participant burden and reduce costs in large population-based studies.
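Fractional suppression is not defined in the text; a common reading, assumed in this short Python sketch, is the percentage decrease from a pre-dexamethasone value to the post-dexamethasone value, which requires a baseline sample in addition to the single post-dose sample. The function name and paired values are hypothetical; they illustrate how serum and salivary cortisol can differ greatly in absolute concentration yet show similar fractional suppression.

```python
def fractional_suppression(baseline, post_dex):
    """Percent decrease from a pre-dexamethasone to a post-dexamethasone cortisol value."""
    if baseline <= 0:
        raise ValueError("baseline cortisol must be positive")
    return 100.0 * (baseline - post_dex) / baseline

# Hypothetical paired values (nmol/L) for one participant
serum = fractional_suppression(400.0, 50.0)
saliva = fractional_suppression(15.0, 2.0)
print(f"serum {serum:.1f}%, saliva {saliva:.1f}%")   # serum 87.5%, saliva 86.7%
```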
Chronic activation of the HPA axis increases adrenal gland volume due to the trophic effects of ACTH on the adrenal cortex. Adrenal gland volume is thought to be stable over several weeks to months in a stable environment but is also responsive to changing clinical conditions. In individuals with proven ACTH-dependent Cushing’s syndrome, adrenal gland width correlates significantly with plasma cortisol and 24-h urine free cortisol. Estimated disease duration also correlates significantly with adrenal gland width, suggesting that width is an index of chronic HPA axis tone and glucocorticoid exposure over time. Major Depressive Disorder, often accompanied by persistent activation of the HPA axis, is associated with adrenal hypertrophy, which resolves over several weeks to months following remission of depression and resolution of HPA axis hyperactivity. The dose-dependent nature of this effect is further evidenced by the adrenal atrophy observed following loss of ACTH secretion, as seen in individuals with central adrenal insufficiency [39, 40].
Adrenal gland volume may thus represent an integrated, non-invasive measure of HPA axis tone. We previously showed that, among healthy African-American women, adrenal gland volume was strongly correlated with cortisol following dexamethasone suppression (rs = 0.66; P = 0.004), suggesting that women with higher cortisol following dexamethasone administration had larger adrenal glands and a higher cortisol burden. Adrenal gland volume did not correlate with 24-h urine free cortisol, consistent with prior studies [41, 42].
Adrenal gland volume is assessed using CT or magnetic resonance imaging (MRI). An advantage of either radiological approach in population-based studies is that adrenal volume can be assessed in individuals already receiving scans for other reasons as part of the parent study. For example, because many population-based studies perform body CT scans to measure coronary artery calcium and intra-abdominal fat, adrenal gland volume can feasibly be assessed at the same time without significant additional participant burden. Adding a thin-slice adrenal protocol to a pre-existing cardiac or abdominal CT scan requires only 5–10 min of additional scan time. The adrenal volume measurements described below, however, are labor-intensive and require 30–45 min per scan from a radiologist or highly skilled technician. A disadvantage of CT is that it requires radiation exposure (approximately 320 mrem, comparable to annual background radiation exposure). Both CT and MRI are expensive.
There is little information on the intra-individual repeatability or inter-reader reliability of adrenal gland volume. In a small study of 22 pheochromocytoma patients who underwent subtotal adrenalectomy, an adrenal CT scan with intravenous contrast was performed on postoperative day 4 and repeated 3 months later to assess the volume of the adrenal remnant. The between-visit repeatability (r) of the adrenal remnant volume was 0.84 (P < 0.05). More recently, a small pilot study of 4 individuals assessed the intra- and inter-observer variation and the repeatability of adrenal volume measurements obtained from MRI scans performed 7 days apart in each participant. Between-visit variation in adrenal volume was small (5% of a 3 cm³ adrenal gland); intra- and inter-observer variation were larger, similar to each other, and still small (9% of a 3 cm³ adrenal gland).
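For a 3 cm³ gland, the pilot study's figures correspond to absolute differences of about 0.15 cm³ (5%) and 0.27 cm³ (9%). The text does not say how those percentages were computed; one common approach for duplicate measurements, assumed here, is the within-subject coefficient of variation, sketched below in Python. The function name and example volumes are hypothetical.

```python
import math


def within_subject_cv(pairs):
    """Within-subject coefficient of variation (%) from duplicate measurements.

    pairs: sequence of (visit1, visit2) volumes for each subject, e.g. in cm^3.
    Each pair contributes a within-subject variance of (x1 - x2)**2 / 2; these
    are averaged across subjects, square-rooted, and divided by the grand mean.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    var_w = sum((a - b) ** 2 / 2 for a, b in pairs) / len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * len(pairs))
    return 100 * math.sqrt(var_w) / grand_mean


# Hypothetical volumes (cm^3) at two visits for two participants.
print(within_subject_cv([(2.9, 3.1), (3.1, 2.9)]))  # about 4.7%
```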
A disadvantage of contrast-enhanced CT for adrenal imaging is that it cannot be used in individuals with renal disease or contrast allergies; non-contrast CT is an alternative. We did not identify published data on the intra-individual or inter-reader reliability of adrenal gland volume measured by non-contrast CT, so we generated inter-reader reliability data from our own population of healthy women. As previously described, we performed abdominal CT scans to measure adrenal gland volume in 28 healthy African-American women aged 18–45 years using a Siemens multidetector-row scanner (16- or 64-slice). One hundred and twenty 1-mm slices were acquired through the adrenal glands, and the gland contour was traced manually on each slice with a console cursor. Commercially available GE Advantage Workstation software then calculated adrenal volume automatically by summing the traced areas across slices.
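The workstation's internal algorithm is not described here. As a hedged illustration of the general sum-of-areas (planimetric) approach, the Python sketch below computes each traced contour's area with the shoelace formula and multiplies the summed area by the slice thickness, assuming contiguous, non-overlapping 1-mm slices as in the protocol above. The function names and toy contours are hypothetical.

```python
def polygon_area(contour):
    """Shoelace formula: area enclosed by one traced contour.

    contour: list of (x, y) vertices in mm, in tracing order.
    Slices where no gland was traced can pass an empty list (area 0).
    """
    n = len(contour)
    if n < 3:
        return 0.0
    s = 0.0
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]  # wrap around to close the contour
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def volume_cm3(contours, slice_thickness_mm=1.0):
    """Sum-of-areas volume: total traced area (mm^2) x slice thickness (mm).

    Assumes contiguous, non-overlapping slices. Returns cm^3 (1 cm^3 = 1000 mm^3).
    """
    total_area_mm2 = sum(polygon_area(c) for c in contours)
    return total_area_mm2 * slice_thickness_mm / 1000.0


# Toy example: a 10 mm x 10 mm cross-section on 30 of 120 slices.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(volume_cm3([square] * 30 + [[]] * 90))  # 3.0
```

Thirty slices each containing a 100 mm² cross-section give 3.0 cm³, the gland size used in the pilot-study comparison above.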
Figure 2 shows the anatomic locations of the left and right adrenal glands. We hypothesized that left adrenal volume might be more reproducible between readers, because the right adrenal gland is harder to distinguish from surrounding structures and is often compressed against the spine between the liver and the inferior vena cava (Fig. 2). We also hypothesized that women with higher body-mass index (BMI) might have more retroperitoneal fat, which would make the adrenal glands easier to distinguish from surrounding tissue. We therefore determined the inter-reader reliability of left and right adrenal gland volumes separately, stratified by BMI, as summarized in Table 3.
Two radiologists independently measured adrenal volume in the 28 study participants, with their readings performed 12–24 months apart. Among overweight and obese women (BMI ≥ 25 kg/m²), inter-reader reliability was very good for left adrenal gland volume (rs = 0.71; P = 0.001) but poorer for right adrenal gland volume (rs = 0.47; P = 0.056). Among lean women (BMI < 25 kg/m²), inter-reader reliability was similar for left and right adrenal gland volumes (rs = 0.67; P = 0.02 and rs = 0.70; P = 0.02, respectively). Thus, reliability was similar for the two glands in lean women but differed between them in overweight and obese women. In addition, because the adrenal glands are small, accurately determining their volumes across multiple CT slices requires significant technical skill.
Despite the challenges associated with measuring chronic HPA axis tone, it is important to identify reliable measures that can be incorporated into population-based cohort studies, to determine whether HPA axis tone provides an additional explanatory mechanism for the association between chronic stress (e.g., neighborhood violence, racism, poverty) and increased metabolic risk. To expand the field of epidemiology to include assessment of HPA axis tone, investigators must balance the reliability of hormonal measures against the feasibility of incorporating them into large studies in the field setting.
Of the measures of chronic HPA axis tone reviewed in this article, the total diurnal salivary cortisol curve and the 11 p.m. salivary cortisol have the highest between-visit reliabilities (r = 0.63–0.84 and 0.78, respectively). The 11 p.m. salivary cortisol is easy to collect in the free-living state, whereas collecting multiple daily samples over 2–3 days to characterize the diurnal curve is more cumbersome and imposes greater participant burden. Despite these limitations, diurnal curve sampling has been successfully performed in the MESA Study. Future studies need to confirm whether the between-visit reliabilities of the 11 p.m. salivary cortisol and of the diurnal curve (generated from multiple daily samples collected from awakening to midnight) remain similar when visits are separated by weeks to months rather than consecutive days.
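The text does not specify how the diurnal curve is summarized. Two summaries often used for salivary cortisol are the area under the curve with respect to ground (AUCg, by the trapezoidal rule) and the diurnal slope; the Python sketch below computes both for one day of samples. Sampling times, units (nmol/L), and function names are assumptions for illustration. Many studies fit the slope to log-transformed cortisol or exclude the awakening-response samples, choices this sketch leaves to the analyst.

```python
def auc_ground(times_h, cortisol):
    """Trapezoidal area under the curve with respect to ground (AUCg).

    times_h: sampling times in hours since awakening, strictly increasing.
    cortisol: salivary cortisol at those times (e.g. nmol/L).
    Returns concentration x hours.
    """
    if len(times_h) != len(cortisol) or len(times_h) < 2:
        raise ValueError("need matching times and values, at least two samples")
    area = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        area += dt * (cortisol[i] + cortisol[i - 1]) / 2  # one trapezoid
    return area


def diurnal_slope(times_h, cortisol):
    """Ordinary least-squares slope of cortisol on time (units per hour)."""
    n = len(times_h)
    mt = sum(times_h) / n
    mc = sum(cortisol) / n
    num = sum((t - mt) * (c - mc) for t, c in zip(times_h, cortisol))
    den = sum((t - mt) ** 2 for t in times_h)
    if den == 0:
        raise ValueError("need at least two distinct sampling times")
    return num / den
```

With samples on 2–3 days, these per-day summaries could then be averaged for each participant.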
After the 11 p.m. salivary cortisol and the diurnal salivary cortisol curve generated from multiple daily samples collected from awakening to midnight, the CAR and dexamethasone-suppressed cortisol had the next highest between-visit reliabilities (r = 0.33–0.67 and 0.42–0.66, respectively). The CAR has higher intra-individual reliability than a single 8 a.m. salivary cortisol (see below), but it is more cumbersome to perform in the population-based setting because participants need additional instruction on sample collection and on accurately timing samples relative to awakening. Between-visit reliability of the CAR is stable over intervals of days to 3 weeks; additional studies are needed to determine whether it is similar over many months. Dexamethasone-suppressed serum cortisol has the advantage of being one of the few HPA axis measures with moderate intra-individual repeatability over 1–2 years, a longer interval than has been examined for other HPA axis measures. Prior studies of depression that incorporated a dexamethasone suppression test showed normalization of post-dexamethasone cortisol after clinical recovery [46, 47]. This indicates that post-dexamethasone cortisol changes as cortisol burden changes, supporting its usefulness for capturing changes in allostatic load. We recognize that the combined dexamethasone/corticotropin-releasing hormone (Dex/CRH) test is used in psychiatry to detect HPA axis hyperactivity and to monitor response to therapy in depressive disorders [48, 49]; however, it would add undue cost and participant burden to large epidemiological studies and is therefore not practical in a non-clinical setting.
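As with the diurnal curve, the text does not define how the CAR was quantified. One widely used definition, assumed here, is the area under the curve with respect to increase (AUCi): the trapezoidal area over the post-awakening samples minus the rectangle under the awakening value. The Python sketch below assumes one sample at awakening and further samples at fixed minutes afterwards (for example 0, 30, and 45 min); those times, the example values, and the function name are hypothetical.

```python
def car_auci(minutes, cortisol):
    """CAR as area under the curve with respect to increase (AUCi).

    minutes: sampling times in minutes after awakening, strictly increasing,
             with the first sample taken at awakening.
    cortisol: salivary cortisol at those times (e.g. nmol/L).
    Returns concentration x minutes; negative if cortisol falls after waking.
    """
    if len(minutes) != len(cortisol) or len(minutes) < 2:
        raise ValueError("need matching times and values, at least two samples")
    aucg = 0.0
    for i in range(1, len(minutes)):
        dt = minutes[i] - minutes[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        aucg += dt * (cortisol[i] + cortisol[i - 1]) / 2  # trapezoid
    # Subtract the area the awakening level alone would contribute.
    baseline = cortisol[0] * (minutes[-1] - minutes[0])
    return aucg - baseline


print(car_auci([0, 30, 45], [10.0, 16.0, 14.0]))  # 165.0
```

Because the awakening value is subtracted, AUCi isolates the rise after waking, which is why accurate timing relative to awakening matters for this measure.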
While a single 8 a.m. salivary cortisol is one of the easiest measures to perform, it generally has the lowest between-visit reliability (R = 0.18–0.47), whether samples were collected days, weeks, or months apart. It is therefore likely the poorest of these measures of chronic HPA axis tone.
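One standard psychometric relation, not applied in the studies reviewed here as far as this text reports, helps explain why multi-day sampling is attractive: the Spearman–Brown formula predicts the reliability of the mean of k parallel measurements with single-measurement reliability R as kR / (1 + (k − 1)R). The sketch below treats the reported R as such a reliability coefficient and assumes repeated 8 a.m. samples behave as parallel measures with independent errors, a strong assumption for cortisol. The function names are hypothetical, and the output is a rough projection, not an empirical estimate.

```python
import math


def spearman_brown(r_single, k):
    """Predicted reliability of the mean of k parallel measurements."""
    if not 0 < r_single < 1 or k < 1:
        raise ValueError("need 0 < r_single < 1 and k >= 1")
    return k * r_single / (1 + (k - 1) * r_single)


def days_needed(r_single, r_target):
    """Smallest whole number of sampling days whose mean reaches r_target."""
    if not 0 < r_single < 1 or not 0 < r_target < 1:
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = r_target * (1 - r_single) / (r_single * (1 - r_target))
    # Round before ceil so floating-point noise (e.g. 4.000000000000001)
    # does not add a spurious extra day.
    return max(1, math.ceil(round(k, 9)))
```

At the low end of the reported range, `spearman_brown(0.18, 3)` is about 0.40; at the high end, `spearman_brown(0.47, 3)` is about 0.73.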
Based on our own data, the inter-reader reliability (rs) of adrenal gland volume from non-contrast CT ranged from 0.67 to 0.71 for the left and from 0.47 to 0.70 for the right adrenal gland. Adrenal volume can be determined more easily and more reliably on contrast-enhanced CT; however, contrast administration may be complicated by allergic reactions and/or renal side effects. MRI may offer an alternative that avoids radiation exposure, but both CT and MRI are expensive if not already part of the main study. Finally, inter-device differences may affect the reliability of adrenal volume measurements if more than one scanner is used during a study. Despite these limitations, the inter-reader reliability of adrenal gland volume was similar in magnitude to the between-visit reliability of the salivary cortisol-based HPA axis measures.
Based on the current literature, multiple salivary cortisol samples across the diurnal curve (including the CAR), dexamethasone-suppressed cortisol, and adrenal gland volume are measures of HPA axis tone with similar between-visit reliabilities. Notably, HPA axis measures are generally less reproducible than other biological measures collected in observational studies (e.g., serum creatinine), which likely reflects the many biological and environmental factors that affect the HPA axis and free cortisol availability [9, 16]. A note of caution is therefore in order: although these techniques capture cortisol burden over some time frame, additional studies are required to quantify that time frame and to determine whether the measures are surrogates for cortisol burden present over weeks or months.
A final consideration for the use of HPA axis measures in population-based studies is participant burden. In larger, multi-site cohort studies in the United States, use of the dexamethasone suppression test, repeated salivary cortisol collections, and radiological procedures (e.g., CT, MRI) requires review by the study Steering Committee as well as a National Institutes of Health-appointed Observational Study Monitoring Board to assess participant burden and safety risks. Nevertheless, these measures can feasibly be incorporated into the workflow of an ongoing study exam. To advance our understanding of the biological relation between depression and other stress-related disorders and metabolic outcomes, epidemiologists, endocrinologists, behavioral scientists, and laboratory scientists will need to work collaboratively in population-based research to identify additional cortisol biomarkers and incorporate them into ongoing studies.
This review and accompanying studies were supported by the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health (K23 DK071565 to SHG) and by the National Institute on Alcohol Abuse and Alcoholism (R01 AA10158 to GW).
“Salivary cortisol”[tiab] OR “Daytime cortisol”[tiab] OR “morning cortisol”[tiab] OR “Awakening cortisol”[tiab] OR “Free cortisol”[tiab] OR “diurnal cortisol”[tiab] OR “cortisol”[ti] OR “cortisol response”[ti] OR “Cortisol levels”[tiab] OR “Cortisol rhythms”[tiab] OR “Cortisol rhythm”[tiab] OR “Cortisol secretion”[tiab] OR “Dexamethasone”[ti] OR “hydrocortisone”[tiab] OR “adreno-corticotropic hormone”[tiab] OR “Saliva/chemistry”[majr] OR “Saliva/drug effects”[majr] OR “adrenocortical stress capacity”[All Fields] OR “magnetic resonance imaging”[tiab] AND (“Adrenal Gland Volume”[tiab] OR “adrenal insufficiency”[tiab] OR “Adrenocortical Insufficiency”[tiab] OR “Adrenocorticotropic hormone deficiency”[All] OR “ACTH deficiency”[tiab] OR “Adrenocortical Hyperplasia”[tiab] OR “adrenocortical activity”[tiab] OR “adrenal incidentaloma”[tiab] OR “Adrenalectomy”[tiab] OR “Hypercortisolism”[tiab] OR “Cushing’s syndrome”[tiab] OR “Hypothalamic Pituitary Adrenal Axis”[tiab] OR “Hypothalamic pituitary axis” [tiab] OR “Hypothalamic pituitary adrenocortical system”[tiab] OR “pituitary adrenal cortical axis”[tiab] OR “Hypothalamo-pituitary-adrenal axis”[tiab] OR “hypothalamus–pituitary–adrenal axis”[tiab] OR “hypothalamo-pituitary-adrenal”[tiab] OR “major depression”[ti] OR “depressive illness”[ti] OR “major depressive disorder”[ti] OR “depressed”[ti] OR “antidepressant treatment”[ti] OR “salivary cortisol”[ti] OR “reliability”[ti] OR “cortisol response”[ti]) AND (“humans”[MeSH Terms] AND English[lang] AND “adult”[MeSH Terms] AND (“1982” [PDAT] : “2010/06/30”[PDAT])) NOT review[ptyp].
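The search above could be re-run programmatically through NCBI's E-utilities ESearch endpoint. The Python sketch below builds the request URL and, optionally, fetches the hit count; the helper names are hypothetical. The query as printed uses typographic quotation marks, and converting them to straight double quotes is a precaution assumed here rather than a documented PubMed requirement. Counts may differ from the original search because PubMed indexing changes over time, and NCBI asks clients to limit their request rate.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def normalize_quotes(query):
    """Replace typographic double quotes with straight ones."""
    return query.replace("\u201c", '"').replace("\u201d", '"')


def esearch_url(query, retmax=0):
    """Build an ESearch URL for PubMed; retmax=0 requests only the count."""
    params = {"db": "pubmed", "term": normalize_quotes(query),
              "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def pubmed_count(query):
    """Network call: number of PubMed records matching the query."""
    with urlopen(esearch_url(query)) as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"])
```

Keeping URL construction separate from the network call makes the query easy to inspect before it is submitted.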
Sherita Hill Golden, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA. Department of Epidemiology, Johns Hopkins University, Baltimore, MD, USA. Division of Endocrinology and Metabolism, Johns Hopkins University School of Medicine, 2024 E. Monument Street, Suite 2-600, Baltimore, MD 21287, USA.
Gary S. Wand, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.
Saurabh Malhotra, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA.
Ihab Kamel, Department of Radiology, Johns Hopkins University, Baltimore, MD, USA.
Karen Horton, Department of Radiology, Johns Hopkins University, Baltimore, MD, USA.