Recent years have witnessed an upsurge in the usage of resting-state functional magnetic resonance imaging (fMRI) to examine functional connectivity (fcMRI), both in normal and pathological populations. Despite this increasing popularity, concerns about the psychologically unconstrained nature of the “resting-state” remain. Across studies, the patterns of functional connectivity detected are remarkably consistent. However, the test–retest reliability of resting-state fcMRI measures has not been determined. Here, we quantify the test–retest reliability, using resting scans from 26 participants at 3 different time points. Specifically, we assessed intersession (>5 months apart), intrasession (<1 h apart), and multiscan (across all 3 scans) reliability and consistency for both region-of-interest and voxel-wise analyses. For both approaches, we observed modest to high reliability across connections, dependent upon 3 predictive factors: 1) correlation significance (significantly nonzero > nonsignificant), 2) correlation valence (positive > negative), and 3) network membership (default mode > task positive network). Short- and long-term measures of the consistency of global connectivity patterns were highly robust. Finally, hierarchical clustering solutions were highly reproducible, both across participants and sessions. Our findings provide a solid foundation for continued examination of resting state fcMRI in typical and atypical populations.
Recent years have witnessed a proliferation of fMRI studies examining resting-state functional connectivity (fcMRI) in both normal and pathological populations. This approach detects spatial patterns of temporally correlated low-frequency fluctuations in the blood oxygen level–dependent (BOLD) signal across the brain (Biswal et al. 1995). Resting-state fcMRI allows researchers to map out complex neural circuits, referred to as intrinsic connectivity networks (ICNs), with a degree of detail and specificity previously possible only in animal paradigms or meta-analyses of hundreds of studies (Margulies et al. 2007; Di Martino et al. 2008; Kahn et al. 2008). Furthermore, the ICNs observed during rest show significant overlap with task-evoked activations (Biswal et al. 1995; Greicius et al. 2003; Fox et al. 2007; Toro et al. 2008), structural connectivity (Andrews-Hanna et al. 2007; Greicius, Supekar, et al. 2008; Hagmann et al. 2008; Lowe et al. 2008) and maps of anatomical connectivity derived using retrograde tracers in macaques (Vincent et al. 2007). In light of these observations, coherent spontaneous low-frequency fluctuations in BOLD activity are increasingly recognized as an intrinsic property of brain (Buckner et al. 2008; Fox and Raichle 2007), suggesting that measures of fcMRI are inherently stable.
The remarkable spatial consistency of ICNs detected across resting-state fcMRI studies appears to corroborate such stability. The ICNs detected using both model-based (e.g., seed-based correlation analysis) and model-free approaches (e.g., independent component analysis) are highly reproducible across participants and scans (Van De Ven et al. 2004; Damoiseaux et al. 2006) and multiple resting-state conditions, including eyes open, eyes closed, or fixation (Fox et al. 2005; Fransson 2005). The spatial configurations of ICNs are also preserved across conscious states, specifically during light sedation (Greicius, Kiviniemi, et al. 2008; Horovitz et al. 2008) and during sleep (Fukunaga et al. 2006, 2008).
Although these studies indicate that the overall architecture of correlated spontaneous activity in the brain is stable, other work suggests that the strength of specific correlations between regions is dynamic. Task demands have been shown to modulate functional connectivity within ICNs (Fransson 2006; Hampson et al. 2006; Harrison, Pujol, López-Solà, et al. 2008; Kelly, Uddin, et al. 2008), and may alter the spatial configuration of negative correlations to a greater extent than that of positive correlations (Tian et al. 2007). Other studies have shown that specific interregional functional connections are modulated by factors such as current conscious (Greicius, Kiviniemi, et al. 2008; Horovitz et al. 2008), cognitive (Waites et al. 2005) and emotional state (Harrison, Pujol, Ortiz, et al. 2008). Given the unconstrained nature of the resting state, such factors should decrease the reliability of fcMRI measures for a given individual across time. Accordingly, the reliability of resting state measures, and the factors that may modulate it, need to be rigorously examined.
To our knowledge, no prior study has explicitly quantified the test–retest reliability of resting state fcMRI measures. As differences in fcMRI measures have been associated with differences between clinical groups (Castellanos et al. 2008; see Greicius 2008 for review; Greicius et al. 2007; He, Snyder, et al. 2007; Kennedy et al. 2006; Liu et al. 2008) and with interindividual differences in behavioral performance (Fox et al. 2007; Hampson et al. 2006; Kelly, Uddin, et al. 2008; Seeley et al. 2007), establishing the reliability of these measures is crucial to the continued investigation of such interindividual and group-based differences.
In the present study, we investigated the test–retest reliability of resting-state fcMRI. Specifically, we used fMRI to measure resting-state activity in a group of 26 participants at 3 different time points, in order to assess intersession (>5 months apart), intrasession (<1 h apart), and multiscan (across all 3 scans) reliability. To provide a comprehensive assessment of brain functional connectivity, we adopted several approaches. As a starting point, we specified 3 sets of regions of interest (ROIs), derived from 4 different and representative studies (Dosenbach et al. 2007; Kennedy et al. 1998; Makris et al. 1999; Toro et al. 2008). We then explored the reliability and consistency of fcMRI between ROIs within each seed set in 3 different ways. We computed the following: 1) the reliability of correlations between pairs of ROIs using intraclass correlations (ICC); 2) the consistency of entire sets of correlations, using Kendall's coefficient of concordance (Kendall's W); 3) the consistency with which hierarchical clustering partitioned ROIs into 2 of the most commonly observed ICNs in the resting state fcMRI literature, the “default mode” and the “task positive” networks. We also calculated ICC and Kendall's W on a voxelwise basis for the ICNs associated with 3 seed ROIs placed in posterior cingulate cortex (PCC), supplementary motor area (SMA), and the intraparietal sulcus (IPS). As previous studies have suggested that the stability of fcMRI measures may vary, we also explored 3 factors that could impact reliability. These were 1) statistical significance of correlations, 2) valence of correlations (i.e., positive vs. negative correlations), and 3) network membership of regions (default mode vs. task positive network).
Twenty-six right-handed native English-speaking participants were included (11 males; mean age 20.5 ± 8.4 years). Participants had no history of psychiatric or neurological illness, as confirmed by a psychiatric clinical assessment. The study was approved by the institutional review boards of the New York University School of Medicine and New York University. Signed informed consent was obtained prior to participation, which was compensated.
A Siemens Allegra 3.0 Tesla scanner equipped for echoplanar imaging (EPI) was used for data acquisition. For each participant, we collected 3 resting-state scans, each comprising 197 continuous EPI functional volumes (time repetition [TR] = 2000 ms; time echo [TE] = 25 ms; flip angle = 90°; 39 slices, matrix = 64 × 64, field of view [FOV] = 192 mm; acquisition voxel size = 3 × 3 × 3 mm). Scans 2 and 3 were conducted in a single scan session, 45 min apart, and took place 5–16 months (mean 11 ± 4 months) after Scan 1. Complete cerebellar coverage was not possible for all participants and only those cerebellar regions acquired in all participants were included in subsequent statistical analyses. During the scan, participants were instructed to rest with their eyes open while the word “Relax” was centrally projected in white, against a black background. For spatial normalization and localization, a high-resolution T1-weighted anatomical image was also acquired using a magnetization prepared gradient echo sequence (MP-RAGE, TR = 2500 ms; TE = 4.35 ms; inversion time (TI) = 900 ms; flip angle = 8°; 176 slices, FOV = 256 mm).
Consistent with prior work in our lab (e.g., Margulies et al. 2007; Di Martino et al. 2008), data were processed using both AFNI (version AFNI_2008_07_18_1710, http://afni.nimh.nih.gov/afni) and FSL (version 3.3, www.fmrib.ox.ac.uk). Image preprocessing using AFNI consisted of 1) slice time correction for interleaved acquisitions using Fourier interpolation, 2) 3D motion correction (3D volume registration using least-squares alignment of 3 translational and 3 rotational parameters), and 3) despiking of extreme time series outliers using a continuous transformation function. Preprocessing using FSL consisted of 4) mean-based intensity normalization of all volumes by the same factor, 5) spatial smoothing (Gaussian kernel of full-width half maximum 6 mm, see below for exception), 6) temporal high-pass filtering (Gaussian-weighted least-squares straight line fitting with sigma = 100.0 s), 7) temporal low-pass filtering (Gaussian filter with half-width half maximum = 2.8 s), and 8) correction for time series autocorrelation (prewhitening). Prewhitening renders successive time points independent of one another, thus improving the validity of subsequent statistical analyses (Woolrich et al. 2001). Functional data were then transformed into MNI152 (Montreal Neurological Institute) space using a 12 degree of freedom linear affine transformation implemented in FMRIB Linear Image Registration Tool (voxel size = 2 × 2 × 2 mm). Mean time series for each ROI (selection described below) were extracted from this standardized functional volume by averaging over all voxels within the region. To ensure that each time series represented regionally specific neural activity, in each analysis, the mean time series of each ROI was orthogonalized with respect to 9 nuisance signals (global signal, white matter, cerebrospinal fluid, and 6 motion parameters).
In previous studies (e.g., Margulies et al. 2007; Di Martino et al. 2008), seed time series were orthogonalized with respect to one another, in addition to the 9 nuisance signals. This was necessary because the aim of those studies was to examine functional differentiation within specific brain regions such as the anterior cingulate cortex. Orthogonalization removes signals common to all the seeds, thus permitting the detection of fcMRI unique to each seed included in the model. In contrast, in the present study, our aim was to examine condition-related (i.e., time- or scan-related) differences in fcMRI. In line with other studies from our group that examined group differences in fcMRI (e.g., Castellanos et al. 2008; Kelly, Di Martino, et al. 2008), we have not orthogonalized the seed time series with respect to one another. This is because in the context of the examination of interindividual, group- or condition-related differences, removal of signals common to the seeds (through orthogonalization) can be hazardous, because the nature or degree of the signal removed can differ between groups or conditions, introducing a confound.
In view of the possible influence of ROI selection on functional connectivity, we adopted 3 different seed sets based on previously published studies (see Table S1 for all ROI coordinates). In separate analyses, we assessed the reliability of connections between seeds of each set.
The 3 sets used were as follows:
Although preprocessing was identical for analyses using Sets A and B, spatial smoothing differed for Set C. More specifically, ROIs in Sets A and B were extracted from spatially smoothed data, whereas ROIs in Set C were extracted from nonspatially smoothed data in line with previous studies (Salvador et al. 2005; Achard et al. 2006; Liu et al. 2008).
Subsequent to time series extraction, functional connectivity analyses were carried out in the R statistical environment (version 2.7.0, http://www.r-project.org). For each seed set, Pearson correlation coefficients were calculated for each pair of regions, for each subject and each scan. The resulting correlation coefficients were either Fisher z-transformed for subsequent calculation of ICC, or were transformed into a distance measure (1 - r), for use in subsequent consistency (Kendall's W) and clustering analyses.
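The three quantities described above (the Pearson correlation between two ROI time series, its Fisher z-transform, and the 1 - r distance) can be illustrated with a minimal Python sketch. The analyses themselves were carried out in R; the function names and the toy time series below are hypothetical and serve only to make the transformations concrete.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher z-transform (inverse hyperbolic tangent); requires -1 < r < 1."""
    return math.atanh(r)

def correlation_distance(r):
    """Distance used for concordance and clustering: 0 for r = 1, 2 for r = -1."""
    return 1.0 - r

# Toy (made-up) mean time series for two ROIs.
roi_a = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]
roi_b = [0.2, 0.5, 0.30, 0.7, 0.65, 1.0]

r_ab = pearson_r(roi_a, roi_b)
print(r_ab, fisher_z(r_ab), correlation_distance(r_ab))
```

The z-transform is applied before any averaging or ICC calculation because r is bounded and its sampling distribution is skewed near ±1, whereas z is approximately normally distributed.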
To assess the significance of the correlation between each pair of regions in each seed set, we carried out a one-sample t-test on the z-transformed correlation coefficients for the 26 participants. Significance was defined as a 2-sided P-value of 0.05, which was adjusted for multiple comparisons using a Bonferroni correction (741 correlations for Set A, 378 for Set B, and 6216 for Set C). This t-test determined the group-level significance of each correlation (i.e., whether or not the correlation differed significantly from zero).
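The number of tests entering each Bonferroni correction follows from the number of ROIs per set, since n ROIs yield n(n - 1)/2 unique pairs. The sketch below reproduces this arithmetic and the one-sample t statistic. The ROI counts of 39 and 112 for Sets A and C are inferred from the stated numbers of correlations rather than given in this section (28 matches the 28 × 28 matrix reported for Set B), and the function names are hypothetical.

```python
import math
import statistics

def n_pairs(n_rois):
    """Number of unique ROI pairs (off-diagonal half of the correlation matrix)."""
    return n_rois * (n_rois - 1) // 2

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def one_sample_t(z_values):
    """t statistic for H0: mean(z) = 0, with len(z_values) - 1 degrees of freedom."""
    n = len(z_values)
    return statistics.mean(z_values) / (statistics.stdev(z_values) / math.sqrt(n))

# ROI counts for Sets A and C are assumptions inferred from 741 and 6216 correlations.
for label, n_rois in [("A", 39), ("B", 28), ("C", 112)]:
    m = n_pairs(n_rois)
    print(label, m, bonferroni_alpha(0.05, m))
```

Converting the t statistic to a P-value requires the t distribution with 25 degrees of freedom (for 26 participants), which is omitted here to keep the sketch within the standard library.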
To derive a group-level functional connectivity matrix, every z-transformed correlation was averaged across subjects, for each seed set and for each scan. The resulting matrix of mean z-transformed correlation values was then reverse transformed to produce a matrix of group-mean r-values (Corey et al. 1998).
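As a brief illustration of this averaging step, the following Python sketch computes a group-mean r by averaging in z-space and applying the inverse transform (the hyperbolic tangent). The function name and example values are hypothetical.

```python
import math

def group_mean_r(r_values):
    """Average correlations in Fisher z-space, then transform back to r."""
    z_values = [math.atanh(r) for r in r_values]
    return math.tanh(sum(z_values) / len(z_values))

# Made-up correlations for one connection in three participants.
print(group_mean_r([0.2, 0.5, 0.8]))
```

Averaging in z-space gives more weight to strong correlations than a simple arithmetic mean of r would: for r values of 0.2 and 0.8, the result is larger than 0.5.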
For each participant, we performed a multiple regression analysis (as implemented in the FSL program FEAT, version 3.3, www.fmrib.ox.ac.uk) to identify those voxels positively and negatively correlated with each of 3 seed ROIs. The seed ROIs were selected from seed Set B (Toro et al. 2008): the PCC (MNI coordinates: −6 −58 28), SMA (−2 10 48), and IPS (26 −58 48). These 3 ROIs were selected because they represent core components of the commonly identified default mode and task positive networks. The time series data were preprocessed as outlined above, and the seed ROI time series were orthogonalized with respect to the same 9 nuisance signals (global signal, white matter, cerebrospinal fluid, and 6 motion parameters). For a more complete description of our methods for determining voxelwise connectivity, see Margulies et al. (2007) and Di Martino et al. (2008).
Group-level analyses were carried out using a mixed-effects model (as implemented in the FSL program FLAME). Corrections for multiple comparisons were carried out at the cluster level using Gaussian random field theory (min Z > 2.3; cluster significance: P < 0.05, corrected). This group-level analysis produced thresholded Z-score maps (“networks”) of positive and negative functional connectivity for each seed ROI. Group-level maps were calculated for each scan (scans 1, 2 and 3). We also calculated group-level maps of intersession, intrasession, and multiscan functional connectivity. To do this, we carried out a fixed-effects analysis for each participant, which combined scans 1 and 2 (intersession fcMRI), scans 2 and 3 (intrasession fcMRI) and scans 1, 2 and 3 (multiscan fcMRI). For all our analyses, we defined intersession reliability as the comparison between scans 1 and 2, rather than scans 1 and 3, because scans 1 and 2 both represent the first resting-state scan of their respective scan sessions. Subsequent to this subject-level fixed-effects analysis, a standard mixed-effects model was employed to derive the thresholded Z-score maps for each of the combined analyses (i.e., intersession, intrasession, and multiscan functional connectivity).
To investigate the reliability of each functional connection, we calculated ICCs, a common measure of test–retest reliability (Shrout and Fleiss 1979). For each correlation, three 26 × n matrices were created, representing the z-transformed correlation values for 26 participants and n scans. Here n can represent scans 1 and 2 (intersession or long-term reliability), or scans 2 and 3 (intrasession or short-term reliability), or all 3 scans (multiscan reliability). Using a one-way ANOVA applied to each of the 3 possibilities for n, we obtained the between-subject mean square (MSb) and within-subject mean square (MSw) for each correlation. ICC values were subsequently calculated according to the following equation, where k is the number of observations per participant (Shrout and Fleiss 1979):
$$\mathrm{ICC} = \frac{MS_b - MS_w}{MS_b + (k - 1)\,MS_w}$$
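The one-way ANOVA computation can be sketched in Python as follows. This assumes the one-way random-effects form of Shrout and Fleiss (1979), ICC = (MSb − MSw) / (MSb + (k − 1) MSw); the function name and the toy data are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC.

    scores: one list per participant, each holding k z-transformed
    correlation values (one per scan) for a single connection.
    """
    n = len(scores)          # number of participants
    k = len(scores[0])       # number of scans per participant
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]

    # Between-subject mean square: spread of participant means around the grand mean.
    ms_b = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square: spread of scans around each participant's own mean.
    ms_w = sum(
        (value - m) ** 2
        for row, m in zip(scores, subject_means)
        for value in row
    ) / (n * (k - 1))

    return (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)

# Made-up data: 3 participants, 2 scans each.
print(icc_oneway([[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]]))
```

With this form, the ICC approaches 1 when scans from the same participant agree closely relative to the differences between participants, and it can become negative when within-subject variability exceeds between-subject variability.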
Given the substantial differences in time between scans, we compared intersession (>5 months apart) and intrasession (<1 h apart) ICC. We also examined the effect of the following factors on the multiscan reliability of fcMRI. 1) Statistical significance: correlations determined to be significant at the group level (see Functional Connectivity: ROI Analyses, above) were compared with those that failed to reach significance. 2) Valence: significant positive correlations were compared with significant negative correlations. 3) Network membership: from seed Set B (Toro et al. 2008), we compared correlations for connections within the default mode network, correlations for connections within the task-positive network, and correlations for connections between the 2.
To examine the stability of sets of correlation patterns, as opposed to individual correlations, we adopted a second approach. We used Kendall's coefficient of concordance (W) to quantify the consistency of all possible correlations in each seed set in 2 ways: 1) intraindividual (i.e., within subjects across scans) and 2) interindividual (i.e., within scans across subjects) (Kendall and Smith 1939; Kendall and Gibbons 1990). Kendall's W is typically used to assess agreement among raters based on rank order of ratings, and ranges from 0 (no agreement) to 1 (complete agreement). Here, it reflects the consistency or agreement in the rank order of correlations across participants and scans. In the context of fcMRI, Kendall's W has previously been used to compare the consistency of time series within an individual (“regional homogeneity”; Zang et al. 2004). Kendall's W was calculated as follows (where k = number of scans or number of participants, n = number of possible connections, R_i is the sum rank of the ith connection, and \bar{R} is the mean of the R_i values):
$$W = \frac{12 \sum_{i=1}^{n} \left(R_i - \bar{R}\right)^2}{k^2 \left(n^3 - n\right)}$$
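A minimal Python sketch of this calculation is given below, assuming the standard form W = 12 Σ(R_i − R̄)² / (k²(n³ − n)) without a correction for ties (ties are averaged when ranking, which is a simplification; exact ties are unlikely for continuous correlation values). The function names and toy data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def kendalls_w(ratings):
    """ratings: k lists (scans or participants), each with n connection values."""
    k = len(ratings)
    n = len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in ranked) for i in range(n)]
    mean_rank_sum = sum(rank_sums) / n
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (k ** 2 * (n ** 3 - n))

# Two made-up scans whose three connections are ordered identically.
print(kendalls_w([[0.1, 0.5, 0.9], [0.2, 0.6, 0.8]]))
```

Because W depends only on rank order, two scans can differ in the absolute magnitude of every correlation and still show complete concordance.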
We applied permutation tests to assess the significance of the resulting Kendall's W values (see Supporting Information; Legendre and Lapointe 2004; Mielke and Berry 2007). Taking all pairwise correlations from each seed set, we examined the significance of 1) interindividual consistency (i.e., comparing the consistency within scans across subjects to chance), and 2) intraindividual consistency (i.e., comparing the consistency of a given participant's 3 scans to the consistency of 3 scans selected randomly from 3 different participants and always comprising one of each scans 1, 2, and 3).
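The general logic of a permutation test for concordance can be sketched as follows. This is a generic illustration rather than the specific procedure detailed in the Supporting Information: it shuffles the values within each rater (scan or participant) to build a null distribution of W, assumes no tied values, and uses hypothetical function names, a fixed seed, and made-up data.

```python
import random

def ranks(values):
    """1-based ranks, assuming no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, index in enumerate(order, start=1):
        out[index] = position
    return out

def kendalls_w(ratings):
    """Kendall's W for k raters (rows) and n items (columns), no tie correction."""
    k, n = len(ratings), len(ratings[0])
    ranked = [ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in ranked) for i in range(n)]
    mean_rank_sum = sum(rank_sums) / n
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (k ** 2 * (n ** 3 - n))

def permutation_p(ratings, n_perm=999, seed=0):
    """Observed W and the proportion of shuffled data sets with W >= observed."""
    rng = random.Random(seed)
    observed = kendalls_w(ratings)
    n_exceed = 0
    for _ in range(n_perm):
        # Shuffle each rater's values independently to break any shared ordering.
        shuffled = [rng.sample(row, len(row)) for row in ratings]
        if kendalls_w(shuffled) >= observed:
            n_exceed += 1
    # Counting the observed arrangement itself keeps the P-value above zero.
    return observed, (n_exceed + 1) / (n_perm + 1)

# Three made-up raters that order six connections identically.
data = [
    [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    [0.15, 0.25, 0.35, 0.45, 0.55, 0.65],
    [0.05, 0.22, 0.31, 0.48, 0.52, 0.70],
]
w_observed, p_value = permutation_p(data)
print(w_observed, p_value)
```

The intraindividual test described above uses a different null: the concordance of one participant's 3 scans is compared with that of 3 scans drawn from 3 different participants, which would require resampling across participants rather than shuffling within rows.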
As with ICC, given the substantial differences in time between scans, we compared intersession (>5 months apart) and intrasession (<1 h apart) Kendall's W. We also examined the effect of the following factors on the multiscan consistency of fcMRI: statistical significance, valence, and network membership (see Reliability: ROI Analyses, above).
We tested the reproducibility of the default mode and task positive networks, as well as the reliability and consistency of correlations within and between these networks. ROIs for these 2 networks were derived from seed Set B (Toro et al. 2008). We used hierarchical clustering and compared the 2-cluster solutions for each participant at each scan session. For each scan and each participant, we 1) applied hierarchical clustering in a manner similar to previous fcMRI studies (Cordes et al. 2002; Salvador et al. 2005; Dosenbach et al. 2007) using average linkage to each 28 × 28 matrix of distances (1 - r) representing all pairwise correlations for seed Set B (Toro et al. 2008) and 2) identified a 2-cluster solution. We then explored the similarity of cluster membership across participants and sessions. For each region, and for each scan, we recorded the proportion of participants for whom that region was assigned to the same cluster as in Toro et al. (2008) (“percent agreement”).
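The clustering step can be illustrated with a small pure-Python sketch of agglomerative clustering with average linkage, stopped when 2 clusters remain. This is an illustration under stated assumptions, not the implementation used in the study (which was carried out in R); the function name and the 4-region toy matrix are hypothetical.

```python
def average_linkage_two_clusters(dist):
    """Agglomerative clustering with average linkage, stopped at 2 clusters.

    dist: symmetric n x n list of lists holding distances (1 - r) between regions.
    Returns two sorted lists of region indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                mean_dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Made-up correlations: regions 0 and 1 correlate positively, as do regions
# 2 and 3, while correlations between the two pairs are negative.
r = [
    [1.0, 0.8, -0.3, -0.3],
    [0.8, 1.0, -0.3, -0.3],
    [-0.3, -0.3, 1.0, 0.7],
    [-0.3, -0.3, 0.7, 1.0],
]
dist = [[1.0 - value for value in row] for row in r]
print(average_linkage_two_clusters(dist))
```

Percent agreement then requires matching each participant's 2 unlabeled clusters to the reference networks before counting, for each region, the proportion of participants whose assignment matches the reference.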
We calculated the reliability of individual connections within and between these 2 networks using ICC, and the consistency of correlation patterns within and between the networks using Kendall's W.
To assess the reliability of the 3 voxelwise analyses (for the PCC, SMA, and IPS), we calculated the ICC for each voxel, using the same method as for the ROI analyses. We calculated the between-subject mean square (MSb) and within-subject mean square (MSw) for each voxel's parameter estimate (the output of the multiple regression analysis conducted to assess functional connectivity), reflecting that voxel's connectivity with the seed ROI. We then calculated the ICC on a voxelwise basis. As for the ROI analyses, we computed the intersession, intrasession and multiscan ICC for each network (i.e., the pattern of functional connectivity associated with the PCC, SMA, and IPS seeds), and compared inter- and intrasession ICC for each network's positive, negative and nonsignificant correlations using the Wilcoxon signed rank test.
To quantify the consistency of voxelwise correlation patterns at the individual level, we calculated the inter-, intra- and multiscan Kendall's W for each seed across scans as well as between subjects in an identical manner to the ROI analysis (see Consistency of Correlation Sets). We also directly compared the intra- and intersession concordance for individual subjects across scans using the Wilcoxon signed rank test.
Given our previous interest in the magnitude of the negative correlation between the cingulo-parietal or default mode network and the fronto-parietal or task positive network (Kelly, Uddin, et al. 2008), we also quantified the test–retest reliability of that anticorrelation. To do this, we extracted the mean time series for the default mode and task positive networks, using the group-level maps (for the combined [multiscan] analysis) of positive and negative connectivity for the PCC seed as masks. The mean time series were then orthogonalized with respect to the 9 nuisance covariates, using the same Gram–Schmidt process employed prior to conducting the voxelwise multiple regression. Finally, for each participant we quantified the strength of the negative relationship between the default mode and task positive time series using the Pearson correlation coefficient. We computed the intersession, intrasession, and multiscan ICC of the anticorrelation in the same manner as described above.
In order to provide a comprehensive assessment of fcMRI across the brain, we quantified the reliability and consistency of correlations between ROIs within 3 different seed sets (Sets A, B, and C) that were derived from 4 previously published studies (Kennedy et al. 1998; Makris et al. 1999; Dosenbach et al. 2007; Toro et al. 2008) (see Table S1 for all ROI coordinates).
To investigate the reliability of fcMRI between pairs of regions, we calculated the ICC, a standard measure of test–retest reliability, for all possible z-transformed correlation coefficients, separately for each seed set (Shrout and Fleiss 1979). The ICC expresses between-subject variability as a proportion of total (between-subject plus within-subject) variability. Thus, for a functional connection to be reliable, within-subject variability of r-values (i.e., across scans) must be low relative to between-subject variability of r-values (i.e., across participants). We calculated ICCs for correlations taken from scans 1 and 2 (intersession reliability), scans 2 and 3 (intrasession reliability), or scans 1, 2, and 3 (multiscan reliability).
Intersession (long-term; scans 1 and 2) and intrasession (short-term; scans 2 and 3) test–retest reliability were highly similar across the 3 seed sets (Table 1, Fig. S1a), though intrasession ICCs were higher on average than intersession ICCs (Fig. 1a).
The multiscan ICC measures reliability across all 3 scanning sessions. By pooling all 3 scans, multiscan ICC provides a more precise and stable estimate of reliability (Fig. S1a). Multiscan ICCs for all correlations within a seed set were similar for each of the 3 seed sets (Table 1, Fig. S1a). Within each seed set, multiscan ICC values for specific correlations were variable, ranging from effectively zero to moderate/high reliability (maximum ICC: Set A = 0.67; Set B = 0.69; Set C = 0.76). Table 2 displays the statistically significant correlations (i.e., those significant at the group level for each of the 3 scans, see Functional Connectivity: ROI Analyses of Materials and Methods) exhibiting multiscan ICC values larger than 0.5 for seed sets A and B and larger than 0.60 for seed set C.
Significant versus nonsignificant connections. As most fcMRI studies focus their analyses on statistically significant correlations, we compared the ICCs of both significant and nonsignificant correlations (Fig. 2a). A Wilcoxon rank-sum test demonstrated that intersession, intrasession, and multiscan ICCs for significant correlations were significantly greater than for nonsignificant correlations (Table 1) for all 3 seed sets (P < 0.0001 for all comparisons).
Positive versus negative correlations. Consistent with previous research suggesting greater variability of negative fcMRI correlations (Tian et al. 2007; Skudlarski et al. 2008), we found that positive correlations were more reliable than negative correlations (Fig. 2b). Restricting our analysis to only significant correlations, a Wilcoxon rank-sum test demonstrated that positive correlations were significantly more reliable than negative correlations for intersession, intrasession, and multiscan comparisons across all 3 seed sets (see Table 1; P < 0.001 for all comparisons).
Magnitude of correlations. Figure 3 plots the mean group-level correlation (i.e., the group-level correlation, averaged across all 3 scans) against the corresponding multiscan ICC (see Fig. S2 for similar inter- and intrasession plots). Spline-based nonparametric regression fits, shown in the figures, revealed a trend towards increasing ICC for increasing magnitudes of correlation values, especially for positive correlations. Approximate Wald tests of these nonparametric regression models (Wood 2006) confirmed the significance of the nonlinear relationships between correlation and intersession, intrasession, and multiscan ICC for all 3 seed sets (P < 0.0001 for all comparisons).
Intersession versus intrasession ICC. The difference between inter- and intrasession ICCs (Table 1) was significant (Wilcoxon signed rank test; P < 0.001) for 2 of the 3 seed sets (Set A and Set C). This was the case for all of the comparisons we examined, except for negative significant correlations (i.e., intrasession ICCs were significantly larger for all correlations combined, and for significant, nonsignificant, and positive significant correlations). For Set B, only nonsignificant correlations exhibited a significantly higher intrasession ICC (Wilcoxon signed rank test; P < 0.001).
We used the ICC to quantify the reliability of specific connections. However, functional connections may be best considered not in isolation but rather as part of a general pattern of connectivity. Thus, we measured the concordance of sets of correlations within and between subjects using Kendall's coefficient of concordance (W). Kendall's W reflects the consistency or agreement in the rank order of correlations across subjects or across scans, and ranges from 0 (no agreement) to 1 (complete agreement). We assessed intersession (scans 1 and 2), intrasession (scans 2 and 3), and multiscan (scans 1, 2, and 3) consistency in terms of 1) intraindividual consistency (i.e., concordance of sets of correlations within subjects across scans) and 2) interindividual consistency (i.e., concordance of sets of correlations within scans across subjects).
Within subjects (i.e., intraindividual), the consistency of each seed set across intersession scans 1 and 2 and intrasession scans 2 and 3 ranged from moderate to high (Table 3, Fig. 1a, Fig. S1b, see Fig. S3 for 2 representative participants). The differences in intra- and intersession consistency for all correlations were not significant for any of the 3 seed sets (Wilcoxon signed rank test), following Bonferroni correction for multiple comparisons (i.e., adjusted for 5 comparisons, P < 0.01). Between subjects (i.e., interindividual), the consistency of each seed set was highly similar across scans 1, 2, and 3 (Table 4).
Intraindividual consistency ranged from moderate to high (Table 3, Fig. S1b), whereas interindividual consistency for each seed set was lower. Permutation tests indicated that these levels of consistency were highly significant (intra- and interindividual consistency, for all sets, P < 0.0001).
Significant versus nonsignificant connections. Comparing intraindividual consistency of sets of statistically significant or nonsignificant correlations (Table 3), we found that significant correlations were significantly more consistent than nonsignificant correlations for all 3 seed sets (Wilcoxon signed rank test; P < 0.0001 for all sets; Fig. 4a).
Interindividual consistency for sets of significant correlations was moderate and was larger than the low consistency found for nonsignificant correlations for each scan and each seed set (Table 4, Fig. 4b).
Positive and negative connections. Restricting our analysis to significant correlations, we examined differences in consistency between positive and negative correlations (Table 3). Within subjects, we found that positive correlations were significantly more consistent than negative correlations for all 3 seed sets (Wilcoxon signed rank test; P < 0.0001 for all sets; Fig. 4a).
Intersession versus intrasession consistency. For seed Set A, intrasession consistency (within-subjects) was higher than intersession consistency for all connections, and for significant, nonsignificant and positive significant connections (Table 3). However, this difference was significant (Wilcoxon signed rank test; P < 0.01, adjusted for 5 comparisons) only for the comparison of positive significant connections. For seed sets B and C there were no differences in intra- and intersession consistency.
Group-level consistency. We also assessed the concordance of sets of correlations at the group level. Group-level correlation matrices were generated by averaging all possible z-transformed correlations across participants, for each seed set and each scan. These group-average z-transformed correlations were then reverse-transformed to obtain group-average r-values. Sets of group-level correlations exhibited high inter- and intrasession concordance (Intersession Kendall's W; Set A = 0.94; Set B = 0.98; Set C = 0.97; Intrasession Kendall's W; Set A = 0.92; Set B = 0.96; Set C = 0.97) as well as high multiscan concordance (Kendall's W: Set A = 0.91; Set B = 0.96; Set C = 0.96, see Fig. 5).
We tested the reproducibility of the default mode and task positive networks, 2 of the most commonly examined networks in the resting-state fcMRI literature. We also examined the reliability and consistency of correlations within and between these networks. ROIs for these 2 networks were derived from seed Set B, a task-based meta-analysis (Toro et al. 2008). In order to test the reproducibility of these functionally distinct networks, we used hierarchical clustering and compared the 2-cluster solutions that arose for each participant at each scan session.
Across all 3 scan sessions, the 2 clusters elicited through hierarchical clustering of each participant's correlation matrix were consistent with the fronto-parietal (task positive) and cingulo-parietal (default mode) clusters observed by Toro et al. (2008) (see Fig. S6). To quantify the consistency of a region's membership in a network, we recorded the proportion of participants for whom that region was assigned to the same cluster as in Toro et al. (2008) for each scan (“percent agreement”). We observed high degrees of membership agreement in both the task positive and the default mode networks (Table 5). Only one region was not consistently classified into either cluster: the right dorsolateral prefrontal cortex (DLPFC) seed (mean agreement across 3 scans = 55%).
We examined the relationship between a region's degree of connectivity (i.e., the number of significant correlations exhibited by a region, averaged across the 3 scans, see Table S2) and its mean network membership consistency (i.e., percent agreement, averaged across 3 scans, see Table 5). The degree of connectivity and consistency of network membership were strongly related (r = 0.78, P < 0.0001; see Fig. 6).
We also examined the reliability and consistency of significant correlations for connections within and between the 2 networks (Fig. 7). We examined connections 1) within the task positive network, 2) within the default mode network, or 3) between members of the task positive network and members of the default mode network. First, to assess reliability, we compared the multiscan ICCs for connections within the task positive network (mean multiscan ICC = 0.25 ± 0.18), within the default mode network (mean multiscan ICC = 0.32 ± 0.16), and for between-network connections (mean multiscan ICC = 0.19 ± 0.16; Fig. 7a). A Wilcoxon rank-sum test demonstrated that connections within the default mode network were significantly more reliable than connections within the task positive network or between the 2 networks (P < 0.0001 for both comparisons). Second, to assess consistency within and between subjects, we compared Kendall's W for connections within the task positive network (mean Kendall's W; within-subject = 0.61 ± 0.10; between-subject = 0.23 ± 0.01), within the default mode network (mean Kendall's W; within-subject = 0.67 ± 0.08; between-subject = 0.18 ± 0.01), and between the 2 networks (mean Kendall's W; within-subject = 0.49 ± 0.09; between-subject = 0.07 ± 0.02; Fig. 7b). A Wilcoxon signed rank test demonstrated that within-subject connections within the default mode network were significantly more consistent than connections within the task positive network (P < 0.05), and connections within either the default mode network or task positive network were significantly more consistent than connections between the 2 networks (P < 0.0001).
Finally, we assessed the consistency of cluster solutions computed on the basis of the group-level correlation matrices. The 2 clusters derived from hierarchical clustering of group-level correlation matrices of seed Set B were virtually identical to the cingulo-parietal (default mode) and fronto-parietal (task positive) clusters observed by Toro et al. (2008) (see Fig. S6). Indeed, across all 3 scans, all regions were consistently assigned to the appropriate cluster except for the DLPFC region that demonstrated inconsistency in the subject-level analysis. During scans 1 and 2, the DLPFC ROI was assigned to the cingulo-parietal network, whereas in the Toro et al. (2008) analyses and in scan 3, it was classified as a member of the fronto-parietal network.
We performed voxelwise multiple regression analyses to identify the networks of voxels positively and negatively correlated with each of 3 seeds selected from seed Set B (Toro et al. 2008): the PCC (−6 −58 28), SMA (−2 10 48), and IPS (26 −58 48). These 3 ROIs were selected because they represent core components of the commonly identified default mode and task positive networks and had the largest number of significant correlations with other regions within their respective networks (i.e., they were “hubs,” see Table S2).
Across scans, there was considerable overlap in the group-level Z statistic maps of positive and negative connectivity for each seed (Fig. 8d). For each network, voxelwise comparisons of regression coefficient Z statistics across scans (i.e., Scan 1 vs. Scan 2 and Scan 2 vs. Scan 3) also revealed a high and significant positive correlation (Fig. 10). The high degree of cross-scan stability in the patterns of positive and negative connectivity associated with each seed (i.e., Z statistic maps) is evident even at the individual level (see Figs S4 and S5, respectively, for 2 representative participants).
Table 7 lists the top 12 peaks of connectivity for the positive and negative networks associated with each seed, and the corresponding Z statistics and mean and maximum ICC (computed for a 10-mm-diameter sphere centered on the corresponding peak voxel). As Figure 8 shows, the group-level network for each seed (i.e., the pattern of functional connectivity associated with the PCC, SMA, and IPS seeds) demonstrated a substantial degree of test–retest reliability, as reflected in the large proportion of suprathreshold (Z > 2.3) voxels yielding ICC > 0.5 (see Table 7).
Figure 9 demonstrates that the proportion of suprathreshold voxels with ICC > 0.5 increases with increasing group-level Z statistic (i.e., for higher thresholds). Though inter- and intrasession reliability were significantly positively correlated (Fig. 11a), intrasession reliability was significantly greater than intersession reliability, for positive, negative and nonsignificant correlations (Wilcoxon signed rank test; P < 0.0001).
We calculated the intersession, intrasession, and multiscan Kendall's W, for each participant (Tables 8 and 9, Fig. 12). As in the ROI-based analysis, consistency of voxelwise fcMRI was assessed in terms of 1) intraindividual consistency (i.e., concordance of sets of correlations within subjects across scans) and 2) interindividual consistency (i.e., concordance of sets of correlations within scans across subjects).
The intraindividual consistency of voxelwise correlations across intersession scans 1 and 2 and intrasession scans 2 and 3 ranged from moderate to high (>0.45). The differences between intra- and intersession consistency for all correlations were not significant (Wilcoxon signed rank test) for any of the 3 networks we examined, following correction for 5 comparisons (i.e., P < 0.01).
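As an illustration of how Kendall's W summarizes concordance, the following minimal sketch treats each scan (or each subject) as a "rater" that ranks the same set of correlations. It uses the standard formula W = 12S / (m²(n³ − n)) without a tie correction; the function names and example values are hypothetical, and the code is not the implementation used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kendalls_w(rows):
    """Kendall's coefficient of concordance (no tie correction).

    rows: m lists (e.g., scans or subjects), each holding the same
          n correlations in the same order.
    """
    m, n = len(rows), len(rows[0])
    ranked = [average_ranks(row) for row in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(n)]
    mean_sum = m * (n + 1) / 2.0
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical correlations for 3 connections measured in 3 scans.
same_order = [[0.1, 0.4, 0.7], [0.2, 0.5, 0.9], [0.0, 0.3, 0.6]]
opposite = [[0.1, 0.4, 0.7], [0.7, 0.4, 0.1]]
print(kendalls_w(same_order), kendalls_w(opposite))
```

W depends only on rank order, so it can be high even when the absolute magnitude of the correlations differs between scans.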
Significant versus nonsignificant connections. Comparing intraindividual consistency of sets of statistically significant or nonsignificant correlations (Table 8), we found that significant correlations were significantly more reliable than nonsignificant correlations for all 3 seed sets (Wilcoxon signed rank test; P < 0.0001 for all seeds; Fig. 12a). Interindividual consistency for sets of significant correlations was moderate and was larger than the low consistency found for nonsignificant correlations (Table 9, Fig. 12b).
Positive and negative connections. Restricting our analysis to significant correlations, we examined differences in consistency between positive and negative correlations (Table 8). Within subjects, we found that positive correlations were significantly more reliable than negative correlations for all 3 seed regions (Wilcoxon signed rank test; P < 0.0001 for all sets; Fig. 12a). Between subjects, consistency for sets of positive correlations was low, as was consistency for sets of negative correlations (Table 9, Fig. 12b).
Intersession versus intrasession consistency. For all 3 seed ROIs, there were no significant differences between inter- and intrasession consistency (intraindividual) for all connections, and for significant, nonsignificant and positive significant connections (Wilcoxon signed rank test, P > 0.05; Table 8).
Group-level consistency. We also assessed the consistency of network correlations for the group-level correlation map associated with each seed ROI. Group-level correlations exhibited high inter- and intrasession concordance (intersession Kendall's W PCC: 0.95, IPS Right: 0.92, SMA: 0.93; intrasession Kendall's W PCC: 0.95, IPS Right: 0.93, SMA: 0.92) as well as high multiscan concordance (Kendall's W PCC: 0.94, IPS Right: 0.90, SMA: 0.90, see Fig. 10).
We quantified the test–retest reliability of the anticorrelation (i.e., negative correlation) between the default mode and task positive networks. These networks were defined, respectively, as those voxels exhibiting significant (group-level) positive (corresponding to the default mode network) and negative (the task positive network) correlations with the PCC in the combined (multiscan) analysis (depicted in Fig. 8c). Though long-term intersession reliability was low (ICC = 0.21), intra- and multiscan reliability of this anticorrelation was moderate (ICC > 0.4). Furthermore, the reliability of the anticorrelation increased with increasing Z statistic threshold values (though intersession reliability declined again after Z = 6, Fig. 13).
In the present study, we examined the test–retest reliability of measures of resting-state fcMRI, within a single scan session (short-term/intrasession), across 2 scan sessions separated by at least 5 months (long-term/intersession), and across all 3 scans (multiscan). Using several methods to quantify reliability, and using both seed-ROI–based and voxel-wise analytic approaches to quantify fcMRI, we observed that the test–retest reliability of resting-state fcMRI ranged from minimal to robust, depending on at least 3 factors. These include 1) statistical significance: significant correlations (i.e., at the group level) for a given scan exhibited greater test–retest reliability than those that were nonsignificant; 2) valence: significant positive correlations exhibited greater reliability than significant negative correlations; and 3) network membership: regions within the default mode network were more reliably correlated with one another than were regions within the task positive network. These findings provide an initial quantitative basis for continued use of resting-state fcMRI to identify the neural substrates of interindividual differences in behavioral traits or psychopathology, as a result of experimental manipulations (e.g., task, state, or pharmacological), or development.
Correlations that were statistically significant across participants for a given scan session (i.e., at the group level) exhibited the highest degree of test–retest reliability (ICC > 0.5). This was true for both the ROI-based and voxelwise analyses. Figure 9 demonstrates this most clearly: for the PCC and SMA seeds, over 50% of voxels that exhibited positive connectivity with the seed ROI at Z statistic thresholds greater than Z = 5 also demonstrated ICCs > 0.5. That percentage was even greater for higher thresholds and for intra- and multiscan reliability.
That the regions exhibiting statistically significant correlations also exhibit the highest degree of test–retest reliability should not be surprising. Nevertheless, this result bolsters the emerging overarching notion that measures of fcMRI reflect fundamental organizational properties of the brain. The correlations that were statistically significant and highly reliable in the present study were those typically observed to be coactive during task-based studies, or part of the same ICN in other resting state fcMRI studies. For example, we observed highly reliable (ICC > 0.5) correlations between regions of lateral PFC (e.g., frontal eye fields, DLPFC) and regions of the inferior parietal lobe (IPL) (see Table 2). Lateral frontal and lateral parietal cortices have been observed to be coactive in hundreds of task-based studies (Toro et al. 2008), and are commonly identified as part of the task positive network, observed in resting state fcMRI studies (Damoiseaux et al. 2006; Fox et al. 2005; Fransson 2005; Van Den Heuvel et al. 2008). Similarly, Figure 8 demonstrates high levels of ICC across multiple core regions (e.g., medial prefrontal cortex, medial temporal lobe, posterior cingulate, and lateral temporoparietal cortex) of the default mode network. Statistical significance of an fcMRI measure at the group level is interpreted as reflecting a meaningful functional relationship between regions (Friston 1994). The present results support such an interpretation by demonstrating that such significant correlations are also reliable across time.
Consistent with the suggestion that negative correlations may exhibit lower stability than positive correlations (Tian et al. 2007), we observed lower test–retest reliability for negative correlations in both region-based and voxelwise analyses. Negative correlations have been noted to occur between networks that appear to be functionally distinct (Fox et al. 2005; Fransson 2005; Kelly, Uddin, et al. 2008). The lower test–retest reliability exhibited by negative correlations suggests that relationships between functional networks are more dynamic than positive correlations within networks. Indeed, our results indicate that correlations between regions within either the default mode or task positive networks (within-network correlations) were significantly more reliable than the cross-network negative correlations (Fig. 7).
It is important to acknowledge that the ability to detect negative correlations is influenced by correction for the global BOLD signal, a common step in resting fcMRI studies. Although some studies question the use of global signal correction (Aguirre et al. 1998; Birn et al. 2008; Murphy et al., forthcoming) and the validity of negative correlations (Skudlarski et al. 2008), other studies suggest global signal correction as a reasonable alternative to direct measurement and subsequent removal of physiological cardiac and respiratory signals (Fox et al. 2005; Birn et al. 2006; Hampson et al. 2006). Two previous resting fcMRI studies found a reduction in the strength of negative, but not positive, correlations when global normalization was not performed, although the spatial pattern of both positive and negative correlations was retained with and without global normalization (Fransson 2005; Uddin et al. 2008).
In our study, negative correlations between ROIs and anticorrelations between networks exhibited several properties that would encourage confidence in their reliability. First, several significant negative correlations (e.g., between PCC and PFC, see Tables 2 and 7; between PCC and PFC and anterior cingulate cortex (ACC), Fig. 8) exhibited high reliability (ICC > 0.6). Second, when we plotted the ICC of the voxelwise anticorrelation between the default mode and task positive networks against a range of threshold levels (Fig. 13), we observed that the ICC of the anticorrelation generally increased with increasing threshold values. This suggests that reliability is highest for anticorrelations between voxels that are most strongly positively and negatively correlated with the PCC. Although further work aimed at understanding the impact of global signal correction on fcMRI measures, and the underlying neurophysiological basis of negative correlations, is clearly warranted, our findings support further examinations of interindividual differences in negative functional connectivity.
Of note, we found that the test–retest reliability of a voxel's negative connection to the PCC (a core component of the default mode network) is directly related to the reliability of the same voxel's positive connection with the SMA (a core node of the task positive network) (see Fig. S7). This finding suggests that the more reliably a voxel is a member of one network, the more reliably it is segregated from another (as indicated by the anticorrelation). In other words, the test–retest reliability of an anticorrelation is dependent on the reliability of positive connections within each of the 2 relevant networks. Recent work by our lab suggests that this observation is indicative of a more general property of ICNs and their anticorrelations. In a recent resting state study of dorsal ACC (dACC) functional connectivity in an adult Attention-Deficit Hyperactivity Disorder (ADHD) sample, the same precuneus region that exhibited ADHD-related decreases in negative connectivity with the dACC also exhibited decreases in positive connectivity with the ventromedial PFC, which is part of the same network as the precuneus (i.e., the default mode network). Thus a reduction in the integrity of the negative relationship between a default mode subregion (i.e., the precuneus) and its anticorrelated “task positive” network may be accompanied by decreases in its positive connectivity with other default mode components. Hence, we believe future work will continue to benefit from consideration of both positive and negative connectivity associated with a given region of interest.
We observed an impressively high degree of cross-session consistency for larger-scale patterns of correlations observed across individuals. Specifically, multiscan analyses using Kendall's W demonstrated that concordance was highest when both positive and negative correlations for each subject were compared across scans, rather than cross-scan comparisons that were limited to positive correlations. This was true for both the ROI-based (Tables 3 and 4; Fig. 4) and voxelwise analyses (Tables 8 and 9, Fig. 12). This observation further supports the value of taking into account patterns of negative functional connectivity in the brain, rather than limiting the scope of analyses to positive correlations.
Overall, the high consistency for the large-scale pattern of correlations suggests that, rather than focusing on a limited number of specific inter-regional connections, a promising avenue for future research using ROI-based approaches is the examination of interindividual differences in the broader pattern of functional connections (i.e., the entire correlation matrix).
Application of hierarchical clustering for network identification across the 3 scans provided direct evidence of the utility of examining the default mode (“task-negative”) network and its negatively correlated “task positive” counterpart, 2 of the most commonly studied networks in the current resting state literature (Fox and Raichle 2007). Group-level clustering analyses revealed near-identical assignment of regions to one or the other of these networks across sessions (see Fig. S6). Moreover, the pattern of assignment observed was identical (with only one exception, discussed below) to that revealed by the largest meta-analysis of patterns of task-evoked coactivation to date (Toro et al. 2008). Finally, when clustering analyses were performed for each participant individually, rather than at the group level, a similarly high degree of agreement was noted both across participants and across sessions (see Table 5).
When we examined the specific regions that were assigned to one network or the other with the greatest consistency (over 90%), we found that they correspond to “hubs” (i.e., those regions with the greatest number of significant correlations, see Fig. 6 and Table S2). Some of these same regions have previously been identified as hubs in other resting-state fcMRI studies (Achard et al. 2006; Fransson and Marrelec 2008; Hagmann et al. 2008), and form key components of regulatory systems in human brain (Dosenbach et al. 2006; Achard and Bullmore 2007; Fair et al. 2007; Sridharan et al. 2008). Furthermore, regions of the lateral and medial parietal lobes were recently identified as structural hubs on the basis of their structural connectivity profile, as identified with diffusion spectrum imaging (Hagmann et al. 2007; Hagmann et al. 2008), diffusion tensor imaging (Gong et al. 2008), and cortical thickness measures (He, Chen, et al. 2007; Chen et al. 2008). As we have already noted, these hubs correspond to those regions typically observed to be coactive during task-based studies (Toro et al. 2008), or as part of the same ICN in resting state fcMRI studies (Fox et al. 2005; Fransson 2005; Damoiseaux et al. 2006; Van Den Heuvel et al. 2008). The robustness with which these areas appear connected across multiple studies, and their high test–retest reliability, highlight their particular importance in the quest to identify interindividual or group differences in functional connectivity. Consistent with this suggestion, several recent studies have observed interindividual and group differences in connectivity in regions such as the precuneus and posterior cingulate cortex (Andrews-Hanna et al. 2007; Castellanos et al. 2008; Kelly, Di Martino, et al. 2008).
Correlations between regions within the task-negative (default mode) network exhibited significantly greater reliability than correlations between regions within the task positive network. This finding is consistent with the observation that the strongest spontaneous low-frequency fluctuations and the highest metabolic activity at rest are observed within the default mode network (Raichle et al. 2001; Zou et al. 2008). Similarly, task-related suppression of default mode network activity has been observed ubiquitously across diverse tasks (Shulman et al. 1997; Mazoyer et al. 2001; Fransson 2006). In contrast, the task positive network is likely composed of a number of distinct functional systems (Dosenbach et al. 2006; Seeley et al. 2007). This functional heterogeneity may contribute to the task positive network's lower overall reliability.
In our cluster analysis, we observed that the cluster-membership agreement of the DLPFC, a component of the task positive network, was particularly low (the agreement of its assignment with that of Toro et al. (2008) was ~55%, across all 3 scans). Furthermore, this DLPFC region was also the only region that at the group level failed to demonstrate the same pattern of cluster assignment observed by Toro et al. (2008), instead appearing as a member of the task-negative network. Voxelwise, ICA, and N-cut clustering approaches have found that this same region is separable from other regions in the task positive network (e.g., SMA and dACC) and that it is part of a distinct lateralized fronto-parietal network commonly observed in attentional and memory processing studies (Damoiseaux et al. 2006; Seeley et al. 2007; Van Den Heuvel et al. 2008). Furthermore, our DLPFC ROI is close to a region of PFC which, in 2 previous studies (Fox et al. 2006; He, Snyder, et al. 2007), appeared to belong to 2 different functional networks—the dorsal and ventral attention systems—and was hypothesized to serve as a locus of interaction between these 2 networks. Taken together, these observations support the idea that there may be a greater degree of functional independence between components of the task positive network, which may in turn result in lower reliability.
These observations may suggest that the task-negative/task-positive dichotomy simplifies the functional architecture of the brain, which is composed of a myriad of functional networks. Despite the dynamics within each of the 2 networks, in particular within the task-positive network, both of these superordinate networks demonstrate substantial coherence across participants and scans. Accordingly, we recommend attending to multiple levels of analysis that comprise both superordinate networks, such as the task-positive network, and subcomponents, such as the salience network (Seeley et al. 2007), depending on the questions being addressed.
The analysis of functional connectivity between sets of a priori ROIs represents one fruitful approach to the study of fcMRI (Achard et al. 2006; Dosenbach et al. 2007; Fair et al. 2007; Fransson and Marrelec 2008). An alternative model-based analysis is to examine seed-based connectivity on a voxelwise basis, which permits the examination of a wider possible range of functional connections. We examined the reliability of the patterns of positive and negative voxelwise connectivity associated with 3 seed regions, located in the PCC, right IPS, and SMA. These 3 ROIs were selected because they represent core components of the default mode and task positive networks, and because we found that they constituted hubs of connectivity (i.e., they were significantly correlated with a large number of other regions in their respective networks; see Table S2). Figure 8 illustrates the striking overlap between voxels exhibiting significant positive and negative correlations with each of the 3 seeds we examined, and those exhibiting the highest reliability (ICC > 0.5). Although the stability of the ICNs observed across multiple studies has been intuitive to many, the findings we have presented, most clearly illustrated by Figure 8, provide a solid basis for continued confidence in the utility of resting state fcMRI studies.
It is notable that intersession reliability (>5 months between scans) was somewhat lower than intrasession reliability (<1 h between scans). That intersession reliability was lower than intrasession reliability suggests that measures of fcMRI are dynamic, and may be subject to modulations related to an individual's current state, a suggestion that is consistent with previous findings (Harrison, Pujol, Ortiz, et al. 2008; Waites et al. 2005). On the other hand, we found that intra-, inter- and multiscan reliability increased with increasing Z statistic threshold values (Figs 9 and 13), suggesting that long-term reliability is highest for correlations between voxels that are most strongly positively or negatively correlated with the seed ROI.
Previous studies have demonstrated that even structural measures are subject to change over time, albeit as the result of the acquisition of a new skill (Draganski et al. 2004; Ilg et al. 2008). We have recently shown that measures of functional connectivity may provide an index of brain changes associated with typical development (Kelly, Di Martino, et al. 2008). The quantification of longitudinal changes in resting state fcMRI, and how they relate to underlying changes in brain structure represents a crucial next step in our understanding of the long-term stability of functional connectivity measures. Of course, a myriad of other factors relating to scanner characteristics could also have contributed to greater variation between fcMRI measures across several months by contrast to measures obtained within the same scan session. Given the substantial intersession intervals (up to 16 months), we interpret the corresponding reliability indices as a reasonable estimate of the lower bound of such long-term stability. Beyond this extended interval it is reasonable to assume that changes at the level of the neuronal architecture, associated with factors such as development, aging or learning, would be associated with changes in patterns of fcMRI, thus substantially reducing reliability.
Establishing the test-retest reliability of resting-state fcMRI measures is crucial to the interpretation and validation of studies examining interindividual and group differences in functional connectivity. Several recent studies observed differences in resting-state functional connectivity that were related to behavioral performance (Hampson et al. 2006; Kelly, Uddin, et al. 2008; Seeley et al. 2007) and clinical diagnosis (see Greicius 2008, for review). Other studies have observed differences in measures of fcMRI following pharmacological intervention (e.g., Achard and Bullmore 2007) and mood-induction (Harrison, Pujol, Ortiz, et al. 2008). Not only do our findings support the interpretation of these findings as reflecting meaningful interindividual and group differences, but they also highlight specific brain regions that appear to exhibit particularly stable interindividual differences. As discussed above, these regions have been identified as constituting the brain's structural and functional hubs, and include regions such as the precuneus and posterior cingulate cortex.
The consistency between the networks identified in previous functional and structural studies of the brain and those identified in the present study also provides further support for the idea that measures of resting state fcMRI reflect aspects of the intrinsic functional organization of the brain. In particular, our data suggest that the default mode and task positive networks are particularly robust, and therefore likely to provide a veridical reflection of underlying neural architecture.
A variety of analytical decisions were made in the present work, and represent parameters that vary widely across labs. Noteworthy examples include 1) the spherical definition of ROIs (radius or geometric form employed can vary), 2) use of parcellation units (the specific method of parcellation can vary, or the set of parcellation units adopted), 3) global signal correction (means of correction may vary), and 4) choice of spatial and temporal filtering bands. Similarly, studies vary in the specific imaging parameters adopted (field strength, TR, voxel size, scan duration). Although these factors can clearly impact measures of fcMRI, and merit further examination, we believe the general principles and findings demonstrated in the present work will generalize across the various approaches.
In the present study, we adopted the ICC to quantify reliability, primarily due to its widespread usage in a variety of literatures. ICC is not without limitations, however. Because it provides a ratio of within-subject to between-subject variability, for a measure to be reliable, within-subject variability must be low relative to between-subject variability. However, numerous studies suggest that measures of fcMRI are highly consistent across subjects, rendering between-subject variability low. Consequently, reliability as quantified with ICC may be low for some portion of functional connections that are highly stable, because both within- and between-subject variability are low. In recognition of this potential limitation, we employed additional measures of cross-scan consistency, such as Kendall's coefficient of concordance. These measures supported the high degree of consistency across participants and scans.
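The limitation described above can be made concrete with a small numerical sketch. It assumes one common one-way random-effects formulation, ICC = (MS_between − MS_within) / (MS_between + (k − 1) MS_within); the ICC variant actually used in the study is defined in its Methods and may differ. All data values are hypothetical.

```python
def icc_oneway(data):
    """One-way random-effects ICC for n subjects with k repeated measures each.

    data: list of per-subject lists, each holding k measurements
          (e.g., the same connection's correlation on k scans).
    """
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square: spread of subject means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square: spread of each subject's scans around their own mean.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Same scan-to-scan fluctuation (0.05) in both cases; only the
# between-subject spread differs.
varied_subjects = [[0.20, 0.25], [0.50, 0.45], [0.80, 0.85]]
similar_subjects = [[0.50, 0.55], [0.55, 0.50], [0.50, 0.55]]
icc_high = icc_oneway(varied_subjects)
icc_low = icc_oneway(similar_subjects)
print(round(icc_high, 3), round(icc_low, 3))
```

In the second case every subject's connection is just as stable across scans as in the first, yet the ICC is low because subjects barely differ from one another, which is the situation the paragraph above describes.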
Our assessment of the test–retest reliability of measures of resting state fcMRI has been necessarily selective in the specific measures obtained and regions examined. Nonetheless, by including ROIs from 3 different studies (Dosenbach et al. 2007; Kennedy et al. 1998; Makris et al. 1999; Toro et al. 2008), and examining correlations on a voxelwise level, we believe that we have provided an initial comprehensive assessment of the most commonly employed model-based fcMRI method in the field. Model-free analyses, such as independent component analysis (ICA), represent an alternative approach to identifying ICNs in resting state data. Although several groups have already examined the qualitative reproducibility of networks identified using ICA and other model-free approaches (Damoiseaux et al. 2006), future work will determine the test–retest reliability of these measures.
We acquired our functional data while participants rested with their eyes open. Previous studies have asserted that resting state data acquired under either eyes-open or eyes-closed conditions are highly similar (Fox et al. 2005; Fransson 2005), though this similarity has not been quantified. Thus, systematic quantitative differences may exist between these 2 resting state conditions. In particular, individuals in the eyes-closed state may fall asleep during resting state scans, which may affect fcMRI measures (Fukunaga et al. 2006, 2008). Future studies should aim to quantify the effects of state, such as sleep, on test–retest reliability.
Although each scan acquired for the present study comprised a single 6.5-min run of resting state data, significant variation exists across laboratories with respect to the duration and/or number of scans obtained. Such variation can clearly impact the reliability of resting state measures obtained, and should be considered in the design of any study. Supplementary analyses (see Supporting Information and Fig. S8 and Table S3) demonstrate the significant increase in reliability and consistency of functional connectivity following averaging across 2 resting state scans for each participant. Future work may focus on other factors to increase reliability and consistency, including more rigorous examinations of the effects of increasing the duration and/or number of scans included in a session.
The goal of our present study was to establish the stability of coherent BOLD fluctuations at rest. However, examining the dynamics of correlated activity is equally important in understanding the relevance of such functional correlations to behavior in both health and disease. Future research may seek to examine the dynamics of correlated spontaneous and task-related activity in ICNs. To date, this approach has been exemplified by studies utilizing psychophysiological interaction (PPI) analyses. First described by Friston et al. (1997) more than a decade ago, this approach examines task-related modulations of interregional functional connectivity. An alternative technique is to examine the modulation of ICNs by the presence of task demands and compare correlations within specific ICNs across rest and task conditions (Hampson et al. 2006; Vincent et al. 2006; Kelly, Uddin, et al. 2008). Studies employing this approach have, for example, demonstrated increases in the spatial extent and magnitude of correlations in the default mode network from rest to a moral dilemma task condition (Harrison, Pujol, López-Solà, et al. 2008). Given the importance of both resting state and task-related activity to our understanding of brain function, future work may build upon these studies and examine the impact and reliability of various task-based manipulations on functional connectivity. Similarly, future work may directly examine the relationship between the magnitude of functional connectivity between 2 regions at rest, and the magnitude of activity during task-performance.
In summary, our results represent the first quantitative evaluation of test–retest reliability of some of the most commonly used measures of resting state fcMRI. We observed that reliability ranges from minimal to robust, and identified several factors that appear to be strongly predictive of high degrees of stability within individuals across time. These results provide a foundation for continued examination of resting state fcMRI in typical and atypical populations. Our findings also further support the audacious hypothesis that ICNs, which are readily observed during resting state fMRI studies (as well as during task-based studies), reflect the fundamental self-organizing properties of brain.
Stavros S. Niarchos Foundation, the Leon Lowenstein Foundation, NARSAD (The Mental Health Research Association) grants to F.X.C.; and Linda and Richard Schaps, Jill and Bob Smith, and the Taubman Foundation gifts to F.X.C.
We wish to acknowledge the invaluable contribution made by Don Klein, who inspired the present work. We would also like to thank our volunteers, who endured several hours of scanning over the past 2 years. Conflict of Interest: None declared.