|Home | About | Journals | Submit | Contact Us | Français|
Functional connectivity analyses of resting-state fMRI data are rapidly emerging as highly efficient and powerful tools for in vivo mapping of functional networks in the brain, referred to as intrinsic connectivity networks (ICNs). Despite a burgeoning literature, researchers continue to struggle with the challenge of defining computationally efficient and reliable approaches for identifying and characterizing ICNs. Independent component analysis (ICA) has emerged as a powerful tool for exploring ICNs in both healthy and clinical populations. In particular, temporal concatenation group ICA (TC-GICA) coupled with a back-reconstruction step produces participant-level resting state functional connectivity (RSFC) maps for each group-level component. The present work systematically evaluated the test-retest reliability of TC-GICA derived RSFC measures over the short-term (< 45 minutes) and long-term (5 − 16 months). Additionally, to investigate the degree to which the components revealed by TC-GICA are detectable via single-session ICA, we investigated the reproducibility of TC-GICA findings. First, we found moderate-to-high short- and long-term test-retest reliability for ICNs derived by combining TC-GICA and dual regression. Exceptions to this finding were limited to physiological- and imaging-related artifacts. Second, our reproducibility analyses revealed notable limitations for template matching procedures to accurately detect TC-GICA based components at the individual scan level. Third, we found that TC-GICA component's reliability and reproducibility ranks are highly consistent. In summary, TC-GICA combined with dual regression is an effective and reliable approach to exploratory analyses of resting state fMRI data.
Functional connectivity of resting-state fMRI data is rapidly emerging as a highly efficient and powerful tools for in vivo mapping of neural circuitry in the human brain. Essentially, resting state functional connectivity (RSFC) approaches detect coherent patterns of low-frequency (< 0.1 Hz) fluctuations in the resting state BOLD signal, referred to as “intrinsic connectivity networks” (ICNs). Despite a burgeoning literature demonstrating the utility of RSFC approaches (Fox and Raichle, 2007), researchers continue to struggle with defining computationally efficient and reliable approaches for identifying and characterizing ICNs.
Inspired by a seminal demonstration of RSFC within the motor system (Biswal et al., 1995), seed-based correlation represents the dominant approach for studying RSFC. This method detects ICNs by identifying voxels whose timeseries significantly correlate with the mean timeseries of voxels within an a priori seed region of interest (ROI). In addition to generating highly detailed maps of complex functional systems (Margulies et al., 2007; Zhang et al., 2008; Di Martino et al., 2008; Roy et al., 2009; Nioche et al., 2009), seed-based correlation analyses are commonly used to identify abnormalities in ICNs related to psychopathology (Greicius, 2008; Broyd et al., 2009), as well as relationships between RSFC measures and individual differences in behavioral (Kelly et al., 2008; Fox et al., 2007) and trait characteristics (Di Martino et al., 2009). Overall, this mass-univariate approach has proven to be a powerful, efficient, and reliable tool for neuroimaging (Shehzad et al., 2009).
Seed-based analytic approaches are not without limitation, however. Most notably, the ICNs derived with seed-based correlation are highly dependent upon choice of a seed ROI. Further, the seed ROI timeseries includes indeterminate noise, in contrast to the idealized task design matrix used in task-based analyses. While several preprocessing strategies commonly are employed to remove noise (e.g., regressing out the global signal), such corrections can affect the data (e.g., inducing artifactual negative correlations) (Murphy et al., 2009). Finally, as a general linear model (GLM)-based univariate method, seed-based correlation disregards relationships among multiple voxels (i.e., it only analyzes the relationship between the seed timeseries and one other voxel at a time).
In light of these issues, neuroimaging researchers have adopted a multivariate signal processing method known as independent component analysis (ICA) to explore the spatial-temporal properties of resting state fMRI data (Kiviniemi, 2003; van de Ven et al., 2004; Beckmann et al., 2005). In theory, without any explicit a priori knowledge, ICA aims to separate spatially (spatial ICA: sICA) or temporally (temporal ICA: tICA) independent patterns from their linearly mixed BOLD signals via maximization of mutual independence among components (Stone et al., 1999). Generally, sICA is the more appropriate choice for analyzing resting state fMRI data given the small number of time points included in most fMRI datasets. Thus, we also used sICA in this paper. For simplicity, in the remainder of this paper, we refer to sICA as ICA.
ICA offers several potential advantages over seed-based correlation. First, as noted above, ICA is a multivariate, data-driven approach. It thus requires no a priori hypothesis or model of brain activity. Second, by taking account of multiple simultaneous voxel-voxel relationships, ICA detects interacting networks of regions, rather than the single region-dominant (i.e., seed ROI) networks produced by seed-based correlation. Finally, ICA is capable of extracting noise (e.g., scanner, physiological and motion artifacts) from the desired dataset. ICA-based denoising is fully data-driven, automatic (Thomas et al., 2002; Perlbarg et al., 2007; Tohka et al., 2008) and relatively unaffected by different temporal sampling rates (DeLuca et al., 2006), thus avoiding the need for a priori specification of noise that arises in seed-based correlation.
Despite its promise, the unconstrained nature of ICA raises a significant challenge for group-based analyses. If a separate ICA is conducted for each participant in a study, a researcher must first match the components between participants before being able to carry out group analyses (Wang and Peterson, 2008) − a task that is complicated by the unconstrained component order in ICA (Hyvarinen and Oja, 2000). Different numbers of components also can be extracted for different individuals, further complicating the challenge. Accordingly, some have advocated carrying out a single ICA analysis across an entire group of participants (see (Guo and Pagnoni, 2008) and (Calhoun et al., 2009) for reviews on various group ICA methods). This can be accomplished in a relatively efficient manner by carrying out an ICA on a single large dataset by temporally concatenating all individual datasets (Calhoun et al., 2001; Beckmann et al., 2005). This method is referred to as temporal concatenation group ICA (TC-GICA). Based upon group-level components identified by TC-GICA, individual participant components can be generated using approaches such as principal component analysis (PCA) back-projection (Calhoun et al., 2001), or more recently developed GLM dual regression approaches (Filippini et al., 2009; Beckmann et al., 2009). The functionally relevant components of all group-level components are referred to as ICNs.
The potential utility of group-based ICA approaches has been demonstrated by an increasing number of studies examining clinical populations. Clinical samples with Alzheimer's disease or dementia (Greicius et al., 2004; Seeley et al., 2009; Rombouts et al., 2009), schizophrenia (Jafri et al., 2008), depression (Greicius et al., 2007), epilepsy (Zhang et al., 2009), Huntington's disease (Wolf et al., 2008), and amyotrophic lateral sclerosis (Mohammadi et al., 2009) have demonstrated altered ICA-derived ICNs. Despite this promising start, it is important to note that several factors may reduce the reliability of ICA approaches for mapping RSFC. First, the resting state is inherently unconstrained and thus subject to variation related to a participant's state. Second, factors unrelated to the participant can introduce significant variability (e.g., inter-session differences in scanner performance). Third, no ideal means exist for assessing the optimal number of components. Finally, the initial values for ICA estimation are random, which introduces further variability into the findings (Himberg et al., 2004; Yang et al., 2008). In considering these factors, in order to provide biomarkers based on neural circuitry, the test-retest (TRT) reliability of ICA-derived ICNs must first be assessed.
Several groups have addressed the reliability or consistency of ICA. Damoiseaux et al. (2006) demonstrated 10 qualitatively consistent ICNs across subjects by applying tensor ICA to resting state fMRI data from two imaging sessions separated by 5 − 14 days. They found that (1) spatial patterns of the 10 ICNs from the two sessions were quite consistent by visual inspection, and (2) regions exhibiting the highest RSFC within a component tended to exhibit the least variation across 100 surrogate datasets created via bootstrapping. Recently, using TC-GICA with three initial PCA data reductions, Chen et al. (2008) demonstrated the consistency of ICA-derived ICNs across 5 sessions within 16 days. Two more recent studies also showed high reproducibility and inter-rater selection reliability of an ICA-derived “default mode” network (Meindl et al., 2009; Franco et al., 2009). The above four studies focused only on the spatial consistency of ICNs across occasions within approximately 2 weeks, and they did not quantitatively assess both the voxel-wise intra- and inter-session TRT reliability of each group-level component.
In the current paper, we address the short-term (< 1 hour) and long-term (> 5 months) TRT of group-level components. We conducted these analyses with a previously published resting state fMRI dataset of 26 participants scanned three times on two different occasions (Shehzad et al., 2009; Zuo et al., 2010). Specifically, we: (a) used the TC-GICA approach to derive group-level components across all participants and sessions, (b) used dual regression to back-reconstruct each group-level component for each of the 3 scans at the individual participant-level, and (c) calculated voxel-wise intra- and inter-session intraclass correlation (ICC) on the basis of these individual-level back-reconstructed components. Based on prior work (Shehzad et al., 2009), we expected to find moderate-to-high TRT reliability for the various ICNs detected except for those components reflecting physiological or scanner-related noise.
While the present work relied upon the group-based TC-GICA approach for detecting components, alternative ICA-based analytic strategies exist. In particular, many studies have employed individual participant based ICA approaches, combined with a template matching for a group analysis (Greicius et al., 2004, 2007; Seeley et al., 2009; Mohammadi et al., 2009). To date, no study has systematically examined the issue of how easily components identified by group ICA can be detected using individual ICA. Thus, a secondary goal of the present work was to assess the reproducibility of components detected by TC-GICA at the participant-level. Specifically, the reproducibility analysis measured the degree to which components identified using group ICA were present in the results of single-participant ICA. For each participant, we carried out an individual ICA for each of the 3 scans. Then, for each group-level component, a commonly employed template matching procedure (Garrity et al., 2007) was applied to identify the best matching component in each of the three ICA analyses. These best-matched individual-level components were used to evaluate the reproducibility.
Finally, we ranked the group-level components according to their TRT reliability and reproducibility. We hypothesized that functionally relevant ICNs would show high ranks in both TRT reliability and reproducibility. In contrast, we predicted that the components corresponding to various sources of physiological noise and scanner artifact would exhibit both low TRT reliability and reproducibility.
Twenty-six participants (mean age 20.5 ± 4.8 years, 11 males) were scanned three times. The data were used in our earlier studies to examine the TRT reliability of seed-based correlation measures (Shehzad et al., 2009; Zuo et al., 2010). The participants had no history of psychiatric or neurological illness, as confirmed by clinical assessment. Informed consent was obtained prior to participation. Data were collected according to protocols approved by the institutional review boards of New York University (NYU) and the NYU School of Medicine.
Three resting-state scans were obtained for each participant using a Siemens Allegra 3.0 Tesla scanner. Each scan consisted of 197 contiguous EPI functional volumes (TR = 2000 ms; TE = 25 ms; flip angle = 90°, 39 slices, matrix = 64 × 64; FOV = 192 mm; acquisition voxel size = 3 × 3 × 3 mm). Scans 2 and 3 were conducted in a single scan session, 45 minutes apart, and were 5-16 months (mean 11 ± 4) after scan 1. All individuals were asked to relax and remain still with their eyes open during the scan. For spatial normalization and localization, a high-resolution T1-weighted magnetization prepared gradient echo sequence was also obtained (MPRAGE, TR = 2500 ms; TE = 4.35 ms; TI = 900 ms; flip angle = 8°; 176 slices, FOV = 256mm).
Data preliminary processing was carried out using FMRIB Software Library (FSL: version 4.1): 1) slice timing correction for interleaved acquisitions using Sinc interpolation with a Hanning windowing kernel; 2) head motion correction via a robust and accurate intra-modal volume linear registration (mcflirt) (Jenkinson et al., 2002); 3) spatial smoothing with a 6mm FWHM Gaussian kernel; 4) 4D intensity normalization of all volumes by the same factor; 5) spatial normalization via estimating a B-spline basis nonlinear transformation from an individual functional space to MNI152 standard brain space (flirt+fnirt) (Andersson et al., 2007). After preprocessing, we excluded one participant from our subsequent analyses due to large amount of head motion in one of the three scans. The final data for this study thus consisted of 75 scans from 25 participants. These data are fully available for downloading via http://www.nitrc.org/projects/nyu_trt.
Figure 1A illustrates the overall analytic strategy:
We briefly review probabilistic ICA and relevant properties that are important for applications in resting state fMRI studies (Beckmann and Smith, 2004). The ICA decomposes a resting state fMRI dataset into a linear mixture of spatially independent components (ICs) plus Gaussian noise. It has been implemented as the multivariate exploratory linear optimized decomposition into independent components (melodic) toolbox in FSL. It has the following mathematical representation:
For resting state fMRI data, T is the number of time points, N is the number of voxels, and P is the number of ICs. These matrices can be further summarized and explained as follows: X = [x1, x2, · · · , xT]t, in which each row vector represents a resting state BOLD volume at a time point t; S = [s1, s2, · · · , sP]t, in which the row vectors represent each IC; A = [a1, a2, · · · , aT]t, in which each row includes the contributions of all P ICs to the resting state fMRI BOLD volume at the corresponding time point (thus each column constitutes a timeseries); E = [e1, e2, · · · , eT]t denotes the background multivariate Gaussian noise. Accordingly, this method decomposes a resting state fMRI BOLD volume into several spatially independent volumes (i.e., ICs) and relevant timeseries. Of note, explicit modeling of the background Gaussian noise effectively can reduce the noise-induced asymptotic bias of the ICA estimation (Beckmann et al., 2005; Cordes and Nandy, 2007). For a given number of ICs, the ICs can be solved by a maximum likelihood estimation using the FastICA algorithm (Hyvarinen and Oja, 2000; Beckmann and Smith, 2004). In the current work, these algorithms are not our focus and therefore their details are not explained here.
TC-GICA was used to generate group-level components across all participants and sessions (Beckmann et al., 2005). This approach consists of 3 fundamental steps: 1) Estimation of a mean covariance matrix: all 75 individual fMRI datasets are spatially concatenated in MNI152 standard space and used to estimate the mean covariance matrix; 2) PCA reduction of individual datasets: for a given number of ICs, the mean covariance matrix spans a common subspace for all fMRI data. All individual fMRI data were projected into this common subspace to reduce the individual fMRI data; and 3) Probabilistic ICA on temporally concatenated data: all reduced individual data were temporally concatenated and fed into the ICA algorithm(1). This procedure produces the final group-level components. Specifically, it is formularized as the following:
Here, is the dimension-reduced fMRI data from the i-th scan of the j-th participant and Aij, Eij are the relevant mixing matrix and the background noise matrix (1 ≤ i ≤ 3, 1 ≤ j ≤ 25); S includes the ICs shared by all 75 individual scans (i.e., group-level components). All components are standardized into Z-score maps by dividing the relevant component weight by the standard deviation of the background noise. Thus, the group-level component measures not only the raw component but also a signal-to-noise ratio (snr). Finally, a spatial mixture model is applied to the Z-score map to infer whether the voxels were significantly modulated by the associated timeseries (p > 0.5). In the current work, to examine large-scale spatial networks, we fixed the number of components to 20 and performed a 20-component melodic (Smith et al., 2009). In addition, we also used melodic to automatically estimate the number of components, and conducted a second melodic using the estimated number of components (42). Of note, we based the TC-GICA analysis on the entire dataset from the 25 participants across the three scans (a total of 75 scans). This approach was adopted to: 1) provide a realistic equivalent of population-based studies that tend to base their group-level analyses on the large dataset (controls and patients) rather than a single subset (Filippini et al., 2009; Rombouts et al., 2009; Jafri et al., 2008; Damoiseaux et al., 2008; Wolf et al., 2008; Calhoun et al., 2004), and 2) obtain the best possible ICA estimation, independent of any single session's noise.
In order to assess the TRT reliability of each group-level component, we used the dual regression approach to build individual-level DR components (Filippini et al., 2009; Beckmann et al., 2009). This method is based on the following GLM dual regression equations:
In equation (3), Xij represents the fMRI data from the i-th scan of the j-th participant. As in a previous study (Calhoun et al., 2004), the first part of equation (3) uses unthresholded group-level components S as the spatial predictors of the individual fMRI volumes and results in the regression matrix containing the relevant individual regression weights in the time domain (i.e., timeseries). These timeseries were then used as the temporal predictors for the individual fMRI timeseries in the second regression equation. The resulting regression matrix Sij contains regression weights for each of the components in the spatial domain, which serve as our measure of functional connectivity (i.e., the individual-level DR components). These individual-level DR components were subsequently used to evaluate the TRT reliability of the group-level components.
Supplementary analyses examined whether or not TRT reliability is impacted by the addition of variance normalization on the temporal predictors prior to the second-level regression. Variance normalization is commonly used to remove the potential impact of amplitude information on regression results, placing greater emphasis on the shape of the timeseries instead. The comparison of TRT reliability between the two approaches is presented in the supplementary materials: Part 2 - Impact of Several Factors on TRT Reliability.
As in our prior work (Shehzad et al., 2009; Zuo et al., 2010), we used intra-class correlation coefficients (ICC) to assess TRT reliability. Multiple variants of ICC exist, each with different advantages and limitations (Shrout and Fleiss, 1979). The specific form used here is:
In this equation, MSp is the between-participants mean square and MSe is the error mean square. The relevant statistical computation and interpretation can be found in Appendix A. Of note, this ICCc calculation does not penalize for systematic errors (e.g., differences in ICs between scans). Its usage is recommended for TRT situations in which systematic errors due to learning effects or fatigue effects may occur (e.g., over the duration of the scan session), which are not directly related to reliability (Rousson et al., 2002; Weir, 2005). We have also analyzed our data using a form that does penalize for systematic error (the form as noted in Appendix A) and supplied the results of TRT reliability analyses as Figure S4 (inter-session ICCu maps) and Figure S5 (relevant rank ordering using ICCu).
In the present work, we assessed TRT reliability as follows: 1) TC-GICA was first carried out across all participants and sessions (total 75 scans) to identify group-level components; 2) dual regression was used to reconstruct the individual-level DR components; and 3) at each voxel, intra-session TRT reliability was calculated as the ICCc between scan 2 and scan 3. Inter-session TRT reliability was computed as the ICCc between scan 1 and the mean of scans 2 and 3. Given the close temporal proximity of scans 2 and 3 (45 minutes apart) relative to scan 1 (5 − 16 moths), we averaged scans 2 and 3 together for our estimation of the long-term TRT reliability, rather than using a single scan from the second session (e.g., scan 2 or scan 3) (Zuo et al., 2010). This decision was made in order to obtain the best possible estimate for the second session data, and thus long-term TRT reliability.
We examined the extent to which group-level components detected via TC-GICA are reproducible via individual-level ICA. First, given the number of ICs, we performed an individual probabilistic ICA for each of the 75 scans (25 participants × 3 scans). Then, for each of the group-level components, we identified the closest match in each of the 75 individual-level ICA runs using a template matching procedure based upon the spatial correlation in equation (5). We selected Pearson spatial correlation as our matching metric because of its frequent usage in the literature (Garrity et al., 2007; Esposito et al., 2006; Harrison et al., 2008).
In (5), measures the degree to which a given individual-level component from the individual-level ICA on the j-th (j = 1, 2, · · · , 25) participant's i-th (i = 1, 2, 3) scan (i.e., ), matches the k-th template component . Before we calculated the matching score, we found the voxel with the maximum absolute Z-score, and multiplied the volume by the sign of the Z-score. This ensures that the voxel with the greatest absolute Z-score is positive. The matching score for each best-matched individual-level component can be used to measure the reproducibility of corresponding group-level component at the individual-level. For an easy understanding of such a template matching procedure, the process is illustrated in Figure 1B.
Of note, while using the template matching score for group-based analysis, a well-considered study must perform a very careful visual inspection of the matching results (e.g. Harrison et al., 2008). In the current study, in considering the issue of the stability of template matching, we repeated our template-matching procedure using individual-level DR component as the template (i.e., DR template matching) for a given group-level IC, rather than the overall group-level component map (i.e., TC-GICA template matching). The overlap ratio of individual-level components between the two template matching strategies was used to measure the stability of the template matching. Specifically, for each group-level component, the ratio of overlap of all individual-level components across the two template matching strategies was computed as the agreement between the two template matching procedures as defined in equation (6).
where is the m-th individual-level component matched by the k-th TC-GICA template, and is the m-th individual-level component matched by the k-th DR template of the m-th scan. # is the number of agreements between the two template matching procedures.
We used two ordering methods to rank the group-level components. First, for each group-level component, a histogram of the ICC values within the group-level component mask (i.e., a binary version of the thresholded group-level component) was used to demonstrate the ranges of TRT reliability across different components. We then took the first most frequent ICC value ICC(k,1) and the second most frequent ICC value ICC(k,2) from each ICC histogram. After this feature selection step, each TRT reliability map can be simplified to a bivariate sample dataset (ICC(k,1), ICC(k,2)) where k = 1, 2, · · · , P. These TRT reliability data then can be ranked using a method for sorting multivariate observations based on a minimal spanning tree (see Appendix B). The second index for ranking group-level components was the mean Pearsons correlation coefficient between the individual-level DR template and the best matched individual-level component (i.e., the reproducibility).
For each individual resting state fMRI dataset, we applied five algorithms to estimate the number of components automatically: Akaike information criterion (Akaike, 1973), Bayesian information criterion (Schwarz, 1978), minimum description length (Rissanen, 1978), probabilistic PCA (Minka, 2001) and entropy rate enhanced minimum description length (Li et al., 2007). The first four methods were performed using melodic in FSL. The last method was carried out in a Group ICA fMRI Toolbox (gift: version 1.3f). Finally the ICCc was applied to evaluate the TRT reliability of each method.
We carried out 20-component and 42-component TC-GICA. All TRT reliability and reproducibility analyses were performed equally for both ICA decompositions. In this section, we only present the results of the 20-component ICA decomposition. The findings of the 42-component decomposition are summarized in the supplementary materials: Part 1 - Reliability and Reproducibility for 42-Component MELODIC Analysis Combining Dual Regression.
Figure 2 depicts the 20 component maps generated by carrying out a lower order (number of components fixed to 20) TC-GICA across all participants and sessions. The component ordering for the figure was based on the relative ranking of the percentage of variance explained by each component. The components observed were highly consistent with those previously reported (van de Ven et al., 2004; Beckmann et al., 2005; DeLuca et al., 2006; Damoiseaux et al., 2006; Smith et al., 2009; Kiviniemi et al., 2009).
We sorted the 20 components into two broad classes - functionally relevant components (i.e., ICNs) and scanner/physiological artifactual components - based on visual inspection of each component's spatial profile (e.g., biological plausibility, comparability to patterns previously reported in ICA-based studies) and timeseries-based power spectrum profile (e.g., whether or not signals < 0.1Hz were prominent). We noted 4 components that appeared to be associated with artifactual sources: cerebrospinal fluid (IC01), white matter (IC03), head motion (IC05), and large vessels (IC16). These four components accounted for 39.4% of the total variance in the resting state fMRI data. Several functionally relevant components consistent with prior reports were also revealed in our results. Two components (IC04 and IC15) are involved in vision. IC09 combines visual and motor regions including the occipital pole, superior parietal cortex and precentral gyrus. IC13 includes brain regions such as the primary motor cortex and primary and association auditory cortices. Several components include regions related to various high-order brain functions: fronto-parietal networks corresponding to cognition and language functions (IC07 and IC19), medial-frontal including anterior cingulate and paracingulate associated with executive control (IC08) and three “default mode” networks (IC10, IC12 and IC14). We found six other components that are rarely reported or investigated systematically corresponding to the cerebellum (IC11 and IC18), a motor-striatal component (IC02), a ventromedial prefrontal component (IC17), a brainstem component (IC06), and a temporal-lobe component (IC20). Of note, we found several components that exhibit anti-correlation relationships between regions (IC04, IC08, IC14 and IC15). In particular, the executive and attentional network (IC08) and the “default mode” network (IC14) demonstrated prominent anti-correlation relationships (Figure S1).
We detected the classic “default mode” network, although in the form of three components that we interpret as sub-networks. The first is a medial-prefrontal subsystem (IC12), the second is a posterior cingulate/precuneus subsystem (IC10), and the third is a temporal subsystem (IC14). These three subsystems mainly overlap in the posterior cingulate cortex and medial prefrontal cortex (Figure S2). As we discuss below, the existence of three overlapping but differentiable sub-networks may account for some of the variations in the specific spatial distributions or functional specialization of the “default mode” network reported across ICA studies (Buckner et al., 2008; Harrison et al., 2008).
For each group-level component, the voxel-wise intra-session (Figure 3) and inter-session (Figure 4) TRT reliability maps were generated and summarized by calculating the first most frequent ICC value (i.e., mode; see Figure 5) in their histograms (Figure S3). Consistent with our predictions, most regions within noise components 1, 3 and 5 showed relatively low TRT reliability (modal ICC: 0.2 − 0.45) both within and between sessions. Noise component 16 (large vessels) exhibited moderate reliability within-session (modal ICC: 0.525), but lower between-session reliability (modal ICC: 0.375). This suggests that cardiac factors may be more stable within-session than over the longer term.
Also consistent with our predictions, components commonly associated with sensory, motor, higher order cognitive function and the “default-mode’ network all exhibited moderate to high TRT reliability both within (modal ICC: 0.5 − 0.65) and between (modal ICC: 0.45 − 0.65) sessions. The six remaining components showed relatively low reliability (modal within-session ICC: 0.275 − 0.4; between-session ICC: 0.2 − 0.375), with the exception of a cerebellar component (modal within-session ICC: 0.575; modal between-session ICC: 0.5).
Our reproducibility findings indicated marked variability across the 20 components with respect to their reproducibility at the individual scan level, regardless of the template matching approach employed (see Figure 6). Greatest reproducibility was noted for IC04 (i.e., medial visual network), whose mean matching score was 36% (for DR template matching) and 62% (for TC-GICA template matching) greater than the mean of the remaining components’ mean matching scores. The lowest reproducibility was noted for artifact components IC01, IC03 and IC20, as well as IC18 (cerebellum). The mean match score between the template and individual-level components was 34% greater across components when we employed the DR template matching, rather than TC-GICA template matching (paired t-tests: p < 10−11). However, the components selected by the two template matching approaches were frequently contradictory, except for IC04, which had the highest reproducibility. Figure 6 indicates the number of disagreements on the matching results between the two template matching procedures. This finding is particularly worrisome since the most commonly employed approach for template matching in the ICA literature is based on an overall group template matching (Mantini et al., 2009; Greicius et al., 2004, 2007; Damoiseaux et al., 2008).
Finally, we used a random effects model to construct a group statistical map for each of the 20 group-level components, using the best matched individual-level components across the 75 scans, for each of the two template matching procedures. A relatively high degree of concordance was noted between the group statistical maps produced using the two matching procedures, as well as with the TC-GICA group-level components (see Figure 7). This suggests that while template matching can produce numerous misidentifications of individual-level components, these misidentifications do not appreciably affect the resulting group statistical maps. This is a source of concern, as the concordance of group statistical maps can give the erroneous impression that template matching is effective, when this is frequently not the case.
Based on the histogram of the TRT reliability for each group-level component, we generated a minimal spanning tree to rank components. Figure 8 shows the group-level components ranked with the nodes denoting the minimal spanning tree. The fronto-parietal (IC19 and IC07), visual (IC04), “default mode” (IC10 and IC14), auditory-motor (IC13), executive control (IC08) and one cerebellum component (IC11) demonstrated the highest ranks for both intra- and inter-session TRT reliability. In contrast, the low reliability ranks corresponded to those noise or artifact components associated with white matter, head motion, cerebrospinal fluid, and presumed imaging artifacts. Additionally, the other cerebellum component (IC18), the motor-striatal component (IC02), the ventromedial prefrontal component (IC17), and the temporal-lobe component (IC20) all ranked low in TRT reliability. This lack of stability across scan sessions might be related to unknown imaging artifacts or other sources of variability.
The group-level components were also ranked by their reproducibility. Again, functionally relevant components (i.e., ICNs) ranked highly in reproducibility, while components apparently dominated by physiological noise and scanner artifact were less reproducible (Figure 9).
To assess component number estimation algorithms, we applied five estimation approaches to approximate the number of components in the resting state fMRI signals. Figure 10 illustrates the bar graphs of the numbers of components across the five algorithms. The probabilistic PCA-based Laplacian algorithm produced a mean of around 20 components (Scan1: 23.0±12.3; Scan2: 19.4±3.5; Scan3: 20.8±3.8). Built-in methods in melodic such as Akaike information criterion, Bayesian information criterion, and minimum description length produce smaller estimations than the probabilistic PCA method. The entropy rate enhanced minimum description length method generates similar estimates to the minimum description length method. The intra- and inter-session ICC values were calculated (Table S1). All five algorithms exhibit high short-term reliability (ICC > 0.74) and moderate to poor long-term reliability. Probabilistic PCA had the lowest intra-session and inter-session TRT reliability.
The present work evaluated the reliability and reproducibility of TC-GICA-derived measures of RSFC, yielding four main results. First, we found moderate-to-high short- and long-term TRT reliability for RSFC measures derived by combining TC-GICA and dual regression. Exceptions to this finding were limited to components corresponding to physiological- and imaging-related artifacts. Second, we found that the degree to which components detected using TC-GICA were reproducible at the level of a single-scan ICA varied across components but not sessions, with the medial visual component having the highest reproducibility. Importantly, our analyses revealed the potential for worrisome levels of inaccuracy in template matching procedures. This inaccuracy potential appeared to affect most components, but could easily be overlooked. Third, we found the reliability- and reproducibility-based rank orderings of the TC-GICA components were highly consistent, with functionally relevant ICNs ranking notably higher than noise components. Finally, we demonstrated that component estimation algorithms tend to exhibit high intra-session test-retest reliability, but low to moderate inter-session reliability, with poor concordance among algorithms.
The greater reliability of ICNs compared to noise components provides support for the notion that stable trait differences exist among individuals with respect to their ICN characteristics (Shehzad et al., 2009; Buckner et al., 2009; Smith et al., 2009; Damoiseaux et al., 2006). This statement is not inferring that there is no dynamic range for ICN characteristics − quite the contrary. Numerous studies have documented the impact of factors such as task demands (Esposito et al., 2006; Fransson, 2006; Hampson et al., 2006; Harrison et al., 2008; Kelly et al., 2008; Yan et al., 2009), arousal (Boly et al., 2008; Fukunaga et al., 2006), and medication status (Kelly et al., 2009b) on ICN properties. As such, we posit that ICNs in each individual (whether it be the spatial extent or strength of functional connectivity) exist at equilibrium, around which transient changes in their properties may occur in response to both external influences and state-related factors. Of note, several papers have argued that development (Fair et al., 2009; Kelly et al., 2009a) and learning (Albert et al., 2009) can produce long-term changes in these properties. It is worth noting that the most reliable ICNs were those related to fronto-parietal regions supporting attentional selection and encoding (IC19) (Milham et al., 2001), primary visual perception (IC15), and “default mode” function (IC14) (Gusnard et al., 2001). The stability of these ICNs may reflect the sustained reliance of the brain on these networks for moment-to-moment functions. Alternatively, it may suggest these networks have a more consistent response to the “resting state” fMRI environment.
Our reproducibility analyses raised significant concern about the usage of individual scan based ICA approaches and template matching. The two approaches to identifying group-level components at the individual scan-level exhibited a high degree of discordance with respect to the component selected as the best fit for an individual. Yet, across all scans, each of the template matching approaches managed to capture the gestalt of the targeted group-level component. This should caution against the suggested use of matching scores as a metric of ICN integrity, and consequently, as potentially promising clinical biomarkers (Greicius et al., 2004). In this regard, dual regression based approaches clearly engender a higher degree of confidence.
In the present work, the “default mode” network was divided into three sub-networks: the medial prefrontal sub-network, the posterior cingulate/precuneus sub-network and the medial temporal lobe sub-network. Each of the three sub-networks exhibited high test-retest reliability. This component-splitting phenomenon previously was thought to be a result of overestimating the model order, but recently was reported to reflect functional segregation or hierarchy within the “default mode” network (Buckner et al., 2008; Kiviniemi et al., 2009; Smith et al., 2009; Uddin et al., 2009; Harrison et al., 2008). Intriguingly, the three sub-networks we observed are well matched with the anatomical and functional theory of the “default mode” network proposed by Buckner et al. (2008). For example, the three sub-networks overlapped in the posterior cingulate cortex and medial prefrontal cortex, which are posited to be critical to interactions between the medial prefrontal subsystem and the medial temporal lobe subsystem (Buckner et al., 2008).
Our TC-GICA analyses revealed multiple ICNs that were accompanied by anti-phasic or “anti-correlated” functional networks. Since Fox et al. (2005) systematically demonstrated the anti-correlation between the “default mode” network and task-positive network, the anti-correlation has been suggested to reflect either a competitive mechanism or segregation of processing between functional systems (Fox et al., 2005; Kelly et al., 2008). Recently, concerns have arisen regarding the potential artifactual nature of findings of negative connectivity, resulting from global corrections procedures commonly employed in conjunction with seed-based approaches (Murphy et al., 2009; Fox et al., 2009; Weissenbacher et al., 2009). Here, TC-GICA generated a pattern of negative connectivity between the “default mode” network and an anterior cingulate-based network involving regions supporting executive control and attentional function (Long et al., 2008). Several recent theoretical models support the existence of anti-correlations (Honey et al., 2007; Steyn-Ross et al., 2009; Deco et al., 2009), suggesting that negative connectivity may be a true organizational characteristic of the resting state brain, with neural origins (Popa et al., 2009). Of note, our results suggest less prominent negative connectivity between the ”default mode” network and the dorsal visual attention stream (IC09), than prior work using seed-based approaches (Fox et al., 2006). It may be beneficial for future studies to investigate such inconsistencies.
Despite its successes, the present work has several limitations worth noting. First, usage of random initial values is a well-documented source of variability for ICA. Recent work by Himberg et al. (2004) has suggested that a potential solution to this problem is to carry out a series of ICA analyses using different initial values, and then combining the results via hierarchical clustering. The focus of the present work was on assessing the reliability and reproducibility for a single ICA run, rather than the aggregate result. In part, this decision was made to reflect the more common approach currently employed in the literature. However, future work may directly quantify the impact of random initial values on reliability and reproducibility, and in addition to look at how solutions such as that of Himberg et al. (2004) mediate it.
Second, our examination of TRT reliability focused on the combination of TC-GICA and dual regression, though does not speak to either step individually. In the first stage, dual regression can be applied to any template brain map, not just those derived from ICA. As such, future work may focus on the dual regression step specifically, using template images derived by alternative approaches, such as those recently developed by Smith et al. (2009) using the BrainMap database. Similarly, in the second stage, dual regression can use step-wise regression or univariate regression, not only multiple regression. Future work may examine the impact of different second-level regression strategies on TRT reliability. Finally, it would be worthwhile to explore the effects of ICA analyses using alternative approaches for obtaining individual RSFC maps following group analyses (e.g., PCA back-projection) (Calhoun et al., 2001). The test-retest data used in this study are publicly available via http://www.nitrc.org/projects/nyu_trt to facilitate the above or other examinations in future.
Third, the present work relied upon the template matching using a commonly employed measure of goodness of fit the spatial correlation coefficient. Unfortunately, to date, there is no standard approach to determining what an adequate match score is, and misclassification using this template matching approach is clearly possible for components with a high degree of similarity (e.g., “default mode” network components: IC10, IC12 and IC14 in Figure S2). In fact, our results actually showed that the template matching procedure could be highly affected by the template selection strategy. Alternative strategies exist to calculate the template matching fitness (Greicius et al., 2004, 2007), though they likely face the same issues we demonstrated. Future work may focus on the determination of an optimal matching procedure.
In summary, temporal concatenation group ICA (TC-GICA), combined with dual regression is an effective approach for exploratory analyses of resting state fMRI data, exhibiting moderate to high test-retest reliability for intrinsic connectivity networks. Additionally, while many components detected via TC-GICA are reproducible at the individual scan level, our findings raise cautions with respect to group-based analyses that rely on individual scan-based ICA approaches due to the challenges of component matching.
This study was supported, in part, by grants from NIMH (R01MH081218), and the Stavros S. Niarchos Foundation to F.X.C., and from the Leon Levy Foundation to M.P.M.; and by gifts from Linda and Richard Schaps, and Jill and Bob Smith to F.X.C. We desire to express our thanks to Dr. Bharat B. Biswal for helpful advice and Dr. Christian F. Beckmann for invaluable discussions on the dual regression method. Dr. Maarten Mennes helped to improve the readability of the manuscript.
To calculate the TRT reliability of RSFC for each voxel in each group-level component, for each voxel, we consider a random sample of n subjects with d repeated measurements of a continuous variable Y characterizing the RSFC. We denote Yij as the j-th measurement made on the i-th subject (for i = 1, · · · , n and j = 1, · · · , d). In the current situation, Yij denotes the RSFC of a component from the i-th participant's j-th measuring occasions. We apply a voxel-wise single-factor ANOVA on each component as the following decomposition of Yij:
where μ is a fixed parameter and pi, tj, eij are independent random effects normally distributed with mean 0 and variances , and . The term pi is the participant effect, whereas tj is the systematic error (i.e., the scanning occasion effect) and eij is the measurement error. One way to assess reliability of variable Y may use the concept of intraclass correlation (ICC) defined as
Obviously, the ICC has the desired property to characterize the test-retest reliability, i.e., becoming smaller when either or become larger. Also, if one does not consider systematic error when assessing reliability, one may use the following equation:
Solving the ANOVA model (7), we can estimate the sample version of ρu and ρc. Briefly, from data Yij, we compute three sums of squares:
The expectations of MSp = SSp/(n − 1), MSt = SSt/(d − 1) and MSe = SSe/((n − 1 )(d − 1)) are , and , respectively. We therefore obtain unbiased estimates of , and as
Then an estimate of ρu is
while an estimate of ρc is
Using the bivariate feature , k = 1, 2, · · · , P of each ICC histogram with 80 bins, we can generate a fully connected graph G = (V, E) with P nodes. A minimal spanning tree (MST) of the graph G is a tree subgraph that contains all nodes and whose edges have the least total distance. We use the Kruskal algorithm for formating the MST of G.
Algorithm 1 (Kruskal (1956)). Formation of a minimal spanning tree T from a connected graph with P nodes and with edge distances in E:
Based on the MST of G, we rank these nodes (i.e., components in the current study) by defining a starting node at an endpoint of a tree diameter and then proceed through the tree in such a way as to visit any shallow subtrees at a given node before proceeding to the deeper subtrees. In our situation, we choose to begin on the right (i.e., large TRT reliability) of the tree (Figure 8) and use the ordering rule to get the final ordering for these group-level components.