A genome-wide, whole brain approach to investigate genetic effects on neuroimaging phenotypes for identifying quantitative trait loci is described. The Alzheimer's Disease Neuroimaging Initiative 1.5 T MRI and genetic dataset was investigated using voxel-based morphometry (VBM) and FreeSurfer parcellation followed by genome-wide association studies (GWAS). One hundred forty-two measures of grey matter (GM) density, volume, and cortical thickness were extracted from baseline scans. GWAS, using PLINK, were performed on each phenotype using quality-controlled genotype and scan data including 530,992 of 620,903 single nucleotide polymorphisms (SNPs) and 733 of 818 participants (175 AD, 354 amnestic mild cognitive impairment, MCI, and 204 healthy controls, HC). Hierarchical clustering and heat maps were used to analyze the GWAS results and associations are reported at two significance thresholds (p<10−7 and p<10−6). As expected, SNPs in the APOE and TOMM40 genes were confirmed as markers strongly associated with multiple brain regions. Other top SNPs were proximal to the EPHA4, TP63 and NXPH1 genes. Detailed image analyses of rs6463843 (flanking NXPH1) revealed reduced global and regional GM density across diagnostic groups in TT relative to GG homozygotes. Interaction analysis indicated that AD patients homozygous for the T allele showed differential vulnerability to right hippocampal GM density loss. NXPH1 codes for a protein implicated in promotion of adhesion between dendrites and axons, a key factor in synaptic integrity, the loss of which is a hallmark of AD. A genome-wide, whole brain search strategy has the potential to reveal novel candidate genes and loci warranting further investigation and replication.
Recent advances in brain imaging and high throughput genotyping techniques enable new approaches to study the influence of genetic variation on brain structure and function (Bearden et al., 2007; Cannon et al., 2006; Glahn et al., 2007a; Meyer-Lindenberg and Weinberger, 2006; Potkin et al., 2009a). The NIH Alzheimer's Disease Neuroimaging Initiative (ADNI) is an ongoing 5-year public–private partnership to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), genetic factors such as single nucleotide polymorphisms (SNPs), other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). Given the availability of genome-wide SNP data and repeat structural and functional neuroimaging data as part of this initiative, ADNI provides a suitable data set for a large scale imaging genetics study. Using the ADNI baseline MRI data set, we present an imaging genetics framework that employs a whole genome and whole brain strategy to systematically evaluate genetic effects on brain imaging phenotypes for discovery of quantitative trait loci (QTLs).
Imaging genetics is an emergent transdisciplinary research field where the association between genetic variation and imaging measures as quantitative traits (QTs) or continuous phenotypes is evaluated. Imaging genetics studies have certain advantages over traditional case control studies. QT association studies have been shown to have increased statistical power and thus decreased sample size requirements (Potkin et al., 2009b). In addition, imaging phenotypes may be closer to the underlying biological etiology of the disease making it easier to identify underlying genes (e.g., Potkin et al., 2009a). Given these observations, the method proposed in this paper focuses on identifying strong associations between regional imaging phenotypes as QTs and SNP genotypes as QTLs and aims to provide guidance for refined statistical modeling and follow-up studies of candidate genes or loci.
SNPs and other types of polymorphisms in single genes such as APOE have been related to neuroimaging measures in both healthy controls and participants with brain disorders such as MCI and AD (e.g., Lind et al., 2006; Wishart et al., 2006). However, the analytic tools that relate a single gene to a few imaging measures are insufficient to provide insight into the multiple mechanisms and imaging manifestations of these complex diseases. Genome-wide association studies (GWAS) are increasingly performed (Balding, 2006; Hirschhorn and Daly, 2005; Purcell et al., 2007; Zondervan and Cardon, 2007), but effectively relating high throughput SNP data to large scale image data remains a challenging task. As pointed out by Glahn et al. (2007b), prior imaging genetics studies typically make significant reductions in one or both data types in order to complete analyses. For example, whole brain studies usually focus on a small number of genetic variables (e.g., Ahmad et al., 2006; Brun et al., in press; Filippini et al., 2009; Nichols and Inkster, 2009; Pezawas et al., 2004; Shen et al., 2007), while whole genome studies typically examine a limited number of imaging variables (e.g., Baranzini et al., 2009; Potkin et al., 2009a; Seshadri et al., 2007). This restriction of target genotypes and/or phenotypes greatly limits our capacity to identify important relationships.
To overcome this limitation, we present a whole genome and whole brain search strategy for discovering imaging genetics associations to guide further detailed analyses. In addition, we present the results from implementation of this technique, including the identification of new genetic loci potentially involved in hippocampal and global brain atrophy associated with MCI and AD. In the present study, a detailed set of regions of interest (ROIs) extracted using voxel-based morphometry (VBM) and FreeSurfer automated parcellation defined 142 imaging phenotypes from across the brain (Risacher et al., 2009). A separate GWAS analysis using PLINK software (Purcell et al., 2007) was completed for each of these 142 imaging phenotypes. Hierarchical clustering and heat maps (Eisen et al., 1998) were used to display and evaluate the association patterns between top SNPs and top imaging phenotypes for multiple statistical thresholds. Subsequent pattern analysis of these heat maps not only confirmed prior findings (e.g., APOE and TOMM40 SNPs were among the top ranked list) but also revealed novel QTLs which warranted further analyses. Two types of refined imaging genetics analysis were performed for one of the top SNPs (NXPH1, rs6463843), including a VBM analysis assessing global grey matter (GM) density and a regional analysis of target phenotypes. These focused analyses resulted in interesting imaging genetics findings about the target SNP, including an overall and regional decrease in GM density associated with TT genotype relative to the GG genotype with an increased vulnerability to this effect in AD participants.
Data used in the preparation of this article were obtained from the ADNI database (http://www.loni.ucla.edu/ADNI). The following data from 818 ADNI participants were downloaded from the ADNI database: all baseline 1.5 T MRI scans, the Illumina SNP genotyping data, demographic information, APOE genotype, and baseline diagnosis information. Two participants had genotypic data but no baseline MRI scans and were excluded from all analyses.
The ADNI was launched in 2004 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public–private partnership. The Principal Investigator of this initiative is Michael W. Weiner, M.D., VA Medical Center and University of California-San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations. Presently, more than 800 participants, aged 55 to 90 years, have been recruited from over 50 sites across the United States and Canada, including approximately 200 cognitively normal older individuals (i.e., healthy controls or HCs) to be followed for 3 years, 400 people with MCI to be followed for 3 years, and 200 people with early AD to be followed for 2 years. Baseline and longitudinal imaging data, including structural MRI scans on the full sample and PIB and FDG PET imaging on a subset, are collected every 6–12 months. Additional baseline and longitudinal data including other biological measures (i.e., cerebrospinal fluid (CSF) markers, APOE and full-genome genotyping via blood sample) and clinical assessments including neuropsychological testing and clinical examinations are also collected as part of this study. Written informed consent was obtained from all participants and the study was conducted with prior institutional review board approval. Further information about ADNI can be found in Jack et al. (2008) and Mueller et al. (2005a,b) and at www.adni-info.org.
Single nucleotide polymorphism (SNP) genotyping for more than 620,000 target SNPs was completed on all ADNI participants using the following protocol. Seven milliliters of blood was taken in EDTA-containing vacutainer tubes from all participants and genomic DNA was extracted using the QIAamp DNA Blood Maxi Kit (Qiagen, Inc., Valencia, CA) following the manufacturer's protocol. Lymphoblastoid cell lines were established by transforming B lymphocytes with Epstein-Barr virus as described by Neitzel (1986). Genomic DNA samples were analyzed on the Human610-Quad BeadChip (Illumina, Inc., San Diego, CA) according to the manufacturer's protocols (Infinium HD Assay; Super Protocol Guide; Rev. A, May 2008). Before initiation of the assay, 50 ng of genomic DNA from each sample was examined qualitatively on a 1% Tris–acetate–EDTA agarose gel to check for degradation. Degraded DNA samples were excluded from further analysis. Samples were quantitated in triplicate with PicoGreen® reagent (Invitrogen, Carlsbad, CA) and diluted to 50 ng/μl in Tris–EDTA buffer (10 mM Tris, 1 mM EDTA, pH 8.0). DNA (200 ng) was then denatured, neutralized, and amplified for 22 h at 37 °C (this is termed the MSA1 plate). The MSA1 plate was fragmented with FMS reagent (Illumina) at 37 °C for 1 h, precipitated with 2-propanol, and incubated at 4 °C for 30 min. The resulting blue precipitate was resuspended in RA1 reagent (Illumina) at 48 °C for 1 h. Samples were then denatured (95 °C for 20 min) and immediately hybridized onto the BeadChips at 48 °C for 20 h. The BeadChips were washed and subjected to single base extension and staining. Finally, the BeadChips were coated with XC4 reagent (Illumina), desiccated, and imaged on the BeadArray Reader (Illumina). The Illumina BeadStudio 3.2 software was used to generate SNP genotypes from bead intensity data. All SNP genotypes are publicly available for download at the ADNI website (http://www.loni.ucla.edu/ADNI).
Two widely employed automated MRI analysis techniques were used to process and extract brain-wide target MRI imaging phenotypes from all baseline scans of ADNI participants as previously described (Risacher et al., 2009). First, voxel-based morphometry (VBM; Ashburner and Friston, 2000; Good et al., 2001; Mechelli et al., 2005) was performed to define global grey matter (GM) density maps and extract local GM density values for 86 target regions (Table 1). Second, automated parcellation via FreeSurfer V4 (http://surfer.nmr.mgh.harvard.edu/) was conducted to define 56 volumetric and cortical thickness values (Table 2). All included ADNI participants had a minimum of two 1.5 T MP-RAGE scans at baseline following the ADNI MRI protocol (Jack et al., 2008). Each raw scan was independently processed using FreeSurfer and VBM.
For VBM analysis, SPM5 (http://www.fil.ion.ucl.ac.uk/spm/) was used to create an unmodulated normalized GM density map (1×1×1 mm voxel size, 10 mm FWHM Gaussian kernel for smoothing) in the MNI space for each scan as previously described (Risacher et al., 2009). A mean GM density map was created as an average of two independent smoothed, unmodulated normalized GM density maps for each participant using SPM5. The MarsBaR region of interest (ROI) toolbox (Brett et al., 2002; Tzourio-Mazoyer et al., 2002) as implemented in SPM5 was then used to extract a single mean GM density value for 86 target regions in MNI space (Table 1) to be used as target QTs for the imaging genetic analyses. In addition to the individual MarsBaR ROIs, larger target regions defined by combining the mean GM density value from a set of MarsBaR ROIs were used as imaging phenotypes. All individual and combined mean GM density values are referred to as VBM phenotypes; see Table 1 for a total list and explanation of the 86 VBM phenotypes.
For automated segmentation and parcellation, FreeSurfer V4 was employed to automatically label cortical and subcortical tissue classes using an atlas-based Bayesian segmentation procedure (Dale et al., 1999; Fischl and Dale, 2000; Fischl et al., 2002, 1999) and to extract target region volume and cortical thickness, as well as to extract total intracranial volume (ICV) for all participants. Extracted FreeSurfer values for two independently processed MP-RAGE images of the same participant were averaged to create a mean value for volumetric and cortical thickness measures for all target regions. Mean volumetric and cortical thickness measures extracted using automated parcellation are referred to as FreeSurfer phenotypes; see Table 2 for a total list of the 56 FreeSurfer phenotypes defined for selected target regions.
The APOE gene is an important target gene in AD research (Farrer et al., 1997). However, the two previously identified APOE SNPs important in AD susceptibility (rs429358, rs7412) were not available on the Illumina array. Therefore, we determined the genotypes of the two APOE SNPs (rs429358, rs7412) using the APOE ε2/ε3/ε4 status information from the ADNI clinical database for each participant.
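The derivation of the two SNP genotypes from ε2/ε3/ε4 status can be sketched as follows. This is a minimal illustration in Python, assuming the standard haplotype definitions (ε2 carries T at rs429358 and T at rs7412; ε3 carries T and C; ε4 carries C and C). The function name and the integer encoding of the APOE alleles are hypothetical and are not taken from the ADNI clinical database.

```python
# Alleles carried by each APOE isoform at (rs429358, rs7412).
# Standard haplotype definitions; the 2/3/4 integer keys are an assumed encoding.
APOE_HAPLOTYPES = {
    2: ("T", "T"),  # epsilon 2
    3: ("T", "C"),  # epsilon 3
    4: ("C", "C"),  # epsilon 4
}


def apoe_status_to_snps(allele_a, allele_b):
    """Return unphased genotypes at rs429358 and rs7412 for two APOE alleles."""
    hap_a = APOE_HAPLOTYPES[allele_a]
    hap_b = APOE_HAPLOTYPES[allele_b]
    return {
        # Sort the two alleles so that e.g. "TC" and "CT" are reported identically.
        "rs429358": "".join(sorted(hap_a[0] + hap_b[0])),
        "rs7412": "".join(sorted(hap_a[1] + hap_b[1])),
    }
```

For example, an ε3/ε4 participant is heterozygous at rs429358 and homozygous C at rs7412, so only the first SNP distinguishes ε3 from ε4.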
The original genotype data contained 620,903 markers, including 620,901 genomic markers on the Illumina chip plus 2 APOE SNPs whose values were obtained from the APOE status data. Only SNP markers were analyzed in this study. The following quality control (QC) steps were performed on these genotype data using the PLINK software package (http://pngu.mgh.harvard.edu/~purcell/plink/), release v1.06. SNPs were excluded from the imaging genetics analysis if they failed to meet any of the following criteria: (1) call rate per SNP ≥90%, (2) minor allele frequency (MAF) ≥5%, and (3) Hardy–Weinberg equilibrium test of p≥10−6 using healthy control (HC) subjects only. Participants were excluded from the analysis if any of the following criteria was not satisfied: (1) call rate per participant ≥90% (1 participant was excluded); (2) gender check (2 participants were excluded); and (3) identity check (3 sibling pairs were identified with PI_HAT over 0.5; one participant from each pair was randomly selected and excluded). Population stratification analysis suggested the advisability of restricting analyses to non-Hispanic Caucasians (79 participants were excluded from this report). After the QC procedure, 733 out of 818 participants and 530,992 out of 620,903 markers remained in the analysis and the overall genotyping rate for the remaining dataset was over 99.5%.
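The three per-SNP filters can be illustrated with a short Python sketch. It is not the PLINK implementation: genotypes are assumed to be coded as minor-allele counts (0, 1, 2, or None for a missing call), all function names are hypothetical, and the Hardy–Weinberg check uses a 1-df chi-square approximation for brevity, whereas PLINK's own test may differ. A SNP is retained when its Hardy–Weinberg p-value in controls is not below 10−6.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing calls (None marks a missing genotype)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (allele counts 0/1/2)."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_chi2_pvalue(genotypes):
    """Hardy-Weinberg p-value from a 1-df chi-square approximation (assumed test)."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(genotypes, hc_genotypes,
                  min_call=0.90, min_maf=0.05, min_hwe_p=1e-6):
    """Apply the three thresholds; the HWE test uses healthy controls only."""
    return (call_rate(genotypes) >= min_call
            and minor_allele_frequency(genotypes) >= min_maf
            and hwe_chi2_pvalue(hc_genotypes) >= min_hwe_p)
```

Passing the control genotypes separately mirrors the text, where the Hardy–Weinberg test is restricted to HC subjects while call rate and MAF are computed on the SNP as a whole.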
One hundred forty-two separate GWAS analyses on 142 selected imaging phenotypes (86 VBM phenotypes and 56 FreeSurfer phenotypes) were completed using the quality-controlled SNP data. All the imaging phenotypes were adjusted for baseline age, gender, education, handedness, and baseline intracranial volume (ICV) using the regression weights derived from the HC participants, prior to any of the GWAS analyses (Risacher et al., 2009). Using the PLINK software package (v1.06) with the quantitative trait association option, each GWAS analysis calculated the main effects of all SNPs on the target quantitative imaging phenotype. An additive SNP effect was assumed and the empirical p-values were based on the Wald statistic (Purcell et al., 2007). Right hippocampal GM density was selected for a detailed sample analysis of a target QT because it had the largest number of associations at p<10−6. A Manhattan plot and a quantile-quantile (Q–Q) plot were used to visualize GWAS results for the right hippocampal GM density. All association results surviving the significance threshold of p<10−6 were saved and prepared for additional pattern analysis.
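As a rough illustration of an additive quantitative trait test, the sketch below regresses a pre-adjusted phenotype on allele count (0, 1, or 2) and returns the slope with its Wald statistic (slope divided by its standard error). It is a simplified stand-in written in plain Python, not PLINK's code; the function name is hypothetical, and the conversion of the statistic to a p-value (from a t distribution with n−2 degrees of freedom) is omitted.

```python
import math


def additive_association(allele_counts, phenotype):
    """Least-squares slope of phenotype on allele count, with its Wald statistic."""
    n = len(allele_counts)
    mean_x = sum(allele_counts) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, phenotype))
    beta = sxy / sxx                      # change in phenotype per allele copy
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(allele_counts, phenotype))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / std_err           # (effect size, Wald statistic)
```

Under the additive assumption, heterozygotes are expected to fall midway between the two homozygote groups, which is why a single slope per SNP suffices. The sketch assumes the SNP is polymorphic in the sample and that the fit is not exact; otherwise the divisions are undefined.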
The sample employed in the GWAS analyses of FreeSurfer phenotypes included participants that passed the genotype QC procedure and FreeSurfer processing. The sample used in the GWAS analyses of VBM phenotypes included participants that passed the genotype QC procedure, FreeSurfer processing, and VBM processing. Demographic information, including baseline age, years of education, gender distribution, and handedness distribution, was compared between baseline diagnostic groups for each sample separately using one-way ANOVAs and chi-squared analyses as applicable in SPSS (version 16.0.1).
To expedite the review of GWAS results and data reduction for subsequent analyses, we employed heat map and hierarchical clustering approaches (Eisen et al., 1998; Levenstien et al., 2003; Sloan et al., submitted for publication) for visualizing associations between identified SNPs and their associated imaging phenotypes at various significance levels. Heat maps are colored images mapping given values (in this study, −log10(p) of the corresponding association) to coded colors. Generally, heat maps have dendrograms, representing hierarchical clustering results along both the x-axis and y-axis (in this study, x: imaging phenotypes, y: SNPs). R (v.2.9.0) (http://www.r-project.org/), an open source statistical computing package, was employed to create the heat maps. Hierarchical clustering was completed using Euclidean distance methods to define dissimilarity between two nodes and average of distances between all pairs of objects in two clusters to measure the distance between two clusters. On each heat map, significant associations between imaging phenotypes and SNPs were marked with an “x” to facilitate visual evaluation of the results. The color bar on the left side of the heat map encodes the chromosome IDs for the corresponding SNPs. In addition to the heat maps, a summary statistic detailing the number of significant associations at the p<10−6 level for each imaging phenotype and SNP was evaluated to help guide the refined analyses. In the present study, all imaging GWAS results are presented and analyzed using heat maps and summary statistics.
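The clustering rule described above (Euclidean dissimilarity between rows, average of all pairwise distances between two clusters) can be sketched in a few lines. The study used R for this step; the plain-Python version below is only an illustrative equivalent with hypothetical function names, operating on rows of −log10(p) values.

```python
import math


def neg_log10(p_value):
    """Heat map intensity for one association."""
    return -math.log10(p_value)


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    Clusters are tuples of row indices. The distance between two clusters is
    the mean Euclidean distance over all pairs of their members.
    """
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                dist = sum(euclidean(rows[a], rows[b]) for a, b in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The merge history is what a dendrogram draws: each entry joins two branches at a height equal to the recorded distance. Applying the function to the rows and then to the columns of the matrix gives the two orderings used along the heat map axes.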
An in-depth analysis was performed for one of the top SNPs selected by inspecting the heat maps and summary statistics. The refined analysis included two steps: (1) a global voxel-based analysis on the entire brain using VBM and (2) regional analyses of identified target phenotypes. We included both types of analyses as they provide complementary information relevant to assessing risk for AD or disease progression (Risacher et al., 2009; Saykin et al., 2006).
For global analyses, VBM was performed on a voxel-by-voxel basis using a general linear model (GLM) approach as implemented in SPM5. After identifying the SNP of interest, a two-way ANOVA assessing the effects of baseline diagnostic group and SNP genotype value was performed to compare the smoothed, unmodulated normalized GM maps to determine any significant effects of diagnosis, SNP genotype, and SNP-by-diagnosis interactions on global GM density between and within groups. Contrasts between genotypes were displayed with a significance threshold of p<0.01 corrected for multiple comparisons using a false discovery rate (FDR) technique when including the entire sample. For contrasts within a single diagnostic group, the p<0.01 (FDR) threshold was too stringent given the reduced power and no significant voxels were observed. Therefore, we used a slightly less stringent significance threshold of p<0.001 (uncorrected for multiple comparisons) when examining SNP effects within a diagnostic group, in order to evaluate the pattern of GM density associated with genotype. A minimum cluster size (k) of 27 voxels was required for significance in all comparisons and an explicit GM mask was used to restrict analyses to GM regions. Age, gender, education, handedness and baseline ICV were included as covariates in all analyses.
For ROI analyses, a two-way multivariate ANOVA in SPSS (version 16.0.1) was completed to determine the effect of baseline diagnosis and genotype on bilateral hippocampal and mean medial temporal lobar GM density. Similar to the VBM analysis, age, gender, education, handedness, and baseline ICV were included as covariates in all comparisons. Independent effects of baseline diagnosis and genotype, as well as the interaction effect of baseline diagnosis×genotype for each SNP, were assessed for selected imaging variables. All graphs were created using SigmaPlot (version 10.0).
After quality control of the genotyping data, including the exclusion of 79 participants to avoid potential population stratification confounds, 733 out of 818 ADNI participants remained in the present study. Among these 733 participants, 729 sets of scans were successful in FreeSurfer segmentation and parcellation and were included in GWAS analyses of FreeSurfer phenotypes (56 volumetric and cortical thickness values described in Table 2). Seven hundred fifteen participants had successful VBM processing and were used in GWAS analyses of VBM phenotypes (86 GM density values described in Table 1). Table 3 shows the demographic information of the samples analyzed for the FreeSurfer and VBM studies. In both samples, gender and education are significantly different (overall p<0.05) among baseline diagnostic groups (HC, MCI, AD). In the subsequent GWAS analyses, baseline age and gender, as well as education, handedness, and baseline ICV, are included as covariates.
For convenience, in this paper, an SNP is described by its rs number together with its respective gene (i.e., the closest gene, as annotated in Illumina's Human610-Quad SNP list). Shown in Fig. 1 are all the imaging genetics associations at a significance threshold of p<10−7 (a typical threshold for genome-wide significance), which are discovered by GWAS analysis of 142 imaging phenotypes (i.e., quantitative traits, or QTs).
At the p<10−7 significance level, 22 strong SNP-QT associations (see blocks labeled with “x” in Fig. 1) were identified in the GWAS analyses, and five SNPs were involved in these associations. As a well-established AD risk factor (Farrer et al., 1997), the APOE SNP rs429358 was confirmed to have multiple associations with both FreeSurfer QTs and VBM QTs, forming the most prominent imaging genetics pattern at the significance level of p<10−7. In addition, associations with multiple FreeSurfer QTs were identified for rs2075650 (TOMM40), supporting the recent finding of TOMM40 as a gene adjacent to APOE and an additional contributor to AD (Osherovich, 2009; Potkin et al., 2009a). Three additional SNPs were found to have strong associations with one or more VBM QTs: rs6463843 (NXPH1), rs4692256 (LOC391642), and rs10932886 (EPHA4). Further information about these SNPs is available in Table 4.
A number of imaging phenotypes were identified to have strong associations with target SNPs in the GWAS analyses, suggesting that these values may be sensitive QTs for imaging genetics studies of AD. As expected, both the left and right amygdalar and hippocampal regions were found to be strongly associated with rs429358 (APOE) using volumetric and GM density measures. In addition, rs2075650 (TOMM40) was significantly associated with bilateral hippocampal volume and left amygdalar volume. Additional imaging phenotypes found to be sensitive QTs include (a) volume measures from the right cerebral cortex and cerebral white matter, (b) cortical thickness measures from left and right inferior parietal gyri, and right middle temporal gyrus, and (c) GM density measures from the left middle orbital frontal gyrus, left precuneus, left superior frontal gyrus, and left and right mean frontal lobe regions (see MeanFrontal definition in Table 1).
Heat maps of clustered associations at a somewhat less stringent significance level (p<10−6) are shown in Fig. 2. As expected, more SNPs and QTs are involved. The top 10 SNPs and their respective genes ranked by the total number of significant QT associations at p<10−6 are shown in Table 4. With more SNPs and QTs available in the heat maps, interesting clustering patterns in both the imaging and genetics dimensions were revealed by examining the corresponding dendrograms (i.e., hierarchical clustering results). In the imaging dimension (x-axis), many pairs of left and right measures of the same structure were clustered together, supporting the symmetric relationship between these phenotypes and genetic variation. In addition, regional similarity was also detected including a prominent pattern of multiple orbital frontal measures clustered together in Fig. 2b. In the genomic dimension (y-axis), three SNPs from LOC391642 were grouped together in Fig. 2b, suggesting an increased likelihood of linkage disequilibrium (LD) effects.
Subsequent analyses focused on a target QT and a target SNP selected from heat maps in Fig. 2. Shown in Fig. 3 are the Manhattan and Q–Q plots of the GWAS for the target QT, right hippocampal GM density (RHippocampus in Fig. 2b). In the Q–Q plot, for most of the p-values, the observed p-values from GWAS are almost the same as the expected p-values from the null hypothesis. There was little or no evidence of systematic bias, which could be caused by factors such as a strong population substructure and genotyping artifacts. The p-values in the upper tail of the distribution do show a significant deviation suggesting strong associations between these SNPs and the QT.
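The comparison underlying a Q–Q plot can be made concrete with a small sketch: the observed p-values are sorted and each is paired with the value expected for its rank under the null hypothesis of uniformly distributed p-values. The plotting position rank/(n+1) used here is an assumption (other conventions exist, and the one used for Fig. 3 is not stated), and the function name is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale, smallest p first."""
    n = len(p_values)
    ordered = sorted(p_values)
    return [(-math.log10(rank / (n + 1)), -math.log10(p))
            for rank, p in enumerate(ordered, start=1)]
```

Points lying close to the diagonal indicate agreement with the null distribution, while points rising above it in the upper tail correspond to SNPs with stronger association than expected by chance.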
A target SNP, rs6463843 (NXPH1), was selected for detailed imaging analyses since it was the only SNP strongly associated with both left and right hippocampi other than rs429358 (APOE) and rs2075650 (TOMM40). The results of a two-way ANOVA using VBM to compare the effects of baseline diagnostic group and rs6463843 (NXPH1) genotype on global GM density are shown in Fig. 4. After evaluating hippocampal GM density group means for each diagnosis-genotype group, we chose to contrast GG vs. TT (GG>TT) using all participants (n=715; 166 AD (44 TT, 78 GT, 44 GG); 346 MCI (82 TT, 170 GT, 94 GG); 203 HC (35 TT, 105 GT, 63 GG)). As shown in Fig. 4a, TT participants had significantly reduced global GM density throughout the brain relative to GG participants (p<0.01 (FDR), k=27). Maximal differences between groups were found in a number of regions known to be associated with AD, including the medial temporal lobe (−36, −30, −17; T=5.20) and frontal (19, 56, −15; T=5.56), parietal (26, −59, 67; T=5.71) and temporal (−59, 2, −30; T=4.81) lobe cortical surfaces. In order to determine whether a particular diagnostic group was responsible for the effects seen in the full sample contrast of GG>TT, we evaluated the same comparison within each baseline diagnostic group (Fig. 4b; AD, MCI, HC). The pattern of significant voxels for GG>TT was largest in the AD group, with highly significant clusters in the right hippocampus (31, −26, −15; T=5.34), left medial temporal lobe (−25, −32, −7; T=4.37), and frontal lobe (−35, 49, −13; T=4.33). MCI and HC groups also showed significant voxels in the contrast of GG>TT, with maximum voxels found in the inferior frontal lobe (45, 25, −13; T=3.82) and middle frontal lobe (−25, 6, 62; T=4.58), respectively. The AD panel in Fig. 4b showed more prominent patterns, while the MCI and HC panels appeared less structured. This suggested a possible SNP-by-diagnosis interaction effect on brain structure, which is examined below at a more detailed level for several candidate imaging phenotypes. Furthermore, the inclusion of APOE genotype as a covariate did not significantly alter these effects (data not shown).
Based on the heat map and VBM results, four GM density measures were further evaluated as phenotypes for additional associations with rs6463843 (NXPH1). As shown in Fig. 5, expected baseline diagnostic differences in left (Fig. 5a; F(7,708)=79.4, p<0.001) and right (Fig. 5b; F(7,708)=78.4, p<0.001) hippocampal GM density, as well as left (Fig. 5c; F(7,708)=60.3, p<0.001) and right (Fig. 5d; F(7,708)=59.4, p<0.001) mean medial temporal lobe GM density were found. Pairwise comparisons indicated that AD participants had significantly reduced hippocampal and mean medial temporal lobe GM density relative to both MCI and HC participants (all p<0.001). MCI participants also showed significantly reduced GM density in all these regions relative to HCs (p<0.001). The main effect of genotype across all participants was also significant for left and right hippocampal GM density (left, F(7,708)=10.4; right, F(7,708)=9.9, both p<0.001) and left and right mean medial temporal lobe GM density (left, F(7,708)=7.9; right, F(7,708)=9.0, both p<0.001). Pairwise comparisons indicated significantly reduced left and right hippocampal and mean medial temporal lobe GM density in participants with a TT genotype relative to those with a GG genotype at the rs6463843 (NXPH1) SNP (p<0.01). In addition, participants with the TT genotype had significantly reduced left and right mean medial temporal lobe GM density relative to GT heterozygotes (p<0.01). The interaction effect of baseline diagnosis and rs6463843 genotype was also significant for right hippocampal GM density (p<0.05), but not for the other three regions, which suggested that AD patients with the TT genotype were particularly vulnerable to GM density loss in the right hippocampus.
Employing a whole genome and entire brain strategy, we presented an imaging genetics methodological framework for systematically identifying associations between genotypes and imaging phenotypes, and demonstrated the utility of this method using the ADNI cohort. Our imaging genetics method can be broadly summarized as the following four steps after quality control and preprocessing: (1) imaging phenotype definition, (2) GWAS of image phenotypes, (3) cluster and heat map analysis of imaging GWAS results, and (4) refined statistical modeling.
Eighty-six GM density ROI measures and 56 volume and cortical thickness ROI measures were extracted, using VBM and FreeSurfer methods respectively, and analyzed as image phenotypes in independent GWAS analyses. This approach is complementary to another recently proposed imaging genetics analysis method, voxel-wise GWAS (vGWAS) (Stein et al., submitted for publication). The vGWAS technique explores SNP associations with all voxels in the image space. Our study is ROI-based, analyzing fewer but anatomically meaningful imaging phenotypes, and thus requires fewer computational resources. In addition, we used multiple techniques to define imaging phenotypes. Among the top 5 SNPs identified as part of the present study (Table 4), rs10932886 (EPHA4), rs7610017 (TP63) and rs6463843 (NXPH1) are primarily associated with VBM QTs, rs2075650 (TOMM40) is associated with FreeSurfer QTs, and rs429358 (APOE) is associated with ROIs extracted using both techniques. These results suggest that the VBM and FreeSurfer QTs are not equally sensitive to the same genetic markers and consequently may provide complementary information. The VBM measures we employed are not modulated (Good et al., 2001) and therefore measure GM densities (Ashburner and Friston, 2000), which are different from the volume and thickness measures that FreeSurfer generates for analysis. The complementary nature of GM density, volumetric, and cortical thickness ROIs in assessing early AD, MCI, and pre-MCI samples is consistent with our recent findings examining ADNI baseline MRI data (Risacher et al., 2009) as well as an independent cohort (Saykin et al., 2006).
Following quality control of the genotyping data, genome-wide association studies were conducted on each of the 142 imaging phenotypes. The entire set of the GWAS analyses was performed and completed on a 112-node parallel computing environment within 20 min, suggesting an excellent potential for larger scale future extensions. One extension could be to investigate more sophisticated statistical models (e.g., exploring SNP-by-SNP or SNP-by-diagnosis interactions). Another extension could be to involve more imaging phenotypes from other imaging modalities or longitudinal data.
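As a rough illustration of running one association test per imaging phenotype, the following sketch regresses each quantitative trait on the allele count of a single SNP. It assumes an additive 0/1/2 genotype coding with no covariates and uses a normal approximation for the p-value; the PLINK configuration actually used in the study may differ. The function name, phenotype names, and all data values are hypothetical.

```python
import math

def additive_snp_test(genotypes, phenotype):
    """Simple linear regression of a quantitative trait on allele count.

    Returns (slope, t_statistic, approximate two-sided p-value).
    The p-value uses a normal approximation, which is only reasonable
    for large samples; it is an assumption of this sketch.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual sum of squares and standard error of the slope
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return slope, t_stat, p_value

# Toy data: allele counts for one SNP and two hypothetical ROI phenotypes
genotypes = [0, 1, 2, 0, 1, 2]
phenotypes = {
    "roi_a_gm_density": [1.0, 0.9, 0.6, 1.1, 0.8, 0.7],
    "roi_b_volume": [3.1, 3.0, 3.3, 2.9, 3.2, 3.0],
}

# One independent test per phenotype, mirroring the per-phenotype GWAS design
results = {name: additive_snp_test(genotypes, values)
           for name, values in phenotypes.items()}
```

Because each phenotype and each SNP is tested independently, work of this kind can be divided across compute nodes without coordination between jobs.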
Heat maps and hierarchical clustering have been used frequently for grouping results in gene expression analysis for pattern discovery (Eisen et al., 1998; Levenstien et al., 2003). In imaging genetics, heat maps can be equally useful for performing relevant pattern analysis tasks thanks to the rich information contained within the maps and their effective mechanism to organize and visualize complicated imaging GWAS results. A straightforward use of a heat map is to select target QTs, SNPs, or associations for further analyses. Due to its intuitive representation, some obvious patterns (e.g., the APOE SNP in Fig. 1) can be easily identified. For less obvious cases, other criteria could be used, for example, the selection of rs6463843 (NXPH1) because of its associations with multiple candidate phenotypic regions (i.e., hippocampus) affected by AD (Fig. 2b). In addition, a heat map can also be used to discover new patterns or structures. All the QTs and SNPs are hierarchically clustered as dendrograms on the x-axis and y-axis, respectively. In the genomic domain, for those SNP clusters that do not match the existing LD relationships, the dendrogram provides the ability to identify novel inter-SNP structures (e.g., Sloan et al., submitted for publication). In the imaging domain, for those phenotype clusters that do not follow a regional or bilaterally symmetric pattern, there might be an opportunity to identify an underlying brain connectivity pattern associated with a genetic variation.
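The row and column ordering behind such a heat map can be sketched as follows. This is a minimal example assuming a matrix of -log10(p) values with SNPs as rows and QTs as columns, clustered with average linkage on Euclidean distance; the linkage method, the QT names, and the randomly generated values are assumptions made for illustration and are not results from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# SNP identifiers are those named in the text; QT labels are hypothetical
snps = ["rs429358", "rs2075650", "rs10932886", "rs7610017", "rs6463843"]
qts = ["qt_1", "qt_2", "qt_3", "qt_4", "qt_5", "qt_6"]

# Synthetic -log10(p) values standing in for imaging GWAS results
rng = np.random.default_rng(0)
neg_log_p = rng.uniform(0.0, 8.0, size=(len(snps), len(qts)))

# Cluster SNPs (rows) and QTs (columns) separately, then read off leaf order
row_order = leaves_list(linkage(neg_log_p, method="average"))
col_order = leaves_list(linkage(neg_log_p.T, method="average"))

# Reorder the matrix so that similar SNPs and similar QTs sit next to each other
ordered = neg_log_p[np.ix_(row_order, col_order)]
ordered_snps = [snps[i] for i in row_order]
ordered_qts = [qts[j] for j in col_order]
```

The reordered matrix can then be passed to any plotting routine; the two linkage results are what would be drawn as the dendrograms on the y-axis (SNPs) and x-axis (QTs).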
In this paper, each heat map includes all the strong associations at a given significance threshold level, and can be used to guide further analyses using refined statistical models (e.g., involving diagnosis and other biomarkers, addressing interaction effects, etc.). These analyses can be performed using different strategies as follows: (1) select a target phenotype from the heat map and examine its whole genome mapping (e.g., Fig. 3); (2) pick a target SNP from a heat map and perform detailed image analysis (e.g., Fig. 4); and (3) choose a target SNP-QT association based on a heat map and/or imaging analysis results, and perform refined statistical modeling (e.g., Fig. 5). In this study, we conducted sample analyses for each of the above cases. The ultimate goal of these types of analyses is to identify genetic markers affecting brain structure and function, to determine how these imaging and genetic markers interact with each other, as well as with diagnosis and/or other clinically and biologically relevant measures, and to gain a better understanding of disease risk and pathophysiology.
The APOE SNP rs429358 and TOMM40 SNP rs2075650 were confirmed to be top markers affecting multiple brain structures in a mixed population of HC, MCI and AD (Farrer et al., 1997; Osherovich, 2009; Potkin et al., 2009a). Other SNPs, including rs10932886 (EPHA4), rs7610017 (TP63) and rs6463843 (NXPH1), were also among the top markers influencing brain structures in our analysis (Table 4). These SNPs and the genes in which they are found or flank have a number of important functions and potential pathways through which they may influence the pathophysiological processes underlying AD.
The EPHA4 [EPH receptor A4] gene belongs to the ephrin receptor subfamily of the protein-tyrosine kinase family (Fox et al., 1995). The interaction between neuronal EphA4 and glial ephrin-A3 was found to bidirectionally control synapse morphology and glial glutamate transport, which may ultimately regulate hippocampal function (Carmona et al., 2009). In addition, EphA4 and EphB2 receptors were reported to be reduced in the hippocampus before the development of impaired object recognition and spatial memory in transgenic mouse models of AD (Simon et al., 2009). The TP63 [Tumor protein 63] gene encodes a member of the p53 family of transcription factors (Yang et al., 1998). A literature search did not locate any articles associating TP63 with AD, cognitive impairment or neurodegeneration. Additional imaging genetics analyses on both rs10932886 (EPHA4) and rs7610017 (TP63) appear warranted for future study.
The NXPH1 [Neurexophilin 1] gene is a member of the neurexophilin family and encodes a secreted protein which features a variable N-terminal domain, a highly conserved, N-glycosylated central domain, a short linker region, and a cysteine-rich C-terminal domain. This protein forms a very tight complex with alpha neurexins, a group of proteins that promote adhesion between dendrites and axons (Missler and Sudhof, 1998). This gene has previously been implicated as a candidate gene for neuroticism (van den Oord et al., 2008). In the present study, a VBM analysis of rs6463843 (NXPH1) revealed significantly reduced global and regional GM density in participants with the TT genotype relative to those with the GG genotype. Additional analyses indicated an interaction between rs6463843 (NXPH1) and baseline diagnostic group in which AD patients homozygous for the T allele were differentially vulnerable to decreased GM density in the right hippocampus, a finding presumably reflecting greater atrophy associated with this genotype in patients with AD.
Heat maps of imaging genetics associations at two significance threshold levels (p<10−7 and p<10−6) were also reported. At the conventional p<10−7 significance threshold, measures of hippocampal and amygdalar GM density and volume were strongly associated with the APOE and TOMM40 SNPs. Ten additional imaging phenotypes were strongly associated with at least one of the top SNPs (Fig. 1). We also examined a somewhat less stringent threshold (p<10−6) in order to identify additional SNP and imaging QT associations, as well as to examine patterns of genotype and phenotype clustering. SNPs associated with multiple unrelated or loosely related imaging phenotypes may represent interesting genetic markers affecting overall brain structure or neurodegeneration. In addition, imaging variables associated with a number of SNPs from multiple genes may be particularly sensitive phenotypic markers for examining disease-associated genetic variation. Therefore, heat maps at multiple statistical thresholds are useful in identifying candidate SNPs and imaging phenotypes warranting further investigation.
The majority of analyses presented in this study focused on the extraction and evaluation of imaging phenotypes and the relationship of genetic variation to these phenotypes. However, we also included a limited assessment of the effects of baseline diagnostic group and the interaction effect of SNP and diagnosis in the analysis of candidate SNPs and phenotypes. Future studies could incorporate additional variables (e.g., clinical measures, other types of imaging and biomarkers) in the GWAS design to examine their effects and interactions with SNPs and/or target imaging phenotypes. The present analysis did not address epistasis or gene–gene interactions, a potentially very important topic. Future analyses should include models that incorporate epistatic interactions which are likely to be important for understanding susceptibility and protective factors in AD and other complex diseases.
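The SNP-by-diagnosis interaction effect mentioned above can be illustrated with an ordinary least squares fit that includes a product term. The sketch below assumes an additive 0/1/2 genotype coding and, for simplicity, a binary diagnosis indicator rather than the three diagnostic groups in the study. The data are synthetic and noise-free, generated from hypothetical coefficients so that the fit can be checked; none of the values are study results.

```python
import numpy as np

# Synthetic design: allele count and a binary diagnosis indicator (1 = AD)
genotype = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
is_ad = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1], dtype=float)

# Hypothetical coefficients: intercept, SNP, diagnosis, SNP-by-diagnosis
true_beta = np.array([1.0, -0.05, -0.20, -0.10])

# Design matrix with an explicit interaction (product) column
X = np.column_stack([np.ones_like(genotype), genotype, is_ad, genotype * is_ad])

# Noise-free synthetic phenotype (e.g., a regional GM density measure)
density = X @ true_beta

# Ordinary least squares estimate of all four coefficients
beta_hat, *_ = np.linalg.lstsq(X, density, rcond=None)
interaction_estimate = beta_hat[3]
```

A nonzero interaction coefficient means the per-allele effect on the phenotype differs between diagnostic groups, which is the pattern described for rs6463843 and right hippocampal GM density.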
Although we employed reasonably stringent thresholds for assessing genome-wide significance, the large number of ROIs analyzed presents a multiple comparison problem. The issue of determining the proper statistical threshold for a whole genome and whole brain search for associations is a challenging area for investigation (Nichols and Holmes, 2002; Nichols and Inkster, 2009; Stein et al., submitted for publication). The issue is complicated by the fact that variables within both the genomic and neuroimaging dimensions are non-independent due to LD and spatial autocorrelation, respectively. The determination of the effective number of independent statistical tests under these conditions is an area of ongoing investigation. Models for the joint distribution of both dimensions under the null hypothesis require development and validation.
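To give a sense of scale for this multiple comparison problem, the arithmetic below computes Bonferroni thresholds from the counts reported in this study (530,992 SNPs and 142 imaging phenotypes). The family-wise error rate of 0.05 is an assumption of this sketch, and Bonferroni correction treats all tests as independent, which does not hold here because of LD and spatial autocorrelation; the resulting values are therefore conservative bounds, not recommended thresholds.

```python
# Counts reported in the study
n_snps = 530_992        # SNPs passing quality control
n_phenotypes = 142      # imaging phenotypes (86 VBM + 56 FreeSurfer)

# Assumed family-wise error rate for this illustration
alpha = 0.05

# Correction within a single GWAS (one phenotype, all SNPs)
per_gwas_threshold = alpha / n_snps

# Correction across the full SNP-by-phenotype search
n_tests = n_snps * n_phenotypes
joint_threshold = alpha / n_tests

print(f"tests in full search: {n_tests:,}")
print(f"per-GWAS Bonferroni threshold: {per_gwas_threshold:.2e}")
print(f"joint Bonferroni threshold: {joint_threshold:.2e}")
```

Under these assumptions the per-GWAS bound falls just below 10^-7, close to the stricter of the two thresholds used in the heat maps, while the joint bound across all phenotypes is roughly two orders of magnitude smaller. An estimate of the effective number of independent tests would place the appropriate threshold somewhere above the joint bound.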
Replication of current and future GWAS results in independent samples will remain of critical importance for confirmation. Although our follow-up analyses examine additional statistics at a more detailed level for yielding additional insights, these statistics are non-independent of the statistics used to select candidate ROIs and candidate associations. Given the recent interest in the non-independent analysis issue (e.g., Kriegeskorte et al., 2009), independent datasets for replication will be important for future studies to confirm the findings. For the current ADNI sample, given its modest size, we were unable to use one half of the data for hypothesis generation and the other half for confirmation, since one half of the data (i.e., n=367 in this study) cannot provide sufficient power to detect moderate/small genetic effects (Potkin et al., 2009b). With additional replication and extension opportunities under development, we anticipate that there will be ample statistical power and the ability to replicate potentially important findings in multiple independent data sets in the future.
At present there are few opportunities for replication of imaging genetics results such as those emerging from ADNI given the unique nature of this multi-dimensional data set. Fortunately, a worldwide ADNI consortium is actively being developed and large scale international data sets are likely to become available in the next few years that can provide adequate replication samples. In addition, the new NIH sponsored AD Genetics Consortium (ADGC) is assembling large meta-analytic databases of GWAS results that can provide confirmation of novel findings. Finally, the AlzGene meta-analytic database (www.alzgene.org) of candidate genes for AD, curated by Lars Bertram and colleagues (Bertram et al., 2007), provides a regularly updated source for determining the replication and validation status of AD genes.
The AAL atlas (Tzourio-Mazoyer et al., 2002) used to create the ROIs for the VBM analysis in this study is based on a single individual. To take anatomical variability into account, an important future direction will be to employ a probabilistic atlas, e.g., the Harvard-Oxford atlas (distributed with the FSL software package; http://fsl.fmrib.ox.ac.uk/fsl/), or the LONI probabilistic brain atlas (Shattuck et al., 2008). The most appropriate method to derive a GM-based summary statistic (e.g., density or volume) for a probabilistic ROI is a topic warranting investigation.
Despite the limitations and challenges, the encouraging experimental results obtained using the proposed analytic framework appear to have substantial potential for enabling the discovery of imaging genetics associations and for localizing candidate imaging and genomic regions for refined statistical modeling and further characterization. Ultimately, imaging genetics holds the promise of providing important clues to pathophysiology that could inform development of methods for earlier detection and therapeutic intervention.
Data collection and sharing for this project were funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI; principal investigator: Michael Weiner; NIH grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and through generous contributions from the following: Pfizer Inc., Wyeth Research, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, Merck & Co. Inc., AstraZeneca AB, Novartis Pharmaceuticals Corporation, Alzheimer's Association, Eisai Global Clinical Development, Elan Corporation plc, Forest Laboratories, and the Institute for the Study of Aging, with participation from the U.S. Food and Drug Administration. Industry partnerships are coordinated through the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory of Neuro Imaging at the University of California, Los Angeles.
Data analysis was supported in part by the following grants from the National Institutes of Health: NIA R01 AG19771 to A.J.S., P30 AG10133 to Bernardino Ghetti, MD, and NIBIB R03 EB008674 to L.S.; by the Indiana Economic Development Corporation (IEDC 87884 to A.J.S.); by the Foundation for the NIH to A.J.S.; and by an Indiana CTSI award to L.S.
The FreeSurfer and PLINK analyses were performed on a 112-node parallel computing environment, called Quarry, at Indiana University. We thank the University Information Technology Services at Indiana University for their support.
We thank the following people for their contributions to the ADNI genotyping project: (1) genotyping at the Translational Genomics Institute, Phoenix AZ: Jennifer Webster, Jill D. Gerber, April N. Allen, and Jason J. Corneveaux; and (2) sample processing, storage and distribution at the NIA-sponsored National Cell Repository for Alzheimer's Disease: Kelley Faber.