|Home | About | Journals | Submit | Contact Us | Français|
Imaging traits provide a powerful and biologically relevant substrate to examine the influence of genetics on the brain. Interest in genome-wide, brain-wide search for influential genetic variants is growing, but has mainly focused on univariate, SNP-based association tests. Moving to gene-based multivariate statistics, we can test the combined effect of multiple genetic variants in a single test statistic. Multivariate models can reduce the number of statistical tests in gene-wide or genome-wide scans and may discover gene effects undetectable with SNP-based methods. Here we present a gene-based method for associating the joint effect of single nucleotide polymorphisms (SNPs) in 18,044 genes across 31,662 voxels of the whole brain in 731 elderly subjects (mean age: 75.56 ± 6.82SD years; 430 males) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Structural MRI scans were analyzed using tensor-based morphometry (TBM) to compute 3D maps of regional brain volume differences compared to an average template image based on healthy elderly subjects. Using the voxel-level volume difference values as the phenotype, we selected the most significantly associated gene (out of 18,044) at each voxel across the brain. No genes identified were significant after correction for multiple comparisons, but several known candidates were re-identified, as were other genes highly relevant to brain function. GAB2, which has been previously associated with late-onset AD, was identified as the top gene in this study, suggesting the validity of the approach. This multivariate, gene-based voxelwise association study offers a novel framework to detect genetic influences on the brain.
Recent efforts in imaging genetics have advanced the field rapidly from identifying heritable features of the brain to genome-wide searches for specific genetic variants that might account for functional and structural variations in large populations (Potkin et al., 2009a; Potkin et al., 2009b; Shen et al., 2010; Stein et al., 2010b; Thompson et al., 2010). Variation in the human genome may account for variations in brain integrity, and multi-national consortia have been set up to discover and verify genetic effects on brain images (e.g., the ENIGMA project; http://enigma.loni.ucla.edu). In imaging genomics, the vast amount of information in the images (>100,000 voxels) and across the genome (>12 million known variants) requires powerful methods to relate genetic variants to the structure and function of the brain. Power issues arise due to the small effect sizes, and the huge numbers of statistical comparisons. Most techniques use some type of data reduction, limiting the number of genetic variants studied or the number of imaging features studied, or both. The ultimate goal of these gene-hunting studies is to create a method that addresses the gene discovery problem in a statistically powerful and biologically meaningful way.
The current mainstay of gene-hunting efforts in imaging genetics is the genome-wide association study (GWAS). Most genetic association tests relate individual SNPs to phenotypes, but since there are, on average between 20 and 100 SNPS per gene (in our dataset), and alleles at these SNPs are often highly correlated, a method which tests all the SNPs in a gene at once (or most of the variance contributed by SNPS in a gene) would reduce the number of tests required and be more powerful. We will hereafter refer to SNP-Based approaches and gene-based approaches. This assesses associations between common SNPs and features in an image. In typical GWAS studies, each genetic variant (usually a SNP) is independently tested for its association to the phenotype – a mass univariate method, where no data reduction is used across the genome. For example, Stein et al. (2010b) performed a genome-wide search of around 500,000 SNPs, and found a novel variant in the GRIN2B gene that is associated with temporal lobe volume. The gene GRIN2B encodes a glutamate receptor that is already the target of drugs (memantine) used to treat Alzheimer’s disease (Parsons et al., 2007). Findings such as these are promising as they have biological relevance without relying on a prior hypothesis about any specific SNP. However, performing mass SNP-based tests on imaging summary measures (such as temporal lobe volume, hippocampal volume, etc.) or ad hoc regions of interest (ROI), collapses the brain measures into a single number. Studies using an ROI to define the imaging phenotype may miss fine-grained differences throughout the brain, across subjects. In addition, a predefined ROI can lead to false-negative results if a true association signal lies outside or only partially within a chosen ROI.
Several studies now perform genome-wide searches at each voxel across the brain (Hibar et al., 2010). This approach avoids pre-selecting an ad hoc region of interest in the brain and does not require prior hypotheses about which genetic variants, or which regions of interest, matter. Stein et al. (2010a) performed a genome-wide, brain-wide search, termed a voxelwise genome-wide association study (vGWAS), in 740 subjects from the ADNI. The experiment was extremely computationally intensive (27 hours on 500 nodes), performing around 16 trillion tests of association. However, the correction for multiple comparisons was commensurate with the number of tests performed. None of the variants identified was significant after multiple comparisons correction, but several variants were promising candidates for further analysis. In an alternative approach, Vounou et al. (2010) proposed a method that leverages the sparseness of signals to simultaneously select SNP variants and regions of association, reducing the number of SNPs and phenotypes tested. Future GWAS studies in imaging will likely reduce the number of tests and multiple comparisons using Bayesian priors. This can prioritize certain regions of the image or the genome, for later meta-analysis across multiple datasets.
Gene-based association methods complement single-marker GWAS for implicating underlying genetic variants in complex traits and diseases (Neale and Sham et al., 2004). Given recent advances in high-throughput genotyping, densely-packed sets of SNPs, or genetic markers, can capture increasing amounts of variation throughout the genome. Methods that consider combinations of SNPs from the same gene should better describe genetic associations than methods that rely on data from SNPs independently (Neale and Sham et al., 2004; Schaid et al., 2004). Whole-gene testing is a biologically plausible approach to the problem, as the ultimate unit of biological activity is the gene (or its protein product; Potkin et al., 2009c). By associating the joint effect of multiple SNPs within a gene, in this study we aimed to show that gene-based approaches can be more powerful than traditional SNP-based approaches (with the relative power depending on how the genetic variants affect the phenotype). For example, if a gene contains multiple causal variants with small individual effects, SNP-based methods will miss these associations if a very stringent significance threshold is used (as in GWAS). In addition, if multiple loci within a gene combine to jointly affect a phenotype, this may also be missed by traditional GWAS. These two scenarios are highly likely, especially if we accept the “common disease, common variant” hypothesis (Reich and Lander, 2001), but they are not accounted for in methods that test each SNP, one at a time.
A multi-SNP, gene-based test can consider the combined effect of each variant within the gene, while accounting for linkage disequilibrium (LD) or correlation between markers. As such, at least in theory it may detect associations missed by traditional SNP-based GWAS. Related to this approach is “multi-locus fitting” - a developing field in quantitative genetics, for the analysis of complex traits. Some multi-locus analyses use statistical methods specialized for handling high-dimensional data, including regularized regression methods such as ridge regression (Malo et al., 2008; Sun et al., 2009), the Bayesian lasso (Zou et al., 2006; Wu et al., 2009), and neural network models (Lucek et al., 1998; Ott et al., 2001). Another related approach is set-based association testing, implemented in the software Plink (Purcell et al., 2007), which allows for the combination of univariate test statistics into a single univariate test statistic using permutations. Gene-based tests also reduce the effective number of statistical tests by aggregating multiple SNP effects into a single test statistic. However, for gene-based tests to be feasible, the multivariate test statistics need to be computationally efficient to implement. Here we assessed whether it would be feasible to extend to a neuroimaging database, a gene-based association method using principal components regression (PCReg) as proposed by Wang and Abbott (2008) for single-valued traits. We applied PCReg across all genes, to a large database of voxelwise imaging data. We call our method a voxelwise “gene-wide” association study (vGeneWAS). By performing association tests on whole genes, we greatly reduce the number of tests (from 437,607 SNPs down to 18,044 genes) while avoiding the problems associated with focusing on ROIs or summary measures. Our framework shows how to conduct vGeneWAS studies, and identify gene variants that warrant further study.
We hypothesized that vGeneWAS would, in some situations, have greater power to detect associations than existing SNP-based methods. One such situation might be when a gene contains many loci with weak individual effects. In addition, we expected that vGeneWAS would have greater overall power than mass SNP-based methods, like vGWAS, because of the drastic reduction in the effective number of statistical tests performed.
ADNI is a large 5-year study initiated in 2003 as a public-private partnership between the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-pro t organizations. The ADNI study aims to identify and investigate biological markers of Alzheimer’s disease through a combination of neuroimaging, genetics, neuropsychological tests and other measures in order to develop new treatments, track disease progression, and lessen the time required for clinical trials. The study was conducted according to the Good Clinical Practice guidelines, the Declaration of Helsinki, and U.S. 21 CFR Part 50—Protection of Human Subjects, and Part 56—Institutional Review Boards. Written informed consent was obtained from all participants before protocol-speci c procedures were performed.
The study recruited 202 Alzheimer’s disease subjects (AD), 413 with mild cognitive impairment (MCI), and 237 normal elderly controls (NC) who were assessed every 6 or 12 months for three years. Subjects went through extensive clinical and cognitive tests at the time of each scan to determine and track diagnosis. Further information on inclusion criteria and the study protocol may be found online (http://adni-info.org/). Baseline structural MRI scans and genetic data for 818 subjects were obtained on or before May 5, 2010, from the public ADNI database (http://www.loni.ucla.edu/ADNI/). Scans for 852 subjects were available, but we excluded 121 subjects based on quality control measures (poor registration, image quality) and to avoid a well documented problem in statistical genetics known as population stratification (McCarthy et al., 2008). When performing association tests on latent subpopulations of different ethnicities or relatedness, spurious associations may arise due to differences in allele frequencies between groups, instead of true association with the phenotype (Lander and Schork, 2001). Subjects were removed based on self-reported ethnicity, later verified by multi-dimensional scaling (MDS) analysis (see previous study: Stein et al., 2010b), leaving 172 AD patients (78 women/94 men; mean age ± standard deviation=75.54 ± 7.62 years), 356 MCI subjects (130 women/226 men; mean age: 75.23±7.22), and 203 healthy elderly controls (93 women/110 men; mean age: 76.15 ± 4.99). We did not split the subjects by diagnosis for this analysis in order to exploit the broadest phenotypic continuum (Petersen et al., 2000) and maximize statistical power to detect genetic associations (Cannon and Keller, 2006).
Baseline MRI scans for each subject were analyzed using tensor-based morphometry (TBM) as described previously (Hua et al., 2008). Briefly, high-resolution T1-weighted structural brain MRI scans were acquired at 58 ADNI sites on 1.5T scanners with a protocol developed for multiple site consistency (Jack et al., 2008; ADNI also collected some data at 3T, which we did not analyze here; see Ho et al. 2010). Additional image corrections were applied to all images in a processing pipeline including: GradWarp correction of geometric distortion (Jovicich et al., 2006), B1-correction to adjust image intensity non-uniformity (Jack et al., 2008), N3 bias correction to adjust intensity inhomogeneity across a scan (Sled et al., 1998), and geometric scaling determined by a phantom scan acquired at each subject’s scanning session to tune scanner and session-specific calibration errors (Jack et al., 2008). Images were linearly aligned using a 9 parameter algorithm to the International Consortium for Brain Imaging template (ICBM-53; Mazziotta et al., 2001) to align brain positions to a common standard space, adjusting for global scaling.
The TBM analysis was performed following the protocol of our prior study, which showed clinical and cognitive test scores correlated with temporal lobe volumes, in a subset of the ADNI population (Hua et al., 2008). A minimum deformation template (MDT) was created based on a random subset of the healthy elderly controls at baseline. The MDT provides an unbiased representation of MRI scans expected from a group of average healthy elderly persons. We generated maps of localized volume difference for each subject compared to the MDT using an inverse-consistent, symmetric, intensity-based nonlinear warping algorithm (Leow et al., 2006). Maps of localized volume differences (called Jacobian maps) are estimated using the Jacobian determinant of the deformation matrix, which itself is a voxel-level estimate of volume excess or deficit compared to the MDT. Jacobian maps for each individual were then down-sampled using trilinear interpolation to a 4 × 4 × 4 mm3 voxel size to reduce computational burden. The value at each voxel in the Jacobian map represents a percentage volume difference compared to the MDT; we used this voxel-based measure of volume difference as the phenotype for genetic association tests.
Genome-wide genotype data were collected at 620,901 markers on the Human610-Quad BeadChip (Illumina, Inc., San Diego, CA). For details on how genetic data were processed, please see Saykin et al., 2010, and Stein et al., 2010a. Different types of markers were genotyped (including copy number probes), but only SNPs were used in this analysis. Several SNPs were excluded from the analysis based on standard filtering criteria, measures used in many other GWAS studies (Wellcome Trust Case Control Consortium, 2007): call rate <95% (42,670 SNPs removed), significant deviation from Hardy-Weinberg equilibrium P < 5.7×10−7 (871 markers removed), autosomal chromosomes only (10,686 SNPs removed), and an Illumina GenCall quality control score of <0.15 to eliminate “no call” genotypes (variable number of missing genotypes across subjects). We chose to remove SNPs with a minor allele frequency <0.10 (161,354 SNPs removed) based on our sample size. With our sample of 731 images, we are underpowered to detect associations with SNPs where the minor allele frequency is less than 10%, unless effect sizes are large (Wang et al., 2005; Flint et al., 2010). In addition, excluding SNPs with low minor allele frequencies avoids the risk of finding significant associations where only a small subset of subjects have the rare allele type and do not represent an accurate sampling of the phenotype of interest. If a very low minor allele frequency cutoff is used (e.g., MAF < 0.01) in samples of fewer than a thousand subjects, this may result in cases where an association is driven by a single subject. Clearly, such a result may be unreliable and is unlikely to replicate, so the higher MAF cut-off guards against this.
Due to the filtering based on Illumina GenCall quality control measures, individual subjects have some residual missing genotypes at random SNPs throughout the dataset. Because PCReg requires data without missing genotypes and to maximize the number of subjects included in the analysis, we performed imputation using the software, Mach (version 1.0), to infer the haplotype phase and automatically impute the missing genotype data (Li et al., 2009). After all rounds of quality control and preparation, 437,607 SNPs remained.
Using the retrieval interface of the software package PLINK (version 1.05; http://pngu.mgh.harvard.edu/~purcell/plink/), SNP annotations were made by continuously soliciting the TAMAL database (Hemminger et al., 2006) based chiefly on UCSC genome browser files (Hinrichs et al., 2006), HapMap (Altshuler et al., 2005), and dbSNP (Wheeler et al., 2006). The newly annotated SNPs were grouped by gene, where “gene” is defined by the gene transcript region including both introns and exons. We chose not to include SNPs upstream/downstream from the gene region. This may miss SNPs in promoter or regulatory regions for a gene, but avoids choosing an arbitrary window that may select regulatory SNPs for some genes, but not for other genes whose regulatory regions lie beyond the window length. SNPs that were not located in a gene were excluded (224,057 SNPs removed). All splice variants were considered as belonging to the same gene. After applying SNP filtering criteria, SNP annotation, and gene grouping, 18,044 genes were left for analysis out of the estimated 20,000–25,000 protein coding genes in the human genome (International Human Genome Sequencing Consortium, 2004).
Independent tests of statistical association with imaging measures were performed for each gene at 31,622 voxels within a whole-brain mask of the MDT across 731 subjects. To test the joint effect of all SNPs in a gene on the volume difference at each voxel, we employed a multiple partial-F test. A multiple partial-F test works by first estimating the fit of a “reduced model” of any number of nuisance variables on a given dependent variable and then estimating the fit of a second “full model” with the nuisance variables and any number of independent variables on the same dependent variable. Each association test results in an F statistic, which represents the joint effect of the independent variables on the dependent variable, controlling for the effects of nuisance variables already in the model. The multiple partial-F statistic was calculated for each gene at each voxel using equation 1 below. k is df(full)-df(reduced) and RSS is the residual sum of squares:
Multiple partial-F tests are well suited for testing the effects of multiple predictors on a given phenotype, but genetic data sometimes complicate testing because SNPs in the same gene are often correlated due to high LD. When the SNP values in a cohort of subjects are treated as a vector (whose components are the SNP value in each subject coded in an additive manner: 0, 1, or 2), then the adjacent SNPs can make different subjects’ vectors collinear. The dependence among these almost collinear SNP vectors in the multiple partial-F test model can lead to improper signs of beta coefficient estimates, wildly inaccurate magnitudes of beta coefficients, large standard error estimates (Kleinbaum et al., 2008), and false inferences. The reason this occurs is that standard regression models require the inversion of a set of “normal” equations, and when predictors (here SNP vectors) are highly correlated, the equations are not of full rank. This leads to unstable or unreliable solutions. One way out of this predicament is to use a type of regularized or penalized regression, such as ridge regression (also known as sparse regression or Tikhonov regularization), which can be used when there are high correlations among the predictors. Alternatively, dimension reduction may be performed (which we do here), to create a set of predictors that explain the variance in the data but that are no longer correlated.
To avoid the complications of collinearity in the statistical model, we first performed principal component analysis (PCA) on the SNPs within each gene, storing all of the orthonormal basis vectors of the SNP matrix that explained the first 95% of the variance in the set of SNPs. Basis vectors with the highest eigenvalues (higher proportions of explained variance) were included until 95% of the SNP variance was explained, and the rest were discarded. These new “eigenSNPs” approximate the information in the observed SNPs, but lack the collinearity that disrupts the multiple partial-F test models. By first performing PCA followed by a multiple partial-F test, our method may be considered a variant of PCReg and produces F statistics equivalent to those proposed in Wang and Abbott (2008). In this study, the independent variables built into the multiple partial-F test full model were the column vector output from PCA performed on each gene with age and sex as covariates. In this way, we tested the joint predictive effect of variation throughout a gene on brain volume variations on a voxel-by-voxel level.
The total number of tests of association for vGeneWAS is very high (18,044 genes × 31,662 voxels). Because of the massive processing requirement, we coded the PCA and multiple partial-F test steps of PCReg using the R statistical package (version 2.9.2; http://cran.r-project.org/) using the doMC “multi-core” package (version 1.2.1; http://www.revolutionanalytics.com/) to split processing over multiple cores in a single CPU. Processing was parallelized over a cluster of 10 high performance 8-core CPU nodes using the Laboratory of Neuro Imaging (LONI) Pipeline (http://pipeline.loni.ucla.edu/). For further data reduction, we only saved data on the gene with the lowest P-value at each voxel. This is comparable to our prior work using voxelwise testing of all 500,000 genotyped SNPs, where only the SNP with the lowest P-value was retained at each voxel (Stein et al., 2010a). The total time required to complete an analysis was approximately 13 days.
To examine the situations where PCReg exhibits better (or worse) performance than traditional simple linear regression, we compared the two methods directly on real genetic data. Performing tests on real genetic data as opposed to simulated data is important because the power of each method depends upon the underlying LD structure. Generating simulated data that mimics a chosen LD structure can be just as biased as selecting actual genes, though a significant treatment of the issue of power in PCReg is discussed in Wang and Abbott (2008). We used temporal lobe volume (TLV) summary measures obtained by Stein et al. (2010b), as the phenotype for testing associations for both methods. We performed a genome-wide scan of every SNP from our filtered and annotated genotype data (only including SNPs located within genes) using simple linear regression with SNPs coded following an additive model. We took the top SNP from the analysis, and the rest of the SNPs from the same gene, and performed PCReg on all of the SNPs in that gene. In addition, we performed a gene-wide scan of all of the genes in our dataset using PCReg with SNPs coded following the additive model. We selected the top gene from the analysis and then ran individual tests of association using simple linear regression at each SNP within the top gene. In this way, we were able to compare the performance of each method in cases where the underlying genetic structure might favor one method over the other.
As we noted in Stein et al. (2010a), the minimum P-value at each voxel, in the null case with n independent tests, approximately follows a probability density function (PDF) such that (Ewens and Grant, 2001):
The PDF derived from equation 2 is a Beta distribution with parameters α=1 and β=n. At each voxel, selecting the minimum P-value for the top gene then follows a Beta(1, n) distribution, where n is the independent number of genes tested.
However, it is well known that the adjacent SNP values within genes are not statistically independent (Frazer et al., 2007). Genetic loci are inherited in contiguous segments, and some genes co-segregate in blocks. The allele frequencies and structure of genes that co-segregate are more similar than would be expected by chance if they were assumed to be independent. Because of this, the effective number of independent tests (Meff) is less than the total number of tests performed (M). By determining Meff, we obtain a more accurate estimate of the total number of independent tests performed with vGeneWAS, given the LD structure of our genotype data.
In our sample, we estimated Meff by performing permutation tests at three randomly selected, uncorrelated voxels in the brain. We regressed each of the 18,044 genes on the permuted residuals of the reduced model after including the age and sex covariates at each run, and stored the minimum P-value. Note that, in this case, the phenotype data is null. However, because it is computed from the real data after adjusting for age and sex, the phenotype data (image values) have the same range and statistical distribution as the data tested for genetic associations. By using the genes, one at a time, as regressors on this null data, one can develop a distribution of the resulting P-values, under null conditions, that can be used to calibrate the significance values that are ascribed to the observed data. As only the minimum P-value is retained (for the best fitting gene), one can build up a reference distribution for the minimum P-values, to help gauge the level of surprise in seeing associations in true data. We repeated this process 5000 times at each of the three randomly chosen voxels and merged the data. The distributions of null minimum P-values from each voxel were nearly identical (Figure 1). Storing the minimum P-values of the permutation tests yields the expected null Beta distribution given our data. We used a maximum-likelihood function (betafit; Matlab, The Math Works, Inc.) that estimates the best fit for the null Beta distribution by varying the β parameter of Beta(1,β). The value of β approximates the effective number of independent tests (Meff) performed on our data. We then apply a beta inverse transform using the approximated β parameter so that the distribution of P-values is now on uniform distribution that deviates from the null when there is a signal.
After correcting for the effective number of independent gene-based tests performed at each voxel, we still need to correct for the multiple comparisons across voxels. We used the original false discovery rate method (FDR; Benjamini and Hochberg, 1995), which identifies whether there is any statistical thresholding of the uncorrected P-value maps that keeps the rate of false positive results within a predefined threshold (we chose 5%, which is conventionally used). This means that, if the results pass the FDR test, approximately 95% of the voxels declared as significant associations will be true positives (averaged over many experiments). In addition, we tested a less conservative variant of FDR, the positive FDR (pFDR), which operates under the condition that at least one true positive finding exists in the data (i.e., one of the null hypotheses is rejected) and yields q-value correction thresholds similar to the original FDR method (Storey et al., 2003). The pFDR test is implemented in the R statistical package called “qvalue” (Version 1.22.0).
A certain amount of spatial smoothness is expected among voxels in an image. This is most likely explained by the non-independence of volume difference measures at adjacent voxels. Relative volume maps were generated using tensor-based morphometry (TBM), which relies on non-linear registration of each subject’s imaging data to a common template. The degree of spatial smoothness in the Jacobian maps derived from the gradient of a deformation field depends on the choice of the regularizer used by the warping algorithm (Laplacian, elastic, fluid, sKL, etc.) and on the resolution of numerical grid chosen to solve the differential equations whose solution is the deformation field. Volume difference maps based on the deformation field vectors are spatially smooth, as are any resulting statistical maps. In addition to image smoothness, certain noncontiguous voxels in distant regions of the brain can have surprising covariance patterns despite their spatial separation (Fillard et al., 2006).
We examined whether the size of voxel clusters associated with the same gene from our vGeneWAS analysis differed from the cluster sizes expected under the null hypothesis of no association at all, given the non-independence of signals at adjacent voxels in our images. In addition, we wanted to determine whether the number of unique, top genes from across the brain significantly differed from the number of top genes expected by chance. We generated simulated cluster maps by first creating a correlation matrix of r values representing the Pearson’s correlation between any given voxel and all other voxels in the brain. Next, we randomly selected (without replacement) a voxel, Vs, and all corresponding voxels with an r2 value (proportion of variance explained) greater than 0.8 from the correlation matrix. We chose to only select voxels from the correlation matrix with an r2 greater than 0.8 because this provided the largest cluster size estimates in the simulated output maps. The r2 value of each voxel-to-voxel relationship was then used to divide the interval [0,1] of a uniform distribution such that the correlation between a voxel and Vs was directly related to the area under the curve occupied by that voxel’s area under the curve or “bucket.” The size of each voxel’s bucket was recalculated each time we chose a new Vs. We selected a random number on the interval [0,1], and, depending on its value (which “bucket” it fell in), assigned the same categorical variable link (e.g., a, b, c) to Vs and the voxel whose bucket was selected from the uniform distribution. This linked the two voxels. The probability of a given voxel being chosen from the uniform distribution was directly related to how correlated it was to Vs. We continued the process by randomly choosing a new Vs from the correlation matrix. If a randomly selected voxel Vs did not contain a linking variable, but selected a voxel from the uniform distribution that already contained a link, then Vs was assigned the linking variable of the voxel selected from the uniform distribution. If Vs already contained a linking variable and the voxel chosen from the uniform distribution had not previously been assigned a variable, the two voxels were linked using the existing linking variable of Vs. If both Vs and the voxel selected from the uniform distribution already contained a linking variable, we kept each variable as-is and then continued with the process. Finally, voxels that were not correlated to any other voxels in the image, with an r2 value greater than 0.8, were assigned a non-linking random variable. After iterating through every voxel in the image, each voxel had a categorical variable that either linked it to other voxels or only to itself. We ran this entire simulation process 100 times, generating a new simulated cluster map each time. By considering the correlation of a given voxel to all other voxels in the image, as opposed to using a single summary measure of smoothness throughout an image, we were able to model the expected clustering among adjacent voxels and non-independent, spatially separated clusters.
Based on the 3D pattern of voxels and the variables linking voxels together, we used a nearest neighbor algorithm to measure cluster sizes of adjacent voxels with the same linking variable value. Using the cluster size estimates from each simulated map, we were able to determine the expected distribution of cluster sizes based directly on our study dataset. In addition, we used the total number of unique linking variables in each simulation as an estimate of the number of independent voxels in our dataset. Because non-independent, correlated voxels may tend to be associated with the same gene, we can use the total number of independent voxels to estimate the number of top associated genes we would expect to find in null cluster maps made from our actual test data. We used the estimated number of independent voxels, Vi, to randomly select (with replacement) a gene from the set of 18,044 genes and repeated the selection Vi times. We found the number of unique genes represented for each simulated output map and then took the average.
To examine the differences between gene-based and SNP-based association methods (which are more standard), we compared the results of PCReg to linear regression using temporal lobe volume (TLV) data from a previous study (Stein et al., 2010b) as the phenotype. We chose to focus on the top gene or SNP identified by each method in order to examine performance when the variant chosen is deliberately selected to favor one of the two methods. GRIN2B was identified as the gene with the SNP variant that was most significantly associated with TLV (P = 4.03×10−7). We identified each of the 129 SNPs within the GRIN2B gene, and then performed linear regression at each SNP and PCReg as a single gene test with TLV as the phenotype. The −log10(P-value) of each SNP-based test is shown in Figure 2a with single gene test results overlaid (black dotted line). It is clear that the main effect detected with linear regression is much greater in this case. It is important to note that we tested each of the 129 SNPs within the GRIN2B gene, which would require any significant P-values identified to be corrected for multiple comparisons before further study. In this example, however, there are several SNPs that beat the Bonferroni-corrected significance threshold (α = 3.8×10−3). In comparison, the gene-based test of GRIN2B using PCReg was a single test not requiring correction for multiple comparisons and maintained a nominal significance value (P = 0.012). Also, we compared BEST3 - the gene identified to be most significantly associated with TLV via PCReg - with the linear regression output of each SNP within the gene (Figure 2b). The significance of the main effect of the gene-based test is much stronger (P = 2.9×10−4) than the best linear regression result (P = 0.063). This demonstrates a case where variance components from individual markers that are not significant via linear regression may be combined into a single significant test statistic.
Wang and Abbott examined whether the power to detect associations in genetic data is influenced by the number of eigenSNPs included in a PCReg model. They found that models with greater numbers of eigenSNPs do not have increased power to detect associations (Wang and Abbott, 2008). However, each additional eigenSNP included in the model uses a degree of freedom. It is then possible that PCReg and similar regression methods are biased toward selecting effects of smaller versus larger genes (Chapman and Whittaker, 2009). We examined or results for gene-size bias and verified that the number of eigenSNPs in the PCReg model within the top genes from our run of vGeneWAS was not correlated with the observed P-value, using a Pearson’s product-moment correlation test (r = 0.0045; P = 0.42). In addition, we verified that the number of eigenSNPs in each of the 18,044 genes at a single voxel was not correlated with its significance level (r = 0.0066; P = 0.29). We also compared the number of eigenSNPs in each gene (mean and median: 14.3 and 9) with the number of eigenSNPs in the top genes from our analysis (mean and median: 13.6 and 5). It remains possible that we missed effects of very large genes, but this is inevitable in small samples as the number of eigenSNPs needed to adequately encode the majority of variation in large genes tends to approach the sample sizes, reducing the available numbers of degrees of freedom for the whole-gene tests.
We generated maps of significance where each color-coded voxel in the brain shows the P-value of the most highly associated gene at that voxel (Figure 3). There are several spatially contiguous regions throughout the brain with raw minimum P-values lower than P < 10−7. In addition, some of the top genes identified show symmetric clustering across hemispheres. Brain structures are highly symmetric between hemispheres, at least for most brain regions, so symmetric genetic associations may be biologically plausible because the volumes of symmetric structures co-vary across subjects, so they may share similar genetic determinants. However, evidence of symmetric patterns of association in the brain does not necessarily imply biological plausibility (Fillard et al., 2006).
We used a simulation-based test to build the expected null distribution of cluster sizes given our image data. We compared the distribution of the cluster size values in simulated maps to the cluster sizes obtained from vGeneWAS. The proportion of the null (simulated) maps that contained small clusters is much greater than in vGeneWAS, while the proportion of the vGeneWAS map that contained large clusters was greater than in the null maps (Figure 4). The minimum and maximum cluster sizes for the simulated maps were 1 and 14 voxels (64 and 896 mm3), respectively. The minimum and maximum cluster sizes for vGeneWAS were 1 and 429 voxels (64 and 27456 mm3). This demonstrates that a large proportion of clusters of voxels associated with the same top gene are larger than would be expected based on completely null data, even taking into account the non-independence of voxels in our dataset.
Based on our simulated cluster maps, we used the number of unique clusters as an estimate of the number of independent voxels. The estimate of the number of independent voxels based on 100 runs of the simulation tests was 11900.8 ± 50.6 (mean ± standard deviation) out of the 31662 total voxels. We performed tests to estimate the number of genes we should expect to find in our analysis based on the non-independence of voxels in our data. We used the number of independent voxels estimated from the simulation to randomly select (with replacement) from our list of 18,044 genes. We tallied the number of unique genes represented for each simulated cluster map and found the average was 8721.4 ± 44.9 (mean ± standard deviation). We measured the total number of unique genes to be 5333 from our run of voxelwise GeneWAS. The number of observed genes is significantly lower than the number of genes expected based on the null cluster maps (P < 0.01). Combined with our cluster size comparisons, this suggests that the top genes identified in our analysis tend to have a much more broadly distributed effect than expected based on null data, even taking into account the intrinsic spatial non-independence of our data. The top 20 genes most significantly associated with any voxel are listed in Table 1.
The GRB-associated binding protein 2 gene, GAB2, is the most significantly associated gene in our analysis and has previously been linked to late-onset Alzheimer’s disease (LOAD) (Reiman et al., 2007). Reiman et al. (2007) identified 10 SNPs from the GAB2 gene that were significantly associated with LOAD and APOE allele status in 1411 cases and controls from 20 NIA-sponsored Alzheimer’s Disease Centers. Replication attempts in independent samples have yielded mixed results (Ramirez-Lorca et al., 2009; Lin et al., 2010; Chapuis et al., 2008), but large meta-analyses of several databases shows that GAB2 may indeed have a moderate effect on the development of LOAD (Ikram et al., 2009; Schjeide et al., 2009). Specifically, the meta-analysis of the marker rs2373115 in nine studies has an odds ratio of 0.85 and a 95% confidence interval for the odds ratio of [0.76, 0.94]. In addition, the AlzGene website lists GAB2 as being in the top 20 genes likely related to Alzheimer’s disease (September 3, 2010; http://www.alzgene.org/). In vivo testing shows that GAB2 is over-expressed in certain brain regions such as the hippocampus and posterior cingulate cortex in patients with LOAD (Reiman et al., 2007). Experiments with small-interfering RNA (siRNA) and GAB2 reveal that the normal function of GAB2 proteins prevents the formation of serine-262 phosphorylated tau tangles (Reiman et al., 2007). No studies, to our knowledge, have considered morphometric effects of GAB2 variants. The GAB2 associations show a symmetric signal in the white matter superior to the lateral ventricles (Figure 5).
The second most highly associated gene, leucine-rich repeat and death domain containing protein (LRDD), is expressed in the brain and may mediate cell apoptosis and DNA repair (Telliez et al., 2000). In addition, LRDD has been implicated in the p53 tumor-suppression pathway likely by signaling cell apoptosis in response to DNA damage (Brown et al., 2009). LRDD was the most significantly associated gene in a cluster of voxels in a white matter tract of the occipital lobe, possibly the optic radiations (Figure 5).
Associations with protein tyrosine phosphatase receptor type beta, PTPRB, are detected in the cerebellum (Figure 5). PTPRB interacts with neural receptors and cell-adhesion molecules and is involved in neurite development and neuronal differentiation (Ishiguro et al., 2008). PTPRB has also previously been associated with alcohol and drug abuse via genome-wide search (Ishiguro et al., 2008). In addition, an expression study found that PTPRB encoded proteins are present in the gastric mucus and other tissues of gastric cancer patients (Wu et al., 2006).
The fourth and fifth most significantly associated genes are zinc finger protein 462 (ZNF462) and immunoglobin superfamily member 5 (IGSF5), respectively. ZNF462 is the most significantly associated gene in a cluster of voxels in the upper-left grey matter of the parietal lobe (Figure 5). Interestingly, IGSF5 shows symmetrical clusters of association in the temporal lobe and the surrounding cerebrospinal fluid (CSF) at the base of the brain (Figure 5). Neither gene is well studied, but IGSF5 may be involved with junction cell adhesion (Hirabayashi et al., 2003).
Other genes of interest identified in our analysis include ARALAR, which encodes a calcium-binding mitochondrial protein that is highly expressed in the brain (del Arco and Satrustegui, 1998). ARALAR has previously been associated with autism (Ramoz et al., 2004), but the claims are controversial (Rabionet et al., 2006). CHRM5, is a muscarinic acetylcholine receptor M5 coding gene and has previously been associated with schizophrenia (De Luca et al., 2004). S100B, encodes a zinc-binding protein over-expressed in patients with Alzheimer’s disease and interacts with Tau proteins (Yu et al., 2001).
After permutation testing to determine the effective number of independent gene tests, we need to model the function parameters so that we can transform the data for correction for multiple comparisons. The effective number of independent tests was estimated to be 15,636, which is a moderate reduction from the 18,044 genes measured directly in this experiment. We therefore chose to model the null distribution as Beta(1, 15636). The probability density function (PDF) of Beta(1, 15636) on the normalized histogram of observed P-values fits the data well with only small deviations from the original Beta(1, 18044) (Figure 6a). We note, however, that our estimate is determined both by SNP density and degree of coverage in the SNP marking scheme of our study. Experiments that use different SNP-chips, include sex chromosomes, or use different annotation methods may encounter different estimates. To determine how well the expected null distribution compares to the observed PDF, we compare each distribution directly in a Q-Q plot (Figure 6b). The expected null distribution also fits the observed data well.
P-values suitable for multiple comparison correction via FDR methods should have a probability distribution on the interval [0,1] that is uniform in the null case, i.e., its cumulative distribution is diagonal in the null case (Benjamini and Hochberg, 1995; this is the basis for the false discovery rate method). Our Beta-distributed experimental P-values need to be corrected so that they meet the assumptions of the FDR model. Using the analytic β parameter from the null Beta distribution, we fit a cumulative distribution function (CDF) to our observed data yielding a new distribution of corrected P-values that deviate from the uniform distribution only when the data are not null. A histogram of the observed corrected P-values (Figure 6c) shows that the cumulative distribution is approximately equivalent to a uniform distribution. A Q-Q plot of the expected null distribution corrected P-values against the observed corrected P-values shows that the two distributions differ (Figure 6d). A Q-Q plot of two identical distributions will lie on the null 45-degree diagonal line (y = x). There are two things that can cause a Q-Q plot to deviate from the null: incorrectly modeling a distribution or significant data. The line representing the observed compared to the expected results in the Q-Q plot in Figure 6d is steeper than and deviates from the null line at lower P-values. This suggests that the distribution of Pc values is left skewed and more dispersed than the theoretical uniform distribution (Thode et al., 2002). It is possible to apply a further correction to our observed Pc distribution by using a Q-Q plot of an analytic null distribution versus the theoretical uniform distribution as a hash table. However, we are only selecting the genes with the lowest P-value at each voxel so monotonic P-value corrections will not change the distribution of Pc-values.
We used two methods to control the FDR of the corrected P-values (Pc). We used the original FDR method (Benjamini and Hochberg, 1995), which appropriately controls for multiple comparisons when the covariance of test statistics shows a positive regression dependency (Benjamini and Yekutieli, 2001). We found that the false discovery rate for the second most highly associated gene in our results (LRDD) could only be controlled at a threshold of q = 0.30 (i.e., allowing a 30% false discovery rate) after applying a statistical threshold of Pc=5.36×10−4. In addition, the pFDR q-value threshold (Storey et al., 2003) was q = 0.23 for the most significantly associated gene at any voxel (GAB2). In other words, the vGeneWAS results could not be controlled at the conventional false discovery rate, but show promise.
Voxelwise GeneWAS results in a map that shows only the top gene at each voxel. The top gene at each voxel may be the most significant gene in a certain region, but it may also have a more distributed effect throughout the brain, with effects in additional regions where it is not the top gene. In addition, genes that do not have a large main effect might never be selected in this type of analysis, but still could have a large distributed effect on the brain.
We tested the effect of the top gene in our analysis, GAB2, at every voxel across the brain using PCReg. We stored each P-value in a map and applied the original FDR method. Voxels surviving the FDR threshold are shown in Figure 7. These are post hoc tests, so are exploratory, and require replication in independent samples, but it is quite clear that GAB2 has a much greater distributed effect on the brain than could be determined from the vGeneWAS results. Future implementations of vGeneWAS might consider the effects of multiple genes at a voxel to account for the case where a gene is significant in its effect of explaining variations in the image, but is not necessarily the top gene. In addition, vGeneWAS could be further improved by considering the distributed effects of genes. If a gene has an effect over a large region, but is not the top voxel, it will be completely overlooked in the current implementation of vGeneWAS. Adaptation of cluster-level inference to these maps would be of interest, as well as tests that combine cluster extent and height (Hayasaka and Nichols, 2004). Existing adaptations of the original FDR method, such as “searchlight” FDR, could be useful here as it produces region correction thresholds that are sensitive to small clusters of positive signals in imaging data, but appropriately conservative in its correction of false positives (Langers et al., 2007).
To understand the contributions of each individual SNP in the GAB2 gene, we performed post hoc association tests for each SNP with the phenotype value from the top voxel in the brain. It should be noted, however, that choosing the GAB2 gene to compare the results of SNP-based linear regression with gene-based PCReg provides P-values which are biased by the previous gene-wide brain-wide search because GAB2 was identified using PCReg. There were 20 SNPs from the GAB2 gene in our genotyped data. Of these, only three passed the nominal significance level in SNP-based association tests (P = 0.05). The most significant SNP identified, rs7927923, has P = 9.1×10−3. The other two significant SNPs, rs1981405 and rs1893447, showed effects with P = 0.027 and P = 0.049, respectively. A total of 16 SNPs out of 20 are in high LD with the most significantly associated SNP (r2 > 0.6). Only one of the SNPs from our analysis overlapped with SNPs used in previous GAB2 association studies, most likely because we are using different genotyping platforms. Clearly, the gene-based test was more powerful at detecting an association in this case than each SNP tested individually (compare the dotted line with the colored dots in Figure 8).
To assess the differences in power afforded by vGeneWAS relative to existing SNP-based methods, we compared the Pc-values from vGWAS obtained in our previous study by Stein et al. (2010a), with the Pc-values resulting from vGeneWAS (Figure 9). The proportion of Pc-values greater than a given FDR threshold for each method is directly related to differences in effect sizes. The FDR of the results from vGWAS could only be controlled at a threshold value of q = 0.50, whereas the FDR threshold for vGeneWAS is somewhat lower, although not passing the conventional FDR level (q = 0.30; Figure 9). This suggests that the vGeneWAS method may have more power, in principle, to detect genetic associations, although neither test controlled the false discovery rate at the conventional level.
Here we present a method to conduct a voxelwise gene-wide association study (vGeneWAS), testing the aggregate effect of multiple SNPs within each gene. In summary, (1) we implemented a gene-based association test using principal components regression (PCReg); (2) we performed association tests at every voxel within a full brain mask where the value at each voxel was the local volume difference relative to the mean template while controlling for age and sex; (3) we generated a beta distribution of P-values by selecting only the most significant gene at each voxel; (4) with permutation tests, we estimated the effective number of tests performed; (5) we corrected for multiple comparisons in a two step procedure - we estimated β using the CDF of the analytic Beta distribution and then corrected the new uniform distribution using two different FDR methods.
None of the genes identified passed the standard FDR threshold (q = 0.05). However, many of the genes identified have previously been associated with brain differences or disorders. The top gene identified is a known Alzheimer’s risk gene, GAB2, lending plausibility to the method. Many of the genes identified are highly expressed in the brain or differentially expressed, depending on disease status. The findings in this study warrant further examination and replication attempts.
Our method selects the top gene at each voxel, to reduce the amount of data. Choosing only the top gene at each voxel, however, can hinder the extensibility of our results. This may miss many genes with distributed effects, if the main effect of the gene is never the largest at any voxel. Future implementations of vGeneWAS could consider the relationship among voxels when performing association tests. Liu et al. (2009) used parallel ICA to relate brain network data from fMRI to SNP data. They selected only a small set of 367 predefined SNPs based on a set of candidate genes for schizophrenia; this does not leverage all of the available data in the genome. Similar approaches have been attempted on voxel-based morphometry (VBM) data from structural MRI (Jagannathan et al., 2010). However, this approach used the same subset of SNPs used by Liu et al. (2009). Vounou et al. (2010) proposed a method called sparse reduced-rank regression (sRRR) which uses the sparseness of signals to simultaneously select phenotypes and genotypes. Power estimates suggest that sRRR is more powerful than using individual tests at each voxel; this may prove to be very useful in the future.
Principal components regression (PCReg) is an efficient method to test the joint association of all SNPs within a gene simultaneously. PCReg detected associations with genes missed by SNP-based regression (Figure 2b). By leveraging the LD in a gene, PCReg encodes variance throughout a gene to test for associations. We identified situations where SNP-based regression models may have more power (Figure 2a). If a single SNP has a large main effect, then testing the joint effect of all SNPs within that gene may dilute the association; the cumulative P-value from gene-based tests may be lower. However, when one considers the drastic reduction in the number of independent tests when comparing SNP-based linear regression with PCReg, gene-based testing offers advantages.
Another concern with PCReg and related regression models is that each predictor added to the regression model consumes a degree of freedom. There may then be some detection bias in the regression model, where smaller genes are found to be more significantly associated with the phenotype than larger genes, because the regression models of small genes have more degrees of freedom (Chapman and Whittaker, 2009). While we did not observe this effect, it is an important factor to consider when interpreting results. Additionally, SNPs combined into a single test statistic in PCReg could have different directions of effects, disrupting the power to detect an association. However, the situation where a gene contains SNPs with negative correlations with respect to the phenotype may be relatively rare as it requires two nearby loci to be in LD with different causal variants (Chapman and Whittaker, 2009).
Other multivariate regression methods may offer greater power to detect genetic associations than PCReg, which is used here as an example. Wang and Elston (2007) compress genome-wide genotyping data across subjects using a Fourier transform, and assign weights to the low-frequency components in a regression model. This method is similar to PCReg, as it collapses the number of genetic tests performed while capturing much of the variance across markers. Kernel-based methods have been implemented as non-parametric gene-based tests to increase power over SNP-based methods; however, these methods have only been implemented in case-control studies (Mukhopadhyay et al., 2010). Ridge regression (Malo et al., 2008; Sun et al., 2009) and lasso-based penalized regression (Zou et al., 2006; Wu et al., 2009) can both powerfully detect associations in genetic data. In fact, direct comparisons between ridge regression, lasso-based penalized regression, and PCReg show that the first two methods may be more powerful than PCReg depending on the underlying genetic architecture (Bovelstad et al., 2007; Benner et al., 2010). However, ridge regression and especially lasso-based penalized regression are extremely computationally intensive compared to PCReg. There is a huge computational requirement to complete a vGeneWAS analysis, which searches the whole image in addition to the whole genome. Due to this, we decided to strike a balance between power and computational complexity to complete the analysis in a feasible time frame. Future implementations of vGeneWAS could be improved by using additional multivariate regression methods, although they may need to be modified for speed.
A current limitation of our method, as described here, is that in its current form family-based designs (such as pedigree structures) cannot be used. The patterns in allele frequencies in a family cohort depend on kinship, and mixed-effects models would be required to control for kinship structure. Several such methods exist for SNP-based linear regression, such as Efficient Mixed-Model Association (EMMA; Kang et al., 2008). However, to the best of our knowledge, multivariate mixed-model regression has not been attempted in a genetic context. One multivariate gene-based method called versatile gene-based association (VEGAS) avoids multivariate mixed-model regression by converting SNP-based P-values into a multivariate gene-based test statistic (Liu et al., 2010). As there are already many methods for SNP-based mixed-model regression, VEGAS is aptly suited to perform gene-based association tests in family-based populations. As the VEGAS test statistic is determined by SNP-based P-values, it will not be able to detect associations where the cumulative P-values are not significant. In this way, if a series of SNPs contribute a small amount of variance in a gene, VEGAS will miss them, because the SNP-based method will as well.
While this is one of the largest imaging genetics studies to date, our sample size still may be underpowered to detect moderate effects of genetic variants on the brain. Future studies in imaging genetics are likely to benefit from meta-analysis, which aggregates GWAS results from multiple cohorts to determine reproducible genetic associations. This aggregation of large datasets can be used to boost power to detect SNPs with smaller effects. One such effort now underway is the ENIGMA project (ENIGMA Consortium, 2011). In cases where there is not enough data available to perform a true meta-analysis, discovery and replication datasets may be useful. In early tests using brain images, some genetic associations seen in a discovery cohort have been replicated in independent samples (e.g., Rajagopalan et al., 2011). However, our main purpose in this study is to demonstrate a method to conduct voxelwise, gene-based analyses, which will become more powerful as imaging databases continue to grow rapidly worldwide in size and content. In assessing whether our results may generalize to new datasets, we note that we examined only a tight age range in our study (elderly subjects), and this may affect the genes found to have morphometric effects on the brain. If the genes have a varying expression pattern over time some of the top genes detected in our analysis may not be dominant during other parts of the lifespan. Although we controlled for age effects on brain structure in our analysis, it is still unknown whether the identified genetic associations with brain morphology are under some age-related influence; it is also not known if these genes are expressed in a typical/atypical age-related fashion. Datasets drawn from different parts of the lifespan would offer maximal power to detect genetic variants relevant for brain structure (as discussed in Braskie et al., 2011; see Rajagopalan et al., 2011, for an example).
One limitation of GWAS analyses is that they overlook rare variants, which are also emerging as key players in the genetic underpinnings of mental disorders (Bansal et al., 2010). Our method does not consider these rare variants, but may play an important role in explaining variance in complex traits that is not accounted for by common variants. Examination of rare variants is still relatively costly (as it requires deep sequencing of large numbers of subjects). Some types of rare variant can be genotyped on SNP-chip platforms (such as copy number variants) they require separate analytical techniques from those considered in this paper (Bansal et al., 2010). Each of these limitations will be more feasible to address when the cost of deep sequencing drops and sample sizes are large enough to reliably implicate many genes simultaneously.
Our gene-based test may be more powerful than univariate methods in certain cases, but not always. The five top genes identified in the present study do have some biological plausibility; some are known to be expressed in the brain and implicated in brain disorders. However, there are other genes missing in the short list of genes in the present findings that are frequently found using univariate approaches and strongly implicated in complex behavioral pathologies across mammalian species, such as the BDNF val66met substitution.
To explain this, we note that the analysis of currently available imaging genetics data is very underpowered. In addition, sample sizes needed to reliably detect a genetic association are even greater when multiple genes are assessed or when genome-wide search is conducted. By contrast, the BDNF val66met substitution is often treated as a candidate gene, and if that is done, it is conventionally agreed that its effects must only satisfy a nominal significance level of P < 0.05, if no other genes or SNPs within it are tested. In our own work on a different cohort scanned with DTI (Chiang et al., 2010a), we were able to detect robust associations between the BDNF val66met polymorphism and fiber integrity (fractional anisotropy) assessed with DTI. We were also able to replicate these associations in two non-overlapping (independent) samples of subjects. Even so, the significance level used (P < 0.05) was far more lenient than the very strict thresholds required to control for false positives when the whole genome is searched. As such, false negative results in a GWAS study (e.g., not finding a significant association of the BDNF val66met substitution when it is in fact a true association) does not mean that a gene does not affect a phenotype, just that an association was not detected at the very stringent statistical cutoff applied to account for multiple comparisons across the genome.
A review of the literature also suggests that BDNF val66met, while a popular target of study, has a mixed history. The BDNFval66met substitution has been inconsistently implicated in mood disorders, Alzheimer’s disease, and quantitative measures of memory (Bagnoli et al., 2004; Combarros et al., 2004; Nacmias et al., 2004; Nishimura et al., 2004; Tsai et al., 2004; Desai et al., 2005; Matsushita et al., 2005; Vepsalainen et al., 2005). In a secondary, post hoc, analysis, we tested the effect of the BDNFval66met allele in this current sample using standard univariate regression (with a dominant model, controlling for age and sex) and it did not survive correction for multiple spatial comparisons. An association test of the BDNF val66met allele with the hippocampal volume of the 731 subjects in this dataset did not survive the nominal significance level of P = 0.05. As such, we were not able to use BDNF as a “gold standard” gene; arguably, there are not yet any such genes with universally replicable associations on brain structure that can be used to gauge the face validity of novel methods.
Another limitation is that any approach that stops after selecting only one gene per voxel is not biologically plausible as a model of phenotypes with complex genetic determination. As such, the development of gene-based tests should be considered as a way station towards a more sophisticated treatment of complex genetic effects, in pathway or gene-gene interaction models (Inkster et al., 2010). However, testing gene-gene interactions in the vGeneWAS framework is computationally intractable as the number of tests quickly approaches nCk=n!/((n−k)!k!) at each voxel in the whole brain to test for interactions among all sets of k different genes drawn from an overall pool of N genes. Additionally, interaction effect sizes are generally much smaller than the main effects we are searching for in this paper, so our sample size would need to be much larger to accommodate the necessary correction for multiple comparisons and smaller effect sizes (Cordell, 2009). As genome-wide interaction analysis (GWIA; Marchini et al., 2005) is computationally intensive and underpowered with current imaging datasets, we recently developed an alternative method (Chiang et al., 2011b,c) to detect likely gene-gene interactions among SNPs, without having to compute all N(N-1)/2 pairwise or all n!/((n-k)!k!) kth-order interactions on the genome. Two advantages of this genetic network analysis, relative to genome-wide interaction analysis (GWIA) are apparent: (1) genetic correlation can tap into the natural latent structure of gene action in a brain image; (2) voxel clustering by genetic affinity leads to high power to find SNPs with correlated effects in genome-wide scans.
Gene-based tests of association across the genome and brain have not been attempted before, to our knowledge. Recently, imaging genetics studies have focused on single-locus associations with summary brain volume measures or 3D statistical parametric maps. vGeneWAS advances the burgeoning field of imaging genetics by providing the framework to perform multivariate, gene-based association tests. It does not restrict analyses by requiring prior hypotheses about a specific causal variant or ad hoc region of interest. vGeneWAS is the first attempt to apply gene-based tests to morphometric imaging data and opens up more possibilities to discover putative genetic variants that contribute to differences in brain structure. This may help when the main effect of each variant in a gene is too small to detect with traditional SNP-based methods.
Although vGeneWAS is a multivariate, gene-based method, we identified genes previously associated with brain disease using SNP-based tests. Many variants in the GAB2 gene are implicated in the development of late-onset Alzheimer’s disease (LOAD) and are thought to interact with the APOE epsilon 4 allele. In the pattern of effects for GAB2 on the brain (Figure 7), the highlighted areas are generally periventricular, and ventricular enlargement is a prominent characteristic of AD (de Leon et al., 1989; Chou et al., 2010). As we noted in our prior papers on TBM in Alzheimer’s disease (Hua et al., 2008), there is occasionally a ring of voxels around the lateral ventricles that show partial volume effects that mostly like reflect ventricular expansion. Clearly, the ventricular expansion itself indirectly results from the diffuse loss of brain parenchyma, so the changes detected there may also reflect, to some extent, atrophic processes more remote than the voxels singled out in the voxel-based maps. LRDD is highly expressed in the brain and is involved with DNA repair including signaling apoptosis in tumor cells. PTPRB is associated with addiction to drugs and alcohol and may be involved with tumor regulation (Telliez et al., 2000; Wu et al., 2006; Ishiguro et al., 2008; Brown et al., 2009). Based on gene expression and links to brain diseases, many of the genes identified in our analysis may have differential morphometric effects across the brain. In addition to some of the more well-studied genes, we identified many genes such as IGSF5 and ZNF462 that have little research available to infer their plausibility. However, almost all of the genes identified in our analysis are highly expressed in the brain, which at least suggests that the genes may have a role in brain function. Further analysis is required to examine to what extent each gene variant identified in our analysis mediates brain differences.
Many of the associations identified here seem to have a plausible story, but we need to consider that some of the patterns of association, especially clusters of association, may be due to short-range spatial correlations in the images. Adjacent voxels in brain scans tend to covary, as do Jacobian maps used to represent a localized measure of volume difference. These methods rely on non-linear algorithms that generate spatially smooth deformations. In addition to the simulated (null) cluster size maps in this study, Stein et al. (2010a) found that a small amount of spatial clustering is seen even if the genetic data is null. Also, voxels were down-sampled which may introduce partial volume effects. However, performing a vGeneWAS scan on non-down-sampled, original sized images is estimated to take 4,372 days (or approximately 12 years) to complete. To this end, the extent to which a gene affects regions of the brain should be interpreted cautiously; however, certain patterns of gene effects that appear in non-adjacent structures or in large clusters may signal gene effects not attributable purely to spatial smoothing or partial volume effects.
To better understand the contribution of genes to global versus local brain structure differences, we conducted association analysis on both (1) the globally normalized brain images and (2) estimates of total intracranial volume (eTIV) that contain information on overall differences in brain scale. We searched for specific gene effects on global brain volume differences using our gene-based method. We computed brain volume measures (eTIV) from our dataset using the automated FreeSurfer package (Fischl et al., 2002). Using the eTIV measure as the phenotype we tested each of the 18,044 genes for association. Looking at the top genes that we found in our analysis of the normalized images, none of the top 20 most highly associated genes was associated with eTIV phenotype. This provides further evidence that the genetic effects we are detecting exert influences on regional brain volumes rather than simply reflecting non-specific effects on the overall volume of the brain.
Additionally, results should be interpreted cautiously when global anatomical normalization is used. By removing, as far as possible, the effects of individual brain size variation from the data, it is possible to discover genes that may have a specific effect on a particular structure, above and beyond any overall genetic effects on brain size (as brain size itself is heritable). Global normalization is commonly performed in all brain mapping studies, as many extraneous factors affect an individual’s head size, body size or height that may not be relevant for cognition or for understanding brain function. Global anatomical normalization adjusts for this source of variance in the data, to a large extent, making more localized effects easier to identify. Even so, by applying global anatomical normalization, some genes may be missed that influence the total size of the brain. In fact, if a gene were responsible for influencing brain size, but had a uniform effect on all brain regions, it would be missed in the current analysis, as global effects are discounted. As such, in addition to mapping gene effects, it makes sense to also perform genetic analyses of whole brain size, as has been performed in two recent studies (Paus et al., 2011; ENIGMA Consortium, 2011).
In conclusion, our method may be used to perform gene-based tests on any 3D brain maps, such as data from voxel-based morphometry, diffusion tensor imaging, and cortical surface data. In addition, we found a set of candidate genes that may substantially affect brain morphometry and warrant further study.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904, 3U01AG024904-03S5). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-Plough, Synarc, Inc., as well as non-profit partners the Alzheimer’s Association and Alzheimer’s Drug Discovery Foundation, with participation from the U.S. Food and Drug Administration. Private sector contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation. We also thank the many contributors to ADNI-1 genotyping sample curation at NCRAD (Kelley Faber), performing BeadChip assays at TGen (David Craig), and bioinformatics problem solving (Indiana U: Kwangsik Nho; UC Irvine: Anita Lakatos, Guia Guffanti; Pfizer: Bryan DeChairo).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.