This study identified specific genetic variants associated with caudate volume in the human brain in 1198 subjects, making it one of the largest brain imaging studies performed to date. There was sufficient power to trace heritable variation to specific loci in the genome, though not at a genome-wide significance level. We replicated the same genetic associations in samples from two continents (U.S. and Australia) whose mean ages differ by 50 years, using data collected on scanners with different field strengths (4 Tesla and 1.5 Tesla). Additional replication in still larger samples would be advantageous, but this confirmation in two independent samples suggests that these associations may be robust and may persist throughout life.
The caudate volume is a reasonable starting point for investigating genetic influences on brain structure because it is highly heritable (), reliably delineated by automated recognition programs (37, 38) (Supplementary Figure 3), and has an established link to psychopathology. The estimates of caudate volume heritability from the BLTS cohort (shown in ) are around 0.76 under the ACE model and around 0.90 under the best-fitting AE model. The ACE model is one of the standard classical twin models: it partitions trait variance into additive genetic (A), common (shared) environmental (C), and unique environmental (E) components, and the AE model drops the C term. These estimates agree with a prior twin study of caudate volume (2), which reported caudate heritability of 0.70 to 0.79 in an ACE model. That study analyzed many other brain structures as well and, although the heritability coefficients of different regions were not directly compared for statistical significance, the caudate showed consistently high heritability relative to other structures.
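For readers less familiar with twin models, Falconer's formula gives a quick closed-form approximation of the A, C and E components from the monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a swapped-in textbook approximation, not the model-fitting procedure used for the BLTS ACE/AE estimates, and the correlations below are hypothetical values chosen for illustration, not study data.

```python
"""Falconer's approximation to twin-based variance components.

Assumption: this is the classical closed-form approximation, not the
structural-equation ACE/AE model fit reported in the text.
"""


def falconer_components(r_mz: float, r_dz: float) -> dict[str, float]:
    """Approximate A (heritability), C and E from MZ and DZ twin correlations."""
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    a = 2.0 * (r_mz - r_dz)  # additive genetic variance (narrow-sense h^2)
    c = 2.0 * r_dz - r_mz    # common (shared) environment
    e = 1.0 - r_mz           # unique environment plus measurement error
    return {"A": a, "C": c, "E": e}


# Hypothetical twin correlations, for illustration only (not BLTS data).
print(falconer_components(r_mz=0.85, r_dz=0.45))
```

By construction A + C + E = 1. When r_mz exceeds twice r_dz, the C estimate turns negative, which is one informal sign that a model without C (such as AE) may fit better.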
A relatively large region on chromosome 5, including the genes WDR41 and PDE8B ( and ), showed replicated association with caudate volume in each of the independent populations. Functionally, the region containing both of these genes is essential to dopaminergic neuron development in zebrafish (39).
WDR41 also improved the performance of a diagnostic classification algorithm that used gene expression patterns to distinguish schizophrenia patients from healthy controls (40).
PDE8B is highly expressed in the rat brain and in neuronal cells (41). Its protein product, phosphodiesterase 8B, is a key enzyme in the dopamine signaling cascade: dopamine binding to its receptors stimulates or inhibits cAMP production, and cAMP is subsequently degraded by phosphodiesterase 8B (42-44).
PDE8B is associated with susceptibility to major depression and with antidepressant treatment response (45), and it is more highly expressed in Alzheimer's disease than in controls (46). Additionally, a mutation in PDE8B causes autosomal-dominant striatal degeneration (10).
The possible relation of these genes to a Mendelian disorder is also of great interest. Although specific variants known to cause Mendelian disorders do not necessarily influence normal variability or psychopathology, the same genes may be relevant to both. In a genetic study of obesity, for example, common variants had subtler but similar effects to highly penetrant rare Mendelian mutations (47). In that study, common SNPs within ABCG8 and LCAT increased the risk of dyslipidemia, and Mendelian mutations within those same genes cause dyslipidemia. Similarly, in our study, common SNPs within the PDE8B/WDR41 region were associated with differences in caudate volume, and a Mendelian mutation within the PDE8B gene causes an autosomal-dominant form of striatal degeneration. This suggests that genes harboring Mendelian mutations may be useful clues for selecting candidate genes to understand normal variability, even though the specific Mendelian mutations themselves may not be involved in normal variability.
These replicated genetic hits are consistent with the literature on dopamine function in the caudate. The caudate receives projections from the dopaminergic neurons of the substantia nigra and has a high concentration of D1 and D2 dopamine receptors (48). Because these genes are implicated in the development and function of dopamine neurons, it is biologically plausible that they also contribute to variation in caudate anatomy.
At the most strongly associated SNP, variation in the WDR41/PDE8B region accounted for 2.79% and 1.61% of the variance in caudate volume in the ADNI and BLTS samples, respectively. These genetic influences on dopamine function and brain structure may also influence behavior, as dopamine is essential for normal cognitive function (49, 50). As such, the genes identified here may become candidates for examination in studies of disorders that affect the caudate, to determine whether these variants are over-represented in subjects with developmental insufficiencies or deterioration in caudate function.
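The percentages above are proportions of trait variance explained (R²) by a single SNP. A minimal sketch, assuming additive coding of genotype as 0, 1 or 2 copies of the minor allele and Hardy-Weinberg equilibrium, shows how R² from a simple regression relates to allele frequency p and per-allele effect β through R² = 2p(1 − p)β² / (2p(1 − p)β² + σ²ₑ). The allele frequency, effect size and simulated data are hypothetical, and the study's own covariate-adjusted models are not reproduced here.

```python
import random


def variance_explained(dosage, trait):
    """R^2 from a simple linear regression of trait on allele dosage (0/1/2)."""
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(trait) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in trait)
    return sxy * sxy / (sxx * syy)


def expected_r2(maf, beta, residual_var):
    """Expected R^2 under an additive model in Hardy-Weinberg equilibrium."""
    genetic_var = 2 * maf * (1 - maf) * beta ** 2
    return genetic_var / (genetic_var + residual_var)


# Simulated data with a hypothetical allele frequency and effect size.
rng = random.Random(42)
maf, beta, resid_sd = 0.3, 0.25, 1.0
# Each subject carries two alleles; count how many are the minor allele.
dosage = [sum(rng.random() < maf for _ in range(2)) for _ in range(20000)]
trait = [beta * g + rng.gauss(0.0, resid_sd) for g in dosage]

print(f"simulated R^2 = {variance_explained(dosage, trait):.4f}")
print(f"expected  R^2 = {expected_r2(maf, beta, resid_sd ** 2):.4f}")
```

With these hypothetical inputs the expected R² is about 2.6%, which illustrates how a per-allele effect of a quarter of a residual standard deviation translates into only a few percent of total variance.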
Three other genes were associated with caudate volume in the ADNI cohort but were not replicated in the BLTS cohort. GMDS encodes an enzyme involved in metabolic pathways and is also important for neuronal migration (51). C10orf46 (also called CAC1) has been characterized as a cell-cycle-associated protein (52). TMSB4X is expressed in the brain and is involved in corticogenesis (53) and actin polymerization (54). The failure of these associations to replicate across both cohorts may reflect false-positive findings or age-specific gene effects. Additionally, although the DRD2 Taq1A allele was previously identified as putatively affecting caudate volume (19) as well as the availability of striatal dopamine D2 receptors (55), we found little evidence of an association between DRD2 Taq1A and caudate volume in either cohort.
Some strengths and limitations of this study deserve comment. First, we identified some variants of interest for caudate volume; however, we are unable to provide mechanistic evidence for how these single-base-pair differences in the genome affect brain structure. Further mechanistic understanding could be derived by studying both the expression and the protein function of the gene products that lie downstream of the SNP variations identified here. Unfortunately, no expression or protein data are currently available in either cohort to test these hypotheses directly. Second, we provide strong support for a particular region in the genome associated with caudate volume, yet it remains to be demonstrated that the genetic factors identified here are of interest for pathophysiology. Third, the two neuroimaging samples are taken from different parts of the lifespan. Replicated SNPs may indicate gene effects that persist, or have different modes of action, throughout life. Failures to replicate could be true negatives or may reflect age- or cohort-specific effects. In a sense, the use of two very diverse samples on two different continents sets a very high bar for replication. Given the large difference in mean age between the samples, it is reasonable to expect that some robust genetic effects will not be detected in both the young and the old cohort. For example, aging or apoptotic processes may predominate in the ADNI sample, and developmental or synaptogenic processes in the BLTS sample. As such, the use of two very different samples is likely to identify genes of enduring relevance across the lifespan, but it may miss or fail to replicate effects that occur only, or predominantly, in late or early life. On these grounds, replication should not be taken to imply that the genes found in our study act through the same biological processes across the lifespan.
Nor should it be taken to mean that genes not found in our study are not influential; other genes could affect caudate structure during only one phase of life. Fourth, as with other multifactorial traits such as height (56), individual common variants have small effects and account for only a small proportion of the overall heritability, so they can be detected only in large samples. Missing heritability might be attributed to low power, rare variants, ungenotyped variants, epistatic interactions, or epigenetic contributions (57). Finally, the ADNI cohort includes subjects across the continuum from healthy aging to mild cognitive impairment (MCI) to Alzheimer's disease (AD). Any genetic association in ADNI could therefore be mediated by the normal atrophy that occurs with healthy aging or by the disease. To account for this, we performed an analysis controlling for diagnosis through permutation. This showed little change in the degree of association, implying that diagnostic category is not driving the association. Furthermore, the broad range of imaging phenotypes in ADNI makes it sensitive to effects that might be overlooked if the discovery sample were more narrowly defined. As single genes are likely to have small effects on behavior, several studies advocate examining multiple cohorts in which the spectrum of observable variation is larger than in the general population (20, 23, 58, 59), especially in the discovery phase. Even so, we replicated the association in our young sample of healthy twins, so the gene effects are not restricted to elderly or ill subjects but are also detectable in young people.
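One way to control for diagnosis by permutation is to shuffle caudate volumes only among subjects who share the same diagnosis, so that the diagnosis-volume relationship is preserved while any genotype-volume link is broken. The text does not specify the exact permutation scheme or test statistic; the sketch below assumes a Pearson correlation statistic, and the group labels, group effects and simulated data are hypothetical.

```python
import random
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def stratified_permutation_p(dosage, trait, group, n_perm=500, seed=0):
    """Two-sided permutation P-value for |r(dosage, trait)|, shuffling trait
    values only among subjects who share the same diagnostic group."""
    rng = random.Random(seed)
    strata = defaultdict(list)  # group label -> subject indices
    for i, g in enumerate(group):
        strata[g].append(i)
    observed = abs(pearson_r(dosage, trait))
    hits = 0
    for _ in range(n_perm):
        permuted = list(trait)
        for idx in strata.values():
            values = [trait[i] for i in idx]
            rng.shuffle(values)  # shuffle within this diagnosis only
            for i, v in zip(idx, values):
                permuted[i] = v
        if abs(pearson_r(dosage, permuted)) >= observed:
            hits += 1
    # Add-one correction so the P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Simulated cohort with hypothetical diagnosis and genotype effects.
sim = random.Random(1)
n = 600
group = [sim.choice(["CN", "MCI", "AD"]) for _ in range(n)]
shift = {"CN": 0.0, "MCI": -0.3, "AD": -0.6}  # lower volume with disease
dosage = [sum(sim.random() < 0.3 for _ in range(2)) for _ in range(n)]
volume = [shift[g] + 0.4 * d + sim.gauss(0.0, 1.0) for g, d in zip(group, dosage)]

p_assoc = stratified_permutation_p(dosage, volume, group)
print(f"stratified permutation P = {p_assoc:.4f}")
```

If volume depended only on diagnosis, every within-group shuffle would reproduce the data exactly and the P-value would be 1; this is the sense in which diagnosis is held fixed under the null.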
In this study, we assessed caudate volume rather than surface morphology because volume is an easily measured summary phenotype that is known to be associated with disease. Additionally, simultaneous searches across both surface vertices and the genome require complex statistical methods (60, 61) that are not yet optimized for surfaces. Volume effects are also more interpretable and can be readily verified by other groups.
Large GWAS commonly use a genome-wide significance threshold of P < 5×10⁻⁸ (56), but less conservative thresholds have been established using permutation testing or by estimating the effective number of independent tests across the genome (62). Here we used a search criterion to select SNPs that were strongly associated in the larger cohort, at P < 1×10⁻⁵, and then tested them for replication in a separate cohort. This is not a genome-wide significance threshold; rather, it is the first stage of a two-stage process that identifies SNPs of interest to carry forward to a second stage in which they can be replicated. This threshold value is somewhat arbitrary, but it has been used previously to identify SNPs of interest in large association studies (63).
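The 5×10⁻⁸ convention corresponds to a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests, and the two-stage design can be expressed as a simple filter. The sketch below is illustrative: the SNP identifiers and P-values are hypothetical, and applying a Bonferroni correction over the carried-forward SNPs at the replication stage is an assumed choice, since the text does not state the replication threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def two_stage_replication(discovery_p: dict[str, float],
                          replication_p: dict[str, float],
                          discovery_alpha: float = 1e-5,
                          family_alpha: float = 0.05) -> list[str]:
    """Return SNPs that pass the discovery screen and then a replication test
    Bonferroni-corrected over the carried-forward SNPs only (an assumption)."""
    carried = sorted(snp for snp, p in discovery_p.items() if p < discovery_alpha)
    if not carried:
        return []
    rep_alpha = bonferroni_threshold(family_alpha, len(carried))
    # SNPs missing from the replication set are treated as not replicated.
    return [snp for snp in carried if replication_p.get(snp, 1.0) < rep_alpha]


# Hypothetical P-values, for illustration only.
discovery = {"snp_a": 3e-6, "snp_b": 8e-6, "snp_c": 2e-4, "snp_d": 9e-6}
replication = {"snp_a": 0.004, "snp_b": 0.40, "snp_c": 0.001, "snp_d": 0.03}

print(bonferroni_threshold(0.05, 1_000_000))  # approximately 5e-08
print(two_stage_replication(discovery, replication))
```

Here three SNPs pass the discovery screen, so each is tested for replication at 0.05 / 3; snp_c has a small replication P-value but is never tested because it failed the discovery stage.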
It is of interest that although we found a replication across samples, the individual associations did not reach genome-wide significance in either of the smaller samples alone. Similarly, in a previous GWAS (64), a top SNP found in one cohort was not genome-wide significant but was replicated in other cohorts at a less stringent threshold. A meta-analysis combining the individual cohorts did not yield genome-wide significant P-values for any SNP (). Thus, despite the replication in two samples, further studies are needed to verify this association.
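A common way to combine per-cohort results is a fixed-effect, inverse-variance-weighted meta-analysis. The text does not state which meta-analytic method was used, so this sketch is an assumption, and the per-cohort effect sizes and standard errors are hypothetical. It shows how two individually modest estimates are pooled into a single effect, standard error and two-sided P-value.

```python
import math
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: per-cohort regression coefficients for the same SNP and coded allele.
    ses:   their standard errors.
    Returns the pooled beta, its standard error, the Z-score, and a two-sided P.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Use the lower tail directly to avoid precision loss for very small P.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


# Hypothetical per-cohort estimates (not values from this study).
beta, se, z, p = ivw_fixed_effect([0.10, 0.08], [0.04, 0.05])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} P={p:.2e}")
```

The fixed-effect model assumes one true effect shared by all cohorts; when cohorts differ as much as a young twin sample and an elderly clinical sample, a random-effects model is a common alternative that widens the pooled standard error.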
The marginally greater effect size for genetic association in the right versus the left caudate may be due to the known asymmetries in caudate volume. In a recent non-genetic study of a partially overlapping sample (400 ADNI subjects), we found that the right caudate was, on average, 3.9% larger than the left in controls and 2.1% larger in MCI subjects, an asymmetry not found in AD (13). The same asymmetry is reported in most, but not all, large morphometric studies (65-69). In the ADNI cohort, which focuses on elderly subjects, lower right caudate volume was associated with conversion from MCI to AD, baseline ratings of dementia severity, immediate and delayed logical memory scores, future one-year decline in MMSE scores, and tau and p-tau protein levels in the cerebrospinal fluid (13). Taken together, these observations suggest that reduced caudate volume may be associated with deteriorating cognition, but that cognitive associations may not be detectable in healthy subjects because other brain systems may compensate functionally for mild atrophy or developmental insufficiency. Future meta-analyses in even larger samples may be sufficiently powered to relate genetic differences in brain structure to observable differences in cognition or to risk for the diseases in which the caudate is implicated.
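How large such samples need to be can be sketched with a standard normal approximation to the power of a 1-degree-of-freedom association test for a SNP explaining a fraction R² of trait variance, using the noncentrality λ = n·R² / (1 − R²). This is a simplified textbook approximation under stated assumptions (additive model, unrelated subjects, no covariates); it ignores the twin relatedness in BLTS, and the R² value below is a hypothetical input, not an estimate for these cohorts.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(r2: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided 1-df test for a SNP explaining r2 of variance."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    delta = math.sqrt(n * r2 / (1.0 - r2))       # square root of the noncentrality
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)   # two-sided critical value
    # Probability that the test statistic lands in either rejection tail.
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)


def min_n_for_power(r2: float, alpha: float, target: float = 0.8) -> int:
    """Smallest n reaching the target power (doubling search, then bisection)."""
    lo, hi = 1, 2
    while approx_power(r2, hi, alpha) < target:
        lo, hi = hi, hi * 2
    while lo < hi:
        mid = (lo + hi) // 2
        if approx_power(r2, mid, alpha) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


for alpha in (1e-5, 5e-8):
    n_req = min_n_for_power(0.02, alpha)
    print(f"alpha={alpha:g}: n for 80% power at R^2=0.02 is {n_req}")
```

Comparing the two thresholds shows why a P < 1×10⁻⁵ discovery screen is reachable in samples of this general size while genome-wide significance for effects of a few percent of variance requires substantially more subjects.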
Here we demonstrate a replicated, though not genome-wide significant, association in a sample that is much smaller than those used in some current GWAS (56). This strongly suggests that MRI-based measures of brain structure are powerful, genetically informative phenotypes with which to search the genome, and that they may be used successfully to find genetic variants in multi-site genetic meta-analyses such as the Enigma project (http://enigma.loni.ucla.edu). Our results highlight a region of the genome that may help to advance understanding of caudate neurobiology, brain structure in humans, and predisposition to psychiatric and neurological illness.