Psychiatric disorders such as schizophrenia are commonly accompanied by cognitive impairments that are treatment resistant and crucial to functional outcome. There has been great interest in studying cognitive measures as endophenotypes for psychiatric disorders, with the hope that their genetic basis will be clearer. To investigate this, we performed a genome-wide association study involving 11 cognitive phenotypes from the Cambridge Neuropsychological Test Automated Battery. We showed these measures to be heritable by comparing the correlation in 100 monozygotic and 100 dizygotic twin pairs. The full battery was tested in ~750 subjects, and for spatial and verbal recognition memory, we investigated a further 500 individuals to search for smaller genetic effects. We were unable to find any genome-wide significant associations with either SNPs or common copy number variants. Nor could we formally replicate any polymorphism that has been previously associated with cognition, although we found a weak signal of lower than expected P-values for variants in a set of 10 candidate genes. We additionally investigated SNPs in genomic loci that have been shown to harbor rare variants that associate with neuropsychiatric disorders, to see if they showed any suggestion of association when considered as a separate set. Only NRXN1 showed evidence of significant association with cognition. These results suggest that common genetic variation does not strongly influence cognition in healthy subjects and that cognitive measures do not represent a more tractable genetic trait than clinical endpoints such as schizophrenia. We discuss a possible role for rare variation in cognitive genomics.
Memory and cognitive problems are characteristic of many common psychiatric and neurological disorders. Patients with psychiatric disorders such as schizophrenia and bipolar disorder manifest with cognitive impairments that are unresponsive to treatment (1,2) and it has been shown that the level of cognitive function is a strong predictor of the ultimate outcome of the disease (3,4). Alzheimer's disease is characterized by severe memory and cognitive decline with no effective treatment. It is therefore essential, both for the patients and for society, to develop treatments that improve cognitive symptoms in these disorders.
It was hoped that the development of whole-genome technologies and their application to common disease would shed light on the neurobiology underlying psychiatric disease, leading to better treatment options. Unfortunately, several genome-wide association studies have now been performed with schizophrenia patients and controls and have shown it to be unlikely that common single nucleotide polymorphisms (SNPs) or copy number variants (CNVs), as represented in current genome-wide SNP platforms, will succeed in explaining much of the variation in disease predisposition (5–10). Findings have been similar for bipolar disorder (11–14). Although very large-scale genetic studies have yet to be reported, it now seems that common variation does not account in large part for the strong heritability of these disorders (15,16).
One suggested explanation for this ‘missing heritability’ (17) is that clinical endpoints do not reflect biologically cohesive entities and that patients with diagnoses such as bipolar disorder and schizophrenia may in fact have a whole range of overlapping syndromes which could be better classified and more successfully investigated if they were defined by more specific phenotypic measures or endophenotypes (18,19). Such measures would be easier to investigate genetically because they would represent a more direct link between the genetic dysfunction and its specific biological consequences (20–22).
Cognitive function has been shown to be highly heritable (23–26). Additionally, cognitive impairments associated with neuropsychiatric disorders are also present in unaffected relatives at a higher rate than in the general population and are present in patients even when the illness is not active (27). These facts suggest that cognitive measures perfectly fit the description of ‘endophenotypes’ for neuropsychiatric disease (22). It has therefore been suggested that by studying relatively small sample sizes of healthy subjects, we will be able to find associations between cognitive phenotypes and common genetic variation.
To date, a single genome-wide association study has examined the role of common genetic variation in memory (using a word recall task) and implicated the genes KIBRA (28) and CAMTA1 (29). We could not, however, replicate the KIBRA association in our own studies (30) and the CAMTA1 association has yet to be independently replicated. In addition, over the past 5 years, hundreds of papers have reported associations between particular genetic polymorphisms and memory or other cognitive phenotypes in both healthy volunteers and patients with schizophrenia and other neuropsychiatric disorders (15). Many of these studies have concentrated on a small number of genes in the dopamine and serotonin pathways [e.g. COMT (31–34), DRD4 (35,36), HTR2A (37–39)], or the val66met polymorphism in BDNF (40,41). However, despite the huge body of research, findings have been inconclusive. The interpretation of these studies has been hampered by small sample sizes, publication bias (42,43) and lack of correction for population stratification. Unfortunately, this has not been unusual in the field of candidate gene association studies (44). In the wake of genome-wide association studies, however, clear standards of evidence for both executing and interpreting human genetic association studies have emerged (45,46). Here we attempt to apply those standards in our interpretation of the association evidence in the same candidate genes that have been widely studied in cognitive genomics, as well as in analyzing whole-genome association data for multiple cognitive phenotypes.
We have investigated the role of common genetic variation in human cognition as assessed using tests from the Cambridge Neuropsychological Test Automated Battery (CANTAB). The CANTAB is a set of automated, computerized tests that assess different aspects of cognitive function. The tests are based on tasks that have been successfully used to investigate the neural and genetic basis of cognitive function in animals and have been shown to detect cognitive impairments in many neuropsychiatric conditions including Alzheimer's disease and schizophrenia, as well as being sensitive to differences in cognitive performance in healthy subjects (47–51). A recent study showed that subjects at a high genetic risk of schizophrenia were impaired in performing CANTAB spatial working memory (SWM) tests, regardless of whether they were manifesting any psychotic symptoms (52). Other studies have shown impairments in CANTAB test performance at early stages of schizophrenia development (53) and also independently of current symptoms in both schizophrenia and bipolar disorder (54). These studies provide support for the use of CANTAB tests as measurements for schizophrenia endophenotypes.
It is of value to examine the CANTAB test measures individually due to the likelihood that the different test measures reflect different cognitive processes that are underlain by different molecular pathways. However, it is possible that there are also genetic determinants of a general cognitive ability. We therefore examine both individual cognitive measures (n = 10) and a principal component measure that reflects performance across all tests.
The full CANTAB battery (see Materials and Methods) was assessed in 1000 subjects and the short battery comprising just the spatial and verbal recognition memory (SRM and VRM) tests was assessed in a further 630 subjects. After genotyping quality control checks, Eigenstrat analysis and exclusions based on drugs or illness, 1295 subjects remained, comprising 789 that completed the full battery and 506 that completed the short battery. Details of the cohort are displayed in Table 1.
To select which CANTAB measures to use for phenotypic analysis, we used data from 99 monozygotic and 99 dizygotic twins who performed the full CANTAB test battery to assess heritability of the different measures (Table 2). The least heritable measures were paired associates learning (PAL) first trial score, SWM within errors (all stages) and SWM between errors (four boxes). These measures were therefore not examined as phenotypes for genetic analysis. To reduce redundancy between the measures, we also dropped SWM between errors (six boxes) as this had a low heritability and was highly correlated with SWM between errors (eight boxes). Although spatial span (SSP) had a low heritability (14%), there was no more heritable alternative measure from this task so we retained this as a phenotypic measure.
In linear regression models [implemented in PLINK (55)] including sex and 18 EIGENSTRAT axes as covariates (as well as additional covariates shown in Table 3), we found that no single polymorphism showed a genome-wide significant association in the discovery cohort: the lowest P-value for any phenotype was 1.1 × 10⁻⁷. The most strongly associated SNPs are shown in Table 3 and the top 100 associated SNPs for each phenotype are shown in Supplementary Material, Table S1. Only six SNPs appeared in the top 100 associations for more than one phenotype (not including PC1). These are shown in Supplementary Material, Table S1 and include SNPs in the 3′-UTR of MDM4 and KATNAL1, and intronic SNPs in HTR2A (the HTR2A SNPs are discussed further below). Associations with PC1 were among the least significant of all of the traits.
We then examined the role of common CNVs in performance of the CANTAB tests using a set of 286 SNPs shown to tag common CNVs [(56); provided by Drs Altshuler and McCarroll, personal communication]. Of the 286 tagging SNPs, 178 were represented on all three genotyping chips employed during this study and passed QC in our dataset. None of the CNV-tagging SNPs showed an association with any phenotype in our dataset that was significant at P < 0.05 after correction for all 178 tagging SNPs and 11 phenotypes (significance threshold, P = 0.000026). The most strongly associated CNV-tagging SNP was rs4735895, which tagged a CNV at chr8:584944-585908 within the ERICH1 gene, and associated with IED extradimensional shift errors at P = 0.0008. One SNP associated with 6 of the 11 phenotypes [IED, SSP, PAL eight patterns, VRM, pattern recognition memory (PRM) and PC1], with P-values ranging from 0.04 to 0.002. This SNP, rs7829965, tagged a CNV at chr8:24201375-24207011 that was just upstream from the ADAM18 gene.
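The significance threshold quoted for the CNV-tagging SNPs is a Bonferroni adjustment, in which the nominal 0.05 level is divided by the number of tests. A minimal Python sketch of that arithmetic follows; the function name `bonferroni_threshold` is illustrative and is not taken from the study's analysis code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 178 CNV-tagging SNPs, each tested against 11 phenotypes
n_tests = 178 * 11
cnv_threshold = bonferroni_threshold(0.05, n_tests)
print(f"{n_tests} tests, threshold P = {cnv_threshold:.6f}")
```

Rounded to six decimal places, this gives 0.000026, the threshold stated in the text. The correction treats all tests as independent, which is conservative when phenotypes or SNPs are correlated.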
Next, we attempted to replicate previous published associations with memory phenotypes where this was possible in our dataset. We searched for genetic association studies performed on healthy human subjects in whom a genetic variant had been associated with a cognitive task based on short-term SWM, immediate verbal recall, PAL, attentional set shifting or sustained attention. For these we looked to see if the specific previously associated variant was represented in our study, and also what was the lowest P-value in our study in the gene and surrounding 20 kb (Table 4).
For the eight previously associated SNPs that were represented either directly or by proxy in our study, four associated with P < 0.05 with one or more of our phenotypes, and a further two showed borderline associations at P ≤ 0.06. However, when the significance threshold is corrected for the 11 phenotypes, none of these associations are significant. Since these SNPs may not tag the causal variants, we also looked for association with all genotyped SNPs in each gene, which revealed some associations that withstood correction for multiple testing for all SNPs tested within the individual genes. The strongest associated SNPs were a cluster of three intronic HTR2A SNPs: rs2770298, rs9316235 and rs927544, which associated with PAL errors at eight patterns (PAL8) at P = 0.00003–0.00004. The first two of these are in strong linkage disequilibrium (LD) with each other and are not obviously functional. The third SNP, rs927544, is only in partial LD with the other two associated SNPs and associates with expression of the HTR2A gene at P = 0.04 [in immortalized B-lymphocytes, using data from the GENEVAR database (57), annotated by WGAviewer (58)], but not in healthy brain tissue from the SNPEXPRESS database [http://people.genome.duke.edu/~dg48/SNPExpress/ (59)]. It is interesting to note that these SNPs are not in strong LD with the previously associated functional SNP rs6314 (which did not associate with any of our phenotypes), nor with the other SNPs recently associated with 5 min and 24 h delayed (but not immediate) verbal recall (38).
Three other genes, NRG1, KIBRA (also known as WWC1) and AKT1, had associations that remained significant after correction for all tested SNPs within the individual genes. The strongest associated SNP in KIBRA was rs17633196, which associated with PAL8 with a P-value of 0.0002. This intronic SNP was not in LD with the previously associated SNP rs17070145 (r² = 0.06 in HapMap CEU) but was in LD (r² = 0.77) with a non-synonymous coding SNP, rs17551608, that associated with PAL8 at P = 0.0007. Given that we have previously failed to replicate effects of KIBRA on long-term verbal memory (30), these data apparently associating KIBRA with short-term spatial memory should be considered with caution. The strongest associated SNPs in AKT1 and NRG1 were rs2494738 and rs1481765, respectively, both intronic SNPs with no obvious function.
It should be noted that if we test 869 independent SNPs with 11 different phenotypes, we would be likely to see associations at or below 0.00003 roughly one in four times (0.00003 × 9559 ≈ 0.29) by chance. We therefore find no convincing evidence supporting the role of any specific polymorphism in and near the set of candidate genes. We can also address whether there is evidence of a general enrichment of low P-values amongst the 869 polymorphisms by evaluating a quantile–quantile plot comparing observed and expected P-values (Fig. 1). This plot reveals an apparent excess of low P-values, suggesting the possibility of multiple variants with modest effects within and near these candidate genes.
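The chance expectation described above can be checked with a few lines of Python. This is a sketch under the simplifying assumption that all 869 × 11 tests are independent; the cognitive phenotypes are correlated with one another, so the true figure will differ somewhat. Variable names are illustrative.

```python
n_snps = 869
n_phenotypes = 11
p_cutoff = 0.00003

n_tests = n_snps * n_phenotypes        # number of SNP-phenotype tests
expected_hits = p_cutoff * n_tests     # expected count of P <= cutoff under the null

# Probability of at least one such hit, assuming independent tests
prob_at_least_one = 1.0 - (1.0 - p_cutoff) ** n_tests

print(f"tests: {n_tests}")
print(f"expected hits under the null: {expected_hits:.2f}")
print(f"P(at least one hit): {prob_at_least_one:.2f}")
```

The expected number of hits is about 0.29, and the probability of seeing at least one is about 0.25, which corresponds to the one-in-four figure given in the text.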
Finally, a number of genomic regions have been implicated in neurocognitive processes due to the association of rare structural changes with neurological or psychiatric illness. It is possible that these deletions are indicative of genomic regions that harbor common or rare genetic variants that are important in cognitive processes. We therefore focused specifically on these regions to see if there was evidence for increased association with healthy cognition. We selected seven large deletion regions: 1q21, 8p11, 15q11, 15q13, 16p11.2, 16p12.4–p13.11 and 22q11, and three specific genes: NRXN1, CNTN4 and APBA2 (Table 5). Looking just at the PC1 score, we examined the association P-values for each of these regions to see if there was an excess of low P-values. We found that only one of the regions, the NRXN1 gene, showed stronger association values than expected by chance (lowest P = 0.00003, significance threshold after Bonferroni correction for 310 SNPs, P = 0.0002). The Q–Q plot indicated that the P-values for this region for association with PC1 were generally lower than would be expected by chance (Supplementary Material, Fig. S1). Since there was high LD across this region, we repeated the analysis, this time using PLINK (55) to prune the dataset such that all SNPs were in approximate linkage equilibrium with each other. After this, 57 of the 310 original SNPs remained, and the lowest P-value was 0.00050, which remained lower than the Bonferroni corrected P-value threshold for 57 SNPs of 0.0009, and the lowest P-values were still much lower than expected under the null hypothesis (Supplementary Material, Fig. S1). Because 10 subjects had deletions in the NRXN1 gene (ranging in size from 5 to 322 kb), we looked to see if these were associated with a particular allele of the strongest associated NRXN1 SNP (rs4971648); however, these deletions were distributed across the rs4971648 genotypes and so are unlikely to be responsible for this association.
None of the other regions showed evidence of significant association with PC1 (Table 5). Some of the Q–Q plots indicated a degree of association much less than would be expected by chance, but after elimination of SNPs in LD, the P-value distributions conformed approximately to that expected under the null hypothesis, although some remained somewhat lower than expected (Supplementary Material, Fig. S2).
In summary, we have looked for genetic associations with 10 cognitive phenotypes and their first principal component in a large sample size using a genome-wide SNP panel thought to represent the majority of common genetic variation in non-African populations (60). We were not able to identify any common SNPs that associated with any of the cognitive tests after correction for all of the tested SNPs.
We also examined, for the first time, the role of common CNVs in these cognitive phenotypes, using a set of CNV-tagging SNPs reported by McCarroll et al. (56). Again, we could not find any associations that remained significant after correction for multiple testing. However, in this case, we cannot claim to have performed a comprehensive analysis, since many known common CNVs cannot be identified with the genotyping platforms used in this study.
Although our study is a negative one in the sense that there is no polymorphism that can be clearly connected to cognitive performance, there is an intriguing suggestion in our data of a role for polymorphisms near previously studied candidate genes. We emphasize that this observation does not provide a replication of any previous findings, because the variants showing the strongest association in our study are not the same variants previously associated, violating now accepted standards of replication (46). On the other hand, the apparent enrichment of low P-values in the candidate genes does warrant consideration. One possible explanation is that there are multiple causal variants in these genes creating variable signals of association. We must view these observations as highly tentative, however, since it is difficult to assign formal statistics to patterns in Q–Q plots.
Because several genomic regions have recently been associated with neurological and psychiatric disorders such as epilepsy, autism, schizophrenia and mental retardation, we also focused specifically on a set of these regions to see whether there was evidence for association with healthy cognition. Perhaps surprisingly, these regions were, in general, rather less associated with our PC1 cognitive score than would be expected under the null hypothesis. The NRXN1 gene, however, did show an excess of low P-values, and the lowest P-value withstood correction for all the SNPs tested at that locus. This suggests one of two possibilities: either the strongly associated SNPs themselves (all intronic), or a genetic variant in LD with them, directly affect the cognitive scores; or there are multiple rare genetic variants in this region that by chance occur more frequently with one allele of a common variant than with the other. Given the multiple recent reports of rare deletions in NRXN1 in schizophrenia and autism (6,61–65), this locus seems to warrant further investigation.
In order to assess power, we first note that since we are dealing with a quantitative trait, the power is a function of both the inter-genotype differences and the allele frequency, and these parameters are confounded into the proportion of population variation that the polymorphism is responsible for. A further caveat is that the variant must be sufficiently common to be well represented either directly or indirectly on the gene chips (that is, it must have a minor allele frequency of at least ~5%). We assume that such variants are nearly perfectly tagged and estimate power by simply considering the proportion of variation for the trait in the total population that a given SNP is responsible for. We can then evaluate the probability that a test statistic will reach a given significance threshold for such an SNP. Thus, for the tests assessed in ~750 people, we had 95% power to detect a common variant that explained 5% or more of the variation in the cognitive trait, 84% power to detect a variant accounting for 4% and 58% power to detect a variant accounting for 3%. For the larger sample sizes (SRM and VRM), we had close to 100% power to detect a variant accounting for 4% or more of the variance, 95% power for a variant explaining 3%, and 67% power to detect a variant accounting for 2% or more of the variance. Although it is possible that we could have failed to detect some genetic variants in this study with strong effect sizes, especially those that are not well tagged, it is highly improbable that we would have failed to detect multiple such associations. We can therefore conclusively rule out the possibility that the high heritability of human memory and related cognitive phenotypes is fully accounted for by a small number of SNPs with very strong effects.
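The dependence of power on sample size and on the proportion of variance explained can be illustrated with a short Python sketch. It is not the calculation used in the study: it assumes a 1-df additive test whose statistic is the square of a normal variable with mean sqrt(ncp), where ncp = n × q / (1 − q) and q is the proportion of trait variance explained by the SNP. The significance threshold `ALPHA` and the sample sizes are assumptions chosen for illustration, since the threshold is not stated in the text, so the printed values need not match the percentages quoted above.

```python
from math import sqrt
from statistics import NormalDist


def power_quantitative_trait(n: int, var_explained: float, alpha: float) -> float:
    """Power of a two-sided 1-df test for a SNP explaining `var_explained`
    of the trait variance in a sample of size n.

    The 1-df noncentral chi-square statistic is the square of a
    N(sqrt(ncp), 1) variable, so power is P(|Z| > z_crit).
    """
    if not 0.0 < var_explained < 1.0:
        raise ValueError("var_explained must be in (0, 1)")
    std_normal = NormalDist()
    ncp = n * var_explained / (1.0 - var_explained)
    z_crit = -std_normal.inv_cdf(alpha / 2.0)   # two-sided critical value
    shift = sqrt(ncp)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


ALPHA = 1e-7  # assumed genome-wide threshold, for illustration only
for n in (750, 1250):  # approximate sample sizes, rounded from the text
    for q in (0.02, 0.03, 0.04, 0.05):
        power = power_quantitative_trait(n, q, ALPHA)
        print(f"n={n}, variance explained={q:.0%}, power={power:.2f}")
```

Whatever threshold is chosen, the qualitative pattern is the same: power rises steeply with both sample size and variance explained, so variants explaining several percent of the variance are unlikely to be missed at these sample sizes.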
There are a number of factors that may serve to reduce the power of this study design to detect associations with certain types of genetic variant. Since the majority of this cohort had some form of higher education and were aged below 40, we have reduced ability to detect associations that are specific to older or less well-educated populations. It is also possible that EIGENSTRAT has not completely corrected for population stratification and this may reduce the power to find truly associated variants that differ strongly in frequency between the two main race/ethnicities in the study (Asian and European).
We also note that there are many types of learning and memory that we have not assessed in this study, including delayed or remote memory—which could theoretically be under much simpler genetic control. We cannot generalize these findings to other cognitive phenotypes. However, we have examined multiple cognitive domains in this study, and, in a separate study, have also failed to find genome-wide association with traditional neuropsychological test measures (Cirulli et al., unpublished data). This evidence, together with the similar findings emerging from multiple neuropsychiatric traits, suggest that neurocognition, in health and disease, is not going to be a simple phenotype to genetically decipher.
There are a number of possible explanations for the lack of positive findings in this study. It is possible that the traits being measured are too noisy to gauge a reliable phenotype. The phenotypes used in this study, as for other genetic studies of cognition, were collected during a single test session, and many non-genetic variables are known to affect cognitive performance from day to day, including fatigue, hunger, motivation, affective distress and illness. However, we performed heritability assessments on the data gathered in this study by comparing dizygotic and monozygotic twin pairs and found that most of the measures had a very substantial heritable component by this measure. Some of the heritabilities (especially IED and PRM) may be inflated here due to a low correlation in dizygotic twins; however, the heritability of memory as measured by CANTAB tests is supported elsewhere using other methods (23). This suggests that these measures are suitable phenotypes for a genetic study. It is possible that methodological weaknesses inherent in the monozygotic–dizygotic twin comparisons have led to systematic overestimation of the heritabilities of cognitive traits. For instance, monozygotic twins, due to their very similar appearances, may be treated more similarly than dizygotic twins in educational environments. Other types of heritability study, e.g. comparison of adopted children to biological parents (66) or of twins reared apart (24), give estimates of heritability of ~0.3–0.5 for cognitive tests similar to those used here. However, no measure of heritability is without flaws and it is possible that the genetic component of cognition has been systematically overestimated.
Three further explanations for these findings remain consistent with a role for common variation in normal cognition: first, those CNVs, or other types of common genetic variant (e.g. microsatellites) that were not well represented by tagging SNPs in this study, could account for the heritability of these traits. However, due to the very large SNP panel used, and the degree to which these SNPs represent the total amount of common variation in the genome (56,60), it is unlikely that most of the genetic contribution to cognition happens to be both common and unrepresented.
Secondly, it could be that common genetic variation underlies cognitive traits but that the variants interact with each other to such an extent that they do not produce detectable main effects in these sample sizes. Tackling this possibility is going to be difficult, given the number of possibilities for even simple two-way interactions. One is faced with a high risk of either type I error (if one fails to adequately correct for multiple testing) or type II error (since the interaction effects would have to be huge to achieve formal significance in consideration of the number of tests). With currently assembled datasets, it seems implausible to perform non-targeted screens for interactive genetic variants with effects on neurocognition. Interactive effects between predetermined candidate variants could be assessed but presently there are no good candidate polymorphisms that consistently replicate between different datasets. Thirdly, the genetic variation could be completely attributed to hundreds of common variants each with a tiny effect size; however, there are some reasons for doubting that this is the case for cognitive disorders and arguably for normal cognitive function (6,58).
As has been suggested for other neuropsychiatric traits (67,68), it is possible that epigenetic modifications affect cognition in healthy subjects. This kind of genetic contribution would be undetectable with these methods and remains to be explored.
Perhaps the most likely explanation for these findings is that the genetic variation underlying neurocognitive traits is too rare to be detectable with current genotyping platforms. There is strong recent evidence for this in other neurocognitive traits such as schizophrenia, autism and epilepsy, in which large, rare CNVs have been associated with disorders that have not shown obvious associations with common variation in genome-wide screens (6,69–72). This hypothesis has considerable implications for the future study of cognitive traits.
Whether the majority of associated variants will be only marginally or substantially less common than those we have examined in genome-wide association studies we do not yet know. If the variants are only marginally below the detection threshold for current genome-wide association studies (minor allele frequency ~3–5%), then the new catalogue of variants down to a frequency of 1% generated by the 1000 genomes project (73) will provide a useful set of variants for further study of cognitive phenotypes. However, very large sample sizes would be needed, particularly given the measurement error inherent in cognitive testing.
If, on the other hand, the most important variants are less than 1%, this framework will not work, and complete genomic resequencing at a high coverage will be necessary to detect memory-influencing genetic variants. However, it is not obvious how to apply this methodology to a trait such as normal variation in cognition. For traits that clearly have extreme phenotypes, discovery of very rare associated genetic variants may be possible by searching for enrichment of causal variants in subjects with the extreme phenotypes, when compared with controls. However, the relevant extreme phenotypes for cognition are not obvious. Consideration might be given to looking for association with neurophysiological phenotypes, such as ion-channel activity in cultured neurons, or synaptic vesicle size (74); or to detailed phenotyping of cognitive tasks that may more directly reflect such specific activity, such as visual attention (75). Another possibility would be to use subjects with exceptional memory (76) or very high IQ. Alternatively, if the endophenotype hypothesis is correct, the same genes that control cognition in healthy subjects are those that have gone awry in schizophrenia. If this is the case, a promising direction would be to search for rare genetic variants that are enriched in schizophrenia patients (e.g. NRXN1) and then look at the effects of rare variants in those same genes in healthy people.
Finally, one clear and surprising finding that has emerged from the recent associations of rare CNVs with neuropsychiatric disease is that the same rare variants are associated with multiple neuropsychiatric conditions (such as autism, mental retardation and schizophrenia), rather than being confined to particular disease classifications, and are also present in apparently unaffected people (6,72,77–79). Since all these neuropsychiatric conditions can be associated with some form of cognitive impairment, there is some possibility that these rare genetic variants are acting indirectly to contribute to these disorders by causing cognitive changes. Differences in the ultimate phenotype (which, if any, disease state) may then be dependent on interaction with other genetic or environmental influences. If this is the case, detailed cognitive assessments of patients and unaffected relatives carrying such variants may reveal specific cognitive deficits associated with the rare variants.
In conclusion, our findings indicate that cognitive endophenotypes will not be the simple solution to the problem of complexity in schizophrenia genetics, and that, like schizophrenia itself, the heritability of cognitive traits cannot be accounted for by common variants with strong effect. We suggest that these findings may be attributed to a stronger role for rare variation in cognition than previously expected and suggest some possible future directions for cognitive genomics.
The US participants were recruited by means of IRB-approved advertisements displayed around the Duke University and North Carolina State University campuses, and emailed to student and staff list servers. Subjects that performed the full battery were scheduled for a test session during which they filled out consent forms and questionnaires, gave 20 ml of blood and performed the cognitive tests. The twins were recruited during a scheduled interview as part of their participation in an ongoing twin study at the Department of Twin Research and Genetic Epidemiology at King's College, London, UK. Their DNA was already banked but they filled out the same questionnaire data and underwent the same cognitive tests as the Duke and NC State participants. Both twins were tested during the same interview session on the same day. The CANTAB comprised the following tests, in this order: PAL, SWM, verbal recall (VRM), intra-extradimensional set shifting (IED), rapid visual processing (RVP), PRM, SSP and SRM (see Supplementary Methods for further details of the test battery). The battery took ~1 h to complete and was administered to all participants in a private room under supervision of a trained administrator who read the instructions from a script. The test scores were automatically assigned and stored in the CANTAB computers for later extraction. Participants that took part in the short battery were recruited by means of posters and display boards in high traffic areas of the campuses. They filled out consent forms and questionnaires (Supplementary Methods), gave a saliva sample and underwent testing in a private room at the time of recruitment, without appointment. Testing took 5–10 min and was performed by trained personnel as for the full battery. A total of 326 of the full battery participants, as well as all 674 of the short battery participants, underwent a modified test battery, designed to remove some practice stages and eliminate ceiling effects. Details of the modifications are described in Supplementary Methods.
The following test scores were extracted from the twin data (all tested with CANTAB clinical mode): PAL total errors six patterns, PAL total errors eight patterns, PAL first trial memory, SWM between errors for four, six and eight box stages, SWM total within errors for four, six and eight box stages, SWM strategy, IED EDS errors, RVP probability of a hit, PRM percent correct, SSP span length, SRM percent correct and VRM free recall. The heritability of each measure was then estimated as twice the difference in correlation between MZ and DZ twins: h² = 2[r(MZ) − r(DZ)].
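The heritability estimate above can be illustrated with a minimal sketch. It assumes a Pearson correlation between co-twins (the text does not say which correlation coefficient was used), and the function names and toy scores are hypothetical, not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def falconer_h2(mz_pairs, dz_pairs):
    """Heritability as twice the MZ-DZ correlation difference.

    Each argument is a list of (twin_1_score, twin_2_score) tuples.
    Assumption: Pearson correlation stands in for whatever coefficient
    the original analysis used.
    """
    r_mz = pearson_r([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson_r([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return 2.0 * (r_mz - r_dz)


# Toy scores (hypothetical): MZ co-twins correlate more strongly than DZ co-twins.
toy_mz = [(1, 2), (2, 4), (3, 6)]
toy_dz = [(1, 1), (2, 3), (3, 2)]
toy_h2 = falconer_h2(toy_mz, toy_dz)
```

Because the estimate is a scaled difference of two sample correlations, it can fall outside the 0 to 1 range when the sample is small.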
The following measures were included in a principal components analysis: PAL errors six patterns, PAL errors eight patterns, SWM between errors (eight boxes), SWM strategy, IED EDS errors, RVP probability of hit, PRM % correct, SRM % correct, SSP span length, VRM immediate recall. All subjects with no missing data for any of the measures were included, regardless of whether they did the original or the modified test battery (n = 701). Only one of each twin pair was included. Three components had an eigenvalue above 1: the first, with an eigenvalue of 3.63, accounted for 36% of the variation in the data and all tests contributed in the same direction to this axis, suggesting that this axis may reflect general cognitive ability. When PC1 was compared between monozygotic and dizygotic twins, the heritability was found to be 0.88. The score on the principal component axis 1 was therefore also included as a cognitive phenotype.
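The relationship between the first eigenvalue and the share of variance it explains can be checked with a short sketch. It assumes the analysis was run on standardized measures (a correlation matrix), so that the eigenvalues sum to the number of measures; the text does not state this explicitly, and the function names are hypothetical.

```python
# The ten measures entered into the principal components analysis (from the text).
PCA_MEASURES = [
    "PAL errors six patterns",
    "PAL errors eight patterns",
    "SWM between errors (eight boxes)",
    "SWM strategy",
    "IED EDS errors",
    "RVP probability of hit",
    "PRM % correct",
    "SRM % correct",
    "SSP span length",
    "VRM immediate recall",
]


def variance_explained(eigenvalue, n_measures):
    """Fraction of total variance carried by one component.

    Assumption: PCA on standardized measures, so the total variance
    equals the number of measures.
    """
    if n_measures <= 0:
        raise ValueError("n_measures must be positive")
    return eigenvalue / n_measures


def kaiser_retained(eigenvalues):
    """Indices of components whose eigenvalue is above 1."""
    return [i for i, ev in enumerate(eigenvalues) if ev > 1.0]


# First eigenvalue reported in the text.
pc1_fraction = variance_explained(3.63, len(PCA_MEASURES))
print(f"PC1 share of variance: {pc1_fraction:.3f}")
```

Under that assumption, an eigenvalue of 3.63 over ten measures is consistent with the reported 36%.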
All further stages included only one of each twin pair. The samples were genotyped on the Illumina Infinium HumanHap550 version 1 (555 352 SNPs) or version 3 (561 466 SNPs) BeadChips or the Infinium HD 610-quad BeadChips (598 966 SNPs; invariant CNV probes were automatically excluded from the SNP association files). The Infinium BeadStudio raw data analysis was carried out as described in Fellay et al. (80). For further details, see Supplementary Methods. After QC procedures, 1576 subjects remained. Only SNPs that were present on each of the three genotyping platforms (Illumina 550v1, 550v3 and 610) were included in the genetic association testing (n = 535 752). For the 1295 QC-passed subjects, we then used the PLINK ‘--missing’ function to examine missing data. We found that 47 183 SNPs had missing data in 1% or more of subjects. This is because clustering and calling were done within the different genotyping batches and SNP failures differed between the genotyping runs. These 47 183 SNPs were excluded from analysis, and after this all individuals had a genotyping rate of 99% or greater. We then removed a further 12 598 SNPs with a MAF < 0.004. This criterion ensured that at least six individuals of the rare genotype are present in the dataset of 768 subjects that had completed the full battery, to control for error in the estimation of asymptotic P-values (as alleles with MAF this low or lower have no chance of approaching significance). The final SNP dataset then totaled 475 971, with a total genotyping rate of 0.999518.
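The two SNP filters and the resulting counts can be sketched as follows. This is an illustration of the thresholds described above, not the PLINK implementation; the genotype coding (0/1/2 copies of one allele, `None` for a missing call) and the function names are assumptions made for the example.

```python
def missing_rate(genotypes):
    """Fraction of subjects with no genotype call (None) at one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency among called genotypes coded 0/1/2."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes, max_missing=0.01, min_maf=0.004):
    """Keep a SNP if under 1% of calls are missing and MAF is at least 0.004."""
    return (missing_rate(genotypes) < max_missing
            and minor_allele_frequency(genotypes) >= min_maf)


# SNP bookkeeping using the counts reported in the text.
snps_shared = 535_752        # present on all three platforms
snps_high_missing = 47_183   # missing in 1% or more of subjects
snps_low_maf = 12_598        # MAF < 0.004
snps_final = snps_shared - snps_high_missing - snps_low_maf

# Minor-allele copies expected at the MAF threshold among the 768
# full-battery subjects (two alleles per subject).
min_copies = 0.004 * 2 * 768
```

The subtraction reproduces the reported final count of 475 971 SNPs. At the threshold, about six copies of the minor allele are expected among the 768 full-battery subjects, which is one way to read the "at least six individuals" criterion if carriers are heterozygous.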
To correct for population structure, we used the EIGENSTRAT approach of Price et al. (81), which derives the principal components of the correlations among the genotyped SNPs. The scores on the significant principal component axes can then be included in the genetic association test to control for the possibility of false-positive associations due to population stratification. For further details, see Supplementary Methods.
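The idea of deriving ancestry axes from genotype correlations can be sketched in plain Python. This is a simplified stand-in, not the EIGENSTRAT software: it extracts only the leading axis by power iteration, estimates each allele frequency as the column mean divided by two, and uses hypothetical function and variable names throughout.

```python
import math
import random


def top_genotype_pc(genotypes, n_iter=200, seed=0):
    """Score each sample on the leading principal axis of the genotype data.

    genotypes: list of samples, each a list of 0/1/2 allele counts with no
    missing values (an assumption of this sketch).
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    # Centre each SNP and scale by sqrt(p * (1 - p)); skip monomorphic SNPs.
    columns = []
    for j in range(n_snps):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        p = mean / 2.0
        if p <= 0.0 or p >= 1.0:
            continue
        scale = math.sqrt(p * (1.0 - p))
        columns.append([(g - mean) / scale for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs available")
    # Sample-by-sample covariance of the normalized genotypes.
    cov = [[sum(c[a] * c[b] for c in columns) / len(columns)
            for b in range(n)] for a in range(n)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


# Toy data (hypothetical): the first three samples carry different alleles
# from the last three, mimicking two ancestral groups.
toy_genotypes = [
    [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 1],
    [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1],
]
toy_pc1 = top_genotype_pc(toy_genotypes)
```

In the toy data the two groups receive scores of opposite sign on the leading axis. Entering such scores as covariates lets the regression absorb ancestry differences before a SNP effect is estimated.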
We collected data on neurological and psychiatric history for 827/1347 participants (details of questionnaire in Supplementary Methods), and based on this we excluded two subjects who had suffered a stroke and one who had had a brain tumor or brain surgery. In addition, a pharmacist examined all the drugs and drug combinations that the subjects reported and recommended the exclusion of 49 subjects, of whom 21 were taking amphetamine-based drugs, 11 were taking drugs that may cause excessive drowsiness and 17 were removed based on their particular types and combinations of antiepileptics, antipsychotics, antidepressants and antianxiety drugs. This left 1295 individuals in the analysis.
In order to determine which factors to include as covariates in the genetic analysis, we performed linear regressions to determine the effects of different covariates on the different cognitive phenotypes. Only subjects with genetic data were included. The covariates analyzed were: age, sex, whether English was learned as a second language, site of testing, whether the original or modified battery was used, whether the full battery or the short battery (comprising just SRM and VRM) was taken and education. With the exception of sex, which was included in all analyses, the covariates were only included if they showed a significant effect on the particular cognitive phenotype. Owing to differences in educational systems between the USA and the UK, education was classified broadly into four categories: not completed college, current undergraduate student, obtained an undergraduate degree, and current graduate student or obtained a postgraduate degree. Because some twins were missing education data, these subjects were excluded from tests where education was a significant predictor of score. We tested for SNP associations with the 11 memory phenotypes with a linear regression model using the PLINK software (55) and including 18 EIGENSTRAT axes as covariates, as well as the additional covariates shown in Table 3. An additive genetic model was assumed. The results were then annotated using the WGAViewer software (82).
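An additive-model association test of this kind can be sketched as an ordinary least-squares fit in which the genotype enters as a 0/1/2 allele count alongside the covariates. The sketch below is not PLINK's implementation; it returns only the per-allele effect estimate (no standard error or P-value), and the function names and toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def additive_snp_effect(phenotype, genotype, covariates):
    """Per-allele effect from phenotype ~ intercept + genotype + covariates.

    genotype: allele counts coded 0/1/2 (additive model).
    covariates: one list of covariate values per subject, for example
    sex and ancestry axis scores.
    """
    rows = [[1.0, float(g)] + [float(c) for c in cov]
            for g, cov in zip(genotype, covariates)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return beta[1]


# Toy data (hypothetical): the score rises 0.5 per allele, plus a sex effect of 2.
toy_genotype = [0, 1, 2, 0, 1, 2, 1, 0]
toy_sex = [0, 0, 0, 1, 1, 1, 0, 1]
toy_score = [1 + 0.5 * g + 2 * s for g, s in zip(toy_genotype, toy_sex)]
toy_beta = additive_snp_effect(toy_score, toy_genotype, [[s] for s in toy_sex])
```

Because the toy score is built without noise, the fit recovers the per-allele effect of 0.5 after adjusting for sex.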
This work was supported by start-up monies awarded to David Goldstein by the Duke University Institute for Genome Sciences and Policy. The TwinsUK cohort receives funding from the Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013)/grant agreement HEALTH-F2- (ENGAGE project grant agreement HEALTH-F4-2007-201413 and the FP-5 GenomEUtwin Project QLG2-CT-2002-01254). The study also receives support from the Department of Health via the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy's and St Thomas’ NHS Foundation Trust in partnership with King's College London. T.D.S. is an NIHR senior Investigator. The project also received support from a Biotechnology and Biological Sciences Research Council (BBSRC) project grant (G20234).
This study was funded by Duke IGSP start-up monies awarded to David Goldstein. We are grateful to all the participants who took part in this study. We also thank Nicole Walley, Jeff Dawson, Helen Onabanjo, Paola Nicoletti, Ana-Patricia Wagoner, Joshua Elmore, Liisa Bevan and Janice Hunkin for test administration, and Ian Cartland and Stuart Hacker from Cambridge Cognition who helped us design and implement the modified versions of the tests.
Conflict of Interest statement. None declared.