This first GWAS of mathematical ability and disability has nominated 46 SNP associations across two high- vs. low-ability samples, 10 of which have been validated in a third sample spanning the entire distribution of ability as a test of the QTL hypothesis. As we report no large effects, our results are compatible with those of studies of other cognitive abilities (Butcher et al. 2005a; Meaburn et al. 2007) and complex traits (McCarthy et al. 2008), and suggest that genetic influence on mathematical ability is caused by multiple QTLs of small effect. Even so, when combined into a set, the 10 SNPs account for 2.9% of the phenotypic variance in our sample. The nomination of this set of SNPs in two high- vs. low-ability samples, and the significant influence the set demonstrates over individual differences across the normal distribution of ability, support the QTL hypothesis that the same genes affect the entire spectrum of phenotypic expression. The QTL hypothesis is bolstered further by the findings that the 10-SNP set demonstrates a linear association with mathematics scores across the distribution, and that children in our sample with 10 or more of the 20 risk alleles are nearly twice as likely to be in the low-performing group. The 10-SNP set has some predictive value for low mathematical performance in our sample (PPV = 0.46, NPV = 0.70). With no large effects expected, if future research in larger, more highly powered independent samples can replicate and add to our findings, there may come a time when such a SNP set will be useful in predicting genetic risk for mathematical difficulties or genetic precocity.
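For readers less familiar with the predictive-value statistics quoted above, the following minimal Python sketch shows how PPV and NPV are obtained from a 2x2 table in which carrying 10 or more risk alleles is treated as a positive "test" for low performance. The counts and the function name are hypothetical, chosen only for illustration; they are not the study's data and do not reproduce the reported values.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of a 2x2 table.

    tp: test positive (>= 10 risk alleles), low-performing
    fp: test positive, not low-performing
    fn: test negative (< 10 risk alleles), low-performing
    tn: test negative, not low-performing
    """
    ppv = tp / (tp + fp)  # proportion of test positives who are low-performing
    npv = tn / (tn + fn)  # proportion of test negatives who are not low-performing
    return ppv, npv


# Hypothetical counts, NOT taken from the study.
ppv, npv = predictive_values(tp=40, fp=60, fn=25, tn=75)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both quantities depend on how common low performance is in the sample, so values of this kind should not be assumed to carry over to samples with a different proportion of low-performing children.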
The main limitation of this study is power. Although the sample size was large, its power to detect SNP associations of the small effect sizes that emerged from the GWAS is limited. The pooling approach used to nominate SNP associations reduced power further (Barratt et al. 2002). This is reflected in the fact that genome-wide significance levels were not reached in the two-stage scan. Although the addition of the second scanning stage improved the SNP-selection process, and ensured co-twins were in different samples, performing a joint analysis of samples 1 and 2 would have increased power (Skol et al. 2006). Nevertheless, the economical pooling method retained 80% power to detect QTLs of 1% and 1.25% effect sizes in samples 1 and 2, respectively, and nearly a quarter of SNPs selected from these samples replicated with QTL associations in the individually genotyped sample 3. Still, as our results suggest the presence of additional associations of small effect size that sample 3 is underpowered to detect, further investigation in more highly powered samples is desirable.
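As a rough illustration of why effects of this size demand large samples, the sketch below approximates the power to detect a QTL explaining a given fraction of trait variance in an individually genotyped sample, using the Fisher z approximation for a correlation. This is a generic textbook approximation and not the calculation used for the pooled samples in this study; the sample sizes, the significance level and the function name are assumptions made for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_power(variance_explained, n, alpha=0.05):
    """Approximate two-sided power to detect a correlation r with
    r**2 == variance_explained in a sample of n individuals,
    based on the Fisher z transformation (normal approximation)."""
    norm = NormalDist()
    r = sqrt(variance_explained)
    shift = atanh(r) * sqrt(n - 3)         # expected z statistic under the alternative
    z_crit = norm.inv_cdf(1 - alpha / 2)   # two-sided critical value
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical sample sizes for a QTL explaining 1% of the variance.
for n in (500, 1000, 2000, 4000):
    print(n, round(approx_power(0.01, n), 3))
```

Lowering `alpha` towards a genome-wide threshold in this sketch reduces power sharply at a fixed sample size, which is the general pattern underlying the call for larger samples.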
The creation of a composite mathematics score from teacher ratings and web-test results represents a second limitation. However, as these two component measures were highly phenotypically and genetically correlated, and as we have reported equally strong 10-SNP-set associations for both measures, we believe this was a valid way to increase our sample size. Another issue concerns the large number of false positive results expected when conducting multiple tests on a genome-wide level. Our two-stage design was intended to go some way towards dealing with this problem. However, there are still many SNPs exhibiting RAS differences in one or both of our first two samples that may reflect true associations, yet have not been further investigated here because of financial restrictions. We have not corrected any of the P-values obtained from the pooled samples for the number of tests performed. This is because these first two stages were intended simply as a means of screening SNPs for inclusion into the individual genotyping stage of our design. The P-values were used only to rank SNPs in the first two pooling stages. As multiple-testing correction would not alter the rank order of the SNPs, it would not have affected the outcome of these screening stages.
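The point that multiple-testing correction leaves the ranking of SNPs unchanged can be checked with a short Python sketch. It applies a Bonferroni adjustment to a handful of hypothetical P-values (not values from this study) and compares the rank order before and after; the helper names are illustrative assumptions.

```python
def bonferroni(p_values):
    """Bonferroni-adjust P-values by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def rank_order(values):
    """Indices of the values sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical raw P-values for five SNPs (illustration only).
raw = [0.00002, 0.0004, 0.00001, 0.003, 0.0001]
adjusted = bonferroni(raw)

# Multiplying every P-value by the same constant is monotonic, so the
# ordering is unchanged. Capping at 1 can create ties among weak
# associations but can never reverse the order of two SNPs.
print(rank_order(raw) == rank_order(adjusted))
```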
Another limitation is the overlap of n = 380 between samples 2 and 3. As the exclusion of these individuals greatly depletes the extremes, and therefore the statistical power, of sample 3, and as these overlapping individuals' genotypes and quantitative trait scores for mathematics are new information that was lost in the pooling stage, we decided not to exclude them from the main analysis. However, we did re-run all analyses on the smaller sample, with some promising results. In addition to this direct overlap, 156 sample 1 individuals have co-twins in sample 2, and 716 sample 3 individuals have co-twins in samples 1 or 2. As this may positively bias our findings by inflating statistical significance (that is, by yielding P-values that are too small), it is one of the most important limitations of the study; however, when striving for the largest possible N from a twin sample, such an overlap was unavoidable.
Although the three samples were not completely independent, information gathered from samples 1 and 2 concerned allelic and mathematical-performance group averages, and was used solely to identify SNPs for testing in a third sample comprising only one twin of a pair, using individual genotypic and phenotypic information. Nevertheless, we have selected and then tested SNPs in samples which overlap entirely in the phenotypic measures used, and also to a large extent genetically. Although this matching of measures and sample demographics overcomes many of the problems faced in the replication of molecular genetic findings, it also limits the ability to generalize our findings to a wider population. Further investigation of these SNPs in independent samples with greater statistical power is vital before we can draw any definite conclusions regarding their contributions to mathematical ability and disability. Indeed, our findings will almost certainly be subject to a ‘winner’s curse’ effect [discussed in Newton-Cheh & Hirschhorn (2005) and Kraft (2008)], in which the already small effect sizes reported have actually been overestimated in our discovery sample. The future of molecular genetic investigation into mathematics will ideally involve far larger, more highly powered samples to detect the expected small effects.
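The winner's curse can be illustrated with a small simulation. In the Python sketch below, many noisy estimates of the same true effect are generated, and only those passing a significance threshold are retained, as happens when SNPs are selected for follow-up. All parameter values (true effect, standard error, threshold, number of replicates) are hypothetical and are not derived from this study.

```python
import random
from statistics import mean


def simulate_winners_curse(true_effect=0.05, std_error=0.03,
                           z_threshold=3.0, n_studies=100_000, seed=1):
    """Simulate effect estimates and average those that pass selection.

    Each estimate is drawn from Normal(true_effect, std_error); an
    estimate is 'selected' when estimate / std_error exceeds z_threshold.
    Returns (mean of all estimates, mean of selected estimates, number selected).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    selected = [b for b in estimates if b / std_error > z_threshold]
    return mean(estimates), mean(selected), len(selected)


all_mean, selected_mean, n_selected = simulate_winners_curse()
print(f"true effect:               {0.05:.3f}")
print(f"mean of all estimates:     {all_mean:.3f}")
print(f"mean of selected estimates: {selected_mean:.3f} (n = {n_selected})")
```

Under these assumptions the selected estimates must average above the selection cut-off, and therefore above the true effect, even though the full set of estimates is unbiased. This is the sense in which effect sizes from a discovery sample are expected to shrink on replication.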
Although none of the SNPs identified fall within coding regions, or any known binding/splicing sites of interest, they can contribute to a SNP set of potential markers for mathematical ability and disability. They may also highlight possible candidate genes for mathematical ability and disability. One example is that of NRCAM, a gene encoding the Bravo/NrCAM neuronal cell adhesion molecule (Grumet 1997), a protein involved in neuron-neuron connections in the developing and mature nervous system, and implicated in synaptic plasticity and memory processes (Hoffman 1998). In addition to this involvement in brain function, NRCAM has been reported to be associated with autism (Bonora et al. 2005; Marui et al. 2008). Although the intronic SNP implicated here (rs2300052) has not been studied in relation to autism, it is in high LD (r² > 0.70) with all previously associated NRCAM-tagging SNPs (Bonora et al. 2005; Marui et al. 2008) based on HapMap data (The International HapMap Consortium 2007). Of the SNPs previously associated with autism, rs2300052 is in highest LD (r² = 0.83) with rs2300045. Common haplotypes estimated from HapMap data indicate an association between the rs2300052 allele conferring lower mathematical ability and the rs2300045 allele conferring autism risk. This is in keeping with the observation that although some autistic savants exhibit high mathematical ability, autism is generally associated with lower IQ, and even within high-functioning individuals with Asperger's syndrome, mathematical ability is significantly lower (Chiang & Lin 2007). Although some studies reject NRCAM as an autism candidate (Hutcheson et al. 2004), our data suggest a possible link between low mathematical performance and autism risk through NRCAM function, although effects are likely to be small.
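For reference, the LD statistic r² quoted above is computed from haplotype and allele frequencies at two biallelic loci. The Python sketch below shows the standard calculation; the haplotype frequencies are hypothetical and are not the HapMap estimates for rs2300052 and rs2300045, and the function and allele labels are illustrative assumptions.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # D > 0 means A and B co-occur more often than by chance
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical haplotype frequencies (must sum to 1); illustration only.
hap = {"AB": 0.40, "Ab": 0.05, "aB": 0.05, "ab": 0.50}
p_a = hap["AB"] + hap["Ab"]
p_b = hap["AB"] + hap["aB"]
r2 = ld_r_squared(hap["AB"], p_a, p_b)
print(f"r^2 = {r2:.2f}")
```

The sign of D, rather than r² itself, indicates which alleles tend to travel together on the same haplotype. That is the kind of information needed to say that a lower-ability allele at one SNP is associated with a risk allele at another.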
Of particular interest are MMP7, GRIK1 and DNAH5, the genes associated with the top three ranking SNPs in our study, whose associations remained significant after Bonferroni correction for multiple testing. MMP7 encodes a member of the matrix metalloproteinase (MMP) family. MMPs are involved in the breakdown of extracellular matrix during normal physiological processes such as embryonic development, growth and tissue repair (Chakraborti et al. 2003). GRIK1 encodes ionotropic glutamate receptor kainate 1. Kainate receptors mediate neurotransmission and synaptic plasticity (Bortolotto et al. 1999; Huettner 2003), and dysfunction has been implicated in a number of psychiatric phenotypes (Gratacòs et al. 2008; Woo et al. 2007). DNAH5 encodes the dynein axonemal heavy chain 5 protein. Dynein is the force-generating component of cilia, the correct functioning of which is essential in all areas of embryonic growth (Hornef et al. 2006), and DNAH5 in particular has been demonstrated as vital for normal brain development (Ibanez-Tallon et al. 2004).
The intricate involvement of these genes in development, especially the direct links of GRIK1 to brain development, is indicative of the variety of genes one might expect to exert small effects over cognitive abilities such as mathematics. It is likely then that the influence of such genes would also be evident in other cognitive domains. Indeed, quantitative genetic research indicates a substantial genetic overlap between reading, mathematical and general cognitive ability (g) (Kovas et al. 2005; Markowitz et al. 2005). Although the 10 QTL associations identified here neither fall within previously reported dyslexia linkage regions (McGrath et al. 2006; Paracchini et al. 2007), nor overlap with findings of association studies of reading (Meaburn et al. 2007; Seshadri et al. 2007) and g (Butcher et al. 2005a), there may still be an overlap in their influence. Along with the essential replication of our results in large independent samples, one interesting future direction may be to explore the generalist genes hypothesis at the molecular genetic level, by investigating the effects of these SNPs on other cognitive abilities.