IQ Scores Normally Distributed in the GAIN-ADHD Sample
Phenotypic and genotypic data were obtained from dbGaP. IQ scores from 627 individuals were quantified with the Wechsler Intelligence Scales for Children (WISC)29 and had a mean of 100.7 (SD = 15.7) and a median of 101.6. Skewness was 0.063, and the excess kurtosis relative to a normal distribution was −0.057. Although this sample was originally ascertained for ADHD, these near-zero values suggest that IQ scores are approximately normally distributed in this sample.
Distribution of the IQ Scores in the 627 Individuals Analyzed in the GAIN-IMAGE Sample
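The skewness and excess kurtosis reported above can be computed from the central moments of the score distribution. The following is a minimal sketch, assuming the simple population (non-bias-corrected) estimators; the text does not state which estimator was used, and the function name and toy scores are illustrative only.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments.

    Assumption: moments are divided by n, with no small-sample correction.
    """
    n = len(values)
    mean = fmean(values)
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    # Subtracting 3 expresses kurtosis relative to a normal distribution.
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt


# Hypothetical toy scores, not data from the study.
toy_scores = [85, 92, 100, 101, 108, 115]
print(skewness_and_excess_kurtosis(toy_scores))
```

Values of both statistics close to zero, as in the sample described here, are what a normal distribution would produce.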
Genome-wide SNP-by-SNP Analysis Does Not Detect Robust Associations
After QC, a total of 438,783 SNPs were available for genome-wide association analysis. Of these, 179,725 SNPs mapped to 16,674 genes. The number of SNPs per gene ranged from 1 to 1271, with a median of 4.0 and a mean of 11.1. First, we tested whether variation in single genes showed robust association with cognitive ability by applying a standard (SNP-by-SNP) genome-wide association analysis, using the regression analysis implemented in PLINK28 and including only SNPs mapped to genes. Results of the SNP-by-SNP association analyses for all SNPs mapped to genes are shown in the Manhattan plot below.
Manhattan Plot Showing Results of SNP-by-SNP Association with Cognitive Ability of the 179,725 SNPs Expressed in Genes
With the genome-wide significance threshold set at 1 × 10⁻⁸, no single SNP reached significance. This mainly reflects a lack of power and underscores the need for larger sample sizes or more sophisticated use of available information. With the current sample size of 627 subjects and a genome-wide significance level of 1 × 10⁻⁸, there was sufficient power only for the detection of SNPs explaining at least 6.7% of the variance (Genetic Power Calculator30). For the detection of SNPs of small effect size (e.g., 2%) with reasonable power (0.80), a sample of 2138 subjects would have been needed.
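The power figures quoted above can be approximated with a short calculation. The sketch below is not the Genetic Power Calculator itself; it assumes a two-sided 1-df test for a single additive SNP whose statistic has non-centrality n × R² / (1 − R²), where R² is the proportion of variance explained. The function name and this parameterization are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def single_snp_power(n, variance_explained, alpha):
    """Approximate power of a two-sided 1-df association test.

    Assumption: the test statistic is chi-square with 1 df and
    non-centrality n * R^2 / (1 - R^2). A 1-df non-central chi-square
    is the square of a shifted standard normal, so the power can be
    written with the normal CDF alone.
    """
    ncp = n * variance_explained / (1.0 - variance_explained)
    shift = sqrt(ncp)
    # Two-sided critical value on the z scale.
    critical_z = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    upper_tail = 1.0 - _STD_NORMAL.cdf(critical_z - shift)
    lower_tail = _STD_NORMAL.cdf(-critical_z - shift)
    return upper_tail + lower_tail


# Scenarios taken from the text.
print(single_snp_power(627, 0.067, 1e-8))   # discovery sample, 6.7% variance
print(single_snp_power(2138, 0.02, 1e-8))   # larger sample, 2% variance
```

Under these assumptions both scenarios give a power of around 0.8, in line with the values reported in the text.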
Analysis of Synaptic Functional Gene Groups Identifies Association with Cognitive Ability
Before testing detailed functional groups, we first tested whether genes expressed in synapses are associated with cognitive ability. This group contains 1024 genes, of which 900 (22,324 SNPs) were included on the Perlegen chip. Our results suggest that SNPs associated with cognitive ability are not randomly distributed across the genome, but tend to cluster in genes that are known to be expressed in synapses (p = 0.001).
Results of Association Analysis of All Synaptic Genes with Cognitive Ability
The 900 genes that are known to be expressed in the synapse and were included in the Perlegen chip were assigned to one of 17 functional gene groups on the basis of cellular function. Because some of the synaptic genes did not fit into any of the functional synaptic gene networks defined, we also tested this group of remaining synaptic genes as a negative control (the “unknown” group). The 49 genes in this group are expressed in the synaptic terminal, but their function is currently unknown.
The table below lists the results of the joint association analysis of SNPs within each of the functional groups, and the two figures that follow provide the quantile-quantile (Q-Q) plots for each of the tested functional gene groups.
Results of the Association Analysis of Functional Gene Groups and Biological Pathways with Cognitive Ability
Results of Association Analysis Corrected for λ of the Twelve Most Significant Functional Gene Groups
Results of Association Analysis Corrected for λ of the Remaining Functional Gene Groups and All Tested Biological Pathways
As expected, the “unknown” group did not show any evidence for an association with cognitive ability (empirical p value PEMP = 0.880). The group of synaptic heterotrimeric G proteins, however, yielded an overall test statistic for association with cognitive ability that was larger than expected by chance alone for a group of that size (PEMP = 1.9 × 10⁻⁴), which is below the preset significance threshold of 0.0022. This suggests that the combined effect of genes within this group plays a role in explaining variation in cognitive ability. In the case of a single SNP, such a p value would correspond to an overall calculated effect size of at least 3.3% of the variation in cognitive ability.30 However, it should be noted that this calculated effect size (based on p value, sample size, and significance level) is based on single-SNP effects and is difficult to interpret when many (nonindependent) SNPs are involved.
None of the genes in the group of heterotrimeric G proteins have been associated with cognitive ability previously. This functional gene group consists of the following genes:
G protein alpha 11 (GNA11); G protein alpha 12 (GNA12); G protein alpha 13 (GNA13); G protein alpha 14 (GNA14); G protein alpha 15 (GNA15); G protein, alpha inhibiting activity polypeptide 1 (GNAI1); G protein, alpha inhibiting activity polypeptide 2 (GNAI2); G protein, alpha inhibiting activity polypeptide 3 (GNAI3); G protein, alpha activating activity polypeptide, olfactory type (GNAL); G protein, alpha activating activity polypeptide O (GNAO1); G protein, q polypeptide (GNAQ); GNAS complex locus (GNAS); G protein, alpha transducing activity polypeptide 1 (GNAT1); G protein, alpha z polypeptide (GNAZ); G protein, beta polypeptide 1 (GNB1); G protein, beta polypeptide 2 (GNB2); G protein, beta polypeptide 3 (GNB3); G protein, beta polypeptide 4 (GNB4); G protein, beta polypeptide 5 (GNB5); G protein, gamma 2 (GNG2); G protein, gamma 3 (GNG3); G protein, gamma 4 (GNG4); G protein, gamma 5 (GNG5); G protein, gamma 7 (GNG7); G protein, gamma 10 (GNG10); G protein, gamma 11 (GNG11); and G protein, gamma 12 (GNG12).
Two genes in this group, GNAT1 and GNG3, were not included in our analyses because no SNPs were available for them on the Perlegen chip. Notably, none of the single SNPs within this group would have been detected in a genome-wide SNP-by-SNP analysis; the lowest p value was 4.9 × 10⁻⁵. In fact, only four SNPs within this functional gene group reached a p value below 10⁻³, and only 12.8% of SNPs had a p value below 0.05. Three of the four SNPs with a p value below 10⁻³ are in the GNAQ gene, and the 46 SNPs with a p value below 0.05 are distributed across 11 different genes: GNA14, GNAQ, GNG2, GNAS, GNG4, GNG11, GNB5, GNAI1, GNAL, GNAO1, and GNB3, with 13, 9, 8, 5, 3, 3, 1, 1, 1, 1, and 1 SNPs, respectively. This suggests that the effect of the functional group cannot be explained by a few individual SNPs or genes but must be ascribed to the combined effect of multiple genes in the group. This is also evident in the Q-Q plot of the heterotrimeric G proteins (upper left panel), which shows that the observed p values for the genes encoding heterotrimeric G proteins are consistently lower than expected under the null hypothesis of a uniform distribution, with no single p value standing out.
Biological Pathway Analysis Shows No Association with Cognitive Ability
Because the total collection of synaptic genes was associated with cognitive ability, we tested four biological pathways in which synaptic functioning is known to play a role in cognitive ability. These are the dopaminergic, glutamatergic, serotonergic, and cannabinoid pathways, which follow the rationale of vertical grouping. Of these pathways, the serotonergic pathway showed the largest number of genes overlapping with the synaptic genes. The pathways yielded the following empirically derived p values: dopaminergic, PEMP = 0.5006; glutamatergic, PEMP = 0.3883; serotonergic, PEMP = 0.6211; cannabinoid, PEMP = 0.8309 (see also the results table and the Q-Q plots of the biological pathways). This suggests that collective testing of genes in synaptically relevant biological pathways is less successful in identifying genetic variation underlying cognitive ability than collectively testing genes that are grouped according to function in a biological process (horizontal grouping).
Correction for Population Stratification
The possible effects of population stratification on our results were explored via two methods. First, the Q-Q plot (corrected for λ) based on all SNPs available for analysis is provided in panel A of the accompanying Q-Q figure. Without correction for λ, the observed p values deviated slightly from the distribution expected under the null hypothesis, which was quantified by a genomic inflation factor of λ = 1.05672. This deviation may be due to true association or may be indicative of false positives due to population stratification. Applying the genomic control correction method, we corrected all test statistics by the genomic inflation factor. After correction, the full set of synaptic genes was no longer statistically significant as a group, but the correction did not materially change the results for the heterotrimeric G proteins (PEMP = 0.00062).
Distribution of Association Results for All SNPs that Survived QC
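The genomic control correction described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as run by the authors: it assumes 1-df test statistics, estimates λ as the median observed chi-square divided by the median of a 1-df chi-square distribution, and divides every statistic by λ. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (approximately 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p value (0 < p <= 1) to a 1-df chi-square."""
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def chi2_to_p(chi2):
    """Convert a 1-df chi-square statistic back to a two-sided p value."""
    return 2.0 * _STD_NORMAL.cdf(-sqrt(chi2))


def genomic_inflation_factor(p_values):
    """Estimate lambda as observed median chi-square over the null median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def genomic_control(p_values, lam=None):
    """Divide each test statistic by lambda and return corrected p values."""
    if lam is None:
        lam = genomic_inflation_factor(p_values)
    return [chi2_to_p(p_to_chi2(p) / lam) for p in p_values]


# A single p value corrected with the inflation factor reported in the text.
print(genomic_control([0.001], lam=1.05672))
```

Because λ here is only slightly above 1, the correction shifts p values upward by a modest amount, which is consistent with group-level results that were borderline losing significance while strong ones remain.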
Second, given that the primary sample is known to include subsamples as a result of data being collected at different sites and in different countries, we calculated Z scores within each site and repeated all analyses. The Z score procedure ensures that no mean trait differences remain across subpopulations and therefore rules out spurious associations due to the known subpopulation structure. The genomic inflation factor using the Z scores was calculated as λ = 1 (see Q-Q plot in panel B). Again, the results remained significant for the heterotrimeric G proteins (PEMP = 0.0015).
On the basis of these results, we conclude that possible spurious effects due to population stratification cannot account for the detected association of the heterotrimeric G proteins and cognitive ability.
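The within-site Z score procedure described above amounts to standardizing each individual's score against the mean and standard deviation of that individual's own collection site. A minimal sketch follows; the use of the population standard deviation, the function name, and the toy data are assumptions, and each site is assumed to contain at least two distinct scores.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def zscore_within_site(scores, sites):
    """Standardize each score using the mean and SD of its own site."""
    by_site = defaultdict(list)
    for score, site in zip(scores, sites):
        by_site[site].append(score)
    # Per-site (mean, SD); pstdev is an assumption, the text does not
    # say whether a sample or population SD was used.
    stats = {site: (fmean(vals), pstdev(vals)) for site, vals in by_site.items()}
    return [
        (score - stats[site][0]) / stats[site][1]
        for score, site in zip(scores, sites)
    ]


# Hypothetical scores from two sites with different mean levels.
toy_scores = [90, 100, 110, 120, 130, 140]
toy_sites = ["A", "A", "A", "B", "B", "B"]
print(zscore_within_site(toy_scores, toy_sites))
```

After this transformation every site has a mean of zero, so a SNP whose allele frequency merely differs between sites can no longer produce an association through site-level differences in mean IQ.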
Validation of Significant Functional Gene Group in Independent Population-Based Sample
The Avon Longitudinal Study of Parents and Children (ALSPAC) served as a validation sample. ALSPAC is a UK-based, population-based, prospective birth cohort with extensive data collection on health and development of children and their parents, predominantly those of white European origin, and has been described previously.31 Ethical approval for the study was obtained from the ALSPAC Law and Ethics Committee and the local research ethics committees. Genotyping on 1543 individuals was initially performed with the Illumina HumanHap 300K BeadChip for 1568 blood-derived DNA samples. After QC, the clean data set comprised 1507 samples (excluding individuals with potential non-European ancestry, more than 5% missing genotype data, sex-inconsistent X-heterozygosity, or a genome-wide heterozygosity of more than 36.4% or less than 34.3%).
Because this validation sample is based on a general population sample and does not consist of individuals ascertained on the basis of ADHD, any replicated effect would confirm that the observed association is related to cognitive ability in general and is not specific to individual differences in cognitive ability in an ADHD population. The validation sample (n = 1507) had 100% power for detection of an effect size of 3.3% against a significance level of 0.05 (one test conducted).
Cognitive ability was measured in children 8 years of age with the Wechsler Intelligence Scale for Children (WISC-III UK).29 A short version of the test, consisting of alternate items only (with the exception of the coding task), was administered by trained psychologists.32 Verbal (information, similarities, arithmetic, vocabulary, comprehension) and performance (picture completion, coding, picture arrangement, block design, object assembly) subscales were administered, the subtest scores were scaled, and total IQ scores were derived.
Of the 27 genes in the heterotrimeric G protein relay group, four genes were not covered in the validation sample. The validation sample included 265 SNPs mapped to 23 genes (versus 359 SNPs mapped to 25 genes in the original sample) in the G protein group. The difference in available SNPs was due to the difference in platforms used in the original and validation samples. Two genes (GNB2 and GNG11) were present in the original sample but not covered in the validation sample. Of these, GNG11 was one of the most significant genes of the group, with three SNPs showing a p value < 0.05 (rs4262, p = 0.009793; rs180236, p = 0.01612; rs180241, p = 0.02378). The figure below shows the Q-Q plot of all tested SNPs in the validation sample and suggests that many SNPs contribute and that no single SNP drives the observed association of the G protein group.
Q-Q Plot of All SNPs in the Heterotrimeric G Protein Relay Group in the Validation Sample
Of the 46 SNPs in the G protein group that had a p value < 0.05 in the original data set, 27 SNPs had a proxy SNP in the validation data set with an r² > 0.8; of these, seven SNPs are identical between the two data sets and another seven have an r² of 1. For comparison, gene coverage was determined on the basis of LD structure and genomic density, using the HapMap CEU LD structure. Coverage was calculated as the number of typed plus tagged SNPs divided by the total number of known common SNPs within a gene (see the gene coverage table).
Gene Coverage Rates of the 27 Genes in the Heterotrimeric G Protein Group in the Initial and Validation Samples
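The gene coverage rate described above is a simple ratio, which can be sketched as below. The function name and the example rs identifiers are hypothetical, and counting each SNP only once when it is both typed and tagged is an assumption that the text does not spell out.

```python
def gene_coverage(typed_snps, tagged_snps, common_snps):
    """Fraction of a gene's known common SNPs that are typed or tagged.

    Assumption: a SNP that is both typed and tagged is counted once,
    and only SNPs in the gene's set of known common SNPs are counted.
    """
    common = set(common_snps)
    covered = (set(typed_snps) | set(tagged_snps)) & common
    return len(covered) / len(common)


# Hypothetical gene with four common SNPs: two typed, one tagged via LD.
print(gene_coverage({"rsA", "rsB"}, {"rsC"}, {"rsA", "rsB", "rsC", "rsD"}))
```

Comparing this ratio per gene between the two genotyping platforms indicates how much of the difference in results could stem from differences in SNP coverage.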
In line with the recommendation by Holmans et al.,16 the validation study focused on the functional gene group rather than on a direct SNP-by-SNP replication. This may result in a conservative p value in the validation sample, given that the SNPs that showed most evidence for association in the initial sample are not directly included in the validation study. However, the main goal is to validate the association with the functional gene group, not with single SNPs. This is also in line with the assumption advocated here and in Holmans et al.,16 that it is more powerful to treat the functional gene group as the unit of analysis and not the single SNPs.
Analyses were conducted in PLINK28 similarly to the method applied in the original sample, and 10,000 permutations were used to determine the empirical p value of the combined effect of all included SNPs. The Σ−log₁₀(P) of the heterotrimeric G proteins was 136, with an empirical p value of 0.047, validating our findings in an independent cohort.
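The group-level statistic and its empirical p value can be illustrated with a short sketch. It assumes that the statistic is the sum of −log₁₀(P) over all SNPs in the group and that the empirical p value is the share of permutations whose statistic is at least as large as the observed one; the add-one correction and the function names are assumptions, and the permutation scheme itself (how permuted statistics are generated) is not shown.

```python
from math import log10


def sum_neg_log10(p_values):
    """Group statistic: sum of -log10(p) over all SNPs in the group."""
    return sum(-log10(p) for p in p_values)


def empirical_p(observed_stat, permuted_stats):
    """Share of permuted statistics at least as large as the observed one.

    Assumption: one is added to numerator and denominator so that the
    estimate can never be exactly zero.
    """
    exceed = sum(1 for stat in permuted_stats if stat >= observed_stat)
    return (exceed + 1) / (len(permuted_stats) + 1)


# Hypothetical toy values, not results from the study.
observed = sum_neg_log10([0.1, 0.01, 0.5])
permuted = [1.2, 2.8, 3.9, 0.7, 3.5]
print(observed, empirical_p(observed, permuted))
```

Because the permuted statistics are computed on the same set of SNPs, the comparison accounts for the number of SNPs in the group, provided that the permutation preserves the LD structure among them.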