With this background, we aimed to determine to what extent data on genetic associations with the schizophrenia phenotype might be enriched by examining correlations of the same genetic targets with cognitive phenotypes. This approach may be considered a triangulation of genes at the intersection of schizophrenia and cognitive impairment. To gather the relevant data, we elaborated on the Phenowiki database architecture and created a new knowledgebase and web service (UCLA CogGene; see www.CogGene.org) specifically to represent genetic association findings for cognitive phenotypes. We aimed for an interface similar in some ways to those available in SZGene and ALZGene, which provide forest plots of effect sizes, but our data differ insofar as the phenotypes to be represented are quantitative trait scores (rather than categorical diagnoses). Further, because the cognitive phenotype data are rich (including both test names and specific measurement variables or indicators within each test), we created features in CogGene to enable dynamic sorting and computation of weighted effect size statistics over groups of results that can be selected simply by clicking on the effect labels.
The data discussed in this paper and the CogGene system are now viewable at www.CogGene.org, and information is posted on the site regarding how to submit additional contributions. The findings described here were culled from publications identified through literature mining; a publication qualified if it cited the name of at least one gene and a related polymorphism, and at least one cognitive test (the names are from a lexicon developed in the Consortium for Neuropsychiatric Phenomics at UCLA; see www.phenomics.ucla.edu). From these publications we selected those with usable data (i.e., with data specifying at least one statistical association between a specific SNP and a specific cognitive test indicator), and extracted quantitative effect sizes for associations between SNPs and cognitive test indicators. We highlight that these data were selected to represent results from healthy samples, in order to maximize the independence of findings from those in SZGene (thus enabling us to inspect possible overlaps in the “top hits” free of the potential confounds between schizophrenia and cognitive impairment phenotypes). The results can now be browsed, sorted and reanalyzed using custom-designed software tools that permit visualization and execution of “meta-analysis” (sample size weighted averaging) over selected effects under user control. In brief, the CogGene system permits visualization of the effect sizes for specific allelic variants on the cognitive trait scores (expressed as Cohen’s d statistic, which is the standardized difference between group means), and the 95% confidence intervals around these difference scores, in an interactive forest plot. This is similar to the representation of genetic association data in other widely used resources (SZGene, ALZGene), which use similar (but static) forest plots to show allelic associations with case-control differences (in those resources, however, the effect sizes are expressed as odds ratios rather than group differences). A typical screen-shot of CogGene is shown in the accompanying figure.
Screen-shot illustrating features of the CogGene web service
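The effect-size representation described above can be illustrated with a minimal sketch. It assumes the pooled-standard-deviation form of Cohen’s d, a common large-sample approximation for its standard error, and simple sample size weighting; the names `Effect`, `cohens_d`, `d_confidence_interval` and `weighted_mean_d` are hypothetical and are not taken from the CogGene software.

```python
import math
from dataclasses import dataclass


@dataclass
class Effect:
    """One reported association (hypothetical structure, for illustration only)."""
    label: str  # e.g. gene / SNP / cognitive indicator
    d: float    # Cohen's d for the allelic contrast
    n: int      # total sample size of the study


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized difference between two group means, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d (large-sample standard error; an assumption)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


def weighted_mean_d(effects):
    """Sample size weighted average of d over a user-selected set of effects."""
    total_n = sum(e.n for e in effects)
    return sum(e.d * e.n for e in effects) / total_n
```

An effect whose interval from `d_confidence_interval` excludes zero corresponds to a result that would be displayed as nominally significant in a forest plot; whether CogGene uses exactly this standard error formula is not stated in the text.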
The SZGene (SchizophreniaGene) database contains information from 1,727 studies, reporting data on 1,008 genes and 8,788 polymorphisms; the database includes 287 meta-analyses (see www.szgene.org; accessed 5/31/2011) [32]. SZGene ranks its “Top Results” using the HuGENet interim guidelines published by Ioannidis and colleagues [33], which consider the amount of evidence (i.e., Grade A is given to studies where the total number of minor alleles exceeds 1,000); consistency of evidence (i.e., Grade A is given only when inconsistency is modest, for example I² < 25); and bias (with Grade A given when there is probably no bias).
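The consistency criterion is expressed through the I² statistic. As a minimal sketch, assuming inverse-variance weights and the standard definition I² = max(0, (Q − df)/Q) × 100 based on Cochran’s Q, it could be computed as follows; the function name and inputs are illustrative and are not part of SZGene or CogGene.

```python
def i_squared(effects, variances):
    """Percentage of variation across studies attributed to heterogeneity.

    effects   -- per-study effect estimates (e.g. log odds ratios or d values)
    variances -- per-study sampling variances of those estimates
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # identical estimates: no observable heterogeneity
    return max(0.0, (q - df) / q) * 100.0
```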
Inspecting initial entries into the CogGene database, we note that the quality of genetic association studies for cognitive phenotypes so far is relatively low. For example, none of the studies meets the HuGENet criterion for Grade “A” on amount of evidence, and only 3 studies would receive a Grade of “B” (i.e., with a total minor allele count greater than 100); the rest would all be considered Grade “C”. Analysis of existing studies is further complicated by the lack of uniformity in phenotype definition, which makes replication of results difficult to determine because few studies use exactly the same indicators. Finally, the degree of bias in the cognitive studies may be considered relatively high, given the paucity of large effect sizes.
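The amount-of-evidence grading applied above can be sketched as a simple threshold rule. The thresholds (more than 1,000 minor alleles for Grade A, more than 100 for Grade B, otherwise Grade C) follow the description in the text; the function name and the handling of counts exactly at a boundary are assumptions made for illustration.

```python
def amount_of_evidence_grade(minor_allele_count: int) -> str:
    """Grade the amount of evidence from the total number of minor alleles.

    Boundary handling (exactly 100 or exactly 1,000) is an assumption.
    """
    if minor_allele_count > 1000:
        return "A"
    if minor_allele_count > 100:
        return "B"
    return "C"
```

Under this rule, a study contributing 76 minor alleles would be graded “C” on amount of evidence regardless of its total sample size.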
Examining the 45 “Top Results” of SZGene, we find that 10 of the same genes are listed in CogGene. Among these 10 genes, the evidence supporting association is considered Grade “A” in SZGene for only two (APOE, HTR2A). APOE (e2/3/4; contrasting the 4 versus the 3 allele) is significantly associated with schizophrenia among Caucasians, and HTR2A (rs6311; contrasting the A versus the G allele) is associated with schizophrenia, also selectively in Caucasian samples.
The accompanying figure provides a graphical summary of the “Top Results” from CogGene, considering only those individual SNP effects that had 95% confidence intervals not including zero difference between allelic variants. Among these, only the APOE genotype overlaps with those identified as a Top Result in the SZGene database, and, as the figure shows, the average effect size for APOE is small (d = .069, 95% confidence interval = .014 to .124). It should be recognized that this effect for APOE genotype is small in part because it averages together effects on different cognitive indicators. For APOE we have two studies with the same cognitive indicator (Buschke Selective Reminding Test, Long Term Recall) and the same contrast among alleles; these two studies [34] have overlapping samples, so the larger of the two should probably be relied on by itself. On a positive note, this single study [35] showed a medium effect (i.e., comparing e2/2 and e2/3 to e3/3 had d = .29, and comparing e2/2 and e2/3 to e4 allele carriers had d = .39), with a total sample size of 912 and a minor allele count of 76. On the other hand, this more detailed inspection indicates that the result stands as an isolated finding without replication.
Among the Top Results in CogGene, none of the effects so far would be considered significant at conventional genome-wide levels, at least in part because the sample sizes are so low. For example, only the APOE and DTNB1 findings are supported by a study with a sample size exceeding 500 participants, and the largest effect (CACNA1C) is supported by a study of only 80 people, only 10 of whom possessed the minor allele at the investigated locus (rs1006737). These results are consistent with those reported by Sabb and colleagues [30], where, as noted above, all effect sizes were in the range of d = .09 to d = .23, with the single exception (d = .44) being a study for which total sample size was only 201. This highlights the possibility of publication bias in the small-sample studies conducted so far, which poses a major challenge to cognitive genomics, and the likelihood that many of the reported associations will turn out to be false positive results. Currently, the literature remains rife with findings that center on selected “candidate” genes that have been investigated at least in part due to inertia from earlier positive reports (for example, the study of APOE genotype in schizophrenia reflects more the “smoke” from positive findings in Alzheimer’s disease than the likely “fire” in schizophrenia). This bias may soon be overcome as more GWAS results and then genome sequencing findings are disseminated. At that point the biggest priority will be to obtain unbiased sampling of the “cognitive phenome”; otherwise we will run a risk of bias from studying the wrong candidate phenotypes similar to the one we currently face in studying false positive candidate genotypes [36]. This will be an interesting challenge for future investigations, which will need to balance the consistency and standardization of phenotyping that are critical for replication against sampling broad enough to help reduce phenotyping bias. We hope that further development of CogGene will help aggregate findings across investigations, increase our understanding of where relevant signals may lie, and shed light on the design of future studies and collaborative research programs. The most recent findings regarding the genetics of schizophrenia and cognitive impairment phenotypes suggest we are likely to face a deluge of associations with very small effects, and a smaller number of rare variants possibly with larger effects, along with likely complex gene-gene and gene-environment interactions. These observations make the availability of a collaborative knowledge-building tool like CogGene particularly valuable, because sifting through the findings and aggregating results across diverse studies may ultimately be more important than results from any single study. By structuring knowledge in CogGene we hope also to facilitate links to other knowledgebases (such as the Entrez systems supported by the National Library of Medicine) to promote biological discovery and better constrain our models of the causal paths that connect the human genome to complex disorders of brain and behavior.
A related challenge pertains to developing standards for cognitive phenotyping and refinement of ontologies that can help formalize knowledge within this scientific domain. Sabb and colleagues showed how fickle investigators can be, introducing new concept labels despite lack of change in the actual measurement methods [14]. We have suggested frameworks for developing cognitive ontologies elsewhere [13], and the Cognitive Atlas project (www.CognitiveAtlas.org) is dedicated specifically to development of a consensus ontology about cognitive concepts and their measurement. This work will be essential to help determine which specific findings can be meaningfully averaged in meta-analytic studies that will ultimately help us identify and understand what are likely to be myriad small signals relating cognitive phenotypes to the genome.
Finally, the development of tools like CogGene can help represent quantitative trait data for genetic associations and thus offer a means for collaboration, storage, and reuse of knowledge that is important to the dimensional representation of phenotypes. This is compatible with the National Institute of Mental Health Strategic Plan, and specifically of potential value to the new Research Domain Criteria (RDoC) initiative [39], which aims to support research on phenotypic dimensions that may be more informative than traditional diagnostic phenotypes.