Bipolar Disorder (BD) is a complex genetic trait that is known to be highly heritable. Up to 93% of the risk for type-1 BD has been attributed to heritable factors according to one large nationwide twin study (Kieseppa et al. 2004). These and other findings have fueled an intense search for the genes that predispose to BD. Aberrant gene expression patterns, some involving neurotransmission pathways, have been well documented in tissues from BD patients (Altar et al. 2009; Le-Niculescu et al. 2009). However, exactly which gene variants underlie the heritability of this disorder remains elusive.
Ever since the search for BD genes began in earnest some 20 years ago, there have been different schools of thought about the best approach for identifying genetic variants. At first, most investigators favored family-based linkage studies, which employed chromosome recombination statistics (LOD scores). Concern that the resolution of recombination mapping was insufficient led other investigators to initiate genetic studies of unrelated individuals (case-control association studies). Their view was that bipolar illness should be linked to common alleles because of its widespread and uniform prevalence of 1–3% in the worldwide population (Weissman et al. 1996). Others developed a hybrid approach called family-based association tests (FBATs), in which trios or quads of family members are studied. The HapMap project (Thorisson et al. 2005; Zheng and McPeek 2007) added ever more single nucleotide polymorphisms (SNPs) from many human populations (Kelsoe 2004). Of particular relevance were the haplotype blocks and the tagged SNPs (tSNPs) that demarcate them (Haiman and Stram 2008; Nannya et al. 2007). Mapping tSNPs across virtually the entire human genome gave rise to a whole new genetic approach: genome-wide scanning. Overlaid on this were advances in statistical methods for large SNP datasets, used to “impute” genotype data and to assess gene-gene and gene-environment interactions (Askland et al. 2009; Baum et al. 2008b; Johnson et al. 2009; McMahon et al. 2010; Moskvina et al. 2009; Purcell et al. 2009).
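For readers outside genetics, the LOD score mentioned above can be sketched for the simplest case. The Python below is a minimal illustration, assuming phase-known, fully informative meioses that can each be scored as recombinant or non-recombinant; real linkage software also models pedigree structure, penetrance, and missing genotypes, and the function names are our own. It compares the likelihood of linkage at recombination fraction θ with that of free recombination (θ = 0.5); a LOD of 3 or more is the conventional threshold for significant linkage.

```python
import math


def lod_score(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood of linkage at recombination fraction `theta`
    with the likelihood of free recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    # Without linkage, every meiosis has probability 0.5.
    log_null = n * math.log10(0.5)
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # a single recombinant rules out theta = 0
        return -log_null  # (1 - 0) ** NR == 1, so log10 of the alternative is 0
    log_alt = (recombinants * math.log10(theta)
               + nonrecombinants * math.log10(1.0 - theta))
    return log_alt - log_null


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 500):
    """Grid search over theta in [0, 0.5]; returns (max LOD, theta at the max)."""
    grid = [i * 0.5 / steps for i in range(steps + 1)]
    return max((lod_score(t, recombinants, nonrecombinants), t) for t in grid)


if __name__ == "__main__":
    best_lod, best_theta = max_lod(recombinants=2, nonrecombinants=18)
    print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.3f}")
```

With 2 recombinants in 20 informative meioses, the maximum falls at θ = 0.1 with a LOD of about 3.2, just over the conventional threshold.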
Currently, the most favored approach is the Genome Wide Association (GWA) study, at least until technology advances to the point where whole genome sequencing of large cohorts becomes practical and affordable. The appeal of GWA studies is their atheoretical nature combined with their ability to scan large numbers of SNPs across the genome. To date, there have been 12 independently published GWA studies on the etiology of BD (Baum et al. 2008a; Ferreira et al. 2008; Hattori et al. 2009; Lee et al. 2010; McMahon et al. 2010; O'Donovan et al. 2008; Purcell et al. 2009; Scott et al. 2009; Sklar et al. 2008; Smith et al. 2009; Sullivan et al. 2009; Wellcome-Trust-Consortium 2007), plus a GWA study solely on the pharmacogenomics of BD (Perlis et al. 2009). Traditional candidate association studies and linkage studies of BD have also continued to appear at a rapid rate. Many genetic variants have therefore been studied by all three major genetic approaches: linkage studies, candidate case-control association studies, and GWA studies.
The hope of the first wave of GWA studies had naturally been to unify the field around specific gene findings. Instead, the first GWA studies yielded only marginally significant findings with little consensus. Some signs of consensus may be emerging now that sample sizes are reaching 10,000 or more (as discussed later in this article), but the findings appear to explain only a small fraction of the overall disease liability. To be fair, disappointment with GWA studies is not limited to psychiatric phenotypes; it characterizes most complex genetic studies to date (Galvan et al. 2010; Niculescu and Le-Niculescu 2010a, b). The daunting news for BD, however, comes from recalculating the statistical power needed to move forward. As one recent reviewer put it, “If real gene associations do exist, they must be so weak as to require sample sizes of 100,000 cases and 100,000 controls to detect them” (Moonesinghe et al. 2008). In hindsight, the problem may have been placing too much expectation on GWA technology when it was first developed. Rather than being an end point in themselves, we believe GWA studies should complement other approaches. The “other” approaches we refer to in this article include candidate gene association and linkage studies, as well as studies focused on epigenetic, neurobiological, and environmental aspects of the disease. Herein, we describe a centralized database containing findings from a variety of genetic study designs of BD.
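The arithmetic behind such power recalculations can be sketched with the standard normal-approximation sample-size formula for comparing allele frequencies between equal numbers of cases and controls. This is a minimal illustration, not the calculation of Moonesinghe et al. (2008); it assumes Hardy-Weinberg equilibrium (so each subject contributes two independent alleles), a directly genotyped causal SNP, and the genome-wide α of 5 × 10⁻⁸, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    case_odds = odds_ratio * control_freq / (1.0 - control_freq)
    return case_odds / (1.0 + case_odds)


def cases_needed(control_freq: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Cases (and equally many controls) for a two-sided allelic test.

    Uses the normal-approximation sample size for two proportions on allele
    counts, then halves it on the assumption that each subject contributes
    two independent alleles (Hardy-Weinberg equilibrium).
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    p_bar = (p0 + p1) / 2.0
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)


if __name__ == "__main__":
    for or_ in (1.3, 1.15, 1.1):
        print(or_, cases_needed(control_freq=0.3, odds_ratio=or_))
```

For a control risk-allele frequency of 0.3 and an allelic odds ratio of 1.15, this gives on the order of 9,000 to 10,000 cases and as many controls at 80% power. Smaller odds ratios, rarer alleles, imperfect tagging of the causal variant, and diagnostic heterogeneity all push the requirement higher.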
The main motivation behind our database is the hope that if weak gene findings can be viewed together, they might guide future studies. Suppose, for instance, one holds a theory about gene X in BD. At present, without an organized database, it would be extremely difficult to compare all previous positive findings for gene X with all previous negative findings and then design the right kind of study to move forward. Likewise, genetic researchers find it difficult to rank their new findings against all the other contender gene findings for BD. These questions have been very difficult to answer even for those in the field, not to mention researchers in other disciplines who may wish to use the accumulated findings in other ways.
The traditional approach for unifying diverse findings has of course been meta-analysis, a statistical strategy for pooling data from independent studies. To date, two meta-analyses have been published on GWA studies of BD (Liu et al. 2010; McMahon et al. 2010). However, meta-analysis cannot capture the entire literature on BD genes because it presumes that the data are of similar type and quality across studies. Important case-control candidate studies, not to mention linkage studies and FBAT studies, cannot be pooled with GWA data by meta-analysis. To deal with diverse genetic approaches and findings, our database displays semi-quantitative data (p values, etc.) side by side with qualitative information on study design, numbers of subjects, numbers of SNPs, diagnostic subtypes, etc. Although we do not attempt an overall analysis in this article, the database has been designed with an eye towards future meta- and mega-analytical strategies similar to those described by Le-Niculescu et al. (2009). We believe our database will eventually support new analytical methods that may yet be developed for complex datasets.
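To make the pooling requirement concrete, the sketch below implements a standard fixed-effect, inverse-variance meta-analysis of odds ratios. It is an illustration under stated assumptions, not the method of Liu et al. (2010) or McMahon et al. (2010): each study must supply an odds ratio with a 95% confidence interval for the same allele, and the standard error is recovered from that interval on the log scale. A linkage LOD score or a bare p value supplies neither quantity, which is why such studies cannot enter this kind of pooling.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis of odds ratios.

    `studies` is an iterable of (odds_ratio, ci_low, ci_high) tuples, each
    with a 95% confidence interval for the same allele. Returns the pooled
    odds ratio, the standard error of the pooled log odds ratio, and a
    two-sided p value.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        log_or = math.log(odds_ratio)
        # A 95% CI spans +/- 1.96 standard errors on the log scale.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * log_or
    if total_weight == 0.0:
        raise ValueError("no studies supplied")
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log_or / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return math.exp(pooled_log_or), pooled_se, p_value
```

Weighting by inverse variance lets larger, more precise studies dominate the pooled estimate, which is also why pooled results are only as comparable as the per-study effect estimates fed into them.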
The database can be found online at http://bioprogramming.bsd.uchicago.edu/BDStudies/. Users can compare the relative merits of many types of gene findings, both semi-quantitatively and qualitatively, on a gene-by-gene and study-by-study basis. Most importantly, the database is designed to facilitate efficient meta- and mega-analyses. Herein, we describe the database and provide two lists of “top” candidate genes for BD to date. We welcome future collaborators who may wish to analyze the database further.
To obtain primary information, PubMed (www.pubmed.gov) was searched repeatedly with two key words: “bipolar gene”. Extensive searches were conducted with these key words from May 1, 2006 to December 31, 2009. By the end date, 3,388 distinct citations had been screened, at least by title, abstract, and key words. Full articles were obtained and read only if the abstract mentioned a specific number of BD patients. Two further cut-off criteria were then applied to each paper: (a) the paper had to contain both genotypic and phenotypic information, and (b) the data had to be genic (naming genes rather than just loci). By the first criterion, a large number of papers were excluded because they provided only gene expression data, with no genotypic data. By the second criterion, even more articles were excluded because they were linkage studies that did not name specific genes. To confirm and extend the list, a second PubMed search was conducted using “manic depressive gene” as key words. This search yielded 594 citations, which were subjected to the same selection criteria. Most articles in the second search had already been found in the first; only 10 new papers met the criteria. A master list of 574 articles was finally compiled as the basis for the database.
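The screening steps above can be expressed as a simple filter. The sketch below is hypothetical: the Citation record and its yes/no fields stand in for judgments that were actually made by reading abstracts and full papers, and PubMed IDs (PMIDs) are assumed as the key for merging the two searches without double counting.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Citation:
    # Hypothetical record; in the study these judgments were made by reading each paper.
    pmid: int
    reports_bd_sample_size: bool  # abstract states a specific number of BD patients
    has_genotype_data: bool       # criterion (a), genotypic half
    has_phenotype_data: bool      # criterion (a), phenotypic half
    names_specific_genes: bool    # criterion (b): genic, not locus-only


def passes_screen(c: Citation) -> bool:
    """Apply the abstract check and both cut-off criteria."""
    return (c.reports_bd_sample_size
            and c.has_genotype_data
            and c.has_phenotype_data
            and c.names_specific_genes)


def build_master_list(*searches: Iterable[Citation]) -> List[int]:
    """Merge several search result sets, keeping each qualifying PMID once."""
    kept = set()
    for search in searches:
        for c in search:
            if passes_screen(c):
                kept.add(c.pmid)
    return sorted(kept)
```

Papers contributed only by the second search are then the set difference between the merged list and the survivors of the first search.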
The 574 articles found by the PubMed searches were then subjected to specific rules governing data lifting. Although these rules were our own creation, they were devised to make the database as free of bias as possible. The following a priori rules of data selection were employed for data lifting:
Two additional rules:
The database was constructed as a Microsoft Excel spreadsheet and then uploaded to a MySQL database for management. A web-based user interface allows readers to search the database by gene name or alias. Some gene names may differ from those in the original articles because they were standardized to HUGO gene names (http://www.genenames.org/) for the database. To accommodate this, we have provided an option to query gene-name aliases.
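Alias handling of this kind can be sketched with two tables, one keyed by the HUGO symbol and one mapping aliases to it. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table names, columns, and function names are hypothetical and do not describe the actual BDStudies schema. The alias rows (5-HTT and SERT for SLC6A4, G72 for DAOA) are commonly used alternative names for those genes.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory gene/alias lookup (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE genes (symbol TEXT PRIMARY KEY, full_name TEXT);
        CREATE TABLE aliases (alias TEXT NOT NULL,
                              symbol TEXT NOT NULL REFERENCES genes(symbol));
    """)
    con.executemany("INSERT INTO genes VALUES (?, ?)", [
        ("SLC6A4", "solute carrier family 6 member 4"),
        ("DAOA", "D-amino acid oxidase activator"),
    ])
    con.executemany("INSERT INTO aliases VALUES (?, ?)", [
        ("5-HTT", "SLC6A4"), ("SERT", "SLC6A4"), ("G72", "DAOA"),
    ])
    return con


def resolve(con: sqlite3.Connection, term: str) -> Optional[str]:
    """Map a search term (symbol or alias, any case) to the HUGO symbol."""
    row = con.execute(
        """
        SELECT symbol FROM genes WHERE upper(symbol) = upper(?)
        UNION
        SELECT symbol FROM aliases WHERE upper(alias) = upper(?)
        """,
        (term, term),
    ).fetchone()
    return row[0] if row else None
```

Resolving every query to the standardized symbol first means that findings lifted under an older name (for example, G72) are still retrieved together with those filed under DAOA.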
Readers of the database should find the Synopsis Statement column informative. Entries in this column typically begin with a direct quotation from the original article, meant to encapsulate the authors’ original interpretation of the finding. The quotation is typically followed by a series of qualifying statements, such as alternative p values, especially if those alternative p values point to different conclusions. The column also flags when authors present “corrected” p values (i.e., corrected for multiple measures) rather than uncorrected ones. The exact meaning of each p value is given unless it is the usual two-tailed, uncorrected, lowest p value. The Synopsis Statement also notes when subjects were “borrowed”, which happened in studies that pooled previously published samples with new ones. Keeping track of truly independent samples (new samples/new data) is important for future meta- and mega-analyses. In short, the Synopsis Statements provide not only the original authors’ statements but also other critical information for understanding each finding.
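Because the Synopsis Statements distinguish corrected from uncorrected p values, it may help to see what two common corrections for multiple measures do. The sketch below implements the Bonferroni and Holm step-down adjustments; the individual papers in the database may have used other methods (permutation testing or false discovery rate control, for example), so this is illustrative only.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted
```

For three nominal p values of 0.01, 0.04, and 0.03, Bonferroni gives 0.03, 0.12, and 0.09, whereas Holm gives 0.03, 0.06, and 0.06; only the first result stays below 0.05 under either method.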
This Results section has purposely been limited to the broad parameters of the database itself. Those seeking specific genes (for instance, lists of “top genes”) are referred to the Discussion section. We have not listed specific genes in the Results section because it would be presumptuous and inappropriate to claim any truly original findings based on a database of other people’s studies. Furthermore, the database is only a starting point for researchers to organize their thoughts; short of future sophisticated statistical analysis, it cannot provide definitive statements about the genes that actually underlie BD.
On its inception date (January 29, 2010), the database held 893 different genes (Fig. 1). These came from 574 unrelated research articles found in PubMed. The articles actually encompassed 638 independent studies because many were multiplex articles (see Methods for the definition of multiplex articles). Simply counting the studies in the database revealed that the authors had called 32.6% of first findings “positive”. Most of these, however, were later recognized as instances of the “winner’s curse” (Niculescu and Le-Niculescu 2010b); that is, they subsequently failed to replicate. Most of the remaining findings were called “negative”, except for a few equivocal ones. Simply counting positive and negative studies, however, does not provide a definitive picture of the genes for BD, because very few studies were performed gene-wide, that is, interrogating all the haplotype blocks in a single gene. It therefore remains for the reader to weigh each polymorphic finding on a study-by-study basis rather than simply counting positive and negative studies.
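The winner’s curse can be illustrated with a small simulation. The sketch below assumes a hypothetical true log odds ratio of 0.1 (an odds ratio of about 1.1), estimated with a standard error of 0.1 in each of many independent studies, and keeps only the studies whose estimate is nominally significant in the risk direction (z > 1.96). These parameters are illustrative and are not taken from the database.

```python
import random
import statistics


def winners_curse(true_log_or=0.1, se=0.1, n_studies=20000, z_crit=1.96, seed=1):
    """Simulate many equally sized studies of one true effect.

    Returns the mean estimate over all studies, the mean over studies that
    reached z > z_crit in the risk direction, and the fraction that did.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    positives = [b for b in estimates if b / se > z_crit]
    return (statistics.mean(estimates),
            statistics.mean(positives),
            len(positives) / n_studies)
```

Under these assumptions only about 17% of studies come out positive, and their average estimate is roughly 2.5 times the true effect, so replication attempts sized on the first published estimate tend to be underpowered.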
The most common type of study found was the candidate gene association study. There were 587 candidate gene association studies when the database was launched, from which 544 different genes were gleaned (Fig. 1). By comparison, only 12 GWA studies were enough to contribute 262 unique genes to the database (Fig. 1). The database also contained findings from 22 genic linkage studies (linkage studies focused on specific genes), 4 rare allele studies (4 unique gene entries), and one twin study focused on a single gene (Kakiuchi et al. 2003). More than 70 different gene entries in the database were lifted from a mixture of different types of genetic studies in one or more articles.
It was noted that 55.5% of the genes in the database had been interrogated repeatedly by different researchers using different populations (i.e., replication studies). The remaining 44.5% had been studied just once in connection with BD. The most frequently studied genes were those thought to be involved in antidepressant action. The three most studied genes in the database were: (a) the serotonin transporter gene (SLC6A4), with 52 genetic studies of BD etiology, 59.6% of which yielded negative findings; (b) the Brain-Derived Neurotrophic Factor gene (BDNF), with 35 studies, 46% of which yielded negative findings; and (c) the Catechol-O-Methyltransferase gene (COMT), with 29 studies, 59% of which yielded negative findings.
Efforts to unravel the genetic architecture of BD have been ongoing for at least two decades. Because most reports have subsequently gone unreplicated (the winner’s curse), there is now a general assumption that if “real” positives do exist in the literature, they have almost surely emerged amid a sea of false positives from studies too underpowered to be definitive (Hakon 2009). That being said, we highlight here only the most significant and repeated findings from our database. Two lists of “top” genes are shown in Tables 1 and 2. The first list comes exclusively from the candidate gene association literature (Table 1); the second comes exclusively from the GWA study literature (Table 2). Specific details are provided in later sections of this Discussion. Suffice it to say for now that there is no overlap between the top findings in these two lists (Tables 1 and 2). The reader is cautioned, however, that the p values in these tables are uncorrected for multiple measures and that only allelic-type p values are listed. We discuss the individual genes in these lists in separate sections below and attempt to provide some rationale for the non-overlap of the two lists.
We wish to emphasize certain qualities that tend to set our database apart from others: (1) peer-reviewed summary-level data only, (2) inclusion of negative as well as positive findings, and (3) a comprehensive scope that exceeds most previous databases. The Schizophrenia Forum has a similar database for schizophrenia genes (MacDonald and Schulz 2009), but it barely touches on BD. The Psychiatric GWA Study Consortium also shares some genotype and phenotype data online (Cichon et al. 2009; O'Dushlaine et al. 2010), but it does not attempt to chronicle candidate gene association studies. A large database maintained jointly by the NIH and the US Centers for Disease Control and Prevention (Lin et al. 2006) also provides summary-level findings, but not GWA findings. There are also online databases from independent laboratories that cover a variety of psychiatric disorders, including BD (Johnson and O'Donnell 2009; Konneker et al. 2008; Lin et al. 2006). However, these databases contain mostly candidate gene association findings. The Sullivan Lab Evidence Project (SLEP) (Konneker et al. 2008) provides one of the best online databases of summary-level gene findings. Although the SLEP offers a great deal (linkage data, copy number variant (CNV) data, expression microarray data, meta-analysis data, and genome-wide data), it lacks candidate gene association data and does not include negative findings. Two other online resources are the Open Access Database of Genome Wide Association Results (Johnson and O'Donnell 2009) and a website of the National Human Genome Research Institute (http://www.genome.gov/gwastudies). The former offers raw data from 118 diverse GWA studies of different diseases (subject to data-sharing agreements), and the latter offers summary-level findings from 559 diverse GWA studies of different diseases (both were growing as of this writing). Both, however, lack candidate gene association findings.
Links to all of these sites are given on our webpage. Our database therefore fills a niche among them as the only database containing both negative and positive summary-level findings from a comprehensive collection of diverse study types.
Since the entire genome is theoretically represented on one platform in any large-scale GWA study, some have argued that GWA findings should be valued above all other findings (Pearson and Manolio 2008). We agree that GWA researchers have many reasons for pride. But a few limitations do exist, mostly pertaining to the problems inherent in handling vast amounts of data. Several thousand subjects are necessary in GWA studies even to approach adequate statistical power. And, as mentioned in the Introduction, recent recalculations of statistical power suggest that these studies may actually require hundreds of thousands of subjects (Hakon 2009). GWA studies are also restricted to common alleles, meaning those with minor allele frequencies above 1% (debatably above 5%). More problematic, some frequently studied variants have not been included on any of the GWA platforms. Until the age of full genome sequencing arrives, we therefore caution against assuming that GWA findings automatically supersede findings from other research designs.
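In practice, the common-allele restriction is usually reinforced by a minor allele frequency (MAF) filter during quality control. The sketch below computes MAF from genotype counts and keeps SNPs above a chosen threshold; the threshold argument mirrors the 1% versus 5% debate, and the SNP labels and counts used with it are hypothetical.

```python
from typing import Dict, List, Tuple


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF from genotype counts (homozygous reference, heterozygous, homozygous alternate)."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt_freq = (2 * n_hom_alt + n_het) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def common_snps(genotype_counts: Dict[str, Tuple[int, int, int]],
                maf_threshold: float = 0.01) -> List[str]:
    """Keep SNPs whose MAF exceeds the threshold (0.01 or, more strictly, 0.05)."""
    return [snp for snp, counts in genotype_counts.items()
            if minor_allele_frequency(*counts) > maf_threshold]
```

A variant carried by a handful of heterozygotes in a thousand subjects falls below either threshold, which is one concrete way rare alleles drop out of GWA analyses.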
We would also emphasize a few advantages of the humble candidate gene association design. (1) Candidate gene studies hold inherent “face validity” because candidate genes are chosen precisely for their known biological and/or pharmacological underpinnings. (2) Candidate gene studies stand to outperform GWA studies when studying isolated populations or pedigrees, especially if multiple rare mutations are involved (Walsh et al. 2008), because rare alleles fall entirely outside the detection limits of GWA studies. (3) Candidate gene studies usually possess credible statistical power because they test far fewer genotypes per study than GWA studies and so pay a much smaller penalty for multiple testing. This is not to say that candidate gene studies are perfect, or even optimal. Until recently, candidate gene studies fell well short of interrogating even the known haplotype blocks of a single targeted gene. Many were restricted to “alleles of interest”, which generally meant alleles encoding amino acid changes. While this seemed reasonable at the time, our awareness of regulatory elements in DNA has put to rest the notion that non-protein-coding variants can be dismissed as being of no value. The incomplete gene coverage of almost all early candidate gene association studies must therefore be factored in.
Having a correct gene model also must be factored in. In the case of BD, the model of inheritance clearly is not classical (i.e., standard Mendelian single gene dominant or recessive). But, beyond this broad statement, little is certain. Most recent investigations have approached the study of BD genes from a “common alleles/ common disease” model, arguing that this model best explains the prevalence of the illness (Hemminki et al. 2008; Iyengar and Elston 2007; Tesli et al. 2009a). In fact, the common alleles/ common disease model lies at the heart of the GWA studies. Other investigators, however, have approached the puzzle of BD by looking for rare alleles (Walsh et al. 2008) and their often-related copy number variants (Saus et al. 2010). Still others have looked for interactions between allelic variation and environmental factors (Cornelis et al. 2010; Kieseppa et al. 2004; Liu and Piletz 2006). Others have focused on epigenetic variation (Crow 2007). There are also models of inheritance involving biochemical pathways (O'Dushlaine et al. 2010; Zhang et al. 2010). These models consider pleiotropy whereby genetic aberrations may cause a multiplicity of phenotypes through biochemical pathways. Depending on the starting model that one chooses, entirely different types of studies will emerge.
Candidate gene association studies of BD began in the early 1990s. At that time, genotyping was confined to microsatellites, restriction fragment length polymorphisms (RFLPs), and/or copy number variants detected on gels. Today, almost all association studies use automated genotyping platforms. The HapMap Project (Kelsoe 2004) made it possible to interrogate virtually all known heritable blocks (i.e., haplotypes) in each gene (Farrall and Morris 2005). Several good review articles have been written about SNPs in linkage disequilibrium and tagged SNPs (tSNPs) in genetic studies of BD (Badner and Gershon 2002; McInnis et al. 2003; McQueen et al. 2005; Segurado et al. 2003; Serretti and Mandelli 2008). Reproducible “hot” chromosome regions have also emerged from linkage studies, and all of these regions have generated candidate genes. Credible candidate gene association studies are now required to enlist at least 200 cases and 200 controls for publication, which, according to the database, has been roughly the average study size in recent years.
The top gene findings from candidate studies are listed in Table 1. To be selected for this list, a gene (not necessarily a particular SNP) had to have been studied at least 5 times in independent populations and to have at least two more positive than negative studies. For simplicity’s sake, only the most robust studies are shown for each gene in Table 1. None of the genes listed in Table 1 can claim absolute consensus (i.e., the absence of any negative report); there were invariably failures to replicate somewhere in the literature. Keep in mind also that, once differences in starting statistical power are taken into account, the p values in Table 1 compare reasonably well with those of the top findings from the GWA literature (Table 2). In alphabetical order, the genes in Table 1, with the SNP(s) for which data are shown and their chromosome locations, are: BDNF (Brain-Derived Neurotrophic Factor; rs6265; 11p13); CD36 (Thrombospondin receptor; rs2637777; 7q11.2); DAOA (D-Amino Acid Oxidase Activator; rs1935062; 13q34); DISC1 (Disrupted in Schizophrenia 1; rs821616 and rs1411771; 1q42.1); GRIN1 (Glutamate Receptor, Ionotropic, NMDA 1; rs35655437; 9q34.3); NDUFV2 (NADH Dehydrogenase Ubiquinone Flavoprotein 2; rs1156044; 18p11.31-p11.2); TPH2 (Tryptophan Hydroxylase 2; rs4131348; 12q21.1); and TRPM2 (Transient Receptor Potential cation channel, subfamily M, member 2; rs1618355; 21q22.3).
Again, each of these genes had been studied at least 5 times in independent populations, and yielded at least two more positive than negative associations with BD as of writing this report.
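The selection rule for Table 1 is simple enough to state as code. In this minimal sketch, the GeneTally record and the gene labels used with it are hypothetical placeholders rather than data from the database, and equivocal studies are assumed to count toward the five-study minimum but not toward the positive-minus-negative margin.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GeneTally:
    symbol: str
    positive: int
    negative: int
    equivocal: int = 0


def top_candidates(tallies: Iterable[GeneTally], min_studies: int = 5,
                   min_margin: int = 2) -> List[str]:
    """Genes studied >= min_studies times with >= min_margin more positive than negative studies."""
    selected = []
    for t in tallies:
        n_studies = t.positive + t.negative + t.equivocal
        if n_studies >= min_studies and t.positive - t.negative >= min_margin:
            selected.append(t.symbol)
    return sorted(selected)  # alphabetical, as in Table 1
```

Because the rule counts studies rather than weighing them, a gene with many small positive studies can outrank one with fewer but larger studies, which is why the tables also show the most robust individual results.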
The top 30 gene findings from the first wave of GWA studies are listed in Table 2. This list was compiled by ranking p values from individual studies. One of the first things to notice is that, at first glance, the list in Table 2 seems completely unrelated to the list in Table 1. The reason for this is unknown. Some previous reviewers have dug deeply into published GWA findings and found scattered, weak support in some GWA studies for genes in Table 1 (Wellcome-Trust-Consortium 2007). However, the evidence is weak. We might speculate that the vastly different designs of the studies in Tables 1 and 2 (targeted association studies versus GWA studies) explain the results. For instance, the genes in Table 1 may pertain only to the inheritance of BD in isolated populations, while the genes in Table 2 may pertain only to its inheritance in large general BD populations. As we discuss next, however, some of the findings in Table 2 are beginning to achieve consensus through replication. Our database may be particularly well suited to exposing the replicability of these findings.
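Ranking by p value, as done for Table 2, requires deciding how to handle a gene reported by several studies. The sketch below assumes each gene is represented by its single smallest reported p value; the example input uses p values quoted in this Discussion and is not a reproduction of Table 2.

```python
from typing import Dict, Iterable, List, Tuple

Finding = Tuple[str, str, float]  # (gene, study, p value)


def rank_genes(findings: Iterable[Finding], top_n: int = 30) -> List[Finding]:
    """Rank genes by their smallest reported p value across studies."""
    best: Dict[str, Tuple[str, float]] = {}
    for gene, study, p in findings:
        if gene not in best or p < best[gene][1]:
            best[gene] = (study, p)
    ranked = sorted(best.items(), key=lambda item: item[1][1])
    return [(gene, study, p) for gene, (study, p) in ranked[:top_n]]


if __name__ == "__main__":
    quoted = [  # p values as quoted in this Discussion
        ("DGKH", "Baum et al. 2008a", 1.5e-8),
        ("SORCS2", "Baum et al. 2008a", 1.4e-5),
        ("ANK3", "Ferreira et al. 2008", 9.1e-9),
        ("CACNA1C", "Ferreira et al. 2008", 7.0e-8),
        ("PBRM1", "Wellcome-Trust-Consortium 2007", 5e-4),
        ("PBRM1", "McMahon et al. 2010", 1.1e-8),
    ]
    for row in rank_genes(quoted):
        print(row)
```

Keeping only the best p value per gene favors genes that were tested in the largest samples, so a ranking of this kind says little by itself about replication.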
In chronological order, the first gene to note in Table 2 is the Diacylglycerol Kinase eta gene (DGKH). Its association with BD was considered the main finding of the first GWA study (Baum et al. 2008a). These scientists found a triad of SNPs in DGKH to be statistically associated with BD in both their “parent” and “replication” studies. The grand finale analysis in that report yielded p = 1.5 × 10⁻⁸ for the association with BD (Baum et al. 2008a). This is still considered a strong finding, particularly because DGKH encodes a protein in the lithium-sensitive phosphatidylinositol pathway. Two targeted gene association studies have replicated the DGKH finding (Ollila et al. 2009; Squassina et al. 2009), but another two have failed to replicate it (Tesli et al. 2009b; Yosifova et al. 2009). The jury is still out on DGKH.
The second major gene reported by Baum et al. (2008a) was SORCS2 on chromosome 4p16.1. A “hit” at 4p16.1 was of interest because this chromosome region had previously been associated with BD in a large consensus linkage study (Serretti and Mandelli 2008). The SORCS2 finding was also anchored by a triad of SNPs in both the parent study and the replication study of Baum and coworkers (2008a). Their grand finale analysis of the top SORCS2 SNP yielded p = 1.4 × 10⁻⁵ for association with BD (Baum et al. 2008a). The SORCS2 finding has also enjoyed multiple replications in the literature (Johnson et al. 2009; Ollila et al. 2009; Smith et al. 2009). It was first replicated by a convergent genome analysis using SNPs with p values < 0.05 within genes overlapping multiple bipolar samples (Monte Carlo simulation, P < 0.00001) (Johnson et al. 2009). It was later replicated by a gene-centric analysis (Baum et al. 2008a) of the Wellcome Trust Case Control Consortium data (WTCCC; Wellcome-Trust-Consortium 2007), of the data of Sklar et al. (2008), and of a case-control comparison of substance abusers who may share genes predisposing to BD (Johnson et al. 2009). The SORCS2 finding was also replicated in a European GWA study of BD (Smith et al. 2009) and in a targeted gene association study of a Finnish population by Ollila et al. (2009). In fact, we could find only one study that mentioned failing to associate SORCS2 with BD (Yosifova et al. 2009), although the failure of some studies to mention SORCS2 at all is worrisome. SORCS2 is also of interest because it encodes a vacuolar protein sorting 10 (VPS10) domain-containing receptor that appears to be strongly expressed in the central nervous system (CNS).
The second GWA study of BD was published by the WTCCC (Wellcome-Trust-Consortium 2007) (Table 2). The WTCCC studied BD alongside six other genetic disorders in people from 12 British geographical regions. In total, the WTCCC genotyped 16,179 individuals: roughly 2,000 cases per disease (n = 1,868 for BD) plus 2,938 shared controls. Their strongest overall signal for BD was at SNP rs420259 on chromosome 16p12 under a recessive model, with a genotype-wise p value of 6.29 × 10⁻⁸. This SNP lies close to the PALB2 gene (Partner And Localizer of BRCA2), which is involved in the stability of chromatin and the nuclear matrix. The PALB2 finding has support from one replication study (Yosifova et al. 2009) but not from another (Ollila et al. 2009).
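To clarify what a result “under a recessive model” means, the sketch below collapses genotypes into risk-allele homozygotes versus everyone else and applies a 1-degree-of-freedom Pearson chi-square test to the resulting 2 × 2 table. This is the textbook version only, not the WTCCC’s actual analysis code, and any counts passed to it here are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple

Genotypes = Tuple[int, int, int]  # (non-risk homozygotes, heterozygotes, risk homozygotes)


def recessive_test(cases: Genotypes, controls: Genotypes):
    """1-df Pearson chi-square test of risk homozygotes vs. all other genotypes.

    Returns (chi-square, two-sided p value, odds ratio for the risk homozygote).
    """
    a = cases[2]                   # cases homozygous for the risk allele
    b = cases[0] + cases[1]        # all other cases
    c = controls[2]                # controls homozygous for the risk allele
    d = controls[0] + controls[1]  # all other controls
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, chi-square equals z squared, so use the two-sided normal tail.
    p_value = 2.0 * NormalDist().cdf(-math.sqrt(chi2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio
```

Under this model heterozygotes are grouped with non-risk homozygotes, so a variant can look strongly associated recessively while showing little signal in an allelic test, which is why the model behind each p value matters.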
The third GWA study of BD was reported by Sklar and colleagues (Sklar et al. 2008) (Table 2). This was a multiplex article with a parent GWA study (1,461 BD1 cases and 2,008 controls) followed by two replication studies with fewer subjects and pre-selected SNPs. The first replication study was an FBAT using NIMH-derived samples (n = 409 trios). The second was a case-control association study using samples from the University of Edinburgh (UE: n = 365 cases, 351 controls). These investigators sought findings concordant across all three studies, which led them to highlight genes involving calcium: the DFNB31 gene at 9q32-q34 (deafness, autosomal recessive 31) and the CACNA1C gene at 12p13.3 (calcium channel, voltage-dependent, L-type, alpha 1C subunit). Our database notes that significant associations of DFNB31 with BD were actually observed earlier in the GWA study of Baum and coworkers (2008a) and in the WTCCC study (Wellcome-Trust-Consortium 2007). DFNB31 encodes a calcium/calmodulin-dependent serine kinase-interacting protein involved in forming scaffolds that facilitate synaptic transmission in the CNS (Yap et al. 2003). CACNA1C is discussed in more detail below.
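The first replication above used parent-child trios. The simplest trio-based test, the transmission disequilibrium test (TDT), which FBAT methods generalize, can be sketched as follows: among heterozygous parents, count how often each allele is transmitted to the affected child and compare the two counts with a McNemar-type statistic. This illustrates the principle only; it is not the FBAT software used by Sklar et al. (2008), and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def tdt(transmitted: int, untransmitted: int):
    """Transmission disequilibrium test for parent-affected-child trios.

    `transmitted`: times heterozygous parents passed the tested allele to the child.
    `untransmitted`: times they passed the other allele instead.
    Returns (McNemar-type chi-square with 1 df, two-sided p value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = 2.0 * NormalDist().cdf(-math.sqrt(chi2))
    return chi2, p_value
```

For example, 60 transmissions versus 40 non-transmissions give a chi-square of 4.0 (p ≈ 0.046). Because each child is compared with its own parents, the test is robust to population stratification, which is the main appeal of family-based designs.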
Ferreira and colleagues (Ferreira et al. 2008) have also reported a GWA study of BD (Table 2, line 4). Their article amassed three large collections of cases and controls, totaling 4,387 BD cases and 6,209 healthy controls. Their top finding was in the gene for Ankyrin-3, ANK3. The top SNP in this gene, rs10994336, yielded a strong association statistic of p = 9.1 × 10⁻⁹. This was supported by many nearby SNPs in ANK3 (including two with p < 5 × 10⁻⁸) and by haplotype analysis. The database shows that support for ANK3 has also come from many outside studies: seven independent GWA studies have reported some degree of positive association between ANK3 and BD, though at different SNPs (Lee et al. 2010; Purcell et al. 2009; Schulze et al. 2009). The Ankyrin-3 protein, also called Ankyrin-G, regulates inactivation gating of the neuronal sodium channel Nav1.6 (Shirahata et al. 2006). Overall, ANK3 seems to be achieving good consensus across multiple studies of BD.
Ferreira and colleagues (2008) also examined in depth the CACNA1C gene found earlier by Sklar and colleagues (Sklar et al. 2008). They found SNP rs1006737 in CACNA1C to be associated with BD at p = 7.0 × 10⁻⁸ (Ferreira et al. 2008). Not long afterwards, the same SNP was reported to confer risk for recurrent major depressive disorder (MDD) as well as for schizophrenia (Green et al. 2009). SNP rs1006737 has thus drawn attention in the psychiatric research community and may cross diagnostic boundaries, with allelic odds ratios similar to that found for BD (~1.15) (Ferreira et al. 2008). CACNA1C also aligns with biochemical studies pointing to a primary defect in BD involving ionic dysregulation (Dubovsky et al. 1989; Dubovsky et al. 1991). For the record, rs1006737 is not one of the better-known channelopathy variants of CACNA1C. Nonetheless, the fact that it lies within the same gene as those often-lethal variants has not gone unnoticed.
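An allelic odds ratio of about 1.15 corresponds to only a small shift in allele frequency. The sketch below computes an allelic odds ratio with Woolf’s log-scale 95% confidence interval from allele counts; the counts in the example are hypothetical (a risk-allele frequency of 35% in cases versus 32% in controls), not data from Ferreira et al. (2008).

```python
import math


def allelic_odds_ratio(case_alleles, control_alleles, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Each argument is (risk allele count, other allele count). Assumes no zero
    cells; returns (odds ratio, lower bound, upper bound).
    """
    a, b = case_alleles
    c, d = control_alleles
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical: 9,000 alleles per group, risk-allele frequency 35% vs 32%.
    print(allelic_odds_ratio((3150, 5850), (2880, 6120)))
```

With 9,000 alleles per group, those frequencies give an odds ratio of about 1.14 with a confidence interval of roughly 1.08 to 1.22; an effect this small is easy to miss in samples of a few hundred subjects.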
Another recent GWA study was reported by McMahon et al. (2010) (Table 2 line 9). These investigators reported a parent GWA study (n = 645 BD1 cases and n = 1,310 healthy controls) followed by a replication analysis of pooled genome-wide data from the WTCCC (Wellcome-Trust-Consortium 2007), the NIMH (Smith et al. 2009), the Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD) (Sklar et al. 2008), and some MDD cases from the Genetic Association Information Network MDD study (GAIN-MDD) (Sullivan et al. 2009). These two analyses were followed by three further replication studies of five targeted SNPs in two genes chosen from the parent study. Replication study-1 used a subset of BD cases and controls from a European-ancestry GlaxoSK sample (Scott et al. 2009). Replication study-2 used a “Lausanne GlaxoSK” MDD sample (n = 433 MDD; n = 916 healthy controls). Replication study-3 used a “Munich GlaxoSK” MDD sample (n = 866 MDD; n = 926 healthy controls). McMahon and colleagues concluded with a final meta-analysis (excluding the MDD cases). The top finding was for SNP rs2251219 in the PBRM1 gene on chromosome 3p21.1. This SNP was modestly associated with BD in the parent study (p = 2.0 × 10⁻³) and also in most of the replication studies (p values from 10⁻⁴ to 10⁻²). The exceptions were three samples: the NIMH sample (p = 0.231), targeted replication study-2 (p = 0.322), and targeted replication study-3 (p = 0.123). The final meta-analysis included 6,683 BD cases (mixed BD1 and BD2) and 9,068 healthy controls, and in it the top PBRM1 SNP achieved p = 1.1 × 10⁻⁸. PBRM1 encodes a protein involved in kinetochore localization during mitosis and may act as a tumor suppressor; it has been found mutated in breast cancer. The WTCCC had earlier also reported a nominally significant signal for PBRM1 at rs2878628 (p = 0.0005).
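McMahon and colleagues' final step combined evidence across samples. The text does not say which meta-analytic model they used, so the following is only a generic illustration: a standard inverse-variance, fixed-effect meta-analysis of per-study log odds ratios. The odds ratios and standard errors are hypothetical placeholders, not values from McMahon et al. (2010), and the function name `fixed_effect_meta` is our own.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios.

    Returns (pooled log OR, pooled standard error, two-sided p value).
    """
    if not log_ors or len(log_ors) != len(ses):
        raise ValueError("need one standard error per log odds ratio")
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies get more weight
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided normal p value
    return pooled, pooled_se, p


# Hypothetical per-study odds ratios and standard errors of log(OR); NOT real study data.
study_ors = [1.12, 1.08, 1.15, 1.02]
study_ses = [0.06, 0.05, 0.07, 0.08]

pooled, se, p = fixed_effect_meta([math.log(o) for o in study_ors], study_ses)
print(f"pooled OR = {math.exp(pooled):.3f}, SE = {se:.4f}, p = {p:.2g}")
```

The pooled standard error is smaller than that of any single study, which is the statistical reason pooling helps: effects that are only suggestive in each sample can reach much smaller p values when combined. A random-effects model would be a natural alternative when samples are heterogeneous, for example when BD1 and BD2 cases are mixed.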
We would also be remiss not to mention a series of GWA studies, beginning with studies of patients with schizophrenia, that reported associations for rs1344706 within the ZNF804A gene (zinc finger protein 804A). The evidence for ZNF804A has been strongest when the schizophrenia phenotype is broadened to include bipolar disorder (Williams et al. 2010). Meta-analysis provides evidence for the association of rs1344706 that surpasses widely accepted benchmarks of significance by several orders of magnitude, both for schizophrenia (p = 2.5 × 10⁻¹¹, odds ratio 1.10, 95% confidence interval 1.07–1.14) and for schizophrenia and bipolar disorder combined (p = 4.1 × 10⁻¹³, OR 1.11, 95% confidence interval 1.07–1.14) (Williams et al. 2010). The significance cut-off conventionally applied to GWA studies is p < 5 × 10⁻⁸. By this established cut-off, the best genome-wide findings for BD to date appear to reside in the DGKH, ANK3, PBRM1, and ZNF804A genes.
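The conventional justification for the p < 5 × 10⁻⁸ cut-off is a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests across the genome. As a minimal sketch, the following applies that threshold to the single-SNP p values quoted in this section. DGKH is omitted only because no p value for it is given here, and the ZNF804A value is for schizophrenia and BD combined; `genome_wide_significant` is an illustrative helper, not part of any published pipeline.

```python
# Bonferroni rationale: alpha = 0.05 spread over ~1 million independent tests.
ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # 5e-8

# Top single-SNP p values quoted in this section.
reported = {
    ("ANK3", "rs10994336"): 9.1e-9,
    ("CACNA1C", "rs1006737"): 7.0e-8,
    ("PBRM1", "rs2251219"): 1.1e-8,
    ("ZNF804A", "rs1344706"): 4.1e-13,  # schizophrenia and BD combined
}


def genome_wide_significant(pvals, threshold=GENOME_WIDE_THRESHOLD):
    """Return the (gene, SNP) keys whose p value falls below the threshold, sorted."""
    return sorted(key for key, p in pvals.items() if p < threshold)


for gene, snp in genome_wide_significant(reported):
    print(gene, snp)  # prints ANK3, PBRM1 and ZNF804A; CACNA1C is excluded
```

CACNA1C's rs1006737 (p = 7.0 × 10⁻⁸) misses the cut-off narrowly, which is why it is absent from the list of genome-wide findings even though it has replicated across disorders.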
Earlier simulations based on empirical HapMap data have been debated. In one early simulation, Nannya et al. (2007) estimated that a GWA study would have satisfactory statistical power to identify genetic associations with 1,000 cases and 1,000 controls. Others (Zaykin and Zhivotovsky 2005), however, argued that a genuine association would have just a 26% chance of ranking among the top 1,000 findings in both of two independent GWA studies, even when the power to detect the effect was set at 85%. According to this simulation, very few replications would have been expected. The fact that there are already a few replications (SORCS2, ANK3, DFNB51, CACNA1C, PBRM1, and ZNF804A) could therefore be viewed as highly encouraging. Nonetheless, the high expectations that greeted the first GWA studies may explain why many today view these findings as discouraging.
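To see why a real effect can fail to appear among the top-ranked findings of two scans, the following sketch uses a simple analytic model of our own; it is not Zaykin and Zhivotovsky's simulation and does not attempt to reproduce their 26% figure. It assumes a hypothetical scan of 500,000 null SNPs, treats "top 1,000" as exceeding the |z| cut-off that about 1,000 null SNPs would pass by chance, sets 85% power at a nominal α of 0.05 (our assumption), and treats the two studies as independent.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def noncentrality_for_power(power, alpha):
    """Mean of the z statistic giving `power` at two-sided level `alpha` (far tail ignored)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)


def prob_in_top_k(mu, n_null, k):
    """Approximate chance that a true signal with z-mean `mu` ranks in the top `k`
    by |z| among `n_null` null SNPs, using the |z| cut-off about k nulls exceed."""
    cutoff = STD_NORMAL.inv_cdf(1 - k / (2 * n_null))
    # P(Z + mu > cutoff) + P(Z + mu < -cutoff), with Z standard normal
    return STD_NORMAL.cdf(mu - cutoff) + STD_NORMAL.cdf(-cutoff - mu)


# Hypothetical scan: 500,000 null SNPs, 85% power at nominal alpha = 0.05.
mu = noncentrality_for_power(power=0.85, alpha=0.05)
p_one = prob_in_top_k(mu, n_null=500_000, k=1_000)
p_both = p_one ** 2  # two independent studies must both rank the SNP highly
print(f"one study: {p_one:.2f}, both studies: {p_both:.2f}")
```

Under these assumptions a true signal makes the top 1,000 in a single scan a little less than half the time, and in both scans only about one time in five. The exact value shifts with the assumed scan size and power, but squaring a per-study probability is what drives the chance of joint replication down.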
Given so many sources of candidate genes, how many genes are likely to underlie the inheritance of BD? One recent study comparing GWA studies of schizophrenia and BD (Purcell et al. 2009) estimated that hundreds or even thousands of different alleles may contribute to the polygenic inheritance of these illnesses. Others, however, point out that a polygenic model involving thousands of variants of weak effect is unlikely to account for the elevated relative risks observed for BD in families (Nurnberger 2008). It has been argued that familial risks can be explained only by variants with large effects or by an oligogenic model involving at most two to three loci (Bodmer and Bonilla 2008; Hemminki et al. 2008).
So, how might the field move forward? A noteworthy attribute of the field is its willingness to pool data from different sources (Johnson and O'Donnell 2009). The most replicated finding to date, ANK3 (Ferreira et al. 2008), could only have been achieved by pooling data from six different research teams: the WTCCC (Wellcome-Trust-Consortium 2007), STEP-BD (Nierenberg 2009), University College London (UCL), the University of Edinburgh (ED), Trinity College Dublin (DUB), and a second generation of STEP-BD called STEP2. Another study that pooled data from different centers highlighted 69 “gene clusters” that were nominally associated with BD in at least three previous GWA studies (Johnson et al. 2009). In summary, pooling of samples and data is one way the field may move forward (Manolio et al. 2007).
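Pooling helps mainly because the test statistic for a fixed effect grows with the square root of sample size. As a rough illustration, the following approximates the power of a two-sided allelic test at p < 5 × 10⁻⁸ for an allelic odds ratio of 1.15 (the value quoted earlier for CACNA1C), comparing the 1,000 cases and 1,000 controls of Nannya et al.'s simulation with the 4,387 cases and 6,209 controls pooled for ANK3. The control risk-allele frequency of 0.30 is a hypothetical choice, and the normal approximation is a simplification of a full power calculation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_allele_freq(p_control, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


def allelic_test_power(n_cases, n_controls, p_control, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic (allele-count) test, using the
    normal approximation to the difference in allele frequencies."""
    p_case = case_allele_freq(p_control, odds_ratio)
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls  # each person carries two alleles
    p_bar = (a_case * p_case + a_ctrl * p_control) / (a_case + a_ctrl)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / a_case + 1 / a_ctrl))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(p_case - p_control) / se  # expected z statistic
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


# Hypothetical control allele frequency 0.30; OR 1.15 as quoted for CACNA1C.
power_1000 = allelic_test_power(1_000, 1_000, p_control=0.30, odds_ratio=1.15)
power_pooled = allelic_test_power(4_387, 6_209, p_control=0.30, odds_ratio=1.15)
print(f"1,000/1,000: {power_1000:.4f}   4,387/6,209: {power_pooled:.2f}")
```

With these assumptions, power at genome-wide significance is well under 1% for 1,000 cases and 1,000 controls but rises to roughly one in five in the pooled sample; still larger pooled samples would be needed to detect a variant of this size reliably.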
Another way the field may move forward is through Convergent Functional Genomics (CFG), as described by Le-Niculescu and colleagues (Le-Niculescu et al. 2009; Niculescu and Le-Niculescu 2010a, b). In addition to p values from gene association studies, this approach uses multiple independent lines of evidence related to the illness, including mouse gene expression findings and human postmortem brain evidence. The CFG method integrates these diverse datasets to prioritize genes, in a manner similar to Google's PageRank algorithm. Genes prioritized by CFG may not have the most significant p values in any single GWA study, but they are found to generalize across independent cohorts. Because CFG integrates evidence at the level of genes rather than individual SNPs, it reduces heterogeneity across studies. It can be, and has been, used to mine GWA study datasets (Le-Niculescu et al. 2009) and to extract panels of top genes or biomarkers that reproduce well in independent cohorts. Studies that by themselves are relatively underpowered can be mined with CFG to yield results, by bringing to bear other large datasets and databases relevant to BD. We foresee similar approaches being useful for a future analysis of our database.
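The gene-centric idea can be illustrated with a toy example. This is not the published CFG scoring scheme and it does not implement PageRank: it simply collapses SNP-level p values to genes, gives one point per study in which any SNP in a gene is nominally significant (p < 0.05), and adds one point per additional line of evidence. All SNP names, genes, p values, and evidence labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical SNP-to-gene annotation and per-study SNP p values.
SNP_TO_GENE = {"rs_a1": "GENE_A", "rs_a2": "GENE_A", "rs_b1": "GENE_B", "rs_c1": "GENE_C"}
STUDY_PVALUES = {
    "study1": {"rs_a1": 0.003, "rs_b1": 0.0004, "rs_c1": 0.20},
    "study2": {"rs_a2": 0.01, "rs_b1": 0.40, "rs_c1": 0.30},
    "study3": {"rs_a1": 0.02, "rs_b1": 0.60, "rs_c1": 0.04},
}
# Hypothetical non-GWA lines of evidence per gene.
OTHER_EVIDENCE = {"GENE_A": {"mouse_expression", "postmortem_brain"},
                  "GENE_C": {"mouse_expression"}}


def gene_level_support(study_pvalues, snp_to_gene, nominal=0.05):
    """Count, per gene, the studies in which ANY mapped SNP is nominally significant."""
    support = defaultdict(set)
    for study, pvals in study_pvalues.items():
        for snp, p in pvals.items():
            gene = snp_to_gene.get(snp)
            if gene is not None and p < nominal:
                support[gene].add(study)
    return {gene: len(studies) for gene, studies in support.items()}


def convergent_score(study_pvalues, snp_to_gene, other_evidence):
    """Toy additive score: one point per supporting study plus one per evidence line."""
    genetic = gene_level_support(study_pvalues, snp_to_gene)
    genes = set(genetic) | set(other_evidence)
    scores = {g: genetic.get(g, 0) + len(other_evidence.get(g, ())) for g in genes}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for gene, score in convergent_score(STUDY_PVALUES, SNP_TO_GENE, OTHER_EVIDENCE):
    print(gene, score)
```

GENE_B has the single smallest p value but ranks last, while GENE_A is supported in every study through different SNPs, which mirrors the situation described earlier for ANK3.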
Among new analytical methods, a recent re-evaluation of the WTCCC dataset by “network analysis” has revealed possible two-locus epistasis associated with BD (Emily et al. 2009). Epistasis is a phenomenon whereby the effect of one genetic variant on a trait depends on the presence of another variant. Networks of genes might be explored more thoroughly as the field advances (O'Dushlaine et al. 2009; Peng et al. 2009; Walsh et al. 2008; Wang et al. 2007; Zamar et al. 2009). Under this model, a given variant might on its own confer only a small relative risk for the disease, yet act synergistically when co-inherited with other risk alleles in the same biological pathway to produce a strong predisposition. If this model holds, one logical corollary is that the likelihood of replicating any particular gene finding would depend not only on allele A in studies 1 and 2, but also on how closely the frequencies of key co-variants in genes B, C, and D match across studies.
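The replication corollary can be made concrete with a toy two-locus model. In the following sketch, which is our own simplified illustration rather than the model of Emily et al. (2009), risk is raised by a hypothetical factor of 3 only in people who carry both allele A and allele B, and the two loci are assumed to be inherited independently. The marginal relative risk observed for A then depends on how common B is in each study population.

```python
def marginal_relative_risk(interaction_rr, freq_b_carrier):
    """Marginal relative risk for carriers of allele A versus non-carriers under a toy
    two-locus model: baseline risk f0, raised to f0 * interaction_rr only when both
    A and B are carried. Loci are assumed independent (no linkage disequilibrium),
    so the baseline f0 cancels in the ratio."""
    if not 0.0 <= freq_b_carrier <= 1.0:
        raise ValueError("carrier frequency must lie in [0, 1]")
    # Risk among A carriers, averaged over B status (in units of f0).
    risk_a_carriers = (1 - freq_b_carrier) * 1.0 + freq_b_carrier * interaction_rr
    risk_a_noncarriers = 1.0  # without A, B alone does not raise risk
    return risk_a_carriers / risk_a_noncarriers


# Same hypothetical biology in two populations that differ only in B carrier frequency.
for label, q_b in [("study 1", 0.10), ("study 2", 0.40)]:
    print(label, round(marginal_relative_risk(3.0, q_b), 2))  # 1.2, then 1.8
```

The same underlying biology yields a marginal relative risk of 1.2 for A where 10% of people carry B, but 1.8 where 40% do, so a study adequately powered in one population could easily miss A in the other.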
There have also been papers highlighting the potential significance of interactomes in schizophrenia. At least two reports have described the DISC1 interactome (Camargo et al. 2007; Guo et al. 2009), as well as the DTNBP1 interactome (Guo et al. 2009). Although these studies concern schizophrenia, we mention them because interactome research in BD is likely to follow. The biological function of DISC1 remains poorly understood. Several reports have shown that DISC1 interacts with the products of two other genes, NDEL1 and PDE4B, suggesting roles for DISC1 in the regulation of neurodevelopment and of cAMP signaling pathways, respectively (Millar et al. 2005). Functional and pathway analyses also suggest that DISC1 may be closely linked to synapse function (Millar et al. 2003). The ‘DISC1 Interactome’ has been described as a network of 127 proteins and 158 interactions. The other published interactome (Guo et al. 2009) involves the interaction between DISC1 and dysbindin, the protein encoded by DTNBP1. These observations suggest that variants in DTNBP1 might affect dysbindin protein expression and thereby confer illness (Zinkstok et al. 2007). Together, they suggest that dysbindin and DISC1 lie in a common biochemical pathway of relevance to schizophrenia and BD.
There also remains the nagging possibility that the diagnosis of BD may not itself be an entirely suitable phenotype for genetic study. Type 1 of the disorder (BD1) is characterized by alternating episodes of mania and depression, whereas type 2 (BD2) involves milder manic episodes (hypomania) together with recurrent depressions. Both subtypes run together in pedigrees (Berrettini 2000). Moreover, MDD, schizoaffective disorder, and schizophrenia are represented in most families of BD probands (Berrettini 2000). BD may therefore be best considered a spectrum disorder. For future studies in this area, we suggest that sub-symptoms be characterized according to the Bipolar Disorder Phenome Database (Potash et al. 2007) or the work of Deo and colleagues (Deo et al. 2010).
Finally, we conclude by noting which findings stand out by their absence from our database: there is a paucity of positive genetic associations involving genes known to be involved in the action of antidepressants. Proponents of the monoaminergic theory of depression would need to dig very deeply into our database to find any evidence that monoaminergic genes are involved in the etiology of BD. This simple fact shows how open the field remains to new gene theories.
We thank Dr. Chindo Hicks of the Loyola University Medical Center and Dr. Elliott Gershon of the University of Chicago for useful comments on preliminary drafts of the manuscript. United States copyright protection for the database has been applied for.