Schizophrenia (SCZ) is a debilitating neuropsychiatric syndrome of unknown etiology with a lifetime prevalence of 4–5 per 1000 (McGrath et al., 2008). Its evolutionary persistence, in spite of a marked reproductive disadvantage, and the nearly uniform incidence of SCZ worldwide seem to point to a component of human genetic variation common to all populations (Huxley et al., 1964; Jablensky et al., 1992). If SCZ represents a “disease of humanity”, this could presuppose a role for variation associated with the speciation event(s) giving rise to Homo sapiens (Crow, 1997). It follows that the causal variants might reside in “genes” under selective pressure in the human or, to a lesser extent, primate lineages. Until recently, our ability to evaluate this hypothesis was limited by the resolution of comparative genomic studies. However, a newly published cross-species analysis has revealed numerous regions exhibiting signatures of rapid evolution in the human and primate lineages (Lindblad-Toh et al., 2011). Genes harboring or lying near these regions are enriched for immunity, developmental and, intriguingly, brain-related functions including axon guidance, extracellular signaling and receptor activity (Lindblad-Toh et al., 2011). We empirically evaluate the “disease of humanity” hypothesis by testing whether these 562 human-accelerated regions (HARs) and 576 primate-accelerated regions (PARs) are enriched for SCZ association signals.
Genome-wide association studies (GWAS) represent a powerful, unbiased approach to the study of common genetic variation in complex human disease. Besides a modest number of statistically significant signals, large GWAS are also known to yield many additional small and moderately sized signals distributed over the genome (Purcell et al., 2009). Variants mapping to causal pathways are expected to show a substantial enrichment of small p-values. Thus, observing a significant enrichment of small p-values for a putatively causal pathway supports the validity of the hypothesis, while failure to do so casts doubt on the pathway’s relevance to the trait of interest.
We evaluate the strength of the “disease of humanity” hypothesis by testing whether single nucleotide polymorphisms (SNPs) mapping to HARs and PARs are significantly enriched in low p-values from the discovery phase (n = 21,856) of the Psychiatric GWAS Consortium (PGC) GWAS of SCZ (Ripke et al., 2011). To this end, we used the summary statistics for the 1.25 million SNPs reported by the consortium. Additionally, we considered the possibility of enrichment of HARs/PARs in low p-values from GWAS of bipolar disorder (BD) (Sklar et al., 2011) and major depressive disorder (MDD) (Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium, 2012). Enrichment was assessed using (i) SNPs falling within HARs/PARs and (ii) the same set augmented, for each HAR/PAR with no SNP mapping within its bounds, by the closest SNP within 25 kb. Mappings of SNPs to HARs/PARs were based on physical positions (hg18) reported by the 29 Mammals Project (http://www.broadinstitute.org/scientific-community/science/projects/mammals-models/29-mammals-project).
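The two SNP sets described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name, the tuple layouts and the inclusive coordinate convention are assumptions made for illustration. Setting `flank=None` yields set (i); the default `flank=25_000` yields set (ii).

```python
from bisect import bisect_left, bisect_right


def map_snps_to_regions(snps, regions, flank=25_000):
    """Assign SNPs to accelerated regions (hypothetical helper).

    snps    : iterable of (snp_id, chrom, pos)
    regions : iterable of (region_id, chrom, start, end), bounds assumed inclusive
    flank   : for a region containing no SNP, also accept the single closest
              SNP within this many bases; None disables this step (set (i)).
    Returns a dict mapping region_id to a list of snp_ids.
    """
    # Build a per-chromosome index of sorted positions and matching SNP ids.
    index = {}
    for snp_id, chrom, pos in sorted(snps, key=lambda s: (s[1], s[2])):
        positions, ids = index.setdefault(chrom, ([], []))
        positions.append(pos)
        ids.append(snp_id)

    mapping = {}
    for region_id, chrom, start, end in regions:
        positions, ids = index.get(chrom, ([], []))
        lo = bisect_left(positions, start)   # first SNP with pos >= start
        hi = bisect_right(positions, end)    # first SNP with pos > end
        hits = ids[lo:hi]
        if not hits and flank is not None:
            # Region is empty: consider the nearest SNP on each side.
            candidates = []
            if lo > 0:
                candidates.append((start - positions[lo - 1], ids[lo - 1]))
            if hi < len(positions):
                candidates.append((positions[hi] - end, ids[hi]))
            candidates = [c for c in candidates if c[0] <= flank]
            if candidates:
                hits = [min(candidates)[1]]
        mapping[region_id] = hits
    return mapping
```

In this sketch a region with no SNP inside it and none within the flank maps to an empty list, so it contributes nothing to either SNP set.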
Of critical importance is how best to test for enrichment when the distribution of effect sizes of SNPs in pathways can vary widely between a few large signals and many smaller ones. Because no single test is expected to have optimal power across such a large parameter space, we employed two complementary approaches: (i) the Simes test and (ii) the sum of squares test (SST). Intuitively, the Simes test is useful for detecting pathways containing a few larger effects. In contrast, the SST is better suited to detecting pathways containing a greater number of small effects. The statistical significance of the SST was assessed via 50,000 permutations using linkage disequilibrium information for European subjects from the 1000 Genomes Project (Durbin et al., 2010).
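The contrast between the two statistics can be made concrete with a short sketch. The Simes combined p-value is the standard min over ranks of n·p(i)/i. For the SST, the sketch assumes the statistic is the sum of squared z-scores recovered from two-sided p-values; the exact form used in the study is not given here, and the linkage-disequilibrium-aware permutation step is not reproduced. Function names are illustrative.

```python
from statistics import NormalDist


def simes_p(pvalues):
    """Simes combined p-value: min over ranks i of n * p_(i) / i."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))


def sum_of_squares(pvalues):
    """Sum of squared z-scores implied by two-sided p-values.

    Assumption: each p-value lies in (0, 1] and comes from a two-sided
    z-test, so |z| = -Phi^{-1}(p / 2). The sign is irrelevant after squaring.
    """
    standard_normal = NormalDist()
    return sum(standard_normal.inv_cdf(p / 2.0) ** 2 for p in pvalues)
```

A single strong signal among otherwise null SNPs drives `simes_p` down sharply, whereas `sum_of_squares` grows when many SNPs each carry a modest signal, which mirrors the complementary power profiles described above.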
Because large GWAS are expected to yield an excess of small p-values (Purcell et al., 2009), a pathway consisting of randomly chosen SNPs might still exhibit significant enrichment when tested against the weaker (self-contained) null hypothesis of no association. Therefore, we tested against the stronger (competitive) null hypothesis that SNPs in HARs/PARs are no more enriched in small p-values than the PGC background. We found that SNPs mapping to HARs and PARs were not enriched in small p-values irrespective of disorder or statistical test applied (Table 1). Even when testing against the weaker null hypothesis of no association, we found slight enrichment (Simes p-value = 0.01) only for PARs in BD when including the most proximal SNP within 25 kb (data not shown). The arguably more relevant HARs were not enriched in low p-values even under the weaker null hypothesis.
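The logic of a competitive test can be sketched as a comparison of the observed SNP set against random sets of the same size drawn from the genome-wide background. This is a deliberately simplified illustration and not the procedure used in the study: it ignores linkage disequilibrium, and the statistic `sum_neg_log10`, the function names and the parameters are all hypothetical stand-ins.

```python
import math
import random


def sum_neg_log10(pvalues):
    """Stand-in enrichment statistic: larger means more small p-values."""
    return sum(-math.log10(p) for p in pvalues)


def competitive_p(set_pvalues, background_pvalues, statistic=sum_neg_log10,
                  n_draws=1000, seed=0):
    """Empirical p-value against the competitive null (simplified).

    Draws n_draws random SNP sets of the same size from the background and
    counts how often their statistic is at least as large as the observed
    one. Assumes independent SNPs, which real GWAS data violate because of
    linkage disequilibrium.
    """
    rng = random.Random(seed)
    observed = statistic(set_pvalues)
    size = len(set_pvalues)
    exceed = sum(
        statistic(rng.sample(background_pvalues, size)) >= observed
        for _ in range(n_draws)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + exceed) / (1 + n_draws)
```

Because the reference distribution is built from the GWAS background itself, a genome-wide excess of small p-values inflates the random sets as much as the tested set, which is what separates this null from the self-contained one.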
The human- and primate-accelerated regions of evolutionary constraint reported by Lindblad-Toh et al. are not detectably enriched in SCZ causal variants. However, as noted by the authors, SNPs in or near these constrained elements were fewer in number and of lower allele frequency. Thus, the power to detect unobserved causal variants using common SNPs may be limited for these regions. A more comprehensive approach might jointly consider rarer and structural variants together with common variants. However, such a treatment must await the future availability of data from very large sequencing studies.