Recent advances in cataloging human genetic polymorphisms, together with the decreasing cost of high-throughput SNP genotyping and the development of statistical methods to analyze large sample sets rigorously, have made genome-wide association studies (GWAS) a feasible method for genetic studies of complex disorders [84]. Based on the hypothesis that part of the genetic susceptibility to common diseases may be caused by common genetic variants that arose early in human history and are therefore shared across members of a population derived from a common set of ancestors (the common disease, common variant or CDCV hypothesis), this approach has been remarkably successful, with over 150 common variants identified within the past 2 years [84].
Genome-wide association studies provide several advantages over linkage and candidate gene studies. With a sufficiently large sample, GWAS can overcome the limited power of linkage analysis to detect common alleles with low penetrance. GWAS can also localize associations to much smaller DNA regions than linkage studies can. Linkage analyses rely on the relatively few recombination events that occur within the few generations of a pedigree, and therefore yield large linked regions. GWAS, in contrast, rely on historical recombination events that have accumulated in populations over the course of human history, resulting in much smaller regions of association. Furthermore, a major advantage of GWAS over candidate gene studies is that GWAS assume no prior biological knowledge of the disease process, but instead test for association across the whole genome in an agnostic approach. Newer GWAS genotyping platforms also contain copy-number probes, allowing both SNPs and copy-number variants (CNVs) to be examined in a single experiment [99].
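A rough, idealized calculation illustrates the difference in resolution. Assuming the Haldane model (crossovers occur as a Poisson process at about one per Morgan per meiosis, with no interference), a segment shared around a disease locus after g meioses ends at the nearest crossover on each side. Each of these two distances is exponentially distributed with mean 1/g Morgans, so the expected length of the shared segment is approximately:

$$
\mathbb{E}[\text{shared segment length}] \approx \frac{2}{g}\ \text{Morgans} = \frac{200}{g}\ \text{cM}
$$

With g of roughly 4 to 10 meioses separating relatives in a typical pedigree, this gives about 20 to 50 cM, whereas haplotypes separated by thousands of meioses in a population share segments well under 1 cM. This sketch ignores recombination hotspots, crossover interference, and demographic history, and is meant only to show how region size scales with the number of meioses.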
For a GWAS it is critical that the sample be large enough to provide sufficient statistical power to detect an association at the stringent significance threshold needed to control Type I (false-positive) error across the many markers tested. The power of a GWAS to detect an association depends on both the allele frequency of the disease variant and its effect size, expressed as a genotype relative risk (GRR). The GRR is the ratio of the probability of disease in a person carrying a specific risk genotype to the probability of disease in a person without the risk allele. Larger sample sizes are required to detect alleles of low frequency or of small effect.
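Written out for a single biallelic locus (a sketch; the symbols are notation chosen here, not taken from the cited studies), let A be the risk allele with population frequency p, a the non-risk allele, and f_aa, f_Aa, f_AA the penetrances (probabilities of disease) of the three genotypes. Under the multiplicative model used for Figure 2, each copy of A multiplies risk by the same factor γ. Assuming Hardy-Weinberg proportions, the disease prevalence K and the risk-allele frequency among cases then follow:

$$
\mathrm{GRR}_{Aa} = \frac{f_{Aa}}{f_{aa}} = \gamma, \qquad
\mathrm{GRR}_{AA} = \frac{f_{AA}}{f_{aa}} = \gamma^{2}
$$

$$
K = f_{aa}\left[(1-p)^{2} + 2p(1-p)\gamma + p^{2}\gamma^{2}\right] = f_{aa}\,(1 - p + p\gamma)^{2},
\qquad
p_{\text{case}} = \frac{p\gamma}{1 - p + p\gamma}
$$

The case allele frequency therefore exceeds the population frequency by p(1 - p)(γ - 1)/(1 - p + pγ). For a rare disease the control frequency is close to p, so this difference, which is what an association test detects, grows with γ and, for small p, with p.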
As can be seen in Figure 2, for GRR = 1.2 the number of cases and controls needed to detect a causative allele decreases from about 23,000 at a minor allele frequency (MAF) of 5% to ~8,000 at MAF = 20% and to ~6,000 at MAF = 40%. The effect of GRR on study size is even greater: at MAF = 10% and GRR = 1.1, about 47,000 cases and controls are needed, but as GRR increases to 1.2, 1.3, 1.5, 1.7, and 2.0, the required number of individuals decreases to ~12,500, ~6,000, ~2,300, ~1,300, and ~700, respectively. For instance, a GWAS of age-related macular degeneration (AMD) with only 96 cases and 50 controls was able to detect an association with a common variant in the complement factor H gene (CFH) at a nominal P value < 10⁻⁷. The power to detect this association was due to the large GRR of the CFH variant: carrying two risk alleles increased an individual's risk of developing AMD by a factor of 7.4 [101]. On the other hand, detecting alleles with modest effect sizes may require tens of thousands of individuals, as demonstrated by studies of human height variation with a combined sample size of ~63,000 individuals [102]. These studies found 54 variants, each with an average effect size of about 0.4 cm per allele, indicating that even larger samples may be needed to detect the common, small-effect alleles responsible for the residual population variance [105].
Figure 2. The number of cases and controls required in an association study to achieve 80% power across a range of minor allele frequencies (MAF) and genotype relative risks (GRR) for a multiplicative model. GTS prevalence was set at 2% (1% GTS genes + 1% OCD genes).
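The dependence of study size on MAF and GRR can be sketched with a standard normal-approximation calculation for an allelic (two-proportion) test. This is not the method used to produce Figure 2; the significance threshold (here the conventional genome-wide 5 × 10⁻⁸), the choice of test, and the assumption that the minor allele is the risk allele are all assumptions of this sketch. Only the 80% power, the multiplicative model, and the 2% prevalence are taken from the figure caption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_control_allele_freqs(p, grr, prevalence):
    """Risk-allele frequencies (cases, controls) for one biallelic locus.

    Assumes Hardy-Weinberg proportions and a multiplicative model:
    penetrances f0, f0*grr, f0*grr**2 for 0, 1, 2 risk alleles, with the
    baseline f0 chosen so that the population prevalence matches.
    """
    q = 1.0 - p
    f0 = prevalence / (q + p * grr) ** 2  # baseline penetrance (0 risk alleles)
    pens = (f0, f0 * grr, f0 * grr ** 2)
    if pens[2] > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    geno = (q * q, 2 * p * q, p * p)  # HWE frequencies of 0, 1, 2 risk alleles
    case = [g * f / prevalence for g, f in zip(geno, pens)]
    ctrl = [g * (1 - f) / (1 - prevalence) for g, f in zip(geno, pens)]
    # Allele frequency = half the heterozygote frequency + the homozygote frequency
    return case[1] / 2 + case[2], ctrl[1] / 2 + ctrl[2]


def cases_needed(p, grr, prevalence=0.02, alpha=5e-8, power=0.80):
    """Cases (with an equal number of controls) for a two-sided allelic test.

    Normal approximation to the two-proportion test on allele counts;
    each individual contributes two alleles.
    """
    if grr == 1:
        raise ValueError("no effect: GRR must differ from 1")
    p1, p0 = case_control_allele_freqs(p, grr, prevalence)
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # about 5.45 for alpha = 5e-8
    z_b = z.inv_cdf(power)
    pbar = (p1 + p0) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    alleles_per_group = num / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)


if __name__ == "__main__":
    for maf in (0.05, 0.10, 0.20, 0.40):
        print(f"MAF={maf:.2f}, GRR=1.2: ~{cases_needed(maf, 1.2):,} cases")
```

Under these assumptions, `cases_needed(0.10, 1.2)` comes out at roughly 12,000 cases, the same order of magnitude as Figure 2, and the required size falls as either MAF or GRR rises. Exact agreement with the figure is not expected, since its significance threshold and test statistic are not given here.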
As described above, GWAS are designed specifically to detect association with common disease variants (typically those with allele frequencies ≥5% in a population) [106]; GWAS have essentially no power to detect the effects of multiple rare variants, either in the same gene or in different genes (the multiple rare variants hypothesis) [107]. However, the ability of newer platforms to detect CNVs offers a promising parallel approach, since these variants appear to exist in both common and rare forms and have been shown to contribute to other psychiatric disorders such as autism [62] and schizophrenia [65]. In addition, rare, highly penetrant CNVs can be detected in relatively small samples, as demonstrated by studies of schizophrenia that included about 150 individuals [63].
What has become evident in GWAS of other diseases is that large-scale, multi-center collaborations are needed to obtain samples large enough to provide adequate power. A recent genome-wide association analysis of bipolar disorder, which combined three datasets totaling over 4,300 cases and 6,200 controls [109], detected two markers with genome-wide significant association. Combining the datasets was instrumental in providing enough power to reach genome-wide significance, since neither association was detected in the individual samples.
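Why pooling matters can be sketched by computing the approximate power of an allelic test at a genome-wide threshold for a pooled sample versus one of three equal parts of it. The allele frequencies below (30% in controls, 34% in cases) are purely hypothetical and are not the values reported for the bipolar disorder study; the sketch also assumes the datasets are homogeneous enough to be analyzed as a single sample, and `allelic_test_power` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, p_case, p_ctrl, alpha=5e-8):
    """Approximate power of a two-sided allelic (two-proportion) test."""
    n1, n0 = 2 * n_cases, 2 * n_controls  # allele counts in each group
    pbar = (n1 * p_case + n0 * p_ctrl) / (n1 + n0)
    se_null = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n0))
    se_alt = math.sqrt(p_case * (1 - p_case) / n1 + p_ctrl * (1 - p_ctrl) / n0)
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    # Only the tail in the direction of the true effect is counted; the
    # opposite tail contributes negligibly at a genome-wide threshold.
    return z.cdf((abs(p_case - p_ctrl) - z_a * se_null) / se_alt)


# Hypothetical effect: risk-allele frequency 0.34 in cases, 0.30 in controls.
pooled = allelic_test_power(4300, 6200, 0.34, 0.30)
one_third = allelic_test_power(4300 // 3, 6200 // 3, 0.34, 0.30)

if __name__ == "__main__":
    print(f"pooled sample:      power = {pooled:.2f}")
    print(f"one of three parts: power = {one_third:.2f}")
```

With these illustrative numbers the pooled sample has roughly 75% power, while each third has only a few percent, which is the pattern described above: an effect that is essentially undetectable in any single dataset can reach genome-wide significance once the datasets are combined.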
In collaboration with other investigators, the TSAICG has undertaken a GWAS for GTS. The GTS GWAS should help identify short genomic segments (either SNPs or CNVs) harboring susceptibility genes for GTS. Once these genes are identified, research can be initiated to elucidate the biological pathways and processes influencing the development of the GTS phenotype. These pathways will hopefully reveal cellular and molecular mechanisms previously unsuspected in GTS pathogenesis, and thus could help to develop new treatment paradigms to significantly reduce the suffering experienced by individuals with GTS and related disorders.