Despite extensive efforts and a vast amount of resources spent, the genetic basis of common, multifactorial disorders remains poorly understood. Following two decades of research with the candidate-gene approach and thousands of relevant publications, very few genetic regions and genes can reliably be considered as true positives; the majority of gene-disease associations claimed by genetic association studies have shown inconsistency and non-reproducibility.1
More recently, the transition to genome-wide association studies (GWAS) has provided a new conceptual framework in the search for variants underlying common disorders.2 Rather than focusing on biological candidate genes, the genome is screened without any prior predilection for specific regions, genes, or variants thereof. Based on this interrogation of genetic variation on a genome-wide basis, GWAS have been characterized as “hypothesis-free” or “agnostic” approaches.3,4 The “hypothesis-free” basis of GWAS offered the opportunity to overcome difficulties and obstacles imposed by the incomplete understanding of disease pathophysiology.
The initial successes of GWAS provided proof of concept and dispelled the skepticism about the viability of this approach. Previously unsuspected genes and pathways were uncovered, whereas some surprising associations of intergenic regions with common disorders (such as the 9p21.3 locus with coronary artery disease or the 8q24 locus with cancer) posed unprecedented questions regarding the mechanistic role of genetic variation.5 Variants in such gene-depleted areas would never have been considered by the candidate-gene approach.
Much of the excitement that GWAS brought to the scientific community was based on the expectation that since these studies are “hypothesis-free”, and thus independent of the preexisting bias of traditional biology, a comprehensive description of the genetic causes of complex disease would become feasible. Yet the available output from these expensive experiments explains only a small fraction of disease heritability, and the modest genetic effects detected to date are of little predictive value.6,7 The associated polymorphisms discovered are often likely to be tagging markers rather than the culprit functional genetic variation. Given these tough realizations, a series of methodological concerns regarding the design, analysis and interpretation of GWAS, as well as the translation of their findings, have been raised, and strategies to address these issues and accelerate progress in the field have been proposed.8–10
However, it probably remains underappreciated that GWAS, despite their “hypothesis-free” label, depend on underlying hypotheses, which account for important limitations of this approach and could explain why the information derived from GWAS is incomplete. Characterizing these experiments as “hypothesis-free” or “agnostic” can be misleading and disregards the fact that the output of any biological experiment is primarily determined by the extent to which the hypotheses tested truly hold. Although not explicitly stated, GWAS are based on a priori hypotheses, dictated by the design of genotyping platforms or the analysis methodologies:
The design of current GWAS makes them suitable mainly for the discovery of common variants conferring low/moderate risks, in the context of the common disease/common variant hypothesis (CDCV).11 Given the current near impossibility of reliably detecting effects of rare alleles (owing to sample size and sequencing constraints), most GWAS have analyzed single nucleotide polymorphisms (SNPs) with minor allele frequencies of more than 5%. Rare variants, with frequencies lying somewhere between the limits of deleterious mutations and polymorphic variations (i.e. 0.1–1%), are not examined by current GWAS. According to an alternative hypothesis (common disease/rare variant hypothesis), complex traits are caused collectively by multiple rare variants with moderate to high penetrance.12 Evolutionary theories, empirical evidence and data from the HapMap project support this hypothesis.13 Because of their low frequency and individually small contributions to the overall inherited disease susceptibility, rare variants will not be detectable by population association studies based on the use of linked polymorphic markers. Consequently, if the CDCV hypothesis does not hold, or at least holds only in some of the cases, then the “missing heritability” will not be tracked by GWAS. Rare variants are more likely to be detected by extensive resequencing of carefully selected candidate genes in relatively large numbers of carefully chosen cases, together with a thorough analysis of the functional effects of any suspected variants.14 Multiple rare pathogenic variants have been detected by this approach.15,16 However, candidate gene sequencing will probably not suffice to identify important rare variants, and sequencing on a whole-genome basis will be required.9 It is anticipated that advances in genotyping technologies and novel genetic variation maps that capture rare variants (1,000 Genomes Project17) will make whole-genome searches for rare variants feasible.
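The sample-size constraint mentioned above can be illustrated with a rough calculation. The following is a minimal sketch, not a method taken from the studies cited: it assumes a simple allelic case-control comparison under a normal approximation, equal numbers of cases and controls, a directly genotyped variant, and a hypothetical odds ratio of 1.3. The function names and all parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = control_freq / (1.0 - control_freq) * odds_ratio
    return odds / (1.0 + odds)


def cases_needed(maf: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (with as many controls) needed to
    detect an allele-frequency difference with a two-sided z-test.

    Assumptions (illustrative only): two alleles per individual, equal
    group sizes, normal approximation, no genotyping error.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1.0 - alpha / 2.0)
    z_beta = nd.inv_cdf(power)
    p0 = maf
    p1 = case_allele_freq(maf, odds_ratio)
    variance_sum = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    alleles_per_group = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)


if __name__ == "__main__":
    # Hypothetical odds ratio of 1.3 at decreasing minor allele frequencies.
    for maf in (0.30, 0.05, 0.005):
        print(f"MAF {maf:>6}: ~{cases_needed(maf, 1.3):,} cases")
```

Holding the effect size fixed, the required sample grows steeply as the minor allele frequency falls. The sketch ignores indirect tagging through linked markers, which would reduce power further for rare variants.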
SNPs are the markers that have been selected to serve as the tools for pinpointing the genetic determinants of complex traits. This choice was mainly based on their abundance (over 12 million SNPs) and the technological facilities for their high-throughput analysis. Based on the current design of GWAS, other important sources of variation are discarded, such as structural variants, noncoding RNAs or epigenetic changes. The most common types of structural variants are gains and losses of DNA, called copy number variants (CNVs), which likely exert important phenotypic effects on gene expression and function.18 Several CNVs have been shown to be implicated in common disorders, as both rare and common genomic changes. Most of the associations between CNVs and complex disorders reported so far have been unveiled through candidate gene or candidate region approaches.18 Despite some successful examples of SNP-tagged CNV association detections,19,20 the first wave of GWAS has largely missed the contribution of CNVs. This occurs because CNVs are not easily tagged by SNPs (SNPs within CNVs tend to deviate from Hardy-Weinberg equilibrium and are thus eliminated during genotype quality-control checks), might have a wide range of copy number variability, and often fall in genomic regions not well covered by genotyping platforms or not genotyped by the HapMap project.21 Recently developed genotyping chips allow the simultaneous interrogation of SNPs and CNVs across the genome with improved coverage.22 The genome-wide detection of rare CNVs, as well as the capacity to discriminate variants with a wide range of variability in copy number, represent important challenges for genotyping technology research.
Since the global inventory of CNVs remains incomplete, and with limited empirical data currently available, the extent to which current arrays capture the structural variome and enable a more comprehensive elucidation of the spectrum of genomic variability remains unclear.18,21 Even under optimal design, genome-wide SNP screens will still not be informative about the role of other components of the regulatory architecture of the genome, such as noncoding RNAs or epigenetic modifications (somatically-acquired or trans-generationally inherited); complementary techniques, such as high-throughput methylome analysis, will ultimately be needed.23,24
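The quality-control step that removes SNPs lying within CNVs can be made concrete with a Hardy-Weinberg check. The sketch below is an illustration under stated assumptions, not the procedure of any particular study: it uses a one-degree-of-freedom chi-square goodness-of-fit test, the genotype counts are invented, and the exclusion cut-off of 1e-6 is a hypothetical value (actual thresholds vary between studies). A hemizygous deletion makes carriers appear homozygous for the remaining allele, which produces the heterozygote deficit shown in the second example.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, p-value). Counts are the observed
    numbers of AA, AB and BB genotype calls.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic marker: HWE test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    QC_THRESHOLD = 1e-6  # hypothetical cut-off, for illustration only
    examples = {
        "SNP in a normal two-copy region": (250, 500, 250),
        "SNP overlapping a deletion (heterozygote deficit)": (300, 400, 300),
    }
    for label, counts in examples.items():
        chi2, p_value = hwe_chi_square(*counts)
        verdict = "excluded" if p_value < QC_THRESHOLD else "kept"
        print(f"{label}: chi2={chi2:.1f}, p={p_value:.2e} -> {verdict}")
```

A filter of this kind is applied to protect against genotyping artifacts, but it also discards exactly those markers whose genotype distribution is distorted by an underlying copy number change.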
Most GWAS to date have used a single-locus analysis strategy, in which each variant is tested individually for association with a specific phenotype, assuming independent effects, despite the axiomatic consideration that complex disorders are caused by the complex interplay between multiple genetic and environmental factors.25 A genetic variant that functions primarily through such a complex mechanism might be missed in a GWAS if the variant is examined in isolation without allowing for its potential interactions with other unknown factors. Many SNPs have been shown to have small individual effect sizes, but their combined effect may be much larger.26 Unfortunately, these useful signals are embedded in a genome-wide sea of noise. Given the large multiple-testing burden, very few signals exceed the genome-wide significance threshold, and those that do not exceed this stringent statistical requirement are generally neglected. Exploration of interactions has been the exception rather than the rule in GWAS analyses, and only a few small-scale examples are available.27 A search for epistatic effects is computationally demanding, as several billion SNP pairs exist for typical genotyping chips, and gene-environment interactions pose formidable methodological challenges. Recently, several methods and software packages have been developed for the consideration of the statistical interactions between loci26,27 and between genes and environmental factors.28 The application of these methodologies is expected to enhance the power to detect associations and to provide further insight into the biological and biochemical pathways of disease.26 Although the detection and interpretation of these interactions will not be straightforward, genome-wide interaction testing is the natural next step after the single-locus univariate analysis techniques.
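The scale of the multiple-testing problem can be worked out directly. The sketch below is simple arithmetic under stated assumptions: the chip sizes are hypothetical round numbers, a Bonferroni correction at a family-wise error rate of 0.05 is assumed, and all tests are treated as independent, which real correlated markers are not.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs in an exhaustive two-locus scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical chip sizes, chosen only to show the order of magnitude.
    for n_snps in (100_000, 500_000, 1_000_000):
        n_pairs = pairwise_tests(n_snps)
        print(
            f"{n_snps:>9,} SNPs: "
            f"single-locus threshold {bonferroni_threshold(n_snps):.1e}, "
            f"{n_pairs:,} pairs, "
            f"pairwise threshold {bonferroni_threshold(n_pairs):.1e}"
        )
```

With one million independent single-locus tests, the corrected threshold is 5e-8, the value commonly used as genome-wide significance. An exhaustive pairwise scan multiplies the number of tests by several orders of magnitude and lowers the per-test threshold accordingly, which is why interaction signals are even harder to separate from noise.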
In conclusion, GWAS represent one step forward towards a more complete understanding of the genetic architecture of complex disorders. Unbiased by prior pathophysiological assumptions, more than 350 GWAS have substantially changed the landscape of genetic associations for more than 80 traits and provided new insights into disease mechanisms.4,5,29 The independence of GWAS from the need for biological candidates has led to their characterization as “hypothesis-free” or “agnostic” experiments. However, these terms can be misleading; in essence, GWAS are based on prior hypotheses, implicitly determined by the statistical analysis or the design of the genotyping array. GWAS to date have largely focused on the role of common SNPs, and probably most of the heritability of complex disorders captured by these variants has been accounted for.6 The remaining variance may be due to rare SNPs, structural variants, epigenetic effects, other unsuspected genomic mechanisms, or gene-gene and gene-environment interactions that have not been adequately modeled.9,27
In the recent past, other avenues of research (genome-wide linkage scans and microarray expression profiling experiments)30,31 had been celebrated as “hypothesis-free” and unbiased approaches, considered to be liberated from the need for restrictive presumptions about the causative molecular mechanisms of disease. Nevertheless, these techniques fell short of providing a comprehensive picture of the genetics of complex disorders. It is important to acknowledge the underlying assumptions of GWAS, since in fact all biological research is based on some hypotheses, although they may not always be explicitly stated.32 Further implicit hypotheses about what is actually interrogated in genome-wide screens may also reveal themselves as the understanding of the molecular mechanisms progresses. Unless the inherent limitations of GWAS are addressed, and complementary genetic analysis methods, such as targeted or whole-genome sequencing, are implemented, the full advantage of genomic scans of human variation will not be realized.
Georgios Kitsios is a Pfizer Post-Doctoral Fellow in Clinical Research. Scientific support for this project was provided through the Tufts Clinical and Translational Science Institute (Tufts CTSI) under funding from the National Institutes of Health/National Center for Research Resources (UL1 RR025752). Points of view or opinions in this paper are those of the authors and do not necessarily represent the official position or policies of the Tufts CTSI.