For 20 years, genetic linkage combined with positional cloning has offered a rational and increasingly straightforward route to finding gene mutations that lead to monogenic disease, such as cystic fibrosis and Huntington’s disease (see the Glossary). With a few important exceptions, these searches have led to mutations that alter the amino acid sequence of a protein and that enormously increase the risk of disease.
During the past few years, genomewide association studies have identified a large number of robust associations between specific chromosomal loci and complex human disease, such as type 2 diabetes and rheumatoid arthritis1 (Fig. 1). This approach relies on the foundation of data produced by the International Human HapMap Project and the fact that genetic variance at one locus can predict with high probability genetic variance at an adjacent locus, typically over distances of 30,000 base pairs of DNA2 in the human genome, which contains about 3×10⁹ base pairs. This haplotypic structure of the human genome means that it is possible to survey the genome for common variability associated with the risk of disease simply by genotyping approximately 500,000 judiciously chosen markers in the genome of several thousand case subjects and control subjects.3 Consequently, it is now routine to identify common, low-risk variants (i.e., those that are present in more than 5% of the population) that confer a small risk of disease, typically with odds ratios of 1.2 to 5.0.4
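The two quantities behind this paragraph, the odds ratio for a risk allele and the degree to which one locus predicts another, can be illustrated with a minimal Python sketch. The function names, the allele counts, and the haplotype frequencies below are hypothetical and chosen only for illustration. The confidence interval uses the standard log (Woolf) approximation, and the correlation between loci is the conventional r² measure of linkage disequilibrium; neither is taken from the studies cited here.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for the risk allele from a 2x2 table of allele counts."""
    return (case_risk * control_other) / (case_other * control_risk)


def woolf_confidence_interval(case_risk, case_other, control_risk,
                              control_other, z=1.96):
    """Approximate 95% confidence interval on the odds ratio (Woolf method)."""
    log_or = math.log(
        allelic_odds_ratio(case_risk, case_other, control_risk, control_other)
    )
    # Standard error of the log odds ratio for a 2x2 table.
    se = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return (d * d) / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical allele counts: 2000 case subjects and 2000 control subjects,
# each contributing two alleles.
counts = (1200, 2800, 1000, 3000)
odds_ratio = allelic_odds_ratio(*counts)
lower, upper = woolf_confidence_interval(*counts)
print(f"odds ratio = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Hypothetical haplotype frequencies: a marker that perfectly tags a variant,
# and a marker that is statistically independent of it.
print(f"r^2, perfect tag = {ld_r_squared(0.30, 0.30, 0.30):.2f}")
print(f"r^2, independent = {ld_r_squared(0.09, 0.30, 0.30):.2f}")
```

With these illustrative counts the odds ratio is about 1.29, at the low end of the range quoted above. An r² near 1 means that genotyping the marker is nearly as informative as genotyping the variant itself, which is why a judiciously chosen subset of markers can stand in for much of the common variation in the genome.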
The platform that is used to genotype markers in genomewide association studies and related approaches has uncovered a startling degree of structural genomic variation. Although such variants were known to be causes of rare monogenic disorders,5,6 the extent of structural genomic variation among persons was largely unanticipated, and there is increasing interest in understanding how such variants may confer a risk of common diseases.7,8
The initial contention surrounding the viability of genomewide association studies has largely subsided. However, discussion has centered on evaluating how far such studies will take us in understanding the risks and causes of disease — and thus the time and resources that should be invested in genotyping more case subjects with any one disease to garner what many see as diminishing genetic returns. These issues are discussed in three Perspective articles in this issue of the Journal.9–11 Nonetheless, the current phase of rapid discovery is a remarkable change that ends a long period of frustration, when the investigation of the genetic causes of complex diseases could boast few successes. The data from genomewide association studies and emerging sequencing techniques offer a route to the dissection of genetic causes of human disease (Table 1).12–21 Here we describe this route and some of its challenges.
Genomewide association studies identify loci and not genes per se and cannot easily identify loci at which there are many rare risk alleles in any given population.22 Rather, this approach is designed to find loci that fit the common disease–common variant hypothesis of human disease23,24 (Table 2). Refinement of susceptibility loci and the identification of causal variants may be achieved through fine mapping (see the Glossary).
One finding that has taken many observers by surprise is that most loci that have been discovered through genomewide association analysis do not map to amino acid changes in proteins. Indeed, many of the loci do not even map to recognizable protein open reading frames but rather may act at the RNA level by altering either transcriptional or translational efficiency. They are thus predicted to affect gene expression. These effects may be quite varied, including temporal and spatial effects, and may be broadly characterized as those that alter transcript levels in a constitutive manner, those that modulate transcript expression in response to stimuli, and those that affect splicing.
Therefore, there are two clear and immediate tasks: to develop an understanding of the genetics of gene expression and to identify disease-linked variants that are too rare to be picked up by association methods and yet have risk alleles of sufficient “strength” to allow detection with the use of linkage strategies (see the Glossary for descriptions of genetic association and genetic linkage). Meeting these challenges will serve efforts to better understand environmental influences on the causes of disease and may facilitate a systems-based understanding of disease, in which we come to understand the full, molecular network that is perturbed in disease.