The relatively new field of genetic epidemiology has witnessed some exciting leaps forward in our quest to understand the population and familial nature of genetic inheritance. We have witnessed the mapping of thousands of genetic loci contributing to both Mendelian and complex diseases, and new high-throughput, low-cost sequencing technologies promise to uncover even more. Here I highlight five publications that have shaped how we think about and conduct the task of unraveling the genetic causes of disease on a population scale.
The application of epidemiological principles to human genetic studies was first discussed by Neel and Schull. In a relatively brief time span, genetic epidemiology has been defined and redefined by practitioners with diverse backgrounds, but two important concepts underlying genetic epidemiology have emerged: 1) the study of the etiology of disease among groups of relatives to unravel the cause of family resemblance; and 2) the study of inherited causes of disease in populations.
The essence of a modern genetic epidemiologist’s work is to assess the nature of genetic inheritance within populations and families, utilizing traditional epidemiology, statistical genetics, human genetics, and population genetics. In selecting the following papers, I sought work that, more than highlighting any one gene discovery, pushed forward our ability to make these discoveries. These papers have greatly influenced how we think about and conduct the task of unraveling the genetic causes of disease. They are: 1) the description of the logarithm of odds (LOD) score for a sequential test of linkage in multiple families; 2) the first report of a generally available computer program to perform linkage analysis; 3) a theoretical paper that made the case for applying association mapping to identify the genetic basis of complex diseases; 4) the report that signified the initial completion of the public human genome project; and 5) the first report of a successful genome-wide association study using SNP markers.
The study of phenotypic variation due to inherited genetic variation intuitively leads one directly to the family as the unit of study. While direct observation of recombination events between genotype and phenotype is the ideal method for assessing linkage to a particular genetic locus in a pedigree, incomplete data and unknown genetic model parameters make this method inefficient. Early statistical approaches to linkage analysis tended to be limited in the pedigree sizes they could handle and/or were laborious to apply before the advent of computer programs. Morton introduced the LOD score method for testing linkage. LOD scores are additive and, thus, can be computed for each family separately and then summed to test for linkage across a number of independent families. Over the past 50 or so years, numerous extensions and modifications have been made to Morton’s original LOD score method, but this paper laid the foundation for how we continue to conduct linkage analysis today.
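The additivity of LOD scores can be illustrated with a minimal Python sketch. It is not Morton’s full sequential procedure: it assumes the simplest case of phase-known, fully informative meioses, where the likelihood at recombination fraction theta depends only on the counts of recombinant and non-recombinant meioses. The function names and the family counts are hypothetical and chosen only for illustration.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for one family at recombination fraction theta.

    Assumes phase-known, fully informative meioses, so the likelihood is
    theta**recombinants * (1 - theta)**(meioses - recombinants), compared
    against the no-linkage value theta = 0.5.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in the interval (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_linked = (recombinants * math.log10(theta)
                    + non_recombinants * math.log10(1.0 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def combined_lod(families, theta):
    """Sum per-family LOD scores; valid because the families are independent."""
    return sum(lod_score(r, n, theta) for r, n in families)


# Hypothetical (recombinants, meioses) counts for three independent families.
families = [(1, 10), (2, 12), (0, 8)]

for theta in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta={theta:.2f}  combined LOD={combined_lod(families, theta):.2f}")
```

In practice the combined LOD is evaluated over a range of theta values, and a maximum of 3 or more is conventionally taken as evidence for linkage. Real pedigrees with missing data or unknown phase require full likelihood calculations rather than simple recombinant counts.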
The LIPED computer program described in this paper was the first widely available program for conducting genetic linkage studies. Until this point, researchers relied primarily on published LOD score tables to assess linkage. The publication of LIPED added a new tool to the genetic epidemiologist’s toolbox: the computer. This program implemented the Elston-Stewart algorithm for computing the likelihood in general pedigrees, allowing researchers to compute LOD scores in general pedigrees with few exceptions. While this was not the first effort to exploit computers to conduct linkage analysis, it was the first program that was both portable and able to compute likelihoods in pedigrees of more than two generations. The use of computer programs, most freely available and written by scientists in the field, to analyze genetic data has become invaluable to the genetic epidemiologist, with nearly 500 programs currently available (http://linkage.rockefeller.edu/soft/list.html).
Geneticists had been incredibly successful at identifying genetic loci for single-gene Mendelian diseases through linkage analysis but struggled to identify loci for traits with more complex inheritance patterns such as schizophrenia and diabetes. In this seminal paper, Risch and Merikangas demonstrate that for a locus with a genotypic risk ratio (GRR) of 1.5 (the high end of the range of GRRs observed in current studies of complex traits) and a disease allele frequency greater than 10 percent, 37 to 72 times more families would be needed to detect significant linkage than to detect association. This paper resulted in a shift in thinking from linkage to association studies for complex traits, which likely result from multiple loci each with modest effects. The major limitation of this approach that the authors noted was the lack of known polymorphisms needed to interrogate the entire genome for association, a number they estimated at 100,000 to 1 million. However, the requisite number of markers in the form of single-nucleotide polymorphisms (SNPs), and the ability to genotype them both rapidly and cheaply, would become available less than 10 years after the publication of this paper, leading to myriad successful genome-wide association studies for complex diseases.
The sequencing of the entire human genome had long been discussed in the genetics literature. The complete sequence of the human genome revolutionized not only genetic epidemiology, but all of human genetics. A reference genome now existed to which all other human genome sequences could be compared. A much more accurate estimate of the total number of genes in the human genome, approximately 25,000, could now be made, far fewer than the 100,000 estimated in the years leading up to the complete sequence. For the genetic epidemiologist, this meant that more markers were available than ever before for the identification of genomic regions harboring disease susceptibility loci. More genomic annotation was also accessible, making it easy to discover which genes were contained in linkage regions. I chose the publication announcing the completion of the public project because this project made its data freely available as it was generated, but I must also recognize the private sequencing effort that was published at the same time. The private effort took a completely different and faster whole-genome shotgun sequencing approach that in some ways pushed the public effort to finish faster. While the privately sequenced genome required an expensive subscription to gain full access to the fully annotated data, its sequencing approach laid the foundation for the whole-genome sequencing that is used today. However, it was the freely available nature and constantly improving annotation of the public project that made it a giant step forward for everyone in the genetics community.
This is the first report of a genome-wide association study that successfully identified a functional variant associated with a disease, in this case a coding change in lymphotoxin-alpha associated with myocardial infarction. However, this study was more important than simply identifying a genetic variant for myocardial infarction; it was the first demonstration that the long-discussed genome-wide association study (GWAS) was viable. The study involved genotyping >92,000 SNPs using a multiplex PCR-based assay, more markers than had ever been genotyped in an association study and on the order of magnitude that Risch and Merikangas cited when they discussed the limitations to conducting true genome-wide association studies in the mid-1990s. While this was a significant advance, the genotyping of hundreds of thousands of SNPs in this manner is a daunting task. Over the next several years, microarray technology, generally used for gene expression assays, was adapted to conduct truly high-throughput whole-genome SNP genotyping [10,11,12]. Then, in 2005, we saw the first publication of a GWAS that used this new microarray technology, this time identifying a functional mutation in complement factor H associated with age-related macular degeneration. This has led to a flood of GWAS with loci being mapped for a diverse set of diseases, including diabetes, Crohn’s disease, asthma, and glaucoma. Current microarrays are now able to genotype up to 1 million SNPs. This has forced the genetic epidemiology and statistical genetics communities to develop new analytical techniques to deal with this vast amount of data and allow us to realize its full potential.
As we continue to explore these enormous GWAS datasets, the field continues to advance quickly, and we are now witnessing the first set of whole-genome sequencing studies that have identified several medically relevant rare loci [14,15,16], an exciting development that promises to further hone our ability to detect genetic loci contributing to heritable diseases within the population.