The recent revolution in genomics is already having a profound impact on the practice of epidemiology. The purpose of this commentary is to demonstrate how genomics and epidemiology will continue to rely heavily on each other by illustrating a number of interaction points between these two disciplines: (1) the use of genomics to estimate disease heritability; (2) the impact of genomics on analytical study design; (3) how genome-wide data can be employed to overcome residual population stratification arising from selection bias; (4) the importance of genomics as a tool in epidemiological investigation; (5) the importance of epidemiology in collecting adequately phenotyped samples for genomics studies; and (6) the role of epidemiology in unraveling the clinical and therapeutic relevance of genetic variants once they are discovered.
Technological advancements that allow the genotyping of several thousand single-nucleotide polymorphisms (SNPs) across the entire genome in an efficient, high-throughput manner have essentially allowed the genetics research community to foray beyond the realm of rare mendelian conditions into the arena of common diseases. Since the publication of the first genome-wide association study three years ago, there has been a flood of such studies detailing the genetic basis of diseases ranging from inflammatory bowel disease to hypertension. The National Human Genome Research Institute catalog lists 184 genome-wide association studies with over 1,000 SNPs linked to 130 diseases and traits, and this list expands on an almost daily basis. Undoubtedly, advances in genomics and the knowledge that these studies bring are already having a profound influence on the practice of epidemiology, and the interaction between the two fields will continue to be mutually beneficial, especially as epidemiologists develop novel methods to utilize genomic data. This article attempts to describe the symbiotic relationship that has developed between genomics and epidemiology and to illustrate how the two disciplines will continue to rely heavily on each other for their future success. As in all good relationships, there will be give-and-take, but, overall, both fields will benefit.
The reality is that genomics will overshadow the role of family studies in estimating disease heritability. By its very nature, traditional family study methodology focuses on the collection of family history for a large number of cases and controls, a process often requiring many years to complete. Furthermore, even the most assiduous collection cannot avoid bias arising from incomplete data due to family members not knowing, or being unwilling to provide, pedigree data. Stricter privacy laws are eroding investigators' ability to collect clinical and demographic data on relatives without their knowledge or permission, making family studies even harder to successfully complete.
In contrast, the high density of genotype data generated in genome-wide association studies allows disease heritability to be more accurately and more easily estimated. This approach is not hampered by lack of family history, and the analysis can be completed in a relatively short period of time. Despite the high upfront cost of genome-wide studies, overall they still represent considerable savings compared to longer-term family studies. Even the usefulness of family studies for identifying novel relationships between diseases will eventually be superseded as the genetic architecture of human illness is more fully understood, and pathway analysis (i.e., determining which pathways are perturbed in disease based on genome-wide data) is applied to link apparently disparate diseases. Indeed, genomic-based pathway analysis may serve as the basis of a new classification system of human pathology by grouping diseases arising from defects in the same biological pathways. For example, genome-wide association studies have found that variants in two genes associated with increased risk of diabetes also influence prostate cancer susceptibility among men [4, 5, 6]. Pathway analysis of neurodegenerative diseases, such as amyotrophic lateral sclerosis and Parkinson's disease, is already attempting to tease apart the cellular mechanisms involved in neuronal cell death. Though such systems biology studies should currently be considered preliminary, this methodology will significantly improve over time.
Population stratification, where cases are drawn from a different population than controls, continues to be a major issue in case-control studies, despite the enormous effort that is typically expended to adequately match cases and controls. Selection bias interferes with data interpretation by obscuring true associations and by generating false-positive associations that in reality are being driven by differences in the case/control populations. Genome-wide genotype data provide a straightforward solution to overcome this problem, as such data can be used to estimate principal component vectors that are then included as covariates in a linear regression model. This method effectively corrects for residual population stratification, and the incorporation of genome-wide data into standard epidemiological models will be an attractive tool for the future as single-nucleotide polymorphism genotyping costs continue to decrease.
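The principal component adjustment described above can be sketched on simulated data. This is a minimal illustration, not a definitive implementation: the function names, the two-population simulation, the allele frequencies, and the use of a quantitative phenotype with ordinary least squares are all assumptions made for the example (case-control analyses commonly use a logistic model with the same covariates).

```python
import numpy as np


def top_principal_components(genotypes, n_components=2):
    """Return per-individual scores on the leading principal components.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1 or 2).
    """
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    keep = sd > 0  # drop SNPs with no variation in this sample
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    # SVD of the standardized matrix; u * s gives the component scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def snp_effect(snp, phenotype, pcs=None):
    """Least-squares slope of phenotype on one SNP, optionally adjusting for PCs."""
    n = len(phenotype)
    columns = [np.ones(n), np.asarray(snp, dtype=float)]
    if pcs is not None:
        columns.append(pcs)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, np.asarray(phenotype, dtype=float), rcond=None)
    return coef[1]  # coefficient of the SNP column


def simulate_stratified_example(seed=0, n_per_pop=200, n_snps=500):
    """Hypothetical data: two populations, a non-causal SNP, a shifted phenotype."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
    geno = np.vstack([
        rng.binomial(2, freq_a, (n_per_pop, n_snps)),
        rng.binomial(2, freq_b, (n_per_pop, n_snps)),
    ])
    pop = np.repeat([0, 1], n_per_pop)
    # Phenotype differs by population only; the test SNP has no causal effect.
    phenotype = 1.0 * pop + rng.normal(0, 1, 2 * n_per_pop)
    test_snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.7))
    pcs = top_principal_components(geno, n_components=2)
    unadjusted = snp_effect(test_snp, phenotype)
    adjusted = snp_effect(test_snp, phenotype, pcs)
    return unadjusted, adjusted, pcs, pop


if __name__ == "__main__":
    unadjusted, adjusted, _, _ = simulate_stratified_example()
    print(f"unadjusted effect: {unadjusted:.3f}")
    print(f"PC-adjusted effect: {adjusted:.3f}")
```

In this simulation the test SNP differs in frequency between the two populations but has no effect on the phenotype, so any unadjusted association is spurious; including the leading components as covariates should shrink that estimate toward zero.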
The area where genomics will have the most impact will be in the design of environmental risk factor studies. The myriad of variants that are reported to be associated with Crohn's disease has shown that genetics plays a far greater role in the pathogenesis of common diseases than previously thought. Environmental factors do undoubtedly play an essential role in triggering disease and influencing phenotype, though the emphasis has now shifted to the concept of environmental agents working on a genetically susceptible individual. Such gene-environment interaction will dominate the field of analytical epidemiology for the foreseeable future, though statistical techniques that adequately counter the enormous multiple testing involved in such studies remain to be developed before the full power of this approach can be realized.
The classic risk factor study design of collecting as many environmental data as possible from as large a cohort as possible will give way to more tailored data acquisition based on knowledge of the underlying genetics and biology. For example, if it is known that variants within a particular biological pathway are responsible for causing a disease, then a parsimonious approach would be to focus data collection on environmental agents known to influence that pathway. Ideally, this targeted hypothesis approach will minimize the study costs by decreasing the sample size and by shortening the study time, while maximizing the chances of detecting relevant agents. A further intriguing possibility is prior selection of case and control subjects based on the presence or absence of a particular genetic marker with the specific aim of decreasing etiological heterogeneity, thereby increasing the ability to detect biologically relevant environmental risk factors. Epidemiological studies of Alzheimer's disease already stratify cohorts based on ApoE status, an approach that led to the identification of repetitive head trauma as a risk factor for developing dementia in carriers of the ApoE4 allele. This approach will be greatly expanded upon in the design of future epidemiological studies.
Epidemiologists are already employing genetics as a tool of investigation, particularly in the area of infectious diseases. Sequencing of the genome of the severe acute respiratory syndrome virus was instrumental in tracing its phylogenetic lineage [11, 12], and a combination of genomic and epidemiological information allowed Chinese officials to trace the genotypic variation of the viral transmission paths [13, 14]. Similar approaches are being employed to understand the evolutionary biology and spread of bird flu and human influenza, both with potentially huge public health impact across the globe. As sequencing costs continue to decrease and whole-genome sequencing becomes a reality, genetics will be increasingly incorporated into neuroepidemiological studies.
Of course, genome-wide association studies are not without their own problems, such as confounding arising from population stratification, the need for large sample sizes to detect minor effect alleles, and inflated false-positive association rates arising from the several thousand tests that are an integral part of any such study [16, 17]. Epidemiologists can help geneticists overcome these problems, particularly by providing the infrastructure to collect large, well-phenotyped samples from affected and unaffected individuals drawn from similar ethnic backgrounds. Typically these cohorts are derived from population-based, natural history studies of particular diseases, often established many years ago prior to the development of the technology that underpins the genomics revolution. Indeed, there are already examples of how such projects have morphed into genomics in an effort to understand how genetic variation influences population susceptibility to disease. A genome-wide association study based on volumetric brain MRI and cognitive testing of 705 stroke- and dementia-free Framingham Heart Study participants identified significant correlations between SORL1 variants and abstract reasoning, and between CDH4 variants and brain volume. Thus, it is true to say that neuroepidemiologists have long recognized the value of genomics in research, and have invested considerable resources to collect endophenotype data and to bank biological samples from population-based studies in the expectation of technological advances. The future will see a tremendous return on their investment in this crucial infrastructure.
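To make the multiple-testing problem mentioned above concrete, the following sketch applies a Bonferroni correction, one simple and conservative way to control false positives when many variants are tested. It is offered only as an illustration: the function names and the example p-values are hypothetical, and the source does not state which correction any particular study used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of false positives if every null hypothesis is true
    and each test is judged at the uncorrected level alpha."""
    return alpha * n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for three tested variants.
    p_values = [1e-6, 0.01, 0.04]
    print(significant_after_bonferroni(p_values))
```

With several thousand tests judged at an uncorrected level of 0.05, `expected_false_positives` shows that hundreds of spurious associations would be expected by chance alone, which is why the per-test threshold has to be tightened.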
Determining the genetic variants that underlie complex diseases represents only the beginning, and ‘translating’ these discoveries to everyday clinical practice, as diagnostic tools and as therapy, will rely on carefully conducted, population-based epidemiological studies. The aim of these studies will be to understand the relevance of genetic variants associated with a disease within a population to disease within an individual patient. How many risk variants does an individual require before they are destined to develop a neurological disease? Do the variants merely affect age of onset, or do they also influence disease severity and outcome? How do these variants interact with each other to determine an individual's risk of disease, and what is the biological basis for this interaction? In complex diseases arising from multiple different loci in each individual patient, is changing the expression of a single variant sufficient to prevent disease in that individual? Is it too late to institute such an intervention at the time of first presentation, or should we undertake population screening and presymptomatic intervention? All of these questions must be considered before the advantages of our knowledge about genetics can take full effect. Longitudinal, prospective epidemiological studies are the ideal tool to address these issues in a meaningful, scientifically rigorous manner. An example of such a study is underway at the National Institutes of Health, where patients with Parkinson's disease due to mutations in the LRRK2 gene, identified as a key cause of familial and sporadic Parkinson's disease [19, 20], will be followed over a ten-year period to elucidate how symptoms develop over time (www.clinicaltrials.gov, NCT00467090). Such studies are likely to become commonplace in the future, as the genomic architecture of diseases is uncovered.
In summary, there is a long-standing symbiotic relationship between epidemiology and genetics, which the current explosion in genomics will enhance by facilitating a more focused evaluation of environmental triggers, and which epidemiology will feed by providing well-phenotyped clinical samples. The result will be faster, cheaper and better tools for determining disease pathogenesis. The era of genomic epidemiology is truly upon us.
This work was supported entirely by the Intramural Research Program of the NIH, the National Institute on Aging (project Z01 AG000949-02) and the National Institute of Neurological Disorders and Stroke.