Some of the finest examples of successful gene mapping have been for ophthalmic traits and diseases1,2,3. In this editorial, I briefly describe statistical approaches used in gene mapping of human ophthalmic traits and diseases and discuss some challenges and promising new approaches. A number of fine statistical genetic software programs are available. Because I use the statistical genetics package Mendel4 (www.genetics.ucla.edu/software), I provide the names of the appropriate Mendel options to perform specific gene mapping tests. Readers can get a comprehensive and annotated list of other statistical genetic programs from the Robert S. Boas Center for Genomics and Human Genetics (www.nslij-genetics.org/soft/).
Gene mapping methods fall into two broad statistical approaches, linkage analysis and association analysis. Although they are often considered separately, the two are connected, and there are sophisticated statistical methods that allow for joint estimation (see, for example, Pseudomarker5 or Mendel's Association_Given_Linkage option6). Linkage analysis estimates trait/disease gene positions relative to genes of known location (markers) by using the degree of correlation between trait/disease phenotype inheritance and marker genotype inheritance (co-segregation). When a marker is close to a polymorphism in an autosomal gene that confers risk of a trait/disease, the offspring's chromosomal segments containing both the trait/disease and marker genes will be the same as those on his/her parental chromosomes; i.e., they will be inherited without any recombination between the two loci. When the marker is far away from the disease/trait gene (or on another chromosome), the chance of recombination is 50%.
Linkage analysis is very effective when a single gene is both necessary and sufficient to change the trait value or cause disease (Mendelian trait/disease). In particular, genetic model-based linkage analysis, which specifies the number of genes, their mode of inheritance, and the probability of the phenotype given the disease/trait genotype (the penetrance), and which estimates genetic distances or recombination fractions, has been used very successfully to map genes for Mendelian diseases/traits2,3 in ophthalmic research. The test statistic is often the log-likelihood assuming linkage minus the log-likelihood assuming no linkage; with base-10 logarithms this difference is the LOD score. Examples of model-based linkage analysis software are Merlin7 and Mendel's Location_Scores option4.
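To make the log-likelihood difference concrete, here is a minimal sketch for the simplest possible case: phase-known, fully informative meioses in which recombinants can be counted directly. This is an illustrative assumption, not how Merlin or Mendel compute location scores on general pedigrees, and the function names `lod_score` and `max_lod` are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Base-10 log-likelihood at recombination fraction theta minus the
    log-likelihood at theta = 0.5 (no linkage).

    Assumes phase-known, fully informative meioses, so the likelihood is
    binomial: theta**k * (1 - theta)**(n - k).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    def log10_likelihood(t: float) -> float:
        k, n = recombinants, meioses
        value = 0.0
        if k > 0:
            if t == 0.0:
                return -math.inf  # a recombinant is impossible when theta = 0
            value += k * math.log10(t)
        if n - k > 0:
            value += (n - k) * math.log10(1.0 - t)
        return value

    return log10_likelihood(theta) - log10_likelihood(0.5)


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Maximum-likelihood theta (capped at 0.5) and the LOD score there."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant observed in 10 informative meioses.
    theta_hat, lod = max_lod(1, 10)
    print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")
```

In this toy setting, ten meioses with no recombinants give a LOD score of about 3, the conventional threshold for declaring linkage for a Mendelian trait.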
Multifactorial or complex traits/diseases, like glaucoma and age-related macular degeneration (AMD), are the consequence of multiple, possibly interacting, genetic and environmental factors1,2. Genes that underlie these complex traits/diseases are difficult to map because accurately specifying the genetic model can be almost impossible. A popular approach when studying complex traits/diseases is genetic “model-free” linkage analysis. (N.B. These methods are not completely genetic model-free; they make implicit genetic assumptions5.) Model-free methods look for increased gene sharing (identity by descent) between relatives with similar phenotypes and, sometimes, decreased identity by descent between relatives with dissimilar phenotypes. Wald tests comparing observed and expected identity by descent are popular for affected relative data, and variance-component analyses are often used with continuous traits. Programs that implement these methods include Mendel's NPL8 and Polygenic_QTL9 options, Merlin7, SOLAR10 and Genehunter11.
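The idea of comparing observed with expected identity by descent (IBD) can be sketched with the classical mean test for affected sib pairs. The sketch assumes that the number of alleles shared IBD (0, 1 or 2) is known without ambiguity for every pair, which is rarely true in practice, and it is not the statistic implemented by any of the programs named above. The function name `asp_mean_test` and the data are hypothetical.

```python
import math


def asp_mean_test(ibd_counts: list[int]) -> tuple[float, float, float]:
    """One-sided z-test for excess IBD sharing among affected sib pairs.

    Under no linkage a sib pair shares 0, 1 or 2 alleles IBD with
    probabilities 1/4, 1/2, 1/4, so the proportion shared has mean 1/2
    and variance 1/8 per pair.
    Returns (observed mean proportion shared, z statistic, p-value).
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("at least one sib pair is required")
    if any(count not in (0, 1, 2) for count in ibd_counts):
        raise ValueError("IBD counts must be 0, 1 or 2")

    observed = sum(count / 2 for count in ibd_counts) / n
    standard_error = math.sqrt(1.0 / (8.0 * n))
    z = (observed - 0.5) / standard_error
    # Upper-tail probability of a standard normal variate.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return observed, z, p_value


if __name__ == "__main__":
    # Hypothetical sample: 100 affected sib pairs at one marker.
    pairs = [2] * 40 + [1] * 50 + [0] * 10
    observed, z, p_value = asp_mean_test(pairs)
    print(f"mean sharing = {observed:.2f}, z = {z:.2f}, p = {p_value:.2g}")
```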
Linkage analysis relies on the observed recombination between markers and trait/disease genes in families to define the likely regions where the trait/disease genes reside. Therefore, in practical terms, linkage analysis has a resolution limit12. Association analysis provides finer resolution. Association analysis is based on linkage disequilibrium (LD), in which the alleles at two closely situated loci are correlated in the population. If the marker and trait/disease genes are not coincident, then over many generations there will be recombination events that eliminate the correlation and result in linkage equilibrium. LD decreases as (1) the time since the introduction of the polymorphisms into the population increases, (2) the distance between the marker and trait/disease loci increases and (3) the minor allele frequencies decrease. Thus association analysis is most powerful when the trait/disease-conferring polymorphism was recently introduced, the risk allele is relatively common and the chromosomal region is densely covered by markers.
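The dependence of LD on time and distance can be illustrated with the standard population genetic result that, in a large randomly mating population without selection or new mutation, the disequilibrium coefficient D shrinks by a factor of (1 - θ) each generation, where θ is the recombination fraction between the two loci. The sketch below is built on those idealized assumptions; the symbols and function names are mine rather than taken from any of the cited programs.

```python
def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Disequilibrium coefficient D after a number of generations.

    Uses D_t = D_0 * (1 - theta)**t, which assumes random mating, a large
    population, and no selection or new mutation.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - theta) ** generations


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A and allele B;
    p_a and p_b are the frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    # Hypothetical starting point: complete LD between two common alleles.
    d0 = 0.25
    for theta in (0.001, 0.01, 0.5):
        remaining = ld_after_generations(d0, theta, 100)
        print(f"theta = {theta}: D after 100 generations = {remaining:.6f}")
```

With these hypothetical numbers, a marker 0.1% recombination away from the risk polymorphism retains most of its LD after 100 generations, whereas an unlinked marker loses essentially all of it within about ten.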
Association analysis can be conducted with families or unrelated individuals. Cases (affected individuals) and controls (randomly selected or unaffected individuals) provide a particularly simple study design. The underlying assumption of this analysis is that cases have the same risk-conferring alleles at the trait/disease genes (common disease – common variant hypothesis, CDCV). Because the risk-conferring alleles are in LD with alleles at nearby markers, cases have marker alleles in common. So when a marker is close to the trait/disease gene, cases have different marker genotype frequencies than controls. Thus simple tests of association are contingency table analyses or likelihood ratio tests that compare the marker genotype frequencies of cases and controls. Examples of statistical packages are Mendel's Allele_Frequencies and Cases_And_Controls options13 and PLINK14.
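As a minimal sketch of such a contingency table analysis, the following computes the Pearson chi-square statistic for a 2 x 3 table of genotype counts at a single SNP. It uses only the Python standard library; the p-value formula exp(-x/2) is exact for the two degrees of freedom of this table shape. The function name and the counts are hypothetical, and this is not the code used by Mendel or PLINK.

```python
import math


def genotype_chi_square(cases: tuple[int, int, int],
                        controls: tuple[int, int, int]) -> tuple[float, float]:
    """Pearson chi-square test comparing genotype counts (AA, Aa, aa)
    between cases and controls. Returns (statistic, p-value) with 2 df."""
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and every genotype column needs a nonzero total")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts at one marker.
    statistic, p_value = genotype_chi_square((30, 50, 20), (20, 50, 30))
    print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

The genotype test makes no assumption about mode of inheritance; tests with one degree of freedom, such as an allelic or trend test, are common alternatives when an additive effect is plausible.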
Because the markers need to be quite close to the trait/disease genes in order to have sufficient power to detect LD, association studies were once limited to refining the chromosomal regions first found through linkage analysis or to a small number of candidate genes. In general, these early studies were unsuccessful. However, with the current ability to genotype 100,000 to 1,000,000 single nucleotide polymorphisms (SNPs) in thousands of individuals, genome-wide association studies (GWAS) have replaced linkage analysis as the preferred gene mapping approach. The first successful GWAS was the association of a common variant of the complement factor H gene (CFH) with AMD15. Although it is tempting to conclude that many genes for ophthalmic traits/diseases can be mapped as easily with GWAS designs, it is important to remember that the CFH-AMD association represents a particularly strong effect that was also found using the traditional approach of linkage analysis followed by association fine mapping16,17. It is also important to remember that not all trait/disease genes will conform to the CDCV hypothesis. In some genes, a number of rare variants will exist on different chromosomal backgrounds, making their detection in a GWAS highly unlikely18.
Currently most association studies start by testing each SNP separately. However, the products of the approximately 25,000 genes in the human genome must interact, and so this one-gene-at-a-time approach to gene mapping may fail. The development of efficient and powerful statistical methods to uncover gene networks that determine clinical trait values is an active area of research19,20. A key assumption of these integrative genetic approaches is that the polymorphisms that ultimately affect clinically observable traits act by perturbing molecular networks. One approach is to construct gene co-expression network modules and determine which, if any, of these modules are correlated with the clinical traits19,20.
The massive amount of data now available presents computational and statistical challenges that are active research areas14,21,22. Storage and manipulation of so much data are cumbersome, and methods have recently been developed to compress datasets and to efficiently extract relevant subsets14. Implementing diagnostic procedures and interpreting the results also present challenges14,21. Association testing with 100,000+ SNPs and multiple trait/disease phenotypes leads to a serious problem of how to limit the number of false positive results without substantially raising the false negative rate. Sequence data will only increase the problem21. The massive amount of data also provides opportunities, however. As an example, association studies can lead to incorrect results if there is population stratification, but with large amounts of SNP data ancestry can be accurately inferred and researchers can control for population stratification or exploit it in gene mapping23.
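The tension between false positives and false negatives can be seen in a small sketch of two standard multiple-testing procedures. Neither is prescribed by this editorial or its references: the Bonferroni correction controls the chance of any false positive and is the more conservative, while the Benjamini-Hochberg procedure controls the expected proportion of false positives among the reported results. The function names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha across n_tests tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def benjamini_hochberg(p_values: list[float], fdr: float = 0.05) -> list[int]:
    """Indices of the tests rejected by the Benjamini-Hochberg step-up
    procedure at false discovery rate fdr."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if p_values[index] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # With one million SNPs, a 0.05 family-wise level requires roughly 5e-8.
    print(bonferroni_threshold(0.05, 1_000_000))

    # Hypothetical p-values from ten tests.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.060, 0.074, 0.205, 0.212, 0.216]
    threshold = bonferroni_threshold(0.05, len(p_values))
    print([i for i, p in enumerate(p_values) if p <= threshold])
    print(benjamini_hochberg(p_values, fdr=0.05))
```

On the ten hypothetical p-values, Bonferroni retains only the smallest while Benjamini-Hochberg retains the two smallest, which illustrates how a less stringent error criterion lowers the false negative rate.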
Statistical genetics plays an important role in ophthalmic research and will continue to do so as the amount of relevant data increases. I close with a few comments for researchers who want simple rules of thumb for choosing the optimal statistical approach. Unfortunately, no one study design or statistical approach to gene mapping will be optimal for every ophthalmic trait/disease. The best approach will depend on a number of factors including the prevalence of the trait/disease, the age of onset, the underlying genetic and environmental determinants, resources, and the means of recruiting individuals or families. I highly recommend that, early in the study design phase, researchers collaborate with a statistician to determine the approach most likely to succeed in their case.
The author is supported in part by United States Public Health Service grants MH59490 and GM53275. I thank Prof. Paivi Pajukanta and an anonymous reviewer for their comments on an early draft of this manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.