Related Articles
Domestic animals are invaluable resources for the study of the molecular architecture of complex traits. Although the mapping of quantitative trait loci (QTL) responsible for economically important traits in domestic animals has achieved remarkable results in recent decades, not all of the genetic variation in the complex traits has been captured because of the low density of markers used in QTL mapping studies. The genome-wide association study (GWAS), which utilizes high-density single-nucleotide polymorphisms (SNPs), provides a new way to tackle this issue. Encouraging achievements in dissection of the genetic mechanisms of complex diseases in humans have resulted from the use of GWAS. At present, GWAS has been applied to the field of domestic animal breeding and genetics, and some advances have been made. Many genes or markers that affect economic traits of interest in domestic animals have been identified. In this review, advances in the use of GWAS in domestic animals are described.
doi:10.1186/2049-1891-3-26
PMCID: PMC3506437
PMID: 22958308
Domestic animals; Genome wide association study (GWAS); Quantitative trait loci (QTL); Single-nucleotide polymorphism (SNP)
Background
In recent years, GWA studies have successfully identified common SNPs associated with complex diseases. However, most of the variants found this way account for only a small portion of the trait variance. This fact leads researchers to focus on rare-variant mapping with large-scale sequencing, which can be facilitated by using linkage information. The question arises why linkage analysis often fails to identify genes when analyzing complex diseases. Using simulations, we have investigated the power of parametric and nonparametric linkage statistics (KC-LOD, NPL, LOD and MOD scores) to detect the effect of genes responsible for complex diseases using different pedigree structures.
Results
As expected, a small number of pedigrees with fewer than three affected individuals has low power to map disease genes with modest effect. Interestingly, the power decreases when unaffected individuals are included in the analysis, irrespective of the true mode of inheritance. Furthermore, we found that the best performing statistic depends not only on the type of pedigrees but also on the true mode of inheritance.
Conclusions
When applied in a sensible way, linkage is an appropriate and robust technique to map genes for complex disease. Unlike association analysis, linkage analysis is not hampered by allelic heterogeneity. So, why does linkage analysis often fail with complex diseases? Evidently, when using an insufficient number of small pedigrees, one might miss a true genetic linkage when actually a real effect exists. Furthermore, we show that the test statistic has an important effect on the power to detect linkage as well. Therefore, a linkage analysis might fail if an inadequate test statistic is employed. We provide recommendations regarding the most favorable test statistics, in terms of power, for a given mode of inheritance and type of pedigrees under study, in order to reduce the probability of missing a true linkage.
doi:10.1186/1756-0500-5-411
PMCID: PMC3475142
PMID: 22862841
Linkage; Parametric analysis; Nonparametric analysis; NPL score; LOD score; MOD score; Complex diseases; Rare variants
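For readers less familiar with the LOD score named in the keywords above, the following is a minimal Python sketch of the textbook two-point LOD score for phase-known, fully informative meioses. It is an illustration only: it is not the KC-LOD, NPL or MOD statistic evaluated in the study, and the function name and the counts in the example are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """Textbook two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10(L(theta) / L(0.5)), where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    # Work on the log scale to avoid underflow with many meioses.
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 1 recombinant observed in 10 meioses.
grid = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
best_theta = max(grid, key=lambda t: two_point_lod(1, 10, t))
print(best_theta, round(two_point_lod(1, 10, best_theta), 3))
```

Maximizing over a grid of recombination fractions, as in the last lines, mirrors how a LOD curve is usually scanned for its peak; real pedigree data require a full likelihood calculation rather than simple recombinant counts.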
Technology is driving the field of human genetics research with advances in techniques to generate high-throughput data that interrogate various levels of biological regulation. With this massive amount of data comes the important task of using powerful bioinformatics techniques to sift through the noise to find true signals that predict various human traits. A popular analytical method thus far has been the genome-wide association study (GWAS), which assesses the association of single nucleotide polymorphisms (SNPs) with the trait of interest. Unfortunately, GWAS has not been able to explain a substantial proportion of the estimated heritability for most complex traits. Given the inherently complex nature of biology, this shortfall could be a consequence of the simplistic study design. A more powerful analysis may be a systems biology approach that integrates different types of data, or a meta-dimensional analysis. For this study we used the Analysis Tool for Heritable and Environmental Network Associations (ATHENA) to integrate high-throughput SNPs and gene expression variables (EVs) to predict high-density lipoprotein cholesterol (HDL-C) levels. We generated multivariable models that consisted of SNPs only, EVs only, and SNPs + EVs with testing r-squared values of 0.16, 0.11, and 0.18, respectively. Additionally, using just the SNPs and EVs from the best models, we generated a model with a testing r-squared of 0.32. A linear regression model with the same variables resulted in an adjusted r-squared of 0.23. With this systems biology approach, we were able to integrate different types of high-throughput data to generate meta-dimensional models that are predictive for HDL-C in our data set. Additionally, our modeling method was able to capture more of the HDL-C variation than a linear regression model that included the same variables.
PMCID: PMC3587764
PMID: 23424143
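The abstract above compares testing r-squared values with an adjusted r-squared from linear regression. As a reminder of how the two quantities differ, here is a minimal Python sketch using the standard definitions. The function names and the example numbers are hypothetical, and the sample size and number of predictors used in the study are not given in the abstract.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize r-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical illustration: the same raw fit looks weaker once the
# number of predictors is taken into account.
raw = 0.5
print(adjusted_r_squared(raw, n_samples=101, n_predictors=10))
```

Because a testing r-squared is computed on held-out samples while an adjusted r-squared corrects an in-sample fit, the two are related but not strictly interchangeable measures of model quality.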
Recent advances in microarray technology have made whole-genome association studies possible and greatly expanded the ability to map complex genetic traits across the human genome. Current mapping array technologies offer unprecedented resolution to examine single-nucleotide polymorphisms (SNPs) across the human genome. The resolution of these mapping arrays can be greatly increased by performing secondary screens using a targeted genotyping approach to further characterize SNPs discovered with the mapping arrays as well as include SNPs that are determined to be of interest in candidate genes or other genetic areas of importance.
Over the last several months, we have examined 2200 patient samples from the Shanghai Breast Cancer Study cohort using a combination of Affymetrix 500k mapping arrays and a custom 1700 SNP targeted genotyping array. Beginning with the analysis of 200 samples analyzed with the 500k mapping arrays and proceeding to 2200 samples analyzed using the custom targeted genotyping arrays, data regarding the automation of assay protocols and data analysis schemes to greatly increase the efficiency and speed of the analysis will be presented. Association statistics as well as chromosome copy number information have been produced. Data will be presented to illustrate the importance of automation, accurate sample tracking, and examination of existing data from the HapMap project in evaluating and analyzing the resulting genotyping datasets.
PMCID: PMC2291829
Objective
Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning.
Methods
Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait–trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits.
Results
A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller ‘shortest distance’ in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10^-16). Further, ‘cancer traits’ were similar to one another, as were ‘metabolic syndrome traits’ (Fisher's exact test p=0.001 and 3.5×10^-7, respectively).
Conclusion
An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches.
doi:10.1136/amiajnl-2011-000482
PMCID: PMC3277620
PMID: 22278381
Complex disease; SNP; gene ontology; protein-interaction networks; information theory; translational bioinformatics; ontology; bioinformatics; genetics; network; prostate cancer; protein networks; pathway analysis; network modeling; knowledge representations; uncertain reasoning and decision theory; languages and computational methods
The genome-wide association (GWA) study is becoming a powerful tool for deciphering the genetic basis of complex human diseases/traits. Currently, univariate analysis is the most commonly used method to identify genes associated with a certain disease/phenotype under study. A major limitation of univariate analysis is that it may not make use of the information in multiple correlated phenotypes, which are usually measured and collected in practical studies. Multivariate analysis has proven to be a powerful approach in linkage studies of complex diseases/traits, but it has received little attention in GWA. In this study, we aim to develop a bivariate analytical method for GWAS that can be used in the complex situation where a continuous trait and a binary trait are studied together. Based on the modified extended generalized estimating equation (EGEE) method we propose herein, we assessed the performance of our bivariate analyses through extensive simulations as well as real data analyses. To develop an EGEE approach for bivariate genetic analyses, we combined two different generalized linear models corresponding to the phenotypic variables using a Seemingly Unrelated Regression (SUR) model. The simulation results demonstrated that our EGEE-based bivariate analytical method outperforms univariate analyses in increasing statistical power under a variety of simulation scenarios. Notably, EGEE-based bivariate analyses have consistent advantages over univariate analyses whether or not there exists a phenotypic correlation between the two traits. Our study has practical importance, as one can always use multivariate analyses as a screening tool when multiple phenotypes are available, without extra costs in statistical power or false positive rate. Analyses of empirical GWA data further affirm the advantages of our bivariate analytical method.
doi:10.1002/gepi.20372
PMCID: PMC2745071
PMID: 18924135
DNA sequence variants (DSVs) are major components of the “causal field” for virtually all medical phenotypes, whether single-gene familial disorders or complex traits without a clear familial aggregation. The causal variants in single-gene disorders are necessary and sufficient to impart large effects. In contrast, complex traits are due to a much more complicated network of contributory components that in aggregate increase the probability of disease. The conventional approach to identification of the causal variants for single-gene disorders is genetic linkage. However, it does not offer sufficient resolution to map the causal genes in small families or sporadic cases. The approach to genetic studies of complex traits entails candidate gene studies or Genome-Wide Association Studies (GWAS). GWAS provide an unbiased survey of the effects of common genetic variants (common disease-common variant hypothesis). GWAS have led to identification of a large number of alleles for various cardiovascular diseases. However, common alleles account for a relatively small fraction of the total heritability of the traits. Accordingly, the focus has shifted toward identification of rare variants that might impart larger effect sizes (rare variant-common disease hypothesis). This shift is made feasible by recent advances in massively parallel DNA sequencing platforms, which afford the opportunity to identify virtually all common as well as rare alleles in individuals. In this review, we discuss various strategies that are used to delineate the genetic contribution to medically important cardiovascular phenotypes, emphasizing the utility of the new deep sequencing approaches.
doi:10.1161/CIRCRESAHA.110.236067
PMCID: PMC3115927
PMID: 21566222
Genetics; Next-Generation Sequencing; Complex traits; Polymorphism
Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite the GWAS success stories, the percentage of trait variance explained by GWAS signals has been, at best, modest (the so-called “missing heritability” problem). Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations, along with the fact that the effects of rare variants are often, by design, unaccounted for by GWAS and the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of the association between rare and common variants in disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and is efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.
doi:10.4137/BBI.S8852
PMCID: PMC3273305
PMID: 22346348
association; common variant; haplotype; rare variant
The approach to molecular genetic studies of complex phenotypes has evolved considerably in recent years. The candidate gene approach, restricted to analysis of a few single nucleotide polymorphisms (SNPs) in a modest number of cases and controls, has been supplanted by the unbiased approach of Genome-Wide Association Studies (GWAS), wherein a large number of tagger SNPs are typed in a large number of individuals. GWAS, which are designed upon the common disease-common variant (CD-CV) hypothesis, have identified a large number of SNPs and loci for complex phenotypes. However, alleles identified through GWAS are typically not causative but rather in linkage disequilibrium (LD) with the true causal variants. The common alleles, which may not capture the uncommon and rare variants, account for only a fraction of the heritability of the complex traits. Hence, the focus is shifting to the rare variant-common disease (RV-CD) hypothesis, which surmises that rare variants exert large effect sizes on the phenotype. In conjunction with this conceptual shift, technological advances in DNA sequencing techniques have dramatically enhanced whole genome or whole exome sequencing capacity. The sequencing approach affords identification of not only the rare but also the common variants. The approach, whether used to complement GWAS or as a stand-alone approach, could define the genetic architecture of the complex phenotypes. Robust phenotyping and large-scale sequencing studies are essential to extract the information content of the vast number of DNA sequence variants (DSVs) in the genome. To garner meaningful clinical information and link the genotype to a phenotype, identification and characterization of a very large number of causal fields beyond the information content of DNA sequence variants would be necessary. This review provides an update on the current progress and limitations in identifying DSVs that are associated with phenotypic effects.
doi:10.1016/j.trsl.2011.08.001
PMCID: PMC3259530
PMID: 22243791
Nagamine, Yoshitaka | Pong-Wong, Ricardo | Navarro, Pau | Vitart, Veronique | Hayward, Caroline | Rudan, Igor | Campbell, Harry | Wilson, James | Wild, Sarah | Hicks, Andrew A. | Pramstaller, Peter P. | Hastie, Nicholas | Wright, Alan F. | Haley, Chris S. | Moore, Stephen
The limited proportion of complex trait variance identified in genome-wide association studies may reflect the limited power of single SNP analyses to detect either rare causative alleles or those of small effect. Motivated by studies that demonstrate that loci contributing to trait variation may contain a number of different alleles, we have developed an analytical approach termed Regional Genomic Relationship Mapping that, like linkage-based family methods, integrates variance contributed by founder gametes within a pedigree. This approach takes advantage of very distant (and unrecorded) relationships, and this greatly increases the power of the method, compared with traditional pedigree-based linkage analyses. By integrating variance contributed by founder gametes in the population, our approach provides an estimate of the Regional Heritability attributable to a small genomic region (e.g. 100 SNP window covering ca. 1 Mb of DNA in a 300,000 SNP GWAS) and has the power to detect regions containing multiple alleles that individually contribute too little variance to be detectable by GWAS as well as regions with single common GWAS-detectable SNPs. We use genome-wide SNP array data to obtain both a genome-wide relationship matrix and regional relationship (“identity by state” or IBS) matrices for sequential regions across the genome. We then estimate a heritability for each region sequentially in our genome-wide scan. We demonstrate by simulation and with real data that, when compared to traditional (“individual SNP”) GWAS, our method uncovers new loci that explain additional trait variation. We analysed data from three Southern European populations and from Orkney for exemplar traits – serum uric acid concentration and height. We show that regional heritability estimates are correlated with results from genome-wide association analysis but can capture more of the genetic variance segregating in the population and identify additional trait loci.
doi:10.1371/journal.pone.0046501
PMCID: PMC3471913
PMID: 23077511
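The regional relationship ("identity by state") matrices described in the abstract above can be pictured with a small sketch. The Python below computes a simple pairwise IBS sharing proportion within consecutive SNP windows. It is a hedged illustration, not the authors' estimator: the 0/1/2 genotype coding, the non-overlapping windows, the function names and the toy data are all assumptions, and the variance-component step that turns such matrices into regional heritability estimates is not shown.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical by state between two individuals.

    Genotypes are coded as 0, 1 or 2 copies of one allele, so a SNP
    contributes 2 - |a - b| shared alleles out of 2.
    """
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))


def regional_ibs_matrices(genotypes, window_size):
    """Return one individuals-by-individuals IBS matrix per SNP window."""
    n_snps = len(genotypes[0])
    matrices = []
    for start in range(0, n_snps, window_size):
        # Slice the same window of SNPs out of every individual's genotypes.
        window = [row[start:start + window_size] for row in genotypes]
        matrices.append([[ibs_similarity(a, b) for b in window] for a in window])
    return matrices


# Toy data (hypothetical): 3 individuals, 4 SNPs, windows of 2 SNPs.
genotypes = [
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [2, 1, 0, 0],
]
matrices = regional_ibs_matrices(genotypes, window_size=2)
print(matrices[0][0][1], matrices[1][0][1])
```

With the abstract's example of a 100 SNP window, `window_size` would be set to 100; the genome-wide matrix corresponds to a single window spanning all SNPs.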
Background
Association studies are a promising way to uncover the genetic basis of complex traits in wild populations. Data on population stratification, linkage disequilibrium and distribution of variant effect-sizes for different trait-types are required to predict study success but are lacking for most taxa. We quantified and investigated the impacts of these key variables in a large-scale association study of a strongly selected trait of medical importance: pyrethroid resistance in the African malaria vector Anopheles gambiae.
Methodology/Principal Findings
We genotyped ≈1500 resistance-phenotyped wild mosquitoes from Ghana and Cameroon using a 1536-SNP array enriched for candidate insecticide resistance gene SNPs. Three factors greatly impacted study power. (1) Population stratification, which was attributable to co-occurrence of molecular forms (M and S), and cryptic within-form stratification necessitating both a partitioned analysis and genomic control. (2) All SNPs of substantial effect (odds ratio, OR>2) were rare (minor allele frequency, MAF<0.05). (3) Linkage disequilibrium (LD) was very low throughout most of the genome. Nevertheless, locally high LD, consistent with a recent selective sweep, and uniformly high ORs in each subsample facilitated significant direct and indirect detection of the known insecticide target site mutation kdr L1014F (OR≈6; P<10^-6), but with resistance level modified by local haplotypic background.
Conclusion
Primarily as a result of very low LD in wild A. gambiae, LD-based association mapping is challenging, but is feasible at least for major effect variants, especially where LD is enhanced by selective sweeps. Such variants will be of greatest importance for predictive diagnostic screening.
doi:10.1371/journal.pone.0013140
PMCID: PMC2956759
PMID: 20976111
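The thresholds quoted in the abstract above (odds ratio, OR>2; minor allele frequency, MAF<0.05) rest on two simple quantities. The following Python sketch shows the standard allelic odds ratio from a 2×2 table and the minor allele frequency from genotype counts. The function names and all counts are hypothetical and are not taken from the study.

```python
def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (case_alt * control_ref) / (case_ref * control_alt)


def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the rarer allele, computed from genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt_freq = (n_het + 2 * n_hom_alt) / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


# Hypothetical counts: resistant mosquitoes play the role of "cases",
# susceptible mosquitoes the role of "controls".
print(odds_ratio(case_alt=30, case_ref=10, control_alt=10, control_ref=20))
print(minor_allele_frequency(n_hom_ref=90, n_het=9, n_hom_alt=1))
```

A rare allele contributes few observations to the 2×2 table, which is one reason why rare variants of substantial effect, as reported in point (2) of the abstract, are hard to detect.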
Hsu, Yi-Hsiang | Zillikens, M. Carola | Wilson, Scott G. | Farber, Charles R. | Demissie, Serkalem | Soranzo, Nicole | Bianchi, Estelle N. | Grundberg, Elin | Liang, Liming | Richards, J. Brent | Estrada, Karol | Zhou, Yanhua | van Nas, Atila | Moffatt, Miriam F. | Zhai, Guangju | Hofman, Albert | van Meurs, Joyce B. | Pols, Huibert A. P. | Price, Roger I. | Nilsson, Olle | Pastinen, Tomi | Cupples, L. Adrienne | Lusis, Aldons J. | Schadt, Eric E. | Ferrari, Serge | Uitterlinden, André G. | Rivadeneira, Fernando | Spector, Timothy D. | Karasik, David | Kiel, Douglas P. | Visscher, Peter M.
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6×10^-8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 out of these 3 genes (GPR177, p = 2.6×10^-13; SOX6, p = 6.4×10^-10) associated with BMD in women have been successfully replicated in a large-scale meta-analysis of BMD, but none of the non-prioritized candidates (associated with BMD) were. Our results support the concept of our prioritization strategy.
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
Author Summary
BMD and hip geometry are two major predictors of osteoporotic fractures, the most severe consequence of osteoporosis in elderly persons. We performed sex-specific genome-wide association studies (GWAS) for BMD at the lumbar spine and femoral neck skeletal sites as well as hip geometric indices (NSA, NL, and NW) in the Framingham Osteoporosis Study and then replicated the top findings in two independent studies. Three novel loci were significant: in women, chromosome 1p13.2 (RAP1A) for NW; in men, 2q11.2 (TBC1D8) for NSA and 18q11.2 (OSBPL1A) for NW. We confirmed a previously reported region on 8q24.12 (TNFRSF11B/OPG) for lumbar spine BMD in women. In addition, we integrated GWAS signals with eQTL in several tissues and publicly available expression signature profiling in cellular and whole-animal models, and prioritized 16 candidate genes/loci based on their potential involvement in skeletal metabolism. Among three prioritized loci (GPR177, SOX6, and CASR genes) associated with BMD in women, GPR177 and SOX6 were later successfully replicated in a large-scale meta-analysis, but none of the non-prioritized candidates (associated with BMD) were. Our results support the concept of using expression profiling to support the candidacy of suggestive GWAS signals that may contain important genes of interest.
doi:10.1371/journal.pgen.1000977
PMCID: PMC2883588
PMID: 20548944
Background
Grass carp (Ctenopharyngodon idella) belongs to the family Cyprinidae which includes more than 2000 fish species. It is one of the most important freshwater food fish species in world aquaculture. A linkage map is an essential framework for mapping traits of interest and is often the first step towards understanding genome evolution. The aim of this study is to construct a first generation genetic map of grass carp using microsatellites and SNPs to generate a new resource for mapping QTL for economically important traits and to conduct a comparative mapping analysis to shed new insights into the evolution of fish genomes.
Results
We constructed a first generation linkage map of grass carp with a mapping panel containing two F1 families including 192 progenies. Sixteen SNPs in genes and 263 microsatellite markers were mapped to twenty-four linkage groups (LGs). The number of LGs corresponded to the haploid chromosome number of grass carp. The sex-specific maps were 1149.4 and 888.8 cM long in females and males, respectively, whereas the sex-averaged map spanned 1176.1 cM. The average resolution of the map was 4.2 cM/locus. BLAST searches of sequences of mapped markers of grass carp against the whole genome sequence of zebrafish revealed a substantial macrosynteny relationship and extensive colinearity of markers between grass carp and zebrafish.
Conclusions
The linkage map of grass carp presented here is the first linkage map of a food fish species based on co-dominant markers in the family Cyprinidae. This map provides a valuable resource for mapping phenotypic variations, serves as a reference for comparative genomics and for understanding the evolution of fish genomes, and could be complementary to the grass carp genome sequencing project.
doi:10.1186/1471-2164-11-135
PMCID: PMC2838847
PMID: 20181260
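The map statistics reported in the Results above can be checked with a few lines of arithmetic. The Python sketch below assumes that the average resolution is the sex-averaged map length divided by the total number of mapped markers; the abstract does not state the formula, so this is an assumption, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
sex_averaged_cm = 1176.1
female_cm, male_cm = 1149.4, 888.8
n_markers = 263 + 16  # microsatellites + SNPs in genes

# Assumed definition: map length divided by number of mapped loci.
resolution = sex_averaged_cm / n_markers

# Ratio of female to male map length, a common summary of sex-specific
# differences in recombination.
female_to_male = female_cm / male_cm

print(f"{resolution:.1f} cM/locus, female:male ratio {female_to_male:.2f}")
```

Under that assumption the computed resolution agrees with the 4.2 cM/locus given in the abstract.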
The genome-wide association (GWA) study aims to identify the genetic factors associated with the traits of interest. However, the power of GWA analysis has been seriously limited by the enormous number of markers tested. Recently, gene set analysis (GSA) methods were introduced to GWA studies to address the association of gene sets that share common biological functions. GSA considerably increased the power of association analysis and successfully identified coordinated association patterns of gene sets. There have been several approaches in this direction with some limitations. Here, we present a general approach for GSA in GWA analysis and a stand-alone software tool, GSA-SNP, that implements three widely used GSA methods. GSA-SNP provides fast computation and an easy-to-use interface. The software and test datasets are freely available at http://gsa.muldas.org. We provide an exemplary analysis on adult heights in a Korean population.
doi:10.1093/nar/gkq428
PMCID: PMC2896081
PMID: 20501604
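To make the idea of gene set analysis in the abstract above concrete, here is a minimal Python sketch of one generic gene-set statistic, a Z-score comparing the mean gene score inside a set with the mean over all genes. The abstract does not name the three methods that GSA-SNP implements, so this should not be read as the software's algorithm; the function name, the use of per-gene scores such as -log10 p-values, and the example numbers are all assumptions.

```python
import math
import statistics


def gene_set_z(set_scores, all_scores):
    """Z-score of the mean gene score in a set against all genes.

    Each score is assumed to summarize the association evidence for one
    gene (for example, -log10 of its best SNP p-value).
    """
    mu = statistics.mean(all_scores)
    sigma = statistics.pstdev(all_scores)
    # Standard error of a mean of len(set_scores) draws from all genes.
    standard_error = sigma / math.sqrt(len(set_scores))
    return (statistics.mean(set_scores) - mu) / standard_error


# Hypothetical scores for five genes, two of which belong to the set.
all_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
set_scores = [4.0, 5.0]
print(gene_set_z(set_scores, all_scores))
```

Testing a few hundred gene sets rather than hundreds of thousands of markers is what reduces the multiple-testing burden that the abstract identifies as the limiting factor for single-marker GWA analysis.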
Yamada, Kazuo | Iwayama, Yoshimi | Hattori, Eiji | Iwamoto, Kazuya | Toyota, Tomoko | Ohnishi, Tetsuo | Ohba, Hisako | Maekawa, Motoko | Kato, Tadafumi | Yoshikawa, Takeo | Hashimoto, Kenji
Schizophrenia is a devastating neuropsychiatric disorder with genetically complex traits. Genetic variants should explain a considerable portion of the risk for schizophrenia, and the genome-wide association study (GWAS) is a potentially powerful tool for identifying the risk variants that underlie the disease. Here, we report the results of a three-stage analysis of three independent cohorts consisting of a total of 2,535 samples from Japanese and Chinese populations, searching for schizophrenia susceptibility genes using a GWAS approach. In stage I, we examined 115,770 single nucleotide polymorphisms (SNPs) in 120 patient-parents trio samples from Japanese schizophrenia pedigrees. In stage II, we evaluated 1,632 SNPs (1,159 SNPs of p<0.01 and 473 SNPs of p<0.05 that were located in previously reported linkage regions). The second sample consisted of 1,012 case-control samples of Japanese origin. The most significant p value was obtained for the SNP in the ELAVL2 [(embryonic lethal, abnormal vision, Drosophila)-like 2] gene located on 9p21.3 (p = 0.00087). In stage III, we scrutinized the ELAVL2 gene by genotyping gene-centric tagSNPs in the third sample set of 293 family samples (1,163 individuals) of Chinese descent, and the SNP in the gene showed a nominal association with schizophrenia in the Chinese population (p = 0.026). The current data in Asian populations should be helpful for deciphering the ethnic diversity of schizophrenia etiology.
doi:10.1371/journal.pone.0020468
PMCID: PMC3108953
PMID: 21674006
Background: Large-scale candidate-gene and genome-wide association studies genotype multiple SNPs within or surrounding a gene, including both tag and functional SNPs. The immense amount of data generated in these studies poses new challenges to analysis. One particularly challenging yet important question is how to best use all genetic information to test whether a gene or a region is associated with the trait of interest.
Methods: Here we propose a powerful gene-based Association Test by combining Optimally Weighted Markers (ATOM) within a genomic region. Due to variation in linkage disequilibrium, different markers often associate with the trait of interest at different levels. To appropriately apportion their contributions, we assign a weight to each marker that is proportional to the amount of information it captures about the trait locus. We analytically derive the optimal weights for both quantitative and binary traits, and describe a procedure for estimating the weights from a reference database such as the HapMap. Compared with existing approaches, our method has several distinct advantages, including (i) the ability to borrow information from an external database to increase power, (ii) the theoretical derivation of optimal marker weights and (iii) the scalability to simultaneous analysis of all SNPs in candidate genes and pathways.
Results: Through extensive simulations and analysis of the FTO gene in our ongoing genome-wide association study on childhood obesity, we demonstrate that ATOM increases the power to detect genetic association as compared with several commonly used multi-marker association tests.
Contact: mingyao@mail.med.upenn.edu; chun.li@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btn641
PMCID: PMC2642636
PMID: 19074959
Compared to conventional linkage mapping, linkage disequilibrium (LD) mapping, which uses the nonrandom associations of loci in haplotypes, is a powerful high-resolution mapping tool for complex quantitative traits. Recent advances in the development of unbiased association mapping approaches for plant populations, together with their successful application in dissecting a number of simple to complex traits in many crop species, demonstrate that the approach is flourishing as a “powerful gene tagging” tool for crops in the plant genomics era of the 21st century. The goal of this review is to provide nonexpert readers in the crop breeding community with (1) the basic concept, merits, and a simple description of existing methodologies for association mapping, including recent improvements for plant populations, and (2) the details of some pioneering and recent studies on association mapping in various crop species to demonstrate the feasibility, success, problems, and future perspectives of these efforts in plants. This should be helpful to interested readers in the international plant research community as a guideline for basic understanding, for choosing appropriate methods, and for their application.
doi:10.1155/2008/574927
PMCID: PMC2423417
PMID: 18551188
Nature
2009;461(7265):802-808.
Summary
Although autism is a highly heritable neurodevelopmental disorder, attempts to identify specific susceptibility genes have thus far met with limited success 1. Genome-wide association studies (GWAS) using half a million or more markers, particularly those with very large sample sizes achieved through meta-analysis, have shown great success in mapping genes for other complex genetic traits (http://www.genome.gov/26525384). Consequently, we initiated a linkage and association mapping study using half a million genome-wide SNPs in a common set of 1,031 multiplex autism families (1,553 affected offspring). We identified regions of suggestive and significant linkage on chromosomes 6q27 and 20p13, respectively. Initial analysis did not yield genome-wide significant associations; however, genotyping of top hits in additional families revealed a SNP on chromosome 5p15 (between SEMA5A and TAS2R1) that was significantly associated with autism (P = 2 × 10⁻⁷). We also demonstrated that expression of SEMA5A is reduced in brains from autistic patients, further implicating SEMA5A as an autism susceptibility gene. The linkage regions reported here provide targets for rare variation screening while the discovery of a single novel association demonstrates the action of common variants.
doi:10.1038/nature08490
PMCID: PMC2772655
PMID: 19812673
The genetic analysis of quantitative or complex traits has been based mainly on statistical quantities such as genetic variances and heritability. These analyses continue to be developed, for example in studies of natural populations. Genomic methods are having an impact on progress and prospects. Actual relationships of individuals can be estimated enabling novel quantitative analyses. Increasing precision of linkage mapping is feasible with dense marker panels and designed stocks allowing multiple generations of recombination, and large SNP panels enable the use of genome wide association analysis utilising historical recombination. Whilst such analyses are identifying many loci for disease genes and traits such as height, typically each individually contributes a small amount of the variation. Only by fitting all SNPs without regard to significance can a high proportion be accounted for, so a classical polygenic model with near infinitesimally small effects remains a useful one. Theory indicates that a high proportion of variants will have low minor allele frequency, making detection difficult. Genomic selection, based on simultaneously fitting very dense markers and incorporating these with phenotypic data in breeding value prediction is revolutionising breeding programmes in agriculture and has a major potential role in human disease prediction.
doi:10.2174/138920212800543110
PMCID: PMC3382274
PMID: 23115521
Complex traits; evolution; heritability; genetic variance; genome wide association; QTL; selection.
Background
The feasibility of effectively analyzing high-density single nucleotide polymorphism (SNP) maps in whole genome scans of complex traits is not known. The purpose of this study was to compare variance components linkage results using different density marker maps in data from the Collaborative Study on the Genetics of Alcoholism (COGA). Marker maps having an average spacing of 10 cM (microsatellite), 0.78 cM (SNP1), and 0.31 cM (SNP2) were used to identify quantitative trait loci (QTLs) affecting maximum number of alcoholic drinks consumed in a 24-hour period (lnmaxalc).
Results
Heritability of lnmaxalc was estimated to be 15%. Multipoint variance components linkage analysis revealed similar linkage patterns among the three marker panels, with the SNP maps consistently yielding higher LOD scores. Robust LOD scores > 1.0 were observed on chromosomes 1 and 13 for all three marker maps. Additional LODs > 1.0 were observed on chromosome 4 with both SNP maps and on chromosomes 18 and 21 with the SNP2 map. Peak LOD scores for lnmaxalc were observed on chromosome 1, although none reached genome-wide statistical significance. Quantile-quantile plots revealed that the multipoint distribution of SNP results appeared to fit the asymptotic null distribution better than the two-point results.
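For readers less familiar with the scale of the scores reported above, a LOD score is the base-10 logarithm of the likelihood ratio between the linkage model and the no-linkage model. The following is a minimal sketch, assuming natural-log likelihoods are available from the two fitted models; the function names are hypothetical and do not come from any linkage package.

```python
import math

def lod_from_loglik(loglik_linkage, loglik_null):
    """Convert two natural-log likelihoods into a LOD score
    (log10 of the likelihood ratio)."""
    return (loglik_linkage - loglik_null) / math.log(10)

def lod_to_lrt(lod):
    """Convert a LOD score into the usual likelihood-ratio statistic
    2 * ln(likelihood ratio)."""
    return 2.0 * math.log(10) * lod

# Example: a linkage model whose likelihood is 1000 times that of the null.
example_lod = lod_from_loglik(-100.0, -100.0 - 3 * math.log(10))
```

On this scale, a LOD of 1.0 corresponds to a likelihood-ratio statistic of about 4.6.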
Conclusion
In conclusion, variance-components linkage analysis using high-density SNP maps provided higher LOD scores compared with the standard microsatellite map, similar to studies using nonparametric linkage methods. Widespread application of SNP maps will depend on further improvements in the computational methods implemented in current software packages.
doi:10.1186/1471-2156-6-S1-S5
PMCID: PMC1866734
PMID: 16451661
Gene expression, as a heritable complex trait, has recently been used in many genome-wide linkage studies. The estimated overall heritability of each trait may be considered as evidence of a genetic contribution to the total phenotypic variation, which implies the possibility of mapping genome regions responsible for the gene expression variation via linkage analysis. However, heritability has been found to be an inconsistent predictor of significant linkage signals. To investigate this issue in human studies, we performed genome-wide linkage analysis on the 3554 gene expression traits of 194 Centre d'Etude du Polymorphisme Humain individuals provided by Genetic Analysis Workshop 15. Out of the 422 expression traits with significant linkage signals identified (LOD > 5.3), 89 traits have low estimated heritability (h² < 10%), among which 23 traits have an estimated heritability equal to 0. The linkage analysis on individual pedigrees shows that the overall LOD scores may result from a few pedigrees with strong linkage signals. Screening gene expressions before linkage analysis using a relatively low heritability (h² < 20%) may result in a loss of significant linkage signals, especially for trans-acting expression quantitative trait loci (49%).
PMCID: PMC2367577
PMID: 18466589
Background
Genome-wide association studies (GWAS) based on linkage disequilibrium (LD) provide a promising tool for the detection and fine mapping of quantitative trait loci (QTL) underlying complex agronomic traits. In this study we explored the genetic basis of variation for the traits heading date, plant height, thousand grain weight, starch content and crude protein content in a diverse collection of 224 spring barleys of worldwide origin. The whole panel was genotyped with a customized oligonucleotide pool assay containing 1536 SNPs using Illumina's GoldenGate technology resulting in 957 successful SNPs covering all chromosomes. The morphological trait "row type" (two-rowed spike vs. six-rowed spike) was used to confirm the high level of selectivity and sensitivity of the approach. This study describes the detection of QTL for the above mentioned agronomic traits by GWAS.
Results
Population structure in the panel was investigated by various methods, and six subgroups were identified that are mainly based on spike morphology and region of origin. We explored the patterns of linkage disequilibrium (LD) among the whole panel for all seven barley chromosomes. Average LD was observed to decay below a critical level (r² value of 0.2) within a map distance of 5-10 cM. Phenotypic variation within the panel was reasonably large for all the traits. The heritabilities calculated for each trait over multi-environment experiments ranged between 0.90 and 0.95. Different statistical models were tested to control spurious LD caused by population structure and to calculate the P-value of marker-trait associations. Using a mixed linear model with kinship for controlling spurious LD effects, we found a total of 171 significant marker-trait associations, which delineate into 107 QTL regions. Across all traits these can be grouped into 57 novel QTL and 50 QTL that are congruent with previously mapped QTL positions.
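The r2 measure of LD mentioned above can be computed for a pair of biallelic loci from haplotype data. The sketch below uses the standard definition, D = p(AB) − p(A)·p(B) and r² = D² / (p(A)(1 − p(A))·p(B)(1 − p(B))). It assumes phased haplotypes coded as 0/1 pairs, which is an illustrative simplification and not a description of how the study computed LD; the function name `ld_r2` is hypothetical.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, each allele coded 0 or 1.
    Phased haplotypes are assumed here for illustration.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom

complete_ld = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])   # alleles always co-occur
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])         # alleles are independent
```

An LD decay curve like the one summarised above would be obtained by computing this value for many marker pairs and plotting it against their map distance.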
Conclusions
Our results demonstrate that the described diverse barley panel can be efficiently used for GWAS of various quantitative traits, provided that population structure is appropriately taken into account. The observed significant marker trait associations provide a refined insight into the genetic architecture of important agronomic traits in barley. However, individual QTL account only for a small portion of phenotypic variation, which may be due to insufficient marker coverage and/or the elimination of rare alleles prior to analysis. The fact that the combined SNP effects fall short of explaining the complete phenotypic variance may support the hypothesis that the expression of a quantitative trait is caused by a large number of very small effects that escape detection. Notwithstanding these limitations, the integration of GWAS with biparental linkage mapping and an ever increasing body of genomic sequence information will facilitate the systematic isolation of agronomically important genes and subsequent analysis of their allelic diversity.
doi:10.1186/1471-2229-12-16
PMCID: PMC3349577
PMID: 22284310
Advances in pig gene identification, mapping and functional analysis have continued to make rapid progress. The porcine genetic linkage map now has nearly 3000 loci, including several hundred genes, and is likely to expand considerably in the next few years, with many more genes and amplified fragment length polymorphism (AFLP) markers being added to the map. The physical genetic map is also growing rapidly and has over 3000 genes and markers. Several recent quantitative trait loci (QTL) scans and candidate gene analyses have identified important chromosomal regions and individual genes associated with traits of economic interest. The commercial pig industry is actively using this information and traditional performance information to improve pig production by marker-assisted selection (MAS). Research to study the co-expression of thousands of genes is now advancing and methods to combine these approaches to aid in gene discovery are under way. The pig's role in xenotransplantation and biomedical research makes the study of its genome important for the study of human disease. This review will briefly describe advances made, directions for future research and the implications for both the pig industry and human health.
doi:10.1002/cfg.261
PMCID: PMC2447406
PMID: 18629119
Sammalisto, S | Hiekkalinna, T | Suviolahti, E | Sood, K | Metzidis, A | Pajukanta, P | Lilja, H | Soro-Paavonen, A | Taskinen, M | Tuomi, T | Almgren, P | Orho-Melander, M | Groop, L | Peltonen, L | Perola, M
Background: Many genome-wide scans aimed at complex traits have been statistically underpowered due to small sample size. Combining data from several genome-wide screens with comparable quantitative phenotype data should improve statistical power for the localisation of genomic regions contributing to these traits.
Objective: To perform a genome-wide screen for loci affecting adult stature by combined analysis of four previously performed genome-wide scans.
Methods: We developed a web-based computer tool, Cartographer, for combining genetic marker maps, which positions genetic markers accurately using the July 2003 release of the human genome sequence and the deCODE genetic map. Using Cartographer, we combined the primary genotype data from four genome-wide scans and performed variance components (VC) linkage analyses for human stature on the pooled dataset of 1417 individuals from 277 families, as well as for males and females separately.
Results: We found significant linkage to stature on 1p21 (multipoint LOD score 4.25) and suggestive linkages on 9p24 and 18q21 (multipoint LOD scores 2.57 and 2.39, respectively) in males-only analyses. We also found suggestive linkage to 4q35 and 22q13 (multipoint LOD scores 2.18 and 2.85, respectively) when we analysed both females and males and to 13q12 (multipoint LOD score 2.66) in females-only analyses.
Conclusions: We strengthened the evidence for linkage to previously reported quantitative trait loci (QTL) for stature and also found significant evidence of a novel male-specific QTL on 1p21. Further investigation of several interesting candidate genes in this region will help towards characterisation of this first sex-specific locus affecting human stature.
doi:10.1136/jmg.2005.031278
PMCID: PMC1735962
PMID: 15827092
Background
The genetic mechanisms underlying interindividual blood pressure variation reflect the complex interplay of both genetic and environmental variables. The current standard statistical methods for detecting genes involved in the regulation mechanisms of complex traits are based on univariate analysis. Few studies have focused on the search for and understanding of quantitative trait loci responsible for gene × environmental interactions or multiple trait analysis. Composite interval mapping has been extended to multiple traits and may be an interesting approach to such a problem.
Methods
We used multiple-trait analysis for quantitative trait locus mapping of loci having different effects on systolic blood pressure with NaCl exposure. Animals studied were 188 rats, the progeny of an F2 rat intercross between the hypertensive and normotensive strains, genotyped at 179 polymorphic markers across the rat genome. To accommodate the correlational structure from measurements taken in the same animals, we applied univariate and multivariate strategies for analyzing the data.
Results
We detected a new quantitative trait locus in a region close to marker R589 on chromosome 5 of the rat genome, not previously identified through serial analysis of individual traits. In addition, we were able to justify analytically the parametric restrictions in terms of regression coefficients responsible for the gain in precision with the adopted analytical approach.
Conclusion
Future work should focus on fine mapping and the identification of the causative variant responsible for this quantitative trait locus signal. The multivariable strategy might be valuable in the study of genetic determinants of interindividual variation of antihypertensive drug effectiveness.
doi:10.1186/1471-2350-7-47
PMCID: PMC1522018
PMID: 16716221