Related Articles

1.  Adjustment for local ancestry in genetic association analysis of admixed populations 
Bioinformatics  2010;27(5):670-677.
Motivation: Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus.
Methods and results: We show that local ancestry at a test single nucleotide polymorphism (SNP) may confound with the association signal and ignoring it can lead to spurious association. We demonstrate theoretically that adjustment for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether due to local or global ancestry differences among study subjects; however, global ancestry adjustment procedures may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework which models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test lies in its ability to incorporate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression, with adjustment for local ancestry proportion. We conducted extensive simulations and found that the Type I error rates of our tests are under control; however, the global adjustment procedures yielded inflated Type I error rates when stratification is due to local ancestry difference.
Contact: mingyao@upenn.edu; chun.li@vanderbilt.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq709
PMCID: PMC3042179  PMID: 21169375
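The confounding described in entry 1 can be illustrated with a small worked example. This is only a sketch: the counts are hypothetical, and stratification with a Mantel-Haenszel odds ratio is used here as a simpler stand-in for the conditional likelihood and logistic regression tests the abstract describes. Within each local-ancestry stratum the allele is unrelated to disease, yet pooling the strata produces a strong apparent association.

```python
# Hypothetical counts, chosen so that the allele is unrelated to disease
# within each local-ancestry stratum, while allele frequency and disease
# prevalence both differ between strata.
# Each tuple: (cases with allele, cases without, controls with, controls without)
strata = {
    "local ancestry A": (320, 80, 80, 20),
    "local ancestry B": (20, 80, 80, 320),
}

def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

def pooled_odds_ratio(tables):
    """Collapse all strata into one table, ignoring local ancestry."""
    a, b, c, d = (sum(col) for col in zip(*tables))
    return odds_ratio(a, b, c, d)

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel odds ratio, which adjusts for the stratum."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

tables = list(strata.values())
pooled = pooled_odds_ratio(tables)
adjusted = mantel_haenszel_odds_ratio(tables)
print(f"pooled OR = {pooled:.3f}, ancestry-adjusted OR = {adjusted:.3f}")
```

With these made-up counts the pooled odds ratio is about 4.5, while the stratum-adjusted odds ratio is 1.0, which is the pattern of spurious association that adjusting for local ancestry is meant to remove.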
2.  Enriching targeted sequencing experiments for rare disease alleles 
Bioinformatics  2011;27(15):2112-2118.
Motivation: Next-generation targeted resequencing of genome-wide association study (GWAS)-associated genomic regions is a common approach for follow-up of indirect association of common alleles. However, it is prohibitively expensive to sequence all the samples from a well-powered GWAS study with sufficient depth of coverage to accurately call rare genotypes. As a result, many studies may use next-generation sequencing for single nucleotide polymorphism (SNP) discovery in a smaller number of samples, with the intent to genotype candidate SNPs with rare alleles captured by resequencing. This approach is reasonable, but may be inefficient for rare alleles if samples are not carefully selected for the resequencing experiment.
Results: We have developed a probability-based approach, SampleSeq, to select samples for a targeted resequencing experiment that increases the yield of rare disease alleles substantially over random sampling of cases or controls or sampling based on genotypes at associated SNPs from GWAS data. This technique allows for smaller sample sizes for resequencing experiments, or allows the capture of rarer risk alleles. When following up multiple regions, SampleSeq selects subjects with an even representation of all the regions. SampleSeq also can be used to calculate the sample size needed for the resequencing to increase the chance of successful capture of rare alleles of desired frequencies.
Software: http://biostat.mc.vanderbilt.edu/SampleSeq
Contact: chun.li@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr324
PMCID: PMC3137214  PMID: 21700677
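As a point of comparison for entry 2, the random-sampling baseline that SampleSeq is said to improve on can be worked out directly. The sketch below is not the SampleSeq method. It assumes samples are drawn at random, alleles are independent across chromosomes, and each diploid sample contributes two chromosomes. The function names and the example frequency are illustrative.

```python
import math

def capture_probability(allele_freq, n_samples):
    """P(at least one copy among 2*n chromosomes), assuming random sampling."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_samples)

def samples_needed(allele_freq, target_prob):
    """Smallest n whose capture probability reaches target_prob."""
    n = math.log(1.0 - target_prob) / (2.0 * math.log(1.0 - allele_freq))
    return math.ceil(n)

# Illustrative values: a 0.5% allele and a 95% chance of capturing it.
freq, target = 0.005, 0.95
n = samples_needed(freq, target)
print(n, round(capture_probability(freq, n), 4))
```

Under these assumptions roughly 300 randomly chosen samples are needed for a 0.5% allele, which gives a sense of why a selection scheme that enriches for carriers can reduce the required sample size.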
3.  dmGWAS: dense module searching for genome-wide association studies in protein–protein interaction networks 
Bioinformatics  2010;27(1):95-102.
Motivation: An important question that has emerged from the recent success of genome-wide association studies (GWAS) is how to detect genetic signals beyond single markers/genes in order to explore their combined effects on mediating complex diseases and traits. Integrative testing of GWAS association data with that from prior-knowledge databases and proteome studies has recently gained attention. These methodologies may hold promise for comprehensively examining the interactions between genes underlying the pathogenesis of complex diseases.
Methods: Here, we present a dense module searching (DMS) method to identify candidate subnetworks or genes for complex diseases by integrating the association signal from GWAS datasets into the human protein–protein interaction (PPI) network. The DMS method extensively searches for subnetworks enriched with low P-value genes in GWAS datasets. Compared with pathway-based approaches, this method introduces flexibility in defining a gene set and can effectively utilize local PPI information.
Results: We implemented the DMS method in an R package, which can also evaluate and graphically represent the results. We demonstrated DMS in two GWAS datasets for complex diseases, i.e. breast cancer and pancreatic cancer. For each disease, the DMS method successfully identified a set of significant modules and candidate genes, including some well-studied genes not detected in the single-marker analysis of GWA studies. Functional enrichment analysis and comparison with previously published methods showed that the genes we identified by DMS have higher association signal.
Availability: dmGWAS package and documents are available at http://bioinfo.mc.vanderbilt.edu/dmGWAS.html.
Contact: zhongming.zhao@vanderbilt.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq615
PMCID: PMC3008643  PMID: 21045073
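The module search in entry 3 can be sketched as a greedy expansion on a toy network. The abstract does not give the scoring details, so the following are assumptions made for illustration: gene P-values are converted to z-scores, a module of k genes is scored as the sum of its z-scores divided by sqrt(k), and a neighbour is added only if it raises the score by more than a fraction r. The network, P-values and r = 0.1 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_from_p(p):
    """Convert a gene-level P-value into a z-score (small P -> large z)."""
    return NormalDist().inv_cdf(1.0 - p)

def module_score(genes, z):
    """Assumed module score: sum of z-scores divided by sqrt(module size)."""
    return sum(z[g] for g in genes) / sqrt(len(genes))

def grow_module(seed, network, z, r=0.1):
    """Greedily add the neighbour that most improves the score; stop when
    the best candidate improves the score by no more than a fraction r."""
    module = {seed}
    while True:
        neighbours = set().union(*(network[g] for g in module)) - module
        if not neighbours:
            return module
        best = max(neighbours, key=lambda g: module_score(module | {g}, z))
        if module_score(module | {best}, z) <= module_score(module, z) * (1 + r):
            return module
        module.add(best)

# Hypothetical toy PPI network (adjacency sets) and gene-level P-values.
network = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
p_values = {"A": 0.001, "B": 0.002, "C": 0.6, "D": 0.01}
z = {gene: z_from_p(p) for gene, p in p_values.items()}

module = grow_module("A", network, z)
print(sorted(module))
```

Starting from gene A, the search adds B and then D, and leaves out C, whose large P-value would lower the module score.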
4.  Correcting population stratification in genetic association studies using a phylogenetic approach 
Bioinformatics  2010;26(6):798-806.
Motivation: The rapid development of genotyping technology and extensive cataloguing of single nucleotide polymorphisms (SNPs) across the human genome have made genetic association studies the mainstream for gene mapping of complex human diseases. For many diseases, the most practical approach is the population-based design with unrelated individuals. Although having the advantages of easier sample collection and greater power than family-based designs, unrecognized population stratification in the study samples can lead to both false-positive and false-negative findings and might obscure the true association signals if not appropriately corrected.
Methods: We report PHYLOSTRAT, a new method that corrects for population stratification by combining phylogeny constructed from SNP genotypes and principal coordinates from multi-dimensional scaling (MDS) analysis. This hybrid approach efficiently captures both discrete and admixed population structures.
Results: By extensive simulations, the analysis of a synthetic genome-wide association dataset created using data from the Human Genome Diversity Project, and the analysis of a lactase-height dataset, we show that our method can correct for population stratification more efficiently than several existing population stratification correction methods, including EIGENSTRAT, a hybrid approach based on MDS and clustering, and STRATSCORE, in terms of requiring fewer random SNPs for inference of population structure. By combining the flexibility and hierarchical nature of phylogenetic trees with the advantage of representing admixture using MDS, our hybrid approach can capture the complex population structures in human populations effectively.
Software Availability: Codes can be downloaded from http://people.pcbi.upenn.edu/~lswang/phylostrat/
Contact: mingyao@upenn.edu; iswang@upenn.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq025
PMCID: PMC2832820  PMID: 20097913
5.  Appetite regulation genes are associated with body mass index in black South African adolescents: a genetic association study 
BMJ Open  2012;2(3):e000873.
Background
Obesity is a complex trait with both environmental and genetic contributors. Genome-wide association studies have identified several variants that are robustly associated with obesity and body mass index (BMI), many of which are found within genes involved in appetite regulation. Currently, genetic association data for obesity are lacking in Africans—a single genome-wide association study and a few replication studies have been published in West Africa, but none have been performed in a South African population.
Objective
To assess the association of candidate loci with BMI in black South Africans. The authors focused on single nucleotide polymorphisms (SNPs) in the FTO, LEP, LEPR, MC4R, NPY2R and POMC genes.
Design
A genetic association study.
Participants
990 randomly selected individuals from the larger Birth to Twenty cohort (a longitudinal birth cohort study of health and development in Africans).
Measures
The authors genotyped 44 SNPs within the six candidate genes that included known BMI-associated SNPs and tagSNPs based on linkage disequilibrium in an African population for FTO, LEP and NPY2R. To assess population substructure, the authors included 18 ancestry informative markers. Weight, height, sex, sex-specific pubertal stage and exact age collected during adolescence (13 years) were used to identify loci that predispose to obesity early in life.
Results
Sex, sex-specific pubertal stage and exact age together explain 14.3% of the variation in log(BMI) at age 13. After adjustment for these factors, four SNPs were individually significantly associated with BMI: FTO rs17817449 (p=0.022), LEP rs10954174 (p=0.0004), LEP rs6966536 (p=0.012) and MC4R rs17782313 (p=0.045). Together the four SNPs account for 2.1% of the variation in log(BMI). Each risk allele was associated with an estimated average increase of 2.5% in BMI.
Conclusions
The study highlighted SNPs in FTO and MC4R as potential genetic markers of obesity risk in South Africans. The association with two SNPs in the 3′ untranslated region of the LEP gene is novel.
Article summary
Article focus
This is a replication study aiming to reproduce BMI association findings from European cohorts in a South African population.
This study focused on genes linked to appetite control that were previously reported to show association with BMI or obesity and included FTO, LEP, LEPR, MC4R, NPY2R and POMC.
Adolescent data were used to facilitate the identification of genetic loci that predispose to obesity early in life, as it is known that overweight/obese children have an elevated risk of becoming obese adults.
Key messages
We found four SNPs were individually significantly associated with BMI: FTO rs17817449 (p=0.022), LEP rs10954174 (p=0.0004), LEP rs6966536 (p=0.012) and MC4R rs17782313 (p=0.045).
Together the four SNPs account for 2.1% of the variation in log(BMI).
We also demonstrated that an accumulation of risk alleles is linked to a significant increase in BMI—individuals with seven risk alleles had an 11.0% increase in median BMI compared with those with two risk alleles.
Strengths and limitations of this study
This study provides the first preliminary evidence of the role of genetic variants in obesity risk in an adolescent black South African population.
This study was only moderately powered to detect association with BMI, and not all genes were exhaustively investigated.
TagSNP selection would have been enhanced if South African data were available for this approach.
doi:10.1136/bmjopen-2012-000873
PMCID: PMC3358621  PMID: 22614171
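The per-allele estimate in entry 5 can be related to the reported difference between carriers of seven and two risk alleles with a short calculation. This is only a rough check. It assumes the 2.5% effect applies equally to every allele and acts multiplicatively, as a model on log(BMI) would imply. The abstract's 11.0% figure compares medians, so exact agreement is not expected.

```python
per_allele_increase = 0.025   # 2.5% per risk allele, from the abstract
extra_alleles = 7 - 2         # seven versus two risk alleles

# Multiplicative (log-scale) accumulation of the per-allele effect.
compounded = (1 + per_allele_increase) ** extra_alleles - 1
# Simple additive approximation, for comparison.
additive = per_allele_increase * extra_alleles

print(f"compounded: {compounded:.1%}, additive: {additive:.1%}")
```

Both approximations give an increase of about 12.5% to 13%, the same order as the 11.0% difference in median BMI reported in the key messages.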
6.  A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases–schizophrenia as a case 
Bioinformatics  2009;25(19):2595-2602.
Motivation: During the past decade, we have seen an exponential growth of vast amounts of genetic data generated for complex disease studies. Currently, across a variety of complex biological problems, there is a strong trend towards the integration of data from multiple sources. So far, candidate gene prioritization approaches have been designed for specific purposes, by utilizing only some of the available sources of genetic studies, or by using a simple weight scheme. Specifically to psychiatric disorders, there has been no prioritization approach that fully utilizes all major sources of experimental data.
Results: Here we present a multi-dimensional evidence-based candidate gene prioritization approach for complex diseases and demonstrate it in schizophrenia. In this approach, we first collect and curate genetic studies for schizophrenia from four major categories: association studies, linkage analyses, gene expression and literature search. Genes in these data sets are initially scored by category-specific scoring methods. Then, an optimal weight matrix is searched by a two-step procedure (core genes and unbiased P-values in independent genome-wide association studies). Finally, genes are prioritized by their combined scores using the optimal weight matrix. Our evaluation suggests this approach generates prioritized candidate genes that are promising for further analysis or replication. The approach can be applied to other complex diseases.
Availability: The collected data, prioritized candidate genes, and gene prioritization tools are freely available at http://bioinfo.mc.vanderbilt.edu/SZGR/.
Contact: zhongming.zhao@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp428
PMCID: PMC2752609  PMID: 19602527
7.  An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies 
Bioinformatics  2011;27(5):686-692.
Motivation: In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean; it thus belongs to the class of mixed effects models.
Results: The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Via simulated and real GWAS data, we showed our model improved power over currently available pathway analysis methods while preserving type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. Therefore, the proposed methodology provides an efficient statistical modeling framework for systems analysis of GWAS.
Availability: The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.
Contact: lily.wang@vanderbilt.edu; zhongming.zhao@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq728
PMCID: PMC3042187  PMID: 21266443
8.  Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits 
PLoS Genetics  2007;3(7):e114.
We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate (“impute”) unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
Author Summary
Ongoing association studies are evaluating the influence of genetic variation on phenotypes of interest (hereditary traits and susceptibility to disease) in large patient samples. However, although genotyping is relatively cheap, most association studies genotype only a small proportion of SNPs in the region of study, with many SNPs remaining untyped. Here, we present methods for assessing whether these untyped SNPs are associated with the phenotype of interest. The methods exploit information on patterns of multi-marker correlation (“linkage disequilibrium”) from publicly available databases, such as the International HapMap project or the SeattleSNPs resequencing studies, to estimate (“impute”) patient genotypes at untyped SNPs, and assess the estimated genotypes for association with phenotype. We show that, particularly for common causal variants, these methods are highly effective. Compared with standard methods, they provide both greater power to detect associations between genetic variation and phenotypes, and also better explanations of detected associations, in many cases closely approximating results that would have been obtained by genotyping all SNPs.
doi:10.1371/journal.pgen.0030114
PMCID: PMC1934390  PMID: 17676998
9.  Gene-based interaction analysis by incorporating external linkage disequilibrium information 
Gene–gene interactions have an important role in complex human diseases. Detection of gene–gene interactions has long been a challenge due to their complexity. The standard method aiming at detecting SNP–SNP interactions may be inadequate as it does not model linkage disequilibrium (LD) among SNPs in each gene and may lose power due to a large number of comparisons. To improve power, we propose a principal component (PC)-based framework for gene-based interaction analysis. We analytically derive the optimal weight for both quantitative and binary traits based on pairwise LD information. We then use PCs to summarize the information in each gene and test for interactions between the PCs. We further extend this gene-based interaction analysis procedure to allow the use of imputation dosage scores obtained from a popular imputation software package, MACH, which incorporates multilocus LD information. To evaluate the performance of the gene-based interaction tests, we conducted extensive simulations under various settings. We demonstrate that gene-based interaction tests are more powerful than SNP-based tests when more than two variants interact with each other; moreover, tests that incorporate external LD information are generally more powerful than those that use genotyped markers only. We also apply the proposed gene-based interaction tests to a candidate gene study on high-density lipoprotein. As our method operates at the gene level, it can be applied to a genome-wide association setting and used as a screening tool to detect gene–gene interactions.
doi:10.1038/ejhg.2010.164
PMCID: PMC3025792  PMID: 20924406
gene–gene interaction; linkage disequilibrium; imputation
10.  DNA polymorphisms and haplotype patterns of transcription factors involved in barley endosperm development are associated with key agronomic traits 
BMC Plant Biology  2010;10:5.
Background
Association mapping is receiving considerable attention in plant genetics for its potential to fine map quantitative trait loci (QTL), validate candidate genes, and identify alleles of interest. In the present study association mapping in barley (Hordeum vulgare L.) is investigated by associating DNA polymorphisms with variation in grain quality traits, plant height, and flowering time to gain further understanding of gene functions involved in the control of these traits. We focused on the four loci BLZ1, BLZ2, BPBF and HvGAMYB that play a role in the regulation of B-hordein expression, the major fraction of the barley storage protein. The association was tested in a collection of 224 spring barley accessions using a two-stage mixed model approach.
Results
Within the sequenced fragments of four candidate genes we observed different levels of nucleotide diversity. The effect of selection on the candidate genes was tested by Tajima's D, which revealed significant values for BLZ1, BLZ2, and BPBF in the subset of two-rowed barleys. Pair-wise LD estimates between the detected SNPs within each candidate gene revealed different intra-genic linkage patterns. On the basis of a more extensive examination of genomic regions surrounding the four candidate genes we found a sharp decrease of LD (r² < 0.2 within 1 cM) in all but one of the flanking regions.
Significant marker-trait associations between SNP sites within BLZ1 and flowering time, BPBF and crude protein content and BPBF and starch content were detected. Most haplotypes occurred at frequencies <0.05 and therefore were excluded from the association analysis. Based on haplotype information, BPBF was associated with crude protein content and starch content, BLZ2 showed association with thousand-grain weight and BLZ1 was found to be associated with flowering time and plant height.
Conclusions
Differences in nucleotide diversity and LD pattern within the candidate genes BLZ1, BLZ2, BPBF, and HvGAMYB reflect the impact of selection on the nucleotide sequence of the four candidate loci.
Despite significant associations, the analysed candidate genes only explained a minor part of the total genetic variation although they are known to be important factors influencing the expression of seed quality traits. Therefore, we assume that grain quality as well as plant height and flowering time are influenced by many factors each contributing a small part to the expression of the phenotype. A genome-wide association analysis could provide a more comprehensive picture of loci involved in the regulation of grain quality, thousand-grain weight and the other agronomic traits that were analyzed in this study. However, despite available high-throughput genotyping arrays the marker density along the barley genome is still insufficient to cover all associations in a whole genome scan. Therefore, the candidate gene-based approach will further play an important role in barley association studies.
doi:10.1186/1471-2229-10-5
PMCID: PMC2822787  PMID: 20064201
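The pairwise LD measure used in entry 10 (r²) can be computed from phased haplotypes as shown below. This is a minimal sketch using the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The haplotype data are hypothetical, and the abstract does not say how the authors estimated LD from their sequences.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele at SNP 1, allele at SNP 2), coded 0/1.
    Both SNPs must be polymorphic in the sample, otherwise r^2 is undefined.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical samples of eight haplotypes each.
complete_ld = [(1, 1)] * 4 + [(0, 0)] * 4
partial_ld = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 3

print(ld_r2(complete_ld), ld_r2(partial_ld))
```

The first sample gives r² = 1 (the two SNPs always co-occur) and the second gives r² = 0.25, which would sit just above the 0.2 threshold quoted in the abstract.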
11.  Genome wide association studies for body conformation traits in the Chinese Holstein cattle population 
BMC Genomics  2013;14:897.
Background
Genome-wide association study (GWAS) is a powerful tool for revealing the genetic basis of quantitative traits. However, GWAS of conformation traits in cattle are comparatively few. This study aims to use GWAS to find candidate genes for body conformation traits.
Results
The Illumina BovineSNP50 BeadChip was used to identify single nucleotide polymorphisms (SNPs) that are associated with body conformation traits. A least absolute shrinkage and selection operator (LASSO) was applied to detect multiple SNPs simultaneously for 29 body conformation traits with 1,314 Chinese Holstein cattle and 52,166 SNPs. In total, 59 genome-wide significant SNPs associated with 26 conformation traits were detected by genome-wide association analysis; five SNPs were within previously reported QTL regions (Animal Quantitative Trait Loci (QTL) database) and 11 were very close to the reported SNPs. Twenty-two SNPs were located within annotated gene regions, while the remainder were 0.6–826 kb away from known genes. Some of the genes had clear biological functions related to conformation traits. By combining information about the previously reported QTL regions and the biological functions of the genes, we identified DARC, GAS1, MTPN, HTR2A, ZNF521, PDIA6, and TMEM130 as the most promising candidate genes for capacity and body depth, chest width, foot angle, angularity, rear leg side view, teat length, and animal size traits, respectively. We also found four SNPs that affected four pairs of traits, and the genetic correlation between each pair of traits ranged from 0.35 to 0.86, suggesting that these SNPs may have a pleiotropic effect on each pair of traits.
Conclusions
A total of 59 significant SNPs associated with 26 conformation traits were identified in the Chinese Holstein population. Six promising candidate genes were suggested, and four SNPs showed genetic correlation for four pairs of traits.
doi:10.1186/1471-2164-14-897
PMCID: PMC3879203  PMID: 24341352
Dairy cattle; GWAS; Body conformation traits; SNP; Holstein; QTL
12.  Screening and Replication using the Same Data Set: Testing Strategies for Family-Based Studies in which All Probands Are Affected 
PLoS Genetics  2008;4(9):e1000197.
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.
Author Summary
The current state of genotyping technology has enabled researchers to conduct genome-wide association studies of up to 1,000,000 SNPs, allowing for systematic scanning of the genome for variants that might influence the development and progression of complex diseases. One of the largest obstacles to the successful detection of such variants is the multiple comparisons/testing problem in the genetic association analysis. For family-based designs in which all offspring are affected with the disease/trait under study, we developed a methodology that addresses this problem by partitioning the family-based data into two statistically independent components. The first component is used to screen the data and determine the most promising SNPs. The second component is used to test the SNPs for association, where information from the screening is used to weight the SNPs during testing. This methodology is more powerful than standard procedures for multiple comparisons adjustment (i.e., Bonferroni correction). Additionally, as only one data set is required for screening and testing, our testing strategy is less susceptible to study heterogeneity. Finally, as many family-based studies collect data only from affected offspring, this method addresses a major limitation of previous methodologies for multiple comparisons in family-based designs, which require variation in the disease/trait among offspring.
doi:10.1371/journal.pgen.1000197
PMCID: PMC2529406  PMID: 18802462
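The second stage of the strategy in entry 12, where each SNP is tested at its own significance level, can be sketched as a weighted Bonferroni procedure. The abstract does not state how weights are derived from the conditional power estimates, so weights proportional to estimated power are an assumption made here for illustration. The SNP names, power estimates and P-values are hypothetical.

```python
def weighted_bonferroni_levels(power_estimates, alpha=0.05):
    """Split the overall level alpha across SNPs in proportion to power.

    The individual levels sum to alpha, so the family-wise error rate is
    still controlled as long as the weights are independent of the tests.
    """
    total = sum(power_estimates.values())
    return {snp: alpha * w / total for snp, w in power_estimates.items()}

def significant_snps(p_values, levels):
    """SNPs whose test P-value meets their individually adjusted level."""
    return [snp for snp, p in p_values.items() if p <= levels[snp]]

# Hypothetical screening-stage power estimates and testing-stage P-values.
power = {"snp1": 0.80, "snp2": 0.15, "snp3": 0.05}
p_test = {"snp1": 0.03, "snp2": 0.02, "snp3": 0.001}

levels = weighted_bonferroni_levels(power)
hits = significant_snps(p_test, levels)
print(levels, hits)
```

In this toy case snp1 is declared significant at its weighted level of 0.04, whereas an unweighted Bonferroni threshold of 0.05/3 would have rejected it. This mirrors the power gain the abstract attributes to using screening information as weights.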
13.  DNA sequence polymorphisms within the bovine guanine nucleotide-binding protein Gs subunit alpha (Gsα)-encoding (GNAS) genomic imprinting domain are associated with performance traits 
BMC Genetics  2011;12:4.
Background
Genes which are epigenetically regulated via genomic imprinting can be potential targets for artificial selection during animal breeding. Indeed, imprinted loci have been shown to underlie some important quantitative traits in domestic mammals, most notably muscle mass and fat deposition. In this candidate gene study, we have identified novel associations between six validated single nucleotide polymorphisms (SNPs) spanning a 97.6 kb region within the bovine guanine nucleotide-binding protein Gs subunit alpha gene (GNAS) domain on bovine chromosome 13 and genetic merit for a range of performance traits in 848 progeny-tested Holstein-Friesian sires. The mammalian GNAS domain consists of a number of reciprocally-imprinted, alternatively-spliced genes which can play a major role in growth, development and disease in mice and humans. Based on the current annotation of the bovine GNAS domain, four of the SNPs analysed (rs43101491, rs43101493, rs43101485 and rs43101486) were located upstream of the GNAS gene, while one SNP (rs41694646) was located in the second intron of the GNAS gene. The final SNP (rs41694656) was located in the first exon of transcripts encoding the putative bovine neuroendocrine-specific protein NESP55, resulting in an aspartic acid-to-asparagine amino acid substitution at amino acid position 192.
Results
SNP genotype-phenotype association analyses indicate that the single intronic GNAS SNP (rs41694646) is associated (P ≤ 0.05) with a range of performance traits including milk yield, milk protein yield, the content of fat and protein in milk, culled cow carcass weight and progeny carcass conformation, measures of animal body size, direct calving difficulty (i.e. difficulty in calving due to the size of the calf) and gestation length. Association (P ≤ 0.01) with direct calving difficulty (i.e. due to calf size) and maternal calving difficulty (i.e. due to the maternal pelvic width) was also observed at the rs43101491 SNP. Following adjustment for multiple testing, significant association (q ≤ 0.05) remained between the rs41694646 SNP and four traits (animal stature, body depth, direct calving difficulty and milk yield) only. Notably, the single SNP in the bovine NESP55 gene (rs41694656) was associated (P ≤ 0.01) with somatic cell count, an often-cited indicator of resistance to mastitis and overall health status of the mammary system, and previous studies have demonstrated that the chromosomal region to which the GNAS domain maps underlies an important quantitative trait locus for this trait. This association, however, was not significant after adjustment for multiple testing. The three remaining SNPs assayed were not associated with any of the performance traits analysed in this study. Analysis of all pairwise linkage disequilibrium (r²) values suggests that most allele substitution effects for the assayed SNPs observed are independent. Finally, the polymorphic coding SNP in the putative bovine NESP55 gene was used to test the imprinting status of this gene across a range of foetal bovine tissues.
Conclusions
Previous studies in other mammalian species have shown that DNA sequence variation within the imprinted GNAS gene cluster contributes to several physiological and metabolic disorders, including obesity in humans and mice. Similarly, the results presented here indicate an important role for the imprinted GNAS cluster in underlying complex performance traits in cattle such as animal growth, calving, fertility and health. These findings suggest that GNAS domain-associated polymorphisms may serve as important genetic markers for future livestock breeding programs and support previous studies suggesting that candidate imprinted loci may act as molecular targets for the genetic improvement of agricultural populations. In addition, we present new evidence that the bovine NESP55 gene is epigenetically regulated as a maternally expressed imprinted gene in placental and intestinal tissues from 8-10 week old bovine foetuses.
doi:10.1186/1471-2156-12-4
PMCID: PMC3025900  PMID: 21214909
14.  Association, effects and validation of polymorphisms within the NCAPG - LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle 
BMC Genetics  2011;12:103.
Background
In a previously reported genome-wide association study based on a high-density bovine SNP genotyping array, 8 SNP were nominally associated (P ≤ 0.003) with average daily gain (ADG) and 3 of these were also associated (P ≤ 0.002) with average daily feed intake (ADFI) in a population of crossbred beef cattle. The SNP were clustered in a 570 kb region around 38 Mb on the draft sequence of bovine chromosome 6 (BTA6), an interval containing several positional and functional candidate genes including the bovine LAP3, NCAPG, and LCORL genes. The goal of the present study was to develop and examine additional markers in this region to optimize the ability to distinguish favorable alleles, with potential to identify functional variation.
Results
Animals from the original study were genotyped for 47 SNP within or near the gene boundaries of the three candidate genes. Sixteen markers in the NCAPG-LCORL locus displayed significant association with both ADFI and ADG even after stringent correction for multiple testing (P ≤ 0.05). These markers were evaluated for their effects on meat and carcass traits. The alleles associated with higher ADFI and ADG were also associated with higher hot carcass weight (HCW) and ribeye area (REA), and lower adjusted fat thickness (AFT). A reduced set of markers was genotyped on a separate, crossbred population including genetic contributions from 14 beef cattle breeds. Two of the markers located within the LCORL gene locus remained significant for ADG (P ≤ 0.04).
Conclusions
Several markers within the NCAPG-LCORL locus were significantly associated with feed intake and body weight gain phenotypes. These markers were also associated with HCW, REA and AFT suggesting that they are involved with lean growth and reduced fat deposition. Additionally, the two markers significant for ADG in the validation population of animals may be more robust for the prediction of ADG and possibly the correlated trait ADFI, across multiple breeds and populations of cattle.
doi:10.1186/1471-2156-12-103
PMCID: PMC3287254  PMID: 22168586
15.  DNA sequence polymorphisms in a panel of eight candidate bovine imprinted genes and their association with performance traits in Irish Holstein-Friesian cattle 
BMC Genetics  2010;11:93.
Background
Studies in mice and humans have shown that imprinted genes, whereby expression from one of the two parentally inherited alleles is attenuated or completely silenced, have a major effect on mammalian growth, metabolism and physiology. More recently, investigations in livestock species indicate that genes subject to this type of epigenetic regulation contribute to, or are associated with, several performance traits, most notably muscle mass and fat deposition. In the present study, a candidate gene approach was adopted to assess 17 validated single nucleotide polymorphisms (SNPs) and their association with a range of performance traits in 848 progeny-tested Irish Holstein-Friesian artificial insemination sires. These SNPs are located proximal to, or within, the bovine orthologs of eight genes (CALCR, GRB10, PEG3, PHLDA2, RASGRF1, TSPAN32, ZIM2 and ZNF215) that have been shown to be imprinted in cattle or in at least one other mammalian species (i.e. human/mouse/pig/sheep).
Results
Heterozygosities for all SNPs analysed ranged from 0.09 to 0.46 and significant deviations from Hardy-Weinberg proportions (P ≤ 0.01) were observed at four loci. Phenotypic associations (P ≤ 0.05) were observed between nine SNPs proximal to, or within, six of the eight analysed genes and a number of performance traits evaluated, including milk protein percentage, somatic cell count, culled cow and progeny carcass weight, angularity, body conditioning score, progeny carcass conformation, body depth, rump angle, rump width, animal stature, calving difficulty, gestation length and calf perinatal mortality. Notably, SNPs within the imprinted paternally expressed gene 3 (PEG3) gene cluster were associated (P ≤ 0.05) with calving, calf performance and fertility traits, while a single SNP in the zinc finger protein 215 gene (ZNF215) was associated with milk protein percentage (P ≤ 0.05), progeny carcass weight (P ≤ 0.05), culled cow carcass weight (P ≤ 0.01), angularity (P ≤ 0.01), body depth (P ≤ 0.01), rump width (P ≤ 0.01) and animal stature (P ≤ 0.01).
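To make the heterozygosity and Hardy-Weinberg checks above concrete, here is a minimal sketch for a single biallelic SNP. It assumes a 1-degree-of-freedom chi-square goodness-of-fit test; the abstract does not say which test the authors used, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Observed/expected heterozygosity and a chi-square HWE test (1 df).

    Assumption: a plain chi-square test is used for illustration only;
    an exact test may be preferable when genotype counts are small.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    observed_het = n_ab / n
    expected_het = 2 * p * q
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    if min(expected) == 0:
        # Monomorphic SNP: the test is undefined.
        return observed_het, expected_het, float("nan"), float("nan")
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return observed_het, expected_het, chi2, p_value


# Hypothetical counts: one SNP in proportions, one with no heterozygotes.
in_hwe = hwe_summary(25, 50, 25)
out_of_hwe = hwe_summary(50, 0, 50)
```

A SNP would be flagged as deviating from Hardy-Weinberg proportions when the returned P-value falls at or below the chosen cut-off (0.01 in the abstract above).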
Conclusions
Of the eight candidate bovine imprinted genes assessed, DNA sequence polymorphisms in six of these genes (CALCR, GRB10, PEG3, RASGRF1, ZIM2 and ZNF215) displayed associations with several of the phenotypes included for analyses. The genotype-phenotype associations detected here are further supported by the biological function of these six genes, each of which plays important roles in mammalian growth, development and physiology. The associations between SNPs within the imprinted PEG3 gene cluster and traits related to calving, calf performance and gestation length suggest that this domain on chromosome 18 may play a role in regulating pre-natal growth and development and fertility. SNPs within the bovine ZNF215 gene were associated with bovine growth and body conformation traits, and studies in humans have revealed that the human ZNF215 ortholog belongs to the imprinted gene cluster associated with Beckwith-Wiedemann syndrome, a genetic disorder characterised by growth abnormalities. Similarly, the data presented here suggest that the ZNF215 gene may have an important role in regulating bovine growth. Collectively, our results support previous work showing that (candidate) imprinted genes/loci contribute to heritable variation in bovine performance traits and suggest that DNA sequence polymorphisms within these genes/loci represent an important reservoir of genomic markers for future genetic improvement of dairy and beef cattle populations.
doi:10.1186/1471-2156-11-93
PMCID: PMC2965127  PMID: 20942903
16.  Genome-wide association analyses for carcass quality in crossbred beef cattle 
BMC Genetics  2013;14:80.
Background
Genetic improvement of beef quality will benefit both producers and consumers, and can be achieved by selecting animals that carry desired quantitative trait nucleotides (QTN), which result from intensive searches using genetic markers. This paper presents a genome-wide association approach utilizing single nucleotide polymorphisms (SNP) in the Illumina BovineSNP50 BeadChip to seek genomic regions that potentially harbor genes or QTN underlying variation in carcass quality of beef cattle.
This study used 747 genotyped animals, mainly crossbred, with phenotypes on twelve carcass quality traits, including hot carcass weight (HCW), back fat thickness (BF), Longissimus dorsi muscle area or ribeye area (REA), marbling scores (MRB), lean yield grade by Beef Improvement Federation formulae (BIFYLD), steak tenderness by Warner-Bratzler shear force 7-day post-mortem (LM7D) as well as body composition as determined by partial rib (IMPS 103) dissection presented as a percentage of total rib weight including body cavity fat (BDFR), lean (LNR), bone (BNR), intermuscular fat (INFR), subcutaneous fat (SQFR), and total fat (TLFR).
Results
At a genome-wide false discovery rate (FDR) below 10%, eight SNP were found significantly associated with HCW. Seven of these SNP were located on Bos taurus autosome (BTA) 6. At a less stringent significance level (P < 0.001), 520 SNP were found significantly associated, mostly with individual traits (473 SNP) and the remainder with multiple traits (47 SNP). Of these significant SNP, 48 were located on BTA6, and 22 of them were associated with hot carcass weight. There were 53 SNP associated with percentage of rib bone, and 12 of them were on BTA20. The rest of the significant SNP were scattered over other chromosomes. They accounted for 1.90 - 5.89% of the phenotypic variance of the traits. A region approximately 4 Mbp long on BTA6 was found to be a potential area to harbor candidate genes influencing growth. One marker on BTA25 accounting for 2.67% of the variation in LM7D may be worth further investigation for the improvement of beef tenderness.
Conclusion
This study provides useful information to further assist the identification of chromosome regions and subsequently genes affecting carcass quality traits in beef cattle. It also revealed many SNP that acted pleiotropically to affect carcass quality. This knowledge is important in selecting subsets of SNP to improve the performance of beef cattle.
doi:10.1186/1471-2156-14-80
PMCID: PMC3827924  PMID: 24024930
Single nucleotide polymorphism; Chromosome regions; Beef carcass quality
17.  Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel 
PLoS Genetics  2014;10(9):e1004573.
Association mapping is a powerful approach for dissecting the genetic architecture of complex quantitative traits using high-density SNP markers in maize. Here, we expanded our association panel size from 368 to 513 inbred lines with 0.5 million high quality SNPs using a two-step data-imputation method which combines identity by descent (IBD) based projection and k-nearest neighbor (KNN) algorithm. Genome-wide association studies (GWAS) were carried out for 17 agronomic traits with a panel of 513 inbred lines applying both mixed linear model (MLM) and a new method, the Anderson-Darling (A-D) test. Ten loci for five traits were identified using the MLM method at the Bonferroni-corrected threshold −log10(P) > 5.74 (α = 1). Between one and 34 loci per trait (107 loci for plant height) were identified for the 17 traits using the A-D test at the Bonferroni-corrected threshold −log10(P) > 7.05 (α = 0.05) using 556809 SNPs. Many known loci and new candidate loci were only observed by the A-D test, a few of which were also detected in independent linkage analysis. This study indicates that combining IBD based projection and KNN algorithm is an efficient imputation method for inferring large missing genotype segments. In addition, we showed that the A-D test is a useful complement for GWAS analysis of complex quantitative traits. Especially for traits with abnormal phenotype distribution, controlled by moderate effect loci or rare variations, the A-D test balances false positives and statistical power. The candidate SNPs and associated genes also provide a rich resource for maize genetics and breeding.
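The two Bonferroni-corrected thresholds quoted above follow from the SNP count by simple arithmetic, which the short sketch below reproduces. It assumes the usual definition, threshold = −log10(α / number of tests); the function name is hypothetical.

```python
import math


def bonferroni_neglog10(alpha, n_tests):
    """-log10 of the Bonferroni-corrected P-value cut-off alpha / n_tests."""
    return -math.log10(alpha / n_tests)


N_SNPS = 556809  # number of SNPs stated in the abstract

mlm_threshold = bonferroni_neglog10(1.0, N_SNPS)    # alpha = 1, about 5.75
ad_threshold = bonferroni_neglog10(0.05, N_SNPS)    # alpha = 0.05, about 7.05
```

With α = 1 the cut-off is simply 1/556809, a more permissive threshold that is about 1.3 units lower on the −log10 scale than the α = 0.05 cut-off.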
Author Summary
Genotype imputation has been used widely in the analysis of genome-wide association studies (GWAS) to boost power and fine-map associations. We developed a two-step data imputation method to meet the challenge of a large proportion of missing genotypes. GWAS have uncovered an extensive genetic architecture of complex quantitative traits using high-density SNP markers in maize in the past few years. Here, GWAS were carried out for 17 agronomic traits with a panel of 513 inbred lines applying both mixed linear model and a new method, the Anderson-Darling (A-D) test. We intend to show that the A-D test is a complement to current GWAS methods, especially for complex quantitative traits controlled by moderate effect loci or rare variations and with abnormal phenotype distribution. In addition, the trait-associated QTL identified here provide a rich resource for maize genetics and breeding.
doi:10.1371/journal.pgen.1004573
PMCID: PMC4161304  PMID: 25211220
18.  FastMap: Fast eQTL mapping in homozygous populations 
Bioinformatics  2008;25(4):482-489.
Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the number of transcripts and markers are on the order of 10^5 and 10^5–10^6, respectively.
Results: We propose a new method, FastMap, for fast and efficient eQTL mapping in homozygous inbred populations with binary allele calls. FastMap exploits the discrete nature and structure of the measured single nucleotide polymorphisms (SNPs). In particular, SNPs are organized into a Hamming distance-based tree that minimizes the number of arithmetic operations required to calculate the association of a SNP by making use of the association of its parent SNP in the tree. FastMap's tree can be used to perform both single marker mapping and haplotype association mapping over an m-SNP window. These performance enhancements also permit permutation-based significance testing.
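The idea of reusing a parent SNP's result can be sketched as follows. This is an illustration of the general principle only, not FastMap's actual data structure or test statistic: it assumes binary allele calls (0/1) per strain and tracks one ingredient of a two-group comparison, the sum of expression values over strains carrying allele 1. All names are hypothetical.

```python
def allele1_sum(expr, snp):
    """Sum of expression over strains with allele 1 (full O(n) pass)."""
    return sum(e for e, g in zip(expr, snp) if g == 1)


def edge_flips(parent, child):
    """Strains whose allele differs between parent and child SNP.

    The number of entries equals the Hamming distance. Each entry is
    (strain_index, +1 or -1): +1 if the strain joins the allele-1 group
    in the child, -1 if it leaves it. Computed once per tree edge and
    reused for every transcript.
    """
    return [
        (i, 1 if child[i] == 1 else -1)
        for i in range(len(parent))
        if parent[i] != child[i]
    ]


def child_sum_from_parent(parent_sum, expr, flips):
    """Update the allele-1 sum in O(Hamming distance), not O(n)."""
    return parent_sum + sum(sign * expr[i] for i, sign in flips)


# Hypothetical toy data: four strains, one transcript, two similar SNPs.
expr = [1.0, 2.0, 3.0, 4.0]
parent_snp = [1, 0, 1, 0]
child_snp = [1, 1, 1, 0]          # differs from the parent in one strain

flips = edge_flips(parent_snp, child_snp)
parent_total = allele1_sum(expr, parent_snp)
child_total = child_sum_from_parent(parent_total, expr, flips)
```

When neighbouring SNPs in the tree differ in only a few strains, each update touches only those strains, which is where the saving in arithmetic operations would come from.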
Availability: The FastMap program and source code are available at the website: http://cebc.unc.edu/fastmap86.html
Contact: iir@unc.edu; nobel@email.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btn648
PMCID: PMC2642639  PMID: 19091771
19.  Identification of candidate genes, regions and markers for pre-harvest sprouting resistance in wheat (Triticum aestivum L.) 
BMC Plant Biology  2014;14(1):340.
Background
Pre-harvest sprouting (PHS) of wheat grain leads to a reduction in grain yield and quality. The availability of markers for marker-assisted selection (MAS) of PHS resistance will serve to enhance breeding selection and advancement of lines for cultivar development. The aim of this study was to identify candidate regions and develop molecular markers for PHS resistance in wheat. This was achieved via high density mapping of single nucleotide polymorphism (SNP) markers from an Illumina 90 K Infinium Custom Beadchip in a doubled haploid (DH) population derived from a RL4452/‘AC Domain’ cross and subsequent detection of quantitative trait loci (QTL) for PHS related traits (falling number [FN], germination index [GI] and sprouting index [SI]). SNP marker sequences flanking QTL were used to locate colinear regions in Brachypodium and rice, and identify genic markers associated with PHS resistance that can be utilized for MAS in wheat.
Results
A linkage map spanning 2569.4 cM was constructed with a total of 12,201 SNP, simple sequence repeat (SSR), diversity arrays technology (DArT) and expressed sequence tag (EST) markers. QTL analyses using Multiple Interval Mapping (MIM) identified four QTL for PHS resistance traits on chromosomes 3B, 4A, 7B and 7D. Sequences of SNPs flanking these QTL were subject to a BLASTN search on the International Wheat Genome Sequencing Consortium (IWGSC) database (http://wheat-urgi.versailles.inra.fr/Seq-Repository). Best survey sequence hits were subject to a BLASTN search on Gramene (www.gramene.org) against both Brachypodium and rice databases, and candidate genes and regions for PHS resistance were identified. A total of 18 SNP flanking sequences on chromosomes 3B, 4A, 7B and 7D were converted to KASP markers and validated with matching genotype calls of Infinium SNP data.
Conclusions
Our study identified candidate genes involved in abscisic acid (ABA) and gibberellin (GA) metabolism, and flowering time in four genomic regions of Brachypodium and rice respectively, in addition to 18 KASP markers for PHS resistance in wheat. These markers can be deployed in future genetic studies of PHS resistance and might also be useful in the evaluation of PHS in germplasm and breeding material.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0340-1) contains supplementary material, which is available to authorized users.
doi:10.1186/s12870-014-0340-1
PMCID: PMC4253633  PMID: 25432597
Wheat; Pre-harvest sprouting; Quantitative trait loci; Candidate genes
20.  An association mapping approach to identify favourable alleles for tomato fruit quality breeding 
BMC Plant Biology  2014;14(1):337.
Background
Genome Wide Association Studies (GWAS) have been recently used to dissect complex quantitative traits and identify candidate genes affecting phenotype variation of polygenic traits. In order to map loci controlling variation in tomato marketable and nutritional fruit traits, we used a collection of 96 cultivated genotypes, including Italian, Latin American, and other worldwide-spread landraces and varieties. Phenotyping was carried out by measuring ten quality traits and metabolites in red ripe fruits. In parallel, genotyping was carried out by using the Illumina Infinium SolCAP array, which allows data to be collected from 7,720 single nucleotide polymorphism (SNP) markers.
Results
The Mixed Linear Model used to detect associations between markers and traits revealed population structure and relatedness within our collection, both of which were taken into account in the association analysis. GWAS identified 20 SNPs that were significantly associated with seven out of ten traits considered. In particular, our analysis revealed two markers associated with phenolic compounds, three with ascorbic acid, β-carotene and trans-lycopene, six with titratable acidity, and only one with pH and fresh weight. Co-localization of a group of associated loci with candidate genes/QTLs previously reported in other studies validated the approach. Moreover, 19 putative genes in linkage disequilibrium with markers were found. These genes might be involved in the biosynthetic pathways of the traits analyzed or might be implicated in their transcriptional regulation. Finally, favourable allelic combinations between associated loci were identified that could be pyramided to obtain new improved genotypes.
Conclusions
Our results led to the identification of promising candidate loci controlling fruit quality that, in the future, might be transferred into tomato genotypes by Marker Assisted Selection or genetic engineering, and highlighted that intraspecific variability might be still exploited for enhancing tomato fruit quality.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0337-9) contains supplementary material, which is available to authorized users.
doi:10.1186/s12870-014-0337-9
PMCID: PMC4266912  PMID: 25465385
Candidate genes; Fruit quality; Genome-wide association; Metabolite analysis; Mixed Linear Model; Solanum lycopersicum; SolCAP Infinium array
21.  Validation of associations for female fertility traits in Nordic Holstein, Nordic Red and Jersey dairy cattle 
BMC Genetics  2014;15:8.
Background
The results obtained from genome-wide association studies (GWAS) often show pronounced disagreements. Validation of association studies is therefore desired before marker information is incorporated in selection decisions. A reliable way to confirm a discovered association between genetic markers and phenotypes is to validate the results in different populations. Therefore, the objective of this study was to validate single nucleotide polymorphism (SNP) marker associations to female fertility traits identified in the Nordic Holstein (NH) cattle population in the Nordic Red (NR) and Jersey (JER) cattle breeds. In the present study, we used data from 3,475 NH sires which were genotyped with the BovineSNP50 Beadchip to discover associations between SNP markers and eight female fertility-related traits. The significant SNP markers were then tested in NR and JER cattle.
Results
A total of 4,474 significant associations between SNP markers and eight female fertility traits were detected in NH cattle. These significant associations were then validated in the NR (4,998 sires) and JER (1,225 sires) dairy cattle populations. We were able to validate 836 of the SNPs discovered in NH cattle in the NR population, as well as 686 SNPs in the JER population. 152 SNPs could be confirmed in both the NR and JER populations.
Conclusions
The present study presents strong evidence for association of SNPs with fertility traits across three cattle breeds. We provide strong evidence that SNPs for many fertility traits are concentrated at certain areas on the genome (BTA1, BTA4, BTA7, BTA9, BTA11 and BTA13), and these areas would be highly suitable for further study in order to identify candidate genes for female fertility traits in dairy cattle.
doi:10.1186/1471-2156-15-8
PMCID: PMC3898023  PMID: 24428918
Female fertility; GWAS; Validation
22.  Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L. 
BMC Genomics  2012;13:318.
Background
In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding.
Results
Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on an F8 population consisting of 186 individual plants confirmed that all five of these markers were linked to the R gene. Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program.
Conclusions
We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.
doi:10.1186/1471-2164-13-318
PMCID: PMC3430595  PMID: 22805587
23.  Whole genome association study identifies regions of the bovine genome and biological pathways involved in carcass trait performance in Holstein-Friesian cattle 
BMC Genomics  2014;15(1):837.
Background
Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits; however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation.
Results
Following adjustment for false discovery (q-value < 0.05), 479 quantitative trait loci (QTL) were associated with at least one of the four carcass traits using a single SNP regression approach. Using a Bayesian approach, 46 QTL were associated (posterior probability > 0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500 kb of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway.
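For readers unfamiliar with false discovery adjustment, the sketch below shows one widely used procedure, Benjamini-Hochberg adjusted P-values, applied to a list of single-SNP P-values. It is an assumption that this resembles the study's q-value calculation; the abstract does not name the procedure, and the function name and input values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: BH adjustment is used here as a stand-in for the q-values
    mentioned in the abstract, whose exact definition is not given.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for four SNPs.
raw = [0.01, 0.04, 0.03, 0.5]
adj = bh_adjust(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

In this toy example only the first SNP stays below 0.05 after adjustment, even though three of the four raw P-values were below 0.05.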
Conclusions
A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth such as glucagon and leptin. Several biological pathways, including PPAR signaling, were shown to be involved in various aspects of bovine carcass performance. These core genes and biological processes may form the foundation for further investigation to identify causative mutations involved in each trait. Results reported here support previous findings suggesting conservation of key biological processes involved in growth and metabolism.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-837) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-837
PMCID: PMC4192274  PMID: 25273628
Genome-wide association; Single nucleotide polymorphism; Holstein-Friesian; Carcass; Biological pathways
24.  Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network 
PLoS Genetics  2009;5(8):e1000587.
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. 
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
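As a rough illustration of what a "structured regularization function" over a trait network can look like, the sketch below evaluates a squared-error loss plus a lasso penalty plus a graph-guided fusion penalty that pulls together the coefficients of correlated traits. The exact form and weighting are assumptions (the abstract gives no formula), and every name here is hypothetical.

```python
def gflasso_style_objective(X, Y, B, edges, lam, gamma):
    """Squared loss + lasso + graph-guided fusion penalty.

    X: n x J genotype matrix, Y: n x K trait matrix,
    B: J x K coefficient matrix (all as lists of lists).
    edges: (m, l, r) tuples linking traits m and l with correlation r.
    Assumption: each edge is weighted by |r| and the sign of r decides
    whether coefficients are fused towards equal or opposite values.
    """
    n, J, K = len(X), len(B), len(B[0])
    loss = 0.0
    for i in range(n):
        for k in range(K):
            pred = sum(X[i][j] * B[j][k] for j in range(J))
            loss += (Y[i][k] - pred) ** 2
    lasso = sum(abs(b) for row in B for b in row)
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l]) for j in range(J))
    return loss + lam * lasso + gamma * fusion


# Hypothetical toy problem: one SNP, two perfectly correlated traits.
X = [[1.0], [0.0]]
Y = [[1.0, 1.0], [0.0, 0.0]]
trait_edges = [(0, 1, 1.0)]

shared = gflasso_style_objective(X, Y, [[1.0, 1.0]], trait_edges, 0.5, 2.0)
single = gflasso_style_objective(X, Y, [[1.0, 0.0]], trait_edges, 0.5, 2.0)
```

In this toy case the solution where the SNP affects both correlated traits scores lower (better) than the one where it affects only a single trait, which is the behaviour the abstract describes as borrowing information across correlated phenotypes.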
Author Summary
An association study examines a phenotype against genotypic variations over a large set of individuals in order to find the genetic variant that gives rise to the variation in the phenotype. Many complex disease syndromes consist of a large number of highly related clinical phenotypes, and the patient cohorts are routinely surveyed with a large number of traits, such as hundreds of clinical phenotypes and genome-wide profiling of thousands of gene expressions, many of which are correlated. However, most of the conventional approaches for association mapping or eQTL analysis consider a single phenotype at a time instead of taking advantage of the relatedness of traits by analyzing them jointly. Assuming that a group of tightly correlated traits may share a common genetic basis, in this paper, we present a new framework for association analysis that searches for genetic variations influencing a group of correlated traits. We explicitly represent the correlation information in multiple quantitative traits as a quantitative trait network and directly incorporate this network information to scan the genome for association. Our results on simulated and asthma data show that our approach has a significant advantage in detecting associations when a genetic marker perturbs synergistically a group of traits.
doi:10.1371/journal.pgen.1000587
PMCID: PMC2719086  PMID: 19680538
25.  Unraveling the associations of osteoprotegerin gene with production traits in a paternal broiler line 
SpringerPlus  2014;3:682.
Improvements in growth and carcass traits in the poultry industry have been achieved by intense selection for heavier chickens at early ages. This faster growth has caused serious problems because skeletal development has been insufficient to support the musculature of modern broilers. The osteoprotegerin gene (OPG), located on GGA2, is an important regulator of bone metabolism and resorption, making it a possible functional candidate gene associated with bone integrity in chickens. This study reports associations of a single nucleotide polymorphism (SNP) in the OPG gene with production traits in a parental broiler line. Different phenotypic groups were evaluated: performance, carcass and skeletal traits. SNPs were identified within the OPG gene and the most informative SNP g.9144C > G was chosen for association analyses. Chickens (n = 1230) were genotyped using PCR-RFLP. The association was carried out with QxPaK v4.0 software using a mixed model including sex, hatch and SNP as fixed effects, and the infinitesimal and residual as random effects. The OPG SNP was associated with important traits such as body weight at 21 days, weights of tibia and drumstick skin, leg muscle yield, and tibia breaking strength (P < 0.05). Associations were explained by the additive effect of the SNP and the additive effect within sex. This SNP could be considered a potential marker to improve bone resistance in chickens; however, caution should be taken because of its negative effect on other important traits evaluated in this study. Furthermore, these findings suggest a possible involvement of the OPG gene in fat deposition in poultry.
Electronic supplementary material
The online version of this article (doi:10.1186/2193-1801-3-682) contains supplementary material, which is available to authorized users.
doi:10.1186/2193-1801-3-682
PMCID: PMC4247828  PMID: 25520909
Bone metabolism; Bone resistance; Chicken; Fat deposition; TNFRSF11B
