Related Articles

1.  Adjustment for local ancestry in genetic association analysis of admixed populations 
Bioinformatics  2010;27(5):670-677.
Motivation: Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus.
Methods and results: We show that local ancestry at a test single nucleotide polymorphism (SNP) may confound the association signal and that ignoring it can lead to spurious association. We demonstrate theoretically that adjustment for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether due to local or global ancestry differences among study subjects; however, global ancestry adjustment procedures may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework that models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test lies in its ability to incorporate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression with adjustment for local ancestry proportion. We conducted extensive simulations and found that the Type I error rates of our tests are under control; however, the global adjustment procedures yielded inflated Type I error rates when stratification is due to local ancestry differences.
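The second, regression-based test can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that local ancestry at the test SNP is coded as the number of alleles (0, 1 or 2) inherited from one ancestral population. It fits `status ~ genotype + local_ancestry` by Newton-Raphson in plain Python and reports a Wald test for the genotype term. All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def score_and_info(beta, rows, y):
    """Score vector and Fisher information of the logistic log-likelihood."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for x, yi in zip(rows, y):
        eta = sum(b * xj for b, xj in zip(beta, x))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * x[j]
            for k in range(p):
                info[j][k] += w * x[j] * x[k]
    return grad, info


def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson (rows include the intercept)."""
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = score_and_info(beta, rows, y)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def local_ancestry_adjusted_test(genotypes, local_ancestry, status):
    """Wald test of the SNP effect in status ~ genotype + local ancestry (hypothetical helper)."""
    rows = [[1.0, float(g), float(a)] for g, a in zip(genotypes, local_ancestry)]
    beta = fit_logistic(rows, status)
    _, info = score_and_info(beta, rows, status)
    var_snp = solve(info, [0.0, 1.0, 0.0])[1]  # diagonal element of the inverse information
    z = beta[1] / math.sqrt(var_snp)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta[1], z, p_value
```

The design choice is the one the abstract argues for: the local ancestry covariate sits at the test SNP itself rather than being a genome-wide ancestry proportion.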
Contact: mingyao@upenn.edu; chun.li@vanderbilt.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq709
PMCID: PMC3042179  PMID: 21169375
2.  Enriching targeted sequencing experiments for rare disease alleles 
Bioinformatics  2011;27(15):2112-2118.
Motivation: Next-generation targeted resequencing of genome-wide association study (GWAS)-associated genomic regions is a common approach for follow-up of indirect association of common alleles. However, it is prohibitively expensive to sequence all the samples from a well-powered GWAS study with sufficient depth of coverage to accurately call rare genotypes. As a result, many studies may use next-generation sequencing for single nucleotide polymorphism (SNP) discovery in a smaller number of samples, with the intent to genotype candidate SNPs with rare alleles captured by resequencing. This approach is reasonable, but may be inefficient for rare alleles if samples are not carefully selected for the resequencing experiment.
Results: We have developed a probability-based approach, SampleSeq, for selecting samples for a targeted resequencing experiment. It substantially increases the yield of rare disease alleles over random sampling of cases or controls, or over sampling based on genotypes at associated SNPs from GWAS data. This technique allows for smaller sample sizes in resequencing experiments, or allows the capture of rarer risk alleles. When following up multiple regions, SampleSeq selects subjects with an even representation of all the regions. SampleSeq can also be used to calculate the sample size needed for resequencing to increase the chance of successfully capturing rare alleles of desired frequencies.
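The sample-size question has a simple baseline under random sampling. With allele frequency f and n diploid samples, the chance of capturing at least one copy among 2n chromosomes is 1 − (1 − f)^(2n). The sketch below computes that baseline only. It is not SampleSeq's probability model, which aims to beat this baseline by preferentially selecting samples likely to carry rare alleles. Hardy-Weinberg proportions are assumed, and the function names are hypothetical.

```python
import math


def capture_probability(allele_freq, n_samples):
    """P(at least one copy of an allele at frequency allele_freq appears among
    2 * n_samples chromosomes), assuming random sampling and HWE."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_samples)


def samples_needed(allele_freq, target=0.95):
    """Smallest number of randomly chosen diploid samples whose capture
    probability reaches `target`."""
    if not (0.0 < allele_freq < 1.0 and 0.0 < target < 1.0):
        raise ValueError("allele_freq and target must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - target) / (2.0 * math.log(1.0 - allele_freq)))
    # Nudge for floating-point error at the boundary.
    while capture_probability(allele_freq, n) < target:
        n += 1
    while n > 1 and capture_probability(allele_freq, n - 1) >= target:
        n -= 1
    return max(n, 1)
```

For example, `samples_needed(0.01)` gives 150 random samples for a 95% chance of seeing a 1% allele. Enriching the resequenced set for carriers raises the effective frequency and shrinks that number.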
Software: http://biostat.mc.vanderbilt.edu/SampleSeq
Contact: chun.li@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr324
PMCID: PMC3137214  PMID: 21700677
3.  dmGWAS: dense module searching for genome-wide association studies in protein–protein interaction networks 
Bioinformatics  2010;27(1):95-102.
Motivation: An important question that has emerged from the recent success of genome-wide association studies (GWAS) is how to detect genetic signals beyond single markers/genes in order to explore their combined effects on mediating complex diseases and traits. Integrative testing of GWAS association data with that from prior-knowledge databases and proteome studies has recently gained attention. These methodologies may hold promise for comprehensively examining the interactions between genes underlying the pathogenesis of complex diseases.
Methods: Here, we present a dense module searching (DMS) method to identify candidate subnetworks or genes for complex diseases by integrating the association signal from GWAS datasets into the human protein–protein interaction (PPI) network. The DMS method extensively searches for subnetworks enriched with low P-value genes in GWAS datasets. Compared with pathway-based approaches, this method introduces flexibility in defining a gene set and can effectively utilize local PPI information.
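A simplified greedy version of dense module searching is sketched below. It follows the general idea described above: grow a subnetwork from a seed gene while the combined GWAS signal keeps increasing. The following choices are all assumptions of this sketch: the module score Z_m = Σz_i/√k, the conversion z_i = Φ⁻¹(1 − p_i), the improvement factor r = 0.1, and the restriction to direct neighbours. The dmGWAS package documentation is the reference for its exact rules. Function names are hypothetical.

```python
from statistics import NormalDist


def gene_z(p_value):
    """Convert a gene-level GWAS P-value to a z-score, z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p_value)


def module_score(module, z):
    """Aggregate score of a module: sum of member z-scores divided by sqrt(size)."""
    return sum(z[g] for g in module) / len(module) ** 0.5


def grow_module(seed, graph, z, r=0.1):
    """Greedily add the best-scoring neighbouring gene while the module score
    improves by more than a factor (1 + r). graph: {gene: [neighbours]}."""
    module = {seed}
    score = module_score(module, z)
    while True:
        frontier = sorted({n for g in module for n in graph.get(g, ()) if n not in module})
        if not frontier:
            return module, score
        best = max(frontier, key=lambda n: module_score(module | {n}, z))
        best_score = module_score(module | {best}, z)
        if best_score <= score * (1.0 + r):
            return module, score
        module.add(best)
        score = best_score
```

Running `grow_module` from every gene and ranking the resulting modules by score is one way to turn this into a search over the whole PPI network.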
Results: We implemented the DMS method in an R package, which can also evaluate and graphically represent the results. We demonstrated DMS in two GWAS datasets for complex diseases, i.e. breast cancer and pancreatic cancer. For each disease, the DMS method successfully identified a set of significant modules and candidate genes, including some well-studied genes not detected in the single-marker analysis of GWA studies. Functional enrichment analysis and comparison with previously published methods showed that the genes we identified by DMS have stronger association signals.
Availability: dmGWAS package and documents are available at http://bioinfo.mc.vanderbilt.edu/dmGWAS.html.
Contact: zhongming.zhao@vanderbilt.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq615
PMCID: PMC3008643  PMID: 21045073
4.  Correcting population stratification in genetic association studies using a phylogenetic approach 
Bioinformatics  2010;26(6):798-806.
Motivation: The rapid development of genotyping technology and extensive cataloguing of single nucleotide polymorphisms (SNPs) across the human genome have made genetic association studies the mainstream approach for gene mapping of complex human diseases. For many diseases, the most practical approach is the population-based design with unrelated individuals. Although this design offers easier sample collection and greater power than family-based designs, unrecognized population stratification in the study samples can lead to both false-positive and false-negative findings and might obscure true association signals if not appropriately corrected.
Methods: We report PHYLOSTRAT, a new method that corrects for population stratification by combining phylogeny constructed from SNP genotypes and principal coordinates from multi-dimensional scaling (MDS) analysis. This hybrid approach efficiently captures both discrete and admixed population structures.
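The MDS half of the hybrid approach can be illustrated with classical (Torgerson-Gower) multi-dimensional scaling. The sketch double-centres a squared distance matrix and extracts the leading principal coordinate by power iteration. Several parts are assumptions for illustration: the allele-sharing distance, the use of a single coordinate, and the function names. The phylogenetic-tree component of PHYLOSTRAT is not shown.

```python
import math


def allele_sharing_distance(g1, g2):
    """Distance between two individuals' 0/1/2 genotype vectors:
    mean absolute allele-count difference, scaled to [0, 1]."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def first_principal_coordinate(genotypes, iters=500):
    """Classical MDS: double-centre the squared distance matrix and return the
    leading principal coordinate via power iteration. Power iteration finds the
    eigenvalue of largest magnitude, which is assumed here to be positive."""
    n = len(genotypes)
    d2 = [[allele_sharing_distance(genotypes[i], genotypes[j]) ** 2 for j in range(n)]
          for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J D^2 J, written element-wise (D^2 is symmetric).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    if v[0] < 0:  # fix the arbitrary sign of the eigenvector
        v = [-x for x in v]
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]
```

Coordinates like these can enter the association model as covariates, which is how MDS captures admixed (continuous) structure that a discrete tree grouping would miss.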
Results: By extensive simulations, the analysis of a synthetic genome-wide association dataset created using data from the Human Genome Diversity Project, and the analysis of a lactase-height dataset, we show that our method can correct for population stratification more efficiently than several existing population stratification correction methods, including EIGENSTRAT, a hybrid approach based on MDS and clustering, and STRATSCORE, in terms of requiring fewer random SNPs for inference of population structure. By combining the flexibility and hierarchical nature of phylogenetic trees with the advantage of representing admixture using MDS, our hybrid approach can capture the complex population structures in human populations effectively.
Software Availability: Code can be downloaded from http://people.pcbi.upenn.edu/~lswang/phylostrat/
Contact: mingyao@upenn.edu; iswang@upenn.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq025
PMCID: PMC2832820  PMID: 20097913
5.  Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits 
PLoS Genetics  2007;3(7):e114.
We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate (“impute”) unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
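A common frequentist simplification of the "impute, then test" idea replaces each untyped genotype with its expected allele count (dosage) and regresses the quantitative phenotype on it. The sketch below shows that simplification only. It is not the Bayes-factor machinery implemented in BIMBAM. It assumes posterior genotype probabilities are already available from an imputation step, and its function names are hypothetical.

```python
import math


def expected_dosage(probs):
    """Expected count of the coded allele from (P(0), P(1), P(2)) posterior probabilities."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2


def dosage_regression(genotype_probs, phenotype):
    """Least-squares slope of a quantitative phenotype on imputed dosage,
    with its standard error."""
    x = [expected_dosage(p) for p in genotype_probs]
    n = len(x)
    mx, my = sum(x) / n, sum(phenotype) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, phenotype))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se
```

Using the dosage rather than the single most likely genotype keeps imputation uncertainty in the predictor instead of discarding it.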
Author Summary
Ongoing association studies are evaluating the influence of genetic variation on phenotypes of interest (hereditary traits and susceptibility to disease) in large patient samples. However, although genotyping is relatively cheap, most association studies genotype only a small proportion of SNPs in the region of study, with many SNPs remaining untyped. Here, we present methods for assessing whether these untyped SNPs are associated with the phenotype of interest. The methods exploit information on patterns of multi-marker correlation ("linkage disequilibrium") from publicly available databases, such as the International HapMap project or the SeattleSNPs resequencing studies, to estimate ("impute") patient genotypes at untyped SNPs, and assess the estimated genotypes for association with phenotype. We show that, particularly for common causal variants, these methods are highly effective. Compared with standard methods, they provide both greater power to detect associations between genetic variation and phenotypes and better explanations of detected associations, in many cases closely approximating results that would have been obtained by genotyping all SNPs.
doi:10.1371/journal.pgen.0030114
PMCID: PMC1934390  PMID: 17676998
6.  A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases–schizophrenia as a case 
Bioinformatics  2009;25(19):2595-2602.
Motivation: During the past decade, the amount of genetic data generated for complex disease studies has grown exponentially. Currently, across a variety of complex biological problems, there is a strong trend towards integrating data from multiple sources. So far, candidate gene prioritization approaches have been designed for specific purposes, using only some of the available sources of genetic studies or a simple weighting scheme. For psychiatric disorders specifically, no prioritization approach has yet fully utilized all major sources of experimental data.
Results: Here we present a multi-dimensional evidence-based candidate gene prioritization approach for complex diseases and demonstrate it in schizophrenia. In this approach, we first collect and curate genetic studies for schizophrenia from four major categories: association studies, linkage analyses, gene expression and literature search. Genes in these data sets are initially scored by category-specific scoring methods. Then, an optimal weight matrix is searched for by a two-step procedure (core genes and unbiased P-values in independent genome-wide association studies). Finally, genes are prioritized by their combined scores using the optimal weight matrix. Our evaluation suggests this approach generates prioritized candidate genes that are promising for further analysis or replication. The approach can be applied to other complex diseases.
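The combine-and-rank step can be sketched as a weighted sum of category-specific scores. The weights are chosen by a small grid search that maximizes how many known "core genes" land in the top k. Several elements are simplifying assumptions, not the published two-step procedure: one weight per category instead of a full weight matrix, the grid, the objective and the function names.

```python
from itertools import product

CATEGORIES = ("association", "linkage", "expression", "literature")


def combined_scores(category_scores, weights):
    """Rank genes by the weighted sum of their category-specific scores.

    category_scores: {category: {gene: score}}; weights: {category: weight}.
    Ties are broken alphabetically so the ranking is deterministic."""
    total = {}
    for category, weight in weights.items():
        for gene, score in category_scores.get(category, {}).items():
            total[gene] = total.get(gene, 0.0) + weight * score
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))


def search_weights(category_scores, core_genes, top_k, grid=(0.0, 0.5, 1.0)):
    """Grid-search category weights that place the most core genes in the top_k."""
    best_weights, best_hits = None, -1
    for combo in product(grid, repeat=len(CATEGORIES)):
        if not any(combo):
            continue  # skip the all-zero weighting
        weights = dict(zip(CATEGORIES, combo))
        top = [gene for gene, _ in combined_scores(category_scores, weights)[:top_k]]
        hits = len(core_genes.intersection(top))
        if hits > best_hits:
            best_weights, best_hits = weights, hits
    return best_weights, best_hits
```

In practice, the chosen weights should then be checked against an independent dataset, which is the role the abstract assigns to unbiased P-values from separate GWAS.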
Availability: The collected data, prioritized candidate genes, and gene prioritization tools are freely available at http://bioinfo.mc.vanderbilt.edu/SZGR/.
Contact: zhongming.zhao@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp428
PMCID: PMC2752609  PMID: 19602527
7.  An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies 
Bioinformatics  2011;27(5):686-692.
Motivation: In genome-wide association studies (GWAS) of complex diseases, genetic variants having real but weak associations often fail to be detected at the stringent genome-wide significance level. Pathway analysis, which tests disease association with combined association signals from a group of variants in the same pathway, has become increasingly popular. However, because of the complexities in genetic data and the large sample sizes in typical GWAS, pathway analysis remains challenging. We propose a new statistical model for pathway analysis of GWAS. This model includes a fixed effects component that models mean disease association for a group of genes, and a random effects component that models how each gene's association with disease varies about the gene group mean; it thus belongs to the class of mixed effects models.
Results: The proposed model is computationally efficient and uses only summary statistics. In addition, it corrects for the presence of overlapping genes and linkage disequilibrium (LD). Using simulated and real GWAS data, we showed that our model improved power over currently available pathway analysis methods while preserving the type I error rate. Furthermore, using the WTCCC Type 1 Diabetes (T1D) dataset, we demonstrated that mixed model analysis identified meaningful biological processes that agreed well with previous reports on T1D. Therefore, the proposed methodology provides an efficient statistical modeling framework for systems analysis of GWAS.
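The fixed-plus-random structure can be illustrated with the simplest method-of-moments analogue, a DerSimonian-Laird random-effects estimate over gene-level statistics. Each gene's statistic is modelled as the gene-set mean (fixed part), plus a gene-specific deviation with variance tau² (random part), plus sampling noise. This is a swapped-in technique for illustration, not the paper's hierarchical generalized linear mixed model. It also ignores the overlapping-gene and LD corrections the paper applies. The function name is hypothetical.

```python
import math


def random_effects_mean(stats, variances=None):
    """DerSimonian-Laird estimate of the mean gene-level statistic (fixed part)
    and the between-gene variance tau^2 (random part) for one gene set.

    stats: gene-level association statistics (e.g. z-scores);
    variances: their sampling variances (1.0 for z-scores if omitted)."""
    k = len(stats)
    if variances is None:
        variances = [1.0] * k
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mean_fixed = sum(wi * y for wi, y in zip(w, stats)) / sum_w
    q = sum(wi * (y - mean_fixed) ** 2 for wi, y in zip(w, stats))  # Cochran's Q
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w_star, stats)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2
```

A z-test of `mu / se` then asks whether the gene set's average association departs from zero, while `tau2` summarizes how heterogeneous its members are.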
Availability: The software code for mixed models analysis is freely available at http://biostat.mc.vanderbilt.edu/LilyWang.
Contact: lily.wang@vanderbilt.edu; zhongming.zhao@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq728
PMCID: PMC3042187  PMID: 21266443
8.  Gene-based interaction analysis by incorporating external linkage disequilibrium information 
Gene–gene interactions have an important role in complex human diseases. Detection of gene–gene interactions has long been a challenge due to their complexity. The standard method aiming at detecting SNP–SNP interactions may be inadequate as it does not model linkage disequilibrium (LD) among SNPs in each gene and may lose power due to a large number of comparisons. To improve power, we propose a principal component (PC)-based framework for gene-based interaction analysis. We analytically derive the optimal weight for both quantitative and binary traits based on pairwise LD information. We then use PCs to summarize the information in each gene and test for interactions between the PCs. We further extend this gene-based interaction analysis procedure to allow the use of imputation dosage scores obtained from a popular imputation software package, MACH, which incorporates multilocus LD information. To evaluate the performance of the gene-based interaction tests, we conducted extensive simulations under various settings. We demonstrate that gene-based interaction tests are more powerful than SNP-based tests when more than two variants interact with each other; moreover, tests that incorporate external LD information are generally more powerful than those that use genotyped markers only. We also apply the proposed gene-based interaction tests to a candidate gene study on high-density lipoprotein. As our method operates at the gene level, it can be applied to a genome-wide association setting and used as a screening tool to detect gene–gene interactions.
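The core of a PC-based gene-gene interaction test can be sketched in two steps. First, summarize each gene's SNPs by their first principal component. Second, fit `y ~ PC_a + PC_b + PC_a × PC_b` and test the product term. This sketch uses an unweighted covariance PCA and ordinary least squares for a quantitative trait. It does not include the optimal LD-based weighting or the MACH dosage extension described above, and all function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def first_pc_scores(genotypes, iters=300):
    """Scores on the first principal component of one gene's SNP matrix.

    genotypes: one row per individual, one 0/1/2 column per SNP in the gene."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    cov = [[sum(r[j] * r[k] for r in centred) / (n - 1) for k in range(m)] for j in range(m)]
    v = [1.0 + 0.1 * j for j in range(m)]  # generic start vector for power iteration
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # cov @ v vanished, e.g. a monomorphic gene
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(m)) for r in centred]


def pc_interaction_test(pc_a, pc_b, y):
    """OLS fit of y ~ 1 + pc_a + pc_b + pc_a*pc_b; returns the coefficients
    and the t statistic of the interaction term."""
    rows = [[1.0, a, b, a * b] for a, b in zip(pc_a, pc_b)]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = _solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    var_int = sigma2 * _solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]
    t = beta[3] / math.sqrt(var_int) if var_int > 0 else math.copysign(math.inf, beta[3])
    return beta, t
```

Testing one product term per gene pair, instead of one per SNP pair, is where the reduction in multiple comparisons comes from.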
doi:10.1038/ejhg.2010.164
PMCID: PMC3025792  PMID: 20924406
gene–gene interaction; linkage disequilibrium; imputation
9.  DNA polymorphisms and haplotype patterns of transcription factors involved in barley endosperm development are associated with key agronomic traits 
BMC Plant Biology  2010;10:5.
Background
Association mapping is receiving considerable attention in plant genetics for its potential to fine map quantitative trait loci (QTL), validate candidate genes, and identify alleles of interest. In the present study, association mapping in barley (Hordeum vulgare L.) is investigated by associating DNA polymorphisms with variation in grain quality traits, plant height, and flowering time to gain further understanding of gene functions involved in the control of these traits. We focused on the four loci BLZ1, BLZ2, BPBF and HvGAMYB that play a role in the regulation of B-hordein expression, the major fraction of the barley storage protein. The association was tested in a collection of 224 spring barley accessions using a two-stage mixed model approach.
Results
Within the sequenced fragments of four candidate genes we observed different levels of nucleotide diversity. The effect of selection on the candidate genes was tested by Tajima's D, which revealed significant values for BLZ1, BLZ2, and BPBF in the subset of two-rowed barleys. Pair-wise LD estimates between the detected SNPs within each candidate gene revealed different intra-genic linkage patterns. On the basis of a more extensive examination of genomic regions surrounding the four candidate genes, we found a sharp decrease of LD (r² < 0.2 within 1 cM) in all but one of the flanking regions.
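Tajima's D compares two estimators of nucleotide diversity: the mean number of pairwise differences (π) and Watterson's estimate based on the number of segregating sites (S). A departure from zero is then read as a signal of selection or demographic change. The sketch implements the standard formula. It assumes equal-length aligned sequences without gaps or missing data, and its function names are hypothetical.

```python
import math
from itertools import combinations


def diversity_stats(sequences):
    """Sample size n, number of segregating sites S and mean pairwise
    differences pi for equal-length aligned sequences (no gap handling)."""
    n = len(sequences)
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(sequences, 2)]
    return n, s, sum(diffs) / len(diffs)


def tajimas_d(n, s, pi):
    """Tajima's D = (pi - S/a1) / sqrt(e1*S + e2*S*(S-1))."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences (its variance terms vanish below that)")
    if s == 0:
        return None  # undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Positive values indicate an excess of intermediate-frequency variants relative to neutral expectation, and negative values indicate an excess of rare variants. Significance is usually judged against simulated or tabulated null distributions.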
Significant marker-trait associations were detected between SNP sites within BLZ1 and flowering time, and between SNP sites within BPBF and both crude protein content and starch content. Most haplotypes occurred at frequencies <0.05 and were therefore excluded from the association analysis. Based on haplotype information, BPBF was associated with crude protein content and starch content, BLZ2 showed association with thousand-grain weight, and BLZ1 was found to be associated with flowering time and plant height.
Conclusions
Differences in nucleotide diversity and LD pattern within the candidate genes BLZ1, BLZ2, BPBF, and HvGAMYB reflect the impact of selection on the nucleotide sequence of the four candidate loci.
Despite significant associations, the analysed candidate genes only explained a minor part of the total genetic variation, although they are known to be important factors influencing the expression of seed quality traits. Therefore, we assume that grain quality as well as plant height and flowering time are influenced by many factors, each contributing a small part to the expression of the phenotype. A genome-wide association analysis could provide a more comprehensive picture of loci involved in the regulation of grain quality, thousand-grain weight and the other agronomic traits that were analyzed in this study. However, despite available high-throughput genotyping arrays, the marker density along the barley genome is still insufficient to cover all associations in a whole genome scan. Therefore, the candidate gene-based approach will continue to play an important role in barley association studies.
doi:10.1186/1471-2229-10-5
PMCID: PMC2822787  PMID: 20064201
10.  Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L. 
BMC Genomics  2012;13:318.
Background
In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding.
Results
Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. All RAD sequencing products were resolved in two lanes of the 16-lane-per-run Solexa HiSeq2000 sequencing platform. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtering the DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on an F8 population consisting of 186 individual plants confirmed that all five markers were linked to the R gene. Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program.
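Expressing a marker-gene distance in centiMorgans requires converting an observed recombination frequency with a mapping function. The sketch shows the Haldane and Kosambi functions. It also includes the standard relation for selfed recombinant inbred lines (such as an F8 population), where the fraction of recombinant lines R relates to the per-meiosis recombination fraction r by R = 2r/(1 + 2r). The abstract does not say which conversion the authors used, so this is illustrative only, and the function names are hypothetical.

```python
import math


def ril_to_meiotic_r(recombinant_line_fraction):
    """Per-meiosis recombination fraction r from the fraction R of recombinant
    selfed RILs, inverting R = 2r / (1 + 2r) to r = R / (2 * (1 - R))."""
    big_r = recombinant_line_fraction
    if not 0.0 <= big_r < 0.5:
        raise ValueError("recombinant line fraction must be in [0, 0.5)")
    return big_r / (2.0 * (1.0 - big_r))


def haldane_cm(r):
    """Haldane map distance in cM: d = -50 ln(1 - 2r) (no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM: d = 25 ln((1 + 2r) / (1 - 2r)) (allows interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

For tightly linked markers, both functions give roughly 100 × r cM. They diverge only as r grows, with Kosambi yielding the shorter distance.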
Conclusions
We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.
doi:10.1186/1471-2164-13-318
PMCID: PMC3430595  PMID: 22805587
11.  Genome wide association studies for body conformation traits in the Chinese Holstein cattle population 
BMC Genomics  2013;14:897.
Background
Genome-wide association studies (GWAS) are a powerful tool for revealing the genetic basis of quantitative traits. However, comparatively few studies have used GWAS for conformation traits in cattle. This study aims to use GWAS to find candidate genes for body conformation traits.
Results
The Illumina BovineSNP50 BeadChip was used to identify single nucleotide polymorphisms (SNPs) that are associated with body conformation traits. The least absolute shrinkage and selection operator (LASSO) was applied to detect multiple SNPs simultaneously for 29 body conformation traits with 1,314 Chinese Holstein cattle and 52,166 SNPs. In total, 59 genome-wide significant SNPs associated with 26 conformation traits were detected by genome-wide association analysis; five SNPs were within previously reported QTL regions (Animal Quantitative Trait Loci (QTL) database) and 11 were very close to the reported SNPs. Twenty-two SNPs were located within annotated gene regions, while the remainder were 0.6–826 kb away from known genes. Some of the genes had clear biological functions related to conformation traits. By combining information about the previously reported QTL regions and the biological functions of the genes, we identified DARC, GAS1, MTPN, HTR2A, ZNF521, PDIA6, and TMEM130 as the most promising candidate genes for capacity and body depth, chest width, foot angle, angularity, rear leg side view, teat length, and animal size traits, respectively. We also found four SNPs that affected four pairs of traits, and the genetic correlation between each pair of traits ranged from 0.35 to 0.86, suggesting that these SNPs may have a pleiotropic effect on each pair of traits.
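LASSO selects SNPs jointly by shrinking most coefficients exactly to zero. A plain cyclic coordinate-descent version is sketched below on a small in-memory genotype matrix. Analyses at the scale described above would normally use optimized software such as glmnet. The penalty `lam` is arbitrary here and would usually be chosen by cross-validation. The function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, max_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/2n)||y - b0 - X b||^2 + lam * ||b||_1.

    X: one row per animal, one 0/1/2 column per SNP. Columns and response
    are centred so the intercept is not penalised."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual of the centred model with beta = 0
    col_ss = [sum(row[j] ** 2 for row in xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:  # monomorphic SNP: coefficient stays 0
                continue
            # Correlation of SNP j with the partial residual (SNP j's own effect added back).
            rho = sum(xc[i][j] * (resid[i] + xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xc[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta
```

SNPs whose coefficients survive the penalty are the candidates carried forward, which is how many SNPs can be screened simultaneously in one model.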
Conclusions
A total of 59 significant SNPs associated with 26 conformation traits were identified in the Chinese Holstein population. Six promising candidate genes were suggested, and four SNPs showed genetic correlation for four pairs of traits.
doi:10.1186/1471-2164-14-897
PMCID: PMC3879203  PMID: 24341352
Dairy cattle; GWAS; Body conformation traits; SNP; Holstein; QTL
12.  Whole genome association study identifies regions of the bovine genome and biological pathways involved in carcass trait performance in Holstein-Friesian cattle 
BMC Genomics  2014;15(1):837.
Background
Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits; however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation.
Results
Following adjustment for false discovery (q-value < 0.05), 479 quantitative trait loci (QTL) were associated with at least one of the four carcass traits using a single SNP regression approach. Using a Bayesian approach, 46 QTL were associated (posterior probability > 0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500 kb of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway.
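The two bookkeeping steps above are false-discovery adjustment of single-SNP P-values and collecting genes within 500 kb of each QTL. Both can be sketched briefly. The first function computes Benjamini-Hochberg adjusted P-values. These equal Storey q-values when the proportion of true nulls is fixed at 1, so they are a conservative stand-in for the q-value procedure, which estimates that proportion from the data. Gene coordinates in any usage are hypothetical, as are the function names.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values (equal to Storey q-values with pi0 = 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def genes_within_window(qtl, genes, window_bp=500_000):
    """Names of genes whose span lies within window_bp of a QTL position.

    qtl: (chromosome, position); genes: {name: (chromosome, start, end)}."""
    chrom, pos = qtl
    return sorted(
        name
        for name, (c, start, end) in genes.items()
        if c == chrom and start - window_bp <= pos <= end + window_bp
    )
```

The union of genes found around all QTL for a trait is the input a pathway over-representation test would then work from.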
Conclusions
A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth such as glucagon and leptin. Several biological pathways, including PPAR signaling, were shown to be involved in various aspects of bovine carcass performance. These core genes and biological processes may form the foundation for further investigation to identify causative mutations involved in each trait. Results reported here support previous findings suggesting conservation of key biological processes involved in growth and metabolism.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-837) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-837
PMCID: PMC4192274  PMID: 25273628
Genome-wide association; Single nucleotide polymorphism; Holstein-Friesian; Carcass; Biological pathways
13.  Genetic Predisposition to Increased Blood Cholesterol and Triglyceride Lipid Levels and Risk of Alzheimer Disease: A Mendelian Randomization Analysis 
PLoS Medicine  2014;11(9):e1001713.
In this study, Proitsi and colleagues use a Mendelian randomization approach to dissect the causal nature of the association between circulating lipid levels and late onset Alzheimer's Disease (LOAD) and find that genetic predisposition to increased plasma cholesterol and triglyceride lipid levels is not associated with elevated LOAD risk.
Please see later in the article for the Editors' Summary
Background
Although altered lipid metabolism has been extensively implicated in the pathogenesis of Alzheimer disease (AD) through cell biological, epidemiological, and genetic studies, the molecular mechanisms linking cholesterol and AD pathology are still not well understood and contradictory results have been reported. We have used a Mendelian randomization approach to dissect the causal nature of the association between circulating lipid levels and late onset AD (LOAD) and test the hypothesis that genetically raised lipid levels increase the risk of LOAD.
Methods and Findings
We included 3,914 patients with LOAD, 1,675 older individuals without LOAD, and 4,989 individuals from the general population from six genome-wide studies drawn from a white population (total n = 10,578). We constructed weighted genotype risk scores (GRSs) for four blood lipid phenotypes (high-density lipoprotein cholesterol [HDL-c], low-density lipoprotein cholesterol [LDL-c], triglycerides, and total cholesterol) using well-established SNPs in 157 loci for blood lipids reported by Willer and colleagues (2013). We developed both full GRSs, using all SNPs associated with each trait at p<5×10⁻⁸, and trait specific scores, using SNPs associated exclusively with each trait at p<5×10⁻⁸. We used logistic regression to investigate whether the GRSs were associated with LOAD in each study, and results were combined by meta-analysis. We found no association between any of the full GRSs and LOAD (meta-analysis results: odds ratio [OR] = 1.005, 95% CI 0.82–1.24, p = 0.962 per 1 unit increase in HDL-c; OR = 0.901, 95% CI 0.65–1.25, p = 0.530 per 1 unit increase in LDL-c; OR = 1.104, 95% CI 0.89–1.37, p = 0.362 per 1 unit increase in triglycerides; and OR = 0.954, 95% CI 0.76–1.21, p = 0.688 per 1 unit increase in total cholesterol). Results for the trait specific scores were similar; however, the trait specific scores explained a much smaller proportion of the phenotypic variance.
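The two computational steps described here are building a weighted genotype risk score per person and pooling per-study odds ratios. They can be sketched as below. The abstract does not state which meta-analysis model was used, so inverse-variance fixed-effect pooling is an assumption. Standard errors recovered from published, rounded confidence intervals are only approximate. Function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def weighted_grs(dosages, weights):
    """Weighted genotype risk score: sum over SNPs of
    (effect-allele count x per-allele effect on the lipid trait)."""
    return sum(d * w for d, w in zip(dosages, weights))


def se_from_ci(lower, upper):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z95)


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios.
    Returns the pooled OR, its 95% CI and a two-sided P-value."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p_value = math.erfc(abs(pooled / se) / math.sqrt(2.0))
    ci = (math.exp(pooled - Z95 * se), math.exp(pooled + Z95 * se))
    return math.exp(pooled), ci, p_value
```

Within each study, the per-person GRS would be the predictor in a logistic regression on LOAD status. The resulting log odds ratios and their standard errors are what the pooling step combines.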
Conclusions
Genetic predisposition to increased blood cholesterol and triglyceride lipid levels is not associated with elevated LOAD risk. The observed epidemiological associations between abnormal lipid levels and LOAD risk could therefore be the result of biological pleiotropy or could be secondary to LOAD. Limitations of this study include the small proportion of lipid variance explained by the GRS, biases in case-control ascertainment, and the limitations inherent in Mendelian randomization studies. Future studies should focus on larger LOAD datasets with longitudinally sampled peripheral lipid measures and other markers of lipid metabolism, which have been shown to be altered in LOAD.
Editors' Summary
Background
Currently, about 44 million people worldwide have dementia, a group of brain disorders characterized by an irreversible decline in memory, communication, and other “cognitive” functions. Dementia mainly affects older people and, because people are living longer, experts estimate that more than 135 million people will have dementia by 2050. The commonest form of dementia is Alzheimer disease. In this type of dementia, protein clumps called plaques and neurofibrillary tangles form in the brain and cause its degeneration. The earliest sign of Alzheimer disease is usually increasing forgetfulness. As the disease progresses, affected individuals gradually lose their ability to deal with normal daily activities such as dressing. They may become anxious or aggressive or begin to wander. They may also eventually lose control of their bladder and of other physical functions. At present, there is no cure for Alzheimer disease although some of its symptoms can be managed with drugs. Most people with the disease are initially cared for at home by relatives and other unpaid carers, but many patients end their days in a care home or specialist nursing home.
Why Was This Study Done?
Several lines of evidence suggest that lipid metabolism (how the body handles cholesterol and other fats) is altered in patients whose Alzheimer disease develops after the age of 60 years (late onset Alzheimer disease, LOAD). In particular, epidemiological studies (observational investigations that examine the patterns and causes of disease in populations) have found an association between high amounts of cholesterol in the blood in midlife and the risk of LOAD. However, observational studies cannot prove that abnormal lipid metabolism (dyslipidemia) causes LOAD. People with dyslipidemia may share other characteristics that cause both dyslipidemia and LOAD (confounding) or LOAD might actually cause dyslipidemia (reverse causation). Here, the researchers use “Mendelian randomization” to examine whether lifetime changes in lipid metabolism caused by genes have a causal impact on LOAD risk. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the effect of a modifiable risk factor and the outcome of interest. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. So, if dyslipidemia causes LOAD, genetic variants that affect lipid metabolism should be associated with an altered risk of LOAD.
What Did the Researchers Do and Find?
The researchers investigated whether genetic predisposition to raised lipid levels increased the risk of LOAD in 10,578 participants (3,914 patients with LOAD, 1,675 elderly people without LOAD, and 4,989 population controls) using data collected in six genome wide studies looking for gene variants associated with Alzheimer disease. The researchers constructed a genotype risk score (GRS) for each participant using genetic risk markers for four types of blood lipids on the basis of the presence of single nucleotide polymorphisms (SNPs, a type of gene variant) in their DNA. When the researchers used statistical methods to investigate the association between the GRS and LOAD among all the study participants, they found no association between the GRS and LOAD.
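A weighted GRS of this kind is commonly computed as the sum, over the selected SNPs, of a person's risk-allele count (0, 1 or 2) multiplied by that SNP's published per-allele effect on the lipid trait. The sketch below assumes that weighting scheme; the SNP names, weights and the rule of skipping missing genotypes are hypothetical.

```python
def weighted_grs(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of risk-allele dosages (0, 1 or 2) times per-SNP effect-size weights."""
    score = 0.0
    for snp, weight in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # hypothetical policy: skip SNPs not genotyped in this person
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage for {snp}: {dosage}")
        score += weight * dosage
    return score


# Hypothetical SNP identifiers and weights, for illustration only.
weights = {"snp_a": 0.12, "snp_b": 0.05, "snp_c": 0.30}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(round(weighted_grs(person, weights), 2))  # 0.29
```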
What Do These Findings Mean?
These findings suggest that the genetic predisposition to raised blood levels of four types of lipid is not causally associated with LOAD risk. The accuracy of this finding may be affected by several limitations of this study, including the small proportion of lipid variance explained by the GRS and the validity of several assumptions that underlie all Mendelian randomization studies. Moreover, because all the participants in this study were white, these findings may not apply to people of other ethnic backgrounds. Given their findings, the researchers suggest that the observed epidemiological associations between abnormal lipid levels in the blood and LOAD risk could be secondary to variation in lipid levels for reasons other than genetics, or to LOAD itself, a possibility that can be investigated by studying blood lipid levels and other markers of lipid metabolism over time in large groups of patients with LOAD. Importantly, however, these findings provide new information about the role of lipids in LOAD development that may eventually lead to new therapeutic and public-health interventions for Alzheimer disease.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001713.
The UK National Health Service Choices website provides information (including personal stories) about Alzheimer's disease
The UK not-for-profit organization Alzheimer's Society provides information for patients and carers about dementia, including personal experiences of living with Alzheimer's disease
The US not-for-profit organization Alzheimer's Association also provides information for patients and carers about dementia and personal stories about dementia
Alzheimer's Disease International is the international federation of Alzheimer disease associations around the world; it provides links to individual associations, information about dementia, and links to World Alzheimer Reports
MedlinePlus provides links to additional resources about Alzheimer's disease (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001713
PMCID: PMC4165594  PMID: 25226301
14.  Screening and Replication using the Same Data Set: Testing Strategies for Family-Based Studies in which All Probands Are Affected 
PLoS Genetics  2008;4(9):e1000197.
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.
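The weighted Bonferroni step can be sketched as follows. Each SNP receives the level α·w_i/m with weights that average 1, so the levels still sum to α and the family-wise error rate stays at α by the union bound; this relies on the weights being independent of the second-stage FBAT/TDT statistics, which is what the first-stage estimators are constructed to ensure. Making the weights proportional to the estimated conditional power is an illustrative assumption, not necessarily the authors' weighting rule.

```python
def weighted_bonferroni_levels(power: list[float], alpha: float = 0.05) -> list[float]:
    """Per-SNP levels alpha * w_i / m, with weights w_i proportional to estimated power.

    The weights average 1, so the levels sum to alpha (union bound).
    """
    m = len(power)
    total = sum(power)
    if m == 0 or total <= 0:
        raise ValueError("need at least one SNP with positive estimated power")
    weights = [m * p / total for p in power]  # hypothetical weighting rule
    return [alpha * w / m for w in weights]


def significant(p_values: list[float], levels: list[float]) -> list[int]:
    """Indices of SNPs whose second-stage p-value is at or below their own level."""
    return [i for i, (p, level) in enumerate(zip(p_values, levels)) if p <= level]


# Three SNPs; the first was ranked most promising by the screening step.
levels = weighted_bonferroni_levels([0.8, 0.1, 0.1], alpha=0.05)
print(significant([0.03, 0.01, 0.004], levels))  # [0, 2]
```

With the plain Bonferroni level of 0.05/3 ≈ 0.0167, the same p-values would instead give indices [1, 2]; the weighting moves power towards the SNP that the screening step ranked highly.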
Author Summary
The current state of genotyping technology has enabled researchers to conduct genome-wide association studies of up to 1,000,000 SNPs, allowing for systematic scanning of the genome for variants that might influence the development and progression of complex diseases. One of the largest obstacles to the successful detection of such variants is the multiple comparisons/testing problem in the genetic association analysis. For family-based designs in which all offspring are affected with the disease/trait under study, we developed a methodology that addresses this problem by partitioning the family-based data into two statistically independent components. The first component is used to screen the data and determine the most promising SNPs. The second component is used to test the SNPs for association, where information from the screening is used to weight the SNPs during testing. This methodology is more powerful than standard procedures for multiple comparisons adjustment (i.e., Bonferroni correction). Additionally, as only one data set is required for screening and testing, our testing strategy is less susceptible to study heterogeneity. Finally, as many family-based studies collect data only from affected offspring, this method addresses a major limitation of previous methodologies for multiple comparisons in family-based designs, which require variation in the disease/trait among offspring.
doi:10.1371/journal.pgen.1000197
PMCID: PMC2529406  PMID: 18802462
15.  Appetite regulation genes are associated with body mass index in black South African adolescents: a genetic association study 
BMJ Open  2012;2(3):e000873.
Background
Obesity is a complex trait with both environmental and genetic contributors. Genome-wide association studies have identified several variants that are robustly associated with obesity and body mass index (BMI), many of which are found within genes involved in appetite regulation. Currently, genetic association data for obesity are lacking in Africans—a single genome-wide association study and a few replication studies have been published in West Africa, but none have been performed in a South African population.
Objective
To assess the association of candidate loci with BMI in black South Africans. The authors focused on single nucleotide polymorphisms (SNPs) in the FTO, LEP, LEPR, MC4R, NPY2R and POMC genes.
Design
A genetic association study.
Participants
990 randomly selected individuals from the larger Birth to Twenty cohort (a longitudinal birth cohort study of health and development in Africans).
Measures
The authors genotyped 44 SNPs within the six candidate genes that included known BMI-associated SNPs and tagSNPs based on linkage disequilibrium in an African population for FTO, LEP and NPY2R. To assess population substructure, the authors included 18 ancestry informative markers. Weight, height, sex, sex-specific pubertal stage and exact age collected during adolescence (13 years) were used to identify loci that predispose to obesity early in life.
Results
Sex, sex-specific pubertal stage and exact age together explain 14.3% of the variation in log(BMI) at age 13. After adjustment for these factors, four SNPs were individually significantly associated with BMI: FTO rs17817449 (p=0.022), LEP rs10954174 (p=0.0004), LEP rs6966536 (p=0.012) and MC4R rs17782313 (p=0.045). Together the four SNPs account for 2.1% of the variation in log(BMI). Each risk allele was associated with an estimated average increase of 2.5% in BMI.
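Because BMI was analysed on the log scale, a per-allele effect acts multiplicatively. Assuming, for illustration, that risk alleles act additively on log(BMI) with the reported 2.5% per allele, the implied BMI ratio for additional risk alleles can be computed as below. The SNP labels and allele letters are placeholders, not the actual risk alleles of the genotyped variants.

```python
import math

PER_ALLELE_INCREASE = 0.025  # 2.5% BMI increase per risk allele, as reported above


def count_risk_alleles(genotypes: dict[str, str], risk_alleles: dict[str, str]) -> int:
    """Count risk-allele copies; genotypes are two-letter calls such as 'AG'."""
    return sum(genotypes[snp].count(allele) for snp, allele in risk_alleles.items())


def predicted_bmi_ratio(extra_alleles: int, per_allele: float = PER_ALLELE_INCREASE) -> float:
    """BMI ratio implied by an additive effect on log(BMI) (an assumption)."""
    beta = math.log(1.0 + per_allele)  # per-allele effect on the log scale
    return math.exp(beta * extra_alleles)


# Placeholder SNPs and alleles, not the real risk alleles.
person = {"snp1": "AG", "snp2": "GG", "snp3": "AA", "snp4": "CT"}
risk = {"snp1": "G", "snp2": "G", "snp3": "T", "snp4": "T"}
print(count_risk_alleles(person, risk))  # 4
print(round(predicted_bmi_ratio(5), 3))  # 1.131
```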
Conclusions
The study highlighted SNPs in FTO and MC4R as potential genetic markers of obesity risk in South Africans. The association with two SNPs in the 3′ untranslated region of the LEP gene is novel.
Article summary
Article focus
This is a replication study aiming to reproduce BMI association findings from European cohorts in a South African population.
This study focused on genes linked to appetite control that were previously reported to show association with BMI or obesity and included FTO, LEP, LEPR, MC4R, NPY2R and POMC.
Adolescent data were used to facilitate the identification of genetic loci that predispose to obesity early in life, as it is known that overweight/obese children have an elevated risk of becoming obese adults.
Key messages
We found that four SNPs were individually significantly associated with BMI: FTO rs17817449 (p=0.022), LEP rs10954174 (p=0.0004), LEP rs6966536 (p=0.012) and MC4R rs17782313 (p=0.045).
Together the four SNPs account for 2.1% of the variation in log(BMI).
We also demonstrated that an accumulation of risk alleles is linked to a significant increase in BMI—individuals with seven risk alleles had an 11.0% increase in median BMI compared with those with two risk alleles.
Strengths and limitations of this study
This study provides the first preliminary evidence of the role of genetic variants in obesity risk in an adolescent black South African population.
This study was only moderately powered to detect association with BMI, and not all genes were exhaustively investigated.
TagSNP selection would have been enhanced if South African data were available for this approach.
doi:10.1136/bmjopen-2012-000873
PMCID: PMC3358621  PMID: 22614171
16.  DNA sequence polymorphisms within the bovine guanine nucleotide-binding protein Gs subunit alpha (Gsα)-encoding (GNAS) genomic imprinting domain are associated with performance traits 
BMC Genetics  2011;12:4.
Background
Genes which are epigenetically regulated via genomic imprinting can be potential targets for artificial selection during animal breeding. Indeed, imprinted loci have been shown to underlie some important quantitative traits in domestic mammals, most notably muscle mass and fat deposition. In this candidate gene study, we have identified novel associations between six validated single nucleotide polymorphisms (SNPs) spanning a 97.6 kb region within the bovine guanine nucleotide-binding protein Gs subunit alpha gene (GNAS) domain on bovine chromosome 13 and genetic merit for a range of performance traits in 848 progeny-tested Holstein-Friesian sires. The mammalian GNAS domain consists of a number of reciprocally-imprinted, alternatively-spliced genes which can play a major role in growth, development and disease in mice and humans. Based on the current annotation of the bovine GNAS domain, four of the SNPs analysed (rs43101491, rs43101493, rs43101485 and rs43101486) were located upstream of the GNAS gene, while one SNP (rs41694646) was located in the second intron of the GNAS gene. The final SNP (rs41694656) was located in the first exon of transcripts encoding the putative bovine neuroendocrine-specific protein NESP55, resulting in an aspartic acid-to-asparagine amino acid substitution at amino acid position 192.
Results
SNP genotype-phenotype association analyses indicate that the single intronic GNAS SNP (rs41694646) is associated (P ≤ 0.05) with a range of performance traits including milk yield, milk protein yield, the content of fat and protein in milk, culled cow carcass weight and progeny carcass conformation, measures of animal body size, direct calving difficulty (i.e. difficulty in calving due to the size of the calf) and gestation length. Association (P ≤ 0.01) with direct calving difficulty (i.e. due to calf size) and maternal calving difficulty (i.e. due to the maternal pelvic width size) was also observed at the rs43101491 SNP. Following adjustment for multiple testing, significant association (q ≤ 0.05) remained only between the rs41694646 SNP and four traits (animal stature, body depth, direct calving difficulty and milk yield). Notably, the single SNP in the bovine NESP55 gene (rs41694656) was associated (P ≤ 0.01) with somatic cell count (an often-cited indicator of resistance to mastitis and of the overall health status of the mammary system), and previous studies have demonstrated that the chromosomal region to which the GNAS domain maps harbours an important quantitative trait locus for this trait. This association, however, was not significant after adjustment for multiple testing. The three remaining SNPs assayed were not associated with any of the performance traits analysed in this study. Analysis of all pairwise linkage disequilibrium (r²) values suggests that most of the observed allele substitution effects for the assayed SNPs are independent. Finally, the polymorphic coding SNP in the putative bovine NESP55 gene was used to test the imprinting status of this gene across a range of foetal bovine tissues.
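The multiple-testing adjustment above is reported as q-values. The abstract does not specify how they were computed; a common choice, shown here purely as an assumption, is the Benjamini-Hochberg step-up procedure, whose adjusted p-values can be read as FDR q-values.

```python
def bh_adjust(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = bh_adjust([0.01, 0.04, 0.03, 0.2])
print([round(v, 4) for v in q])  # [0.04, 0.0533, 0.0533, 0.2]
```

SNP-trait pairs with q ≤ 0.05 would then be retained, mirroring the threshold quoted above.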
Conclusions
Previous studies in other mammalian species have shown that DNA sequence variation within the imprinted GNAS gene cluster contributes to several physiological and metabolic disorders, including obesity in humans and mice. Similarly, the results presented here indicate an important role for the imprinted GNAS cluster in underlying complex performance traits in cattle such as animal growth, calving, fertility and health. These findings suggest that GNAS domain-associated polymorphisms may serve as important genetic markers for future livestock breeding programs and support previous studies suggesting that candidate imprinted loci may act as molecular targets for the genetic improvement of agricultural populations. In addition, we present new evidence that the bovine NESP55 gene is epigenetically regulated as a maternally expressed imprinted gene in placental and intestinal tissues from 8–10-week-old bovine foetuses.
doi:10.1186/1471-2156-12-4
PMCID: PMC3025900  PMID: 21214909
17.  DNA sequence polymorphisms in a panel of eight candidate bovine imprinted genes and their association with performance traits in Irish Holstein-Friesian cattle 
BMC Genetics  2010;11:93.
Background
Studies in mice and humans have shown that imprinted genes, whereby expression from one of the two parentally inherited alleles is attenuated or completely silenced, have a major effect on mammalian growth, metabolism and physiology. More recently, investigations in livestock species indicate that genes subject to this type of epigenetic regulation contribute to, or are associated with, several performance traits, most notably muscle mass and fat deposition. In the present study, a candidate gene approach was adopted to assess 17 validated single nucleotide polymorphisms (SNPs) and their association with a range of performance traits in 848 progeny-tested Irish Holstein-Friesian artificial insemination sires. These SNPs are located proximal to, or within, the bovine orthologs of eight genes (CALCR, GRB10, PEG3, PHLDA2, RASGRF1, TSPAN32, ZIM2 and ZNF215) that have been shown to be imprinted in cattle or in at least one other mammalian species (i.e. human/mouse/pig/sheep).
Results
Heterozygosities for all SNPs analysed ranged from 0.09 to 0.46 and significant deviations from Hardy-Weinberg proportions (P ≤ 0.01) were observed at four loci. Phenotypic associations (P ≤ 0.05) were observed between nine SNPs proximal to, or within, six of the eight analysed genes and a number of performance traits evaluated, including milk protein percentage, somatic cell count, culled cow and progeny carcass weight, angularity, body conditioning score, progeny carcass conformation, body depth, rump angle, rump width, animal stature, calving difficulty, gestation length and calf perinatal mortality. Notably, SNPs within the imprinted paternally expressed gene 3 (PEG3) gene cluster were associated (P ≤ 0.05) with calving, calf performance and fertility traits, while a single SNP in the zinc finger protein 215 gene (ZNF215) was associated with milk protein percentage (P ≤ 0.05), progeny carcass weight (P ≤ 0.05), culled cow carcass weight (P ≤ 0.01), angularity (P ≤ 0.01), body depth (P ≤ 0.01), rump width (P ≤ 0.01) and animal stature (P ≤ 0.01).
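Deviations from Hardy-Weinberg proportions are often screened with a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The abstract does not say which test was used, so this is an assumed, illustrative choice (exact tests are generally preferred when genotype counts are small). Observed heterozygosity is included for comparison with the reported 0.09 to 0.46 range, although the abstract does not state whether observed or expected heterozygosity was reported.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """1-df chi-square goodness-of-fit test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def observed_heterozygosity(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Fraction of genotyped animals that are heterozygous."""
    return n_ab / (n_aa + n_ab + n_bb)


# Hypothetical counts with a strong heterozygote deficit.
chi2, p_value = hwe_chi_square(40, 20, 40)
print(chi2, p_value < 0.01)  # 36.0 True
```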
Conclusions
Of the eight candidate bovine imprinted genes assessed, DNA sequence polymorphisms in six of these genes (CALCR, GRB10, PEG3, RASGRF1, ZIM2 and ZNF215) displayed associations with several of the phenotypes included for analyses. The genotype-phenotype associations detected here are further supported by the biological function of these six genes, each of which plays important roles in mammalian growth, development and physiology. The associations between SNPs within the imprinted PEG3 gene cluster and traits related to calving, calf performance and gestation length suggest that this domain on chromosome 18 may play a role in regulating pre-natal growth and development and fertility. SNPs within the bovine ZNF215 gene were associated with bovine growth and body conformation traits, and studies in humans have revealed that the human ZNF215 ortholog belongs to the imprinted gene cluster associated with Beckwith-Wiedemann syndrome, a genetic disorder characterised by growth abnormalities. Similarly, the data presented here suggest that the ZNF215 gene may have an important role in regulating bovine growth. Collectively, our results support previous work showing that (candidate) imprinted genes/loci contribute to heritable variation in bovine performance traits and suggest that DNA sequence polymorphisms within these genes/loci represent an important reservoir of genomic markers for future genetic improvement of dairy and beef cattle populations.
doi:10.1186/1471-2156-11-93
PMCID: PMC2965127  PMID: 20942903
18.  Association, effects and validation of polymorphisms within the NCAPG - LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle 
BMC Genetics  2011;12:103.
Background
In a previously reported genome-wide association study based on a high-density bovine SNP genotyping array, 8 SNP were nominally associated (P ≤ 0.003) with average daily gain (ADG) and 3 of these were also associated (P ≤ 0.002) with average daily feed intake (ADFI) in a population of crossbred beef cattle. The SNP were clustered in a 570 kb region around 38 Mb on the draft sequence of bovine chromosome 6 (BTA6), an interval containing several positional and functional candidate genes including the bovine LAP3, NCAPG, and LCORL genes. The goal of the present study was to develop and examine additional markers in this region to optimize the ability to distinguish favorable alleles, with potential to identify functional variation.
Results
Animals from the original study were genotyped for 47 SNP within or near the gene boundaries of the three candidate genes. Sixteen markers in the NCAPG-LCORL locus displayed significant association with both ADFI and ADG even after stringent correction for multiple testing (P ≤ 0.005). These markers were evaluated for their effects on meat and carcass traits. The alleles associated with higher ADFI and ADG were also associated with higher hot carcass weight (HCW) and ribeye area (REA), and lower adjusted fat thickness (AFT). A reduced set of markers was genotyped on a separate, crossbred population including genetic contributions from 14 beef cattle breeds. Two of the markers located within the LCORL gene locus remained significant for ADG (P ≤ 0.04).
Conclusions
Several markers within the NCAPG-LCORL locus were significantly associated with feed intake and body weight gain phenotypes. These markers were also associated with HCW, REA and AFT suggesting that they are involved with lean growth and reduced fat deposition. Additionally, the two markers significant for ADG in the validation population of animals may be more robust for the prediction of ADG and possibly the correlated trait ADFI, across multiple breeds and populations of cattle.
doi:10.1186/1471-2156-12-103
PMCID: PMC3287254  PMID: 22168586
19.  Genome-wide association analyses for carcass quality in crossbred beef cattle 
BMC Genetics  2013;14:80.
Background
Genetic improvement of beef quality will benefit both producers and consumers, and can be achieved by selecting animals that carry desired quantitative trait nucleotides (QTN), which result from intensive searches using genetic markers. This paper presents a genome-wide association approach utilizing single nucleotide polymorphisms (SNP) in the Illumina BovineSNP50 BeadChip to seek genomic regions that potentially harbor genes or QTN underlying variation in carcass quality of beef cattle.
This study used 747 genotyped animals, mainly crossbred, with phenotypes on twelve carcass quality traits, including hot carcass weight (HCW), back fat thickness (BF), Longissimus dorsi muscle area or ribeye area (REA), marbling scores (MRB), lean yield grade by Beef Improvement Federation formulae (BIFYLD), steak tenderness by Warner-Bratzler shear force 7-day post-mortem (LM7D) as well as body composition as determined by partial rib (IMPS 103) dissection presented as a percentage of total rib weight including body cavity fat (BDFR), lean (LNR), bone (BNR), intermuscular fat (INFR), subcutaneous fat (SQFR), and total fat (TLFR).
Results
At the genome-wide level (false discovery rate [FDR] < 10%), eight SNP were significantly associated with HCW. Seven of these SNP were located on Bos taurus autosome (BTA) 6. At a less stringent significance level (P < 0.001), 520 SNP were significantly associated with traits, mostly with individual traits (473 SNP) and some with multiple traits (47 SNP). Of these significant SNP, 48 were located on BTA6, and 22 of these were associated with hot carcass weight. There were 53 SNP associated with percentage of rib bone, and 12 of them were on BTA20. The rest of the significant SNP were scattered over other chromosomes. They accounted for 1.90–5.89% of the phenotypic variance of the traits. A region approximately 4 Mbp long on BTA6 was found to be a potential area harboring candidate genes influencing growth. One marker on BTA25 accounting for 2.67% of the variation in LM7D may be worth further investigation for the improvement of beef tenderness.
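The share of phenotypic variance attributed to a single SNP is often summarized as the R² of a regression of the trait on allele dosage (0, 1 or 2), or, under Hardy-Weinberg proportions and an additive allele substitution effect a at allele frequency p, as 2p(1 − p)a²/Var(y). The sketch below computes both; the authors' model probably includes other fixed effects, so this is a simplified illustration rather than their estimator, and the input values are hypothetical.

```python
def variance_explained(dosages: list[float], phenotype: list[float]) -> float:
    """R^2 from a simple linear regression of phenotype on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant trait
    return sxy * sxy / (sxx * syy)


def expected_variance_explained(p: float, a: float, var_y: float) -> float:
    """2pq a^2 / Var(y) for an additive SNP effect a under Hardy-Weinberg proportions."""
    return 2.0 * p * (1.0 - p) * a * a / var_y


# Hypothetical SNP: allele frequency 0.3, effect 0.5 units, trait variance 4.
print(round(expected_variance_explained(0.3, 0.5, 4.0), 5))  # 0.02625
```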
Conclusion
This study provides useful information to further assist the identification of chromosome regions and subsequently genes affecting carcass quality traits in beef cattle. It also revealed many SNP that acted pleiotropically to affect carcass quality. This knowledge is important in selecting subsets of SNP to improve the performance of beef cattle.
doi:10.1186/1471-2156-14-80
PMCID: PMC3827924  PMID: 24024930
Single nucleotide polymorphism; Chromosome regions; Beef carcass quality
20.  Mapping of a blood pressure QTL on chromosome 17 in American Indians of the strong heart family study 
Background
Blood pressure (BP) is a complex trait, with a heritability of 30 to 40%. The BP loci identified by genome-wide association studies explain only a small fraction of the phenotypic variation. Family studies can provide an important tool for gene discovery by utilizing trait and genetic transmission information among relative-pairs. We have previously described a quantitative trait locus at chromosome 17q25.3 influencing systolic BP in American Indians of the Strong Heart Family Study (SHFS). This locus has been reported to associate with variation in BP traits in family studies of Europeans, African Americans and Hispanics.
Methods
To follow up the persuasive linkage findings at this locus, we performed comprehensive genotyping in the 1-LOD unit support interval region surrounding this QTL using a multi-step strategy. We first genotyped 1,334 single nucleotide polymorphisms (SNPs) in 928 individuals from families that showed evidence of linkage for BP. We then genotyped a second panel of 306 SNPs in all SHFS participants (N = 3,807) for genes that displayed the strongest evidence of association in the region, and, in a third step, included additional genotyping to better cover the genes of interest and to interrogate plausible candidate genes in the region.
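The 1-LOD unit support interval is the region around the linkage peak in which the LOD score stays within one unit of its maximum. A minimal grid-based sketch follows, assuming LOD scores evaluated at sorted map positions with no interpolation between them; the positions and scores are hypothetical.

```python
def one_lod_support_interval(
    positions: list[float], lods: list[float], drop: float = 1.0
) -> tuple[float, float]:
    """Contiguous region around the LOD peak where LOD >= max LOD - drop."""
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1  # extend towards smaller map positions
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1  # extend towards larger map positions
    return positions[left], positions[right]


# Hypothetical LOD profile (positions in cM).
print(one_lod_support_interval([0, 5, 10, 15, 20, 25], [0.5, 1.8, 3.2, 2.6, 1.9, 0.7]))  # (10, 15)
```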
Results
Three genes had multiple SNPs marginally associated with systolic BP (TBC1D16, HRNBP3 and AZI1). In BQTN analysis, used to estimate the posterior probability that any variant in each gene had an effect on the phenotype, AZI1 showed the most prominent findings (posterior probability of 0.66). Importantly, upon correction for multiple testing, none of our study findings could be distinguished from chance.
Conclusion
Our findings demonstrate the difficulty of following up linkage findings for complex traits, particularly in the context of low-powered studies and rare variants underlying linkage peaks.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2261-14-158) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2261-14-158
PMCID: PMC4246441  PMID: 25387527
21.  Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel 
PLoS Genetics  2014;10(9):e1004573.
Association mapping is a powerful approach for dissecting the genetic architecture of complex quantitative traits using high-density SNP markers in maize. Here, we expanded our association panel size from 368 to 513 inbred lines with 0.5 million high-quality SNPs using a two-step data-imputation method that combines identity-by-descent (IBD)-based projection with a k-nearest neighbor (KNN) algorithm. Genome-wide association studies (GWAS) were carried out for 17 agronomic traits in the panel of 513 inbred lines, applying both the mixed linear model (MLM) and a new method, the Anderson-Darling (A-D) test. Ten loci for five traits were identified using the MLM method at the Bonferroni-corrected threshold −log10(P) > 5.74 (α = 1). Using the A-D test at the Bonferroni-corrected threshold −log10(P) > 7.05 (α = 0.05) with 556,809 SNPs, one to 34 loci were identified per trait across the 17 traits (107 loci for plant height). Many known loci and new candidate loci were observed only with the A-D test, a few of which were also detected in independent linkage analysis. This study indicates that combining IBD-based projection and the KNN algorithm is an efficient imputation method for inferring large missing genotype segments. In addition, we showed that the A-D test is a useful complement for GWAS analysis of complex quantitative traits. Especially for traits with abnormal phenotype distributions, controlled by moderate-effect loci or rare variations, the A-D test balances false positives and statistical power. The candidate SNPs and associated genes also provide a rich resource for maize genetics and breeding.
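The A-D threshold quoted above follows directly from the Bonferroni rule α/n with n = 556,809 SNPs, as this short check shows:

```python
import math


def bonferroni_neglog10(alpha: float, n_tests: int) -> float:
    """-log10 of the Bonferroni per-test threshold alpha / n_tests."""
    return -math.log10(alpha / n_tests)


print(round(bonferroni_neglog10(0.05, 556809), 2))  # 7.05
```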
Author Summary
Genotype imputation has been used widely in the analysis of genome-wide association studies (GWAS) to boost power and fine-map associations. We developed a two-step data imputation method to meet the challenge posed by a large proportion of missing genotypes. GWAS have uncovered an extensive genetic architecture of complex quantitative traits using high-density SNP markers in maize in the past few years. Here, GWAS were carried out for 17 agronomic traits with a panel of 513 inbred lines applying both the mixed linear model and a new method, the Anderson-Darling (A-D) test. We intend to show that the A-D test is a complement to current GWAS methods, especially for complex quantitative traits controlled by moderate-effect loci or rare variations and with abnormal phenotype distributions. In addition, the trait-associated QTL identified here provide a rich resource for maize genetics and breeding.
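Only the second (KNN) step of the two-step imputation is sketched here; the IBD-based projection step is not shown. The sketch assumes genotypes coded as 0/1/2 allele counts with None for missing calls, a distance equal to the proportion of mismatching calls over markers observed in both lines, and a majority vote among the k nearest lines genotyped at the missing marker. The value of k, the distance and the toy panel are assumptions, not the authors' settings.

```python
from collections import Counter
from typing import List, Optional

Genotype = Optional[int]  # 0/1/2 allele count; None marks a missing call


def mismatch_distance(a: List[Genotype], b: List[Genotype], skip: int) -> float:
    """Proportion of differing calls over markers observed in both lines (excluding `skip`)."""
    shared = [
        (x, y)
        for idx, (x, y) in enumerate(zip(a, b))
        if idx != skip and x is not None and y is not None
    ]
    if not shared:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(matrix: List[List[Genotype]], k: int = 3) -> List[List[Genotype]]:
    """Fill each missing call by majority vote of the k nearest lines genotyped at that marker."""
    imputed = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (mismatch_distance(row, other, j), other[j])
                for other_idx, other in enumerate(matrix)
                if other_idx != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            votes = Counter(g for _, g in donors[:k])
            if votes:
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


panel = [[0, 2, 2, None], [0, 2, 2, 2], [0, 2, 0, 2], [2, 0, 0, 0], [2, 0, 0, 0]]
print(knn_impute(panel, k=3)[0])  # [0, 2, 2, 2]
```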
doi:10.1371/journal.pgen.1004573
PMCID: PMC4161304  PMID: 25211220
22.  Pathways-Driven Sparse Regression Identifies Pathways and Genes Associated with High-Density Lipoprotein Cholesterol in Two Asian Cohorts 
PLoS Genetics  2013;9(11):e1003939.
Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast, gene pathway analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.
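The dual-level (pathway and SNP) sparsity described above can be illustrated with the proximal shrinkage step of a sparse group lasso: coefficients are first soft-thresholded individually, then each group (here standing in for a pathway) is shrunk as a block and can be zeroed entirely. This is an assumed formulation for illustration only; it treats groups as non-overlapping and omits both the gradient steps of a full fit and the resampling-based ranking. The coefficients and penalty values are hypothetical.

```python
import math


def soft_threshold(x: float, t: float) -> float:
    """Shrink x towards zero by t, setting it to zero if |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_group_prox(
    beta: list[float], groups: list[list[int]], lam_l1: float, lam_group: float
) -> list[float]:
    """Prox of lam_l1*||b||_1 + sum_g lam_group*sqrt(|g|)*||b_g||_2 for non-overlapping groups."""
    out = list(beta)
    for g in groups:
        shrunk = [soft_threshold(beta[i], lam_l1) for i in g]  # SNP-level sparsity
        norm = math.sqrt(sum(v * v for v in shrunk))
        thresh = lam_group * math.sqrt(len(g))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0  # group-level sparsity
        for i, v in zip(g, shrunk):
            out[i] = scale * v
    return out


out = sparse_group_prox([0.9, -0.2, 0.05, 0.04], [[0, 1], [2, 3]], lam_l1=0.1, lam_group=0.1)
print([round(v, 3) for v in out])  # [0.66, -0.082, 0.0, 0.0]
```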
Author Summary
Genes do not act in isolation, but interact in complex networks or pathways. By accounting for such interactions, pathways analysis methods hope to identify aspects of a disease or trait's genetic architecture that might be missed using more conventional approaches. Most existing pathways methods take a univariate approach, in which each variant within a pathway is separately tested for association with the phenotype of interest. These statistics are then combined to assess pathway significance. As a second step, further analysis can reveal important genetic variants within significant pathways. We have previously shown that a joint-modelling approach using a sparse regression model can increase the power to detect pathways influencing a quantitative trait. Here we extend this approach, and describe a method that is able to simultaneously identify pathways and genes that may be driving pathway selection. We test our method using simulations, and apply it to a study searching for pathways and genes associated with high-density lipoprotein cholesterol in two separate East Asian cohorts.
doi:10.1371/journal.pgen.1003939
PMCID: PMC3836716  PMID: 24278029
23.  Snat: a SNP annotation tool for bovine by integrating various sources of genomic information 
BMC Genetics  2011;12:85.
Background
Recently, with the maturing of bovine genome sequencing and high-throughput SNP genotyping technologies, genome-wide association studies (GWAS) can identify a large number of significant SNPs associated with economically important traits. To determine which GWAS findings are true associations, the common strategy is to sift out the most promising SNPs for follow-up replication studies. Hence, it is crucial to explore the functional significance of candidate SNPs in order to screen for and select the potentially functional ones. To systematically prioritize these statistically significant SNPs and facilitate follow-up replication studies, we developed a web-based bovine SNP annotation tool (Snat).
Results
With Snat, various sources of genomic information are integrated and retrieved from several leading online databases, including SNP information from dbSNP, gene information from Entrez Gene, protein features from UniProt, linkage information from AnimalQTLdb, conserved elements from the UCSC Genome Browser Database, and gene functions from Gene Ontology (GO), KEGG PATHWAY and Online Mendelian Inheritance in Animals (OMIA). Snat provides two applications, a CGI-based web utility and a command-line version, to access the integrated database, target any single-nucleotide locus of interest and perform multi-level functional annotation. To further validate the practical utility of our study, SNPs on two commercial bovine SNP chips, the Affymetrix Bovine 10K chip array and the Illumina 50K chip array, have been annotated by Snat, and the corresponding outputs can be downloaded directly from the Snat website. Furthermore, a real dataset of 20 SNPs associated with milk yield, identified in our recent GWAS, was used to demonstrate the practical significance of Snat.
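A toy version of the position-based join at the core of this kind of annotation is sketched below. It is not Snat's implementation: the in-memory `Gene` records, gene names, term labels and the `annotate` function are hypothetical stand-ins for the integrated database. Each SNP is mapped to the genes whose coordinates contain it, and the functional terms of those genes are attached.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int


def annotate(snps, genes, go_terms):
    """Map each (snp_id, chrom, pos) to overlapping genes and their functional terms."""
    by_chrom = {}
    for g in sorted(genes, key=lambda g: (g.chrom, g.start)):
        by_chrom.setdefault(g.chrom, []).append(g)
    starts = {c: [g.start for g in gs] for c, gs in by_chrom.items()}
    result = {}
    for snp_id, chrom, pos in snps:
        candidates = by_chrom.get(chrom, [])
        # Only genes that start at or before pos can contain it.
        k = bisect.bisect_right(starts.get(chrom, []), pos)
        hits = [g.name for g in candidates[:k] if g.end >= pos]
        terms = sorted({t for name in hits for t in go_terms.get(name, [])})
        result[snp_id] = {"genes": hits, "terms": terms}
    return result
```

The scan over `candidates[:k]` is linear in the number of upstream genes; a database-backed tool would more likely rely on an interval index.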
Conclusions
To the best of our knowledge, Snat is one of the first tools focusing on SNP annotation for livestock. Snat provides researchers with a convenient and powerful platform to aid functional analyses and accurate evaluation of genes/variants related to SNPs, and facilitates follow-up replication studies in the post-GWAS era.
doi:10.1186/1471-2156-12-85
PMCID: PMC3224132  PMID: 21982513
24.  FastMap: Fast eQTL mapping in homozygous populations 
Bioinformatics  2008;25(4):482-489.
Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the numbers of transcripts and markers are on the order of 10⁵ and 10⁵–10⁶, respectively.
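For a sense of the scale quoted above, the number of single-marker transcript-by-marker tests is the product of the two counts (a back-of-the-envelope calculation, not a figure reported in the paper), and each of B permutations used for significance testing repeats the full scan:

```latex
N_{\text{tests}} = N_{\text{transcripts}} \times N_{\text{markers}}
  \approx 10^{5} \times \left(10^{5}\ \text{to}\ 10^{6}\right)
  = 10^{10}\ \text{to}\ 10^{11},
\qquad
N_{\text{total}} = (B + 1)\, N_{\text{tests}}
```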
Results: We propose a new method, FastMap, for fast and efficient eQTL mapping in homozygous inbred populations with binary allele calls. FastMap exploits the discrete nature and structure of the measured single nucleotide polymorphisms (SNPs). In particular, SNPs are organized into a Hamming distance-based tree that minimizes the number of arithmetic operations required to calculate the association of a SNP by making use of the association of its parent SNP in the tree. FastMap's tree can be used to perform both single marker mapping and haplotype association mapping over an m-SNP window. These performance enhancements also permit permutation-based significance testing.
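The parent-to-child reuse can be sketched as follows, under stated assumptions: SNPs are 0/1 vectors over samples, the tree is a minimum spanning tree under Hamming distance built with Prim's algorithm, and the per-SNP statistic is a simple difference in mean expression between allele groups. FastMap's actual tree construction and test statistic may differ, and all function names are hypothetical. The tree and each edge's list of differing samples are computed once; for every transcript, a child SNP's allele-1 sum and count are then obtained from its parent's with one addition per differing sample rather than a pass over all samples.

```python
from collections import deque


def hamming(a, b):
    """Number of samples at which two binary SNP vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def build_tree(snps):
    """Minimum spanning tree under Hamming distance (Prim's algorithm) rooted at SNP 0.
    Returns a parent-before-child order, the parent list, and, for each non-root SNP,
    the sample indices at which it differs from its parent."""
    m = len(snps)
    in_tree, best, parent = [False] * m, [float("inf")] * m, [None] * m
    best[0] = 0
    for _ in range(m):
        j = min((k for k in range(m) if not in_tree[k]), key=lambda k: best[k])
        in_tree[j] = True
        for k in range(m):
            if not in_tree[k]:
                d = hamming(snps[j], snps[k])
                if d < best[k]:
                    best[k], parent[k] = d, j
    children = [[] for _ in range(m)]
    for k in range(1, m):
        children[parent[k]].append(k)
    order, queue = [], deque([0])
    while queue:
        j = queue.popleft()
        order.append(j)
        queue.extend(children[j])
    diffs = {k: [i for i, (a, b) in enumerate(zip(snps[parent[k]], snps[k])) if a != b]
             for k in range(1, m)}
    return order, parent, diffs


def mean_differences(snps, tree, y):
    """Mean expression of allele-1 samples minus allele-0 samples, for every SNP.
    Each child starts from its parent's allele-1 sum and count and touches only
    the samples where the two SNPs differ."""
    order, parent, diffs = tree
    n, total = len(y), sum(y)
    sums, counts = [0.0] * len(snps), [0] * len(snps)
    for j in order:
        if parent[j] is None:  # root: one full pass over the samples
            sums[j] = sum(v for v, g in zip(y, snps[j]) if g == 1)
            counts[j] = sum(snps[j])
        else:
            s, c = sums[parent[j]], counts[parent[j]]
            for i in diffs[j]:
                step = 1 if snps[j][i] == 1 else -1  # sample i joins or leaves group 1
                s += step * y[i]
                c += step
            sums[j], counts[j] = s, c
    # A monomorphic SNP has no contrast between allele groups.
    return [None if c in (0, n) else s / c - (total - s) / (n - c)
            for s, c in zip(sums, counts)]


def direct_mean_differences(snps, y):
    """Reference version that scans every sample for every SNP."""
    out = []
    for snp in snps:
        g1 = [v for v, g in zip(y, snp) if g == 1]
        g0 = [v for v, g in zip(y, snp) if g == 0]
        out.append(None if not g1 or not g0 else sum(g1) / len(g1) - sum(g0) / len(g0))
    return out


snps = [[0, 0, 1, 1, 0], [0, 1, 1, 1, 0], [1, 1, 1, 1, 0], [0, 0, 1, 0, 0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]  # one transcript's expression across 5 samples
tree = build_tree(snps)  # depends only on genotypes, so reusable for every transcript
fast = mean_differences(snps, tree, y)
slow = direct_mean_differences(snps, y)
```

Because the tree depends only on genotypes, the saving accumulates across all transcripts (and permutations) scanned against the same markers.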
Availability: The FastMap program and source code are available at the website: http://cebc.unc.edu/fastmap86.html
Contact: iir@unc.edu; nobel@email.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btn648
PMCID: PMC2642639  PMID: 19091771
25.  An association mapping approach to identify favourable alleles for tomato fruit quality breeding 
BMC Plant Biology  2014;14(1):337.
Background
Genome Wide Association Studies (GWAS) have recently been used to dissect complex quantitative traits and identify candidate genes affecting phenotype variation of polygenic traits. In order to map loci controlling variation in tomato marketable and nutritional fruit traits, we used a collection of 96 cultivated genotypes, including Italian, Latin American, and other worldwide-spread landraces and varieties. Phenotyping was carried out by measuring ten quality traits and metabolites in red ripe fruits. In parallel, genotyping was carried out using the Illumina Infinium SolCAP array, which allows data to be collected from 7,720 single nucleotide polymorphism (SNP) markers.
Results
The Mixed Linear Model used to detect associations between markers and traits made population structure and relatedness within our collection evident, and both were taken into account in the association analysis. GWAS identified 20 SNPs that were significantly associated with seven of the ten traits considered. In particular, our analysis revealed two markers associated with phenolic compounds, three with ascorbic acid, β-carotene and trans-lycopene, six with titratable acidity, and only one with pH and fresh weight. Co-localization of a group of associated loci with candidate genes/QTLs previously reported in other studies validated the approach. Moreover, 19 putative genes in linkage disequilibrium with markers were found. These genes might be involved in the biosynthetic pathways of the traits analyzed or might be implicated in their transcriptional regulation. Finally, favourable allelic combinations between associated loci were identified that could be pyramided to obtain new improved genotypes.
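The idea of pyramiding favourable alleles can be illustrated with a toy calculation. Assuming homozygous calls and treating each associated marker's favourable allele as known, it scores every pair of accessions by how many favourable alleles the two carry between them. Marker names, accessions and functions are hypothetical, and the score ignores effect sizes and linkage between loci.

```python
from itertools import combinations


def favourable_set(calls, favourable):
    """Markers at which an accession carries the favourable allele (homozygous calls assumed)."""
    return {m for m, allele in favourable.items() if calls.get(m) == allele}


def best_crosses(genotypes, favourable, top=3):
    """Rank accession pairs by how many favourable alleles they carry between them."""
    sets = {acc: favourable_set(calls, favourable) for acc, calls in genotypes.items()}
    pairs = [(len(sets[a] | sets[b]), a, b) for a, b in combinations(sorted(sets), 2)]
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))  # most complementary pairs first
    return pairs[:top]


favourable = {"m1": "A", "m2": "G", "m3": "T", "m4": "C"}
genotypes = {
    "acc1": {"m1": "A", "m2": "G", "m3": "C", "m4": "T"},
    "acc2": {"m1": "G", "m2": "A", "m3": "T", "m4": "C"},
    "acc3": {"m1": "A", "m2": "A", "m3": "T", "m4": "T"},
}
ranking = best_crosses(genotypes, favourable)
```

Here no single accession carries more than two favourable alleles, but crossing acc1 with acc2 could bring all four together, which is the intuition behind pyramiding.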
Conclusions
Our results led to the identification of promising candidate loci controlling fruit quality that, in the future, might be transferred into tomato genotypes by Marker Assisted Selection or genetic engineering, and highlighted that intraspecific variability might still be exploited for enhancing tomato fruit quality.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0337-9) contains supplementary material, which is available to authorized users.
doi:10.1186/s12870-014-0337-9
PMCID: PMC4266912  PMID: 25465385
Candidate genes; Fruit quality; Genome-wide association; Metabolite analysis; Mixed Linear Model; Solanum lycopersicum; SolCAP Infinium array