Results 1-21 (21)

1.  GLIDERS - A web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs 
BMC Bioinformatics  2009;10:367.
A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r² ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers.
GLIDERS is an easy-to-use web tool that only requires the user to enter the rs numbers of the SNPs for which they want to retrieve genome-wide LD (both nearby and long-range). The intuitive web interface handles both manual entry of SNP IDs and uploaded files of SNP IDs. The user can limit the resulting inter-SNP associations with easy-to-use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r² (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab-delimited text file.
GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
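As a rough illustration of the quantities this abstract refers to, the sketch below computes the standard pairwise LD measure r² from haplotype and allele frequencies, then applies the kind of r² and distance limits the abstract describes. It is not GLIDERS code: the function names, the tuple layout and the placeholder SNP labels are all assumptions made for the example.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return (d * d) / denom


def filter_pairs(pairs, min_r2=0.3, min_dist=0, max_dist=None):
    """Keep (snp_a, snp_b, distance_bp, r2) tuples that pass the limits."""
    kept = []
    for snp_a, snp_b, dist_bp, r2 in pairs:
        if r2 < min_r2 or dist_bp < min_dist:
            continue
        if max_dist is not None and dist_bp > max_dist:
            continue
        kept.append((snp_a, snp_b, dist_bp, r2))
    return kept


# Placeholder records, not real HapMap data.
example_pairs = [
    ("snp1", "snp2", 12_000, 0.95),
    ("snp1", "snp3", 750_000, 0.42),
    ("snp1", "snp4", 900_000, 0.10),
]

# Long-range LD only: pairs more than 500 kb apart with r^2 >= 0.3.
long_range = filter_pairs(example_pairs, min_r2=0.3, min_dist=500_000)
```

Filtering on a minimum distance is what isolates the long-range pairs that the abstract suggests can flag mis-mapped markers.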
PMCID: PMC2777181  PMID: 19878600
2.  An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people 
Science (New York, N.Y.)  2012;337(6090):100-104.
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (one every 17 bases) and geographically localized, such that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. Overall we conclude that, due to rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
PMCID: PMC4319976  PMID: 22604722
3.  Designing candidate gene and genome-wide case-control association studies 
Nature protocols  2007;2(10):2492-2501.
This protocol describes how to appropriately design a genetic association case-control study, either focussing on a candidate gene or region, or implementing a genome-wide approach. The steps described involve: 1) defining the case phenotype in adequate detail; 2) checking the heritability of the disease in question; 3) considering whether a population-based study is the appropriate design for the research question; 4) the appropriate selection of controls; 5) sample size calculations; and 6) giving due consideration to whether it is a de-novo or replication study. General guidelines are given, as well as specific examples of a candidate gene and a genome-wide association study into Type 2 Diabetes. Software and websites used in this protocol include the International HapMap Consortium website, Genetic Power Calculator, CaTS, and SNPSpD. Running each of the programmes only takes a few seconds; the rate-limiting steps involve thinking through the designs and parameters in the disease models.
PMCID: PMC4180089  PMID: 17947991
case-control; genetic; study design; candidate gene; genome-wide; power
4.  Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples 
Science (New York, N.Y.)  2007;316(5829):1336-1341.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
PMCID: PMC3772310  PMID: 17463249
5.  Deep sequencing of the LRRK2 gene in 14,002 individuals reveals evidence of purifying selection and independent origin of the p.Arg1628Pro mutation in Europe 
Human Mutation  2012;33(7):1087-1098.
Genetic variation in LRRK2 predisposes to Parkinson disease (PD), which underpins its development as a therapeutic target. Here, we aimed to identify novel genotype-phenotype associations that might support developing LRRK2 therapies for other conditions. We sequenced the 51 exons of LRRK2 in cases comprising 12 common diseases (n = 9,582), and in 4,420 population controls. We identified 739 single nucleotide variants (SNVs), 62% of which were observed in only one person, including 316 novel exonic variants. We found evidence of purifying selection for the LRRK2 gene and a trend suggesting that this is more pronounced in the central (ROC-COR-kinase) core protein domains of LRRK2 than the flanking domains. Population genetic analyses revealed that LRRK2 is not especially polymorphic or differentiated in comparison to 201 other drug target genes. Amongst Europeans, we identified 17 carriers (0.13%) of pathogenic LRRK2 mutations that were not significantly enriched within any disease or in those reporting a family history of PD. Analysis of pathogenic mutations within Europe reveals that the p.Arg1628Pro (c.4883G>C) mutation arose independently in Europe and Asia. Taken together, these findings demonstrate how targeted deep sequencing can help to reveal fundamental characteristics of clinically important loci.
PMCID: PMC3370131  PMID: 22415848
LRRK2; Deep sequencing; novel variants; evolution; population genetics; genotype-phenotype associations
6.  Deep Resequencing Unveils Genetic Architecture of ADIPOQ and Identifies a Novel Low-Frequency Variant Strongly Associated With Adiponectin Variation 
Diabetes  2012;61(5):1297-1301.
Increased adiponectin levels have been shown to be associated with a lower risk of type 2 diabetes. To understand the relations between genetic variation at the adiponectin-encoding gene, ADIPOQ, and adiponectin levels, and subsequently its role in disease, we conducted a deep resequencing experiment of ADIPOQ in 14,002 subjects, including 12,514 Europeans, 594 African Americans, and 567 Indian Asians. We identified 296 single nucleotide polymorphisms (SNPs), including 30 amino acid changes, and carried out association analyses in a subset of 3,665 subjects from two independent studies. We confirmed multiple genome-wide association study findings and identified a novel association between a low-frequency SNP (rs17366653) and adiponectin levels (P = 2.2 × 10⁻¹⁷). We show that seven SNPs exert independent effects on adiponectin levels. Together, they explained 6% of adiponectin variation in our samples. We subsequently assessed association between these SNPs and type 2 diabetes in the Genetics of Diabetes Audit and Research in Tayside Scotland (GO-DARTS) study, comprising 5,145 case and 6,374 control subjects. No evidence of association with type 2 diabetes was found, but we were also unable to exclude the possibility of substantial effects (e.g., odds ratio 95% CI for rs17366653 [0.91–1.58]). Further investigation by large-scale and well-powered Mendelian randomization studies is warranted.
PMCID: PMC3331741  PMID: 22403302
7.  An 18-kDa Translocator Protein (TSPO) polymorphism explains differences in binding affinity of the PET radioligand PBR28 
[11C]PBR28 binds the 18-kDa Translocator Protein (TSPO) and is used in positron emission tomography (PET) to detect microglial activation. However, quantitative interpretations of signal are confounded by large interindividual variability in binding affinity, which displays a trimodal distribution compatible with a codominant genetic trait. Here, we tested directly for an underlying genetic mechanism to explain this. Binding affinity of PBR28 was measured in platelets isolated from 41 human subjects and tested for association with polymorphisms in TSPO and genes encoding other proteins in the TSPO complex. Complete agreement was observed between the TSPO Ala147Thr genotype and PBR28 binding affinity phenotype (P = 3.1 × 10⁻¹³). The TSPO Ala147Thr polymorphism predicts PBR28 binding affinity in human platelets. As all second-generation TSPO PET radioligands tested hitherto display a trimodal distribution in binding affinity analogous to PBR28, testing for this polymorphism may allow quantitative interpretation of TSPO PET studies with these radioligands.
PMCID: PMC3323305  PMID: 22008728
Ala147Thr; PBR28; polymorphism; radioligand binding; TSPO
8.  Basic statistical analysis in genetic case-control studies 
Nature protocols  2011;6(2):121-133.
This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. The steps described involve the (i) appropriate selection of measures of association and relevance of disease models; (ii) appropriate selection of tests of association; (iii) visualization and interpretation of results; (iv) consideration of appropriate methods to control for multiple testing; and (v) replication strategies. Assuming no previous experience with software such as PLINK, R or Haploview, we describe how to use these popular tools for handling single-nucleotide polymorphism data in order to carry out tests of association and visualize and interpret results. This protocol assumes that data quality assessment and control has been performed, as described in a previous protocol, so that samples and markers deemed to have the potential to introduce bias to the study have been identified and removed. Study design, marker selection and quality control of case-control studies have also been discussed in earlier protocols. The protocol should take ~1 h to complete.
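The protocol itself works with PLINK, R and Haploview. Purely as an illustration of the measures it names, the sketch below computes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval, a 1-degree-of-freedom chi-square test on a 2×2 allele-count table, and a Bonferroni threshold for multiple testing. The function names and the example counts are assumptions for this example and are not taken from the protocol.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic test on a 2x2 table of allele counts.

    case_a / case_b: counts of allele A / allele B in cases
    ctrl_a / ctrl_b: counts of allele A / allele B in controls
    All four counts must be positive for the odds ratio to be defined.
    """
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)

    # Woolf's method: standard error of log(OR), then a 95% interval.
    se = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)

    # Pearson chi-square for a 2x2 table (no continuity correction).
    n = case_a + case_b + ctrl_a + ctrl_b
    chi2 = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2) / (
        (case_a + case_b) * (ctrl_a + ctrl_b)
        * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {"or": odds_ratio, "ci": (ci_low, ci_high),
            "chi2": chi2, "p": p_value}


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level controlling family-wise error at alpha."""
    return alpha / n_tests


# Made-up allele counts: 60 A / 40 B in cases, 40 A / 60 B in controls.
result = allelic_association(60, 40, 40, 60)
```

With a Bonferroni correction, a study testing 500,000 markers at a family-wise level of 0.05 would require a per-marker P value below `bonferroni_threshold(0.05, 500_000)`.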
PMCID: PMC3154648  PMID: 21293453
9.  The Use of Genome-Wide eQTL Associations in Lymphoblastoid Cell Lines to Identify Novel Genetic Pathways Involved in Complex Traits 
PLoS ONE  2011;6(7):e22070.
The integrated analysis of genotypic and expression data for association with complex traits could identify novel genetic pathways involved in complex traits. We profiled 19,573 expression probes in Epstein-Barr virus-transformed lymphoblastoid cell lines (LCLs) from 299 twins and correlated these with 44 quantitative traits (QTs). For 939 expressed probes correlating with more than one QT, we investigated the presence of eQTL associations in three datasets of 57 CEU HapMap founders and 86 unrelated twins. Genome-wide association analysis of these probes with 2.2 million SNPs revealed 131 potential eQTLs (1,989 eQTL SNPs) overlapping between the HapMap datasets, five of which were in cis (58 eQTL SNPs). We then tested 535 SNPs tagging the eQTL SNPs, for association with the relevant QT in 2,905 twins. We identified nine potential SNP-QT associations (P<0.01) but none significantly replicated in five large consortia of 1,097–16,129 subjects. We also failed to replicate previously reported eQTL associations with body mass index, plasma low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides levels derived from lymphocytes, adipose and liver tissue. Our results and additional power calculations suggest that proponents may have been overoptimistic in the power of LCLs in eQTL approaches to elucidate regulatory genetic effects on complex traits using the small datasets generated to date. Nevertheless, larger tissue-specific expression data sets relevant to specific traits are becoming available, and should enable the adoption of similar integrated analyses in the near future.
PMCID: PMC3137612  PMID: 21789213
10.  Data quality control in genetic case-control association studies 
Nature protocols  2010;5(9):1564-1573.
This protocol details the data quality assessment and control steps that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias to the study. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to carry out assessments of failure rate per-individual and per-SNP and to assess the degree of relatedness between individuals. We also detail other quality control procedures, including the use of SMARTPCA for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used, and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed, as these are provided in a further protocol in the series. Issues concerning the study design and marker selection in case-control studies have been discussed in our earlier protocols. The protocol should take approximately 8 hours to complete.
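The protocol carries out these checks with PLINK. The sketch below only illustrates what "failure rate per-individual and per-SNP" means on a small in-memory genotype matrix. The matrix layout (rows are samples, columns are SNPs, `None` marks a failed genotype call), the function names and the 0.5 cut-off are assumptions chosen for the example, not values from the protocol.

```python
def sample_failure_rates(genotypes):
    """Fraction of missing genotype calls for each individual (row)."""
    return [sum(g is None for g in row) / len(row) for row in genotypes]


def snp_failure_rates(genotypes):
    """Fraction of missing genotype calls for each SNP (column)."""
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(genotypes[i][j] is None for i in range(n_samples)) / n_samples
        for j in range(n_snps)
    ]


def passing_indices(rates, max_rate):
    """Indices whose failure rate does not exceed the chosen cut-off."""
    return [i for i, rate in enumerate(rates) if rate <= max_rate]


# Toy data: 4 samples x 3 SNPs, genotypes coded as minor-allele counts.
toy_genotypes = [
    [0, 1, 2],
    [0, None, 1],
    [None, None, None],  # a sample that failed genotyping entirely
    [1, 1, 0],
]

per_sample = sample_failure_rates(toy_genotypes)
per_snp = snp_failure_rates(toy_genotypes)
kept_samples = passing_indices(per_sample, max_rate=0.5)
```

In practice the per-SNP rates are usually recomputed after failing samples have been dropped, since one bad sample inflates the failure rate of every marker.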
PMCID: PMC3025522  PMID: 21085122
11.  Marker selection for genetic case-control association studies 
Nature protocols  2009;4(5):743-752.
Association studies can focus on candidate gene(s), a particular genomic region, or adopt a genome-wide association approach, each of which has implications for marker selection. The strategy for marker selection will affect the statistical power of the study to detect a disease association and is a crucial element of study design. The abundant single nucleotide polymorphisms (SNPs) are the markers of choice in genetic case-control association studies. The genotypes of neighbouring SNPs are often highly correlated (‘in linkage disequilibrium’ – LD) within a population, a property that is utilised for selecting specific ‘tagSNPs’ to serve as proxies for other nearby SNPs in high LD. General guidelines for SNP selection in candidate genes/regions and genome-wide studies are provided in this protocol, along with illustrative examples. Publicly available web-based resources are utilised to browse and retrieve data, and software such as Haploview and Goldsurfer2 is applied to investigate LD and to select tagSNPs.
PMCID: PMC3025519  PMID: 19390530
gene; genetic marker; SNP; case-control study; association; design
12.  Finding the missing heritability of complex diseases 
Nature  2009;461(7265):747-753.
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
PMCID: PMC2831613  PMID: 19812666
13.  Genetic Susceptibility to Cancer: the Role of Polymorphisms in Candidate Genes 
Context
Continuing advances in genotyping technologies and the inclusion of DNA collection in observational studies have resulted in an increasing number of genetic association studies.
Objective
To evaluate the overall progress and contribution of candidate gene association studies to current understanding of the genetic susceptibility to cancer.
Data Sources
We systematically examined the results of meta- and pooled analyses for genetic polymorphisms and cancer risk published through March 2008.
Study Selection
We identified 161 meta- and pooled analyses, encompassing 18 cancer sites and 99 genes. Analyses had to meet the following criteria: 1) at least 500 cases, 2) cancer risk as outcome, 3) not focused on HLA genetic markers, and 4) published in English.
Data Extraction
Information on cancer site, gene name, variant, point estimate and 95% confidence interval, allelic frequency, number of studies and cases, tests of study heterogeneity and publication bias were extracted by one investigator and reviewed by other investigators.
Results
These 161 analyses evaluated 344 gene-variant/cancer associations and included on average 7.3 studies and 3,551 cases (range: 508–19,729 cases) per investigated association. The summary ORs for 98 (28%) statistically significant associations (p-value <0.05) were further evaluated by estimating the false-positive report probability (FPRP) at a given prior probability and statistical power. At a prior probability level of 0.001 and statistical power to detect an OR of 1.5, thirteen gene-variant/cancer associations remained noteworthy (FPRP<0.2). Assuming a very low prior probability of 0.000001, similar to a probability assumed for a randomly selected SNP in a genome-wide association study, and statistical power to detect an OR of 1.5, four associations were considered noteworthy as denoted by a FPRP value < 0.2: 1) GSTM1 null and bladder cancer (OR: 1.5, 95% CI: 1.3–1.6, p-value=1.9×10⁻¹⁴), 2) NAT2 slow acetylator and bladder cancer (OR: 1.46, 95% CI: 1.26–1.68, p-value=2.5×10⁻⁷), 3) MTHFR C677T and gastric cancer (OR: 1.52, 95% CI: 1.31–1.77, p-value=4.9×10⁻⁸), and 4) GSTM1 null and acute leukemia (OR: 1.20, 95% CI: 1.14–1.25, p-value=8.6×10⁻¹⁵). When the OR used to determine statistical power was lowered to 1.2, two of the four noteworthy associations remained so: GSTM1 null with bladder cancer and acute leukemia.
Conclusions
Phase II enzymes, which are key enzymes involved in the detoxification and excretion of carcinogens (and particularly deletion of GSTM1), were among the most consistent and highly significant associations.
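The false-positive report probability used in this abstract can be sketched with the usual formula FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed p-value, 1 − β the statistical power and π the prior probability of a true association. The abstract does not report the power values the authors obtained, so the power of 0.5 used below is a placeholder assumption. The function names are likewise illustrative.

```python
def fprp(p_value, power, prior):
    """False-positive report probability for one reported association.

    p_value: observed p-value, used as the significance level alpha
    power:   assumed power (1 - beta) to detect the target odds ratio
    prior:   assumed prior probability that the association is real
    """
    if not (0.0 < prior < 1.0):
        raise ValueError("prior must lie strictly between 0 and 1")
    if not (0.0 < power <= 1.0):
        raise ValueError("power must lie in (0, 1]")
    false_pos = p_value * (1.0 - prior)  # weight of the null being true
    true_pos = power * prior             # weight of a real association
    return false_pos / (false_pos + true_pos)


def is_noteworthy(p_value, power, prior, cutoff=0.2):
    """Apply the FPRP < 0.2 criterion quoted in the abstract."""
    return fprp(p_value, power, prior) < cutoff


# p-value from the abstract (GSTM1 null and bladder cancer), the very low
# prior of 0.000001 from the abstract, and an ASSUMED power of 0.5.
strong_signal = is_noteworthy(1.9e-14, power=0.5, prior=1e-6)

# A borderline p-value of 0.05 at the 0.001 prior, with an ASSUMED power
# of 0.8, for contrast.
borderline_signal = is_noteworthy(0.05, power=0.8, prior=0.001)
```

The contrast shows why most nominally significant associations drop out: at a low prior, a p-value near 0.05 leaves the false-positive probability close to 1 whatever the power.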
PMCID: PMC2772197  PMID: 18505952
14.  Genome-wide association defines more than thirty distinct susceptibility loci for Crohn's disease 
Nature genetics  2008;40(8):955-962.
Several new risk factors for Crohn's disease have been identified in recent genome-wide association studies. To advance gene discovery further we have combined the data from three studies (a total of 3,230 cases and 4,829 controls) and performed replication in 3,664 independent cases with a mixture of population-based and family-based controls. The results strongly confirm 11 previously reported loci and provide genome-wide significant evidence for 21 new loci, including the regions containing STAT3, JAK2, ICOSLG, CDKAL1, and ITLN1. The expanded molecular understanding of the basis of disease offers promise for informed therapeutic development.
PMCID: PMC2574810  PMID: 18587394
15.  A Common Variant in the FTO Gene Is Associated with Body Mass Index and Predisposes to Childhood and Adult Obesity 
Science (New York, N.Y.)  2007;316(5826):889-894.
Obesity is a serious international health problem that increases the risk of several common diseases. The genetic factors predisposing to obesity are poorly understood. A genome-wide search for type 2 diabetes–susceptibility genes identified a common variant in the FTO (fat mass and obesity associated) gene that predisposes to diabetes through an effect on body mass index (BMI). An additive association of the variant with BMI was replicated in 13 cohorts with 38,759 participants. The 16% of adults who are homozygous for the risk allele weighed about 3 kilograms more and had 1.67-fold increased odds of obesity when compared with those not inheriting a risk allele. This association was observed from age 7 years upward and reflects a specific increase in fat mass.
PMCID: PMC2646098  PMID: 17434869
16.  A genome-wide association study for celiac disease identifies risk variants in the region harboring IL2 and IL21 
Nature genetics  2007;39(7):827-829.
We tested 310,605 single-nucleotide polymorphisms for association in 778 celiac disease cases and 1,422 controls. Outside the HLA, the most significant finding (rs13119723, P = 2.0 × 10⁻⁷, empirical genome-wide significance P = 0.045) was in the KIAA1109/Tenr/IL2/IL21 linkage disequilibrium block. Association was independently confirmed in two further collections (strongest at rs6822844, 24 kb 5' of IL21, meta-analysis P = 1.3 × 10⁻¹⁴, OR 0.63), suggesting genetic variation in this region predisposes to celiac disease.
PMCID: PMC2274985  PMID: 17558408
17.  Goldsurfer2 (Gs2): A comprehensive tool for the analysis and visualization of genome wide association studies 
BMC Bioinformatics  2008;9:138.
Genome-wide association (GWA) studies are now being widely undertaken with the aim of finding the link between genetic variations and common diseases. Ideally, a well-powered GWA study will involve the measurement of hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of individuals. The sheer volume of data generated by these experiments creates very high analytical demands. There are a number of important steps during the analysis of such data, many of which may present severe bottlenecks. The data need to be imported and reviewed to perform initial quality control (QC) before proceeding to association testing. Evaluation of results may involve further statistical analysis, such as permutation testing, or further QC of associated markers, for example, reviewing raw genotyping intensities. Finally, significant associations need to be prioritised using functional and biological interpretation methods, browsing available biological annotation, pathway information and patterns of linkage disequilibrium (LD).
We have developed an interactive and user-friendly graphical application to be used in all steps in GWA projects from initial data QC and analysis to biological evaluation and validation of results. The program is implemented in Java and can be used on all platforms.
Very large data sets (e.g. 500 k markers and 5000 samples) can be quality assessed, rapidly analysed and integrated with genomic sequence information. Candidate SNPs can be selected and functionally evaluated.
PMCID: PMC2323971  PMID: 18318908
18.  Two-Stage Two-Locus Models in Genome-Wide Association 
PLoS Genetics  2006;2(9):e157.
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
Although there is growing appreciation that attempting to map genetic interactions in humans may be a fruitful endeavor, there is no consensus as to the best strategy for their detection, particularly in the case of genome-wide association where the number of potential comparisons is enormous. In this article, the authors compare the performance of four different search strategies to detect loci which interact in genome-wide association—a single-locus search, an exhaustive two-locus search, and two, two-stage procedures in which a subset of loci initially identified with single-locus tests are analyzed using a full two-locus model. Their results show that when loci interact, an exhaustive two-locus search across the genome is superior to a two-stage strategy, and in many situations can identify loci which would not have been identified solely using a single-locus search. Their findings suggest that an exhaustive search involving all pairwise combinations of markers across the genome may provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
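To make the multiple-testing penalty and the computational burden discussed above concrete, the sketch below counts the two-locus tests in an exhaustive search over m markers and in a two-stage search that carries k markers forward from the single-locus scan. The abstract does not spell out how its two two-stage strategies differ, so the `require_both` switch (test a pair only if both loci passed stage one, or if at least one did) is an assumption made for illustration, as are the function names and the marker counts.

```python
import math


def exhaustive_pair_tests(m):
    """Number of two-locus tests when every pair of m markers is fitted."""
    return math.comb(m, 2)


def two_stage_pair_tests(m, k, require_both=True):
    """Number of two-locus tests after k of m markers pass stage one.

    require_both=True : a pair is tested only if both loci passed.
    require_both=False: a pair is tested if at least one locus passed.
    """
    both_passed = math.comb(k, 2)
    if require_both:
        return both_passed
    # Add the pairs formed by one passing and one non-passing locus.
    return both_passed + k * (m - k)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level controlling family-wise error at alpha."""
    return alpha / n_tests


# Illustrative numbers only: 300,000 markers, 1,000 carried forward.
m_markers, k_selected = 300_000, 1_000
n_exhaustive = exhaustive_pair_tests(m_markers)
n_two_stage = two_stage_pair_tests(m_markers, k_selected)
```

Under these assumed numbers the exhaustive search fits roughly 4.5 × 10¹⁰ models against about 5 × 10⁵ for the stricter two-stage variant. This is the trade-off the abstract describes: far less computation, at the risk of discarding interacting loci with weak marginal effects before they are ever paired.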
PMCID: PMC1570380  PMID: 17002500
19.  Prospects and pitfalls in whole genome association studies 
Recent large-scale studies of common genetic variation throughout the human genome are making it feasible to conduct whole genome studies of genotype–phenotype associations. Such studies have the potential to uncover novel contributors to common complex traits and thus lead to insights into the aetiology of multifactorial phenotypes. Despite this promise, it is important to recognize that the availability of genetic markers and the ability to assay them at realistic cost does not guarantee success of this approach. There are a number of practical issues that require close attention, some forms of allelic architecture are not readily amenable to the association approach with even the most rigorous design, and doubtless new hurdles will emerge as the studies begin. Here we discuss the promise and current challenges of the whole genome approach, and raise some issues to consider in interpreting the results of the first whole genome studies.
PMCID: PMC1569530  PMID: 16096108
whole genome association; complex trait; genetic association
21.  Optimizing the Power of Genome-Wide Association Studies by Using Publicly Available Reference Samples to Expand the Control Group 
Genetic Epidemiology  2010;34(4):319-326.
Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousands of individuals. Large-scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of genetic and phenotypic information, aim to facilitate discovery of modest-effect genes by making genome-wide data publicly available, allowing information to be combined for the purpose of pooled analysis. In principle, disease or control samples from these studies could be used to increase the power of any GWA study via judicious use as “genetically matched controls” for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false-positive error rate in the presence of population structure. As a remedy, we make use of genome-wide data and model selection techniques to identify “axes” of genetic variation which are associated with disease. These axes are then included as covariates in association analysis to correct for population structure, which can result in increases in power over standard analysis of genetic information from the samples in the original GWA study. Genet. Epidemiol. 34: 319–326, 2010. © 2010 Wiley-Liss, Inc.
PMCID: PMC2962805  PMID: 20088020
genome-wide association study; expanded control group; population structure; multidimensional scaling; model selection
