Isolated populations are emerging as a powerful study design in the search for low-frequency and rare variant associations with complex phenotypes. Here we genotype 2,296 samples from two isolated Greek populations, the Pomak villages (HELIC-Pomak) in the North of Greece and the Mylopotamos villages (HELIC-MANOLIS) in Crete. We compare their genomic characteristics to the general Greek population and establish them as genetic isolates. In the MANOLIS cohort, we observe an enrichment of missense variants among the variants that have drifted up in frequency by more than fivefold. In the Pomak cohort, we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example, with mean corpuscular volume (rs7116019, P=2.3 × 10⁻²⁶). We replicate this association in a second set of Pomak samples (combined P=2.0 × 10⁻³⁶). We demonstrate significant power gains in detecting medical trait associations.
Isolated populations can increase power to detect low frequency and rare risk variants associated with complex phenotypes. Here, the authors identify variants associated with haematological traits in two isolated Greek populations that would be difficult to detect in the general population, due to their low frequency.
Common genetic variants have been identified for adult height, but not much is known about the genetics of skeletal growth in early life. To identify common genetic variants that influence fetal skeletal growth, we meta-analyzed 22 genome-wide association studies (Stage 1; N = 28 459). We identified seven independent top single nucleotide polymorphisms (SNPs) (P < 1 × 10⁻⁶) for birth length, of which three were novel and four were in or near loci known to be associated with adult height (LCORL, PTCH1, GPR126 and HMGA2). The three novel SNPs were followed up in nine replication studies (Stage 2; N = 11 995), with rs905938 in DC-STAMP domain containing 2 (DCST2) genome-wide significantly associated with birth length in a joint analysis (Stages 1 + 2; β = 0.046, SE = 0.008, P = 2.46 × 10⁻⁸, explained variance = 0.05%). The SNP rs905938 was also associated with infant length (N = 28 228; P = 5.54 × 10⁻⁴) and adult height (N = 127 513; P = 1.45 × 10⁻⁵). DCST2 is a DC-STAMP-like protein family member and DC-STAMP is an osteoclast cell-fusion regulator. Polygenic scores based on 180 SNPs previously associated with human adult stature explained 0.13% of variance in birth length. The same SNPs explained 2.95% of the variance of infant length. Of the 180 known adult height loci, 11 were genome-wide significantly associated with infant length (SF3B4, LCORL, SPAG17, C6orf173, PTCH1, GDF5, ZNFX1, HHIP, ACAN, HLA locus and HMGA2). This study highlights that common variation in DCST2 influences variation in early growth and adult height.
The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common and rare variants. Targeted genotyping arrays and next-generation sequencing technologies at the whole-genome (WGS) and whole-exome (WES) scales are increasingly employed to access sequence variation across the full minor allele frequency (MAF) spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and, in several cases, alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve, with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.
Osteoarthritis (OA) is the most prevalent form of arthritis and accounts for substantial morbidity and disability, particularly in the elderly. It is characterized by changes in joint structure, including degeneration of the articular cartilage, and its etiology is multifactorial with a strong postulated genetic component. We performed a meta-analysis of four genome-wide association (GWA) studies of 2,371 knee OA cases and 35,909 controls in Caucasian populations. Replication of the top hits was attempted with data from ten additional replication datasets. With a cumulative sample size of 6,709 cases and 44,439 controls, we identified one genome-wide significant locus on chromosome 7q22 for knee OA (rs4730250, p-value=9.2×10⁻⁹), thereby confirming its role as a susceptibility locus for OA. The associated signal is located within a large (500 kb) linkage disequilibrium (LD) block that contains six genes: PRKAR2B (protein kinase, cAMP-dependent, regulatory, type II, beta), HBP1 (HMG-box transcription factor 1), COG5 (component of oligomeric golgi complex 5), GPR22 (G protein-coupled receptor 22), DUS4L (dihydrouridine synthase 4-like), and BCAP29 (B-cell receptor-associated protein 29). Gene expression analyses of the six genes in primary cells derived from different joint tissues confirmed expression of all the genes in the joint environment.
The male-to-female sex ratio at birth is constant across world populations with an average of 1.06 (106 male to 100 female live births) for populations of European descent. The sex ratio is considered to be affected by numerous biological and environmental factors and to have a heritable component. The aim of this study was to investigate the presence of modest effects of common alleles at autosomal and chromosome X variants that could explain the observed sex ratio at birth. We conducted a large-scale genome-wide association scan (GWAS) meta-analysis across 51 studies, comprising overall 114 863 individuals (61 094 women and 53 769 men) of European ancestry and 2 623 828 common (minor allele frequency >0.05) single-nucleotide polymorphisms (SNPs). Allele frequencies were compared between men and women for directly-typed and imputed variants within each study. Forward-time simulations for unlinked, neutral, autosomal, common loci were performed under the demographic model for European populations with a fixed sex ratio and a random mating scheme to assess the probability of detecting significant allele frequency differences. We do not detect any genome-wide significant (P < 5 × 10⁻⁸) common SNP differences between men and women in this well-powered meta-analysis. The simulated data provided results entirely consistent with these findings. This large-scale investigation across ∼115 000 individuals shows no detectable contribution from common genetic variants to the observed skew in the sex ratio. The absence of sex-specific differences is useful in guiding genetic association study design, for example when using mixed controls for sex-biased traits.
Brachial circumference (BC), also known as upper arm or mid arm circumference, can be used as an indicator of muscle mass and fat tissue, which are distributed differently in men and women. Analysis of anthropometric measures of peripheral fat distribution such as BC could help in understanding the complex pathophysiology behind overweight and obesity. The purpose of this study is to identify genetic variants associated with BC through a large-scale genome-wide association scan (GWAS) meta-analysis. We used fixed-effects meta-analysis to synthesise summary results across 14 GWAS discovery and 4 replication cohorts comprising overall 22,376 individuals (12,031 women and 10,345 men) of European ancestry. Separate analyses were carried out for men, for women and for both sexes combined, using linear regression under an additive genetic model, adjusted either for age alone or for age and BMI. We prioritised signals for follow-up in two stages. We did not detect any signals reaching genome-wide significance. The FTO rs9939609 SNP showed nominal evidence for association (p<0.05) in the age-adjusted strata for men and across both sexes. In this first GWAS meta-analysis for BC to date, we have not identified any genome-wide significant signals and do not observe robust association of previously established obesity loci with BC. Large-scale collaborations will be necessary to achieve higher power to detect loci underlying BC.
Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Directly typed SNP quality control (QC) is thought to affect imputation quality. It is, therefore, common practice to use quality-controlled (QCed) data as an input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset consisting of 3177 samples with Illumina 610k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps balancing elimination of poorly imputed SNPs and information loss. We found that the difference between the unQCed data and the fully QCed data on imputation outcome was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low-frequency and rare variants.
genome-wide association study; imputation; quality control; single nucleotide polymorphism
Genome-wide association studies (GWAS) conducted using commercial single nucleotide polymorphism (SNP) arrays have proven to be a powerful tool for the detection of common disease susceptibility variants. However, their utility for the detection of lower frequency variants is yet to be practically investigated. Here we describe the application of a rare variant collapsing method to a large genome-wide SNP dataset, the Wellcome Trust Case Control Consortium rheumatoid arthritis (RA) GWAS. We partitioned the data into gene-centric bins and collapsed genotypes of low frequency variants (defined here as MAF ≤0.05) into a single count coupled with univariate analysis. We then prioritised gene regions for further investigation in an independent cohort of 3,355 cases and 2,427 controls based on rare variant signal p value and prior evidence to support involvement in RA. A total of 14,536 gene bins were investigated in the primary analysis and signals mapping to the TNFAIP3 and chr17q24 loci were selected for further investigation. We detected replicating association to low frequency variants in the TNFAIP3 gene (combined p = 6.6 × 10⁻⁶). Even though rare variants are not well-represented and can be difficult to genotype in GWAS, our study supports the application of low frequency variant collapsing methods to genome-wide SNP datasets as a means of exploiting data that are routinely ignored.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-010-0889-1) contains supplementary material, which is available to authorized users.
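The collapsing step described in the abstract above can be illustrated with a minimal Python sketch. It assumes genotypes coded as 0/1/2 minor-allele counts, sums the counts over variants with MAF ≤ 0.05 within one gene bin, and then applies a 1-df Pearson chi-square test on carrier status as a stand-in for the univariate analysis. The function names, the toy data and the choice of test are illustrative assumptions, not the authors' implementation.

```python
import math

def collapse_bin(genotypes, mafs, maf_threshold=0.05):
    """Sum minor-allele counts over the low-frequency variants of one gene bin.

    genotypes: one row per sample, one 0/1/2 minor-allele count per variant.
    mafs: minor allele frequency of each variant, in the same column order.
    """
    keep = [j for j, maf in enumerate(mafs) if maf <= maf_threshold]
    return [sum(row[j] for j in keep) for row in genotypes]

def carrier_test(collapsed, is_case):
    """1-df Pearson chi-square on carrier status (collapsed count > 0) by case status."""
    a = sum(1 for c, y in zip(collapsed, is_case) if c > 0 and y)            # carrier cases
    b = sum(1 for c, y in zip(collapsed, is_case) if c > 0 and not y)        # carrier controls
    c_ = sum(1 for c, y in zip(collapsed, is_case) if c == 0 and y)          # non-carrier cases
    d = sum(1 for c, y in zip(collapsed, is_case) if c == 0 and not y)       # non-carrier controls
    n = a + b + c_ + d
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c_) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical toy bin: three variants, the third is common and is ignored.
toy_genotypes = [[0, 1, 2], [1, 0, 1], [0, 0, 2], [0, 0, 0]]
toy_mafs = [0.02, 0.04, 0.30]
toy_case = [True, True, False, False]
toy_collapsed = collapse_bin(toy_genotypes, toy_mafs)
toy_chi2, toy_p = carrier_test(toy_collapsed, toy_case)
```

In a real analysis the test would be run once per gene bin, so the resulting p values would need to be interpreted against the number of bins tested.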
The identification of complex disease susceptibility loci has been accelerated considerably by advances in high-throughput genotyping technologies, improved insight into correlation patterns of common variants and the availability of large-scale sample sets. Linkage scans and small-scale candidate gene studies have now given way to genome-wide association scans. In this review, we summarize insights gained from the past, highlight practical issues relating to the design and analysis of current state-of-the-art GWA studies and look into future trends in the field of human complex trait genetics.
association study; complex disease; single nucleotide polymorphism; genome-wide association scan; meta-analysis; sequencing
Osteoarthritis (OA) has a complex aetiology with a strong genetic component. Genome-wide association studies implicate several nuclear genes in the aetiology, but a major component of the heritability has yet to be defined at the molecular level. Initial studies implicate maternally inherited variants of mitochondrial DNA (mtDNA) in subgroups of patients with OA based on gender and specific joint involvement, but these findings have not been replicated.
The authors studied 138 maternally inherited mtDNA variants genotyped in a two-cohort genetic association study across a total of 7393 OA cases from the arcOGEN consortium and 5122 controls genotyped in the Wellcome Trust Case Control Consortium 2 study.
Following data quality control we examined 48 mtDNA variants that were common in cohort 1 and cohort 2, and found no association with OA. None of the phenotypic subgroups previously associated with mtDNA haplogroups were associated in this study.
We were not able to replicate previously published findings in the largest mtDNA association study to date. The evidence linking OA to mtDNA is not compelling at present.
Gene Polymorphism; Osteoarthritis; Pharmacogenetics
Multiple genetic loci have been associated with body mass index (BMI) and obesity. The aim of this study was to investigate the effects of established adult BMI and childhood obesity loci in a Greek adolescent cohort. For this purpose, 34 variants were selected for investigation in 707 (55.9% females) adolescents of Greek origin aged 13.42 ± 0.88 years. Cumulative effects of variants were assessed by calculating a genetic risk score (GRS-34) for each subject. Variants at the FTO, TMEM18, FAIM2, RBJ, ZNF608 and QPCTL loci yielded nominal evidence for association with BMI and/or overweight risk (p < 0.05). Variants at TFAP2B and NEGR1 loci showed nominal association (p < 0.05) with BMI and/or overweight risk in males and females, respectively. Even though we did not detect any genome-wide significant associations, 27 out of 34 variants yielded directionally consistent effects with those reported by large-scale meta-analyses (binomial sign p = 0.0008). The GRS-34 was associated with both BMI (beta = 0.17 kg/m²/allele; p < 0.001) and overweight risk (OR = 1.09/allele; 95% CI: 1.04–1.16; p = 0.001). In conclusion, we replicate associations of established BMI and childhood obesity variants in a Greek adolescent cohort and confirm directionally consistent effects for most of them.
Obesity; BMI; common genetic variants; adolescents
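The binomial sign test quoted in the abstract above (27 of 34 variants directionally consistent) can be reproduced with a short Python sketch. It assumes a two-sided exact test against a null probability of 0.5, and it assumes an unweighted genetic risk score that simply counts risk alleles; the abstract does not state either choice, so both are labelled assumptions, and the function names are illustrative.

```python
from math import comb

def genetic_risk_score(risk_allele_counts):
    """Unweighted score: total number of risk alleles (0, 1 or 2 per variant)."""
    return sum(risk_allele_counts)

def sign_test_two_sided(n_consistent, n_total):
    """Exact two-sided binomial sign test against a null probability of 0.5."""
    k = max(n_consistent, n_total - n_consistent)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * upper_tail)

# 27 of 34 variants with directionally consistent effects, as in the abstract.
p_sign = sign_test_two_sided(27, 34)
print(f"sign test p = {p_sign:.4f}")
```

Under these assumptions the result is approximately 0.0008, in line with the value reported in the abstract.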
Obesity as measured by body mass index (BMI) is one of the major risk factors for osteoarthritis. In addition, genetic overlap has been reported between osteoarthritis and normal adult height variation. We investigated whether this relationship is due to a shared genetic aetiology on a genome-wide scale.
We compared genetic association summary statistics (effect size, p value) for BMI and height from the GIANT consortium genome-wide association study (GWAS) with genetic association summary statistics from the arcOGEN consortium osteoarthritis GWAS. Significance was evaluated by permutation. Replication of osteoarthritis association of the highlighted signals was investigated in an independent dataset. Phenotypic information of height and BMI was accounted for in a separate analysis using osteoarthritis-free controls.
We found significant overlap between osteoarthritis and height (p=3.3×10⁻⁵ for signals with p≤0.05) when the GIANT and arcOGEN GWAS were compared. For signals with p≤0.001 we found 17 shared signals between osteoarthritis and height and four between osteoarthritis and BMI. However, only one of the height or BMI signals that had shown evidence of association with osteoarthritis in the arcOGEN GWAS was also associated with osteoarthritis in the independent dataset: rs12149832, within the FTO gene (combined p=2.3×10⁻⁵). As expected, this signal was attenuated when we adjusted for BMI.
We found a significant excess of shared signals between both osteoarthritis and height and osteoarthritis and BMI, suggestive of a common genetic aetiology. However, only one signal showed association with osteoarthritis when followed up in a new dataset.
Osteoarthritis; Gene Polymorphism; Epidemiology
Osteoarthritis (OA), the most common form of arthritis, is a highly debilitating disease of the joints and can lead to severe pain and disability. There is no cure for OA. Current treatments often fail to alleviate its symptoms, leading to an increased demand for joint replacement surgery. Previous epidemiological and genetic research has established that OA is a multifactorial disease with both environmental and genetic components. Over the past 6 years, a candidate gene study and several genome-wide association scans (GWAS) in populations of Asian and European descent have collectively established 15 loci associated with knee or hip OA that have been replicated with genome-wide significance, shedding some light on the aetiogenesis of the disease. All OA-associated variants to date are common in frequency and appear to confer moderate to small effect sizes. Some of the associated variants are found within or near genes with clear roles in OA pathogenesis, whereas others point to unsuspected, less characterised pathways. These studies have also provided further evidence in support of the existence of ethnic, sex, and joint-specific effects in OA and have highlighted the importance of expanded and more homogeneous phenotype definitions in genetic studies of OA.
Isolated populations can empower the identification of rare variation associated with complex traits through next generation association studies, but the generalizability of such findings remains unknown. Here we genotype 1,267 individuals from a Greek population isolate on the Illumina HumanExome Beadchip, in search of functional coding variants associated with lipid traits. We find genome-wide significant evidence for association between R19X, a functional variant in APOC3, and increased high-density lipoprotein and decreased triglyceride levels. Approximately 3.8% of individuals are heterozygous for this cardioprotective variant, which was previously thought to be private to the Amish founder population. R19X is rare (<0.05% frequency) in outbred European populations. The increased frequency of R19X enables discovery of this lipid trait signal at genome-wide significance in a small sample size. This work exemplifies the value of isolated populations in successfully detecting transferable rare variant associations of high medical relevance.
Isolated populations may empower genetic association studies of complex traits. Here, the authors identify a rare cardioprotective APOC3 variant in a Greek population isolate and highlight the value of using population isolates to detect rare variants that confer disease risk.
Osteoarthritis (OA) is the most common form of arthritis with a clear genetic component. To identify novel loci associated with hip OA we performed a meta-analysis of genome-wide association studies (GWAS) on European subjects.
We performed a two-stage meta-analysis on more than 78 000 participants. In stage 1, we synthesised data from eight GWAS whereas data from 10 centres were used for ‘in silico’ or ‘de novo’ replication. Besides the main analysis, a sex-stratified analysis was performed to detect possible sex-specific signals. Meta-analysis was performed using inverse-variance fixed effects models. A random effects approach was also used.
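As an illustration of the inverse-variance fixed effects model named above, the following Python sketch pools per-study effect estimates (for a case-control trait these would be log odds ratios) using weights equal to the inverse of each squared standard error. The function name and the example inputs are hypothetical and do not come from the study.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects pooling of per-study estimates.

    betas: per-study effect estimates (e.g. log odds ratios).
    ses: corresponding standard errors.
    Returns the pooled estimate, its standard error, the z score and a two-sided p value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return pooled_beta, pooled_se, z, p

# Hypothetical log odds ratios and standard errors from three studies.
beta, se, z, p = fixed_effects_meta([0.20, 0.30, 0.25], [0.10, 0.08, 0.12])
pooled_or = math.exp(beta)  # back-transform to the odds ratio scale
```

A fixed effects model assumes one shared true effect across studies; when between-study heterogeneity is suspected, a random effects model adds a between-study variance term to each weight.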
We accumulated 11 277 cases of radiographic and symptomatic hip OA. We prioritised eight single nucleotide polymorphisms (SNPs) for follow-up in the discovery stage (4349 OA cases): five from the combined analysis, two male specific and one female specific. One locus, at 20q13, represented by rs6094710 (minor allele frequency (MAF) 4%) near the NCOA3 (nuclear receptor coactivator 3) gene, reached genome-wide significance level with p=7.9×10⁻⁹ and OR=1.28 (95% CI 1.18 to 1.39) in the combined analysis of discovery (p=5.6×10⁻⁸) and follow-up studies (p=7.3×10⁻⁴). We showed that this gene is expressed in articular cartilage and its expression was significantly reduced in OA-affected cartilage. Moreover, two loci remained suggestively associated: rs5009270 at 7q31 (MAF 30%, p=9.9×10⁻⁷, OR=1.10) and rs3757837 at 7p13 (MAF 6%, p=2.2×10⁻⁶, OR=1.27 in male specific analysis).
Novel genetic loci for hip OA were found in this meta-analysis of GWAS.
Epidemiology; Gene Polymorphism; Osteoarthritis
Variation in the fat mass and obesity-associated (FTO) gene influences susceptibility to obesity. A variant in the FTO gene has been implicated in genetic risk to osteoarthritis (OA). We examined the role of the FTO polymorphism rs8044769 in risk of knee and hip OA in cases and controls incorporating body mass index (BMI) information.
A total of 5409 knee OA patients, 4355 hip OA patients and up to 5362 healthy controls from seven independent cohorts from the UK and Australia were genotyped for rs8044769. The association of the FTO variant with OA was investigated in case/control analyses with and without BMI adjustment and in analyses matched for BMI category. A Mendelian randomisation approach was employed using the FTO variant as the instrumental variable to evaluate the role of overweight on OA.
In the meta-analysis of all overweight (BMI≥25) samples versus normal-weight controls, irrespective of OA status, the association of rs8044769 with overweight is highly significant (OR[CIs] for allele G=1.14 [1.08 to 1.19], p=7.5×10⁻⁷). A significant association with knee OA is present in the analysis without BMI adjustment (OR[CIs]=1.08[1.02 to 1.14], p=0.009) but the signal fully attenuates after BMI adjustment (OR[CIs]=0.99[0.93 to 1.05], p=0.666). We observe no evidence for association in the BMI-matched meta-analyses. Using Mendelian randomisation approaches, we confirm the causal role of overweight on OA.
Our data highlight the contribution of genetic risk to overweight in defining risk to OA but the association is exclusively mediated by the effect on BMI. This is consistent with what is known of the biology of the FTO gene and supports the causative role of high BMI in OA.
Osteoarthritis; Knee Osteoarthritis; Gene Polymorphism; Epidemiology