Personality can be thought of as a set of characteristics that influence people’s thoughts, feelings, and behaviour across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in ten discovery samples (17 375 adults) and five in-silico replication samples (3 294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data were available for ~2.4M single nucleotide polymorphisms (SNPs; directly typed and imputed using HapMap data). In the discovery samples, classical association analyses were performed under an additive model, followed by meta-analysis using the weighted inverse-variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P = 2.8 × 10⁻⁸ and 3.1 × 10⁻⁸) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P = 4.9 × 10⁻⁸). We further conducted a gene-based test that confirmed the association of KATNAL2 with Conscientiousness. In-silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger-scale GWA studies and alternative approaches are required to confirm KATNAL2 as a novel gene affecting Conscientiousness.
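The weighted inverse-variance meta-analysis mentioned above can be sketched as follows. This is a minimal fixed-effect illustration in Python. It assumes that each cohort supplies a per-SNP effect estimate (beta) and its standard error, both for the same effect allele. The function name `ivw_meta` and the example numbers are hypothetical, and the sketch is not the consortium's actual pipeline.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNP.

    betas: per-cohort effect estimates (same effect allele in every cohort)
    ses:   matching standard errors
    Returns the pooled beta, its standard error, the z-score and a
    two-sided normal p-value.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]        # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)                   # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided P(|Z| >= |z|)
    return beta, se, z, p

# Hypothetical per-cohort results for a single SNP
beta, se, z, p = ivw_meta([0.08, 0.11, 0.05], [0.03, 0.04, 0.05])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Cohorts with smaller standard errors, which are usually the larger samples, receive proportionally more weight in the pooled estimate.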
Personality; Five-Factor Model; Genome-wide association; Meta-analysis; Genetic variants
Schizophrenia is a severe mental disorder affecting ~1% of the world population, with heritability of up to 80%. To identify new common genetic risk factors, we performed a genome-wide association study (GWAS) in the Han Chinese population. The discovery sample set consisted of 3,750 patients and 6,468 healthy controls (1,578 cases and 1,592 controls from the Northern Han; 1,238 cases and 2,856 controls from the Central Han; 934 cases and 2,020 controls from the Southern Han); and we followed up the top association signals in an additional independent cohort of 4,383 cases and 4,539 controls from the Han Chinese. Meta-analysis identified genome-wide significant association of common SNPs with schizophrenia on chromosome 8p12 (rs16887244, P = 1.27 × 10⁻¹⁰) and 1q24.2 (rs10489202, P = 9.50 × 10⁻⁹). Our findings provide new insights into the pathogenesis of schizophrenia.
Migraine without aura is the most common form of migraine, characterized by recurrent disabling headache and associated autonomic symptoms. To identify common genetic variants for this migraine type, we analyzed genome-wide association data from 2,326 clinic-based German and Dutch patients and 4,580 population-matched controls. We selected SNPs from 12 loci with two or more SNPs with P < 1 × 10⁻⁵ for follow-up in 2,508 patients and 2,652 controls. Two loci, 1q22 (MEF2D) and 3p24 (near TGFBR2), replicated convincingly (P = 4.9 × 10⁻⁴ and P = 1.0 × 10⁻⁴, respectively). Meta-analysis of the discovery and replication data yielded two additional genome-wide significant (P < 5 × 10⁻⁸) loci in PHACTR1 and ASTN2. In addition, SNPs in two previously reported migraine loci, in or near TRPM8 and LRP1, replicated significantly. This study reveals the first susceptibility loci for migraine without aura, thereby expanding our knowledge of this debilitating neurological disorder.
The progressive myoclonus epilepsies (PMEs) comprise a group of clinically and genetically heterogeneous disorders characterized by myoclonus, epilepsy, and neurological deterioration. We aimed to identify the underlying gene(s) in childhood-onset PME patients with unknown molecular genetic background.
Homozygosity mapping was applied to genome-wide SNP data from 18 Turkish patients. The potassium channel tetramerization domain-containing 7 (KCTD7) gene, previously associated with PME in a single inbred family, was screened for mutations. The spatiotemporal expression of KCTD7 was assessed in cell cultures and mouse brain tissue.
Overlapping homozygosity in 8/18 patients defined a 1.5 Mb segment on 7q11.21 as the major candidate locus. Screening of the positional candidate gene KCTD7 revealed homozygous missense mutations in two of the eight cases. Screening of KCTD7 in a further 132 PME patients revealed four additional mutations (two missense, one in-frame deletion and one frameshift-causing) in five families. Eight patients presented with myoclonus and epilepsy and one with ataxia, with a mean age of onset of 19 months. Within two years of onset, progressive loss of mental and motor skills ensued, leading to severe dementia and motor handicap. KCTD7 showed cytosolic localization and predominantly neuronal expression, with widespread expression throughout the brain. None of the three patient missense mutations affected the subcellular distribution of KCTD7.
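Intersecting homozygous stretches across patients can be illustrated with a simple sweep over segment endpoints. The sketch below is hypothetical and is not the study's mapping software. It assumes that each patient's runs of homozygosity have already been called as (start, end) positions on one chromosome. It reports the intervals shared by at least `min_carriers` segments.

```python
def shared_homozygous_regions(segments, min_carriers):
    """Return intervals covered by at least `min_carriers` segments.

    segments: list of (start, end) runs of homozygosity on one chromosome,
              treated as closed intervals (hypothetical coordinates, e.g. Mb).
    Assumes each patient contributes non-overlapping segments, so the
    coverage depth equals the number of patients sharing a position.
    """
    events = []
    for start, end in segments:
        events.append((start, 1))   # a segment opens
        events.append((end, -1))    # a segment closes
    # At equal positions, process openings first so touching segments overlap
    events.sort(key=lambda e: (e[0], -e[1]))

    regions, depth, open_at = [], 0, None
    for pos, delta in events:
        before = depth
        depth += delta
        if before < min_carriers <= depth:      # coverage rises to the threshold
            open_at = pos
        elif depth < min_carriers <= before:    # coverage drops below the threshold
            regions.append((open_at, pos))
    return regions

# Hypothetical runs of homozygosity (Mb) from three patients
runs = [(60.0, 63.0), (61.2, 64.5), (61.5, 62.7)]
print(shared_homozygous_regions(runs, 3))  # [(61.5, 62.7)]
```

In practice, segment calling and the choice of carrier threshold matter more than the intersection step itself.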
Our data confirm the causality of KCTD7 defects in PME, and imply that KCTD7 mutation screening should be considered in PME patients with onset around 2 years of age followed by rapid mental and motor deterioration.
myoclonus; mutation; neurodegenerative disorders; mental retardation; pediatric epilepsy
Early pubertal onset in females is associated with increased risk for adult obesity and cardiovascular disease, but whether this relationship is independent of preceding childhood growth events is unclear. Furthermore, the association between male puberty and adult disease remains unknown. To clarify the link between puberty and adult health, we evaluated the relationship between pubertal timing and risk factors for type 2 diabetes and cardiovascular disease in both males and females from a large, prospective, and randomly ascertained birth cohort from Northern Finland.
RESEARCH DESIGN AND METHODS
Pubertal timing was estimated based on pubertal height growth in 5,058 subjects (2,417 males and 2,641 females), and the relationship between puberty and body weight, glucose and lipid homeostasis, and blood pressure at age 31 years was evaluated with linear regression modeling.
Earlier pubertal timing was associated with higher adult BMI, fasting insulin, and diastolic blood pressure and with lower HDL cholesterol in both sexes (P < 0.002), and with higher total serum cholesterol, LDL cholesterol, and triglycerides in males. After adjustment for covariates reflecting both fetal and childhood growth, including childhood BMI, the associations with BMI and diastolic blood pressure remained statistically significant in both sexes, as did the associations with insulin levels and HDL cholesterol concentrations in males.
We demonstrate an independent association between earlier pubertal timing and adult metabolic syndrome-related derangements in both males and females. The connection emphasizes that the mechanisms advancing puberty may also contribute to adult metabolic disorders.
During the past ten years the field of human disease genetics has made major leaps. These include the completion of the Human Genome Project and the HapMap Project, the development of genome-wide association (GWA) studies to identify common disease-predisposing variants, and the introduction of large-scale whole-genome and whole-exome sequencing studies. New technologies have enabled researchers to use novel study designs to tackle previously unexplored research questions in human genomics. These new types of studies typically need large sample sizes to overcome the multiple-testing burden created by the huge number of interrogated genetic variants. As a consequence, large consortium studies are now the default in disease genetics research. The systematic planning of GWA studies was a key element in the success of the approach, and similar planning and rigor in statistical inference will probably also benefit future sequencing studies. Next-generation exome sequencing has already led to the identification of several genes underlying Mendelian diseases. In spite of its clear benefits, however, the method has proven more challenging than anticipated. In the case of complex diseases, next-generation sequencing aims to identify disease-associated low-frequency alleles. Their robust detection will require very large study samples, even larger than those needed for GWA studies. This has stimulated study designs that capitalize on samples enriched for low-frequency alleles, for example studies focusing on population isolates such as Finland or Iceland. One example is the collaborative SISu Project (Sequencing Initiative Suomi). It aims to provide near-complete genome variation information from Finnish study samples and to pave the way for large, nationwide genome health initiative studies.
Nuclear magnetic resonance (NMR) assays allow for measurement of a wide range of metabolic phenotypes. We report here the results of a GWAS on 8,330 Finnish individuals, genotyped and imputed at 7.7 million SNPs, for 216 serum metabolic phenotypes assessed by NMR of serum samples. We identified significant associations (P < 2.31 × 10⁻¹⁰) at 31 loci, including 11 for which no association with a metabolic trait or disorder had previously been reported. Analyses of Finnish twin pairs suggested that the metabolic measures reported here show higher heritability than comparable conventional metabolic phenotypes. As expected, SNPs at the 31 loci associated with individual metabolites account for a greater proportion of the genetic component of trait variance (up to 40%) than is typically observed for conventional serum metabolic phenotypes. The identification of such associations may provide substantial insight into cardiometabolic disorders.
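The reported significance threshold is numerically consistent with a Bonferroni-style correction of the conventional genome-wide threshold of 5 × 10⁻⁸ across the 216 phenotypes tested. The abstract does not state how the threshold was derived, so the calculation below is our assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

GENOME_WIDE_ALPHA = 5e-8   # conventional single-trait genome-wide threshold
N_PHENOTYPES = 216         # serum metabolic phenotypes analysed

threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_PHENOTYPES)
print(f"{threshold:.3g}")  # 2.31e-10
```

Because many NMR measures are strongly correlated, a correction that treats all 216 phenotypes as independent tests is conservative.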
Preterm birth is the major cause of neonatal mortality and morbidity. In many cases, it has severe life-long consequences for the health and neurological development of the newborn child. More than 50% of all preterm births are spontaneous, and currently there is no effective prevention. Several studies suggest that genetic factors play a role in spontaneous preterm birth (SPTB). However, its genetic background is insufficiently characterized. The aim of the present study was to perform a linkage analysis of X chromosomal markers in SPTB in large northern Finnish families with recurrent SPTBs. We found a significant linkage signal (HLOD = 3.72) on chromosome locus Xq13.1 when the studied phenotype was being born preterm. There were no significant linkage signals when the studied phenotype was giving preterm deliveries. Two functional candidate genes, those encoding the androgen receptor (AR) and the interleukin-2 receptor gamma subunit (IL2RG), located near this locus were analyzed as candidates for SPTB in subsequent case-control association analyses. Nine single-nucleotide polymorphisms (SNPs) within these genes and an AR exon-1 CAG repeat, which was previously demonstrated to be functionally significant, were analyzed in mothers with preterm delivery (n = 272) and their offspring (n = 269), and in mothers with exclusively term deliveries (n = 201) and their offspring (n = 199), all originating from northern Finland. A replication study population consisting of individuals born preterm (n = 111) and term (n = 197) from southern Finland was also analyzed. Long AR CAG repeats (≥26) were overrepresented and short repeats (≤19) underrepresented in individuals born preterm compared to those born at term. Thus, our linkage and association results emphasize the role of the fetal genome in genetic predisposition to SPTB and implicate AR as a potential novel fetal susceptibility gene for SPTB.
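For reference, the heterogeneity LOD score (HLOD) is commonly defined under the admixture model, which assumes that only a proportion α of families is linked to the locus. The formula below is the standard textbook definition. It does not describe the exact analysis settings used in this study. Here L_i(θ) is the likelihood of the marker data in family i at recombination fraction θ, and F is the number of families.

```latex
\mathrm{HLOD} = \max_{\alpha \in [0,1],\ \theta \in [0,\,1/2]}
  \sum_{i=1}^{F} \log_{10}\!\left[\alpha\,\frac{L_i(\theta)}{L_i(1/2)} + (1-\alpha)\right]
```

With α = 1 this reduces to the ordinary LOD score summed over families.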
DNA methylation is one of the most studied epigenetic marks in the human genome, and the desire to map the human methylome has driven the development of several methods for measuring DNA methylation on a genomic scale. Our study presents the first comparison of two of these techniques: the targeted approach of the Infinium HumanMethylation450 BeadChip® and the immunoprecipitation- and sequencing-based method MeDIP-seq. Both methods were first validated against bisulfite sequencing as the gold standard and then assessed in terms of coverage, resolution and accuracy. We determined which regions of the methylome can be assayed by both methods and which by only one, and we examined the discovery of differentially methylated regions (DMRs) by both techniques. Our results show that the Infinium HumanMethylation450 BeadChip® and MeDIP-seq correlate well (Spearman correlation of 0.68) on a genome-wide scale. Both can be used successfully to determine differentially methylated loci in RefSeq genes, CpG islands, shores and shelves. MeDIP-seq, however, allows a wider interrogation of methylated regions of the human genome, including thousands of non-RefSeq genes and repetitive elements, all of which may be of importance in disease. In our study, MeDIP-seq detected 15,709 differentially methylated regions, nearly twice as many as the array-based method (8,070), which may allow a more comprehensive study of the methylome.
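Spearman's correlation, used above to compare the two platforms, is the Pearson correlation of the ranks. The minimal Python sketch below assumes paired per-region methylation values from the two methods, for example array beta values and MeDIP-seq-derived scores. The input values are hypothetical, and tied values receive their average rank.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation between paired measurements."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-region values: array beta values vs. MeDIP-seq scores
array_beta = [0.05, 0.40, 0.85, 0.60, 0.10]
medip_score = [1.2, 3.5, 9.8, 4.1, 0.9]
print(round(spearman(array_beta, medip_score), 3))
```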
Fibrin makes up the structural basis of an occlusive arterial thrombus and variability in fibrin phenotype relates to cardiovascular risk. The aims of the current study from the EU consortium EuroCLOT were to 1) determine the heritability of fibrin phenotypes and 2) identify QTLs associated with fibrin phenotypes.
447 dizygotic (DZ) and 460 monozygotic (MZ) pairs of healthy UK Caucasian female twins and 199 DZ twin pairs from Denmark were studied. D-dimer, an indicator of fibrin turnover, was measured by ELISA and measures of clot formation, morphology and lysis were determined by turbidimetric assays. Heritability estimates and genome-wide linkage analysis were performed.
Heritability estimates for D-dimer and the turbidimetric variables ranged from 17% to 46%. The highest estimate was for maximal absorbance, which provides an estimate of clot density. Genome-wide linkage analysis revealed 6 significant regions (LOD > 3) on 5 chromosomes (5, 6, 9, 16 and 17).
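As a rough illustration of how twin data yield heritability, Falconer's classical approximation compares monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a swapped-in, back-of-the-envelope method. The abstract does not specify the heritability model used, and variance-component model fitting is more typical. The correlations in the example are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer's approximation of the additive genetic (A), shared
    environment (C) and unique environment (E) variance proportions
    from MZ and DZ twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # MZ pairs share all additive variance, DZ pairs about half
    c2 = 2.0 * r_dz - r_mz     # shared (family) environment
    e2 = 1.0 - r_mz            # unique environment plus measurement error
    return a2, c2, e2

# Hypothetical twin correlations for one clot-turbidity measure
a2, c2, e2 = falconer_ace(r_mz=0.55, r_dz=0.33)
print(f"h2={a2:.2f} c2={c2:.2f} e2={e2:.2f}")
```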
The results indicate a significant genetic contribution to variability in fibrin phenotypes and highlight regions in the human genome which warrant further investigation in relation to ischaemic cardiovascular disorders and their therapy.
linkage; quantitative trait loci; twin; cardiovascular disease; thrombosis
Biobanks can have a pivotal role in elucidating disease etiology, translation, and
advancing public health. However, meeting these challenges hinges on a critical shift in
the way science is conducted and requires biobank harmonization. There is growing
recognition that a common strategy is imperative to develop biobanking globally and
effectively. To help guide this strategy, we articulate key principles, goals, and
priorities underpinning a roadmap for global biobanking to accelerate health science,
patient care, and public health. The need to manage and share very large amounts of data
has driven innovations on many fronts. Although technological solutions are allowing
biobanks to reach new levels of integration, increasingly powerful data-collection tools,
analytical techniques, and the results they generate raise new ethical and legal issues
and challenges, necessitating a reconsideration of previous policies, practices, and
ethical norms. These manifold advances and the investments that support them are also
fueling opportunities for biobanks to ultimately become integral parts of health-care
systems in many countries. Increasing interoperability through international harmonization and
ensuring sustainability are two strategic priorities for biobanking. Tackling these issues requires
an environment favorably inclined toward scientific funding and equipped to address
socio-ethical challenges. Cooperation and collaboration must extend beyond systems to
enable the exchange of data and samples to strategic alliances between many organizations,
including governmental bodies, funding agencies, public and private science enterprises,
and other stakeholders, including patients. A common vision is required and we articulate
the essential basis of such a vision herein.
Genome-wide association (GWA) studies have identified several susceptibility loci for metabolic syndrome (MetS) component traits, but have had variable success in identifying susceptibility loci to the syndrome as an entity. We conducted a GWA study on MetS and its component traits in four Finnish cohorts consisting of 2637 MetS cases and 7927 controls, both free of diabetes, and followed the top loci in an independent sample with transcriptome and NMR-based metabonomics data. Furthermore, we tested for loci associated with multiple MetS component traits using factor analysis and built a genetic risk score for MetS.
Methods and Results
A previously known lipid locus, the APOA1/C3/A4/A5 gene cluster region (SNP rs964184), was associated with MetS in all four study samples (P = 7.23 × 10⁻⁹ in meta-analysis). The association was further supported by serum metabolite analysis, in which rs964184 was associated with various VLDL, TG, and HDL metabolites (P = 0.024 to 1.88 × 10⁻⁵). Twenty-two previously identified susceptibility loci for individual MetS component traits were replicated in our GWA and factor analyses. Most of these were associated with lipid phenotypes, and none with two or more uncorrelated MetS components. A genetic risk score, calculated as the number of risk alleles at loci associated with individual MetS traits, was strongly associated with MetS status.
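The unweighted genetic risk score described above simply counts risk alleles. The minimal sketch below assumes that genotypes are coded as the number of risk alleles (0, 1 or 2) at each selected locus. The function name and example genotypes are hypothetical.

```python
def allele_count_score(risk_allele_counts):
    """Unweighted genetic risk score: total number of risk alleles carried.

    risk_allele_counts: one entry per locus, each 0, 1 or 2.
    """
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count: {count!r}")
    return sum(risk_allele_counts)

# Hypothetical individuals genotyped at five MetS-trait loci
individuals = {"id1": [2, 1, 0, 1, 2], "id2": [0, 0, 1, 1, 0]}
scores = {pid: allele_count_score(g) for pid, g in individuals.items()}
print(scores)  # {'id1': 6, 'id2': 2}
```

In a case-control setting, the score would then be tested against MetS status, for example with logistic regression. That step is not shown here.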
Our findings suggest that genes from lipid metabolism pathways play a key role in the genetic background of MetS. We found little evidence for pleiotropy linking dyslipidemia and obesity to the other MetS component traits, such as hypertension and glucose intolerance.
metabolic syndrome; risk factors; genome-wide association study; meta-analysis; lipids
Asthma has substantial morbidity and mortality and a strong genetic component, but identification of genetic risk factors is limited by availability of suitable studies.
To test if population-based cohorts with self-reported physician-diagnosed asthma and genome-wide association (GWA) data could be used to validate known associations with asthma and identify novel associations.
The APCAT (Analysis in Population-based Cohorts of Asthma Traits) consortium consists of 1,716 individuals with asthma and 16,888 healthy controls from six European-descent population-based cohorts. We examined associations in APCAT of thirteen variants previously reported as genome-wide significant (P < 5 × 10⁻⁸) and three variants reported as suggestive (P < 5 × 10⁻⁷). We also searched for novel associations in APCAT (Stage 1) and followed up the most promising variants in 4,035 asthmatics and 11,251 healthy controls (Stage 2). Finally, we conducted the first genome-wide screen for interactions with smoking or hay fever.
We observed association in the same direction for all thirteen previously reported variants and nominally replicated ten of them. One variant that was previously suggestive, rs11071559 in RORA, now reaches genome-wide significance when combined with our data (P = 2.4 × 10⁻⁹). We also identified two genome-wide significant associations: rs13408661 near IL1RL1/IL18R1 (P(Stage 1 + Stage 2) = 1.1 × 10⁻⁹), which is correlated with a variant recently shown to be associated with asthma (rs3771180), and rs9268516 in the HLA region (P(Stage 1 + Stage 2) = 1.1 × 10⁻⁸), which appears to be independent of previously reported associations in this locus. Finally, we found no strong evidence for gene-environment interactions with smoking or hay fever status.
Population-based cohorts with simple asthma phenotypes represent a valuable and largely untapped resource for genetic studies of asthma.
Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis.
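One common way to test a single SNP against several correlated phenotypes jointly is to regress the genotype on all phenotypes at once and apply the F-test of the overall regression. With a single SNP this is equivalent to canonical correlation analysis, because the squared canonical correlation equals the regression R². The abstract does not say which multivariate test was used. The pure-Python sketch below, with hypothetical names and data, only illustrates the idea.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def multiphenotype_f_test(genotypes, phenotypes):
    """Regress SNP dosage on k phenotypes jointly; return (R^2, F, df1, df2).

    genotypes:  list of n allele dosages (0/1/2)
    phenotypes: list of k lists, each holding n phenotype values
    """
    n, k = len(genotypes), len(phenotypes)
    if n <= k + 1:
        raise ValueError("need more individuals than phenotypes + 1")
    rows = [[1.0] + [p[i] for p in phenotypes] for i in range(n)]   # intercept + phenotypes
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)] for a in range(k + 1)]
    xty = [sum(r[a] * g for r, g in zip(rows, genotypes)) for a in range(k + 1)]
    coef = _solve(xtx, xty)                                          # normal equations
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_g = sum(genotypes) / n
    ss_tot = sum((g - mean_g) ** 2 for g in genotypes)
    ss_res = sum((g - f) ** 2 for g, f in zip(genotypes, fitted))
    r2 = 1.0 - ss_res / ss_tot
    df1, df2 = k, n - k - 1
    f_stat = (r2 / df1) / ((1.0 - r2) / df2)
    return r2, f_stat, df1, df2

# Hypothetical dosages and two correlated metabolite levels for eight people
dosage = [0, 1, 2, 1, 0, 2, 1, 2]
met_a = [0.1, 0.9, 2.2, 1.3, -0.2, 1.7, 0.8, 2.5]
met_b = [1.0, 0.4, 0.3, 0.9, 1.1, 0.2, 0.7, 0.1]
print(multiphenotype_f_test(dosage, [met_a, met_b]))
```

The F statistic can then be converted to a p-value with an F(df1, df2) distribution, for example with `scipy.stats.f.sf`.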
In this study, we aim to identify novel genetic variants for metabolism, characterize their effects on nearby genes, and show that the nearby genes are associated with metabolism and atherosclerosis. To discover new genetic variants, we use an alternative approach to traditional genome-wide association studies: we leverage the information in phenotype covariance to increase our statistical power. We identify variants at seven novel loci and then show that our top signals drive expression of nearby genes AQP9 and SERPINA1 in multiple tissues. We demonstrate that AQP9 and SERPINA1 gene expression, in turn, is associated with metabolite levels. Finally, we show that the genes are associated with atherosclerosis using mouse atherosclerotic lesion size (AQP9) as well as tissue from healthy human arteries and atherosclerotic plaques (AQP9 and SERPINA1). This study illustrates that multivariate analysis of correlated metabolites can boost power for gene discovery substantially. Further functional work will need to be performed to elucidate the biological role of SERPINA1 and AQP9 in atherosclerosis.
Phenotype mining is a novel approach for elucidating the genetic basis of complex phenotypic variation. It involves a search of rich phenotype databases for measures correlated with genetic variation, as identified in genome-wide genotyping or sequencing studies. An initial implementation of phenotype mining in a prospective unselected population cohort, the Northern Finland 1966 Birth Cohort (NFBC1966), identifies neurodevelopment-related traits—intellectual deficits, poor school performance and hearing abnormalities—which are more frequent among individuals with large (>500 kb) deletions than among other cohort members. Observation of extensive shared single nucleotide polymorphism haplotypes around deletions suggests an opportunity to expand phenotype mining from cohort samples to the populations from which they derive.
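Phenotype mining, as described above, compares how often a trait occurs among carriers of a genetic feature (here, large deletions) and among other cohort members. One way to test such a 2×2 comparison is a one-sided Fisher's exact test based on the hypergeometric distribution, sketched below. The abstract does not name its statistical test, and the counts are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                    trait   no trait
        carriers      a        b
        others        c        d

    Returns P(X >= a), where X is the number of trait-positive carriers
    under the hypergeometric null of no association.
    """
    row1 = a + b            # carriers
    col1 = a + c            # trait-positive individuals
    n = a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    hits = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return hits / denom

# Hypothetical counts: 6 of 40 deletion carriers vs. 30 of 990 others have the trait
print(f"{fisher_one_sided(6, 34, 30, 960):.3g}")
```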
In a population-based genome-wide analysis including 5,122 migraineurs and 18,108 non-migraineurs, rs2651899 (PRDM16), rs10166942 (TRPM8), and rs11172113 (LRP1) were among the top associations with migraine (p < 5 × 10⁻⁶). All three SNPs were significant in meta-analysis of the replication cohorts and reached genome-wide significance (p < 4.3 × 10⁻⁹) in meta-analysis combining the discovery and replication cohorts. Both rs2651899 and rs10166942 were associated with migraine when compared with non-migraine headache; none of the three SNPs was specifically associated with migraine subtypes or features.
Processing speed is an important cognitive function that is compromised in psychiatric illness (e.g., schizophrenia, depression) and in old age; it shares genetic background with complex cognition (e.g., working memory, reasoning). To find genes influencing speed, we performed a genome-wide association scan in up to four cohorts: Brisbane (mean age 16 years, N = 1659), LBC1936 (mean age 70 years, N = 992), LBC1921 (mean age 82 years, N = 307), and HBCS (mean age 64 years, N = 1080). Meta-analysis of the common measures highlighted several suggestively significant SNPs (p < 1.21 × 10⁻⁵) and plausible candidate genes (e.g., TRIB3). A biological pathway analysis of the speed factor identified two pathways from the KEGG database (cell junction, focal adhesion) that were common to two cohorts. A pathway analysis based on the GO database revealed common pathways across pairs of speed measures (e.g., receptor binding, cellular metabolic process). These genes and pathways may inform future research, including research on psychiatric disease.
Information processing speed; Cognitive ability; Genes; Biological pathways
Plausible genome-wide associations for episodic neurological diseases (such as migraine, epilepsy and ataxias) have been slow to emerge. The first such association was reported in a recent genome-wide association study of migraine, with quantitative expression analysis linking the variant to a nearby regulatory gene, MTDH/AEG-1. This putative mechanism, regulating the expression of the primary glutamate transporter in the brain, EAAT2/GLT-1, has interesting implications bridging the gap between Mendelian and common forms in this key group of disorders.
Although genome-wide association studies (GWAS) have identified hundreds of complex trait loci, the pathomechanisms of most remain elusive. Studying the genetics of risk factors predisposing to disease is an attractive approach to identify targets for functional studies. Intracranial aneurysms (IA) are rupture-prone pouches at cerebral artery branching sites. IA is a complex disease for which GWAS have identified five loci with strong association and a further 14 loci with suggestive association. To decipher potential underlying disease mechanisms, we tested whether there are IA loci that convey their effect through elevating blood pressure (BP), a strong risk factor of IA. We performed a meta-analysis of four population-based Finnish cohorts (n(FIN) = 11 266) not selected for IA, to assess the association of previously identified IA candidate loci (n = 19) with BP. We defined systolic BP (SBP), diastolic BP, mean arterial pressure, and pulse pressure as quantitative outcome variables. The most significant result was further tested for association in the ICBP-GWAS cohort of 200 000 individuals. We found that the suggestive IA locus at 5q23.2 in PRDM6 was significantly associated with SBP in individuals of European descent (p(FIN) = 3.01 × 10⁻⁵, p(ICBP-GWAS) = 0.0007, p(ALL) = 8.13 × 10⁻⁷). The risk allele of IA was associated with higher SBP. PRDM6 encodes a protein predominantly expressed in vascular smooth muscle cells. Our study connects a complex disease (IA) locus with a common risk factor for the disease (SBP). We hypothesize that common variants in PRDM6 can contribute to altered vascular wall structure, hence increasing SBP and predisposing to IA. True positive associations often fail to reach genome-wide significance in GWAS. Our findings show that analysis of traditional risk factors as intermediate phenotypes is an effective tool for deciphering hidden heritability.
Further, we demonstrate that common disease loci identified in a population isolate may bear wider significance.
When multiple genes or genetic regions contribute to the inherited risk of a disease, it is referred to as a complex disease. Genome-wide association studies (GWAS) aim to detect common genetic variants that are associated with complex traits or diseases. Although GWAS have been successful in identifying strongly associated genetic loci, they lack the means to pinpoint true but weaker associations. Studying conditions that are related to the disease of interest can help identify such weaker associations. Intracranial aneurysms (IA) are berry-like dilations in cerebral arteries. Most IAs cause no symptoms until they bleed, resulting in a highly fatal form of stroke. Half of the people who suffer a bleeding IA die. IA is a complex disease: both inherited and environmental factors contribute to the risk of developing IA. Women, smokers, and people with high alcohol intake or high blood pressure are more prone to developing IA and to IA bleeding. GWAS have found 19 genetic regions that increase the risk of IA. Here we show that one of these loci, on the long arm of chromosome 5, raises systolic blood pressure in addition to IA risk. We speculate that the cause is a modified vascular wall structure.
Common genetic variants have been shown to explain a fraction of the inherited variation for many common diseases and quantitative traits, including height, a classic polygenic trait. The extent to which common variation determines the phenotype of highly heritable traits such as height is uncertain, as is the extent to which common variation is relevant to individuals with more extreme phenotypes. To address these questions, we studied 1,214 individuals from the top and bottom extremes of the height distribution (tallest and shortest ∼1.5%), drawn from ∼78,000 individuals from the HUNT and FINRISK cohorts. We found that common variants still influence height at the extremes of the distribution. Of 141 common variants, 49 were nominally associated with height in the expected direction, more often than expected by chance (p < 5 × 10⁻²⁸), and the odds ratios in the extreme samples were consistent with the effects estimated previously in population-based data. To examine more closely whether the common variants have the expected effects, we calculated a weighted allele score (WAS). The WAS is a weighted prediction of height for each individual, based on the effect sizes of the common variants previously estimated in the overall population. The average WAS was consistent with expectation in the tall individuals but was not as extreme as expected in the shortest individuals (p < 0.006), indicating that some of the short stature is explained by factors other than common genetic variation. The discrepancy was more pronounced (p < 10⁻⁶) in the most extreme individuals (height < 0.25 percentile). The results at the extreme short tail are consistent with a large number of models incorporating rare genetic, non-additive genetic, or rare non-genetic factors that decrease height. We conclude that common genetic variants are associated with height at the extremes as well as across the population, but that additional factors become more prominent at the short extreme.
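Two calculations from the paragraph above can be sketched in Python. The first is the weighted allele score: the sum of an individual's allele dosages, each multiplied by its previously published per-allele effect size. The second is a binomial tail probability showing how unlikely it is for 49 of 141 variants to be nominally associated in the expected direction by chance. This assumes that, under the null, each variant has a 5% chance of doing so (a one-sided test at p < 0.05). Function names, dosages and effect sizes are hypothetical, and the study's exact test may differ.

```python
from math import comb

def weighted_allele_score(dosages, effect_sizes):
    """Predicted height deviation: sum over variants of the dosage
    (0-2 copies of the height-increasing allele) times its previously
    estimated per-allele effect."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per variant is required")
    return sum(d * b for d, b in zip(dosages, effect_sizes))

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical individual at four variants (effects in arbitrary height units)
print(weighted_allele_score([2, 1, 0, 1], [0.4, 0.3, 0.2, 0.5]))

# 49 of 141 variants in the expected direction at p < 0.05,
# assuming a per-variant null probability of 0.05
print(f"{binomial_upper_tail(49, 141, 0.05):.1e}")
```

Under these assumptions the tail probability is on the order of 10⁻²⁸, consistent with the reported bound.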
Although many loci in the human genome have been found to be significantly associated with height, it is unclear whether these loci have similar effects in extremely tall and short individuals. Here, we examine hundreds of extremely tall and short individuals in two population-based cohorts to see whether these known height-determining loci are as predictive as expected in these individuals. We found that these loci are generally as predictive of height as expected but become less predictive in the most extremely short individuals. We show that this result is consistent with models that include not only the common variants but also multiple low-frequency genetic variants that substantially decrease height. However, this result is also consistent with non-additive genetic effects or rare non-genetic factors that substantially decrease height. This finding suggests that low-frequency variants may play a major role, particularly in individuals with extreme phenotypes. It also has implications for whole-genome or whole-exome sequencing efforts to discover rare genetic variation associated with complex traits.
A cost-efficient way to increase power in a genetic association study is to pool controls from different sources. The genotyping effort can then be directed to large case series. The Nordic Control database, NordicDB, has been set up as a unique resource in the Nordic area and the data are available for authorized users through the web portal (http://www.nordicdb.org). The current version of NordicDB pools together high-density genome-wide SNP information from ∼5000 controls originating from Finnish, Swedish and Danish studies and shows country-specific allele frequencies for SNP markers. The genetic homogeneity of the samples was investigated using multidimensional scaling (MDS) analysis and pairwise allele frequency differences between the studies. The plot of the first two MDS components showed excellent resemblance to the geographical placement of the samples, with a clear NW–SE gradient. We advise researchers to assess the impact of population structure when incorporating NordicDB controls in association studies. This harmonized Nordic database presents a unique genome-wide resource for future genetic association studies in the Nordic countries.
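The pairwise allele-frequency comparison mentioned above can be illustrated as follows. The sketch estimates each cohort's allele frequency per SNP from 0/1/2 genotype codes, then averages the absolute frequency differences between every pair of cohorts. It is a hypothetical example with made-up cohort names and genotypes, not the NordicDB implementation, and it does not include the MDS step.

```python
from itertools import combinations

def allele_freq(genotypes):
    """Frequency of the counted allele from 0/1/2 genotype codes (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))

def mean_abs_freq_diff(cohorts):
    """cohorts: dict name -> list of per-SNP genotype lists (same SNP order in all).

    Returns {(cohort_a, cohort_b): mean absolute allele-frequency difference}.
    """
    freqs = {name: [allele_freq(snp) for snp in snps] for name, snps in cohorts.items()}
    result = {}
    for a, b in combinations(sorted(freqs), 2):
        diffs = [abs(x - y) for x, y in zip(freqs[a], freqs[b])]
        result[(a, b)] = sum(diffs) / len(diffs)
    return result

# Hypothetical genotypes at three SNPs for three small cohorts
cohorts = {
    "FIN": [[0, 1, 1, 0], [2, 2, 1, None], [0, 0, 0, 1]],
    "SWE": [[1, 1, 0, 1], [2, 1, 1, 1], [0, 1, 0, 1]],
    "DEN": [[1, 2, 1, 0], [1, 1, 2, 1], [1, 0, 1, 1]],
}
for pair, diff in mean_abs_freq_diff(cohorts).items():
    print(pair, round(diff, 3))
```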
common controls; genome-wide data; Nordic Control Database; population stratification
Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size.
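An idealized two-stage binomial model shows why pooled data limit allele-frequency accuracy. First, the pool's true frequency varies around the population frequency p, because only 2N chromosomes are sampled. Second, the read-based estimate varies again, because only D reads are drawn from the pool. By the law of total variance, the estimator's variance is p(1-p)[1/(2N) + (1 - 1/(2N))/D]. This simplified model is our assumption. It ignores unequal DNA contributions, capture bias and sequencing error, all of which add further noise.

```python
import math

def pooled_af_variance(p, n_individuals, depth):
    """Variance of the read-count allele-frequency estimate from a pool of
    diploid individuals, under an idealized two-stage binomial model:
    chromosomes sampled from the population, then reads sampled from the pool."""
    chromosomes = 2 * n_individuals
    sampling_term = 1 / chromosomes                     # which chromosomes are in the pool
    read_term = (1 - 1 / chromosomes) / depth           # which reads are drawn from the pool
    return p * (1 - p) * (sampling_term + read_term)

# Hypothetical: allele frequency 5%, 200x depth, pool sizes from the comparison above
for n in (1, 2, 10, 20, 50):
    se = math.sqrt(pooled_af_variance(0.05, n, depth=200))
    print(f"pool of {n:>2}: SE of estimated frequency ~ {se:.4f}")
```

Individual genotyping of the same N people has variance p(1-p)/(2N). The read-sampling term therefore adds noise that shrinks only with depth, so unless D is much larger than 2N, the pooled estimate is noticeably less precise.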
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.
The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes for nearly 700 samples. Although medical whole-exome projects are currently under way, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. In line with the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data. In the seven population samples we examined, we discovered 12,758 exonic SNPs; 70% of them were novel and 74% were below 1% allele frequency. Our analysis confirms that coding variants below 1% allele frequency show increased population specificity and are enriched for functional variants.
This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.