The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant QT interval loci that collectively explain ∼8-10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
Both polygenicity (i.e., many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.
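The relationship between test statistics and LD described above can be illustrated with a small sketch. It assumes the linear form in which the expected chi-square statistic of a SNP grows with its LD Score, so that the slope reflects polygenic signal and an intercept above 1 reflects confounding. Plain unweighted least squares is used here as a simplification, and the function names, sample size, SNP count, and toy data are all hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ld_score_regression_sketch(chi2, ld_scores, n_samples, n_snps):
    """Regress chi-square statistics on LD Scores.

    Assumes E[chi2_j] = (N * h2 / M) * ld_j + intercept, so the slope is
    rescaled by M / N to give a heritability estimate. This is an
    unweighted simplification, not the published estimator.
    """
    intercept, slope = ols_fit(ld_scores, chi2)
    h2 = slope * n_snps / n_samples
    return intercept, h2


# Hypothetical toy data: five SNPs with increasing LD Score.
toy_ld = [10.0, 50.0, 100.0, 200.0, 400.0]
toy_chi2 = [1.05 + 0.003 * l for l in toy_ld]
toy_intercept, toy_h2 = ld_score_regression_sketch(
    toy_chi2, toy_ld, n_samples=10_000, n_snps=1_000_000
)
print(toy_intercept, toy_h2)
```

An intercept near 1 with a clearly positive slope would point to polygenicity rather than bias as the main source of inflation.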
A transition from fetal hemoglobin (HbF) to adult hemoglobin (HbA) normally occurs within a few months after birth. Increased production of HbF after this period of infancy ameliorates clinical symptoms of the major disorders of adult β-hemoglobin: β-thalassemia and sickle cell disease. The transcription factor BCL11A silences HbF and has been an attractive therapeutic target for increasing HbF levels; however, it is not clear to what extent BCL11A inhibits HbF production or mediates other developmental functions in humans. Here, we identified and characterized 3 patients with rare microdeletions of 2p15-p16.1 who presented with an autism spectrum disorder and developmental delay. Moreover, these patients all exhibited substantial persistence of HbF but otherwise retained apparently normal hematologic and immunologic function. Of the genes within 2p15-p16.1, only BCL11A was commonly deleted in all of the patients. Evaluation of gene expression data sets from developing and adult human brains revealed that BCL11A expression patterns are similar to other genes associated with neurodevelopmental disorders. Additionally, common SNPs within the second intron of BCL11A are strongly associated with schizophrenia. Together, the study of these rare patients and orthogonal genetic data demonstrates that BCL11A plays a central role in silencing HbF in humans and implicates BCL11A as an important factor for neurodevelopment.
Development; Genetics; Hematology; Neuroscience
Association mapping and candidate gene studies within IBD linkage regions, as well as genome-wide association studies in CD have led to the discovery of multiple risk genes, but these only explain a fraction of the genetic susceptibility observed in IBD. We have thus been pursuing a region on chromosome 3p21-22 showing linkage to CD and UC using a gene-centric association mapping approach. We identified twelve functional candidate genes by searching for literature co-citations with relevant keywords and for gene expression patterns consistent with immune/intestinal function. We then performed an association study composed of a screening phase, where tagging SNPs were evaluated in 1020 IBD patients, and an independent replication phase in 745 IBD patients. These analyses identified and replicated significant association to IBD for four SNPs within a 1.2 Mb LD region. We then identified a non-synonymous coding variant (rs3197999, R689C) in the macrophage stimulating 1 (MST1) gene (p-value 3.62×10−6) that accounts for the association signal, and shows association to both CD and UC. MST1 encodes MSP, a protein regulating the innate immune responses to bacterial ligands. R689C is predicted to interfere with MSP binding to its receptor, suggesting a role for this gene in the pathogenesis of IBD.
Background and Aims
More than 80% of Crohn’s disease (CD) patients will require surgery. Surgery is not curative and rates of re-operation are high. Identification of genetic variants associated with repeat surgery would allow risk stratification of patients who may benefit from early aggressive therapy and/or post-operative prophylactic treatment.
CD patients who had undergone at least one CD-related bowel resection were identified from the Prospective Registry in IBD Study at Massachusetts General Hospital (PRISM). The primary outcome was surgical recurrence. Covariates and potential interactions were assessed using the Cox proportional hazard model. Kaplan-Meier curves for time to surgical recurrence were developed for each genetic variant and analyzed with the log-rank test.
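As an illustration of the time-to-recurrence curves mentioned above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the analysis code used in the study; the function name and the toy follow-up data are hypothetical, and the log-rank comparison and Cox model are not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the recurrence-free probability.

    times  : follow-up time for each patient
    events : 1 if repeat surgery was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients with an observed event at time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        # Everyone leaving the risk set at time t (events and censored).
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical toy cohort: years to repeat surgery or last follow-up.
toy_times = [1, 2, 3]
toy_events = [1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve)
```

One such curve would be computed per genotype group before the groups are compared.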
194 patients were identified who had at least 1 resection. Of these, 69 had two or more resections. Clinical predictors for repeat surgery were stricturing (HR 4.18, p=0.022) and penetrating behavior (HR 3.97, p=0.024). Smoking cessation was protective for repeat surgery (HR 0.45, p=0.018). SMAD3 homozygosity for the risk allele was also independently associated with increased risk of repeat surgery (HR 4.04, p=0.001). NOD2 was not associated with increased risk of surgical recurrence.
Stricturing and penetrating behavior were associated with increased risk of surgical recurrence, while smoking cessation was associated with a decreased risk of surgical recurrence. A novel association between SMAD3 and increased risk of repeat operation and shorter time to repeat surgery was observed. This finding is of particular interest as SMAD3 may represent a new therapeutic target specifically for prevention of post-surgical disease recurrence.
Crohn’s disease; Surgery for IBD; Genotype/Clinical Phenotype; NOD2; SMAD3
Genome-wide association studies of the related chronic inflammatory bowel diseases (IBD) known as Crohn’s disease and ulcerative colitis have shown strong evidence of association to the major histocompatibility complex (MHC). This region encodes a large number of immunological candidates, including the antigen-presenting classical HLA molecules1. Studies in IBD have indicated that multiple independent associations exist at HLA and non-HLA genes, but lacked the statistical power to define the architecture of association and causal alleles2,3. To address this, we performed high-density SNP typing of the MHC in >32,000 patients with IBD, implicating multiple HLA alleles, with a primary role for HLA-DRB1*01:03 in both Crohn’s disease and ulcerative colitis. Significant differences were observed between these diseases, including a predominant role of class II HLA variants and heterozygous advantage observed in ulcerative colitis, suggesting an important role of the adaptive immune response to the colonic environment in the pathogenesis of IBD.
Alopecia areata (AA) is a prevalent autoimmune disease with ten known susceptibility loci. Here we perform the first meta-analysis in AA by combining data from two genome-wide association studies (GWAS), and replication with supplemented ImmunoChip data for a total of 3,253 cases and 7,543 controls. The strongest region of association is the MHC, where we fine-map 4 independent effects, all implicating HLA-DR as a key etiologic driver. Outside the MHC, we identify two novel loci that exceed statistical significance, containing ACOXL/BCL2L11(BIM) (2q13); GARP (LRRC32) (11q13.5), as well as a third nominally significant region SH2B3(LNK)/ATXN2 (12q24.12). Candidate susceptibility gene expression analysis in these regions demonstrates expression in relevant immune cells and the hair follicle. We integrate our results with data from seven other autoimmune diseases and provide insight into the alignment of AA within these disorders. Our findings uncover new molecular pathways disrupted in AA, including autophagy/apoptosis, TGFß/Tregs and JAK kinase signaling, and support the causal role of aberrant immune processes in AA.
Progressive myoclonus epilepsies (PMEs) are a group of rare, inherited disorders manifesting with action myoclonus, tonic-clonic seizures, and ataxia. We exome-sequenced 84 unrelated PME patients of unknown cause and molecularly solved 26 cases (31%). Remarkably, a recurrent de novo mutation c.959G>A (p.Arg320His) in KCNC1 was identified as a novel major cause for PME. Eleven unrelated exome-sequenced patients (13%) and two patients in a secondary cohort (7%) had this mutation. KCNC1 encodes KV3.1, a subunit of the KV3 voltage-gated K+ channels, major determinants of high-frequency neuronal firing. Functional analysis of the p.Arg320His mutant channel revealed a dominant-negative loss-of-function effect. Ten patients had pathogenic mutations in known PME-associated genes (NEU1, NHLRC1, AFG3L2, EPM2A, CLN6, SERPINI1). Identification of mutations in PRNP, SACS, and TBC1D24 expands their phenotypic spectrum to PME. These findings provide important insights into the molecular genetic basis of PME and reveal the role of de novo mutations in this disease entity.
The polymorphism ATG16L1 T300A, associated with increased risk of Crohn’s disease, impairs pathogen defense mechanisms including selective autophagy, but specific pathway interactions altered by the risk allele remain unknown. Here, we use perturbational profiling of human peripheral blood cells to reveal that CLEC12A is regulated in an ATG16L1-T300A-dependent manner. Antibacterial autophagy is impaired in CLEC12A-deficient cells, and this effect is exacerbated in the presence of the ATG16L1∗300A risk allele. Clec12a−/− mice are more susceptible to Salmonella infection, supporting a role for CLEC12A in antibacterial defense pathways in vivo. CLEC12A is recruited to sites of bacterial entry, bacteria-autophagosome complexes, and sites of sterile membrane damage. Integrated genomics identified a functional interaction between CLEC12A and an E3-ubiquitin ligase complex that functions in antibacterial autophagy. These data identify CLEC12A as an early adaptor molecule for antibacterial autophagy and highlight perturbational profiling as a method to elucidate defense pathways in complex genetic disease.
•Integrated genomics reveals risk-allele-specific autophagy pathway interactions
•CLEC12A is important for antibacterial autophagy in epithelial and immune cells
•CLEC12A knockdown amplifies antibacterial autophagy defects in ATG16L1∗300A cells
•Clec12a−/− mice are more susceptible to Salmonella infection in vivo
Although genome-wide association studies are valuable in identifying disease-associated loci, they produce only a partial view of pathogenesis. Using integrated, systems-level approaches to pinpoint genes that interact with the Crohn’s-disease-associated variant ATG16L1 T300A, Begun et al. identify CLEC12A as an innate defense gene that functions in antibacterial autophagy.
The genetic architecture of autism spectrum disorder involves the interplay of common and rare variation and their impact on hundreds of genes. Using exome sequencing, analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, and a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic, transcriptional, and chromatin remodeling pathways. These include voltage-gated ion channels regulating propagation of action potentials, pacemaking, and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodelers, prominently histone post-translational modifications involving lysine methylation/demethylation.
The inflammatory bowel diseases Crohn's disease and ulcerative colitis are common, chronic disorders that cause abdominal pain, diarrhea, and gastrointestinal bleeding. To identify genetic factors that might contribute to these disorders, we performed a genome-wide association study. We found a highly significant association between Crohn's disease and the IL23R gene on chromosome 1p31, which encodes a subunit of the receptor for the proinflammatory cytokine interleukin-23. An uncommon coding variant (rs11209026, c.1142G>A, p.Arg381Gln) confers strong protection against Crohn's disease, and additional noncoding IL23R variants are independently associated. Replication studies confirmed IL23R associations in independent cohorts of patients with Crohn's disease or ulcerative colitis. These results and previous studies on the proinflammatory role of IL-23 prioritize this signaling pathway as a therapeutic target in inflammatory bowel disease.
Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias.
Methods and Results
We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease.
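The polygenic score mentioned above can be sketched as a weighted sum of allele dosages. This is a minimal illustration only: the variant identifiers, effect weights, and the choice to treat a missing genotype as zero dosage are all hypothetical assumptions, not details taken from the study.

```python
def polygenic_score(dosages, weights):
    """Sum of (effect weight x allele dosage) over the scored variants.

    dosages : dict mapping variant ID to the number of effect alleles (0-2)
    weights : dict mapping variant ID to a per-allele effect on the lipid trait
    Variants missing from `dosages` contribute 0 (an assumption of this sketch).
    """
    return sum(weights[v] * dosages.get(v, 0.0) for v in weights)


# Hypothetical weights and one individual's genotypes.
toy_weights = {"variant_A": 0.10, "variant_B": -0.05}
toy_dosages = {"variant_A": 2, "variant_B": 1}
toy_score = polygenic_score(toy_dosages, toy_weights)
print(toy_score)
```

An individual with an extreme score relative to a reference population would be a candidate for complex rather than monogenic inheritance.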
We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies.
genetics; human; DNA sequencing; Exome sequencing; lipids; Mendelian Genetics
To assess a potential diagnostic and therapeutic biomarker for age-related macular degeneration (AMD), we sequenced the complement factor I gene (CFI) in 2266 individuals with AMD and 1400 without, identifying 231 individuals with rare genetic variants. We evaluated the functional impact by measuring circulating serum factor I (FI) protein levels in individuals with and without rare CFI variants. The burden of very rare (frequency <1/1000) variants in CFI was strongly associated with disease (P = 1.1 × 10−8). In addition, we examined eight coding variants with counts ≥5 and saw evidence for association with AMD in three variants. Individuals with advanced AMD carrying a rare CFI variant had lower mean FI compared with non-AMD subjects carrying a variant (P < 0.001). Further new evidence that FI levels drive AMD risk comes from analyses showing individuals with a CFI rare variant and low FI were more likely to have advanced AMD (P = 5.6 × 10−5). Controlling for covariates, low FI increased the risk of advanced AMD among those with a variant compared with individuals without advanced AMD with a rare CFI variant (OR 13.6, P = 1.6 × 10−4), and also compared with control individuals without a rare CFI variant (OR 19.0, P = 1.1 × 10−5). Thus, low FI levels are strongly associated with rare CFI variants and AMD. Enhancing FI activity may be therapeutic and measuring FI provides a screening tool for identifying patients who are most likely to benefit from complement inhibitory therapy.
Clozapine is a particularly effective antipsychotic medication but its use is curtailed by the risk of clozapine-induced agranulocytosis/granulocytopenia (CIAG), a severe adverse drug reaction occurring in up to 1% of treated individuals. Identifying genetic risk factors for CIAG could enable safer and more widespread use of clozapine. Here we perform the largest and most comprehensive genetic study of CIAG to date by interrogating 163 cases using genome-wide genotyping and whole-exome sequencing. We find that two loci in the major histocompatibility complex are independently associated with CIAG: a single amino acid in HLA-DQB1 (126Q) (P=4.7×10−14, odds ratio, OR=0.19, 95% CI 0.12–0.29) and an amino acid change in the extracellular binding pocket of HLA-B (158T) (P=6.4×10−10, OR=3.3, 95% CI 2.3–4.9). These associations dovetail with the roles of these genes in immunogenetic phenotypes and adverse drug responses for other medications, and provide insight into the pathophysiology of CIAG.
Spontaneously arising (‘de novo’) mutations play an important role in medical genetics. For diseases with extensive locus heterogeneity – such as autism spectrum disorders (ASDs) – the signal from de novo mutations (DNMs) is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. We provide a statistical framework for the analysis of DNM excesses per gene and gene set by calibrating a model of de novo mutation. We applied this framework to DNMs collected from 1,078 ASD trios and – while affirming a significant role for loss-of-function (LoF) mutations – found no excess of de novo LoF mutations in cases with IQ above 100, suggesting that the role of DNMs in ASD may reside in fundamental neurodevelopmental processes. We also used our model to identify ~1,000 genes that are significantly lacking functional coding variation in non-ASD samples and are enriched for de novo LoF mutations identified in ASD cases.
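One simple way to test for a per-gene excess of de novo mutations is to compare the observed count with a Poisson expectation derived from a per-gene mutation rate. The sketch below assumes that form; the factor of 2 (two transmitted chromosomes per child), the function names, and the example mutation rate are assumptions for illustration and are not taken from the text above.

```python
import math


def expected_dnm_count(n_trios, per_gene_mutation_rate):
    """Expected de novo mutations in one gene across all trios.

    Assumes the rate is per chromosome per generation, hence the factor 2.
    """
    return 2 * n_trios * per_gene_mutation_rate


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    cumulative = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for k in range(observed):
        cumulative += term
        term *= expected / (k + 1)  # advance to P(X = k + 1)
    return max(0.0, 1.0 - cumulative)


# Hypothetical gene with a loss-of-function mutation rate of 1e-5,
# evaluated in a cohort the size of the one described (1,078 trios).
toy_expected = expected_dnm_count(1078, 1e-5)
toy_p_value = poisson_upper_tail(2, toy_expected)
print(toy_expected, toy_p_value)
```

The same calculation applies to a gene set by summing the per-gene expectations before computing the tail probability.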
Nemaline myopathy (NM) is a genetic muscle disorder characterized by muscle dysfunction and electron-dense protein accumulations (nemaline bodies) in myofibers. Pathogenic mutations have been described in 9 genes to date, but the genetic basis remains unknown in many cases. Here, using an approach that combined whole-exome sequencing (WES) and Sanger sequencing, we identified homozygous or compound heterozygous variants in LMOD3 in 21 patients from 14 families with severe, usually lethal, NM. LMOD3 encodes leiomodin-3 (LMOD3), a 65-kDa protein expressed in skeletal and cardiac muscle. LMOD3 was expressed from early stages of muscle differentiation; localized to actin thin filaments, with enrichment near the pointed ends; and had strong actin filament-nucleating activity. Loss of LMOD3 in patient muscle resulted in shortening and disorganization of thin filaments. Knockdown of lmod3 in zebrafish replicated NM-associated functional and pathological phenotypes. Together, these findings indicate that mutations in the gene encoding LMOD3 underlie congenital myopathy and demonstrate that LMOD3 is essential for the organization of sarcomeric thin filaments in skeletal muscle.
Genome-wide association studies (GWAS) have identified thousands of loci associated with complex traits, but it is challenging to pinpoint causal genes in these loci and to exploit subtle association signals. We used tissue-specific quantitative interaction proteomics to map a network of five genes involved in the Mendelian disorder long QT syndrome (LQTS). We integrated the LQTS network with GWAS loci from the corresponding common complex trait, QT interval variation, to identify candidate genes that were subsequently confirmed in Xenopus laevis oocytes and zebrafish. We used the LQTS protein network to filter weak GWAS signals by identifying single nucleotide polymorphisms (SNPs) in proximity to genes in the network supported by strong proteomic evidence. Three SNPs passing this filter reached genome-wide significance after replication genotyping. Overall, we present a general strategy to propose candidates in GWAS loci for functional studies and to systematically filter subtle association signals using tissue-specific quantitative interaction proteomics.
Contactins and Contactin-Associated Proteins, and Contactin-Associated Protein-Like 2 (CNTNAP2) in particular, have been widely cited as autism risk genes based on findings from homozygosity mapping, molecular cytogenetics, copy number variation analyses, and both common and rare single nucleotide association studies. However, data specifically with regard to the contribution of heterozygous single nucleotide variants (SNVs) have been inconsistent. In an effort to clarify the role of rare point mutations in CNTNAP2 and related gene families, we have conducted targeted next-generation sequencing and evaluated existing sequence data in cohorts totaling 2704 cases and 2747 controls. We find no evidence for statistically significant association of rare heterozygous mutations in any of the CNTN or CNTNAP genes, including CNTNAP2, placing marked limits on the scale of their plausible contribution to risk.
Prior genetic studies of autism spectrum disorders (ASD) have demonstrated a role for Contactin-Associated Protein-Like 2 protein (CNTNAP2), as well as for other genes that code for Contactin proteins and Contactin-Associated Proteins. While there is strong evidence that the loss of two copies of the gene CNTNAP2 causes autism and epilepsy, the impact of mutations in only one copy of this gene, or in only one copy of related genes, is less clear. We performed large-scale DNA sequencing on a cohort of over 1000 autism patients and nearly 1000 unaffected controls and did not find significant association at any of 6 genes in the Contactin family and 4 genes in the Contactin-Associated Protein family when looking for rare mutations that are predicted to be disruptive to the protein’s function and are present in only one copy of the respective gene. We then combined the data on CNTNAP2 from our laboratory with CNTNAP2 data from another research laboratory, and found no significant association of deleterious heterozygous mutations at this gene. Given the paucity of nonsense mutations identified across the combined sample, an assessment of their impact was circumscribed. However, missense heterozygous mutations in CNTNAP2 and in other Contactins or Contactin-Associated Proteins are not elevated in affected individuals versus controls and, consequently, do not have a marked impact, as a group, on the risk for autism spectrum disorders.
Congenital diarrheal disorders (CDDs) are a collection of rare, heterogeneous enteropathies with early onset and often severe outcomes. Here, we report a family of Ashkenazi Jewish descent, with 2 out of 3 children affected by CDD. Both affected children presented 3 days after birth with severe, intractable diarrhea. One child died from complications at age 17 months. The second child showed marked improvement, with resolution of most symptoms at 10 to 12 months of age. Using exome sequencing, we identified a rare splice site mutation in the DGAT1 gene and found that both affected children were homozygous carriers. Molecular analysis of the mutant allele indicated a total loss of function, with no detectable DGAT1 protein or activity produced. The precise cause of diarrhea is unknown, but we speculate that it relates to abnormal fat absorption and buildup of DGAT substrates in the intestinal mucosa. Our results identify DGAT1 loss-of-function mutations as a rare cause of CDDs. These findings prompt concern for DGAT1 inhibition in humans, which is being assessed for treating metabolic and other diseases.
Autophagy is an evolutionarily conserved catabolic process that directs cytoplasmic proteins, organelles and microbes to lysosomes for degradation. Autophagy acts at the intersection of pathways involved in cellular stress, host defense, and modulation of inflammatory and immune responses; however, the details of how the autophagy network intersects with these processes remain largely undefined. Given the role of autophagy in several human diseases, it is important to determine the extent to which modulators of autophagy also modify inflammatory or immune pathways, and whether it is possible to modulate a subset of these pathways selectively. Here, we identify small-molecule inducers of basal autophagy (including several FDA-approved drugs) and characterize their effects on IL-1β production, autophagic engulfment and killing of intracellular bacteria, and development of Treg, TH17, and TH1 subsets from naïve T cells. Autophagy inducers with distinct, selective activity profiles were identified that reveal the functional architecture of connections between autophagy, and innate and adaptive immunity. In macrophages from mice bearing a conditional deletion of the essential autophagy gene Atg16L1, the small molecules inhibit IL-1β production to varying degrees suggesting that individual compounds may possess both autophagy-dependent and autophagy-independent activity on immune pathways. The small molecule autophagy inducers constitute useful probes to test the contributions of autophagy-related pathways in diseases marked by impaired autophagy or elevated IL-1β, and to test novel therapeutic hypotheses.
Two common sources of DNA for whole exome sequencing (WES) are whole blood (WB) and immortalized lymphoblastoid cell lines (LCLs). However, it is possible that LCLs have a substantially higher rate of mutation than WB, causing concern for their use in sequencing studies. We compared results from paired WB and LCL DNA samples for 16 subjects, using LCLs of low passage number (<5). Using a standard analysis pipeline we detected a large number of discordant genotype calls (approximately 50 per subject) that we segregated into categories of “confidence” based on read-level quality metrics. From these categories and validation by Sanger sequencing, we estimate that the vast majority of the candidate differences were false positives and that our categories were effective in predicting valid sequence differences, including LCLs with putative mosaicism for the non-reference allele (3–4 per exome). These results validate the use of DNA from LCLs of low passage number for exome sequencing.
graphical diagnostics; lymphoblastoid cell line; mosaicism; sequence variant call; strand bias; somatic mutation
Background & Aims
Genome-wide association studies (GWASs) have identified 140 Crohn’s disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through detailed sequencing, genetic association, expression, and functional studies.
We sequenced whole exomes of 42 unrelated subjects with Crohn’s disease (CD) and 5 healthy individuals (controls), and then filtered single-nucleotide variants by incorporating association results from meta-analyses of CD GWASs and in silico mutation effect prediction algorithms. We then genotyped 9348 patients with CD, 2868 with ulcerative colitis, and 14,567 controls, and analyzed associated variants in functional studies using materials from patients and controls and in vitro model systems.
We identified rare missense mutations in PR domain-containing 1 (PRDM1) and associated these with CD. These increased proliferation of T cells and secretion of cytokines upon activation, and increased expression of the adhesion molecule L-selectin. A common CD risk allele, identified in GWASs, correlated with reduced expression of PRDM1 in ileal biopsies and peripheral blood mononuclear cells (combined P=1.6×10−8). We identified an association between CD and a common missense variant, Val248Ala, in nuclear domain 10 protein 52 (NDP52) (P=4.83×10−9). We found that this variant impairs the regulatory functions of NDP52 to inhibit NFκB activation of genes that regulate inflammation and affect stability of proteins in toll-like receptor pathways.
We have extended GWAS results and provide evidence that variants in PRDM1 and NDP52 determine susceptibility to CD. PRDM1 maps adjacent to a CD interval identified in GWASs and encodes a transcription factor expressed by T and B cells. NDP52 is an adaptor protein that functions in selective autophagy of intracellular bacteria and signaling molecules, supporting the role for autophagy in pathogenesis of CD.
inflammatory bowel disease; whole-exome sequencing; complex disease
Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5–5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p<5×10−8) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5×10−117). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3×10−4), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.
We compared the coding regions of 3,000 Finnish individuals with those of 3,000 non-Finnish Europeans (NFEs) using whole-exome sequence data, in order to understand how an individual from a bottlenecked population might differ from an individual from an out-bred population. We provide empirical evidence that there are more rare and low-frequency deleterious alleles in Finns compared to NFEs, such that an average Finn has almost twice as many low-frequency complete knockouts of a gene. As such, we hypothesized that some of these low-frequency loss-of-function variants might have important medical consequences in humans and genotyped 83 of these variants in 36,000 Finns. In doing so, we discovered that completely knocking out the TSFM gene might result in inviability or a very severe phenotype in humans and that knocking out the LPA gene might confer protection against coronary heart diseases, suggesting that LPA is likely to be a good potential therapeutic target.