Hundreds of copy number variants are complex and multi-allelic, in that they have many structural alleles and have rearranged multiple times in the ancestors who contributed chromosomes to current humans. Not only are the relationships of these multi-allelic CNVs (mCNVs) to phenotypes generally unknown, but many mCNVs have not yet been described at the basic levels (alleles, allele frequencies, structural features) that support genetic investigation. To date, most reported disease associations with these variants have been ascertained through candidate gene studies. However, only a few associations have reached the level of acceptance defined by durable replications in many cohorts. This likely stems from longstanding challenges in precisely measuring which alleles individuals carry at these loci. Nevertheless, approaches for mCNV analysis are improving quickly, and some of the unique characteristics of mCNVs may assist future association studies. Their various structural alleles are likely to have different magnitudes of effect, creating a natural allelic series of growing phenotypic impact and giving investigators a set of predictions and testable hypotheses about the extent to which each allele of an mCNV predisposes to a phenotype. Also, mCNVs’ low-to-modest correlation with individual single-nucleotide polymorphisms (SNPs) may make it easier to distinguish between mCNVs and nearby SNPs as the drivers of an association signal, and perhaps make it possible to preliminarily screen candidate loci, or the entire genome, for the many mCNV–disease relationships that remain to be discovered.
multi-allelic copy number variation; association; mCNV; CNV genotyping; optical mapping; ddPCR
Cancers arise from multiple acquired mutations, which presumably occur over many years. Early stages in cancer development might be present years before cancers become clinically apparent.
We analyzed data from whole-exome sequencing of DNA in peripheral-blood cells from 12,380 persons, unselected for cancer or hematologic phenotypes. We identified somatic mutations on the basis of unusual allelic fractions. We used data from Swedish national patient registers to follow health outcomes for 2 to 7 years after DNA sampling.
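The idea of identifying somatic mutations by unusual allelic fractions can be illustrated with a minimal sketch. It assumes that at a germline heterozygous site the number of alternate-allele reads follows a Binomial(depth, 0.5) distribution, so a read count far below half the depth points to an allele carried by only a subset of blood cells. The function names and the significance threshold `alpha` are hypothetical; this is not the study's actual calling pipeline.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def is_candidate_somatic(alt_reads: int, depth: int, alpha: float = 1e-3) -> bool:
    """Flag a variant whose allelic fraction is implausibly low for a germline heterozygote.

    Hypothetical rule: under a germline heterozygous genotype, alternate reads are
    Binomial(depth, 0.5); a small lower-tail probability suggests a subclonal allele.
    """
    if depth == 0 or alt_reads == 0:
        return False  # no reads, or no evidence of a variant at all
    return binom_cdf(alt_reads, depth, 0.5) < alpha
```

For example, 10 alternate reads out of 100 is flagged, whereas 45 out of 100 is consistent with an ordinary germline heterozygote.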
Clonal hematopoiesis with somatic mutations was observed in 10% of persons older than 65 years of age but in only 1% of those younger than 50 years of age. Detectable clonal expansions most frequently involved somatic mutations in three genes (DNMT3A, ASXL1, and TET2) that have previously been implicated in hematologic cancers. Clonal hematopoiesis was a strong risk factor for subsequent hematologic cancer (hazard ratio, 12.9; 95% confidence interval, 5.8 to 28.7). Approximately 42% of hematologic cancers in this cohort arose in persons who had clonality at the time of DNA sampling, more than 6 months before a first diagnosis of cancer. Analysis of bone marrow–biopsy specimens obtained from two patients at the time of diagnosis of acute myeloid leukemia revealed that their cancers arose from the earlier clones.
Clonal hematopoiesis with somatic mutations is readily detected by means of DNA sequencing, is increasingly common as people age, and is associated with increased risks of hematologic cancer and death. A subset of the genes that are mutated in patients with myeloid cancers is frequently mutated in apparently healthy persons; these mutations may represent characteristic early events in the development of hematologic cancers. (Funded by the National Human Genome Research Institute and others.)
We sought to obtain novel insights into schizophrenia pathogenesis by exploiting the association between the disorder and copy number variant (CNV) burden. We combined data from 5,745 cases and 10,675 controls with other published datasets containing genome-wide CNV data. In this much-enlarged sample of 11,355 cases and 16,416 controls, we show for the first time that case CNVs are enriched for genes involved in GABAergic neurotransmission. Consistent with non-genetic reports of GABAergic deficits in schizophrenia, our findings now show that disrupted GABAergic signaling is of direct causal relevance, rather than a secondary effect or a result of confounding. Additionally, we independently replicate and greatly extend previous findings of CNV enrichment among genes involved in glutamatergic signaling. Given the strong functional links between the major inhibitory GABAergic and excitatory glutamatergic systems, our findings converge on a broad, coherent set of pathogenic processes, providing firm foundations for studies aimed at dissecting disease mechanisms.
• First genetic evidence for disruption of GABAergic signaling in schizophrenia
• No evidence for CNV disruption of biological processes beyond the CNS
• Support for involvement of NMDAR and ARC complexes in schizophrenia
• Additional, independent evidence for disruption of glutamatergic signaling
Pocklington et al. show for the first time that CNVs from individuals with schizophrenia are enriched for genes involved in GABAergic neurotransmission. Previous findings of CNV enrichment among genes involved in glutamatergic signaling are independently replicated and greatly extended.
Obesity is heritable and predisposes to many diseases. To understand the genetic basis of obesity better, here we conduct a genome-wide association study and Metabochip meta-analysis of body mass index (BMI), a measure commonly used to define obesity and assess adiposity, in up to 339,224 individuals. This analysis identifies 97 BMI-associated loci (P < 5 × 10−8), 56 of which are novel. Five loci demonstrate clear evidence of several independent association signals, and many loci have significant effects on other metabolic phenotypes. The 97 loci account for ~2.7% of BMI variation, and genome-wide estimates suggest that common variation accounts for >20% of BMI variation. Pathway analyses provide strong support for a role of the central nervous system in obesity susceptibility and implicate new genes and pathways, including those related to synaptic function, glutamate signalling, insulin secretion/action, energy metabolism, lipid biology and adipogenesis.
Determining the chromosomal phase of pairs of sequence variants – the arrangement of specific alleles as haplotypes – is a routine challenge in molecular genetics. Here we describe Drop-Phase, a molecular method for quickly ascertaining the phase of pairs of DNA sequence variants (separated by 1–200 kb) without cloning or manual single-molecule dilution. In each Drop-Phase reaction, genomic DNA segments are isolated in tens of thousands of nanoliter-sized droplets together with allele-specific fluorescence probes, in a single reaction well. Physically linked alleles partition into the same droplets, revealing their chromosomal phase in the co-distribution of fluorophores across droplets. We demonstrated the accuracy of this method by phasing members of trios (revealing 100% concordance with inheritance information) and demonstrated a common clinical application by phasing CFTR alleles at genomic distances of 11–116 kb in the genomes of cystic fibrosis patients. Drop-Phase is rapid (requiring less than 4 hours), scalable (to hundreds of samples), and effective at long genomic distances (200 kb).
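To make the co-distribution logic concrete, here is a minimal sketch that scores the two possible phase configurations from per-droplet fluorescence calls. It assumes each droplet has already been reduced to presence/absence flags for four hypothetical allele-specific probes (A/a at the first site, B/b at the second). The log odds ratio with a 0.5 pseudocount is an illustrative statistic chosen here, not necessarily the one Drop-Phase uses.

```python
import math
from typing import Iterable, Tuple

# Probe-positive flags per droplet: (A, a) at site 1, (B, b) at site 2 (hypothetical layout).
Droplet = Tuple[bool, bool, bool, bool]


def infer_phase(droplets: Iterable[Droplet]) -> Tuple[str, float]:
    """Return the favored haplotype pair and the cis-vs-trans log odds ratio."""
    n_AB = n_Ab = n_aB = n_ab = 0
    for A, a, B, b in droplets:
        # Count every allele pairing seen together in the same droplet.
        n_AB += int(A and B)
        n_Ab += int(A and b)
        n_aB += int(a and B)
        n_ab += int(a and b)
    # A 0.5 pseudocount keeps the ratio finite when a category is empty.
    log_or = math.log((n_AB + 0.5) * (n_ab + 0.5)) - math.log((n_Ab + 0.5) * (n_aB + 0.5))
    return ("AB/ab" if log_or > 0 else "Ab/aB"), log_or
```

Because physically linked alleles tend to land in the same droplet, an excess of A+B and a+b droplets (positive log odds) favors the AB/ab phase, while an excess of A+b and a+B droplets favors Ab/aB.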
Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here, we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely novel insights into aetiology, but associations at DRD2 and multiple genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that play important roles in immunity, providing support for the hypothesized link between the immune system and schizophrenia.
Schizophrenia is a highly heritable neuropsychiatric disorder of complex genetic etiology. Previous genome-wide surveys have revealed a greater burden of large, rare CNVs in schizophrenia cases and identified multiple rare recurrent CNVs that increase risk of schizophrenia, although with incomplete penetrance and pleiotropic effects. Identification of additional recurrent CNVs and biological pathways enriched for schizophrenia CNVs requires greater sample sizes. We conducted a genome-wide survey for CNVs associated with schizophrenia using a Swedish national sample (4,719 cases and 5,917 controls). High-confidence CNV calls were generated using genotyping array intensity data, and their effect on risk of schizophrenia was measured. Our data confirm increased burden of large, rare CNVs in schizophrenia cases as well as significant associations for recurrent 16p11.2 duplications, 22q11.2 deletions and 3q29 deletions. We report a novel association for 17q12 duplications (odds ratio = 4.16, P = 0.018), previously associated with autism and mental retardation but not schizophrenia. Intriguingly, gene set association analyses implicate biological pathways previously associated with schizophrenia through common variation and exome sequencing (calcium channel signaling and binding partners of the fragile X mental retardation protein). We found significantly increased burden of the largest CNVs (>500 kb) in genes present in the post-synaptic density, in genomic regions implicated via schizophrenia genome-wide association studies, and in gene products localized to mitochondria and cytoplasm. Our findings suggest that multiple lines of genomic inquiry – genome-wide screens for CNVs, common variation, and exonic variation – are converging on similar sets of pathways and/or genes.
schizophrenia; genetics; genomics; copy number variation; structural variation
Inherited alleles account for most of the genetic risk for schizophrenia. However, new (de novo) mutations, in the form of large chromosomal copy number changes, occur in a small fraction of cases and disproportionally disrupt genes encoding postsynaptic proteins. Here, we show that small de novo mutations, affecting one or a few nucleotides, are overrepresented among glutamatergic postsynaptic proteins comprising activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-D-aspartate receptor (NMDAR) complexes. Mutations are additionally enriched in proteins that interact with these complexes to modulate synaptic strength, namely proteins regulating actin filament dynamics and those whose mRNAs are targets of fragile X mental retardation protein (FMRP). Genes affected by mutations in schizophrenia overlap those mutated in autism and intellectual disability, as do mutation-enriched synaptic pathways. Aligning our findings with a parallel case-control study, we demonstrate reproducible insights into aetiological mechanisms for schizophrenia and reveal pathophysiology shared with other neurodevelopmental disorders.
Several recurrent copy number variants (CNVs) have been shown to increase the risk of developing schizophrenia (SCZ), developmental delay (DD), autism spectrum disorders (ASD) and various congenital malformations (CM). Their penetrance for SCZ has been estimated to be modest. However, their penetrance for SCZ has not yet been compared with their penetrance for DD/ASD/CM, nor has their total penetrance for any of these disorders been estimated.
We use data from the largest available studies on SCZ and DD/ASD/CM, including a new sample of 6882 cases and 6316 controls, to estimate the frequencies of 70 implicated CNVs in people with these disorders, in healthy controls and in the general population. On the basis of these frequencies we estimate their penetrance. We also estimate the strength of the selection pressure against CNVs and correlate it with their overall penetrance.
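Penetrance estimates of this kind typically rest on Bayes' rule: P(disorder | CNV) = P(CNV | disorder) × P(disorder) / P(CNV). The sketch below assumes the general-population carrier frequency can be approximated as a prevalence-weighted mix of case and control frequencies; the input values in the example are hypothetical, and the study's own estimation procedure may differ in detail.

```python
def penetrance(freq_cases: float, freq_controls: float, prevalence: float) -> float:
    """Estimate P(disorder | CNV carrier) via Bayes' rule.

    Assumption: the population carrier frequency is the prevalence-weighted
    mix of the carrier frequencies in cases and in unaffected controls.
    """
    freq_population = freq_cases * prevalence + freq_controls * (1 - prevalence)
    return freq_cases * prevalence / freq_population


# Hypothetical inputs: 0.3% of cases and 0.03% of controls carry the CNV; 1% prevalence.
estimate = penetrance(0.003, 0.0003, 0.01)
```

With these illustrative numbers the estimated penetrance is about 9%; raising the case frequency or lowering the control frequency pushes it toward 100%.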
The rates of nearly all CNVs are higher in DD/ASD/CM than in SCZ. The penetrance of CNVs for developing a disorder from the DD/ASD/CM group is at least several times higher than their penetrance for SCZ. The overall penetrance of SCZ-associated CNVs for developing any disorder is high, ranging between 10.6% and 100%.
CNVs associated with SCZ have high pathogenicity. The majority of the increased risk conferred by CNVs is towards the development of an earlier-onset disorder, such as DD/ASD/CM, rather than SCZ. The penetrance of CNVs correlates strongly with their selection coefficients. The improved estimates of penetrance will provide crucial information for genetic counselling.
CNV; schizophrenia; penetrance; developmental delay; autism spectrum disorder; selection
Advances in genome analysis, accompanied by the assembly of large patient cohorts, have made possible successful genetic analyses of polygenic brain disorders. If the resulting molecular clues, previously hidden in the genomes of affected individuals, are to yield useful information about pathogenesis and inform the discovery of new treatments, neurobiology will have to rise to many difficult challenges. Here we review the underlying logic of the genetic investigations, describe in more detail progress in schizophrenia and autism, and outline the challenges for neurobiology that lie ahead. We argue that technologies at the disposal of neuroscience are adequately advanced to begin to study the biology of common and devastating polygenic disorders.
By analyzing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we have demonstrated a polygenic burden primarily arising from rare (<1/10,000), disruptive mutations distributed across many genes. Especially enriched gene sets included the voltage-gated calcium ion channel and the signaling complex formed by the activity-regulated cytoskeleton-associated (ARC) scaffold protein of the postsynaptic density (PSD), sets previously implicated by genome-wide association studies (GWAS) and copy-number variation (CNV) studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) were enriched for case mutations. No individual gene-based test achieved significance after correction for multiple testing, and we did not detect any alleles of moderately low frequency (~0.5–1%) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene mapping paradigms in neuropsychiatric disease.
An increased rate of de novo copy number variants (CNVs) has been found in schizophrenia (SZ), autism and developmental delay. An increased rate has also been reported in bipolar affective disorder (BD). Here, in a larger BD sample, we aimed to replicate these findings and compare de novo CNVs between SZ and BD. We used Illumina microarrays to genotype 368 BD probands, 76 SZ probands and all their parents. Copy number variants were called by PennCNV and filtered for frequency (<1%) and size (>10 kb). Putative de novo CNVs were validated with the z-score algorithm, manual inspection of log R ratios (LRR) and qPCR probes. We found 15 de novo CNVs in BD (4.1% rate) and 6 in SZ (7.9% rate). Combining results with previous studies and using a cut-off of >100 kb, the rate of de novo CNVs in BD was intermediate between controls and SZ: 1.5% in controls, 2.2% in BD and 4.3% in SZ. Only the differences between SZ and BD and between SZ and controls were significant. The median size of de novo CNVs in BD (448 kb) was also intermediate between SZ (613 kb) and controls (338 kb), but only the comparison between SZ and controls was significant. Only one de novo CNV in BD was in a confirmed SZ locus (16p11.2). Sporadic or early-onset cases were not more likely to have de novo CNVs. We conclude that de novo CNVs play a smaller role in BD than in SZ. Patients with a positive family history can also harbour de novo mutations.
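The frequency and size filters and the per-proband de novo rates described above can be expressed as a short sketch. The `CNVCall` record and its `cohort_freq` field are hypothetical stand-ins for whatever PennCNV output and frequency annotation a real pipeline would use; only the thresholds (<1% frequency, >10 kb) and the counts come from the text.

```python
from dataclasses import dataclass


@dataclass
class CNVCall:
    """Hypothetical minimal CNV record (coordinates in base pairs)."""
    chrom: str
    start: int
    end: int
    cohort_freq: float  # fraction of the sample set carrying an overlapping CNV

    @property
    def length(self) -> int:
        return self.end - self.start


def passes_filters(cnv: CNVCall, max_freq: float = 0.01, min_length: int = 10_000) -> bool:
    """Keep rare (<1%) CNVs longer than 10 kb, the thresholds quoted in the text."""
    return cnv.cohort_freq < max_freq and cnv.length > min_length


def de_novo_rate(n_de_novo: int, n_probands: int) -> float:
    """Percentage of probands with a de novo CNV (assumes at most one per proband)."""
    return 100.0 * n_de_novo / n_probands
```

Applied to the reported counts, `de_novo_rate(15, 368)` gives about 4.1% and `de_novo_rate(6, 76)` about 7.9%, matching the rates stated for BD and SZ.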
Laboratory red blood cell (RBC) measurements are clinically important, heritable and differ among ethnic groups. To identify genetic variants that contribute to RBC phenotypes in African Americans (AAs), we conducted a genome-wide association study in up to ∼16 500 AAs. The alpha-globin locus on chromosome 16pter [lead SNP rs13335629 in ITFG3 gene; P < 1E−13 for hemoglobin (Hgb), RBC count, mean corpuscular volume (MCV), MCH and MCHC] and the G6PD locus on Xq28 [lead SNP rs1050828; P < 1E−13 for Hgb, hematocrit (Hct), MCV, RBC count and red cell distribution width (RDW)] were each associated with multiple RBC traits. At the alpha-globin region, both the common African 3.7 kb deletion and common single nucleotide polymorphisms (SNPs) appear to contribute independently to RBC phenotypes among AAs. In the 2p21 region, we identified a novel variant of PRKCE distinctly associated with Hct in AAs. In a genome-wide admixture mapping scan, local European ancestry at the 6p22 region containing HFE and LRRC16A was associated with higher Hgb. LRRC16A has been previously associated with the platelet count and mean platelet volume in AAs, but not with Hgb. Finally, we extended to AAs the findings of association of erythrocyte traits with several loci previously reported in Europeans and/or Asians, including CD164 and HBS1L-MYB. In summary, this large-scale genome-wide analysis in AAs has extended the importance of several RBC-associated genetic loci to AAs and identified allelic heterogeneity and pleiotropy at several previously known genetic loci associated with blood cell traits in AAs.
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate–increasing and heart rate–decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
Structurally complex genomic regions remain poorly understood. One such locus, human 17q21.31, contains a megabase-long inversion polymorphism [1], many uncharacterized copy-number variations (CNVs), and markers that associate with female fertility [1], female meiotic recombination [1–3], and neurological disease [4,5]. Additionally, the inverted H2 form of 17q21.31 appears to be positively selected in Europeans [1]. We developed a population-genetic approach to reveal complex genome structures and identified nine segregating structural forms of 17q21.31. Both the H1 and H2 forms of the 17q21.31 inversion polymorphism contain independently derived, partial duplications of the KANSL1 (KIAA1267) gene; these duplications, which produce novel KANSL1 transcripts, have both recently risen to high allele frequencies (26% and 19%) in Europeans. An older H2 form, lacking such a duplication, is present at low frequency in Europeans and Central African hunter-gatherer populations. We show that complex genome structures can be analyzed by imputation from SNPs.
Although copy number variants (CNVs) are important in genomic medicine, CNVs have not been systematically assessed for many complex traits. Several large rare CNVs increase risk for schizophrenia (SCZ) and autism and often demonstrate pleiotropic effects; however, their frequencies in the general population and in other complex traits are unknown. Genotyping large numbers of samples is essential for progress. Large cohorts from many different diseases are being genotyped using exome-focused arrays designed to detect uncommon or rare protein-altering sequence variation. Although these arrays were not designed for CNV detection, the hybridization intensity data generated in each experiment could, in principle, be used for gene-focused CNV analysis. Our goal was to evaluate the extent to which CNVs can be detected using data from one particular exome array (the Illumina Human Exome Bead Chip). We genotyped 9,100 Swedish subjects (3,962 cases with SCZ and 5,138 controls) using both standard GWAS arrays and exome arrays. In comparison to CNVs detected using GWAS arrays, we observed high sensitivity and specificity for detecting genic CNVs ≥400 kb, including known pathogenic CNVs, and replicated the literature finding that cases with SCZ had greater enrichment for genic CNVs. Our data confirm the association of SCZ with 16p11.2 duplications and 22q11.2 deletions and suggest a novel association with deletions at 11q12.2. Our results suggest the utility of exome-focused arrays in surveying large genic CNVs in very large samples, and thereby open the door to new opportunities such as well-powered CNV assessments and comparisons between different diseases. The use of a single platform also minimizes potential confounding factors that could impact accurate detection.
schizophrenia; copy number variation; structural variation; genotyping; Illumina; exome array
mRNA synthesis, processing, and destruction involve a complex series of molecular steps that are incompletely understood. Because the RNA intermediates in each of these steps have finite lifetimes, extensive mechanistic and dynamical information is encoded in total cellular RNA. Here we report the development of SnapShot-Seq, a set of computational methods that allow the determination of in vivo rates of pre-mRNA synthesis, splicing, intron degradation, and mRNA decay from a single RNA-Seq snapshot of total cellular RNA. SnapShot-Seq can detect in vivo changes in the rates of specific steps of splicing, and it provides genome-wide estimates of pre-mRNA synthesis rates comparable to those obtained via labeling of newly synthesized RNA. We used SnapShot-Seq to investigate the origins of the intrinsic bimodality of metazoan gene expression levels, and our results suggest that this bimodality is partly due to spillover of transcriptional activation from highly expressed genes to their poorly expressed neighbors. SnapShot-Seq dramatically expands the information obtainable from a standard RNA-Seq experiment.
Major international projects are now underway aimed at creating a comprehensive catalog of all genes responsible for the initiation and progression of cancer. These studies involve sequencing of matched tumor–normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here, we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumor-normal pairs and discover extraordinary variation in (i) mutation frequency and spectrum within cancer types, which shed light on mutational processes and disease etiology, and (ii) mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and allow true cancer genes to rise to attention.
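Why ignoring mutational heterogeneity inflates false positives can be shown with a deliberately simplified sketch: a plain Poisson test, not MutSigCV itself. A long gene in a region with a high local background mutation rate looks significant when tested against a single genome-wide rate, but not when its own background rate is used. All rates, lengths and counts below are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def gene_pvalue(observed: int, coding_bases: int, n_samples: int, rate_per_base: float) -> float:
    """Probability of at least `observed` mutations given the expected background count."""
    expected = coding_bases * n_samples * rate_per_base
    return poisson_sf(observed, expected)


# Hypothetical long gene: 100,000 coding bases, 60 mutations across 100 tumors.
p_global = gene_pvalue(60, 100_000, 100, 2e-6)  # single genome-wide background rate
p_local = gene_pvalue(60, 100_000, 100, 6e-6)   # gene-specific, higher background rate
```

With these numbers the gene appears highly significant under the genome-wide rate but unremarkable under its own background rate, the kind of artefact the text attributes to mutational heterogeneity.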
A number of copy number variants (CNVs) have been suggested as susceptibility factors for schizophrenia. For some of these the data remain equivocal, and the frequency in individuals with schizophrenia is uncertain.
To determine the contribution of CNVs at 15 schizophrenia-associated loci (a) using a large new data-set of patients with schizophrenia (n = 6882) and controls (n = 6316), and (b) combining our results with those from previous studies.
We used Illumina microarrays to analyse our data. Analyses were restricted to 520 766 probes common to all arrays used in the different data-sets.
We found higher rates in participants with schizophrenia than in controls for 13 of the 15 previously implicated CNVs. Six were nominally significantly associated (P < 0.05) in this new data-set: deletions at 1q21.1, NRXN1, 15q11.2 and 22q11.2 and duplications at 16p11.2 and the Angelman/Prader-Willi Syndrome (AS/PWS) region. All eight AS/PWS duplications in patients were of maternal origin. When combined with published data, 11 of the 15 loci showed highly significant evidence for association with schizophrenia (P < 4.1 × 10−4).
We strengthen the support for the majority of the previously implicated CNVs in schizophrenia. About 2.5% of patients with schizophrenia and 0.9% of controls carry a large, detectable CNV at one of these loci. Routine CNV screening may be clinically appropriate given the high rate of known deleterious mutations in the disorder and the comorbidity associated with these heritable mutations.
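As a rough gauge of effect size, the carrier frequencies quoted above (about 2.5% of patients versus 0.9% of controls) can be converted to an odds ratio. This is a back-of-the-envelope calculation from the rounded percentages, not a figure reported in the study.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio from carrier frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


combined_or = odds_ratio(0.025, 0.009)
```

This gives an odds ratio of roughly 2.8 for carrying a CNV at any of the 15 loci.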
De novo mutation plays an important role in Autism Spectrum Disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes, and may also include nucleotide-substitution hotspots. We investigated global patterns of germline mutation by whole genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing datasets. Our findings suggest that regional hypermutation is a significant factor shaping patterns of genetic variation and disease risk in humans.
Large and rare copy number variants (CNVs) at several loci have been shown to increase risk for schizophrenia. Aiming to discover novel susceptibility CNV loci, we analyzed 6882 cases and 11 255 controls genotyped on Illumina arrays, most of which have not been used for this purpose before. We identified genes enriched for rare exonic CNVs among cases, and then attempted to replicate the findings in an additional 14 568 cases and 15 274 controls. In a combined analysis of all samples, 12 distinct loci were enriched among cases with nominal levels of significance (P < 0.05); however, none would survive correction for multiple testing. These loci include recurrent deletions at 16p12.1, a locus previously associated with neurodevelopmental disorders (P = 0.0084 in the discovery sample and P = 0.023 in the replication sample). Other plausible candidates include non-recurrent deletions at the glutamate transporter gene SLC1A1, a CNV locus recently suggested to be involved in schizophrenia through linkage analysis, and duplications at 1p36.33 and CGNL1. A burden analysis of large (>500 kb), rare CNVs showed a 1.2% excess in cases after excluding known schizophrenia-associated loci, suggesting that additional susceptibility loci exist. However, even larger samples are required for their discovery.
Summary: zCall is a variant caller specifically designed for calling rare single-nucleotide polymorphisms from array-based technology. This caller is implemented as a post-processing step after a default calling algorithm has been applied. The algorithm uses the intensity profile of the common allele homozygote cluster to define the location of the other two genotype clusters. We demonstrate improved detection of rare alleles when applying zCall to samples that have both Illumina Infinium HumanExome BeadChip and exome sequencing data available.
Supplementary data are available at Bioinformatics online.
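The core idea of zCall, placing genotype-calling thresholds a fixed number of standard deviations away from the well-populated common homozygote cluster, can be sketched as follows. This is a simplified illustration rather than zCall's actual procedure: it assumes normalized intensities `x` (allele A) and `y` (allele B), assumes A is the common allele, and applies one threshold derived from the AA cluster's Y-axis spread to both axes. The function names and the default `z = 7.0` are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def zscore_threshold(common_hom_y: Sequence[float], z: float = 7.0) -> float:
    """Threshold = mean + z * SD of the off-axis (Y) intensity in the common AA cluster."""
    return mean(common_hom_y) + z * stdev(common_hom_y)


def call_genotype(x: float, y: float, t: float) -> str:
    """Simplified call; hypothetically applies the same threshold to both axes."""
    if x > t and y > t:
        return "AB"  # signal for both alleles
    if x > t:
        return "AA"  # only allele-A signal
    if y > t:
        return "BB"  # only allele-B signal
    return "NoCall"  # neither signal clears the threshold
```

A sample with clear signal on both axes is called heterozygous even when too few carriers exist to form their own cluster, which is the rare-allele situation the text describes.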
Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.
Red blood cell, white blood cell, and platelet measures, including their count, sub-type and volume, are important diagnostic and prognostic clinical parameters for several human diseases. To identify novel loci associated with hematological traits, and compare the architecture of these phenotypes between ethnic groups, the CARe Project genotyped 49,094 single nucleotide polymorphisms (SNPs) that capture variation in ~2,100 candidate genes in DNA of 23,439 Caucasians and 7,112 African Americans from five population-based cohorts. We found strong novel associations between erythrocyte phenotypes and the glucose-6-phosphate dehydrogenase (G6PD) A-allele in African Americans (rs1050828, P < 2.0 × 10−13, T-allele associated with lower red blood cell count, hemoglobin, and hematocrit, and higher mean corpuscular volume), and between platelet count and a SNP at the tropomyosin-4 (TPM4) locus (rs8109288, P = 3.0 × 10−7 in Caucasians; P = 3.0 × 10−7 in African Americans, T-allele associated with lower platelet count). We strongly replicated many genetic associations to blood cell phenotypes previously established in Caucasians. A common variant of the α-globin (HBA2-HBA1) locus was associated with red blood cell traits in African Americans, but not in Caucasians (rs1211375, P < 7 × 10−8, A-allele associated with lower hemoglobin, mean corpuscular hemoglobin, and mean corpuscular volume). Our results show similarities but also differences in the genetic regulation of hematological traits in European- and African-derived populations, and highlight the role of natural selection in shaping these differences.
Genome sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2,951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in non-essential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes, and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.