DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally employed long (400–800 bp) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intra-species genetic variation. We report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterise four million SNPs and four hundred thousand structural variants, many of which are previously unknown. Our approach is effective for accurate, rapid and economical whole genome re-sequencing and many other biomedical applications.
StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at .
Recent efforts have attempted to describe the population structure of the common chimpanzee, focusing on four subspecies: Pan troglodytes verus, P. t. ellioti, P. t. troglodytes, and P. t. schweinfurthii. However, few studies have examined the effects of natural selection in shaping their response to pathogens and reproduction. Whey acidic protein (WAP) four-disulfide core domain (WFDC) genes and neighboring semenogelin (SEMG) genes encode proteins with combined roles in immunity and fertility. They display a strikingly high rate of amino acid replacement (dN/dS), indicative of adaptive pressures during primate evolution. In human populations, three signals of selection at the WFDC locus were described, possibly influencing the proteolytic profile and antimicrobial activities of the male reproductive tract. To evaluate the patterns of genomic variation and selection at the WFDC locus in chimpanzees, we sequenced 17 WFDC genes and 47 autosomal pseudogenes in 68 chimpanzees (15 P. t. troglodytes, 22 P. t. verus, and 31 P. t. ellioti). We found a clear differentiation of P. t. verus and estimated the divergence of the P. t. troglodytes and P. t. ellioti subspecies at 0.173 Myr; further, at the WFDC locus we identified a signature of strong selective constraints common to the three subspecies in WFDC6, a recent paralog of the epididymal protease inhibitor EPPIN. Overall, chimpanzees and humans do not display similar footprints of selection across the WFDC locus, possibly due to different selective pressures between the two species related to immune response and reproductive biology.
WFDC; natural selection; chimpanzees; serine protease inhibitor; reproduction; innate immunity
The Hawaiian honeycreepers are an avian adaptive radiation containing many endangered and extinct species. They display a dramatic range of phenotypic variation and are a model system for studies of evolution, conservation, disease dynamics and population genetics. Development of genome-scale resources for this group would augment the quality of research focusing on Hawaiian honeycreepers and facilitate comparative avian genomic research.
We assembled the genome sequence of a Hawaii amakihi (Hemignathus virens), and identified ~3.9 million single nucleotide polymorphisms (SNPs) in the genome. Using the amakihi genome as a reference, we also identified ~156,000 SNPs in RAD tag (restriction site associated DNA) sequencing of five honeycreeper species (palila [Loxioides bailleui], Nihoa finch [Telespiza ultima], iiwi [Vestiaria coccinea], apapane [Himatione sanguinea], and amakihi). SNPs are distributed throughout the amakihi genome, and the individual sequenced shows several large regions of low heterozygosity on chromosomes 1, 5, 6, 8 and 11. SNPs from RAD tag sequencing were also found throughout the genome but were more densely located on microchromosomes, apparently a result of differential distribution of the particular site recognized by restriction enzyme BseXI.
The amakihi genome sequence will be useful for comparative avian genomics research and provides a significant resource for studies in such areas as disease ecology, evolution, and conservation genetics. The genome sequences will enable mapping of transcriptome data for honeycreepers and comparison of gene sequences between avian taxa. Researchers will be able to use the large number of SNP markers to genotype honeycreepers in regions of interest or across the whole genome. There are enough markers to enable use of methods such as genome-wide association studies (GWAS) that will allow researchers to make connections between phenotypic diversity of honeycreepers and specific genetic variants. Genome-wide markers will also help resolve phylogenetic and population genetic questions in honeycreepers.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1098) contains supplementary material, which is available to authorized users.
Genome; Hawaiian honeycreepers; SNP; RAD tags; Drepanidines; Hemignathus virens
Endometrial cancer (EC) is the 8th leading cause of cancer death amongst American women. Most ECs are endometrioid, serous, or clear cell carcinomas, or an admixture of histologies. Serous and clear cell ECs are clinically aggressive tumors for which alternative therapeutic approaches are needed. The purpose of this study was to search for somatic mutations in the tyrosine kinome of serous and clear cell ECs, because mutated kinases can point to potential therapeutic targets.
In a mutation discovery screen, we PCR amplified and Sanger sequenced the exons encoding the catalytic domains of 86 tyrosine kinases from 24 serous, 11 clear cell, and 5 mixed histology ECs. For somatically mutated genes, we next sequenced the remaining coding exons from the 40 discovery screen tumors and sequenced all coding exons from another 72 ECs (10 clear cell, 21 serous, 41 endometrioid). We assessed the copy number of mutated kinases in this cohort of 112 tumors using quantitative real time PCR, and we used immunoblotting to measure expression of these kinases in endometrial cancer cell lines.
Overall, we identified somatic mutations in TNK2 (tyrosine kinase non-receptor, 2) and DDR1 (discoidin domain receptor tyrosine kinase 1) in 5.3% (6 of 112) and 2.7% (3 of 112) of ECs, respectively. Copy number gains of TNK2 and DDR1 were identified in another 4.5% and 0.9% of 112 cases, respectively. Immunoblotting confirmed TNK2 and DDR1 expression in endometrial cancer cell lines. Three of five missense mutations in TNK2 and one of two missense mutations in DDR1 are predicted to impact protein function by two or more in silico algorithms. The TNK2 P761Rfs*72 frameshift mutation was recurrent in EC, and the DDR1 R570Q missense mutation was recurrent across tumor types.
This is the first study to systematically search for mutations in the tyrosine kinome in clear cell endometrial tumors. Our findings indicate that high-frequency somatic mutations in the catalytic domains of the tyrosine kinome are rare in clear cell ECs. We uncovered ten new mutations in TNK2 and DDR1 within serous and endometrioid ECs, thus providing novel insights into the mutation spectrum of each gene in EC.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2407-14-884) contains supplementary material, which is available to authorized users.
Endometrial; Cancer; Mutation; TNK2; ACK1; DDR1; Copy number; Tyrosine kinase; Tyrosine kinome
Malignant hyperthermia susceptibility (MHS) is a life-threatening, inherited disorder of muscle calcium metabolism, triggered by anesthetics and depolarizing muscle relaxants. An unselected cohort was screened for MHS mutations using exome sequencing. Our aim was to pilot a strategy for the RYR1 and CACNA1S genes.
Exome sequencing was performed on 870 volunteers not ascertained for MHS. Variants in RYR1 and CACNA1S were annotated using an algorithm that filtered results based on mutation type, frequency, and information in mutation databases. Variants were scored on a six-point pathogenicity scale. Medical histories and pedigrees were reviewed for malignant hyperthermia and related disorders.
We identified 70 RYR1 and 53 CACNA1S variants among 870 exomes. Sixty-three RYR1 and 41 CACNA1S variants passed the quality and frequency metrics, but we excluded synonymous variants. In RYR1, we identified 65 missense mutations, one nonsense, two that affected splicing, and one non-frameshift indel. In CACNA1S, 48 missense, one frameshift deletion, one splicing and one non-frameshift indel were identified. RYR1 variants predicted to be pathogenic for MHS were found in three participants without medical or family histories of MHS. We reclassified numerous variants, previously described as pathogenic in mutation databases, as being of unknown pathogenicity.
Exome sequencing can identify asymptomatic patients at risk for MHS, although the interpretation of exome variants can be challenging. The use of exome sequencing in unselected cohorts is an important tool to understand the prevalence and penetrance of MHS, a critical challenge for the field.
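The annotation step described above (filtering variants in RYR1 and CACNA1S on mutation type, frequency, and mutation-database information) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `Variant` fields, the 1% frequency ceiling, and the review ordering are all hypothetical choices made for illustration, and the sketch deliberately stops short of the six-point pathogenicity scoring, whose criteria the abstract does not give.

```python
from dataclasses import dataclass

# Hypothetical record layout; the field names are assumptions, not the study's schema.
@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str      # e.g. "missense", "synonymous", "nonsense"
    allele_freq: float    # frequency in some reference population
    in_mutation_db: bool  # listed in a mutation database

TARGET_GENES = frozenset({"RYR1", "CACNA1S"})
EXCLUDED_CONSEQUENCES = frozenset({"synonymous"})
MAX_ALLELE_FREQ = 0.01  # assumed ceiling; the abstract states no threshold


def passes_filters(v, max_freq=MAX_ALLELE_FREQ):
    """Keep rare, non-synonymous variants in the genes of interest."""
    return (
        v.gene in TARGET_GENES
        and v.consequence not in EXCLUDED_CONSEQUENCES
        and v.allele_freq <= max_freq
    )


def triage(variants, max_freq=MAX_ALLELE_FREQ):
    """Filter, then order for manual review: database-listed first, rarest first."""
    kept = [v for v in variants if passes_filters(v, max_freq)]
    return sorted(kept, key=lambda v: (not v.in_mutation_db, v.allele_freq))


if __name__ == "__main__":
    demo = [
        Variant("RYR1", "missense", 0.0002, True),
        Variant("RYR1", "synonymous", 0.0001, False),
        Variant("CACNA1S", "missense", 0.2, False),
    ]
    for v in triage(demo):
        print(v)
```

Keeping the filter and the ordering as separate functions means a threshold can be changed without touching the review order; any scoring of the surviving variants would still need expert curation.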
Neurofibromatosis type 1 (NF1) is an autosomal dominant, monogenic disorder of dysregulated neurocutaneous tissue growth. Pleiotropy, variable expressivity and few NF1 genotype-phenotype correlates limit clinical prognostication in NF1. Phenotype complexity in NF1 is hypothesized to derive in part from genetic modifiers unlinked to the NF1 locus. In this study, we hypothesized that normal variation in germline gene expression confers risk for certain phenotypes in NF1. In a set of 79 individuals with NF1, we examined the association between gene expression in lymphoblastoid cell lines and NF1-associated phenotypes and sequenced select genes with significant phenotype/expression correlations. In a discovery cohort of 89 self-reported European-Americans with NF1 we examined the association between germline sequence variants of these genes and café-au-lait macule (CALM) count, a tractable, tumor-like phenotype in NF1. Two correlated, common SNPs (rs4660761 and rs7161) between DPH2 and ATP6V0B were significantly associated with the CALM count. Analysis with tiled regression also identified SNP rs4660761 as significantly associated with CALM count. SNP rs1800934 and 12 rare variants in the mismatch repair gene MSH6 were also associated with CALM count. Both SNPs rs7161 and rs4660761 (DPH2 and ATP6V0B) were highly significant in a mega-analysis in a combined cohort of 180 self-reported European-Americans; SNP rs1800934 (MSH6) was near-significant in a meta-analysis assuming dominant effect of the minor allele. SNP rs4660761 is predicted to regulate ATP6V0B, a gene associated with melanosome biology. Individuals with homozygous mutations in MSH6 can develop an NF1-like phenotype, including multiple CALMs. Through a multi-platform approach, we identified variants that influence NF1 CALM count.
Neurofibromatosis type 1 (NF1) is a relatively common genetic disease that increases the chance of developing a variety of benign and malignant tumors. People with NF1 also typically feature a large number of birthmarks called café-au-lait macules. It is difficult to predict severity or specific problems in NF1. We sought to identify genes (other than NF1, the gene that causes the disease) that influence severity in NF1. We determined the number of café-au-lait macules in two groups of people with NF1. We measured the expression of about 10,000 genes in the cultured white blood cells from one group of people. We then sequenced a group of genes whose expression level was increased in people with higher numbers of café-au-lait macules. In the first group, we found common variants in the gene MSH6 and near the genes DPH2 and ATP6V0B that were significantly associated with the number of café-au-lait macules. Some of these variants were close to significant in the second group of people. The two variants near DPH2 and ATP6V0B were very significant when analysed in both groups combined. Our work is among the first to identify genetic variants that influence the severity of NF1.
Massively parallel sequencing to identify rare variants is widely practiced in medical research and in the clinic. Genome and exome sequencing can identify the genetic cause of a disease (primary results), but can also identify pathogenic variants underlying diseases that are not being sought (secondary or incidental results). A major controversy has developed surrounding the return of secondary results to research participants. We have piloted a method to analyze exomes to identify participants at-risk for cardiac arrhythmias, cardiomyopathies or sudden death.
Methods and Results
Exome sequencing was performed on 870 participants not selected for arrhythmia, cardiomyopathy, or a family history of sudden death. Exome data from 22 cardiac arrhythmia and 41 cardiomyopathy-associated genes were analyzed using an algorithm that filtered results on genotype quality, frequency, and database information. We identified 1367 variants in the cardiomyopathy genes and 360 variants in the arrhythmia genes. Six participants had pathogenic variants associated with dilated cardiomyopathy (n=1), hypertrophic cardiomyopathy (n=2), left ventricular noncompaction (n=1) or long QT syndrome (n=2). Two of these participants had evidence of cardiomyopathy and one had left ventricular noncompaction on ECHO. Three participants with likely pathogenic variants had prolonged QTc. Family history included unexplained sudden death among relatives.
Approximately 0.5% of participants in this study had pathogenic variants in known cardiomyopathy or arrhythmia genes. This high frequency may be due to self-selection, false positives, or underestimation of the prevalence of these conditions. We conclude that clinically important cardiomyopathy and dysrhythmia secondary variants can be identified in unselected exomes.
arrhythmia (heart rhythm disorders); cardiomyopathy; cardiovascular genomics; genetic heart disease; genetic variation; genetics; human; genomic medicine
This report describes the NIH Undiagnosed Diseases Program (UDP), details the Program's application of genomic technology to establish diagnoses, and details the Program's success rate over its first two years.
Each accepted study participant was extensively phenotyped. A subset of participants and selected family members (29 patients and 78 unaffected family members) was subjected to an integrated set of genomic analyses including high-density SNP arrays and whole exome or genome analysis.
Of 1191 medical records reviewed, 326 patients were accepted and 160 were admitted directly to the NIH Clinical Center on the UDP service. Of those, 47% were children, 55% were females, and 53% had neurological disorders. Diagnoses were reached on 39 participants (24%) on clinical, biochemical, pathological, or molecular grounds; 21 diagnoses involved rare or ultra-rare diseases. Three disorders were diagnosed based upon SNP array analysis and three others using WES and filtering of variants. Two new disorders were discovered. Analysis of the SNP-array study cohort revealed that large stretches of homozygosity were more common in affected participants relative to controls.
The NIH UDP addresses an unmet need, i.e., the diagnosis of patients with complex, multisystem disorders. It may serve as a model for the clinical application of emerging genomic technologies, and is providing insights into the characteristics of diseases that remain undiagnosed after extensive clinical workup.
rare disease; undiagnosed disease; SNP arrays; whole exome sequencing; neurological disorders
Substantial intrastrain variation at the nucleotide level complicates molecular and genetic studies in zebrafish, such as the use of CRISPRs or morpholinos to inactivate genes. In the absence of robust inbred zebrafish lines, we generated NHGRI-1, a healthy and fecund strain derived from founder parents we sequenced to a depth of ∼50×. Within this strain, we have identified the majority of the genome that matches the reference sequence and documented most of the variants. This strain has utility for many reasons, but in particular it will be useful for any researcher who needs to know the exact sequence (with all variants) of a particular genomic region or who wants to be able to robustly map sequences back to a genome with all possible variants defined.
zebrafish; SNV; genome sequence; CRISPR; variants
We present a high-quality genome sequence of a Neandertal woman from Siberia. We show that her parents were related at the level of half siblings and that mating among close relatives was common among her recent ancestors. We also sequenced the genome of a Neandertal from the Caucasus to low coverage. An analysis of the relationships and population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events occurred among Neandertals, Denisovans and early modern humans, possibly including gene flow into Denisovans from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups in the Late Pleistocene. In addition, the high quality Neandertal genome allows us to establish a definitive list of substitutions that became fixed in modern humans after their separation from the ancestors of Neandertals and Denisovans.
Motivation: Extensive DNA sequencing of tumor and matched normal samples using exome and whole-genome sequencing technologies has enabled the discovery of recurrent genetic alterations in cancer cells, but variability in stromal contamination and subclonal heterogeneity still present a severe challenge to available detection algorithms.
Results: Here, we describe publicly available software, Shimmer, which accurately detects somatic single-nucleotide variants using statistical hypothesis testing with multiple testing correction. This program produces somatic single-nucleotide variant predictions with significantly higher sensitivity and accuracy than other available software when run on highly contaminated or heterogeneous samples, and it gives comparable sensitivity and accuracy when run on samples of high purity.
Supplementary data are available at Bioinformatics online.
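The general idea of "statistical hypothesis testing with multiple testing correction" for somatic variant calling can be sketched as follows. This is a minimal illustration and not Shimmer's code: it assumes Fisher's exact test on reference/alternate read counts in the normal versus the tumor sample, followed by Benjamini-Hochberg adjustment, neither of which is named in the abstract above. The function names and the 0.05 cutoff are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_somatic(sites, alpha=0.05):
    """sites: (normal_ref, normal_alt, tumor_ref, tumor_alt) read counts per site.

    Returns indices of sites whose adjusted p-value is below alpha.
    """
    pvals = [fisher_exact_two_sided(nr, na, tr, ta) for nr, na, tr, ta in sites]
    adj = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(adj) if q < alpha]


if __name__ == "__main__":
    sites = [(30, 0, 15, 15), (30, 0, 29, 1)]
    print(call_somatic(sites))
```

Testing allele counts directly, rather than thresholding the tumor allele fraction, is one way a caller can remain usable when stromal contamination dilutes the variant reads; controlling the false discovery rate across all tested sites keeps the number of spurious calls bounded.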
An understanding of ctenophore biology is critical for reconstructing events that occurred early in animal evolution. Towards this goal, we have sequenced, assembled, and annotated the genome of the ctenophore Mnemiopsis leidyi. Our phylogenomic analyses of both amino acid positions and gene content suggest that ctenophores rather than sponges are the sister lineage to all other animals. Mnemiopsis lacks many of the genes found in bilaterian mesodermal cell types, suggesting that these cell types evolved independently. The set of neural genes in Mnemiopsis is similar to that of sponges, indicating that sponges may have lost a nervous system. These results present a new view of early animal evolution that accounts for major losses and/or gains of sophisticated cell types, including nerve and muscle cells.
Early-onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD) is a myopathic disorder associated with mutations in MEGF10. By novel analysis of SNP array hybridization and exome sequence coverage, we diagnosed a 10-year old girl with EMARDD following identification of a novel homozygous deletion of exon 7 in MEGF10. In contrast to previously reported EMARDD patients, her weakness was more prominent proximally than distally, and involved her legs more than her arms. MRI of her pelvis and thighs showed muscle atrophy and fatty replacement. Ultrasound of several muscle groups revealed dense homogenous increases in echogenicity. Cloning and sequencing of the deletion breakpoint identified features suggesting the mutation arose by fork stalling and template switching. These findings constitute the first genomic deletion causing EMARDD, expand the clinical phenotype, and provide new insight into the pattern and histology of its muscular pathology.
EMARDD; MEGF10; SNP array; exome sequencing; deletion analysis; myopathy
Antibodies of the VRC01 class neutralize HIV-1, arise in diverse HIV-1-infected donors, and are potential templates for an effective HIV-1 vaccine. However, the stochastic processes that generate repertoires in each individual of >10^12 antibodies make elicitation of specific antibodies uncertain. Here we determine the ontogeny of the VRC01 class by crystallography and next-generation sequencing. Despite antibody-sequence differences exceeding 50%, antibody-gp120 cocrystal structures reveal VRC01-class recognition to be remarkably similar. B cell transcripts indicate that VRC01-class antibodies require few specific genetic elements, suggesting that naive B cells with VRC01-class features are generated regularly by recombination. Virtually all of these fail to mature, however, with only a few (likely one) ancestor B cell expanding to form a VRC01-class lineage in each donor. Developmental similarities in multiple donors thus reveal the generation of VRC01-class antibodies to be reproducible in principle, thereby providing a framework for attempts to elicit similar antibodies in the general population.
The whey acidic protein (WAP) four-disulfide core domain (WFDC) locus located on human chromosome 20q13 spans 19 genes with WAP and/or Kunitz domains. These genes participate in antimicrobial, immune, and tissue homoeostasis activities. Neighboring SEMG genes encode seminal proteins Semenogelin 1 and 2 (SEMG1 and SEMG2). WFDC and SEMG genes have a strikingly high rate of amino acid replacement (dN/dS), indicative of responses to adaptive pressures during vertebrate evolution. To better understand the selection pressures acting on WFDC genes in human populations, we resequenced 18 genes and 54 noncoding segments in 71 European (CEU), African (YRI), and Asian (CHB + JPT) individuals. Overall, we identified 484 single-nucleotide polymorphisms (SNPs), including 65 coding variants (of which 49 are nonsynonymous differences). Using classic neutrality tests, we confirmed the signature of short-term balancing selection on WFDC8 in Europeans and a signature of positive selection spanning genes PI3, SEMG1, SEMG2, and SLPI. Associated with the latter signal, we identified an unusually homogeneous-derived 100-kb haplotype with a frequency of 88% in Asian populations. A putative candidate variant targeted by selection is Thr56Ser in SEMG1, which may alter the proteolytic profile of SEMG1 and antimicrobial activities of semen. All the well-characterized genes residing in the WFDC locus encode proteins that appear to have a role in immunity and/or fertility, two processes that are often associated with adaptive evolution. This study provides further evidence that the WFDC and SEMG loci have been under strong adaptive pressure within the short timescale of modern humans.
WFDC; semenogelins; natural selection; innate immunity; serine protease inhibitors; reproduction
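As an illustration of what a "classic neutrality test" computes, the sketch below implements Tajima's D in Python. The abstract above does not say which tests were applied, so the choice of Tajima's D is an assumption, and the function names and toy alignment are hypothetical. The statistic compares two estimates of diversity, the mean number of pairwise differences (pi) and the number of segregating sites (S) scaled by a harmonic sum; strongly negative values suggest an excess of rare variants, and strongly positive values an excess of intermediate-frequency variants.

```python
from itertools import combinations
from math import sqrt


def summarize(seqs):
    """Return (n, S, pi) for equal-length aligned sequences.

    S is the number of segregating sites; pi is the mean number of
    pairwise differences between sequences.
    """
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    return n, seg_sites, total_diffs / n_pairs


def tajimas_d(n, seg_sites, pi):
    """Tajima's D (Tajima 1989); returns None when the statistic is undefined."""
    if n < 3 or seg_sites == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AATT", "AATT"]  # made-up data
    n, s, pi = summarize(toy_alignment)
    print(n, s, pi, tajimas_d(n, s, pi))
```

In practice the significance of D is usually judged against simulations under a demographic model rather than against a fixed cutoff, because population history alone can shift the statistic.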
The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects.
genomics; next generation sequencing; exome; molecular diagnosis
Massively-parallel cDNA sequencing (RNA-Seq) is a new technique that holds great promise for cardiovascular genomics. Here, we used RNA-Seq to study the transcriptomes of matched coronary artery disease cases and controls in the ClinSeq® study, using cell lines as tissue surrogates.
Lymphoblastoid cell lines (LCLs) from 16 cases and controls representing phenotypic extremes for coronary calcification were cultured and analyzed using RNA-Seq. All cell lines were then independently re-cultured and, along with another set of 16 independent cases and controls, were profiled with Affymetrix microarrays to perform a technical validation of the RNA-Seq results. Statistically significant changes (p < 0.05) were detected in 186 transcripts, many of which are expressed at extremely low levels (5–10 copies/cell), which we confirmed through a separate spike-in control RNA-Seq experiment. Next, by fitting a linear model to exon-level RNA-Seq read counts, we detected signals of alternative splicing in 18 transcripts. Finally, we used the RNA-Seq data to identify differential expression (p < 0.0001) in eight previously unannotated regions that may represent novel transcripts. Overall, differentially expressed genes showed strong enrichment (p = 0.0002) for prior association with cardiovascular disease. At the network level, we found evidence for perturbation in pathways involving both cardiovascular system development and function as well as lipid metabolism.
We present a pilot study for transcriptome involvement in coronary artery calcification and demonstrate how RNA-Seq analyses using LCLs as a tissue surrogate may yield fruitful results in a clinical sequencing project. In addition to canonical gene expression, we present candidate variants from alternative splicing and novel transcript detection, which have been unexplored in the context of this disease.
Coronary artery calcification; RNA-Seq; Lymphoblastoid cell lines; Transcriptome profiling
Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same as in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA.
Most of the work on the genetic epidemiology of serum lipids in African Americans (AA) has focused on replicating findings that were identified in European ancestry individuals. While this can be very informative about the generalizability of lipids loci across populations, African ancestry-specific variation will be missed using this approach. Our aim was to comprehensively evaluate five lipid candidate genes in an AA population, from the identification of variants of interest to population-level analysis of high-density lipoprotein cholesterol (HDLC) and triglycerides (TG). We sequenced five genes in individuals with extreme lipids (n = 48) drawn from a population-based study of AA. The variants identified were genotyped in 1,694 AA and analyzed. Notable among the findings was the observation of ancestry-specific effects for several variants in the LPL gene among these admixed individuals, with a greater effect observed among those with European ancestry in this region. These associations were further elucidated by replication in West Africans. By beginning with the sequence variation present among AA, investigating ancestry effects, and seeking replication in West Africans, we were able to comprehensively evaluate these candidate genes with a focus on African ancestry individuals.
Early-onset epileptic encephalopathies have been associated with de novo mutations of numerous ion channel genes. We employed techniques of modern translational medicine to identify a disease-causing mutation, analyze its altered behavior, and screen for therapeutic compounds to treat the proband.
Three modern translational medicine tools were utilized: (1) high-throughput sequencing technology to identify a novel de novo mutation; (2) in vitro expression and electrophysiology assays to confirm the variant protein's dysfunction; and (3) screening of existing drug libraries to identify potential therapeutic compounds.
A de novo GRIN2A missense mutation (c.2434C>A; p.L812M) increased the charge transfer mediated by N-methyl-D-aspartate receptors (NMDARs) containing the mutant GluN2A-L812M subunit. In vitro analysis with NMDA receptor blockers indicated that GluN2A-L812M-containing NMDARs retained their sensitivity to the use-dependent channel blocker memantine, while screening of a previously reported GRIN2A mutation (N615K) with these compounds produced contrasting results. Consistent with these data, adjunct memantine therapy reduced our proband's seizure burden.
This case exemplifies the potential for personalized genomics and therapeutics to be utilized for the early diagnosis and treatment of infantile-onset neurological disease.
The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern humans, is important for understanding human diversity. Studies based on mitochondrial [1] and small sets of nuclear markers [2] have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans [1,3]. However, until now, fully sequenced human genomes have been limited to recently diverged populations [4–8]. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data.