The whey acidic protein (WAP) four-disulfide core domain (WFDC) locus located on human chromosome 20q13 spans 19 genes with WAP and/or Kunitz domains. These genes participate in antimicrobial, immune, and tissue homoeostasis activities. Neighboring SEMG genes encode seminal proteins Semenogelin 1 and 2 (SEMG1 and SEMG2). WFDC and SEMG genes have a strikingly high rate of amino acid replacement (dN/dS), indicative of responses to adaptive pressures during vertebrate evolution. To better understand the selection pressures acting on WFDC genes in human populations, we resequenced 18 genes and 54 noncoding segments in 71 European (CEU), African (YRI), and Asian (CHB + JPT) individuals. Overall, we identified 484 single-nucleotide polymorphisms (SNPs), including 65 coding variants (of which 49 are nonsynonymous differences). Using classic neutrality tests, we confirmed the signature of short-term balancing selection on WFDC8 in Europeans and a signature of positive selection spanning genes PI3, SEMG1, SEMG2, and SLPI. Associated with the latter signal, we identified an unusually homogeneous-derived 100-kb haplotype with a frequency of 88% in Asian populations. A putative candidate variant targeted by selection is Thr56Ser in SEMG1, which may alter the proteolytic profile of SEMG1 and antimicrobial activities of semen. All the well-characterized genes residing in the WFDC locus encode proteins that appear to have a role in immunity and/or fertility, two processes that are often associated with adaptive evolution. This study provides further evidence that the WFDC and SEMG loci have been under strong adaptive pressure within the short timescale of modern humans.
WFDC; semenogelins; natural selection; innate immunity; serine protease inhibitors; reproduction
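The abstract above refers to "classic neutrality tests" without naming them. As an illustration only, the following minimal sketch computes Tajima's D, one widely used test of this kind, from a set of aligned haplotypes. The choice of Tajima's D, the function name `tajimas_d`, and the toy haplotypes are assumptions made for this example and are not taken from the study. An excess of rare variants gives a negative D (as can follow a selective sweep), while an excess of intermediate-frequency variants gives a positive D (as can occur under balancing selection).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings (one character per site).

    Illustrative sketch only; the study does not state which tests it used.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance")

    # S: number of segregating sites (columns with more than one allele).
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between two estimators of theta, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy data: every variant is a singleton (excess of rare alleles).
rare_excess = ["1000", "0100", "0010", "0001"]
# Hypothetical toy data: two deeply divergent haplotype classes.
intermediate_excess = ["1100", "1100", "0011", "0011"]

print(tajimas_d(rare_excess))          # negative
print(tajimas_d(intermediate_excess))  # positive
```

In practice the statistic would be compared against neutral expectations (for example, coalescent simulations under a demographic model) rather than interpreted from its sign alone.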
Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same as in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA.
Most of the work on the genetic epidemiology of serum lipids in African Americans (AA) has focused on replicating findings that were identified in European ancestry individuals. While this can be very informative about the generalizability of lipids loci across populations, African ancestry-specific variation will be missed using this approach. Our aim was to comprehensively evaluate five lipid candidate genes in an AA population, from the identification of variants of interest to population-level analysis of high-density lipoprotein cholesterol (HDLC) and triglycerides (TG). We sequenced five genes in individuals with extreme lipids (n = 48) drawn from a population-based study of AA. The variants identified were genotyped in 1,694 AA and analyzed. Notable among the findings was the observation of ancestry-specific effects for several variants in the LPL gene among these admixed individuals, with a greater effect observed among those with European ancestry in this region. These associations were further elucidated by replication in West Africans. By beginning with the sequence variation present among AA, investigating ancestry effects, and seeking replication in West Africans, we were able to comprehensively evaluate these candidate genes with a focus on African ancestry individuals.
To utilize high-throughput sequencing to determine the etiology of juvenile-onset neurodegeneration in a 19-year-old woman with progressive motor and cognitive decline.
Exome sequencing identified an initial list of 133,555 variants in the proband's family, which were filtered using segregation analysis, presence in dbSNP, and an empirically derived gene exclusion list. The filtered list comprised 52 genes: 21 homozygous variants and 31 compound heterozygous variants. These variants were subsequently scrutinized with pathogenicity prediction programs and for association with appropriate clinical syndromes.
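The filtering steps described above (segregation, dbSNP presence, gene exclusion list, then homozygous versus compound heterozygous candidates) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout, the genotype labels (`"hom"`, `"het"`, `"ref"`), the function name `recessive_candidates`, the placeholder gene names, and the rule that variants present in dbSNP are discarded are all hypothetical choices made for the example.

```python
from collections import defaultdict


def recessive_candidates(variants, excluded_genes):
    """Return (homozygous, compound_het) candidate variant IDs keyed by gene.

    Assumed record layout (hypothetical): each variant is a dict with keys
    'id', 'gene', 'in_dbsnp', and genotypes for 'proband', 'mother', 'father'.
    """
    # Assumption: variants already catalogued in dbSNP or in excluded genes are dropped.
    kept = [v for v in variants
            if not v["in_dbsnp"] and v["gene"] not in excluded_genes]

    homozygous = defaultdict(list)
    het_by_gene = defaultdict(list)
    for v in kept:
        p, m, f = v["proband"], v["mother"], v["father"]
        if p == "hom" and m == "het" and f == "het":
            # Homozygous in the proband, inherited from two carrier parents.
            homozygous[v["gene"]].append(v["id"])
        elif p == "het" and {m, f} == {"het", "ref"}:
            # Heterozygous in the proband, inherited from exactly one parent.
            het_by_gene[v["gene"]].append(v)

    compound_het = {}
    for gene, hets in het_by_gene.items():
        maternal = [v["id"] for v in hets if v["mother"] == "het"]
        paternal = [v["id"] for v in hets if v["father"] == "het"]
        # Compound heterozygosity needs at least one allele from each parent.
        if maternal and paternal:
            compound_het[gene] = sorted(maternal + paternal)
    return dict(homozygous), compound_het


# Hypothetical toy records; gene names are placeholders, not study data.
toy = [
    {"id": "v1", "gene": "GENE_A", "in_dbsnp": False, "proband": "hom", "mother": "het", "father": "het"},
    {"id": "v2", "gene": "GENE_B", "in_dbsnp": False, "proband": "het", "mother": "het", "father": "ref"},
    {"id": "v3", "gene": "GENE_B", "in_dbsnp": False, "proband": "het", "mother": "ref", "father": "het"},
    {"id": "v4", "gene": "GENE_C", "in_dbsnp": True,  "proband": "hom", "mother": "het", "father": "het"},
    {"id": "v5", "gene": "GENE_D", "in_dbsnp": False, "proband": "hom", "mother": "het", "father": "het"},
    {"id": "v6", "gene": "GENE_E", "in_dbsnp": False, "proband": "het", "mother": "het", "father": "ref"},
]
hom, comp = recessive_candidates(toy, excluded_genes={"GENE_D"})
print(hom, comp)
```

A single inherited heterozygous variant (GENE_E in the toy data) is not retained, because a recessive model needs two affected alleles in the same gene.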
Exome sequencing data identified 2 GLB1 variants (c.602G>A, p.R201H; c.785G>T, p.G262V). β-Galactosidase enzyme analysis prior to our evaluation was reported as normal; however, subsequent testing was consistent with juvenile-onset GM1-gangliosidosis. Urine oligosaccharide analysis was positive for multiple oligosaccharides with terminal galactose residues.
We describe a patient with juvenile-onset neurodegeneration that had eluded diagnosis for over a decade. GM1-gangliosidosis had previously been excluded from consideration, but was subsequently identified as the correct diagnosis using exome sequencing. Exome sequencing can evaluate genes not previously associated with neurodegeneration, as well as most known neurodegeneration-associated genes. Our results demonstrate the utility of “agnostic” exome sequencing to evaluate patients with undiagnosed disorders, without prejudice from prior testing results.
Most endometrial cancers can be classified histologically as endometrioid, serous, or clear cell. Non-endometrioid endometrial cancers (NEECs; serous and clear cell) are the most clinically aggressive of the three major histotypes and are characterized by aneuploidy, a feature of chromosome instability. The genetic alterations that underlie chromosome instability in endometrial cancer are poorly understood. In the present study, we used Sanger sequencing to search for nucleotide variants in the coding exons and splice junctions of 21 candidate chromosome instability genes, including 19 genes implicated in sister chromatid cohesion, from 24 primary, microsatellite-stable NEECs. Somatic mutations were verified by sequencing matched normal DNAs. We subsequently resequenced mutated genes from 41 additional NEECs as well as 42 endometrioid ECs (EECs). We uncovered nonsynonymous somatic mutations in ESCO1, CHTF18, and MRE11A in, respectively, 3.7% (4 of 107), 1.9% (2 of 107), and 1.9% (2 of 107) of endometrial tumors. Overall, 7.7% (5 of 65) of NEECs and 2.4% (1 of 42) of EECs had somatically mutated one or more of the three genes. A subset of mutations are predicted to impact protein function. The co-occurrence of somatic mutations in ESCO1 and CHTF18 was statistically significant (P = 0.0011, two-tailed Fisher's exact test). This is the first report of somatic mutations within ESCO1 and CHTF18 in endometrial tumors and of MRE11A mutations in microsatellite-stable endometrial tumors. Our findings warrant future studies to determine whether these mutations are driver events that contribute to the pathogenesis of endometrial cancer.
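The co-occurrence test reported above can be illustrated with a small, dependency-free implementation of the two-tailed Fisher's exact test. The abstract gives the marginal counts (4 ESCO1-mutated and 2 CHTF18-mutated tumors among 107) but not the overlap; the 2x2 table below assumes that both CHTF18-mutated tumors also carried an ESCO1 mutation. That overlap is an assumption made for this sketch, although under it the p-value works out to 6/5671, which rounds to the reported 0.0011. The function name `fisher_exact_two_sided` is likewise hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Rows: ESCO1 mutated / not mutated; columns: CHTF18 mutated / not mutated.
# The overlap of 2 tumors is an assumption; only the margins come from the abstract.
p_value = fisher_exact_two_sided(2, 2, 0, 103)
print(round(p_value, 4))  # 0.0011

# The reported mutation frequencies follow directly from the counts.
print(round(100 * 4 / 107, 1), round(100 * 2 / 107, 1))  # 3.7 1.9
print(round(100 * 5 / 65, 1), round(100 * 1 / 42, 1))    # 7.7 2.4
```

With margins this small, only three tables are possible (overlap of 0, 1, or 2 tumors), so the two-tailed p-value equals the probability of the single most extreme table.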
Genomic technologies, such as whole-exome sequencing, are a powerful tool in genetic research. Such testing yields a great deal of incidental medical information, or medical information not related to the primary research target. We describe the management of incidental medical information derived from whole-exome sequencing in the research context. We performed whole-exome sequencing on a monozygotic twin pair in which only 1 child was affected with congenital anomalies and applied an institutional review board–approved algorithm to determine what genetic information would be returned. Whole-exome sequencing identified 79,525 genetic variants in the twins. Here, we focus on novel variants. After filtering artifacts and excluding known single nucleotide polymorphisms and variants not predicted to be pathogenic, the twins had 32 novel variants in 32 genes that were considered likely to be associated with human disease. Eighteen of these novel variants were associated with recessive disease and 18 were associated with dominantly manifesting conditions (variants in some genes were potentially associated with both recessive and dominant conditions), but only 1 variant ultimately met our institutional review board–approved criteria for return of information to the research participants.
whole-exome sequencing; incidental medical information
Fatty acid hydroxylase-associated neurodegeneration due to fatty acid 2-hydroxylase deficiency presents with a wide range of phenotypes including spastic paraplegia, leukodystrophy, and/or brain iron deposition. All previously described families with this disorder were consanguineous, with homozygous mutations in the probands. We describe a 10-year-old male, from a non-consanguineous family, with progressive spastic paraplegia, dystonia, ataxia, and cognitive decline associated with a sural axonal neuropathy. The use of high-throughput sequencing techniques combined with SNP array analyses revealed a novel paternally derived missense mutation and an overlapping novel maternally derived ∼28-kb genomic deletion in FA2H. This patient provides further insight into the consistent features of this disorder and expands our understanding of its phenotypic presentation. The presence of a sural nerve axonal neuropathy had not been previously associated with this disorder and so may extend the phenotype.
fatty acid 2-hydroxylase; fatty acid hydroxylase-associated neurodegeneration; exome sequencing; deletion analysis; neuropathy
In this study we assess exome sequencing (ES) as a diagnostic alternative for genetically heterogeneous disorders. Since ES readily identified a previously reported homozygous mutation in the CAPN3 gene for an individual with an undiagnosed limb girdle muscular dystrophy, we evaluated ES as a generalizable clinical diagnostic tool by assessing the targeting efficiency and sequencing coverage of 88 genes associated with muscle disease (MD) and spastic paraplegia (SPG). We used three exome-capture kits on 125 individuals. Exons constituting each gene were defined using the UCSC and CCDS databases. The three exome-capture kits targeted 47–92% of bases within the UCSC-defined exons, and 97–99% of bases within the CCDS-defined exons. An average of 61.2–99.5% and 19.1–99.5% of targeted bases per gene were sequenced to 20X coverage within the CCDS-defined MD and SPG coding exons, respectively. Greater than 95–99% of targeted known mutation positions were sequenced to ≥1X coverage and 55–87% to ≥20X coverage in every exome. We therefore conclude that ES is a rapid and efficient first-tier method to screen for mutations, particularly within the CCDS-annotated exons, although its application requires disclosure of the extent of coverage for each targeted gene and supplementation with second-tier Sanger sequencing for full coverage.
CAPN3; exome; LGMD; HSP; neuromuscular disorders; clinical genetic testing
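The per-gene coverage figures quoted in the abstract above (fraction of targeted bases sequenced to at least 1X or 20X) amount to a simple threshold count over per-base read depths. The sketch below shows that calculation under stated assumptions: the input layout (a dict mapping each gene to a list of per-base depths), the function names, and the toy depths are hypothetical and do not reproduce the study's data or tooling.

```python
def fraction_covered(depths, min_depth=20):
    """Fraction of targeted bases whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("no targeted bases supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


def per_gene_coverage(depth_by_gene, min_depth=20):
    """Map each gene to its fraction of targeted bases covered at min_depth."""
    return {gene: fraction_covered(depths, min_depth)
            for gene, depths in depth_by_gene.items()}


# Hypothetical per-base depths for two placeholder genes.
toy_depths = {
    "GENE_A": [25, 30, 18, 40],
    "GENE_B": [0, 5, 22, 21, 19],
}

at_20x = per_gene_coverage(toy_depths, min_depth=20)
at_1x = per_gene_coverage(toy_depths, min_depth=1)
print(at_20x)  # {'GENE_A': 0.75, 'GENE_B': 0.4}
print(at_1x)   # {'GENE_A': 1.0, 'GENE_B': 0.8}

# Genes with incomplete 20X coverage would be candidates for Sanger follow-up.
needs_follow_up = sorted(g for g, frac in at_20x.items() if frac < 1.0)
print(needs_follow_up)
```

Reporting this fraction per gene is one way to make the extent of coverage explicit, as the abstract recommends for clinical use.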
While genomic sequencing methods are powerful tools in the discovery of the genetic underpinnings of human disease, incidentally revealed novel genomic risk factors may be equally important, both scientifically and as they relate to direct patient care. We performed whole-exome sequencing on a child with VACTERL association who suffered severe post-surgical neonatal pulmonary hypertension, and identified a potential novel genetic risk factor for this complication: a heterozygous mutation in CPSI. Newborn screening results from this patient’s monozygotic twin provided evidence that this mutation, in combination with an environmental trigger (in this case, surgery), may have resulted in pulmonary artery hypertension due to inadequate nitric oxide production. Identification of this genetic risk factor allows for targeted medical preventative measures in this patient as well as relatives with the same mutation, and illustrates the power of incidental medical information unearthed by whole-exome sequencing.
Whole-exome sequencing; CPSI; pulmonary artery hypertension; VACTERL
G protein-coupled receptors (GPCRs), the largest human gene family, are important regulators of signaling pathways. However, knowledge of their genetic alterations is limited. In this study, we used exon capture and massively parallel sequencing methods to analyze the mutational status of 734 GPCRs in melanoma. This investigation revealed that one family member, GRM3, was frequently mutated and that one of its mutations clustered within one position. Biochemical analysis of GRM3 alterations revealed that mutant GRM3 selectively regulated the phosphorylation of MEK, leading to increased anchorage-independent growth and migration. Melanoma cells expressing mutant GRM3 had reduced cell growth and cellular migration after short hairpin RNA–mediated knockdown of GRM3 or treatment with a selective MEK inhibitor, AZD-6244, which is currently being used in phase 2 clinical trials. Our study yields the most comprehensive map of genetic alterations in the GPCR gene family.
We report an early-onset spastic ataxia-neuropathy syndrome in two brothers of a consanguineous family characterized clinically by lower extremity spasticity, peripheral neuropathy, ptosis, oculomotor apraxia, dystonia, cerebellar atrophy, and progressive myoclonic epilepsy. Whole-exome sequencing identified a homozygous missense mutation (c.1847G>A; p.Y616C) in AFG3L2, encoding a subunit of an m-AAA protease. m-AAA proteases reside in the mitochondrial inner membrane and are responsible for removal of damaged or misfolded proteins and proteolytic activation of essential mitochondrial proteins. AFG3L2 forms either a homo-oligomeric isoenzyme or a hetero-oligomeric complex with paraplegin, a homologous protein mutated in hereditary spastic paraplegia type 7 (SPG7). Heterozygous loss-of-function mutations in AFG3L2 cause autosomal-dominant spinocerebellar ataxia type 28 (SCA28), a disorder whose phenotype is strikingly different from that of our patients. As defined in yeast complementation assays, the AFG3L2Y616C gene product is a hypomorphic variant that exhibited oligomerization defects in yeast as well as in patient fibroblasts. Specifically, the formation of AFG3L2Y616C complexes was impaired, both with itself and to a greater extent with paraplegin. This produced an early-onset clinical syndrome that combines the severe phenotypes of SPG7 and SCA28, in addition to other “mitochondrial” features such as oculomotor apraxia, extrapyramidal dysfunction, and myoclonic epilepsy. These findings expand the phenotype associated with AFG3L2 mutations and suggest that AFG3L2-related disease should be considered in the differential diagnosis of spastic ataxias.
Mitochondria are cellular organelles important for converting sugar or fats into energy that cells can use for their functions and survival. Many neurological diseases are the result of mitochondrial dysfunction as affected cells are unable to cope with lowered energy supplies and increased oxidative stress. These deficiencies cause accumulation of cellular damage and eventually cell death. Spastic ataxias are neurological disorders involving cells with large energy requirements, the cerebellar Purkinje cells and the cerebral upper motor neurons. When these cells function improperly or die, individuals develop symptoms of incoordination (ataxia) and abnormal muscle tone in their legs (spastic paraplegia). Using emerging techniques of whole-exome sequencing we discovered that homozygous mutations in the AFG3L2 gene caused spastic ataxia in two brothers of a consanguineous family. AFG3L2 encodes a subunit of mitochondrial matrix proteases (m-AAA proteases) that regulate the functional integrity of mitochondria. Heterozygous mutations in AFG3L2 were previously found to cause a disorder involving the Purkinje cells of the cerebellum resulting in ataxia. Interestingly, another isoform of m-AAA proteases consists of AFG3L2 complexing with paraplegin, a similar protein associated with a hereditary spastic paraplegia. Our analysis provides insight into why different mutations in m-AAA protease subunits cause different neurological disorders.
Ciliary dysfunction leads to a broad range of overlapping phenotypes, collectively termed ciliopathies. This grouping is underscored by genetic overlap, where causal genes can also contribute modifying alleles to clinically distinct disorders. Here we show that mutations in TTC21B/IFT139, encoding a retrograde intraflagellar transport (IFT) protein, cause both isolated nephronophthisis (NPHP) and syndromic Jeune Asphyxiating Thoracic Dystrophy (JATD). Moreover, although systematic medical resequencing of a large, clinically diverse ciliopathy cohort and matched controls showed a similar frequency of rare changes, in vivo and in vitro evaluations unmasked a significant enrichment of pathogenic alleles in cases, suggesting that TTC21B contributes pathogenic alleles to ∼5% of ciliopathy patients. Our data illustrate how genetic lesions can both be causally associated with diverse ciliopathies and interact in trans with other disease-causing genes, and highlight how saturated resequencing followed by functional analysis of all variants informs the genetic architecture of disorders.
ATAD5, the human ortholog of yeast Elg1, plays a role in PCNA deubiquitination. Since PCNA modification is important to regulate DNA damage bypass, ATAD5 may be important for suppression of genomic instability in mammals in vivo. To test this hypothesis, we generated heterozygous (Atad5+/m) mice that were haploinsufficient for Atad5. Atad5+/m mice displayed high levels of genomic instability in vivo, and Atad5+/m mouse embryonic fibroblasts (MEFs) exhibited molecular defects in PCNA deubiquitination in response to DNA damage, as well as DNA damage hypersensitivity and high levels of genomic instability, apoptosis, and aneuploidy. Importantly, 90% of haploinsufficient Atad5+/m mice developed tumors, including sarcomas, carcinomas, and adenocarcinomas, between 11 and 20 months of age. High levels of genomic alterations were evident in tumors that arose in the Atad5+/m mice. Consistent with a role for Atad5 in suppressing tumorigenesis, we also identified somatic mutations of ATAD5 in 4.6% of sporadic human endometrial tumors, including two nonsense mutations that resulted in loss of proper ATAD5 function. Taken together, our findings indicate that loss-of-function mutations in mammalian Atad5 are sufficient to cause genomic instability and tumorigenesis.
Genomic instability is a hallmark of tumorigenesis, suggesting that mutations in genes suppressing genomic instability contribute to this phenotype. In this study, we demonstrate for the first time that haploinsufficiency for Atad5, a protein that is important in stabilizing stalled DNA replication forks by regulating PCNA ubiquitination during DNA damage bypass, predisposes >90% of mice to tumorigenesis in multiple organs. In heterozygous Atad5 mice, both somatic cells and the spontaneous tumors showed high levels of genomic instability. In a subset of sporadic human endometrial tumors, we identified heterozygous loss-of-function somatic mutations in the ATAD5 gene, consistent with the role of mouse Atad5 in suppressing tumorigenesis. Collectively, our findings suggest that ATAD5 may be a novel tumor suppressor gene.
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable for a molecular-level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that the interacting domain pair(s) for a given interaction exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain–domain interactions. Given a protein–protein interaction, the proposed method predicts the domain pair(s) most likely to mediate the protein interaction. We applied this method to the yeast interactome to predict domain–domain interactions, and used known domain–domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain–domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.
co-evolution; protein–protein interaction; domain–domain interaction; MLE, Maximum Likelihood Estimation; PDB, Protein Data Bank; RCDP, Relative Co-evolution of Domain Pairs; SLA, Sequence Lengths Assigned; DPEA, Domain Pair Exclusion Analysis; RDFF, Random Decision Forest Framework
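The idea of ranking candidate domain pairs by their degree of co-evolution can be illustrated with a mirror-tree style score: for each domain, take the matrix of pairwise evolutionary distances across the same set of species, and correlate the two matrices. This is a stand-in chosen for illustration; the abstract above does not spell out how its co-evolution measure is computed, so the Pearson-correlation score, the function names, and the toy distance matrices below are all assumptions rather than the authors' method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def most_coevolved_pair(domains_a, domains_b):
    """Score every domain pair across two proteins; return (best pair, all scores).

    Each argument maps a domain name to a species-by-species distance matrix,
    with species in the same order in every matrix (an assumption of this sketch).
    """
    scores = {
        (name_a, name_b): pearson(upper_triangle(m_a), upper_triangle(m_b))
        for name_a, m_a in domains_a.items()
        for name_b, m_b in domains_b.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical distance matrices over four species.
protein_a = {"A1": [[0.0, 0.1, 0.4, 0.5],
                    [0.1, 0.0, 0.3, 0.6],
                    [0.4, 0.3, 0.0, 0.2],
                    [0.5, 0.6, 0.2, 0.0]]}
protein_b = {"B1": [[0.0, 0.2, 0.8, 1.0],   # tracks A1 (distances doubled)
                    [0.2, 0.0, 0.6, 1.2],
                    [0.8, 0.6, 0.0, 0.4],
                    [1.0, 1.2, 0.4, 0.0]],
             "B2": [[0.0, 0.6, 0.2, 0.1],   # does not track A1
                    [0.6, 0.0, 0.4, 0.1],
                    [0.2, 0.4, 0.0, 0.5],
                    [0.1, 0.1, 0.5, 0.0]]}

best, scores = most_coevolved_pair(protein_a, protein_b)
print(best)
```

A raw correlation of this kind is inflated by the shared species phylogeny, which is one reason a relative measure across the domain pairs of a single interaction, as the abstract describes, is more informative than an absolute threshold.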