Exome sequencing is revolutionizing Mendelian disease gene identification. This results in improved clinical diagnosis, more accurate genotype-phenotype correlations and new insights into the role of rare genomic variation in disease.
The identification of the causative mutation for a Mendelian disease enables molecular diagnosis and carrier testing in the patient and his or her family. This is of great importance for patient management and family counseling, and serves as a starting point for therapeutic interventions. Furthermore, the identification of Mendelian disease genes contributes to our understanding of gene functions and biological pathways underlying health and disease in general, and lessons learned from rare diseases are often also relevant to common disease. Research aimed at the identification of genes that cause Mendelian disease has received a boost over the past couple of years from the introduction of new technologies that enable the sequencing of DNA at a much higher throughput and at much lower cost than previously possible.
Although traditional gene mapping approaches (such as karyotyping, linkage analysis, homozygosity mapping and copy number variation (CNV) analysis) have led to great insights into Mendelian disease over the past few decades (Figure 1), they are unable to detect all forms of genomic variation (Table 1). The approach applied depends on whether the disease is, for example, caused by single nucleotide mutations or by CNVs, which is difficult to predict in advance. In addition, mapping approaches often did not reduce the number of candidate genes sufficiently for straightforward follow-up by Sanger sequencing. For example, genome-wide single nucleotide polymorphism analysis in a large Dutch pedigree with autosomal-dominant familial exudative vitreoretinopathy (FEVR, MIM 613310), a retinal disorder, identified a linkage peak of about 40 Mb on chromosome 7, containing more than 300 genes. Even after adding linkage data from a second FEVR family the region was still too large for straightforward disease-gene identification, and Sanger sequencing of a few candidate genes did not identify causative mutations. Next generation sequencing (NGS) has the potential to identify all kinds of genetic variation at base-pair resolution throughout the human genome in a single experiment. This can be performed much faster and more cost efficiently than with traditional techniques (the sequencing of a genome by traditional techniques needed many years and cost millions of dollars, whereas NGS technology can sequence a genome for less than $7,000 and within a week). This enables the detailed genomic analysis of large numbers of patients. In the case of the two families with FEVR, we used next generation sequencing to investigate the entire coding sequence of the 40-Mb region in a single affected individual from the first family and identified mutations in tetraspanin 12 (TSPAN12) as the cause of FEVR in both families and in three additional families.
For most Mendelian disorders, however, there is no disease locus known and an unbiased approach is required.
There are two unbiased sequencing approaches for detecting genetic variation within an individual: whole genome sequencing and whole exome sequencing. Whole genome sequencing is the ultimate approach for detecting all genomic variation in a patient's genome in a single experiment. However, current NGS instruments are limited in terms of throughput and cost efficiency, so this approach is currently restricted to gene discovery projects in large genome sequencing centers and service companies. More cost-efficient sequencing strategies have been developed to study the approximately 1% of our genome that is protein-coding (the exome), by using various capture approaches to enrich for these sequences before NGS. Exome sequencing has rapidly become one of the main tools for studying the genetic causes of Mendelian disease because academic groups with access to only one or two NGS systems can use this approach to study the exomes of hundreds of patients with Mendelian diseases per year, and the bioinformatic challenges are modest when compared with whole genome sequencing. Although sequencing a complete genome may take only 1 week on a single machine, one can sequence more than 20 exomes in the same time. Since November 2009, exome sequencing has led to the identification of over 30 new genes in Mendelian diseases (Table 2) [14,15]. Although publication bias makes it difficult to assess the actual success rate of these approaches, from our own studies we estimate that whole exome sequencing identifies the major disease gene in at least 50% of the projects focused on rare but clinically well-defined Mendelian diseases (our unpublished data). The rapid expansion of Mendelian disease genes by exome sequencing is providing new insights because technological limitations have probably biased our current knowledge. Here, we discuss how our view of Mendelian disease is changing as a result of whole exome sequencing experiments and a limited number of whole genome sequencing approaches.
In the past 2 years we have seen many proof-of-concept studies using exome or genome sequencing to identify new disease genes for recessive and dominant disorders. These publications paint a mixed picture of phenotypes, genes and mutations underlying Mendelian disease (Figure 1, Table 2). There is a bias towards recessive disorders as their genetic causes are easier to identify than those of dominant disorders. This is because genes carrying rare homozygous or compound heterozygous variants are infrequent in the unaffected population, so they can easily be prioritized for follow-up. In addition, past sample collection has mainly focused on familial cases with recessive inheritance.
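The prioritization logic for recessive disorders can be illustrated with a minimal Python sketch. It is not taken from any of the studies discussed here: the variant records, their field names (`gene`, `zygosity`, `pop_freq`) and the 1% frequency cutoff are all assumptions chosen for illustration. The sketch keeps only rare variants and reports genes with at least one homozygous variant or at least two heterozygous variants.

```python
from collections import defaultdict


def recessive_candidates(variants, max_pop_freq=0.01):
    """Return genes that fit a simple recessive model in one patient.

    `variants` is an iterable of dicts with the hypothetical keys
    'gene', 'zygosity' ('hom' or 'het') and 'pop_freq'
    (allele frequency in an unaffected reference population).
    """
    per_gene = defaultdict(lambda: {"hom": 0, "het": 0})
    for v in variants:
        # Common variants are unlikely to cause a rare Mendelian disease.
        if v["pop_freq"] > max_pop_freq:
            continue
        per_gene[v["gene"]][v["zygosity"]] += 1

    candidates = {}
    for gene, counts in per_gene.items():
        if counts["hom"] >= 1:
            candidates[gene] = "homozygous"
        elif counts["het"] >= 2:
            # Two heterozygous variants are only a *possible* compound
            # heterozygote: they may lie on the same parental allele.
            candidates[gene] = "possible compound heterozygous"
    return candidates


example = [
    {"gene": "GENE_A", "zygosity": "hom", "pop_freq": 0.0},
    {"gene": "GENE_B", "zygosity": "het", "pop_freq": 0.001},
    {"gene": "GENE_B", "zygosity": "het", "pop_freq": 0.0},
    {"gene": "GENE_C", "zygosity": "het", "pop_freq": 0.0},
    {"gene": "GENE_D", "zygosity": "hom", "pop_freq": 0.2},
]
print(recessive_candidates(example))
```

In this made-up example only `GENE_A` and `GENE_B` survive. Confirming a true compound heterozygote would additionally require showing that the two variants were inherited from different parents, which this sketch does not attempt.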
From a review of the Mendelian diseases studied (Table 2), it is clear that not every whole exome sequencing experiment will result in the identification of a new disease gene. There are several examples in which the underlying genetic cause was not evident from the phenotype, yet whole exome sequencing revealed mutations in a known disease gene. For example, Choi et al. identified mutations in the gene solute carrier family 26, member 3 (SLC26A3), encoding an epithelial Cl-/HCO3- exchanger, in a case with the initial differential diagnosis of Bartter syndrome, a renal salt-wasting disease. After these mutations were identified the clinical diagnosis was re-evaluated and changed to congenital chloride diarrhea (CLD, MIM 214700), a disease that was already known to result from mutations in this gene. Similarly, Worthey et al. reported a case in which a diagnosis of intractable inflammatory bowel disease (MIM 266600) was initially missed but was revealed after the identification of a missense mutation in XIAP, the X-linked inhibitor of apoptosis gene, by exome sequencing. Both these examples [16,18] show that unbiased whole exome sequencing can have an enormous impact on patient management by assisting clinicians in making the proper diagnosis, a phenomenon known as reverse phenotyping.
Mutations in genes that are known to cause disease will also be identified frequently when whole exome or genome sequencing is applied to genetically heterogeneous diseases that can be caused by monogenic mutations in many different genes. Before whole exome approaches, the technological limitations of Sanger sequencing often did not allow routine analysis of all known disease genes in patients with genetically heterogeneous disorders. The most prominent example of this was the identification of mutations in the SH3 domain and tetratricopeptide repeat domain 2 gene (SH3TC2, a gene known to cause neuropathy) as the cause of Charcot-Marie-Tooth neuropathy (MIM 601596) in a family by whole genome sequencing. An unbiased base-pair resolution approach can also reveal mutations in multiple genes that jointly explain a combination of two Mendelian phenotypes. For example, a study identified mutations in dihydroorotate dehydrogenase (DHODH) and dynein, axonemal, heavy chain 5 (DNAH5) in two siblings as the explanation of the combined phenotype of Miller syndrome (postaxial acrofacial dysostosis; MIM 263750) and primary ciliary dyskinesia, respectively. Traditional mapping approaches would probably have missed the mutations in DNAH5 as these were unique to this sibling pair and not present in other patients with Miller syndrome, which severely complicates mapping. Unbiased whole exome sequencing, on the other hand, identifies all variants and allows for detailed analysis of individual cases and families. This is advantageous because the clinical spectrum of a disease can often be wider than previously appreciated, and multiple mutations might jointly explain a more complex phenotype.
Genetic heterogeneity, in which a disease can be caused by mutations in different genes, has long been recognized to occur in Mendelian diseases. For example, in the case of Fanconi anemia (MIM 227650), a myelodysplastic disorder with chromosome breakage affecting all bone marrow elements and associated with cardiac, renal and limb malformations, mutations in many genes give rise to the exact same phenotype. Conversely, it is also clear that different mutations in the same gene can result in completely different phenotypes, as is the case for tumor protein p63, in which different mutations can lead to several monogenic malformation syndromes. This extreme form of phenotypic variability suggests completely different biological pathways are involved.
Unbiased whole exome and/or genome sequencing has identified further examples of unrelated phenotypes caused by different mutations in the same gene. Sobreira et al. sequenced the complete genome of a single patient with metachondromatosis (MIM 156250), a skeletal dysplasia. They identified pathogenic loss-of-function mutations in protein tyrosine phosphatase, nonreceptor-type 11 (PTPN11), for which gain-of-function mutations are known to cause Noonan syndrome (MIM 163950), characterized by short stature, typical facial features and pulmonary stenosis. Another example comes from Norton et al., who identified both truncating and missense mutations in Bcl2-associated athanogene 3 (BAG3) in patients with dilated cardiomyopathy (MIM 115200), characterized by cardiac dilatation and reduced systolic function. A missense variant in this gene was previously known to cause myofibrillar myopathy (MIM 612954), a strikingly different phenotype characterized by skeletal muscle weakness associated with cardiac conduction blocks, arrhythmias, and restrictive heart failure. The authors also observed phenotypic variability in a zebrafish model, where translation initiation blocking of the whole gene gave a single phenotype of heart failure. By contrast, a second morpholino oligonucleotide that splices out exon 2 resulted in axis curvature that might be analogous to skeletal myopathy. This observation led the authors to suggest fundamental differences in mechanisms of disease. Similarly, Pierce et al. observed mutations in HSD17B4, which encodes 17β-hydroxysteroid dehydrogenase type 4, in two adult patients with Perrault syndrome, characterized by ovarian dysgenesis and deafness. Different classes of HSD17B4 mutations have previously been associated with three different types of D-bifunctional protein (DBP) deficiency (MIM 261515), which is generally fatal within the first 2 years of life.
The authors  suggested that the specific mutations they identified lead to a much milder phenotype that allows patients to survive to puberty and causes ovarian dysgenesis in females in addition to the known neurological defects associated with DBP. Using traditional approaches, these genes might not have been considered likely candidates for the disease.
In addition, for some novel disease genes the type of mutation may be specific to the disease in which these were observed. In two patients with Sensenbrenner syndrome, an autosomal recessive disorder characterized by ectodermal features, craniosynostosis and hypodontia, whole exome sequencing revealed a missense and a nonsense mutation in WD-repeat-containing protein 35 (WDR35) . It may well be that two missense mutations result in a milder phenotype, whereas other mutations could result in a more severe phenotype, as confirmed by a subsequent study of patients with a lethal short rib polydactyly syndrome .
Many dominant Mendelian disorders occur sporadically because the severity and early onset of the disorder preclude transmission to subsequent generations. Consequently, there are no families available for genetic studies. Genetic variants associated with these diseases are under strong negative selection and will be rapidly eliminated from the genetic pool . For these reasons, many genes that cause 'sporadic' disease remain to be identified. Important new insights have come from studying these sporadic forms of Mendelian disease by whole exome sequencing.
Exome sequencing first identified de novo mutations causing rare syndromic forms of dominant sporadic Mendelian disease, such as Schinzel-Giedion syndrome (MIM 269150) and Kabuki syndrome (MIM 147920) [29,30]. Both disorders are characterized by intellectual disability and typical facial features, and were anticipated to be largely caused by mutations in a single gene, which facilitated interpretation of exome sequencing data. Multiple unrelated patients with the same syndrome were sequenced and variant prioritization was focused on genes showing severe mutations in multiple if not all patients. For Schinzel-Giedion syndrome this resulted in the identification of heterozygous mutations in SETBP1 (encoding the SET binding protein 1, a histone-lysine N-methyltransferase) in 12 out of 13 patients tested . All mutations in SETBP1 occurred de novo in the patient and were not detected in DNA from the unaffected parents. The disease mutations clustered in a genomic stretch of just 11 nucleotides, affecting three of four consecutive amino acids. Interestingly, individuals with partial chromosome 18 deletions affecting SETBP1 do not show clinical overlap with Schinzel-Giedion syndrome. Collectively, this indicates that these mutations are likely to confer a gain of function. Exome sequencing is particularly useful for identifying these types of mutations for which no other genome-wide approach is applicable. We suspect that gain-of-function mutations are largely underrepresented in the current databases, and that many more will be detected by exome sequencing.
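The overlap strategy described above, in which genes are prioritized when they carry severe mutations in several unrelated patients, can be sketched in a few lines of Python. This is an illustrative sketch only: the input format (a mapping from a patient identifier to the genes carrying a severe rare variant in that patient), the function name and the gene labels are assumptions, not the pipeline used in the cited studies.

```python
from collections import Counter


def rank_shared_genes(patient_genes, min_patients=2):
    """Rank genes by the number of unrelated patients in which they are hit.

    `patient_genes` maps a patient ID to an iterable of gene names that
    carry at least one severe, rare variant in that patient.
    """
    counts = Counter()
    for genes in patient_genes.values():
        # Use a set so a gene with several variants in one patient
        # is counted only once for that patient.
        counts.update(set(genes))
    shared = [(gene, n) for gene, n in counts.items() if n >= min_patients]
    # Most frequently hit genes first; ties broken alphabetically.
    return sorted(shared, key=lambda item: (-item[1], item[0]))


cohort = {
    "patient_1": ["GENE_A", "GENE_X"],
    "patient_2": ["GENE_A", "GENE_Y", "GENE_Y"],
    "patient_3": ["GENE_A", "GENE_X"],
}
print(rank_shared_genes(cohort))
```

The approach works best when, as anticipated for the two syndromes above, most patients carry mutations in the same gene; under strong genetic heterogeneity few genes would be shared and the threshold `min_patients` would need to be relaxed.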
After the detection of de novo mutations in rare Mendelian disorders, subsequent studies focused on their roles in common neurodevelopmental disorders, such as intellectual disability  and autism (MIM 209850) . The high population frequency of these disorders has been hypothesized to reflect de novo mutations that compensate for allele loss due to severely reduced fecundity. If this were the case, the frequency of these disorders would reflect the size of the mutational target; disease caused by de novo mutations in a single gene will be very rare, whereas disorders caused by de novo mutations in many genes can occur at a high prevalence (Figure (Figure2).2). In other words, the complexity of many common diseases might be primarily due to genetic heterogeneity, with novel defects in different genes causing the same disease . This would explain both the genetic and the clinical heterogeneity observed for mental illnesses and would explain why these disorders have a low recurrence risk . To investigate this hypothesis, Vissers et al.  sequenced the exome of ten patients with intellectual disability and their healthy parents. After filtering out all inherited genomic variation, nine de novo mutations were identified in ten patients, between none and two per patient. Two de novo mutations were observed in genes previously linked to intellectual disability, and an analysis of the gene function and the mutation type indicated that an additional four de novo mutations in novel genes are likely to be pathogenic.
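The trio filtering step, in which all inherited variation is removed to leave candidate de novo mutations, amounts to a set difference. The sketch below assumes each individual's variants are represented as a set of `(chromosome, position, reference allele, alternative allele)` tuples; this representation, the function name and the coordinates are hypothetical and are not taken from the study described.

```python
def de_novo_candidates(child, mother, father):
    """Return variants seen in the child but in neither parent.

    Each argument is a set of (chrom, pos, ref, alt) tuples.
    The result is sorted for reproducible output.
    """
    inherited_pool = mother | father
    return sorted(child - inherited_pool)


# Made-up coordinates, for illustration only.
child = {("1", 1000, "A", "G"), ("2", 2000, "C", "T"), ("7", 3000, "G", "A")}
mother = {("1", 1000, "A", "G")}
father = {("2", 2000, "C", "T"), ("9", 4000, "T", "C")}

print(de_novo_candidates(child, mother, father))
```

A variant can also appear to be de novo simply because it was missed in a parent (for example, because of low coverage) or because it is a sequencing error in the child, so candidates from such a filter still need independent validation in the child and both parents.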
Similar results were recently reported by O'Roak et al. , who studied 20 patients with sporadic autism using the same approach . Related work has also been reported for patients with schizophrenia (MIM 181500), although not yet using unbiased whole genome or exome approaches. In this case the authors  performed Sanger sequencing for 401 synapse-expressed genes in 143 patients with schizophrenia and identified eight de novo mutations, two of which were in SH3 and multiple ankyrin repeat domains 3 (SHANK3), a gene known to cause schizophrenia [35,36]. If confirmed in larger follow-up studies, this would indicate that a significant proportion of these common disorders is caused by rare de novo mutations. This would provide strong support for the hypothesis that rare variants substantially contribute to common diseases and would also represent a change of focus in Mendelian genetic disease research, where the emphasis has been on studying familial forms of disease.
Severe early onset disorders, such as intellectual disability, autism and childhood schizophrenia, are clearly the first group of disorders on which to test this de novo mutation hypothesis, as inherited mutations are less likely to have a major role in these disorders because of the reduced fitness of those affected by the disease. The contribution of de novo mutations to adult-onset diseases with less impact on fertility is expected to be lower. However, we note that this is mainly dependent on the size of the mutational target and remains to be determined. A practical problem here is that proving de novo occurrence requires DNA from both parents, who are often no longer alive when their offspring develop a late-onset disease. Related to this, Conrad et al. recently published the first genome sequencing study in two healthy trios of offspring with both their parents and validated 49 de novo germline mutations in one offspring and 35 in the other. This study also highlighted the fact that cell lines are not the preferred material for this analysis, as 952 and 643 non-germline de novo mutations, respectively, were observed in the two offspring; these are likely to have been caused by cell line creation and culturing. Given the apparent role of rare de novo mutations in disease, it is evident that we need to learn much more about the general occurrence of these mutations in our population and perform detailed comparisons of de novo mutations in patients versus controls.
Although some studies support the argument that a significant proportion of common disorders (such as intellectual disability, autism and schizophrenia) represent an accumulation of rare disorders, true multigenic/complex diseases can also benefit from exome sequencing. Until NGS studies of the size of genome-wide association studies are broadly applicable and affordable (that is, until one can run and analyze 1,000 exomes routinely), insights can be obtained by studying small cohorts from the extreme ends of the phenotypic spectrum of common traits, because the Mendelian forms of these traits are expected to be overrepresented in this group. The first example of this approach was provided by whole exome sequencing in patients with extremely low low-density lipoprotein (LDL) cholesterol levels, which identified mutations in ANGPTL3. This gene is expressed primarily in the liver and encodes the secreted angiopoietin-like 3 protein. ANGPTL3 has a role in lipoprotein lipase and endothelial lipase inhibition, thereby increasing plasma triglyceride and high-density lipoprotein (HDL) cholesterol levels. The very low LDL levels in these individuals could therefore be explained by two mutated alleles for this gene.
Although only a few exome studies have been conducted so far (Table (Table2),2), these have already provided new insights into human gene networks. For example, it was known through classical disease gene identification that histone modifiers have an important role in human developmental diseases. Haploinsufficiency of histone methyltransferases, such as NSD1 (encoding nuclear receptor binding SET domain protein 1), NSD2 (encoding nuclear SET domain-containing protein 2 and also called Wolf-Hirschhorn syndrome candidate 1, WHSC1) and EHMT1 (encoding euchromatic histone-lysine N-methyltransferase 1), cause several congenital diseases . Exome sequencing studies have confirmed the importance of this class of genes. Among the few exome studies that identified dominant genes for Mendelian disorders, three disease genes function in histone modification: (i) mutations in the histone methyltransferase gene MLL2 (H3K4me) have been shown to be the cause of Kabuki syndrome ; (ii) point mutations in SETBP1 - encoding the SET binding protein 1, a histone-lysine N-methyltransferase - have been identified as the cause of Schinzel-Giedion syndrome ; (iii) most recently, ASXL1, encoding the protein Additional sex-combs-like 1, was implicated in Bohring-Opitz syndrome (MIM 605039), characterized by severe malformations and intellectual disability . ASXL1 is an interactor of lysine-specific demethylase 1 (LSD1) and therefore ASXL1 can be considered another histone modification gene .
Histone modifiers might have a dual role in disease. Although germline mutations in these genes can lead to developmental disorders, somatic mutations have been reported in leukemias and other malignancies. Translocations of NSD1 to NUP98 (encoding nucleoporin 98 kDa) and of NSD2/WHSC1 to IgH occur in some hematologic malignancies . In addition, SETBP1 translocations with NUP98 have been described in pediatric acute T-cell lymphoblastic leukemia , and somatic mutations in ASXL1 occur in several forms of leukemia [45,46]. Further credence for this dual role for genes involved in histone modification has been given by recent studies. It was already anticipated that the DNMT3A, DNMT3B and DNMT3L proteins were primarily responsible for the establishment of genomic DNA methylation patterns and should therefore have an important role in human developmental, reproductive and mental health . Exome sequencing identified DNMT3A (encoding DNA methyltransferase 3A) mutations in monocytic leukemia, confirming the suspected link with cancer development . Exome sequencing also led to the identification of mutations in DNMT1 that cause both central and peripheral neurodegeneration in a form of hereditary sensory and autonomic neuropathy with dementia and hearing loss . These exome sequencing findings add to a growing list of genes for which somatic mutations have been identified in malignancies and germline mutations in developmental disorders.
The interpretation of missense variants in Mendelian diseases is challenging. A common method to address pathogenicity is to assume that purifying selection 'constrains' evolutionary divergence at phenotypically important nucleotides and amino acids . Following an earlier suggestion , exome studies have now confirmed that pathogenic missense variants indeed tend to affect highly conserved nucleotides (using GERP, PhyloP or PhastCons scores), or amino acids (by multiple sequence alignment). All scores in some way measure the difference between the number of nucleotide substitutions that have occurred at a site during evolution and the number of substitutions that are expected from neutral evolution. As an example, the nucleotides affected by de novo mutations in SETBP1 that cause Schinzel-Giedion syndrome are among the most highly conserved bases in the genome, according to PhyloP vertebrate conservation scores . This observation is unlikely to be explained by an ascertainment bias, as evolutionary conservation is generally used only to strengthen the existing evidence of pathogenicity, rather than to prioritize variants for follow-up.
Exome studies have revealed mutations causing Mendelian disease in more than 30 genes, many of which had no previously known function. This knowledge can help to identify essential biological pathways disrupted in these often severe disorders. For example, Haack et al. obtained a promising clinical response to a multivitamin scheme including daily riboflavin treatment in a patient with a complex I deficiency (MIM 252010) after whole exome sequencing revealed mutations in ACAD9, a member of the mitochondrial acyl-CoA dehydrogenase protein family. Another example involves SERPINF1, in which mutations were detected by whole exome sequencing in patients with osteogenesis imperfecta. There is much to learn from the first gene therapy trials with pigment epithelium-derived factor (PEDF, encoded by SERPINF1). PEDF is thought to counteract the effect of vascular endothelial growth factor (VEGF) - a signal protein produced by cells that stimulates vasculogenesis and angiogenesis - and trials are dedicated to the treatment of the wet form of age-related macular degeneration. In a mouse model of ischemia-induced retinal angiogenesis, PEDF eliminated aberrant neovascularization, which suggests that PEDF might have potential in treating osteogenesis imperfecta. Finally, Züchner et al. used whole exome sequencing to identify mutations in dehydrodolichyl diphosphate synthase (DHDDS) in patients with retinitis pigmentosa, linking this disease to N-linked glycosylation pathways and suggesting new possibilities for therapeutic interventions. Although the number of these promising examples is small, one might foresee that several clinical conditions will be better understood after the identification of the underlying disease gene, and in some cases this will enable a usable therapy.
Exome sequencing and, in a few cases, genome sequencing, has significantly progressed the field of Mendelian disease in the past 2 years. The unbiased nature of these approaches is providing significant insights into the genetic causes of Mendelian disease in general, and of sporadic disease in particular, by revealing rare de novo mutations as a common cause of disease. This approach is crucial for drawing accurate genotype-phenotype correlations and will undoubtedly improve diagnosis for the millions of individuals with Mendelian disease, improve family counseling and reveal new therapeutic targets. The next challenge in disease research will be to systematically study the role of variation in the non-coding part of our genome in health and disease. The study of Mendelian diseases will be crucial in this endeavor, as they offer the advantage of high penetrant mutations, the ability to perform family studies and look for segregation of variation with disease, and the possibility of finding recurrent mutations in unrelated patients with similar phenotypes.
This work was financially supported by the Netherlands Organization for Health Research and Development (ZonMW grants 917-66-363 and 911-08-025 to JAV), the EU-funded TECHGENE project (Health-F5-2009-223143 to JAV) and the AnEUploidy project (LSHG-CT-2006-37627 to AH, HGB and JAV).