We show that the sequencing of the exomes of affected individuals from a few unrelated kindreds, with appropriate filtering against public SNP databases and a small number of HapMap exomes, is sufficient to identify a single candidate gene for a previously unsolved monogenic disorder, Miller syndrome. Several factors were important to the success of this study. First, Miller syndrome is a very rare disorder that is inherited in an autosomal recessive pattern. Therefore, the causal variants were unlikely to be found in public SNP databases or control exomes. Second, genes for recessive diseases will, in general, be easier to find than genes for dominant disorders because fewer genes in any single individual have two or more novel or rare nonsynonymous variants. Third, we were fortunate that there was no genetic heterogeneity in our sample of Miller cases. In the presence of heterogeneity, it is possible to relax stringency by allowing genes common to subsets of the affected individuals to be considered candidates, although this will reduce power (). Fourth, all of the individuals with Miller syndrome for whom exomes were sequenced were of European ancestry. Sequencing exomes of affected individuals sampled from populations with a different geographical ancestry who have a higher number of novel and/or rare variants (e.g., sub-Saharan African, East Asian) will make the identification of candidate genes more difficult. This will become less of an issue as databases of human polymorphisms become increasingly comprehensive.
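The filtering logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this study: the function name `candidate_genes`, the `(gene, variant_id, allele_count)` tuple layout, and the toy gene and variant identifiers are all hypothetical. A variant is treated as novel if it is absent from a set of known variants (standing in for dbSNP plus control exomes), a gene qualifies in a case if it carries at least two novel alleles, and the `min_cases` parameter models the relaxed stringency that could be used under genetic heterogeneity.

```python
from collections import Counter, defaultdict


def candidate_genes(cases, known_variants, min_alleles=2, min_cases=None):
    """Return genes carrying >= min_alleles novel alleles in >= min_cases cases.

    cases: dict mapping case id -> list of (gene, variant_id, allele_count)
           nonsynonymous calls (allele_count is 1 for het, 2 for hom).
    known_variants: set of variant ids seen in public databases or controls.
    min_cases: defaults to all cases (no heterogeneity allowed).
    """
    if min_cases is None:
        min_cases = len(cases)
    support = Counter()
    for calls in cases.values():
        novel_alleles = defaultdict(int)
        for gene, variant_id, allele_count in calls:
            if variant_id not in known_variants:  # drop known variants
                novel_alleles[gene] += allele_count
        for gene, n in novel_alleles.items():
            if n >= min_alleles:  # fits a recessive model in this case
                support[gene] += 1
    return sorted(g for g, n in support.items() if n >= min_cases)


# Toy data (entirely hypothetical identifiers).
toy_cases = {
    "case1": [("GENE_A", "v1", 1), ("GENE_A", "v2", 1),
              ("GENE_B", "v3", 2), ("GENE_C", "v9", 1)],
    "case2": [("GENE_A", "v4", 2), ("GENE_B", "v5", 1),
              ("GENE_B", "rs_known", 1)],
    "case3": [("GENE_A", "v1", 1), ("GENE_A", "v6", 1),
              ("GENE_B", "v3", 2)],
}
toy_known = {"rs_known"}

strict = candidate_genes(toy_cases, toy_known)
relaxed = candidate_genes(toy_cases, toy_known, min_cases=2)
```

In this toy example, requiring all three cases leaves a single gene, whereas allowing any two of three cases admits a second gene. This mirrors the loss of power that comes with relaxing stringency.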
Additional factors could facilitate the future application of this strategy. Mapping information, such as blocks of homozygosity, could focus the search on a smaller pool of candidates. The number of candidate variants can also be reduced further by comparing the variants in a case with those found in each parent. For autosomal dominant disorders, this strategy can discover de novo coding variants, as neither parent is predicted to have a mutation that causes a fully penetrant dominant disorder, whereas for recessive disorders, parents are predicted to be carriers of the disease-causing variants.
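The parental comparison could be sketched as follows. This is a simplified illustration under stated assumptions rather than a method used in this study: the function names, the data layout, and the toy identifiers are hypothetical, and phase is inferred only from which parent carries each variant.

```python
from itertools import permutations


def de_novo_variants(child, mother, father):
    """Variants in the child that neither parent carries (dominant model)."""
    return {v for v in child if v not in mother and v not in father}


def recessive_candidate_genes(child, mother, father):
    """Genes where the child's variants fit carrier parents (recessive model).

    child: dict mapping variant id -> (gene, allele_count).
    mother, father: sets of variant ids carried by each parent.
    """
    by_gene = {}
    for variant, (gene, allele_count) in child.items():
        by_gene.setdefault(gene, []).append((variant, allele_count))
    hits = set()
    for gene, variants in by_gene.items():
        # Homozygous variant: both parents must carry it.
        hom = any(n == 2 and v in mother and v in father for v, n in variants)
        # Compound heterozygote: one variant from each parent.
        het_ids = [v for v, n in variants if n == 1]
        compound = any(a in mother and b in father
                       for a, b in permutations(het_ids, 2))
        if hom or compound:
            hits.add(gene)
    return hits


# Toy trio (entirely hypothetical identifiers).
toy_child = {"v1": ("GENE_A", 1), "v2": ("GENE_A", 1),
             "v3": ("GENE_B", 2), "v4": ("GENE_C", 1)}
toy_mother = {"v1", "v3"}
toy_father = {"v2"}

novel_in_child = de_novo_variants(toy_child, toy_mother, toy_father)
recessive_hits = recessive_candidate_genes(toy_child, toy_mother, toy_father)
```

Here only one variant is absent from both parents, and only one gene has a variant inherited from each parent. The homozygous call in the second gene is rejected because only one parent carries it.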
There are at least three aspects of this approach where we see significant scope for improvement. The first relates to missed variant calls, either due to low coverage or because some variants are not identified easily with current sequencing platforms (e.g. within repeat tracts in coding sequences). The second is that our filtering relied on a public SNP database (dbSNP) that is a highly uneven ascertainment of variation across the genome. It would be better to rely on catalogues of common variation that are ascertained in a single study either exome-wide (as with the 8 HapMap exomes [2]) or genome-wide (e.g. as with the 1000 Genomes project), and where estimates of allele frequency are available. Increasing the number of control exomes progressively reduces the relevance of dbSNP to this analysis (Supplementary Figure 2). Furthermore, as increasingly deep catalogs of polymorphism become available, it may be necessary to establish frequency-based thresholds for defining “common” variation that is unlikely to be causal. A third concern is that the specificity of this approach is currently reduced by a subset of genes that recurrently appear enriched for novel variants. These include long genes, but also genes that are subject to systematic technical artifacts (e.g. mis-mapped reads due to duplicated or highly similar sequence in the genome). For sequences that are known to be duplicated or have paralogues (e.g. genes from large gene families, or pseudogenes), these artifacts are mostly removed during read alignment (as reads with non-unique placements are removed from consideration). However, duplicated sequences not represented in the reference genome are not removed and spuriously appear as enriched for novel variants (e.g. CDC27).
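A frequency-based threshold of the kind suggested above might look like the following sketch. The function name, the dictionary of allele frequencies, and the 1% cutoff are assumptions chosen purely for illustration, and the text does not propose a specific value. Variants absent from the catalog are retained, since absence suggests they are novel or very rare.

```python
def rare_variants(variant_ids, allele_freqs, max_freq=0.01):
    """Keep variants that are absent from the catalog or below max_freq.

    allele_freqs: dict mapping variant id -> allele frequency estimate.
    max_freq: hypothetical cutoff for calling a variant "common".
    """
    kept = []
    for variant_id in variant_ids:
        freq = allele_freqs.get(variant_id)  # None if not catalogued
        if freq is None or freq < max_freq:
            kept.append(variant_id)
    return kept


# Toy frequencies (hypothetical values).
toy_freqs = {"v1": 0.20, "v2": 0.001}
kept = rare_variants(["v1", "v2", "v3"], toy_freqs)
```

Unlike a presence/absence filter against dbSNP, a threshold of this kind would retain a catalogued variant whose frequency is low enough to remain a plausible cause of a rare recessive disorder.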
The mechanism by which mutations in DHODH cause Miller syndrome is unclear. The primary known function of dihydroorotate dehydrogenase is to catalyze the conversion of dihydroorotate to orotic acid, an intermediate in the pyrimidine de novo biosynthesis pathway (Supplementary Figure 3). Orotic acid is subsequently converted to uridine monophosphate (UMP) by UMP synthase. Pyrimidine biosynthesis might be particularly sensitive to the step mediated by dihydroorotate dehydrogenase [21], and the classical rudimentary phenotype in D. melanogaster, reported by T.H. Morgan in 1910 and characterized by wing anomalies, defective oogenesis, and malformed posterior legs, is caused by mutations in the same pathway [22-24]. However, the clinical characteristics of the other inborn errors of pyrimidine biosynthesis such as orotic aciduria, caused by mutations in UMP synthase, do not include malformations. Indeed, inborn errors of metabolism are, in general, a rare cause of birth defects, so DHODH would be given little weight a priori as a candidate for a multiple malformation disorder. Thus, the discovery that mutations in DHODH cause Miller syndrome reveals both a new role for pyrimidine metabolism in craniofacial and limb development and a novel function of dihydroorotate dehydrogenase that remains to be explored.
Selective inhibition of pyrimidine or purine biosynthesis has long been used as a therapeutic option to treat various cancers and autoimmune disorders. Leflunomide, a prodrug that is converted in the gastrointestinal tract to the active metabolite, A771726, reduces de novo pyrimidine biosynthesis by selectively inhibiting dihydroorotate dehydrogenase [21]. In mice, use of leflunomide during pregnancy causes a wide range of limb and craniofacial defects, the most common of which are exencephaly, cleft palate, and “open eye”, or failure of the eyelid to close [25]. These phenotypic characteristics recapitulate some of the malformations observed in individuals with Miller syndrome, providing further evidence that it is caused by mutations in DHODH.
The developmental pathways disrupted by leflunomide are unknown, but their elucidation could help explain the mechanism by which DHODH mutations cause malformations. In the liver of mice treated with leflunomide, TNF-α production is repressed by the direct inhibition of NF-κB activity [26]. Interruption of NF-κB signaling during development can result in disrupted cell migration, diminished cellular proliferation, and increased apoptosis [27]. Indeed, open eye is a defect observed in mice with targeted disruption of TNF-α [28]. Furthermore, NF-κB plays an important role in limb morphogenesis, specifically as a transducer of signals that regulate Sonic hedgehog (Shh). Shh controls, in part, anterior-posterior patterning of the digits, and Shh-/- knockout mice fail to form digits 2-5 [29]. These observations suggest that the malformations observed in individuals with Miller syndrome could be caused by perturbed NF-κB signaling due to loss of DHODH.
The pattern of malformations observed in individuals with Miller syndrome is similar to that of individuals with fetal exposure to methotrexate (). Methotrexate is a well-established inhibitor of de novo purine biosynthesis, and its anti-proliferative actions are thought to be due to its inhibition of dihydrofolate reductase and folate-dependent transmethylations. Accordingly, defects of both purine and pyrimidine biosynthesis appear to be capable of causing a similar pattern of birth defects. However, at low doses methotrexate also decreases plasma levels of pyrimidines as well as purines. This observation raises the possibility that methotrexate embryopathy might indeed be caused by its effects on pyrimidine rather than purine metabolism. Given that not all embryos exposed to methotrexate manifest birth defects, functional polymorphisms in DHODH or other genes in the de novo pyrimidine biosynthesis pathway could influence susceptibility to methotrexate embryopathy.
Individuals with Miller syndrome have similar phenotypic characteristics to those with Nager syndrome, another rare monogenic disorder that primarily affects the craniofacial skeleton. In contrast to Miller syndrome, the limb defects observed in individuals with Nager syndrome affect the anterior elements of the upper limb. Nevertheless, it has been hypothesized that Miller and Nager syndromes are caused by different mutations in the same gene. We resequenced DHODH in twelve unrelated individuals diagnosed with Nager syndrome but found no pathogenic mutations (data not shown). Accordingly, either Nager syndrome and Miller syndrome are not allelic, or Nager syndrome is caused exclusively by mutations in regulatory elements that alter the expression of DHODH.
Rare diseases are arbitrarily defined as those that affect fewer than 200,000 individuals in the U.S. Per this definition, more than 7,000 rare diseases have been delineated, and in aggregate these affect more than 25 million people [Rare Diseases Act of 2002, Section 2, Findings]. The majority of these diseases are “genetic disorders” and many of them are thought to be monogenic. The bulk of genes underlying these rare monogenic diseases remain unknown. Lack of information about the genes and pathways that underlie rare monogenic diseases is a major gap in our scientific knowledge. Discovery of the genetic basis of a large collection of rare disorders that have, to date, been unyielding to analysis will substantially expand our understanding of the biology of rare diseases, facilitate accurate diagnosis and improved management, and provide impetus for further investigation of novel therapeutics.
We have demonstrated that exome sequencing of a small number of affected family members or affected unrelated individuals is a powerful, efficient, and cost-effective strategy for markedly reducing the pool of candidate genes for rare monogenic disorders, and may even identify the responsible gene(s) specifically. This approach will likely become a standard tool for the discovery of genes underlying rare monogenic diseases and provide important guidance for developing an analytical framework for finding rare variants influencing risk of common disease.