Mitochondrial disorders are among the most severe metabolic disorders wherein patients suffer from multisystemic phenotypes, often resulting in early death.1 Clinical, biochemical, and genetic heterogeneity among individuals, together with poor understanding of gene‐to‐phenotype relationships, pose significant diagnostic and therapeutic challenges for clinicians. In light of recent advances in next generation sequencing technologies, whole exome sequencing (WES) is emerging as the new global standard for the diagnosis of monogenic disorders, including mitochondrial diseases.2 However, owing to genetic heterogeneity of mitochondrial disorders and ongoing discovery of novel disease genes, WES data may not provide clinicians with enough certainty for a definitive diagnosis.
With these challenges in mind, we present the Leigh Map, a novel computational gene‐to‐phenotype network to be used as a diagnostic resource for mitochondrial disease, using Leigh syndrome (Mendelian Inheritance in Man 256000), the most genetically heterogeneous and most frequent phenotype of pediatric mitochondrial disease,3, 4 as a prototype. Leigh syndrome is a progressive neurodegenerative disorder defined neuropathologically by spongiform basal ganglia and brainstem lesions.4, 5 Clinical manifestations include psychomotor retardation, with regression, and progressive neurological abnormalities related to basal ganglia and/or brainstem dysfunction, often resulting in death within 2 years of initial presentation.4, 6 However, many patients may also present with multisystemic (eg, cardiac, hepatic, renal, or hematological) phenotypes. To date, there are 89 genes known to cause Leigh syndrome, the majority of which are difficult to definitively differentiate from each other, either biochemically or clinically. We hypothesized that these multisystemic features may help to distinguish different genetic subtypes of Leigh syndrome.
The Leigh Map (freely available at vmh.uni.lu/#leighmap) was built on the Molecular Interaction NEtwoRks VisuAlization (MINERVA) platform7 previously used to construct networks of Parkinson disease and human metabolism.8, 9, 10 The network comprises 89 genes and 236 phenotypes, expressed in Human Phenotype Ontology (HPO) terms,11, 12 providing sufficient phenotypic and genetic variation to test the network's diagnostic capability. The Leigh Map aims to enhance the interpretation of WES data to aid clinicians in providing faster and more accurate diagnoses for patients so that appropriate measures can be taken for optimal management. The phenotypic components of the Leigh Map can be queried to generate a list of candidate genes. In addition, the genetic components of the Leigh Map may also be queried to browse a list of all reported phenotypes associated with a particular gene defect. We propose that this functionality can be used to enhance clinical surveillance of patients with an established genetic diagnosis. Blinded validation of test cases containing clinical and biochemical, but not genetic, data demonstrated that 2 independent testers were able to predict the correct causative gene using this method in 80% of cases. The success of the Leigh Map demonstrates the efficacy of computational networks as diagnostic aids for mitochondrial disease (Fig 1).
The genetic and phenotypic information gathered in this study came from an initial knowledgebase of >900 publications, collected from PubMed (latest search November 2016) and the senior author's personal archive. To facilitate data collection from this large breadth of literature associated with Leigh syndrome, we performed systematic literature mining with QDA Miner Lite (v1.4.2; Provalis Research, Montreal, Quebec, Canada) to generate a list of genes reported to cause Leigh syndrome or Leigh‐like syndromes, and their corresponding phenotypes. Phenotypic information was standardized by manually entering each reported phenotype into Phenomizer (compbio.charite.de/phenomizer),11, 12 a free online resource, which catalogues thousands of standardized human phenotypes, to obtain the appropriate HPO term and number. In addition to obtaining individual Leigh syndrome genes and phenotypes, we collected information on additional parameters that will give users further insight for an informed diagnosis. Such parameters include modes of inheritance, magnetic resonance imaging findings, and patient demographic information. These data were then organized into an Excel file. Although we aimed to rely solely on text mining to obtain these data, some publications required manual clarification, owing to formatting errors on QDA Miner, which were especially prevalent in publications with large tables. In total, we consulted >500 publications to create the Leigh Map. A simplified version of the gene‐to‐phenotype knowledgebase is provided in Tables 1 and 2.
The Leigh Map was manually assembled using CellDesigner (v4.4)13 by incorporating phenotypic, genetic, and demographic data collected through literature mining. The map layout loosely follows mitochondrial structure. The outermost compartment represents the cytosol, where it is possible to find the nucleus and the mitochondrion. Three nuclear genes, nuclear envelope protein NUP62, nuclear export protein RANBP2, and adenosine deaminase ADAR, have been included in our network as genes causing a clinical and radiological phenotype closely resembling Leigh syndrome.14, 15, 16 The mitochondrion is visualized in its double membrane structure, and mitochondrial genes are grouped according to function and can be found in their submitochondrial location (eg, outer membrane, matrix). To represent gene‐to‐phenotype associations, a submap was created for each gene, displaying all phenotypes associated with any given gene defect. Also incorporated at this stage are links to external databases (eg, Uniprot17 and HGNC18) and modes of inheritance. This approach enables a modular overview of the map, avoiding overwhelming the user with the “hairball” effect caused by the high connectivity of the network. All submaps were integrated in the MINERVA framework,7 which makes use of the Google Maps application programming interface, enables content query, and allows a low‐latency interactive navigation of the network and its submodules simply by clicking a specific gene and opening the embedded submap window available on the interface.
Navigation through the network is similar to that of Google Maps, wherein the user can reveal increasingly specific components of information by zooming in on the different compartments (Fig 2, Supplementary Figs 1–4). Additional data (patient demographics, modes of inheritance, external annotations, etc) can be accessed by clicking an element of the map. The corresponding data will be displayed in the left panel. The search functionality enables the query of multiple genes and phenotypes. The query results are displayed in the information panel and are also highlighted on the map. When searching for multiple phenotypes, all genes associated with each phenotype will be listed. Opening the submap for any given gene will display 1 or more of the highlighted phenotype elements, providing an immediate visual interpretation of the search results.
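The per-phenotype listing described above can be pictured as a lookup in an inverted index. The following is a minimal sketch, not the MINERVA implementation: the gene names (`GENE_A`, `GENE_B`, `GENE_C`), their phenotype sets, and the functions `build_index` and `search` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: placeholder genes, NOT real Leigh Map associations.
GENE_TO_PHENOTYPES = {
    "GENE_A": {"hypotonia", "lactic acidosis", "cardiomyopathy"},
    "GENE_B": {"hypotonia", "optic atrophy"},
    "GENE_C": {"lactic acidosis", "renal tubulopathy"},
}


def build_index(gene_to_phenotypes):
    """Invert gene -> phenotypes into phenotype -> set of genes."""
    index = defaultdict(set)
    for gene, phenotypes in gene_to_phenotypes.items():
        for phenotype in phenotypes:
            index[phenotype].add(gene)
    return index


def search(query, index):
    """Split a semicolon-separated query and list the genes for each phenotype."""
    terms = [t.strip().lower() for t in query.split(";") if t.strip()]
    # One result panel per queried phenotype; unknown terms yield an empty list.
    return {term: sorted(index.get(term, set())) for term in terms}


INDEX = build_index(GENE_TO_PHENOTYPES)
results = search("Hypotonia; optic atrophy", INDEX)
for phenotype, genes in results.items():
    print(phenotype, "->", genes)
```

Each queried phenotype gets its own gene list, so combining the lists into one shortlist is left to the user.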
The Leigh Map provides data about 89 genes reported to cause Leigh syndrome and Leigh‐like syndromes, the highest number of Leigh syndrome genes that has been collated to date, as well as 236 associated phenotypes. The network consists of >1,700 interactions, all of which can be manually queried by the user. To facilitate access, causative Leigh syndrome genes are segregated according to gene function and arranged on a simplified schematic of the mitochondrion. Genes with similar functions are grouped together in subcategories. Examples of gene categories that can be found on the Leigh Map include genes involved in oxidative phosphorylation (eg, NDUFA1, SDHA) and genes that maintain mitochondrial DNA (eg, POLG, SUCLA2; see Fig 2). Expression of Leigh syndrome phenotypes in HPO terms11, 12 serves to normalize the network, thereby eliminating discrepancies in clinical jargon for phenotypes for which >1 synonym exists. “Leukodystrophy,” for example, can be described alternatively as “leukoencephalopathy” or “white matter changes.” The use of different nomenclature varies among clinicians and in different geographical regions; therefore, the use of a single HPO term (leukodystrophy; HP:0002415) simplifies the Leigh Map and encourages its widespread utilization (Fig 3).
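The normalization step can be illustrated with a small synonym table. This is a sketch under stated assumptions: the table `SYNONYM_TO_HPO` and the function `normalize` are hypothetical, and the three synonyms and the identifier HP:0002415 are taken from the example in the text rather than looked up in the HPO itself.

```python
# Illustrative synonym table built from the example in the text.
# It is NOT a copy of the HPO; a real pipeline would query the ontology.
SYNONYM_TO_HPO = {
    "leukodystrophy": ("Leukodystrophy", "HP:0002415"),
    "leukoencephalopathy": ("Leukodystrophy", "HP:0002415"),
    "white matter changes": ("Leukodystrophy", "HP:0002415"),
}


def normalize(reported_term):
    """Map a free-text phenotype to a (label, HPO id) pair, or None if unknown."""
    # Lowercase and collapse whitespace so spelling variants share one key.
    key = " ".join(reported_term.lower().split())
    return SYNONYM_TO_HPO.get(key)


for term in ["Leukoencephalopathy", "white  matter changes", "ataxia"]:
    print(repr(term), "->", normalize(term))
```

Terms missing from the table return `None` rather than a guess, which mirrors the manual lookup the authors describe for unmatched phenotypes.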
Blinded validation by 2 nonclinical investigators using a series of anonymized test cases revealed that the Leigh Map was able to identify the correct gene for 16 of 20 cases. The first and second authors, who both lack formal clinical expertise, acted as independent blinded testers of the network. The anonymized test cases were obtained from the senior author's clinical practice, a national mitochondrial disease clinic where patients with Leigh syndrome who have diverse clinical presentations and genetic causes are diagnosed and managed. The criteria for these test cases were patients who had a definitive genetic diagnosis of Leigh syndrome, confirmed by Sanger sequencing or WES. Testers were provided with clinical vignettes and biochemical data, without genetic information. All corresponding phenotypes identified from each test case were entered into the query box of the Leigh Map, each separated by a semicolon. The search tool then generated a list of candidate genes for each phenotype in individual panels, which were then manually browsed to establish a list of candidate genes (see Fig 3). We define "candidate genes" as those that include >50% of the queried phenotypes. Due to the immense number of phenotypes on the network, every test case generated a list of potentially causative genes. For 10 cases, the Leigh Map was able to identify the correct gene as the "top hit," that is, the gene corresponding to the highest number of matched phenotypes. The network also predicted the correct gene for an additional 6 cases, in which it was not the top hit. In the remaining 4 test cases, the Leigh Map failed to produce the correct gene as one of the generated candidate genes. In all cases, the Leigh Map produced a shortlist of no more than 8 candidate genes, effectively eliminating ~90% of the genes in the network. Multiple advanced search is not yet possible on this platform, so some manual deduction is required for the use of the Leigh Map at this time.
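The manual deduction step can be written out as a short routine. The sketch below applies the two definitions given above: a candidate gene matches >50% of the queried phenotypes, and the top hit is the candidate with the most matches. The gene names, phenotype sets, and the function `rank_candidates` are hypothetical placeholders, not data or code from the Leigh Map.

```python
# Hypothetical toy data: placeholder genes, NOT real Leigh Map associations.
GENE_TO_PHENOTYPES = {
    "GENE_A": {"hypotonia", "lactic acidosis", "cardiomyopathy"},
    "GENE_B": {"hypotonia", "optic atrophy"},
    "GENE_C": {"lactic acidosis", "renal tubulopathy"},
    "GENE_D": {"hypotonia", "lactic acidosis", "cardiomyopathy", "optic atrophy"},
}


def rank_candidates(queried, gene_to_phenotypes, threshold=0.5):
    """Return (gene, matched_count) pairs for genes matching >threshold of the query."""
    queried = set(queried)
    if not queried:
        return []
    ranked = []
    for gene, phenotypes in gene_to_phenotypes.items():
        matched = len(queried & phenotypes)
        # Strict inequality: exactly 50% does not qualify as a candidate.
        if matched / len(queried) > threshold:
            ranked.append((gene, matched))
    # Most matches first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


query = ["hypotonia", "lactic acidosis", "cardiomyopathy", "optic atrophy"]
candidates = rank_candidates(query, GENE_TO_PHENOTYPES)
top_hit = candidates[0][0] if candidates else None
print(candidates, top_hit)

# Arithmetic reported in the text: 16 of 20 cases solved, shortlist of at most 8 of 89 genes.
success_rate = 16 / 20
fraction_eliminated = 1 - 8 / 89
print(f"{success_rate:.0%} correct, {fraction_eliminated:.0%} of genes eliminated")
```

In this toy query, `GENE_B` matches exactly half of the phenotypes and is therefore excluded by the strict >50% rule, while `GENE_D` is the top hit.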
Due to its high success rate in predicting causative genes by nonclinical testers, we conclude that the Leigh Map is an efficacious diagnostic resource that, in combination with WES data and metabolic testing, can be used by clinicians to provide patients with accurate diagnoses or to direct further biochemical investigation. Increased certainty of the genetic causes of mitochondrial disease has significant implications, because it could potentially attenuate the need for invasive diagnostic procedures, namely muscle biopsy with an attendant general anesthetic, which could pose risk to pediatric patients. It is important to reiterate that we do not propose that the Leigh Map act as a substitute for WES data or other relevant functional studies, but rather as a supplement to these techniques.
The computational nature of the Leigh Map allows for the addition of novel disease genes or phenotypes with relative ease; thereby, clinicians have access to a database of all current causative genes, which can enhance the interpretation of WES data. Ideally, we will update both the phenotypic and genetic components of the Leigh Map concurrently with the literature and also develop a facility wherein experts can submit additional genetic or phenotypic information. This is especially beneficial within the context of mitochondrial diseases, because novel genes are constantly being identified. For Leigh syndrome specifically, one‐third of the causative genes were identified within the past 5 years.3
Currently, the most significant limitation of the Leigh Map is the lack of a multiple advanced search facility. Although the absence of this feature does not detract from the network's accuracy, it does reduce its ease of use. Future work aims to implement this feature into the network. Furthermore, the efficacy of the Leigh Map is affected by the breadth of literature available for individual genes. SURF1, one of the earliest mitochondrial disease genes to be identified and the most common nuclear genetic cause of Leigh syndrome, is the subject of numerous publications.19 Thus, SURF1 is associated with >90 phenotypes in the Leigh Map, the largest number for any single gene. In contrast, the recently characterized complex I assembly gene C17ORF89 20 only features in a small section of a larger publication and accordingly is associated with only 2 phenotypes on the Leigh Map, although patients who harbor this mutation may display other phenotypes.
Expanding the current gene‐to‐phenotype binary of the Leigh Map is a future prospect that can further improve its usefulness as a diagnostic resource. Although there are no current curative therapies for mitochondrial disease, there are numerous compounds that are aimed at symptomatic management, including anticonvulsant drugs used to manage epilepsy and cofactor and vitamin supplements, such as coenzyme Q10, thiamine, and biotin, used to treat corresponding deficiencies. The addition of drug targets (a current feature of the MINERVA platform) to the Leigh Map could potentially provide insight into the effectiveness of various agents in treating mitochondrial disease in specific genetic contexts. For example, patients with SLC19A3 mutations respond dramatically to biotin and thiamine therapy,21 whereas those with HIBCH mutations may benefit from N‐acetyl cysteine.22 cDNA and protein mutations and annotations regarding animal models are also useful potential supplements to the Leigh Map. Leigh syndrome is a defined disorder5 wherein certain phenotypes appear almost ubiquitously, including hypotonia (91% of patients), developmental delay (82%), lactic acidosis (78%), and failure to thrive (61%). The failure to deduce the correct candidate genes for a minority of our test cases was due to the predominant presence of these common Leigh syndrome phenotypes and a lack of discriminating phenotypes. We found more success in "diagnosing" cases that presented with less frequently observed phenotypes such as cardiomyopathy (59%), optic atrophy (47%), or renal tubulopathy (15%). Therefore, the addition of these extra elements can be helpful in narrowing down a large list of candidate genes, thereby increasing the predictive power of the Leigh Map. 
An alternative approach to increase diagnostic power for common phenotypes is to incorporate a scoring system, which is a common element in other bioinformatics resources such as BLAST.23 In the context of our network, we propose "common" phenotypes be scored lower than less frequently observed phenotypes. The addition of a scoring system would complement the more sophisticated advanced search feature that we aim to implement in the future.
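One way such a scoring system might look is sketched below. The weighting formula, the negative base-2 logarithm of a phenotype's frequency, is an assumption chosen for illustration and is not a method proposed by the authors. The frequencies are the ones reported in the text, while the gene names, phenotype sets, and the functions `weight` and `score` are hypothetical.

```python
import math

# Phenotype frequencies among Leigh syndrome patients, as reported in the text.
FREQUENCY = {
    "hypotonia": 0.91,
    "developmental delay": 0.82,
    "lactic acidosis": 0.78,
    "failure to thrive": 0.61,
    "cardiomyopathy": 0.59,
    "optic atrophy": 0.47,
    "renal tubulopathy": 0.15,
}

# Hypothetical toy data: placeholder genes, NOT real Leigh Map associations.
GENE_TO_PHENOTYPES = {
    "GENE_A": {"hypotonia", "lactic acidosis"},
    "GENE_B": {"hypotonia", "renal tubulopathy"},
}


def weight(phenotype):
    """Assumed weighting: rarer phenotypes score higher (-log2 of frequency)."""
    # Raises KeyError for phenotypes without a known frequency.
    return -math.log2(FREQUENCY[phenotype])


def score(queried, gene_phenotypes):
    """Sum the weights of the queried phenotypes that the gene matches."""
    return sum(weight(p) for p in set(queried) & gene_phenotypes)


query = ["hypotonia", "lactic acidosis", "renal tubulopathy"]
scores = {gene: score(query, phenos) for gene, phenos in GENE_TO_PHENOTYPES.items()}
for gene, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{gene}: {value:.2f}")
```

Both placeholder genes match 2 of the 3 queried phenotypes, so a raw count cannot separate them. Under the assumed weighting, the gene matching the rare phenotype (renal tubulopathy) ranks above the gene matching only common ones.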
Progressive improvements in sequencing technologies and increased global cooperation have allowed for the generation of copious amounts of genetic and clinical information pertaining to mitochondrial disease. The Leigh Map effectively integrates these clinical and scientific data into an efficacious diagnostic resource for a genetically heterogeneous disorder, the success of which provides the basis for the construction of larger computational networks for a wider scope of mitochondrial and metabolic diseases.
S.R. and I.T. were involved in the conception and design of the study. J.R. and A.N. acquired the data and created the network. All authors drafted the manuscript and the figures.
Nothing to report.
Additional supporting information can be found in the online version of this article.
Funding for this study was provided by the British Inherited Metabolic Disease Group, an ATTRACT program grant (FNR/A12/01) from the Luxembourg National Research Fund, and a Great Ormond Street Hospital Children's Charity leadership award (V1260; S.R.).