The creation of human phenomic databases has been proposed as a means to systematically collect and analyze phenotypic information [15], [20]–[22]. In this study, we established a clinical phenotype catalog of 174 mitochondrial disease genes that account for ~10% of all known disease genes [26]. To define and classify clinical phenotypes from 1,636 medical case reports, we developed a terminological system based on the hierarchical MeSH ontology. Because automated text mining is limited in its ability to annotate clinical disorders from the literature [18], [19], our mapping of “phenotypes to language” required manual review of each full-text article [17]. Classifying the phenotypic features of each gene in this way allowed us to compare disorders across different disease genes. To measure clinical phenotype similarity between disease genes, we calculated a numerical value (QPA, quantitative phenotypic associations) that takes into account all annotated gene-feature associations, the overlap of features between two disease genes, and the frequency of each shared feature across all genes. QPA are thus based on the hypothesis that the value of a feature varies inversely with the number of genes with which it is associated [16].
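The exact QPA formula is not restated in this section. As a hedged illustration of the stated principle only (a shared feature counts for less the more genes carry it), the sketch below weights each feature with an IDF-style term, log(N / n_f), and scores a gene pair by the weight of its shared features divided by the weight of all features of either gene. The log weighting, the normalization, and the gene names (GENE_A to GENE_C) and annotations are assumptions for illustration, not the published QPA definition or data.

```python
import math
from typing import Dict, Set

# Toy, hypothetical annotations: gene -> set of phenotypic features.
ANNOTATIONS: Dict[str, Set[str]] = {
    "GENE_A": {"seizures", "optic atrophy"},
    "GENE_B": {"seizures", "optic atrophy", "mental retardation"},
    "GENE_C": {"seizures", "mental retardation"},
}


def feature_weights(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Assumed IDF-style weight log(N / n_f): N genes in total, n_f genes
    annotated with feature f. A feature shared by every gene gets weight 0."""
    n_genes = len(annotations)
    counts: Dict[str, int] = {}
    for features in annotations.values():
        for f in features:
            counts[f] = counts.get(f, 0) + 1
    return {f: math.log(n_genes / c) for f, c in counts.items()}


def phenotype_similarity(a: str, b: str,
                         annotations: Dict[str, Set[str]],
                         weights: Dict[str, float]) -> float:
    """Weighted overlap: summed weight of shared features divided by the
    summed weight of all features of either gene (assumed normalization)."""
    fa, fb = annotations[a], annotations[b]
    union_weight = sum(weights[f] for f in fa | fb)
    if union_weight == 0:
        return 0.0  # only ubiquitous features: no informative overlap
    return sum(weights[f] for f in fa & fb) / union_weight


if __name__ == "__main__":
    w = feature_weights(ANNOTATIONS)
    print(phenotype_similarity("GENE_A", "GENE_B", ANNOTATIONS, w))  # 0.5
    print(phenotype_similarity("GENE_A", "GENE_C", ANNOTATIONS, w))  # 0.0
```

In this toy case GENE_A and GENE_C share "seizures", but because every gene carries that feature its weight is zero, so the pair scores 0.0; GENE_A and GENE_B additionally share the rarer "optic atrophy" and score 0.5.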
Comparing QPA with likelihood ratios (LR) for functional interactions [34] across disease gene pairs showed positive correlations: disease genes with stronger evidence for functional interaction (higher LR) displayed greater similarity in their clinical phenotypes (higher QPA). The most prominent phenotypic similarities were found within mitochondrial protein complexes, supporting previously predicted genotype-phenotype associations of protein complexes [32]. However, we also noted complexes with lower phenotypic similarity (e.g., BCKDH: maple syrup urine disease; GCC: glycine encephalopathy), highlighting the importance of inspecting individual genes. Because this analysis was limited to disease genes (DG), we wanted to learn the properties of a larger network that also included non-disease candidate genes (CG). Using the genome-wide study by Franke et al. [34], we created a functional network of more than 1.9 million gene interactions for 162 mitochondrial DG and 4,577 CG. Our analysis identified significant differences in functional interactions between DG and CG, with CG showing a higher average connectivity. This difference was detected in both the mitochondrial and the non-mitochondrial gene groups. In addition, although the total number of DG interactions was similar for DG and CG, the relative fraction of DG interactions (i[disease-genes]/i[all-genes]) was higher for DG, indicating that DG are more likely to interact with each other. Previous smaller-scale studies (~100× fewer interactions) predicted that DG occupy intermediate and peripheral positions in gene functional networks, with relatively fewer interactions than essential genes [33], [46]. Our results expand on this hypothesis, showing that essential and non-disease genes (CG) can be distinguished from DG based on gene interaction patterns. We also identified network properties that differentiate mitochondrial from non-mitochondrial genes. Mitochondrial genes showed a lower average connectivity, which may be due to the double-membrane structure of the organelle limiting the detection of protein-protein interactions [47]. However, the higher connectivity among mitochondrial genes may put this limitation into perspective. Future studies will help clarify the connectivity of mitochondrial genes, and perhaps of genes in other cellular compartments as well.
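The two network measures discussed above, average connectivity and the fraction of a gene's interactions that involve DG (i[disease-genes]/i[all-genes]), can be sketched on an unweighted edge list. This is a minimal illustration under stated assumptions: every interaction is treated as unweighted, all non-DG genes are pooled as CG, and the toy network and gene names (D1, D2, C1 to C3) are hypothetical. It does not reproduce the study's network or its significance tests.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def group_metrics(edges: Iterable[Tuple[str, str]],
                  disease_genes: Set[str]) -> Dict[str, Dict[str, float]]:
    """For DG and for all other genes (CG), return the mean degree and the
    mean fraction of each gene's interactions that involve a DG."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # skip self-loops; duplicate edges collapse in the sets
            neighbors[a].add(b)
            neighbors[b].add(a)

    per_group: Dict[str, list] = {"DG": [], "CG": []}
    for gene, nbrs in neighbors.items():
        degree = len(nbrs)  # >= 1 for every gene that appears in an edge
        dg_fraction = len(nbrs & disease_genes) / degree
        group = "DG" if gene in disease_genes else "CG"
        per_group[group].append((degree, dg_fraction))

    return {
        group: {
            "mean_degree": mean(d for d, _ in values),
            "mean_dg_fraction": mean(f for _, f in values),
        }
        for group, values in per_group.items()
        if values  # omit a group with no connected genes
    }


# Hypothetical toy network: D1 and D2 are disease genes, C1-C3 candidates.
EDGES = [("D1", "D2"), ("D1", "C1"), ("C1", "C2"), ("C1", "C3"), ("C2", "C3")]

if __name__ == "__main__":
    print(group_metrics(EDGES, {"D1", "D2"}))
```

On this toy network the CG have the higher mean degree (7/3 vs. 1.5) while the DG have the higher mean DG fraction (0.75 vs. 1/9), which is the qualitative pattern the text describes; a real analysis would also need a statistical test for the group differences.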
In the final part of this study, we used the discovered interaction patterns to predict new mitochondrial DG. Using two different approaches, we identified 168 non-disease genes that resembled the characteristic interaction patterns of the 162 mitochondrial DG (estimated true positive (TP) rate = 85.8%). If a disease is linked to a genomic interval, the predicted DG can be prioritized from a larger list of functional candidates for mutational screening in affected individuals. For example, the optic atrophy 2 (OPA2) linkage interval contains seven mitochondrial genes, including three known DG, of which HSD17B10 is associated with optic atrophy [48]–[50], and three predicted DG, of which two (NDUFB11, TIMM17B) interact with mitochondrial DG that cause optic atrophy. Our phenome knowledgebase (www.mitophenome.org) can also be applied to investigate disorders through gene network associations, in particular common conditions that are caused by single-gene defects in a subset of patients [51]. For example, a search for Parkinson disease returns 12 mitochondrial DG that interact with 24 predicted DG (e.g., CCS, MECR, PRKAR2B). Similarly, the combination of seizures and mental retardation, which is common in mitochondrial disease, is caused by 59 DG that interact with 124 predicted DG. As the cost of DNA sequencing decreases, high-throughput screens linking phenotypes with genotypes will further increase the accuracy of gene-feature associations. In this context, easy navigation between clinical phenotype and gene information promises to aid in the recognition and diagnosis of mitochondrial disorders.
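The interval-prioritization idea (ranking candidates in a linkage interval by their links to DG that cause the phenotype of interest) can be illustrated with a minimal sketch. It assumes a feature-to-DG annotation table and an interaction edge list, both hypothetical here, and ranks the interval genes that are not already known DG by how many DG annotated with the chosen feature they interact with. This simple neighbor count is a stand-in for illustration and is not either of the two prediction approaches used in the study; the gene names (GENE_X, DG_1, and so on) are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def prioritize_interval(interval_genes: Iterable[str],
                        edges: Iterable[Tuple[str, str]],
                        feature_to_dg: Dict[str, Set[str]],
                        feature: str) -> List[Tuple[str, int]]:
    """Rank genes in a linkage interval that are not already known DG by
    how many DG annotated with `feature` they interact with."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    known_dg = set().union(*feature_to_dg.values())  # every annotated DG
    feature_dg = feature_to_dg.get(feature, set())
    scored = [(gene, len(neighbors[gene] & feature_dg))
              for gene in interval_genes if gene not in known_dg]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy inputs (not data from the study).
INTERVAL = ["GENE_X", "GENE_Y", "GENE_Z", "DG_1"]
FEATURE_TO_DG = {"optic atrophy": {"DG_1", "DG_2"}, "seizures": {"DG_3"}}
EDGES = [("GENE_X", "DG_1"), ("GENE_X", "DG_2"),
         ("GENE_Y", "DG_3"), ("GENE_Z", "DG_2")]

if __name__ == "__main__":
    print(prioritize_interval(INTERVAL, EDGES, FEATURE_TO_DG, "optic atrophy"))
```

With these toy inputs the known DG_1 is excluded and the call returns [('GENE_X', 2), ('GENE_Z', 1), ('GENE_Y', 0)], placing the gene with the most optic-atrophy DG partners first for mutational screening.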