Mitochondria are dynamic organelles essential for cellular life, death, and differentiation. Although they are best known for ATP production via oxidative phosphorylation (OXPHOS), they house myriad other biochemical pathways and are centers for apoptosis and ion homeostasis. Mitochondrial dysfunction causes over 50 diseases ranging from neonatal fatalities to adult-onset neurodegeneration, and is a likely contributor to cancer and type II diabetes (
DiMauro and Schon, 2003;
Lowell and Shulman, 2005;
Wallace, 2005). The 13 proteins encoded by the mitochondrial genome have been known since its sequencing (
Anderson et al., 1981) and have been linked to a variety of maternally inherited disorders. However, there may be as many as 1500 nuclear-encoded mitochondrial proteins (
Lopez et al., 2000), though fewer than half have been identified with experimental support. A complete protein inventory for this organelle across tissues would provide a molecular framework for investigating mitochondrial biology and pathogenesis.
Recent progress in defining the mitochondrial proteome has been driven by large-scale approaches, including mass spectrometry (MS) based proteomics in mammals (
Forner et al., 2006;
Foster et al., 2006;
Johnson et al., 2007;
Kislinger et al., 2006;
Mootha et al., 2003a;
Taylor et al., 2003) and yeast (
Reinders et al., 2006;
Sickmann et al., 2003), epitope tagging combined with microscopy in yeast (
Huh et al., 2003;
Kumar et al., 2002), and computation (
Calvo et al., 2006;
Emanuelsson et al., 2000;
Guda et al., 2004). However, each of these methods suffers from intrinsic technical limitations. MS-based approaches struggle to distinguish genuine mitochondrial proteins from co-purifying contaminants, and published reports exhibit false-positive rates of up to 41% (
Table S1). Additionally, these approaches tend to miss low-abundance proteins or those expressed only in specific tissues or developmental states, and thus capture only 23-40% of known mitochondrial components (
Table S1). Other experimental approaches such as epitope tagging are limited by the availability of cDNA clones, tag interference, and over-expression artifacts. While integrative machine-learning methods can be more comprehensive (
Calvo et al., 2006;
Jansen et al., 2003), they require subsequent experimental validation.
Here, we combine in-depth protein mass spectrometry, microscopy, and machine learning to construct a protein compendium of the mitochondrion. We perform MS-based proteomics on both highly purified and crude mitochondrial preparations to discover genuine mitochondrial proteins and distinguish them from contaminants based on enrichment. We integrate these MS data with six other genome-scale datasets of mitochondrial localization using a Bayesian framework and additionally perform the most extensive GFP tagging study focused on mammalian mitochondria. The resulting compendium consists of 1098 genes and their protein expression across 14 mouse tissues. Although not complete, this represents the most comprehensive and accurate molecular characterization of the organelle to date.
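To make the idea of Bayesian integration concrete, the following is a minimal sketch of one common scheme, a naive Bayes log-odds score, in which each dataset contributes a log-likelihood ratio that is added to the log prior odds. It is an illustration under stated assumptions, not a description of the exact procedure used here: the dataset names, the likelihood values, and the total gene count of 20,000 are all hypothetical, and the naive Bayes independence assumption is itself a simplification.

```python
import math

# Hypothetical likelihoods for two evidence sources, given as
# (P(observation | mitochondrial), P(observation | non-mitochondrial)).
# All numbers are invented for illustration and are not taken from the study.
LIKELIHOODS = {
    "ms_enrichment": {
        "enriched_in_pure": (0.60, 0.05),
        "not_enriched": (0.40, 0.95),
    },
    "targeting_signal": {
        "present": (0.50, 0.10),
        "absent": (0.50, 0.90),
    },
}

# Prior odds of being mitochondrial, assuming ~1500 mitochondrial genes
# among ~20000 genes in total (the total is an assumption).
PRIOR_ODDS = 1500 / (20000 - 1500)


def log_odds_score(observations, likelihoods=LIKELIHOODS, prior_odds=PRIOR_ODDS):
    """Return the log2 posterior odds under a naive Bayes model.

    Each dataset is treated as conditionally independent, so its
    log2 likelihood ratio is simply added to the log2 prior odds.
    """
    score = math.log2(prior_odds)
    for dataset, value in observations.items():
        p_mito, p_non = likelihoods[dataset][value]
        score += math.log2(p_mito / p_non)
    return score


strong = log_odds_score(
    {"ms_enrichment": "enriched_in_pure", "targeting_signal": "present"}
)
weak = log_odds_score(
    {"ms_enrichment": "not_enriched", "targeting_signal": "absent"}
)
print(f"strong evidence: {strong:.2f}, weak evidence: {weak:.2f}")
```

In this toy setting, a protein enriched in the purified preparation and carrying a targeting signal ends with positive log odds despite the low prior, whereas a protein lacking both ends well below zero. A threshold on the score would then trade sensitivity against the false-positive rate.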
Our compendium provides a framework for identifying novel proteins within pathways resident in the mitochondrion. Here, we focus on complex I (CI) of the electron transport chain, a macromolecular structure composed of ~45 subunits in mammals (
Carroll et al., 2006). CI deficiency is the most common cause of rare respiratory chain diseases (
DiMauro and Schon, 2003) and has been implicated in Parkinson's disease (
Schapira, 2008). Half of the patients with CI deficiency lack mutations in any known CI subunit, suggesting that as yet unidentified genes crucial for maturation, assembly, or stability of CI are mutated in the remaining cases (
Janssen et al., 2006). Multiple assembly factors for the much smaller complexes IV and V have been identified in
S. cerevisiae, and it is estimated that complex IV alone requires over 20 factors (
Devenish et al., 2000;
Fontanesi et al., 2006). However, the absence of CI in
S. cerevisiae has impeded similar studies and, to date, only three CI assembly and maturation factors have been identified (
Ogilvie et al., 2005;
Saada et al., 2008;
Vogel et al., 2005).
To systematically discover proteins essential for CI function, we apply phylogenetic profiling, which uses shared evolutionary history to highlight functionally related proteins (
Pellegrini et al., 1999). This approach was recently used to identify the CI assembly factor NDUFA12L using five yeast species (
Ogilvie et al., 2005). We apply this approach more broadly to our mitochondrial protein inventory and report that 19 of these proteins share ancestry with a large subset of CI proteins. We validate several of these predictions in cellular models and additionally report that one of these genes,
C8orf38, harbors a causative mutation in an inherited CI deficiency.
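As a rough illustration of phylogenetic profiling, the sketch below represents each protein as a presence/absence vector across a set of genomes and ranks candidates by how closely their profile matches that of a reference set such as CI subunits. The species labels, gene names, and profiles are hypothetical, and Hamming distance is only one of several possible similarity measures; the scoring actually used in this study may differ.

```python
# Columns correspond to hypothetical genomes; 1 = homolog present, 0 = absent.
SPECIES = ["species_A", "species_B", "species_C", "species_D", "species_E"]

# Hypothetical reference profile: CI subunits present in A-C, lost in D and E.
CI_PROFILE = (1, 1, 1, 0, 0)

# Hypothetical candidate proteins from a mitochondrial inventory.
CANDIDATES = {
    "gene_x": (1, 1, 1, 0, 0),  # same pattern of presence and loss as CI
    "gene_y": (1, 1, 1, 1, 1),  # present everywhere
    "gene_z": (1, 0, 0, 1, 1),  # largely opposite pattern
}


def hamming(a, b):
    """Count the genomes in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def rank_by_profile(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    return sorted(candidates, key=lambda name: hamming(query, candidates[name]))


ranked = rank_by_profile(CI_PROFILE, CANDIDATES)
for name in ranked:
    print(name, hamming(CI_PROFILE, CANDIDATES[name]))
```

Proteins whose profiles track the reference, present where CI is present and lost where CI has been lost, rise to the top of the ranking and become candidates for experimental follow-up.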
Together, these studies illustrate the utility of an expanded mitochondrial inventory in advancing basic and disease biology of the organelle. Our compendium, called MitoCarta, is freely available at
www.broad.mit.edu/publications/MitoCarta.