Respiratory complex I (NADH:quinone oxidoreductase) is an entry point to the electron transport chain in the mitochondria of many eukaryotes. It is a large, multisubunit enzyme with a hydrophilic domain in the matrix and a hydrophobic domain in the mitochondrial inner membrane. Here we present a comprehensive analysis of the protein composition and post-translational modifications of complex I from Pichia pastoris, using a combination of proteomic and bioinformatic approaches. Forty-one subunits were identified in P. pastoris complex I, comprising the 14 core (conserved) subunits and 27 supernumerary subunits; seven of the core subunits are mitochondrial encoded. Three of the supernumerary subunits (named NUSM, NUTM, and NUUM) have not been observed previously in complex I from any species. However, homologues to all three of them are present in either Yarrowia lipolytica or Pichia angusta complex I. P. pastoris complex I has 39 subunits in common with Y. lipolytica complex I, 37 in common with Neurospora crassa complex I, and 35 in common with the bovine enzyme. The mitochondrial encoded subunits (translated by the mold mitochondrial genetic code) retain their N-α-formyl methionine residues. At least eight subunits are N-α-acetylated, but the N-terminal modifications of the nuclear encoded subunits are not well-conserved. A combination of two methods of protein separation (SDS-PAGE and HPLC) and three different mass spectrometry techniques (peptide mass fingerprinting, tandem MS and molecular mass measurements) was required to define the protein complement of P. pastoris complex I. This requirement highlights the need for inclusive and comprehensive strategies for the characterization of challenging membrane-bound protein complexes containing both hydrophilic and hydrophobic components.
Respiratory complex I (NADH:quinone oxidoreductase) is an entry point to the electron transport chain in mitochondria and many aerobic bacteria. It couples NADH oxidation and quinone reduction to proton translocation across the inner mitochondrial (or plasma) membrane, so it is central to energy transduction. Consequently, complex I dysfunctions are linked to an increasing number of neuromuscular and neurodegenerative diseases, including Parkinson's disease, as well as to oxidative stress and aging (1).
Complex I is an L-shaped assembly, with a hydrophobic “arm” in the inner mitochondrial (or plasma) membrane, and a hydrophilic arm extending into the mitochondrial matrix (or cytoplasm) (2). The 14 core subunits of complex I are sufficient for catalysis and conserved in all complex I–encoding species; they comprise two sets of seven subunits that correspond to the two domains (3, 4). The seven hydrophobic subunits are encoded by the mitochondrial genome in eukaryotes, and the seven nuclear-encoded hydrophilic subunits ligate the flavin mononucleotide (which catalyzes NADH oxidation) and the eight (or nine) iron-sulfur clusters (which transfer electrons from the flavin to quinone). The structure of the hydrophilic domain of complex I from Thermus thermophilus, comprising the seven hydrophilic core subunits and a frataxin-like supernumerary subunit, shows how the subunits and cofactors are arranged (5). In addition to the 14 core subunits, eukaryotic complexes I also contain a variable number of supernumerary subunits. The roles of most of the supernumerary subunits are not well defined, although specific functions for some of them have been proposed (6). Twenty-one supernumerary subunits are common to all eukaryotic complexes I, but additional subunits vary between fungi, plants, and animals, as well as between individual species (7). The most intensively characterized eukaryotic complex I is the 1 MDa complex from Bos taurus that contains 45 dissimilar subunits, all with homologues in the human genome (6, 8–10).
The subunit compositions of the complexes I from two fungi, Neurospora crassa and Yarrowia lipolytica, have been analyzed previously. Thirty-nine subunits have been identified in N. crassa complex I (11), and four of them do not have mammalian homologues. The mitochondrial genome sequence of Y. lipolytica was published in 2001 (12), and the first assessment of the subunit composition of complex I from Y. lipolytica, in 2002, described 37 subunits, identified by a combination of SDS-PAGE, MALDI-TOF mass spectrometry and Edman degradation (13). Subsequently, three additional subunits were described, and, in 2008, the masses of a set of subunits were measured using laser-induced liquid bead ion desorption MS. The accuracy of the molecular mass data was variable, but observed masses were paired with predicted masses within 130 Da (14). Thus, 40 subunits have been identified in Y. lipolytica complex I; 37 of them are found in N. crassa, and 35 in B. taurus.
Pichia pastoris is a methylotrophic ascomycete, which is a commonly used over-expression host for recombinant proteins. The purification and characterization of complex I from P. pastoris, and the related species Pichia angusta, have been described previously (15). Here we present a comprehensive analysis of the protein composition and post-translational modifications of complex I from P. pastoris, using a combination of proteomic and bioinformatic approaches. Proteomic techniques provide a rapid and efficient means of defining the composition of protein complexes. Here the subunits of P. pastoris complex I were resolved either by reverse-phase HPLC or by SDS-PAGE, and then analyzed by peptide mass fingerprinting and tandem mass spectrometry of tryptic peptides, by MALDI-TOF mass spectrometry. The analysis was complicated by the presence of hydrophobic membrane proteins, and by the presence of small subunits which produce few or no proteolytic fragments. After reverse-phase separation, the subunits were also analyzed by ESI-MS, and their molecular masses provided additional information on the subunit composition and amino acid sequences, and on stable post-translational modifications. Forty-one different proteins have been identified in complex I from P. pastoris, including three proteins (named NUSM, NUTM, and NUUM) that have not been detected previously in any species of complex I.
SDS-PAGE of 10-20 μg of protein was performed on 18-22% acrylamide gradient Laemmli-style gels (18) for 4 h at 300 mV, and stained with 0.2% Coomassie Blue R250. For reverse-phase HPLC, complex I was precipitated by 20 volumes of cold ethanol, centrifuged at 16,000 × g for 3 min, and the supernatant, containing detergents and lipids, was discarded. The pellet was dissolved in 60% (v/v) formic acid, 15% (v/v) trifluoroethanol, and 1% (v/v) hexafluoroisopropanol, and injected onto a PLRP-S reverse phase HPLC column (1 mm i.d. × 75 mm, Varian Inc., Palo Alto, CA). Proteins were eluted at 50 μl min−1 in a linear gradient of 0-70% propan-2-ol and 15-20% trifluoroethanol, in 1% hexafluoroisopropanol and either 50 mM ammonium formate (pH 3.1) or 0.1% trifluoroacetic acid (pH 1.8) (19).
Protein bands were excised from the SDS-PAGE gels, diced into small cubes (approximately 1 mm³), and digested by trypsin as described previously (20). The tryptic peptides were analyzed by peptide mass fingerprinting and tandem MS using an Applied Biosystems/MDS SCIEX 4800 Plus MALDI TOF/TOF mass spectrometer, with α-cyano-4-hydroxy cinnamic acid as the matrix. Spectra were calibrated using the trypsin autolysis peptides at 2163.057 and 2273.160 Da, and a calcium-associated matrix ion at 1060.048 Da. Peptide sequences were obtained using the same spectrometer, by collision-induced dissociation, at a collision energy of 1 kV. Typically, tandem MS spectra contained data accumulated from 2,500 laser pulses. Fractions from HPLC were dried in a speed-vac centrifuge, then dissolved in 0.2% (w/v) ammonium bicarbonate, 0.5 mM CaCl2, and 12 ng/ml trypsin, and digested overnight at 37 °C.
MALDI-TOF mass spectra were smoothed and monoisotopically labeled using the 4000 Data Explorer software (version 3.5.3, Applied Biosystems). Monoisotopic peak lists were generated using the peaks-to-mascot feature, with a minimum signal-to-noise ratio of 10, and a peak density of 10 (occasionally 15) peaks per 200 Da. Known peaks from human keratin, trypsin autolysis, and matrix associated ions were excluded (see Supplementary Information). Then, the peak lists were compared with the in-house P. pastoris protein sequence database (5,209 sequences, see below), or to the online National Center for Biotechnology Information (NCBInr) database (version 20080912 comprising 7,031,513 sequences) using the Mascot (Matrix Science Ltd., London, UK) search algorithm, version 2.1.0 (21). The significance threshold for peptide identification was p < 0.05, and a single missed cleavage was allowed. The mass tolerances were 70 ppm for peptide mass fingerprinting and 0.8 Da and 120 ppm for tandem MS. N-terminal acetylation and formylation, cysteine propionamidation, and methionine oxidation were selected as variable modifications in initial searches. Error tolerant searches were also performed to identify additional matches that rely on amino acid substitutions or further artifactual or unusual modifications.
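The ppm mass tolerances quoted above can be made concrete with a short calculation. The following is a minimal sketch, not the Mascot implementation: the function names are illustrative, and the "observed" value is a hypothetical reading chosen only to show the arithmetic against the 2163.057 Da trypsin autolysis calibrant and the 70 ppm peptide mass fingerprinting tolerance.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Calibrant mass quoted in the text; the observed value is hypothetical.
CALIBRANT = 2163.057
observed = 2163.150

error = ppm_error(observed, CALIBRANT)
print(f"error = {error:.1f} ppm")
print("within 70 ppm:", within_tolerance(observed, CALIBRANT, 70.0))
```

At a peptide mass near 2 kDa, a 70 ppm window corresponds to roughly 0.15 Da, which is why the tolerance is expressed relative to mass rather than as a fixed offset.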
Fractions from reverse-phase HPLC were analyzed by ESI-MS in positive ion mode, either online with a Quattro Ultima triple quadrupole mass spectrometer (Waters-Micromass, Waters Corp., Milford, MA) scanning 700–2100 m/z every 5 s, or offline using a Q-TOF1 mass spectrometer (Waters-Micromass). Samples were loaded into the Q-TOF1 by flow injection into 50% isopropanol, and both spectrometers were calibrated with myoglobin and trypsinogen over the mass/charge range of 616 to 2181. Protein molecular masses were determined from series of multiply charged ions, using the component analysis function of MassLynx, version 3.4 (Waters-Micromass).
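The principle of deriving a protein mass from a series of multiply charged ions can be sketched as follows. This is the textbook adjacent-peak calculation, not the MassLynx component analysis algorithm, whose internals are not described here; the function names and the 16,951.5 Da test mass are assumptions made for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_for_charge(mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion for a neutral mass M."""
    return (mass + z * PROTON) / z


def charge_from_adjacent(mz_z: float, mz_z_plus_1: float) -> int:
    """Charge z of the higher-m/z peak, given the neighbouring (z + 1) peak."""
    return round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))


def deconvolute(peaks: list[float]) -> float:
    """Mean neutral mass from m/z peaks of consecutive charge states."""
    if len(peaks) < 2:
        raise ValueError("at least two adjacent charge states are needed")
    ordered = sorted(peaks, reverse=True)  # descending m/z = ascending charge
    masses = []
    for higher, lower in zip(ordered, ordered[1:]):
        z = charge_from_adjacent(higher, lower)
        masses.append(z * (higher - PROTON))
    return sum(masses) / len(masses)


# Hypothetical protein mass; charge states 9+ to 24+ fall inside 700-2100 m/z.
series = [mz_for_charge(16951.5, z) for z in range(9, 25)]
print(f"recovered mass = {deconvolute(series):.1f} Da")
```

Averaging over many charge states reduces the influence of noise on any single peak, which is one reason broad peaks from larger proteins give less precise masses.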
The genome sequence of P. pastoris strain X-33 was obtained from Integrated Genomics Inc. (Chicago, IL) in flat file format, and comprised 13 contiguous segments (contigs): 12 nuclear contigs and the mitochondrial genome sequence. A set of approximately 5,000 ORFs and their translated protein sequences were obtained also. Consequently, a genomic DNA sequence database and a protein sequence database were assembled in FASTA format, and tblastn and blastp were used, respectively, to identify the sequences of known complex I subunits (NCBI BLAST v 2.2.16, statistical E-value threshold 1 × 10−10) (22). Comparisons were performed with the amino acid sequences of the subunits of Y. lipolytica complex I plus two subunits (10.4 kDa and NURM) specific to N. crassa complex I (14), and 10 additional subunits from B. taurus complex I (8). Twenty-seven putative complex I subunits, homologous to known nuclear-encoded subunits from Y. lipolytica, were identified in the P. pastoris protein sequence database (although the ORF lengths for NUEM and NUKM required adjusting to produce proteins similar in size to their Y. lipolytica homologues, and a single base deletion was corrected in the C-terminal region of NUAM). A further five putative nuclear-encoded sequences were identified in the DNA sequence database, and were translated, along with their flanking regions, using the NCBInr ORF finder tool (ORFFINDER) (23). ORFs for proteins of similar size to the Y. lipolytica homologues were identified for two sequences, and single introns were predicted in the remaining three (NI9M, NB5M, and NIDM) using the prediction program AUGUSTUS (24) with the organism set as Debaryomyces hansenii. The five additional sequences were added to the in-house P. pastoris protein sequence database. No homologues to the Y. lipolytica NUNM subunit, the N. crassa 10.4 kDa and NURM subunits, or the 10 subunits considered specific to mammalian complex I were identified. 
Finally, ORFFINDER was used to translate the P. pastoris mitochondrial genome, using both the “yeast” and the “mold, protozoan and coelenterate” codes (the yeast code is used by, for example, Saccharomyces cerevisiae (25) and Kluyveromyces thermotolerans (26) and the mold code by, for example, N. crassa (27) and Y. lipolytica (12)). Both codes produced viable sequences for the seven expected subunits of complex I, which were identified by comparison with the NCBInr database and added to the in-house P. pastoris database. In summary, 39 putative subunits of P. pastoris complex I (seven nuclear core subunits, seven mitochondrial core subunits, and 25 supernumerary subunits) were identified. All the DNA and protein sequences for the P. pastoris complex I subunits have been deposited in the European Molecular Biology Laboratory (EMBL) database and the respective accession numbers, along with the amino acid sequences, are presented in the Supplementary Information.
Fig. 1 shows an analysis of complex I from P. pastoris by SDS-PAGE. Each band was excised from the gel, digested with trypsin, and the products were analyzed by peptide mass fingerprinting and tandem mass spectrometry. The unstained regions between the bands were analyzed also because the hydrophobic subunits of complex I do not stain intensively with Coomassie Blue. The data were compared against the in-house P. pastoris protein sequence database; the results are summarized in Table I and presented fully in Table S1. Thirty-four of the 39 predicted subunits were detected by peptide mass fingerprinting, and 35 were detected by tandem MS. All seven nuclear encoded core subunits were detected, and five of the seven mitochondrial encoded subunits; NU6M and NULM were not detected because, irrespective of the mitochondrial translation code used, their sequences do not contain any tryptic peptides with masses between 700 and 3,000 Da. Significantly more peptides were matched to sequences produced by translation with the mold mitochondrial genetic code (see Table I) than with the yeast mitochondrial code. In particular, subunit NU2M was only identified when the mold genetic code was used, and only the mold code explains the peptide mass and sequence data from subunit NU5M that correspond to the peptide LIYYTFLNNPNSPK (ATA is translated as Ile by the mold code, giving the single Ile in this peptide, and as Met by the yeast code). Twenty-three of the 25 predicted supernumerary subunits were detected; the ST1 and ACPM2 subunits were not. Two additional proteins, which have been named NUSM and NUUM and which are not similar to any known subunits of complex I, were detected also.
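Why a subunit can escape detection when it yields no tryptic peptides in the 700 to 3,000 Da window can be illustrated with an in silico digest. The sketch below assumes the common trypsin rule (cleavage after Lys or Arg, but not before Pro) and uses neutral monoisotopic masses; whether the window in the text refers to neutral or protonated masses is not stated. The flanking residues around the NU5M peptide are hypothetical, added only to create cleavage sites.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        last = i + 1 == len(seq)
        if aa in "KR" and (last or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def in_window(peptides: list[str], low: float = 700.0, high: float = 3000.0) -> list[str]:
    """Peptides whose mass falls inside the analyzed range."""
    return [p for p in peptides if low <= peptide_mass(p) <= high]


# NU5M peptide from the text between hypothetical flanking residues.
example = "AAKPGR" + "LIYYTFLNNPNSPK" + "GG"
peptides = tryptic_digest(example)
print(peptides)
print(in_window(peptides))
```

A protein for which `in_window` returns an empty list would be invisible to peptide mass fingerprinting under these settings, regardless of how abundant it is.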
Note that the preparation of complex I described here is highly pure: peptide mass fingerprinting and tandem MS analyses sporadically detected only four additional proteins: plasma membrane H+-adenosine triphosphatase, band 7 stomatin protein, 40S ribosomal subunit and a GatB/YqeY superfamily protein (GenBank entries 254565045, 254569368, 254566987, and 254566543, respectively, from P. pastoris GS115). All four proteins already have known functions, are present in only low amounts (they were not visible in SDS-PAGE) and were not detected by molecular mass measurements, so they are considered low-level impurities.
To evaluate the completeness of our analysis, complex I from Y. lipolytica was analyzed alongside that from P. pastoris, by comparison of mass spectrometry data with the online NCBInr database. Thirty-six of the 40 known subunits were detected. The second acyl carrier protein (ACPM2) and the mitochondrial encoded subunits NU2M, NU6M, and NULM were not detected; NU6M and NULM do not produce tryptic peptides with masses within the range analyzed.
Fig. 2 shows a typical reverse-phase chromatogram for the fractionation of the subunits of P. pastoris complex I. The reverse-phase procedure is compatible with the recovery of intrinsic membrane proteins, and separates both hydrophilic and hydrophobic components in a form suitable for ESI MS (19). The hydrophilic proteins elute first, and the highly hydrophobic proteins elute toward the end of the gradient. The eluting proteins were either collected manually and their intact molecular masses were analyzed by ESI MS, or the eluant from the column was analyzed online using ESI MS. In some experiments, portions of each fraction were digested with trypsin and analyzed using peptide mass fingerprinting and tandem MS to determine the order in which the proteins elute, and to correlate molecular masses with specific subunits. The combined data from all ESI-MS experiments are summarized in Table II. Note that, for some of the larger proteins, the peaks in the ESI spectra were broad; therefore, precise measurements were difficult.
Molecular masses (within 100 ppm) were obtained for 21 of the 39 putative subunits of P. pastoris complex I, and for the two additional proteins, NUSM and NUUM (see Table II). The data obtained for core hydrophilic subunits NUAM, NUBM, NUCM, and NUHM were less accurate and no data were obtained for either NUIM or NUKM. Of the mitochondrial-encoded subunits, corresponding molecular masses (within 100 ppm) were obtained for NU6M and NULM (when the sequences were translated with the mold mitochondrial code). The masses for four further mitochondrial encoded subunits differed by 3.8 to 36 Da from their predictions, and no mass was assigned for NU5M. Reliable molecular mass assignments were made for most supernumerary subunits, although the masses obtained for NUFM, NI2M, NIPM, ACPM1, NIDM, and NB2M were outside the accepted 100 ppm tolerance. Satisfactory explanations for the differences are lacking. They may result from errors or variations in the sequence data (as for NIDM and NB2M, see Table II), protein isoforms, or protein modifications. No mass data were obtained for ACPM2 (with or without a pantetheine-4′-phosphate modification, with or without an acyl group). Putative subunit ST1 remained undetected (therefore, it is not considered present in complex I from P. pastoris); however, a mass similar to that predicted for ACPM1 (with a pantetheine-4′-phosphate modification) was detected, and tandem MS analysis of peptides from the digest of a reverse-phase HPLC fraction detected two corresponding peptides (see Table II and Table S1). Finally, the molecular mass of a third unknown protein component corresponded to a P. pastoris protein named NUTM (the identification was subsequently supported by peptide mass fingerprinting and tandem MS data, see Table S1). 
In summary, the reverse-phase separation and associated molecular mass data led to the identification of four subunits (NULM, NU6M, NUTM, and ACPM2) which were not detected by analysis following SDS-PAGE, and accurate molecular masses for 24 subunits.
Table II summarizes the post-translational modifications of the subunits of P. pastoris complex I. Ten subunits are known to be unmodified, either with (one) or without (nine) the initiator methionine. Fourteen subunits are known to be modified by the removal of N-terminal mitochondrial import sequences. Eight subunits are known to be N-α-acetylated (seven of them lack the initiator methionine, and two of them (NUZM and NB6M) are partially acetylated). In B. taurus complex I, 18 subunits have mitochondrial import sequences (6) and only one conserved subunit, B13, differs in this respect between P. pastoris and B. taurus. In B. taurus complex I, 13 subunits are N-α-acetylated (6), but the acetylation pattern is not strongly conserved, and only five subunits are N-α-acetylated in both species. Two mitochondrial-encoded subunits, NU6M and NULM, were confirmed to retain their N-α-formyl groups in P. pastoris: previous measurements on the B. taurus ND subunits showed that all seven of them retain the N-α-formyl group (19). Finally, all the N-terminal modifications reported in Table II are fully consistent with both peptide mass fingerprinting and tandem MS data, and for 12 subunits the molecular masses of the N-terminal peptides of the processed proteins were measured directly (see Table S1).
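The way an intact molecular mass discriminates between N-terminal processing states can be sketched by enumerating candidate forms and comparing each with the observed mass. This is an illustrative sketch only: the mass shifts are rounded average values computed from standard atomic weights, the predicted and observed masses are hypothetical, and removal of a mitochondrial import sequence is omitted because it depends on the individual sequence.

```python
# Approximate average mass shifts in Da (rounded; assumptions for illustration).
ACETYL = 42.04       # N-alpha-acetylation adds C2H2O
FORMYL = 28.01       # N-alpha-formylation adds CO
MET_RESIDUE = 131.19 # loss of the initiator methionine

CANDIDATE_SHIFTS = {
    "unmodified": 0.0,
    "N-acetyl": ACETYL,
    "N-formyl": FORMYL,
    "Met removed": -MET_RESIDUE,
    "Met removed + N-acetyl": -MET_RESIDUE + ACETYL,
}


def matching_forms(predicted: float, observed: float, tol_ppm: float = 100.0) -> list[str]:
    """Names of candidate forms whose mass agrees with the observation."""
    matches = []
    for name, shift in CANDIDATE_SHIFTS.items():
        candidate = predicted + shift
        if abs(observed - candidate) / candidate * 1e6 <= tol_ppm:
            matches.append(name)
    return matches


# Hypothetical 10 kDa subunit: sequence-predicted mass and measured mass.
print(matching_forms(predicted=10000.00, observed=9910.85))
```

Because the candidate shifts differ from one another by at least about 14 Da, a 100 ppm tolerance (1 Da at 10 kDa) is usually enough to single out one form for small subunits; for larger subunits with broad peaks the assignment is less clear-cut.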
Three further post-translational modifications have been observed in B. taurus and N. crassa complexes I. In B. taurus, the histidine-rich N-terminal peptide of B12 (NB2M) is modified by the methylation of one, two, or three histidine residues (28). The B. taurus N-terminal peptide is not conserved in the shorter P. pastoris subunit (or in those of N. crassa or Y. lipolytica) and no evidence for histidine methylation was observed here (see Table II). In B. taurus, subunit B18 (NB8M) is N-α-myristoylated (29), but subunit NB8M is unmodified in both P. pastoris (see Table II) and Y. lipolytica (14) (the N terminus of NB8M is truncated in P. pastoris and Y. lipolytica, relative to B. taurus). A myristoylated lysine has been identified in NU5M of N. crassa complex I (30); it is not present in B. taurus (19), but cannot be either excluded or confirmed in P. pastoris because the mass of NU5M is not known. The SDAP (ACPM) subunits of B. taurus, N. crassa, and Y. lipolytica complex I are all modified by covalent linkages between a serine and a pantetheine-4′-phosphate group, and then by an acyl group. Here, the molecular mass data indicate that the ACPM1 subunit of P. pastoris complex I is modified by a pantetheine-4′-phosphate group also, but no acyl group has been detected.
The genomic sequences of six nuclear-encoded subunits of P. pastoris contain introns (see Supplementary Information). The introns are 62 to 140 bases long, and they all start with the common 5′ consensus hexamer, GTAAGT (31). In comparison, all the supernumerary subunits of Y. lipolytica complex I except NUEM, NESM, and ST1 have been reported to contain at least one intron (32), but, as for P. pastoris, none of the core hydrophilic subunits contain any. In contrast, all the nuclear-encoded subunits of N. crassa complex I have been reported to contain introns (11), revealing P. pastoris as a relatively simple model organism for the genetic manipulation of complex I.
Several lines of evidence show clearly that P. pastoris, like Y. lipolytica and N. crassa, uses the mold mitochondrial genetic code, rather than the yeast code. The molecular masses of NU3M, NU6M, and NULM correspond closely to masses predicted only by the mold code, and tryptic peptides were identified for NU2M only following translation by the mold code. In addition, a peptide from NU5M (see above) analyzed by tandem MS clearly matches a peptide sequence predicted by the mold code, but not by the yeast code.
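The difference between the two candidate codes can be shown directly. The sketch below builds NCBI translation table 3 (yeast mitochondrial) and table 4 (mold, protozoan and coelenterate mitochondrial) from their standard amino acid strings and translates a short, hypothetical DNA fragment with each; the fragment is invented for illustration and is not taken from the P. pastoris genome.

```python
from itertools import product

BASES = "TCAG"
# Codons in NCBI order: TTT, TTC, TTA, TTG, TCT, ...
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# NCBI translation table 3 (yeast mitochondrial).
YEAST_MITO_AAS = (
    "FFLLSSSSYY**CCWW" "TTTTPPPPHHQQRRRR"
    "IIMMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
# NCBI translation table 4 (mold, protozoan and coelenterate mitochondrial).
MOLD_MITO_AAS = (
    "FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)

YEAST_MITO = dict(zip(CODONS, YEAST_MITO_AAS))
MOLD_MITO = dict(zip(CODONS, MOLD_MITO_AAS))


def translate(dna: str, table: dict[str, str]) -> str:
    """Translate complete codons of a DNA string with the given codon table."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


# Codons that distinguish the two codes.
differing = [c for c in CODONS if YEAST_MITO[c] != MOLD_MITO[c]]
print(differing)

# Hypothetical fragment containing an ATA codon.
fragment = "TTAATATAT"
print("mold :", translate(fragment, MOLD_MITO))
print("yeast:", translate(fragment, YEAST_MITO))
```

Only the ATA codon and the four CTN codons differ between the two tables, so any peptide or intact mass that spans one of these codons can distinguish the codes, as the NU5M peptide does.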
The mitochondrial genomes of fungi often contain introns; they are difficult to predict, and may either encode endonucleases or other functional genes, or be noncoding (33). Introns have been identified in the ND subunit genes in N. crassa (one in each of NU1M, NULM, and two in NU5M (34, 35)) and Y. lipolytica (one in NU1M and two in NU5M (12)). None of the ND subunits from other fungal complexes I have been characterized sufficiently, but the genes encoding the ND subunits in Debaryomyces hansenii, Pichia canadensis, and Candida albicans (which are closely related to P. pastoris) are not predicted to contain introns (36). The number of introns present in complete fungal mitochondrial genomes varies widely, from approximately 35 in Podospora anserina (37), to only two in P. canadensis (38). Therefore, there may be introns in the genes for the ND subunits of P. pastoris. The masses of NU6M and NULM correspond closely to the masses predicted by direct translation; therefore, these sequences do not contain introns. For NU1M, NU2M, NU3M, and NU4M, masses were observed that are close to the masses predicted by direct translation, and the translated proteins are homologous to proteins from other species, suggesting that there are no introns in these sequences. Two observed masses (59,086.1 (S.D. 5.0) and 62,012.5 (S.D. 7.3) Da) could be assigned to N-α-formylated NU2M, corresponding to alternative initiation codons (the first corresponds more closely to its calculated mass), and no mass from any putative sequence for NU5M was identified. Our data suggest it is unlikely that there are introns present in the genes for the NU1M, NU3M, NU4M, NU6M or NULM subunits in P. pastoris; however, it is possible that there are introns in the genes for NU2M and NU5M.
Our analyses provide strong evidence for two previously unknown subunits in complex I from P. pastoris. A molecular mass (Table II) together with high sequence coverage (60%) from peptide masses and tandem MS data for seven tryptic peptides provide unequivocal evidence for protein NUSM. The data for protein NUUM include a molecular mass and high confidence identification of three tryptic peptides by tandem MS (Table I). The presence of a third protein, NUTM, is less certain. A molecular mass corresponding to the sequence NUTM was obtained by ESI MS, although only a single tryptic fragment was identified with high confidence by tandem MS (Table S1).
The genomes of related organisms were searched for homologues to the three putative new subunits using tblastn (see Fig. 3). Homologues to NUSM and NUTM were identified in P. angusta and Dekkera bruxellensis, but not in Y. lipolytica or N. crassa. Homologues to NUUM were identified in Glomerella graminicola, Aspergillus fumigatus, N. crassa, Magnaporthe grisea, D. bruxellensis, and Y. lipolytica. No P. angusta homologue was identified, but the publicly available genome sequence is incomplete. Re-analysis of peptide mass fingerprinting and tandem MS data for complex I from P. angusta (15) subsequently revealed the homologues to NUSM and NUTM. Significant Mascot scores were obtained for both proteins with peptide mass fingerprinting and tandem MS data obtained from SDS-PAGE bands consistent with the expected masses of 21 kDa and 10 kDa (Table S2). In addition, a molecular mass obtained from ESI analysis following reverse-phase fractionation of the P. angusta enzyme corresponds to the predicted sequence of NUTM (predicted 8890.3, measured 8891.1 Da). Re-analysis of the peptide mass fingerprinting and tandem MS data from complex I from Y. lipolytica revealed two peptides from the homologue to NUUM (see Table S2) from a protein with an apparent mass of approximately 8 kDa in SDS-PAGE. The identification of the three proteins in the purified complexes I from other species increases the likelihood that they are true subunits of the enzyme. None of the three proteins display significant sequence similarity to any previously known complex I subunits, and all three are predicted to contain a single transmembrane helix (using ConPred II (39)).
Forty-one proteins have been detected in complex I from P. pastoris, comprising the 14 conserved core subunits, and 27 supernumerary subunits, three of which have not been detected previously in complex I from any other species, and two of which, NUXM and NUZM, are specific to fungal complexes I (7). Forty-one proteins have also been detected in complex I from Y. lipolytica (including a NUUM homologue), and Y. lipolytica and P. pastoris complex I have 39 subunits in common; NUSM and NUTM have been detected in P. pastoris but not Y. lipolytica, and NUNM and ST1 in Y. lipolytica but not P. pastoris. Thirty-nine proteins have been detected in complex I from N. crassa (11), and N. crassa and P. pastoris complex I have 37 subunits in common. NUSM, NUTM, NUUM, and ACPM2 have been detected in P. pastoris but not N. crassa, and the 17.8/NURM and 10.4 kDa subunits in N. crassa but not P. pastoris. It is interesting that most of the variable subunits (NUSM, NUTM, NUUM, NUNM, NURM, and 10.4 kDa) contain a single transmembrane helix, as do four of the subunits found in B. taurus complex I but not in the three fungal species (KFYI, MLRQ, AGGG, and SGDH), suggesting that they may substitute for one another.
The sequences of the seven core hydrophilic subunits of P. pastoris complex I were compared with those of a number of other fungi using pair-wise alignment with ClustalW (40) (see Fig. S8). Of the species compared, P. pastoris complex I is most closely related to the enzymes from Pichia stipitis and D. hansenii, and least closely related to the enzyme from Ustilago maydis. In general, the comparison is consistent with the classification of P. pastoris in the family of Saccharomycetaceae (see Fig. S8).
The sequences of the subunits of P. pastoris complex I were compared with those from Paracoccus denitrificans, Y. lipolytica, and B. taurus (as examples of bacteria, fungi, and mammals) using pair-wise alignment. The percentage identities and similarities, insertions or extensions, and deletions or truncations, are presented in Table S3. As expected, P. pastoris complex I is most closely related to the enzyme from Y. lipolytica. P. denitrificans complex I contains significantly fewer subunits, and they are generally less similar to the P. pastoris proteins than the B. taurus subunits are. The core hydrophilic subunits are best conserved between species with 47-87% similarity (32-75% identity). The core hydrophobic subunits are notably less conserved, with 40-77% similarity (17-49% identity); subunits NU2M and NU6M are the least conserved, and there are significant differences in length in several cases. In particular, P. pastoris NU2M is significantly longer than the B. taurus homologue. Between P. pastoris and either B. taurus or Y. lipolytica, the supernumerary subunits are generally less conserved than the core subunits, with 27-82% similarity (13-63% identity). The best conserved supernumerary subunits (>50% similarity for B. taurus and >60% for Y. lipolytica) are NUEM (39 kDa), NB6M (B16.6), NB4M (B14), and NI8M (B8); the least conserved (<35% similarity for B. taurus and <50% for Y. lipolytica) are NUJM (B14.7), NIMM (MWFE), NIPM (15 kDa), and NB5M (B15). There are significant differences in the sizes of a number of subunits between P. pastoris and B. taurus (notably NESM (ESSS), NIMM (MWFE), NI2M (B22), NIDM (PDSW), and NB5M (B15)). NIMM is particularly interesting: the P. pastoris protein is significantly longer than the Y. lipolytica and B. taurus proteins, owing to an internal insertion of 22 amino acids and C-terminal extensions of 34 or 53 amino acid residues, respectively.
This article contains supplemental material.