Every day, increasing amounts of data are generated that can be mined for information about how proteins assemble into metabolic pathways, signal transduction pathways, and gene circuits in cells. Datasets available for such tasks include the primary literature, large-scale microarray experiments, whole-genome two-hybrid screens, full genome sequences, and the patterns of conserved and non-conserved homologues and orthologues within these genomes. Theoretical and computational methods are being developed and used to analyze these different types of data and to infer networks of proteins or genes that are involved in the same cellular process(es) (e.g. [1-10]).
In general, the networks derived by computational analysis of these data are static, in the sense that they provide little, if any, information about the flow of causality and events in the process, and no information about the dynamics of the process and its regulation (however, see [11]). For example, knowing that proteins X, Y and Z are involved in a process does not tell us whether X catalyzes a reaction that produces a substrate for another reaction catalyzed by Y or Z, or whether X modulates the activity of Y or Z. This can be an important problem when assembling the network structure of either novel pathways (e.g. Iron-Sulfur Cluster biogenesis) or complex pathways with an unclear reaction and regulation network (e.g. the cell cycle). Thus, a central challenge is to transform the network of interactions inferred from static data into a causal network that allows for the creation of mathematical models whose dynamic behavior can be analyzed and tested against experimental observations.
To achieve this goal, we need strategies that combine different theoretical and computational methods to identify the proteins involved and to generate a set of plausible alternative network topologies for the process of interest. These networks can then be translated into mathematical models whose dynamic behavior can be analyzed and compared with that of the real system, allowing topologies that do not reproduce the expected behavior to be discarded. Such an analytical process integrates omics data and provides testable predictions and information about systemic behavior.
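The step of generating alternative topologies can be illustrated with a toy sketch. It assumes only that undirected interactions among the hypothetical proteins X, Y and Z are known, as in the example above. Each interaction can be oriented in two ways, giving 2^3 = 8 candidate causal topologies. As a deliberately simple stand-in for comparison with experimental behavior, the sketch keeps only the topologies in which a perturbation of X can propagate to Z (a hypothetical qualitative observation). The function names and the observation are illustrative assumptions, not part of the method described in this paper.

```python
from itertools import product


def orientations(interactions):
    """Yield every way of assigning a direction to each undirected interaction."""
    for flips in product((False, True), repeat=len(interactions)):
        yield [(b, a) if flip else (a, b) for (a, b), flip in zip(interactions, flips)]


def reachable(edges, source, target):
    """Depth-first search: can a perturbation at source propagate to target?"""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical undirected interactions among proteins X, Y and Z.
interactions = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
# Hypothetical qualitative observation used as a filter: perturbing X changes Z.
candidates = [t for t in orientations(interactions) if reachable(t, "X", "Z")]

if __name__ == "__main__":
    for topology in candidates:
        print(" ; ".join(f"{a} -> {b}" for a, b in topology))
```

Five of the eight orientations pass this filter. In a realistic setting, the reachability check would be replaced by translating each surviving topology into a mathematical model and comparing its simulated dynamics with experimental data.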
The likely absence of mechanistic and kinetic data for the individual proteins in a novel pathway hinders the translation of network topology into a mathematical model. One way around this problem is to use approximation theory [12]. This well-established methodology approximates the continuous functions that typically describe the kinetics of protein processes by, for example, truncated Taylor series, either in linear or in non-linear spaces (see e.g. [13-19]). Among the non-linear approximations, the power-law formalism provides a useful representation that comes with a broad range of powerful analytical methods (see e.g. [20-24]).
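As a minimal illustration of the power-law approximation, the sketch below approximates a rate law v(S) by a power law v ≈ α S^g around an operating point S0. This is a first-order Taylor expansion in logarithmic space: the kinetic order g is the logarithmic gain d ln v / d ln S evaluated at S0, and α is chosen so that both functions agree at S0. The Michaelis-Menten rate law and the parameter values (Vmax = 10, Km = 2, S0 = 2) are illustrative assumptions, not taken from the ISC pathway.

```python
import math


def michaelis_menten(s, vmax, km):
    """Illustrative hyperbolic rate law (assumed, not fitted to ISC data)."""
    return vmax * s / (km + s)


def log_gain(f, x0, h=1e-5):
    """Numerical d ln f / d ln x at x0, using a central difference in log space."""
    u = math.log(x0)
    up = math.log(f(math.exp(u + h)))
    down = math.log(f(math.exp(u - h)))
    return (up - down) / (2 * h)


def power_law_approximation(f, x0):
    """Return (alpha, g) such that alpha * x**g matches f and its log-slope at x0."""
    g = log_gain(f, x0)
    alpha = f(x0) / x0 ** g
    return alpha, g


if __name__ == "__main__":
    vmax, km, s0 = 10.0, 2.0, 2.0
    rate = lambda s: michaelis_menten(s, vmax, km)
    alpha, g = power_law_approximation(rate, s0)
    # For Michaelis-Menten kinetics the analytic kinetic order is km / (km + s0).
    print(f"g = {g:.4f} (analytic {km / (km + s0):.4f}), alpha = {alpha:.4f}")
    for s in (1.0, 2.0, 3.0):
        print(f"S = {s}: exact {rate(s):.3f}, power law {alpha * s ** g:.3f}")
```

The approximation is exact at S0 and degrades as S moves away from it, which is why power-law models are typically built around a chosen operating point such as a steady state. Because the kinetic order is computed numerically, the same function can be applied to any positive rate law, not only the Michaelis-Menten form assumed here.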
In this paper, we define and apply a global strategy that combines bioinformatics tools and mathematical modeling to reconstruct the network structure of a pathway. Computational tools are used for a) obtaining relevant information on genes and proteins identified as playing a role in the target pathway, b) checking putative interactions between proteins, c) testing the co-evolution of different proteins, and d) setting up alternative networks that accommodate all this information. Expert knowledge is then used to curate the set of alternative network structures. Finally, mathematical models are used to explore the systemic behavior of each alternative network and to compare it with existing experimental data.
As a benchmark problem, we focus on the Iron-Sulfur Cluster (ISC) biogenesis pathway. ISC are widespread protein cofactors that act as catalytic mediators, as electron transport mediators, and as sensors of the oxidation state of the cell and its environment [25-32]. Although ISC are known to assemble spontaneously in proteins, an evolutionarily conserved set of proteins that controls this assembly has been identified in recent years [29,33,34]. In eukaryotes, the initial steps of ISC biogenesis take place in mitochondria [35]. Deregulation of ISC biogenesis in humans can have several pathological effects, leading to diseases such as Friedreich's ataxia, X-linked sideroblastic anemia, or hypochromic anemia. In yeast, deleting a single ISC biogenesis gene produces cells that accumulate iron and show decreased or deregulated activity of ISC-dependent proteins. Depending on the protein that is mutated, the phenotype ranges from mild (e.g. ΔGRX5 strains [36]) to lethal (e.g. ΔARH1 strains [37]). Friedreich's ataxia is linked to mutations in one of the ISC biogenesis proteins, frataxin, whose yeast homologue is Yfh1 (Yeast Frataxin Homologue 1). Additionally, iron accumulation can lead to cellular aging and its associated diseases.
Although spontaneous assembly of ISC is known to occur both in vivo and in vitro, mutations in a set of evolutionarily conserved proteins have been observed to cause defects in ISC biogenesis. These proteins form a putative ISC biogenesis pathway whose details and topology are still not fully understood. In S. cerevisiae, the eukaryote in which ISC biogenesis has been most extensively studied, the following proteins are involved: Arh1, Yah1, Yfh1, Isu1, Isu2, Isa1, Isa2, Nfu1, Nfs1, Isd11, Mge1, Ssq1, Jac1, Atm1 and Grx5 (Table 1). The prevailing view in the field is that Isu1, Isu2, Isa1, Isa2 and Nfu1 act as scaffolds on which the ISC initially assembles before being transferred to the appropriate ISC-dependent apo-proteins. However, recent results may cast some doubt on this, as Isa1/Isa2 appear to be involved in supplying Fe to the clusters of specific ISC-dependent apo-proteins. Furthermore, the role of Nfu1 is unclear. Arh1 and Yah1 are a ferredoxin reductase/ferredoxin pair that probably regulates electron transfer during the initial assembly of the cluster. Nfs1 is a cysteine desulfurase that provides the sulfur for the clusters, and Isd11 is essential for Nfs1 to fulfill this role, although it is unclear how Isd11 facilitates Nfs1 function. In bacteria, some cysteine desulfurases also have an assistant protein that facilitates the transfer of sulfur to the clusters via formation of an S-S bond. However, Isd11 has no cysteine residues, which precludes such a mechanism for its action. Ssq1 (an Hsp70-like protein), Jac1 (an Hsp40-like protein) and Mge1 (a nucleotide exchange factor) are chaperones that assist the pathway, although their exact role is unclear. Isu1 has been shown to activate the ATPase activity of the Hsp70-type chaperone Ssq1. Atm1 appears to participate in exporting ISC from the mitochondrial matrix to the cytoplasm, although its exact substrate is unknown. Grx5 is a monothiol glutaredoxin whose function in ISC biogenesis is unclear. In prokaryotes, ISC biogenesis is cytoplasmic, and in some cases more than one system is involved. For example, in E. coli, the ISC system (homologous to that of S. cerevisiae) [38-42] and the Suf system [43] are parallel systems involved in ISC biosynthesis. While the ISC system is responsible for routine ISC assembly, the Suf system becomes important when the bacteria are under oxidative stress.
Table 1: Proteins involved in ISC synthesis in Saccharomyces cerevisiae.
As mentioned in the previous paragraph, there is enough information to attribute a function to some of the proteins involved in ISC biogenesis in S. cerevisiae. This is the case, for example, for Nfs1, Isu1, Isu2 and Atm1. However, the role of other proteins is still unclear. For example, what do Isa1, Isa2 or Nfu1 do in the process? What is the role of the chaperones Ssq1, Jac1 and Mge1, or of Grx5, in ISC biogenesis? ISC biogenesis is therefore a good benchmark problem for the methodology we describe, as it provides the opportunity to validate some of the predictions against published experimental results. At the same time, the methodology can generate biological insight into the proteins whose role is unclear, adding value from both a methodological and a biological point of view. In previous papers [34,44,45] we combined structural bioinformatics with experiments and kinetic modeling to investigate the possible roles of Arh1, Yah1 and Grx5 in mitochondrial ISC biogenesis. In this paper we present a structured computational approach to infer and analyze probable topologies for the global network of mitochondrial ISC biogenesis. We analyze seven of the proteins involved in the process (Arh1, Yah1, Yfh1, Grx5, Nfs1, Ssq1 and Jac1), proposing likely systemic roles for their action in ISC biogenesis.