The inference of gene regulatory networks from gene expression data is crucial for enhancing our understanding of the relations between genes [1]. In general, a gene network describes a map of direct physical (biochemical) interactions among genes, gene products or metabolites that occur in the living cell [4] and, hence, enables a systems biology approach [6]. It has been demonstrated that gene regulatory networks, a specific type of gene network, can be inferred indirectly from steady-state gene expression data measured under different conditions, either in individual tissues or in individual cell types [9].
It is generally believed that the gene regulatory network is governed by major hub genes, such as transcription factors, that bind directly to specific DNA segments in the nucleus and activate or repress the expression of other genes [1]. Further, it has been proposed that the genes in cellular networks are organized in a hierarchical and modular structure. This assumption has been studied, e.g., for metabolic networks [13]. Hierarchical modularity implies functional community structures of interconnected layers in the network, with a potentially heterogeneous modularity structure. For example, for the protein network of E. coli, it has been demonstrated that the center of the network has a higher modularity than its periphery [14].
In the following, we consider the periphery of a network to consist of leaf genes or linearly connected genes, whereas the central regions are complex and composed of genes with a high node degree. In [15], the functional modularity of different layers in the yeast and the E. coli protein networks was observed to be governed mainly by a central and a peripheral layer, connected by an intermediate layer exhibiting reduced modularity. The central layers of these networks were described as highly enriched in genes located in the nucleus that regulate, e.g., the cell cycle, whereas the periphery is dominated by metabolic, transport and cell communication processes. These results are consistent with the simplified view that signaling cascades originate at the physical periphery of a cell, induced by extracellular signals that are detected by transmembrane protein receptors. In turn, extrinsic and intrinsic signals are transduced and amplified through the cytoplasm to the nucleus, culminating in the regulation of gene expression. For an intuitive visualization of these intricate processes see Figure .
The gene regulatory network is composed of the transcriptional regulatory network, protein network and a signaling network spanning the whole cell.
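The degree-based notion of periphery and center used above can be made concrete with a short Python sketch. It assumes an undirected network given as an edge list, treats any gene of degree 2 as "linearly connected" (a simplification, since such a gene may also sit on a cycle), and uses a hypothetical cutoff `hub_degree` for a "high" node degree, which the text does not specify.

```python
from collections import defaultdict


def classify_nodes(edges, hub_degree=5):
    """Label each gene as 'leaf', 'linear', 'central' or 'intermediate'
    from its degree in an undirected network given as (u, v) pairs.
    hub_degree is a hypothetical cutoff for a 'high' node degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    labels = {}
    for gene, d in degree.items():
        if d == 1:
            labels[gene] = "leaf"          # peripheral: dangling end of the network
        elif d == 2:
            labels[gene] = "linear"        # peripheral: assumed to lie on a chain
        elif d >= hub_degree:
            labels[gene] = "central"       # high-degree gene in a complex region
        else:
            labels[gene] = "intermediate"  # neither clearly peripheral nor central
    return labels


# Toy network: a chain A-B-C attached to a hub H with four leaves.
edges = [("A", "B"), ("B", "C"), ("C", "H"), ("H", "D"),
         ("H", "E"), ("H", "F"), ("H", "G")]
print(classify_nodes(edges))
```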
The inference of gene interactions in a gene regulatory network from gene expression data is often discussed in connection with the nuclear transcriptional regulatory network [1]. In the simplified transcription factor vs. target gene model, a transcription factor directly affects the expression of the mRNA of a target gene. This may give the impression that gene interactions inferred from expression data need to be interpreted in the context of transcription regulation. For this reason, networks inferred from gene expression data are frequently equated with the transcriptional regulatory network. However, this is not justified, because expression data convey information only about the dynamic state of genes, i.e., of their mRNA levels, and hence provide no direct information about any type of biochemical binding, including transcription regulation. Instead, interactions inferred from expression data are not limited to transcription regulation but can also include protein-protein interactions [18]. To emphasize this, we use the term gene regulatory network for a network inferred from gene expression data, to indicate that it is not necessarily a transcriptional regulatory network but a mixture of this and a protein-protein network [19].
The major purpose of this paper is to infer a gene regulatory network from a large-scale B-cell lymphoma gene expression data set and to investigate its structural and biological organization. Immature B-cell lymphocytes are cells from the bone marrow that play an important role in the adaptive immune system. When B-cells are activated by an antigen, they differentiate into memory B-cells or antibody-secreting plasma B-cells, or, as an intermediate step, proliferate in germinal centers (centroblasts and centrocytes) [20]. Due to their unique physiological properties governing the adaptive immune system, B-cells are among the most interesting cell types for the study of mammalian signaling and cell differentiation processes. Malignancy of the different B-cell lymphocyte types leads to a variety of lymphoma and leukemia disease phenotypes, such as B-cell chronic lymphocytic leukemia (BCLL, germinal center), Burkitt lymphoma (BL, germinal center), Diffuse large B-cell lymphoma (DLBCL, germinal center), Follicular lymphoma (FL, germinal center), Hairy cell leukemia (HCL, memory B-cells), Mantle cell lymphoma (MCL, immature B-cells) and Multiple myeloma (MM, plasma cells). For our analysis, we use the microarray data set from [21], which contains samples from the germinal centers of lymphoma patients and experimentally transformed germinal center cell types.
In a previous study, it has been found that the C3NET inference algorithm has a considerably higher true positive (TP) rate for leaf edges of sparsely connected genes in a network [18]. For this reason, we hypothesize that this method has characteristics that are very beneficial for inferring the peripheral regions of the gene regulatory network of B-cells. Because B-cells are highly receptive to external stimuli, as described above, knowledge of these peripheral interactions seems valuable for gaining a deeper functional understanding of the intricate differentiation processes.
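The introduction does not spell out how C3NET selects edges. To our understanding, the published method first discards gene pairs whose mutual information (MI) is not significant and then keeps, for each gene, only the edge to its partner with the highest remaining MI. The Python sketch below illustrates this core idea under two assumptions: MI is estimated with the Gaussian formula I = -0.5 log(1 - rho^2) from Pearson correlations, and the significance test is replaced by a fixed, hypothetical cutoff `mi_threshold`. The function names are illustrative and not an existing package API.

```python
import numpy as np


def gaussian_mi(expr):
    """Pairwise MI under a Gaussian assumption, I = -0.5 * log(1 - rho^2),
    computed from a genes x samples expression matrix."""
    rho = np.corrcoef(expr)                   # genes x genes Pearson correlation
    rho = np.clip(rho, -0.999999, 0.999999)   # avoid log(0) on the diagonal
    mi = -0.5 * np.log(1.0 - rho ** 2)
    np.fill_diagonal(mi, 0.0)                 # no self-edges
    return mi


def c3net_core(expr, mi_threshold=0.1):
    """C3NET-style core network (sketch). Step 1: zero out MI values below a
    fixed, hypothetical threshold (stand-in for a significance test).
    Step 2: link each gene to the partner with the highest remaining MI."""
    mi = gaussian_mi(expr)
    mi[mi < mi_threshold] = 0.0               # step 1: drop 'non-significant' pairs
    edges = set()
    for i in range(mi.shape[0]):
        j = int(np.argmax(mi[i]))             # step 2: strongest partner of gene i
        if mi[i, j] > 0.0:                    # gene i may end up with no edge
            edges.add((min(i, j), max(i, j)))  # store as undirected pair
    return edges


# Toy data: genes 0-1-2 form a noisy chain, gene 3 is unrelated.
rng = np.random.default_rng(0)
n = 500
g0 = rng.normal(size=n)
g1 = g0 + 0.5 * rng.normal(size=n)
g2 = g1 + 0.5 * rng.normal(size=n)
g3 = rng.normal(size=n)
expr = np.vstack([g0, g1, g2, g3])            # genes x samples
print(c3net_core(expr))                       # should recover the chain 0-1-2
```

Because each gene contributes at most one selected edge, the resulting network has at most as many edges as genes and tends to be sparse, which fits the observation that C3NET performs well on leaf edges.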
To analyze the structural organization of B-cell lymphoma, we infer a gene regulatory network using C3NET in combination with an ensemble approach. That is, instead of applying the inference method to a single data set, we apply it to a bootstrap [22] ensemble of data sets. This allows us not only to assess local network-based measures down to the level of individual edges [23] but also to obtain an average network structure that is amenable to a hierarchical analysis, as we will show in this article.
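The bootstrap ensemble idea can be sketched as follows, assuming a genes x samples expression matrix whose samples (columns) are resampled with replacement. The helper names, the stand-in inference function `max_corr_partner` (a simplified, hypothetical substitute for C3NET without any significance test) and the frequency cutoff are illustrative assumptions, not the implementation used in this article.

```python
from collections import Counter

import numpy as np


def bootstrap_edge_frequencies(expr, infer, n_boot=100, seed=0):
    """Apply `infer` (expr -> set of undirected edges) to n_boot bootstrap
    resamples of the samples (columns) and return, for each edge, the
    fraction of bootstrap networks that contain it."""
    rng = np.random.default_rng(seed)
    n_samples = expr.shape[1]
    counts = Counter()
    for _ in range(n_boot):
        idx = rng.integers(0, n_samples, size=n_samples)  # draw samples with replacement
        counts.update(infer(expr[:, idx]))                 # each edge counted once per network
    return {edge: c / n_boot for edge, c in counts.items()}


def average_network(freqs, cutoff=0.5):
    """Keep edges present in at least a fraction `cutoff` of bootstrap networks."""
    return {edge for edge, f in freqs.items() if f >= cutoff}


def max_corr_partner(expr):
    """Hypothetical stand-in for C3NET: link each gene to its most
    strongly correlated partner (no significance test)."""
    rho = np.abs(np.corrcoef(expr))
    np.fill_diagonal(rho, 0.0)
    return {(min(i, int(j)), max(i, int(j)))
            for i, j in enumerate(rho.argmax(axis=1))}


# Toy data: two independent gene pairs (0, 1) and (2, 3).
rng = np.random.default_rng(1)
n = 300
g0 = rng.normal(size=n)
g1 = g0 + 0.5 * rng.normal(size=n)
g2 = rng.normal(size=n)
g3 = g2 + 0.5 * rng.normal(size=n)
expr = np.vstack([g0, g1, g2, g3])
freqs = bootstrap_edge_frequencies(expr, max_corr_partner, n_boot=50)
print(average_network(freqs, cutoff=0.9))
```

Raising the cutoff can only remove edges, so the average networks obtained for increasing cutoffs are nested; this is one simple way to obtain a hierarchy of networks from the ensemble, though not necessarily the analysis performed in this article.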
Several large-scale gene expression data sets related to B-cell lymphoma are available, comprising germinal center tumor samples from Diffuse large B-cell lymphoma (DLBCL), Follicular lymphoma (FL) and Burkitt lymphoma (BL) [...]. In this paper, we study the gene regulatory network of B-cell lymphoma using the data set in [21]. For an independent validation of our results, we additionally study two Diffuse large B-cell lymphoma data sets described in [25].
To demonstrate the validity of our bootstrap approach, we use simulations that compare results from a bootstrap ensemble with those from an ensemble of independently generated data. For a schematic overview of the generation of the bootstrap data, see Figure . In this figure, the data set
refers to the k-th data set from the bootstrap ensemble.
Figure 2 Illustration of our simulation set-up to generate bootstrap data sets. Using the true underlying network G_true as a reference, we estimate F-scores for each of the inferred networks. The colors of the arrows correspond to the boxplots in Figure .
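The caption refers to F-scores of inferred networks against the true network G_true. A minimal Python sketch of this comparison, assuming that edges are compared as undirected gene pairs, computes the F-score as the harmonic mean of precision and recall; the function names are hypothetical.

```python
def as_undirected(edges):
    """Represent each edge as an unordered pair, so (A, B) equals (B, A)."""
    return {frozenset(e) for e in edges}


def f_score(inferred, true):
    """F-score of an inferred network against the true network G_true."""
    inf, tru = as_undirected(inferred), as_undirected(true)
    tp = len(inf & tru)                # correctly inferred edges
    if tp == 0:
        return 0.0
    precision = tp / len(inf)          # fraction of inferred edges that are true
    recall = tp / len(tru)             # fraction of true edges recovered
    return 2 * precision * recall / (precision + recall)


true = [("A", "B"), ("B", "C"), ("C", "D")]
inferred = [("B", "A"), ("C", "D"), ("A", "D")]
print(f_score(inferred, true))  # 2 of 3 edges correct: precision = recall = 2/3
```

Computing this score for every network of a bootstrap ensemble and of an ensemble of independently generated data sets yields two distributions of F-scores that can be compared, e.g., as boxplots.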
In this paper, we infer the peripheral region of the gene regulatory network from a large-scale B-cell lymphoma gene expression data set using the C3NET algorithm. We provide a functional and a structural analysis of the largest connected component of this network. Further, we analyze the hierarchical organization of the network components of the B-cell gene regulatory network as revealed by the bootstrap approach.
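Extracting the largest connected component of an inferred network is a standard breadth-first-search task. A minimal Python sketch, assuming an undirected network stored as an edge list (the function name is hypothetical):

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the node set of the largest connected component of an
    undirected network given as (u, v) pairs, using breadth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue                     # already part of an explored component
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp                  # keep the largest component found so far
    return best


print(largest_connected_component([("A", "B"), ("B", "C"), ("D", "E")]))
```

Applied to average networks obtained at different bootstrap frequency cutoffs, the same function could be used to follow how the largest component shrinks as the cutoff is raised; this usage is an illustration, not a description of the analysis in this article.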