Adv Appl Bioinform Chem. 2012; 5: 23–59.
Published online 2012 September 7. doi:  10.2147/AABC.S32622
PMCID: PMC3459542

A novel biclustering approach with iterative optimization to analyze gene expression data

Keywords: biclustering, microarray data, genetic algorithm, Pearson’s correlation coefficient

Abstract

Objective

With the dramatic increase in microarray data, biclustering has become a promising tool for gene expression analysis. Biclustering has been shown to be superior to clustering in identifying multifunctional genes and in searching for genes that are co-expressed under a few specific conditions; that is, under a subgroup of all conditions. Biclustering based on a genetic algorithm (GA) has shown better performance than greedy algorithms, but the overlap state for biclusters must be treated more systematically.

Results

We developed a new biclustering algorithm (binary-iterative genetic algorithm [BIGA]), based on an iterative GA, by introducing a novel, ternary-digit chromosome encoding function. BIGA searches for a set of biclusters by iterative binary divisions that allow the overlap state to be explicitly considered. In addition, the average of the Pearson’s correlation coefficient was employed to measure the relationship of genes within a bicluster, instead of the mean square residual, the popular classical index. Compared with six existing algorithms, BIGA found highly correlated biclusters, with large gene coverage and reasonable gene overlap. The gene ontology (GO) enrichment showed that most of the biclusters are significant, with at least one GO term overrepresented.

Conclusion

BIGA is a powerful tool to analyze large amounts of gene expression data, and will facilitate the elucidation of the underlying functional mechanisms in living organisms.

Background

The complete sequencing of the genomes of many organisms has led to the launch of various omics studies. Among these, the advent of deoxyribonucleic acid (DNA) microarray technology has enabled the monitoring of the expression levels of numerous genes at a time, under many different growth conditions. This technique is now widely used in diverse types of biological research, such as identifying disease markers, reconstructing cellular signaling pathways, and inferring gene regulatory networks. DNA microarray technology has also provided numerous biological insights.1–3 Data generated from even a few array measurements are quite complex, and the amounts of microarray data available in public databases are dramatically increasing, due to the efficiency and rapid improvement of DNA microarray technologies. As a result, the interpretation of DNA microarray data obtained under a large number of conditions has become a challenging problem.

In the analyses of a large dataset, as the first step, researchers usually search for similar patterns appearing within the data. In the case of DNA microarray data, similar patterns of gene expression data are often investigated by using cluster analyses, such as K-means clustering4 and hierarchical clustering.5 Although clustering can provide considerable biological information, conventional clustering algorithms may not be suitable for some analyses of microarray data for the following two reasons. Firstly, there are many genes that encode proteins involved in several functional activities at a time, but the conventional clustering methods cannot identify these genes, because they only allow a gene to belong to one cluster at a time, instead of multiple clusters. Secondly, it is difficult to find the genes that are co-expressed under a few specific conditions but are differently expressed under other conditions because the similarity of the genes in conventional clustering is determined by the entire expression data.6,7

In terms of the above shortcomings, biclustering is more effective than conventional clustering, since it can cluster both genes and conditions simultaneously, and a gene (or a condition) can be involved in multiple clusters at a time.7 The concept of biclustering was first proposed by Hartigan,8 and Cheng and Church9 applied it to search for the most homogeneously expressed genes over certain sets of conditions by using greedy search algorithms. Most biclustering algorithms have been implemented with greedy search algorithms,1,10,11 to reduce the calculation costs. Finding a maximum bicluster is known to be a nondeterministic polynomial time (NP)-complete problem, a class of problems that can be solved in polynomial time on a nondeterministic Turing machine,12 and so a greedy search algorithm is required in actual applications to provide efficient approximations. Usually, one greedy search results in one bicluster, and the greedy search is applied repeatedly to the data, while preventing the reproduction of similar biclusters, to obtain a set of various biclusters as the final output.

Biclustering has also been implemented by using a genetic algorithm (GA) to find a practical solution that balances bicluster quality and calculation cost. A GA emulates an evolutionary process to obtain nearly optimal solutions.13 Initially, a set of candidate solutions is prepared, each solution being called a chromosome. The chromosomes evolve by exchanging their parts and changing some elements into a different state, and elite chromosomes are selected to survive as the parents of the next generation. This evolution and selection process is repeated over a number of generations to yield an optimal solution.13 Bleuler et al14 first applied GA to biclustering, whereby a binary string (representing whether or not a gene or a condition belongs to a bicluster) was employed as a representation of chromosomes. To avoid any redundancy of the resulting biclusters, Bleuler et al introduced a special selection operator called environment selection. Chakraborty and Maka15 developed a similar GA-based biclustering method that differs in terms of chromosome initialization: the initial chromosomes are prepared by K-means clustering. These methods find an optimum set of biclusters from one GA search. For such methods, it would be difficult to obtain a set of various, nonredundant biclusters, because only better chromosomes can survive the selection process of GA, and thus the resulting biclusters tend to converge into similar results in the later generations.14,15 Another type of GA-based biclustering, Sequential Evolutionary Biclustering (SEBI), has a distinct strategy. SEBI initially applies GA to select the optimal bicluster, and then this process is repeated so that the genes and the conditions in the biclusters already selected are less likely to be selected again. In other words, although SEBI would generate a set of diverse biclusters, it de-emphasizes the overlap of biclusters, a significant feature of biclustering.16

In the present study, we propose BIGA as the basis of a novel biclustering approach. In BIGA, an attempt is made to progressively divide the large amounts of input data into small datasets, by iteratively using GA, as SEBI does. Instead of evaluating a set of biclusters, GA is applied to each division process. Therefore, the resulting biclusters are substantially diverse. In addition, BIGA introduces the overlap state, explicitly defined in the ternary digit (or trit) encoding of the chromosome. In this study, the algorithm is described, the performance of BIGA is compared with those of six existing biclustering algorithms, and the biological relevance of BIGA is evaluated by using gene ontology (GO) enrichment analyses. Finally, we conclude that BIGA is a powerful and practical solution for biclustering with high-dimensional data.

Material and methods

Definition of biclusters

BIGA accepts a set of gene expression data in the matrix form D = (G, C), including N rows of genes G = {g1, g2, …, gN} and M columns of conditions or samples C = {c1, c2, …, cM}, where N and M are the total numbers of genes and conditions, respectively. All genes will be clustered into K overlapping biclusters B = {B1, B2, …, BK}, and each bicluster (Bi) corresponds to a submatrix Bi = (X, Y) of D, where X ⊆ G and Y ⊆ C. The sizes of X and Y, ie, the numbers of genes and conditions of a bicluster, are denoted by n and m, where n ≤ N and m ≤ M, respectively.

Binary-iterative genetic algorithm

In order to decompose D into B systematically, a binary tree was introduced. Generally, a binary tree comprises nodes and directed edges, in which each node can be extended to at most two child nodes.17 In this work, we regarded each bicluster as a node and each edge as a parent–child relationship between a pair of biclusters. We designated the method as BIGA.

BIGA consists of the following three steps. A schematic diagram of BIGA is shown in Figure 1.

Figure 1
Schematic diagram of binary-iterative genetic algorithm. (A) Decomposition of a parent bicluster into two child biclusters encoded in a string (left panel). The string indicates that a parent bicluster (middle panel) is divided into two child biclusters ...

Step 1: A division of microarray data is represented by a string, a sequence of trits (0, 1, 2) with a length of n (the number of genes in the parent bicluster) + m (the number of conditions in the parent bicluster). The trits 0, 1, and 2 mean that the associated gene or condition is contained in bleft, in bright, or in both, respectively. Thus, one string can encode the division of one bicluster into two biclusters, while allowing overlap. An example of this encoding is shown in Figure 1A. The “|” symbol serves as a spacer between the genes and the conditions for clarity. The string is equivalent to the division illustrated by the matrix (microarray data, or a bicluster) in the middle of Figure 1A. In the matrix, the rows and the columns correspond to the genes and the conditions, respectively. Each cell of the matrix belongs to either bleft (blue cell), bright (red), or both (violet), under the decoding rule shown in Figure 1B. The white cells are ignored because they are not co-expressed with the colored cells. Consequently, the bicluster shown in the middle of Figure 1A represents the division into the two biclusters on the right of Figure 1A.
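The decoding in Step 1 can be illustrated with a minimal sketch. It assumes that trit 0 assigns a gene or condition to bleft, trit 1 to bright, and trit 2 to both, and that a cell belongs to a child bicluster only when both its gene and its condition do. The function names (decode_division, parse_chromosome) and the example string are hypothetical and are not taken from Figure 1.

```python
def decode_division(trits, n_genes):
    """Split a trit chromosome into the two child biclusters it encodes.

    trits   : sequence of 0/1/2, gene positions first, then condition positions
    n_genes : number of gene positions (n); the remaining positions are conditions (m)
    Returns ((left_genes, left_conds), (right_genes, right_conds)) as index lists.
    """
    if any(t not in (0, 1, 2) for t in trits):
        raise ValueError("every trit must be 0, 1, or 2")
    gene_trits, cond_trits = trits[:n_genes], trits[n_genes:]

    def members(part, side):
        # ASSUMPTION: side 0 is b_left, side 1 is b_right; trit 2 means "both".
        return [i for i, t in enumerate(part) if t == side or t == 2]

    left = (members(gene_trits, 0), members(cond_trits, 0))
    right = (members(gene_trits, 1), members(cond_trits, 1))
    return left, right


def parse_chromosome(text):
    """Parse a string such as '0021|102'; '|' separates genes from conditions."""
    genes, conds = text.split("|")
    return [int(ch) for ch in genes + conds], len(genes)


# Hypothetical chromosome: 4 genes and 3 conditions.
trits, n = parse_chromosome("0021|102")
left, right = decode_division(trits, n)
print(left)   # genes and conditions of b_left
print(right)  # genes and conditions of b_right
```

In this toy chromosome, gene 2 and condition 2 carry trit 2, so they appear in both children; this is how the encoding represents overlap.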

Step 2: To search for the best chromosome (the best trit string) representing the optimal division of a bicluster, GA is performed (rectangles in Figure 1C). In the GA procedure, a mutation and a crossover are introduced into each chromosome. Each number on a chromosome is altered to 0, 1, or 2, for the mutation; whereas two chromosomes exchange corresponding parts with each other in the crossover. Chromosomes with higher fitness scores (described in the following section) survive in the next generation, and all other chromosomes are discarded. GA was implemented via Java Genetic Algorithm Product,18 with a mutation rate of 0.01 and a crossover rate of 0.5. Finally, the best chromosome after 100 generations of GA (the underlined string in the rectangle) is selected, based on the fitness score (see the next section). The best chromosome is then decoded into two biclusters (bleft and bright). We decide whether to continue with further decompositions after the evaluation of the biclusters, as follows.
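As an illustration of Step 2, the following is a minimal sketch of a GA over trit strings. It is not the library implementation used in the study. It assumes a per-position mutation rate, single-point crossover, and elitist truncation selection, and the population size and the function names (mutate, crossover, evolve) are hypothetical. The fitness function is passed in as an argument.

```python
import random

MUTATION_RATE = 0.01   # rate quoted in the text; applied per position here (assumption)
CROSSOVER_RATE = 0.5   # rate quoted in the text
GENERATIONS = 100      # number of generations quoted in the text


def mutate(chrom, rng, rate=MUTATION_RATE):
    # Each position is redrawn from {0, 1, 2} with probability `rate`.
    return [rng.choice((0, 1, 2)) if rng.random() < rate else t for t in chrom]


def crossover(a, b, rng):
    # Single-point crossover (assumption): the two parents swap their tails.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def evolve(fitness, length, pop_size=50, generations=GENERATIONS, seed=0):
    """Return the best trit chromosome found; `length` must be at least 2."""
    rng = random.Random(seed)
    pop = [[rng.choice((0, 1, 2)) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size // 2):
            a, b = rng.sample(pop, 2)
            if rng.random() < CROSSOVER_RATE:
                a, b = crossover(a, b, rng)
            offspring += [mutate(a, rng), mutate(b, rng)]
        # Elitist truncation selection (assumption): keep the fittest pop_size
        # chromosomes among parents and offspring; all others are discarded.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return pop[0]


# Toy fitness for demonstration only: count the positions in the "both" state.
best = evolve(lambda c: c.count(2), length=12, pop_size=20, generations=30, seed=1)
print(best)
```

Because parents compete with their offspring, the best fitness in the population never decreases from one generation to the next in this sketch.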

Step 3: Evaluation of biclusters. For each child bicluster, the numbers of genes and conditions, the average Pearson’s correlation coefficient (PCC), and the parent–child redundancy are examined to decide whether to quit or continue the decomposition. Subsequently, the bicluster is either accepted as an element of the final biclusters, B, or discarded. We calculate the PCC of every gene pair in a bicluster and average the values (the average PCC). The parent–child redundancy is defined as the ratio of the number of genes in the child bicluster (n′) to that in the parent bicluster (n). Therefore, a small parent–child redundancy indicates that the child bicluster contains far fewer genes than the parent, and a large parent–child redundancy means that the number of genes in the child bicluster is almost the same as that in the parent. The average PCC and the parent–child redundancy are abbreviated as C and R, respectively. The decision process is illustrated in Figure 1D. Briefly, the process employs four rules: (I) we quit the decomposition and accept the bicluster if C is higher than the threshold τc; (II) we quit the decomposition and discard the bicluster if the bicluster is “small,” which is judged by the thresholds τn and τm for n′ and m′, respectively; (III) we also quit the decomposition and discard the bicluster if the redundancy, R, is small (R < τr) or large (R > 1 − τr); and (IV) otherwise, we continue the decomposition. The upper bound in rule III (R > 1 − τr) was employed to reduce the calculation cost, because a child bicluster that is similar to its parent bicluster and has a low C is not expected to produce promising results. The four thresholds, τn, τm, τc, and τr, were empirically determined as 30, 10, 0.65, and 0.15, respectively (see Table S1). The Greek symbols in Figure 1D indicate the rule applied in each decision. In Figure 1C, the accepted and discarded biclusters are marked by + and – symbols, respectively, and the bicluster to be decomposed is marked by a * symbol. Figure 1C indicates that four biclusters are accepted.
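The four rules of Step 3 can be sketched as a single decision function. This is an illustrative reading, not the authors' code: it assumes that the rules are tested in the order I to IV and that a bicluster counts as “small” when either n′ < τn or m′ < τm. The function name decide is hypothetical.

```python
# Thresholds reported in the text.
TAU_N, TAU_M, TAU_C, TAU_R = 30, 10, 0.65, 0.15


def decide(n_child, m_child, n_parent, avg_pcc):
    """Return 'accept', 'discard', or 'continue' for one child bicluster."""
    if avg_pcc > TAU_C:
        return "accept"                      # rule I: sufficiently correlated
    if n_child < TAU_N or m_child < TAU_M:
        return "discard"                     # rule II: too small (either dimension; assumption)
    redundancy = n_child / n_parent          # parent-child redundancy R = n'/n
    if redundancy < TAU_R or redundancy > 1 - TAU_R:
        return "discard"                     # rule III: too small or too similar to the parent
    return "continue"                        # rule IV: decompose further


print(decide(50, 7, 200, 0.80))    # high correlation is checked first
print(decide(100, 20, 200, 0.50))  # mid-sized, weakly correlated child
```

Under this ordering, a child with few conditions can still be accepted when its average PCC exceeds τc, because rule I is evaluated before rule II.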

Fitness function

In general, large biclusters including co-expressed genes across many specific conditions are preferable. The average PCC of a bicluster was employed to evaluate the gene co-expression. Furthermore, the relative area A of the bicluster, defined by (n′/n)^α × (m′/m)^β, using the gene and condition numbers of the parent and child biclusters, was used to evaluate the size of a bicluster. Two parameters were introduced for gene-weight (α) and condition-weight (β), to control the balance between the number of genes and that of the conditions (0 < α, β < 1) in the relative area, A. The fitness function of a chromosome was defined as follows (Equation 1):

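[Equation 1: fitness function f(c), defined from A(bi) and C(bi) for i = left, right; equation image not recovered]
(1)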

where c, bi (i = left or right), A(b), and C(b) denote a chromosome, one of the child biclusters, the relative area of child bicluster b, and the average PCC of child bicluster b, respectively.

The balance between α and β was important in order to select biologically meaningful biclusters when using f(c). Since a high average PCC for a large number of genes was obtained rather easily when only a small number of conditions were considered, a certain number of conditions should be required for each bicluster, to ensure the biological significance. The variation of α and β was empirically estimated, and finally 0.3 and 0.5 were chosen, respectively (see the results in Table S1).
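The two ingredients of the fitness function, the relative area A(b) and the average PCC C(b), can be sketched directly from their definitions. How Equation 1 combines them is not recoverable from this text, so the fitness function below assumes a sum of A(b) × C(b) over the two children. That combination, the function names, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt

ALPHA, BETA = 0.3, 0.5  # gene-weight and condition-weight chosen in the text


def pearson(x, y):
    # Pearson's correlation coefficient of two equal-length profiles.
    # Raises ZeroDivisionError if either profile is constant.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_pcc(data, genes, conds):
    # C(b): mean PCC over all gene pairs, restricted to the bicluster's conditions.
    rows = [[data[g][c] for c in conds] for g in genes]
    pairs = list(combinations(rows, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def relative_area(n_child, m_child, n_parent, m_parent, alpha=ALPHA, beta=BETA):
    # A(b) = (n'/n)^alpha * (m'/m)^beta
    return (n_child / n_parent) ** alpha * (m_child / m_parent) ** beta


def fitness(data, parent, children):
    # ASSUMPTION: Equation 1 is not reproduced in this text; the sum of
    # A(b) * C(b) over the child biclusters is only one plausible combination.
    n, m = len(parent[0]), len(parent[1])
    return sum(
        relative_area(len(g), len(c), n, m) * average_pcc(data, g, c)
        for g, c in children
    )


# Toy expression matrix: 3 genes x 4 conditions.
data = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
parent = ([0, 1, 2], [0, 1, 2, 3])
print(fitness(data, parent, [([0, 1], [0, 1, 2, 3])]))
```

Since α and β are below 1, the relative area shrinks more slowly than the raw fraction of genes or conditions, which softens the penalty on small children.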

Assessment procedure

Six existing methods were compared to evaluate the performance of BIGA: the Cheng and Church algorithm (CC),9 Statistical-Algorithmic Method for Bicluster Analysis (SAMBA),19,20 order-preserving submatrix (OPSM),1 iterative signature algorithm (ISA),11 binary inclusion-maximal biclustering algorithm (BIMAX),21 and SEBI.16 SEBI was selected as a representative of the GA-based biclustering approaches,15,16 because SEBI adopts an outstanding system to reduce the redundancy of biclusters and performs iterative evolutionary searches like BIGA. The five other methods are based on greedy searches. The Saccharomyces cerevisiae data provided by Gasch et al22 were used for the analyses. The dataset contains 2993 genes and 173 stress conditions; it is large, and abundant annotations are available. Prelic et al21 used this dataset to evaluate algorithms, and the resultant sets of biclusters for the five greedy-search algorithms are publicly available. These bicluster sets were obtained for comparison with our results. Neither the results of SEBI for the data nor SEBI itself is publicly available. The framework of SEBI was therefore re-implemented.16 Note that there might be some minor differences between SEBI and the re-implemented SEBI. Henceforth, we refer to our implementation as mySEBI.

The sets of biclusters were evaluated in terms of the following four points. Since PCC is a widely used parameter to assess the similarity of expression patterns, the distribution of the average PCC of all biclusters was examined. One may consider the mean square residual (MSR) of biclusters9 to be useful as an indicator of the coherence of biclusters, but for much biological data PCC is better than MSR at finding the functional relevance of genes,23–26 for example, the involvement in the same pathway or the participation in the same protein complex.27,28 The existing methods do not necessarily optimize the correlation of biclusters, and the results of other algorithms can contain biclusters showing strong anti-correlation (ie, genes expressed inversely). The absolute value of PCC was used to evaluate such biclusters for comparisons.

Coverage and overlap are also important measures to evaluate the biclustering, as higher coverage and lower overlap are preferable for further biological analyses. Previous studies29 used “cell coverage,” by calculating the percentage of the area (genes × conditions) covered by the biclusters, and “cell overlap,” by measuring the intersection areas of the biclusters. In this study, “gene coverage” and “gene overlap” were adopted, because high cell coverage can be achieved even with a high coverage of conditions and a low coverage of genes, and such a result is not biologically significant. In addition, cell overlap ignores the overlap of genes shared by any two biclusters if the conditions in the biclusters are completely different. Gene coverage is defined as the ratio of the genes that are assigned to any bicluster to all genes, and gene overlap is the ratio of the total genes overlapping on multiple biclusters to the genes assigned to any bicluster (Equation 2):

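Gene coverage = |X1 ∪ X2 ∪ … ∪ XK| / N
Gene overlap = (|X1| + |X2| + … + |XK| − |X1 ∪ X2 ∪ … ∪ XK|) / |X1 ∪ X2 ∪ … ∪ XK|
(2)

where Xi denotes the set of genes in bicluster Bi.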

Gene coverage can evaluate the ability of an algorithm to decide the cluster for each gene, and gene overlap can measure the ability of an algorithm to specify the clusters for genes that are not necessarily involved in multiple biological processes.
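The two measures can be computed from the gene sets of the biclusters alone. The sketch below follows the verbal definitions given above, reading “total genes overlapping on multiple biclusters” as the number of gene assignments beyond the first for each covered gene. This reading is an assumption, although it agrees with the values in Table S1: 164 biclusters × 92.25 genes, divided by 0.69 × 2993 covered genes, minus 1, gives about 6.3. The function names and the toy data are hypothetical.

```python
def gene_coverage(biclusters, n_total):
    """Fraction of all genes that are assigned to at least one bicluster."""
    covered = set().union(*biclusters)
    return len(covered) / n_total


def gene_overlap(biclusters):
    """Extra gene assignments per covered gene.

    ASSUMPTION: overlap = (sum of bicluster sizes - covered genes) / covered genes.
    """
    covered = set().union(*biclusters)
    total_assignments = sum(len(set(b)) for b in biclusters)
    return (total_assignments - len(covered)) / len(covered)


# Toy example: three biclusters over a genome of 10 genes.
toy = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"g3"}]
print(gene_coverage(toy, n_total=10))
print(gene_overlap(toy))
```

With a single bicluster the overlap is 0, which matches the τr = 0.3 row of Table S1.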

The biological significance of the results was also evaluated by measuring the GO enrichment. More precisely, FuncAssociate (2.0; Roth Laboratories, Harvard University, Boston, MA), a tool for finding overrepresented GO terms in a set of genes, was used. Using this tool, we performed Fisher’s exact test to determine the probability of the appearance of genes associated with a GO term in each bicluster.30 FuncAssociate calculates an adjusted P-value (Padj) from simulations, instead of applying corrections for multiple tests. Padj is the probability of obtaining at least one false positive for any desired cutoff. We considered a biologically significant bicluster to be one that is relevant to at least one GO term with a statistically significant appearance (namely, Padj less than the significance level). The number of such biclusters, relative to the total number of biclusters (the GO enrichment), was used to evaluate each algorithm. A previous study by Prelic et al21 evaluated the biological relevance of existing algorithms, using the GO enrichment.
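For orientation, the unadjusted one-sided Fisher’s exact test for a single GO term and a single bicluster reduces to a hypergeometric tail probability, sketched below. This is not FuncAssociate’s procedure: the simulation-based adjustment that yields Padj is not reproduced, and the function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p(n_total, n_annotated, n_bicluster, n_hit):
    """One-sided P-value for over-representation of a GO term.

    Probability of seeing at least n_hit annotated genes when n_bicluster genes
    are drawn without replacement from n_total genes, of which n_annotated
    carry the term (hypergeometric upper tail).
    """
    upper = min(n_annotated, n_bicluster)
    denom = comb(n_total, n_bicluster)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_bicluster - k)
        for k in range(n_hit, upper + 1)
    )
    return tail / denom


# Toy counts: 10 genes in total, 3 annotated, bicluster of 4 genes with 3 hits.
print(enrichment_p(10, 3, 4, 3))
```

The counts are kept as exact integers until the final division, so the function remains accurate for genome-sized inputs such as 2993 genes.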

Results and discussion

Biclusters for the Saccharomyces cerevisiae microarray data

With the selected parameters and thresholds, BIGA found 164 biclusters from the S. cerevisiae microarray data. The average numbers of genes and conditions in the biclusters are 92.25 and 23.65, respectively (Table 1). The detailed statistics of each bicluster are provided in Table S2. The properties of the biclusters obtained by other methods are also summarized in Table 1.

Table 1
Comparing quantitative metrics among biclustering algorithms

Performance evaluation

The distribution of the average PCCs of the biclusters obtained by each biclustering algorithm is shown in the boxplot (Figure 2A). The thick line around the middle of the box indicates the median of the average PCCs. The top and bottom of the box indicate the upper and the lower quartiles, respectively. The circles show the outliers (more than 1.5 times the upper quartile or less than 1.5 times the lower quartile from the median). The whiskers show the range of data between the maximum and the minimum values, other than the outliers. According to the plots, OPSM performs the best, with a very small deviation in the average PCCs. Apart from OPSM, BIGA outperforms the other methods when compared by the median of the average PCC. One may consider that the fitness function of BIGA takes the average PCC into account (Equation 1), and thus it is obvious that the average PCC of BIGA is good. However, note that the results are not necessarily satisfactory if the optimization procedure does not work well, or if the balance between the average PCC and the area of the bicluster in Equation 1 is inappropriate. Next, using the Wilcoxon signed-rank test, we examined whether the distribution of the average PCCs of BIGA is significantly better than those of the other algorithms.31 The results showed that BIGA detects significantly more co-expressed genes in biclusters than the other methods, except for OPSM (the highest P-value is only 5.4 × 10^−6, against SAMBA). To clarify the performance, the expression profiles of the four best biclusters with higher average PCCs are shown in Figure S1. Note that the reason for the highest performance of OPSM is related to the gene coverage, and these analyses will be discussed later.

Figure 2
(A) Distribution of the average Pearson correlation coefficients for each biclustering algorithm, represented by a boxplot. (B) Histogram of gene coverage for each biclustering algorithm. The y-axis represents the coverage ratio between the union of genes ...

The gene coverage and the gene overlap are shown in Figures 2B and 2C, respectively. BIGA achieved the fourth-highest gene coverage among the seven algorithms (Figure 2B). SAMBA could classify almost 100% of the genes into biclusters, but each bicluster contained more than 900 genes (Table 1) with extremely high overlap (Figure 2C), which will make the succeeding experimental or bioinformatics analyses difficult. mySEBI could produce a set of biclusters that included 95% of all genes with a small amount of overlap. CC showed the best gene coverage (highest) and overlap (lowest). The results indicate that the techniques to reduce the redundancy of biclusters in SEBI and CC are efficient for gaining high coverage and low overlap. However, the average PCCs of the biclusters by both algorithms were very low (Figure 2A). OPSM produced biclusters with the highest correlation (Figure 2A), but failed to achieve higher gene coverage due to the small number of clusters (Table 1). The average PCCs of OPSM and BIGA are high, because both methods adopt gene co-expression in the target function. By contrast, CC and SEBI adopt MSR instead of PCC. Although MSR can sometimes identify coherent biclusters, it is not necessarily efficient at achieving higher correlations of genes.

BIGA yielded the second-largest gene overlap, 6.29 (Figure 2C), which may imply that the biclusters of BIGA are mutually similar. To examine the similarity of the biclusters more directly, the pairwise overlap (PO) of two biclusters, defined by |Xi ∩ Xj| / |Xi ∪ Xj|, where Xi and Xj are the sets of genes in biclusters Bi and Bj, respectively, was measured and plotted in Figure 3A. The median of the POs for BIGA was not very large, as compared with those of the other methods, indicating that the biclusters determined by BIGA are not necessarily similar. Moreover, the variety of biclusters was investigated using the single-linkage clustering method, in which the distance between two biclusters is defined by 1.0 − PO. At each cut-off distance, the number of clusters was counted and normalized by the total number of biclusters, which we call the fraction of independent biclusters (FIB). When the cut-off distance is sufficiently small, no biclusters are merged and FIB is 1.0. This state indicates that the biclusters are independent and diverse. On the other hand, when the cut-off distance is sufficiently large, most of the biclusters may be merged together, and FIB will converge to 0.0. This state means that all of the biclusters are judged as being similar to each other. We consider a higher FIB to be an indicator of the variety of the resultant biclusters. According to the plot (Figure 3B), the FIBs of SAMBA and ISA are obviously low over almost the whole cut-off distance range, showing that their biclusters are rather similar. The FIBs of OPSM show that its ability to detect diverse biclusters is moderate. CC, mySEBI, BIMAX, and BIGA provided a wider variety of biclusters than the other algorithms, when the cut-off distance was less than 0.5. In summary, the average bicluster determined by BIGA contains many genes that are shared with other biclusters (Figure 2C); however, when focusing on each pair of biclusters, only a small number of genes are shared (Figure 3A). Consequently, the biclusters determined by BIGA seem to be independent (Figure 3B), and cover most of the genes efficiently (Figure 2B).
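The PO and FIB measures can be sketched as follows. The sketch assumes that PO is the ratio of shared genes to the union of genes for a pair of biclusters, and it implements single-linkage clustering at a given cut-off as the connected components of the graph that links every pair with distance 1.0 − PO at or below the cut-off. The function names and the toy data are hypothetical.

```python
def pairwise_overlap(x, y):
    # PO: shared genes divided by the union of genes (assumed Jaccard-type ratio).
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)


def fraction_independent(biclusters, cutoff):
    """FIB: number of single-linkage clusters at `cutoff` / number of biclusters."""
    k = len(biclusters)
    parent = list(range(k))  # union-find forest over bicluster indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if 1.0 - pairwise_overlap(biclusters[i], biclusters[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters
    return len({find(i) for i in range(k)}) / k


# Toy example: the first two biclusters share half of their combined genes.
toy = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
for cutoff in (0.0, 0.5, 1.0):
    print(cutoff, fraction_independent(toy, cutoff))
```

For a finite set of K biclusters, this FIB bottoms out at 1/K when everything merges, which approaches 0.0 only when K is large.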

Figure 3
(A) Distribution of pairwise overlap (PO) of biclusters, shown in boxplots for each algorithm. Thick lines, boxes, whiskers, and circles indicate the same things as in (Figure 2A). (B) The fraction of independent biclusters (FIB) over the cut-off distance. ...

Evaluation of biological relevance by gene ontology enrichment analyses

In the study by Prelic et al21 on the evaluation of existing methods using GO enrichment, OPSM showed the best performance (100% of the biclusters were significant at the 0.05 significance level). However, it only produced twelve biclusters (Table 1), and thus the gene coverage was the lowest (Figure 2B). Less than half of the biclusters produced by CC were judged to be significant,21 probably because CC cannot detect biclusters with a higher average PCC (Figure 2A). The percentages of significant biclusters from mySEBI are 93%, 81%, 69%, and 42% for the 0.05, 0.01, 0.005, and 0.001 significance levels, respectively. By contrast, 94.5% of the biclusters produced by BIGA were judged to be significant at the 0.05 significance level. This value changed to 88.4%, 86.0%, and 79.3% for the 0.01, 0.005, and 0.001 significance levels, respectively. The performance of BIGA is almost the same as those of BIMAX and ISA in GO enrichment,21 but BIGA outperforms them in the gene coverage (Figure 2B).

There was a functional relationship between the resultant biclusters by BIGA, based on the enriched GO terms at the 0.001 significance level. Among the 122 GO-enriched terms, ribosome-related terms (ribosome GO:0005840, ribosomal subunit GO:0033279, etc) are abundant in many biclusters (50 biclusters). This observation was consistent with the fact that 60% of transcription is devoted to ribosomal ribonucleic acid (RNA),32 because genes with higher expression levels tend to be clustered. Apart from the ribosome-related terms, primary metabolic (GO:0044238), translation (GO:0006412), protein-related (GO:0044267, GO:0019538), macromolecule-related (GO:0009059, GO:0034645, GO:0044260, GO:0043170), and biopolymer-related (GO:0043283, GO:0034960, GO:0043284, GO:0034961) processes also frequently appeared in several biclusters. This indicated that the genes involved in these terms are primary or essential in many biological processes. For each bicluster, the five GO terms that are most enriched at the 0.001 significance level, and the five most specific GO terms among the enriched terms, are shown in Table S2.

Furthermore, the novel aspects of the biclusters identified by BIGA were examined. For each bicluster defined by BIGA, the PO against all biclusters identified by the other five methods was measured and the maximum PO was derived (Table S2). The highest value of the maximum POs was at most 0.12, indicating that the biclusters defined by BIGA are quite different from those determined by the other methods. To explore the relationships of the genes that were detected only by BIGA, we examined the biclusters of BIGA that were not similar to any of the other biclusters; that is, the biclusters with maximum pairwise similarity scores < 0.05. In bicluster 109 (the maximum PO = 0.039 with bicluster 29 of CC), 16 out of 86 genes, eg, SAS3 (YBL052C), TEF2 (YBR118W), and SWD3 (YBR175W), are involved in a cellular nitrogen metabolic process (GO:0034641) and are co-expressed under twelve conditions. In bicluster 118 (0.037 with bicluster 56 of CC), 26 out of 66 genes, eg, RRN6 (YBL014C), ORC2 (YBR060C), and PAF1 (YBR279W), are involved in an RNA metabolic process (GO:0016070). In bicluster 160 (0.037 with bicluster 24 of ISA), 33 out of 74 genes, such as HEK2 (YBL032W), ROX3 (YBL093C), and SIF2 (YBR103W), are related to a nucleic acid metabolic process (GO:0090304). These results demonstrate that BIGA is useful to reveal the functional relevance underlying the biclusters. Furthermore, some genes belonged to the same bicluster, even though they lacked known co-functional evidence (see the biclusters in Table S2 without significant GO terms). These genes represent promising experimental targets that bridge biological processes exhibiting co-expression under specific conditions.

Conclusion

The development of biclustering algorithms has allowed biologists to start unraveling the underlying functional mechanisms in living organisms. We propose BIGA as an alternative biclustering technique, since it was designed to address the conventional problems of the pre-existing methods. Biclustering is obviously advantageous in accounting for the overlap state among clusters, but the suitable amount of overlap is still ambiguous and different algorithms often produce solutions with various degrees of overlap. We tried to develop a novel chromosome-encoding mode that explicitly defines the overlap between biclusters. BIGA revealed that the most frequently appearing genes express their functions in fundamental and essential biological processes, such as translation. A microarray often consists of relatively few conditions, with respect to a large number of genes. The weighting of genes and conditions diminishes the bias between the number of genes and conditions, which helps to eliminate unreliable results, such as biclusters with very few conditions. We also applied an alternative index, the average PCC, which impacts the biological meaning, rather than the MSR, to measure the goodness of a bicluster. The analysis of GO enrichment demonstrated that most of our biclusters were significant, with one or more enriched GO terms. When evaluated with the five pre-existing algorithms, BIGA performed well in most of the properties with good balance, although it did not show the best performance for all criteria. A pair-wise comparison of our biclusters with those obtained by the other algorithms revealed the novel aspects of the biclusters that are distinct from those of the other methods. Since biological systems are quite complicated, resulting in high-dimensional data, it is quite difficult to answer all biological questions with a single approach. For new discoveries, we recommend the application of several approaches, including BIGA.

Supplementary data

Table S1

Parameter determination

Goodness of biclusters

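Parameter | Value | Genes | Conditions | Correlation | Biclusters | Coverage | Overlap
α | 0.1 | 72.15 | 22.84 | 0.74 | 111 | 0.59 | 3.53
α | 0.3 | 92.25 | 23.65 | 0.71 | 164 | 0.69 | 6.29
α | 0.5 | 102.22 | 24.42 | 0.7 | 252 | 0.67 | 11.82
τr | 0.1 | 81.22 | 21.51 | 0.73 | 355 | 0.74 | 11.97
τr | 0.15 | 92.25 | 23.65 | 0.71 | 164 | 0.69 | 6.29
τr | 0.2 | 109.86 | 25.07 | 0.69 | 57 | 0.58 | 2.59
τr | 0.25 | 128.13 | 32.5 | 0.71 | 8 | 0.22 | 0.53
τr | 0.3 | 163 | 45 | 0.67 | 1 | 0.05 | 0
τc | 0.60 | 100.62 | 22.17 | 0.69 | 145 | 0.71 | 5.9
τc | 0.65 | 92.25 | 23.65 | 0.71 | 164 | 0.69 | 6.29
τc | 0.70 | 83.84 | 22.69 | 0.74 | 178 | 0.61 | 7.09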

Notes: (A) Impact of the gene-weight parameter α on the goodness of biclusters (τn = 30, τm = 10, τc = 0.65, τr = 0.15, and β = 0.5). (B) Impact of the redundant threshold τr on the goodness of biclusters (τn = 30, τm = 10, τc = 0.65, α = 0.3, and β = 0.5). (C) Impact of the correlation threshold τc on the goodness of biclusters (τn = 30, τm = 10, τr = 0.15, α = 0.3, and β = 0.5).
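
The Coverage and Overlap columns of Table S1 summarize how genes are distributed across the resulting biclusters. Their exact definitions are not restated in this section, so the Python sketch below rests on assumptions: coverage is taken to be the fraction of all genes that fall in at least one bicluster, and overlap the average number of additional biclusters that each covered gene belongs to. The function name and the toy gene labels are hypothetical.

```python
from collections import Counter


def coverage_and_overlap(biclusters, total_genes):
    """Return (coverage, overlap) for a collection of gene sets.

    Assumed definitions (not confirmed by the source):
      coverage = covered genes / total genes
      overlap  = total memberships / covered genes - 1
    """
    # Count, for every gene, how many biclusters contain it.
    memberships = Counter(g for genes in biclusters for g in set(genes))
    covered = len(memberships)
    if covered == 0:
        return 0.0, 0.0
    coverage = covered / total_genes
    overlap = sum(memberships.values()) / covered - 1
    return coverage, overlap


# Invented toy example: three biclusters drawn from ten genes.
toy_biclusters = [
    {"g1", "g2", "g3"},
    {"g2", "g3", "g4"},
    {"g3", "g5"},
]
coverage, overlap = coverage_and_overlap(toy_biclusters, total_genes=10)
print(coverage, overlap)
```

Under these assumed definitions, a single bicluster always yields an overlap of 0, and overlap grows when many biclusters share the same genes without covering new ones.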

Figure S1

Expression profiles of biclusters 1 (A), 2 (B), 3 (C), and 4 (D), in descending order of the average Pearson’s correlation coefficient.

Note: The x-axis represents the series of conditions; eg, the number 8 denotes the 8th condition.

Table S2

Detailed statistics of the resulting biclusters (sorted in descending order of average PCC)

Bicluster ID | Number of genes | Number of conditions | Average PCC | Minimum adjusted P-value of GO enrichment | Number of enriched GO terms | Five most significant GO terms | Five most specific GO terms | Highest pairwise similarity score
147100.87<0.0012GO:0003674 molecular_function
GO:0032991 macromolecular complex
0.044
274280.81<0.0013GO:0003674 molecular_function
GO:0032991 macromolecular complex
GO:0043234 protein complex
0.067
385210.80<0.00114GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0022618 ribonucleoprotein complex assembly
GO:0007114 cell budding
GO:0022618 ribonucleoprotein complex assembly
GO:0032505 reproduction of a single-celled organism
GO:0042257 ribosomal subunit assembly
GO:0043933 macromolecular complex subunit organization
0.070
471320.8012GO:0030529 ribonucleoprotein complex
GO:0032991 macromolecular complex
GO:0005840 ribosome
GO:0044445 cytosolic part
GO:0006412 translation
GO:0022625 cytosolic large ribosomal subunit0.093
574180.800.0011GO:0005737 cytoplasmGO:0005737 cytoplasm0.050
65070.8000.043
779240.80<0.0018GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0005840 ribosome
GO:0009072 aromatic amino acid family metabolic process0.073
852160.7900.032
95640.7900.041
1087210.79<0.0015GO:0003674 molecular_function
GO:0006412 translation
GO:0009987 cellular process
GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0009058 biosynthetic process
0.068
1172200.79<0.0015GO:0032991 macromolecular complex
GO:0003674 molecular_function
GO:0009987 cellular process
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
0.060
1278260.79<0.0016GO:0032040 small-subunit processome
GO:0030686 90S preribosome
GO:0042254 ribosome biogenesis
GO:0030684 preribosome
GO:0022613 ribonucleoprotein complex biogenesis
GO:0032040 small-subunit processome
GO:0022613 ribonucleoprotein complex biogenesis
GO:0042254 ribosome biogenesis
GO:0030684 preribosome
GO:0030686 90S preribosome
0.074
1374140.79<0.0011GO:0003674 molecular_function0.048
1483330.78<0.00119GO:0044445 cytosolic part
GO:0006412 translation
GO:0022625 cytosolic large ribosomal subunit
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0015934 large ribosomal subunit
GO:0022625 cytosolic large ribosomal subunit
GO:0044249 cellular biosynthetic process
GO:0009058 biosynthetic process
0.080
1586230.78<0.0012GO:0003674 molecular_function
GO:0032991 macromolecular complex
0.056
1649180.78<0.00110GO:0044238 primary metabolic process
GO:0016070 RNA metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0043283 biopolymer metabolic process
GO:0030529 ribonucleoprotein complex
GO:0008152 metabolic process
GO:0016070 RNA metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0044237 cellular metabolic process
0.059
1792230.78<0.00112GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0034621 cellular macromolecular complex subunit organization
GO:0034621 cellular macromolecular complex subunit organization
GO:0034660 ncRNA metabolic process
GO:0006139 “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”
GO:0016070 RNA metabolic process
GO:0044237 cellular metabolic process
0.072
1877250.78<0.0014GO:0003674 molecular_function
GO:0044445 cytosolic part
GO:0009987 cellular process
GO:0032991 macromolecular complex
0.050
1977210.78<0.0015GO:0003674 molecular_function
GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0030529 ribonucleoprotein complex
GO:0015935 small ribosomal subunit
GO:0015935 small ribosomal subunit0.062
2059120.78<0.0011GO:0044238 primary metabolic process0.046
2184300.77<0.00110GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0005840 ribosome
GO:0005737 cytoplasm0.073
2253110.770.0011GO:0044238 primary metabolic process0.058
2381280.77<0.00111GO:0032991 macromolecular complex
GO:0043283 biopolymer metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0043234 protein complex
GO:0043170 macromolecule metabolic process
GO:0051246 regulation of protein metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0032268 regulation of cellular protein metabolic process
GO:0043234 protein complex
0.059
2461210.7700.039
2582130.77<0.0011GO:0003674 molecular_function0.045
26103240.76<0.0019GO:0044238 primary metabolic process
GO:0003674 molecular_function
GO:0009987 cellular process
GO:0005840 ribosome
GO:0003735 structural constituent of ribosome
GO:0045182 translation regulator activity
GO:0003743 translation initiation factor activity
GO:0045182 translation regulator activity
GO:0008135 “translation factor activity, nucleic acid binding”
GO:0032268 regulation of cellular protein metabolic process
GO:0043234 protein complex
0.077
2793270.76<0.00119GO:0044238 primary metabolic process
GO:0003735 structural constituent of ribosome
GO:0009987 cellular process
GO:0005840 ribosome
GO:0003735 structural constituent of ribosome
GO:0015935 small ribosomal subunit
GO:0008152 metabolic process
GO:0043229 intracellular organelle
GO:0043226 organelle
GO:0022627 cytosolic small ribosomal subunit
0.098
2865110.76<0.0011GO:0003674 molecular_function0.045
2978320.76<0.0012GO:0003674 molecular_function
GO:0032991 macromolecular complex
0.077
3062190.76<0.0016GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0044249 cellular biosynthetic process
GO:0009058 biosynthetic process
0.056
3189190.76<0.00112GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0009059 macromolecule biosynthetic process
GO:0044238 primary metabolic process
GO:0006139 “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”
GO:0034961 cellular biopolymer biosynthetic process
GO:0034645 cellular macromolecule biosynthetic process
GO:0016070 RNA metabolic process
GO:0009059 macromolecule biosynthetic process
0.063
3291300.76<0.00110GO:0017111 nucleoside-triphosphatase activity
GO:0016462 pyrophosphatase activity
GO:0016817 “hydrolase activity, acting on acid anhydrides”
GO:0016818 “hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides”
GO:0044238 primary metabolic process
GO:0017111 nucleoside-triphosphatase activity
GO:0016462 pyrophosphatase activity
GO:0016817 “hydrolase activity, acting on acid anhydrides”
GO:0016818 “hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides”
GO:0034470 ncRNA processing
0.081
33105340.76<0.0018GO:0009058 biosynthetic process
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0006412 translation
GO:0044445 cytosolic part
GO:0009058 biosynthetic process0.098
34105280.75<0.00116GO:0032991 macromolecular complex
GO:0044267 cellular protein metabolic process
GO:0006412 translation
GO:0009987 cellular process
GO:0043234 protein complex
GO:0044444 cytoplasmic part
GO:0044424 intracellular part
GO:0043234 protein complex
GO:0009058 biosynthetic process
0.088
35110250.75<0.00129GO:0032991 macromolecular complex
GO:0016070 RNA metabolic process
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0005198 structural molecule activity
GO:0019438 aromatic compound biosynthetic process
GO:0006396 RNA processing
GO:0034470 ncRNA processing
GO:0034660 ncRNA metabolic process
GO:0006139 “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”
0.085
3666160.75<0.0018GO:0032991 macromolecular complex
GO:0003735 structural constituent of ribosome
GO:0033279 ribosomal subunit
GO:0005198 structural molecule activity
GO:0006412 translation
GO:0022627 cytosolic small ribosomal subunit0.069
3771100.750.0011GO:0044085 cellular component biogenesisGO:0044085 cellular component biogenesis0.068
3859140.74<0.0013GO:0003674 molecular_function
GO:0005198 structural molecule activity
GO:0032991 macromolecular complex
0.040
3958160.74<0.00113GO:0044249 cellular biosynthetic process
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0009058 biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0000462 “maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)”
GO:0030490 maturation of SSU-rRNA
GO:0034961 cellular biopolymer biosynthetic process
GO:0034645 cellular macromolecule biosynthetic process
GO:0022627 cytosolic small ribosomal subunit
0.048
4083360.74<0.0018GO:0044445 cytosolic part
GO:0006412 translation
GO:0043229 intracellular organelle
GO:0043226 organelle
GO:0043228 nonmembrane-bounded organelle
GO:0043229 intracellular organelle
GO:0043226 organelle
0.076
4178230.74<0.0015GO:0032991 macromolecular complex
GO:0043234 protein complex
GO:0003674 molecular_function
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0043234 protein complex0.069
42113260.74<0.00123GO:0044445 cytosolic part
GO:0030529 ribonucleoprotein complex
GO:0005198 structural molecule activity
GO:0033279 ribosomal subunit
GO:0006412 translation
GO:0006913 nucleocytoplasmic transport
GO:0051169 nuclear transport
GO:0005622 intracellular
GO:0005737 cytoplasm
GO:0010608 posttranscriptional regulation of gene expression
0.080
4390220.74<0.00118GO:0032991 macromolecular complex
GO:0022627 cytosolic small ribosomal subunit
GO:0030684 preribosome
GO:0030686 90S preribosome
GO:0030529 ribonucleoprotein complex
GO:0044249 cellular biosynthetic process
GO:0009058 biosynthetic process
0.081
4489250.74<0.0016GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0034621 cellular macromolecular complex subunit organization
GO:0016070 RNA metabolic process
GO:0034621 cellular macromolecular complex subunit organization
GO:0016070 RNA metabolic process
0.061
4592280.74<0.0018GO:0019538 protein metabolic process
GO:0044267 cellular protein metabolic process
GO:0032268 regulation of cellular protein metabolic process
GO:0005737 cytoplasm
GO:0051246 regulation of protein metabolic process
GO:0005737 cytoplasm
GO:0010608 posttranscriptional regulation of gene expression
GO:0051246 regulation of protein metabolic process
GO:0006417 regulation of translation
GO:0032268 regulation of cellular protein metabolic process
0.057
46106280.74<0.00112GO:0009987 cellular process
GO:0006412 translation
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0044238 primary metabolic process
GO:0022627 cytosolic small ribosomal subunit0.089
47106360.74<0.00114GO:0030529 ribonucleoprotein complex
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0005840 ribosome
GO:0032991 macromolecular complex
GO:0016462 pyrophosphatase activity
GO:0016817 “hydrolase activity, acting on acid anhydrides”
GO:0016818 “hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides”
0.100
48109250.74<0.00123GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0044445 cytosolic part
GO:0009987 cellular process
GO:0005840 ribosome
GO:0005622 intracellular
GO:0022625 cytosolic large ribosomal subunit
GO:0010608 posttranscriptional regulation of gene expression
GO:0051246 regulation of protein metabolic process
GO:0006417 regulation of translation
0.083
4999270.74<0.00124GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0005840 ribosome
GO:0005198 structural molecule activity
GO:0006412 translation
GO:0034961 cellular biopolymer biosynthetic process
GO:0034645 cellular macromolecule biosynthetic process
GO:0022627 cytosolic small ribosomal subunit
GO:0034960 cellular biopolymer metabolic process
GO:0009059 macromolecule biosynthetic process
0.082
5089240.73<0.00110GO:0030529 ribonucleoprotein complex
GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0005840 ribosome
GO:0043228 nonmembrane-bounded organelle
GO:0005488 binding0.074
5186150.73<0.0013GO:0003674 molecular_function
GO:0009987 cellular process
GO:0000166 nucleotide binding
GO:0000166 nucleotide binding0.065
52141350.73<0.00118GO:0006412 translation
GO:0032991 macromolecular complex
GO:0009058 biosynthetic process
GO:0009987 cellular process
GO:0044249 cellular biosynthetic process
GO:0006082 organic acid metabolic process
GO:0019752 carboxylic acid metabolic process
GO:0005737 cytoplasm
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
0.119
53107310.73<0.00120GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0005198 structural molecule activity
GO:0007010 cytoskeleton organization
GO:0015935 small ribosomal subunit
GO:0022627 cytosolic small ribosomal subunit
GO:0006417 regulation of translation
GO:0032268 regulation of cellular protein metabolic process
0.062
5468240.730.0016GO:0009987 cellular process
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0043229 intracellular organelle
GO:0043226 organelle
GO:0043229 intracellular organelle
GO:0043226 organelle
0.045
55128260.73<0.00121GO:0032991 macromolecular complex
GO:0006412 translation
GO:0044267 cellular protein metabolic process
GO:0019538 protein metabolic process
GO:0044238 primary metabolic process
GO:0016043 cellular component organization
GO:0065007 biological regulation
GO:0050789 regulation of biological process
GO:0050794 regulation of cellular process
GO:0009059 macromolecule biosynthetic process
0.089
56101320.73<0.00115GO:0032991 macromolecular complex
GO:0030529 ribonucleoprotein complex
GO:0044445 cytosolic part
GO:0009987 cellular process
GO:0005840 ribosome
GO:0022625 cytosolic large ribosomal subunit
GO:0044424 intracellular part
0.099
57107320.73<0.00111GO:0032991 macromolecular complex
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0043170 macromolecule metabolic process0.091
58111330.72<0.00111GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0019538 protein metabolic process
GO:0006412 translation
GO:0043228 nonmembrane-bounded organelle
GO:0043234 protein complex0.099
5992270.72<0.00111GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0032268 regulation of cellular protein metabolic process
GO:0044445 cytosolic part
GO:0010608 posttranscriptional regulation of gene expression
GO:0016070 RNA metabolic process
GO:0051246 regulation of protein metabolic process
GO:0006417 regulation of translation
GO:0044424 intracellular part
0.106
60111330.72<0.0017GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0044445 cytosolic part
GO:0044238 primary metabolic process
GO:0006412 translation
0.078
6176150.72<0.0012GO:0003674 molecular_function
GO:0009987 cellular process
0.050
6294200.72<0.0016GO:0032991 macromolecular complex
GO:0032268 regulation of cellular protein metabolic process
GO:0044238 primary metabolic process
GO:0051246 regulation of protein metabolic process
GO:0009987 cellular process
GO:0051246 regulation of protein metabolic process
GO:0032268 regulation of cellular protein metabolic process
0.057
6383240.72<0.00113GO:0022627 cytosolic small ribosomal subunit
GO:0032991 macromolecular complex
GO:0015935 small ribosomal subunit
GO:0044445 cytosolic part
GO:0030686 90S preribosome
GO:0030686 90S preribosome
GO:0015935 small ribosomal subunit
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0022627 cytosolic small ribosomal subunit
0.083
64126280.72<0.00139GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0044238 primary metabolic process
GO:0005840 ribosome
GO:0030529 ribonucleoprotein complex
GO:0015934 large ribosomal subunit
GO:0044464 cell part
GO:0034961 cellular biopolymer biosynthetic process
GO:0034645 cellular macromolecule biosynthetic process
GO:0022625 cytosolic large ribosomal subunit
0.094
6545120.7200.045
66100320.72<0.0018GO:0005198 structural molecule activity
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0006412 translation
GO:0009987 cellular process
0.080
67124290.72<0.00115GO:0032991 macromolecular complex
GO:0043234 protein complex
GO:0009058 biosynthetic process
GO:0009987 cellular process
GO:0043284 biopolymer biosynthetic process
GO:0010608 posttranscriptional regulation of gene expression
GO:0006417 regulation of translation
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0044424 intracellular part
0.097
68111370.72<0.0019GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0006412 translation
GO:0009987 cellular process
GO:0043228 nonmembrane-bounded organelle
0.099
6951210.7100.059
70106300.71<0.00121GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0044267 cellular protein metabolic process
GO:0019538 protein metabolic process
GO:0005198 structural molecule activity
GO:0034960 cellular biopolymer metabolic process
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0044260 cellular macromolecule metabolic process
GO:0043234 protein complex
0.065
7146120.7100.047
72126360.71<0.00117GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0016462 pyrophosphatase activity
GO:0016817 “hydrolase activity, acting on acid anhydrides”
GO:0016818 “hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides”
GO:0017076 purine nucleotide binding
GO:0032553 ribonucleotide binding
GO:0032555 purine ribonucleotide binding
GO:0000166 nucleotide binding
GO:0017111 nucleoside-triphosphatase activity
0.101
7387250.71<0.0018GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0030529 ribonucleoprotein complex
GO:0016070 RNA metabolic process
GO:0016070 RNA metabolic process
GO:0043170 macromolecule metabolic process
0.070
74112300.71<0.00118GO:0032991 macromolecular complex
GO:0006412 translation
GO:0044238 primary metabolic process
GO:0044424 intracellular part
GO:0009058 biosynthetic process
GO:0010468 regulation of gene expression
GO:0010556 regulation of macromolecule biosynthetic process
GO:0010608 posttranscriptional regulation of gene expression
GO:0006417 regulation of translation
GO:0044424 intracellular part
0.085
75116310.71<0.00113GO:0032991 macromolecular complex
GO:0005198 structural molecule activity
GO:0044445 cytosolic part
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0005737 cytoplasm
GO:0043234 protein complex
0.093
7668140.71<0.0017GO:0022627 cytosolic small ribosomal subunit
GO:0015935 small ribosomal subunit
GO:0006412 translation
GO:0044445 cytosolic part
GO:0003735 structural constituent of ribosome
GO:0015935 small ribosomal subunit
GO:0022627 cytosolic small ribosomal subunit
0.074
7786200.71<0.0013GO:0003674 molecular_function
GO:0022627 cytosolic small ribosomal subunit
GO:0032991 macromolecular complex
GO:0022627 cytosolic small ribosomal subunit0.052
78104390.71<0.00123GO:0032991 macromolecular complex
GO:0030529 ribonucleoprotein complex
GO:0006412 translation
GO:0044238 primary metabolic process
GO:0008135 “translation factor activity, nucleic acid binding”
GO:0003743 translation initiation factor activity
GO:0045182 translation regulator activity
GO:0008135 “translation factor activity, nucleic acid binding”
GO:0016070 RNA metabolic process
GO:0034960 cellular biopolymer metabolic process
0.108
7990230.71<0.0019GO:0006412 translation
GO:0044267 cellular protein metabolic process
GO:0019538 protein metabolic process
GO:0032991 macromolecular complex
GO:0005840 ribosome
0.060
80108360.71<0.0017GO:0032991 macromolecular complex
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0005198 structural molecule activity
GO:0030529 ribonucleoprotein complex
0.078
8190240.71<0.00111GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0022625 cytosolic large ribosomal subunit
GO:0044238 primary metabolic process
GO:0043283 biopolymer metabolic process
GO:0022625 cytosolic large ribosomal subunit
GO:0043234 protein complex
GO:0043170 macromolecule metabolic process
0.067
82106330.71<0.00121GO:0044238 primary metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0009987 cellular process
GO:0043283 biopolymer metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0006139 “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”
GO:0008152 metabolic process
GO:0043229 intracellular organelle
GO:0043226 organelle
GO:0034960 cellular biopolymer metabolic process
0.084
83129310.71<0.00118GO:0032991 macromolecular complex
GO:0006412 translation
GO:0005198 structural molecule activity
GO:0005840 ribosome
GO:0044445 cytosolic part
GO:0005488 binding
GO:0005622 intracellular
GO:0022625 cytosolic large ribosomal subunit
GO:0044422 organelle part
GO:0044446 intracellular organelle part
0.091
84129280.71<0.00122GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0006412 translation
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0044424 intracellular part
GO:0044237 cellular metabolic process
GO:0044249 cellular biosynthetic process
0.098
8577380.71<0.00112GO:0030529 ribonucleoprotein complex
GO:0044445 cytosolic part
GO:0032991 macromolecular complex
GO:0033279 ribosomal subunit
GO:0043228 nonmembrane-bounded organelle
GO:0005622 intracellular
GO:0022625 cytosolic large ribosomal subunit
0.074
86109280.70<0.0016GO:0009987 cellular process
GO:0006412 translation
GO:0044445 cytosolic part
GO:0044238 primary metabolic process
0.090
8778210.700.0018GO:0010468 regulation of gene expression
GO:0010556 regulation of macromolecule biosynthetic process
GO:0060255 regulation of macromolecule metabolic process
GO:0031326 regulation of cellular biosynthetic process
GO:0009889 regulation of biosynthetic process
GO:0019222 regulation of metabolic process
GO:0060255 regulation of macromolecule metabolic process
GO:0009889 regulation of biosynthetic process
GO:0031323 regulation of cellular metabolic process
GO:0031326 regulation of cellular biosynthetic process
0.055
88100240.70<0.00119GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0006412 translation
GO:0005840 ribosome
GO:0003735 structural constituent of ribosome
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0044260 cellular macromolecule metabolic process
GO:0044237 cellular metabolic process
0.073
8982240.70<0.00113GO:0044445 cytosolic part
GO:0006417 regulation of translation
GO:0010608 posttranscriptional regulation of gene expression
GO:0032268 regulation of cellular protein metabolic process
GO:0051246 regulation of protein metabolic process
GO:0009889 regulation of biosynthetic process
GO:0031323 regulation of cellular metabolic process
GO:0031326 regulation of cellular biosynthetic process
GO:0010468 regulation of gene expression
GO:0010556 regulation of macromolecule biosynthetic process
0.060
9077270.70<0.0015GO:0003674 molecular_function
GO:0005198 structural molecule activity
GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0044445 cytosolic part
0.050
9197220.70<0.00117GO:0044445 cytosolic part
GO:0006412 translation
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0044249 cellular biosynthetic process
0.088
92110280.70<0.0016GO:0009987 cellular process
GO:0032991 macromolecular complex
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
0.090
9394290.70<0.00115GO:0032991 macromolecular complex
GO:0006417 regulation of translation
GO:0010608 posttranscriptional regulation of gene expression
GO:0032268 regulation of cellular protein metabolic process
GO:0051246 regulation of protein metabolic process
GO:0005083 small GTPase regulator activity
GO:0030695 GTPase regulator activity
GO:0005737 cytoplasm
GO:0010608 posttranscriptional regulation of gene expression
GO:0051246 regulation of protein metabolic process
0.067
94113340.70<0.00132GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0006412 translation
GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0019222 regulation of metabolic process
GO:0060255 regulation of macromolecule metabolic process
GO:0009889 regulation of biosynthetic process
GO:0031323 regulation of cellular metabolic process
GO:0031326 regulation of cellular biosynthetic process
0.075
9594230.70<0.0014GO:0006412 translation
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
0.062
96104310.70<0.00110GO:0006412 translation
GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0016070 RNA metabolic process
GO:0034660 ncRNA metabolic process
GO:0034470 ncRNA processing
GO:0034660 ncRNA metabolic process
GO:0016070 RNA metabolic process
0.107
9751130.70<0.0011GO:0003674 molecular_function0.043
98154320.70<0.00114GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0005840 ribosome
GO:0032991 macromolecular complex
GO:0030529 ribonucleoprotein complex
GO:0005575 cellular_component
GO:0044464 cell part
GO:0010556 regulation of macromolecule biosynthetic process
GO:0005737 cytoplasm
GO:0010608 posttranscriptional regulation of gene expression
0.115
99117300.70<0.00111GO:0005622 intracellular
GO:0009987 cellular process
GO:0044238 primary metabolic process
GO:0006412 translation
GO:0019538 protein metabolic process
GO:0005622 intracellular
GO:0022627 cytosolic small ribosomal subunit
GO:0032268 regulation of cellular protein metabolic process
0.100
100110280.70<0.0018GO:0009987 cellular process
GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0044445 cytosolic part
GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process0.092
101139340.70<0.00116GO:0005198 structural molecule activity
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0006412 translation
GO:0009987 cellular process
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0051246 regulation of protein metabolic process
GO:0006417 regulation of translation
GO:0032268 regulation of cellular protein metabolic process
0.100
10298280.69<0.00148GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0006412 translation
GO:0043284 biopolymer biosynthetic process
GO:0005840 ribosome
GO:0006333 chromatin assembly or disassembly
GO:0006446 regulation of translational initiation
GO:0003743 translation initiation factor activity
GO:0019222 regulation of metabolic process
GO:0045182 translation regulator activity
0.085
10371180.69<0.0011GO:0003674 molecular_function0.053
104105210.69<0.0015GO:0008150 biological_process
GO:0009987 cellular process
GO:0003674 molecular_function
GO:0032991 macromolecular complex
GO:0043234 protein complex
GO:0043234 protein complex0.058
105140320.69<0.00116GO:0032991 macromolecular complex
GO:0043234 protein complex
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0044445 cytosolic part
GO:0005575 cellular_component
GO:0044464 cell part
GO:0010608 posttranscriptional regulation of gene expression
GO:0043226 organelle
GO:0051246 regulation of protein metabolic process
0.098
10641120.6900.035
107101250.69<0.00124GO:0044238 primary metabolic process
GO:0005198 structural molecule activity
GO:0032991 macromolecular complex
GO:0005840 ribosome
GO:0044445 cytosolic part
GO:0034645 cellular macromolecule biosynthetic process
GO:0022625 cytosolic large ribosomal subunit
GO:0034960 cellular biopolymer metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0009059 macromolecule biosynthetic process
0.080
10899210.69<0.0019GO:0032991 macromolecular complex
GO:0019538 protein metabolic process
GO:0044267 cellular protein metabolic process
GO:0044238 primary metabolic process
GO:0006412 translation
GO:0044424 intracellular part0.080
10986120.69<0.0017GO:0044267 cellular protein metabolic process
GO:0009987 cellular process
GO:0019538 protein metabolic process
GO:0032991 macromolecular complex
GO:0043229 intracellular organelle
GO:0043229 intracellular organelle
GO:0043226 organelle
0.039
110118300.69<0.00117GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0044445 cytosolic part
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0044424 intracellular part
0.093
11198150.69<0.0015GO:0016043 cellular component organization
GO:0009987 cellular process
GO:0006996 organelle organization
GO:0032991 macromolecular complex
GO:0008150 biological_process
GO:0006996 organelle organization
GO:0016043 cellular component organization
0.041
112157430.69<0.00138GO:0044238 primary metabolic process
GO:0030529 ribonucleoprotein complex
GO:0009987 cellular process
GO:0032991 macromolecular complex
GO:0006412 translation
GO:0015934 large ribosomal subunit
GO:0030686 90S preribosome
GO:0044464 cell part
GO:0034961 cellular biopolymer biosynthetic process
GO:0015935 small ribosomal subunit
0.108
113116340.68<0.00121GO:0009058 biosynthetic process
GO:0032991 macromolecular complex
GO:0044249 cellular biosynthetic process
GO:0006412 translation
GO:0009987 cellular process
GO:0000105 histidine biosynthetic process
GO:0006547 histidine metabolic process
GO:0009075 histidine family amino acid metabolic process
GO:0009076 histidine family amino acid biosynthetic process
GO:0009059 macromolecule biosynthetic process
0.084
11469130.680.0011GO:0009987 cellular process0.053
11596210.68<0.0015GO:0003674 molecular_function
GO:0009987 cellular process
GO:0022627 cytosolic small ribosomal subunit
GO:0044267 cellular protein metabolic process
GO:0019538 protein metabolic process
GO:0022627 cytosolic small ribosomal subunit0.050
1163890.6800.041
117109300.68<0.0019GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0009987 cellular process
GO:0006412 translation
GO:0005198 structural molecule activity
GO:0043234 protein complex0.076
11866170.680.0011GO:0009987 cellular process0.037
119104270.68<0.0015GO:0003674 molecular_function
GO:0009987 cellular process
GO:0044445 cytosolic part
GO:0032991 macromolecular complex
GO:0008150 biological_process
0.072
120 | 122 | 36 | 0.68 | <0.001 | 38
GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0033279 ribosomal subunit
GO:0008152 metabolic process
GO:0043283 biopolymer metabolic process
GO:0022613 ribonucleoprotein complex biogenesis
GO:0042254 ribosome biogenesis
GO:0044085 cellular component biogenesis
GO:0034961 cellular biopolymer biosynthetic process
GO:0015935 small ribosomal subunit
0.097
121 | 74 | 16 | 0.68 | 0.001 | 8
GO:0022627 cytosolic small ribosomal subunit
GO:0044445 cytosolic part
GO:0032991 macromolecular complex
GO:0043332 mating projection tip
GO:0044463 cell projection part
GO:0043332 mating projection tip
GO:0044463 cell projection part
GO:0022627 cytosolic small ribosomal subunit
0.089
122 | 126 | 38 | 0.68 | <0.001 | 35
GO:0032991 macromolecular complex
GO:0044445 cytosolic part
GO:0006412 translation
GO:0009987 cellular process
GO:0043234 protein complex
GO:0008135 “translation factor activity, nucleic acid binding”
GO:0034961 cellular biopolymer biosynthetic process
GO:0034645 cellular macromolecule biosynthetic process
GO:0043229 intracellular organelle
GO:0044422 organelle part
0.106
123 | 83 | 18 | 0.68 | <0.001 | 3
GO:0003674 molecular_function
GO:0043234 protein complex
GO:0032991 macromolecular complex
GO:0043234 protein complex
0.053
124 | 119 | 31 | 0.67 | <0.001 | 8
GO:0032991 macromolecular complex
GO:0006412 translation
GO:0009987 cellular process
GO:0005488 binding
GO:0044422 organelle part
GO:0005488 binding
GO:0044422 organelle part
GO:0044446 intracellular organelle part
0.093
125 | 133 | 41 | 0.67 | <0.001 | 27
GO:0009987 cellular process
GO:0032991 macromolecular complex
GO:0033279 ribosomal subunit
GO:0044238 primary metabolic process
GO:0006412 translation
GO:0015935 small ribosomal subunit
GO:0043229 intracellular organelle
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0043226 organelle
0.092
126 | 132 | 25 | 0.67 | <0.001 | 18
GO:0044238 primary metabolic process
GO:0016070 RNA metabolic process
GO:0044237 cellular metabolic process
GO:0009987 cellular process
GO:0008152 metabolic process
GO:0031125 rRNA 3′-end processing
GO:0043628 ncRNA 3′-end processing
GO:0034660 ncRNA metabolic process
GO:0006139 “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”
GO:0008152 metabolic process
0.080
127 | 57 | 14 | 0.67 | 0
0.042
128 | 51 | 18 | 0.67 | <0.001 | 1
GO:0003674 molecular_function
0.044
129 | 77 | 25 | 0.67 | <0.001 | 5
GO:0009987 cellular process
GO:0043933 macromolecular complex subunit organization
GO:0034621 cellular macromolecular complex subunit organization
GO:0003674 molecular_function
GO:0034622 cellular macromolecular complex assembly
GO:0034622 cellular macromolecular complex assembly
GO:0043933 macromolecular complex subunit organization
GO:0034621 cellular macromolecular complex subunit organization
0.048
130 | 75 | 22 | 0.67 | <0.001 | 4
GO:0044238 primary metabolic process
GO:0016070 RNA metabolic process
GO:0003674 molecular_function
GO:0043283 biopolymer metabolic process
GO:0016070 RNA metabolic process
0.067
131 | 106 | 26 | 0.67 | <0.001 | 6
GO:0003674 molecular_function
GO:0043229 intracellular organelle
GO:0032991 macromolecular complex
GO:0043226 organelle
GO:0044238 primary metabolic process
GO:0043229 intracellular organelle
GO:0043226 organelle
0.076
132 | 133 | 25 | 0.67 | <0.001 | 21
GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0044445 cytosolic part
GO:0005198 structural molecule activity
GO:0005488 binding
GO:0005622 intracellular
GO:0044424 intracellular part
GO:0044249 cellular biosynthetic process
0.097
133 | 128 | 35 | 0.67 | <0.001 | 22
GO:0032991 macromolecular complex
GO:0006412 translation
GO:0005198 structural molecule activity
GO:0005840 ribosome
GO:0043284 biopolymer biosynthetic process
GO:0034961 cellular biopolymer biosynthetic process
GO:0034645 cellular macromolecule biosynthetic process
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0043234 protein complex
0.096
134 | 107 | 28 | 0.67 | <0.001 | 19
GO:0005198 structural molecule activity
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0044445 cytosolic part
GO:0005840 ribosome
GO:0005737 cytoplasm
GO:0015935 small ribosomal subunit
GO:0022627 cytosolic small ribosomal subunit
GO:0032268 regulation of cellular protein metabolic process
0.074
135 | 109 | 24 | 0.66 | <0.001 | 17
GO:0009058 biosynthetic process
GO:0044238 primary metabolic process
GO:0044249 cellular biosynthetic process
GO:0032991 macromolecular complex
GO:0043284 biopolymer biosynthetic process
GO:0003676 nucleic acid binding
GO:0006139 “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”
GO:0008152 metabolic process
GO:0006417 regulation of translation
GO:0009059 macromolecule biosynthetic process
0.078
136 | 72 | 16 | 0.66 | <0.001 | 9
GO:0000462 “maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)”
GO:0030490 maturation of SSU-rRNA
GO:0022627 cytosolic small ribosomal subunit
GO:0006412 translation
GO:0043228 nonmembrane-bounded organelle
GO:0000462 “maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)”
GO:0030490 maturation of SSU-rRNA
GO:0022627 cytosolic small ribosomal subunit
0.050
137 | 113 | 24 | 0.66 | <0.001 | 11
GO:0044238 primary metabolic process
GO:0030529 ribonucleoprotein complex
GO:0005840 ribosome
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0008152 metabolic process
GO:0044237 cellular metabolic process
0.080
138 | 48 | 12 | 0.66 | 0
0.033
139 | 58 | 13 | 0.66 | 0
0.041
140 | 135 | 37 | 0.66 | <0.001 | 14
GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0005488 binding
GO:0044424 intracellular part
GO:0044237 cellular metabolic process
GO:0043170 macromolecule metabolic process
0.101
141 | 103 | 21 | 0.66 | <0.001 | 10
GO:0009987 cellular process
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0043229 intracellular organelle
GO:0043226 organelle
GO:0065007 biological regulation
GO:0050789 regulation of biological process
GO:0050794 regulation of cellular process
GO:0043229 intracellular organelle
GO:0043226 organelle
0.063
142 | 164 | 32 | 0.66 | <0.001 | 26
GO:0044238 primary metabolic process
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0006396 RNA processing
GO:0016070 RNA metabolic process
GO:0003824 catalytic activity
GO:0006396 RNA processing
GO:0030684 preribosome
GO:0030686 90S preribosome
GO:0034470 ncRNA processing
0.091
143 | 90 | 18 | 0.66 | <0.001 | 21
GO:0032991 macromolecular complex
GO:0019538 protein metabolic process
GO:0044238 primary metabolic process
GO:0043283 biopolymer metabolic process
GO:0044267 cellular protein metabolic process
GO:0008152 metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0044237 cellular metabolic process
GO:0043234 protein complex
0.064
144 | 101 | 20 | 0.66 | <0.001 | 3
GO:0009987 cellular process
GO:0003674 molecular_function
GO:0008150 biological_process
0.052
14512240.66<0.0012GO:0008150 biological_process
GO:0003674 molecular_function
0.045
146 | 121 | 32 | 0.66 | <0.001 | 14
GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0009058 biosynthetic process
GO:0044249 cellular biosynthetic process
GO:0009987 cellular process
GO:0009059 macromolecule biosynthetic process
GO:0043284 biopolymer biosynthetic process
GO:0044249 cellular biosynthetic process
0.061
147 | 121 | 30 | 0.66 | <0.001 | 6
GO:0003824 catalytic activity
GO:0032991 macromolecular complex
GO:0044238 primary metabolic process
GO:0030684 preribosome
GO:0003824 catalytic activity
GO:0030684 preribosome
0.088
148 | 104 | 22 | 0.66 | <0.001 | 23
GO:0044238 primary metabolic process
GO:0034660 ncRNA metabolic process
GO:0034470 ncRNA processing
GO:0031125 rRNA 3′-end processing
GO:0009987 cellular process
GO:0000459 exonucleolytic trimming during rRNA processing
GO:0000467 “exonucleolytic trimming to generate mature 3′-end of 5.8S rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)”
GO:0000469 cleavages during rRNA processing
GO:0006364 rRNA processing
GO:0016072 rRNA metabolic process
0.070
149 | 140 | 19 | 0.66 | <0.001 | 14
GO:0044238 primary metabolic process
GO:0019538 protein metabolic process
GO:0044267 cellular protein metabolic process
GO:0032991 macromolecular complex
GO:0005737 cytoplasm
GO:0044464 cell part
GO:0005737 cytoplasm
GO:0006417 regulation of translation
GO:0032268 regulation of cellular protein metabolic process
GO:0043170 macromolecule metabolic process
0.069
150 | 116 | 30 | 0.65 | <0.001 | 14
GO:0019538 protein metabolic process
GO:0032991 macromolecular complex
GO:0044267 cellular protein metabolic process
GO:0044445 cytosolic part
GO:0005198 structural molecule activity
GO:0022625 cytosolic large ribosomal subunit
GO:0043234 protein complex
0.079
151 | 61 | 21 | 0.65 | <0.001 | 1
GO:0003674 molecular_function
0.051
152 | 62 | 15 | 0.65 | <0.001 | 1
GO:0003674 molecular_function
0.041
153 | 85 | 27 | 0.65 | <0.001 | 5
GO:0016070 RNA metabolic process
GO:0003674 molecular_function
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0034660 ncRNA metabolic process
GO:0034660 ncRNA metabolic process
GO:0016070 RNA metabolic process
0.072
154 | 142 | 33 | 0.65 | <0.001 | 12
GO:0030529 ribonucleoprotein complex
GO:0044445 cytosolic part
GO:0032991 macromolecular complex
GO:0033279 ribosomal subunit
GO:0043228 nonmembrane-bounded organelle
GO:0005622 intracellular
GO:0043229 intracellular organelle
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0043226 organelle
0.099
155 | 54 | 12 | 0.65 | 0
0.039
156 | 71 | 15 | 0.65 | <0.001 | 6
GO:0043283 biopolymer metabolic process
GO:0044238 primary metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0043170 macromolecule metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0034960 cellular biopolymer metabolic process
GO:0044260 cellular macromolecule metabolic process
GO:0043170 macromolecule metabolic process
0.052
157 | 103 | 34 | 0.65 | <0.001 | 21
GO:0032991 macromolecular complex
GO:0009987 cellular process
GO:0044445 cytosolic part
GO:0043228 nonmembrane-bounded organelle
GO:0043232 intracellular nonmembrane-bounded organelle
GO:0015934 large ribosomal subunit
GO:0022625 cytosolic large ribosomal subunit
GO:0051246 regulation of protein metabolic process
GO:0044424 intracellular part
GO:0032268 regulation of cellular protein metabolic process
0.079
158 | 84 | 19 | 0.65 | <0.001 | 6
GO:0005198 structural molecule activity
GO:0005488 binding
GO:0044445 cytosolic part
GO:0009987 cellular process
GO:0032991 macromolecular complex
GO:0005488 binding
0.074
159 | 103 | 20 | 0.65 | <0.001 | 10
GO:0032991 macromolecular complex
GO:0034621 cellular macromolecular complex subunit organization
GO:0044238 primary metabolic process
GO:0009987 cellular process
GO:0043933 macromolecular complex subunit organization
GO:0065003 macromolecular complex assembly
GO:0034622 cellular macromolecular complex assembly
GO:0043933 macromolecular complex subunit organization
GO:0034621 cellular macromolecular complex subunit organization
0.063
160 | 74 | 7 | 0.65 | 0.001 | 3
GO:0044422 organelle part
GO:0044446 intracellular organelle part
GO:0009987 cellular process
GO:0044422 organelle part
GO:0044446 intracellular organelle part
0.037
161 | 57 | 7 | 0.64 | <0.001 | 1
GO:0003674 molecular_function
0.048
162 | 87 | 6 | 0.63 | <0.001 | 1
GO:0003674 molecular_function
0.048
163 | 75 | 5 | 0.61 | <0.001 | 2
GO:0032991 macromolecular complex
GO:0003674 molecular_function
0.045
164 | 56 | 10 | 0.54 | 0
0.033

Notes: Specific GO terms were selected for each bicluster as follows. (1) We hypothesize that a GO term that appears in only a small proportion of the biclusters (ie, 1 in 4 biclusters) is specific to those biclusters. (2) There are 164 biclusters. By the proportion test, a proportion of 1 in 4 corresponds to 31 of 164 biclusters at the 0.05 significance level. (3) Therefore, GO terms that appear fewer than 32 times are treated as specific terms.
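
The cutoff in step (2) can be checked numerically. The sketch below assumes that "the proportion test" is a one-sided z-test (normal approximation) of an observed proportion against 0.25. The note does not say which variant was used, so this is only one plausible reading, although it does give 31 for 164 biclusters. The function names and the toy input are hypothetical and are not part of BIGA.

```python
import math
from collections import Counter
from statistics import NormalDist


def max_specific_count(n_biclusters: int, p0: float = 0.25, alpha: float = 0.05) -> int:
    """Largest number of biclusters whose proportion is still significantly
    below p0 under a one-sided z-test (assumed reading of the note)."""
    z = NormalDist().inv_cdf(1 - alpha)              # about 1.645 for alpha = 0.05
    se = math.sqrt(p0 * (1 - p0) / n_biclusters)     # standard error under H0: p = p0
    return math.floor(n_biclusters * (p0 - z * se))


def specific_terms(go_by_bicluster: dict[int, list[str]], cutoff: int) -> set[str]:
    """GO terms enriched in at most `cutoff` biclusters.
    Assumption: a term is counted once per bicluster."""
    counts = Counter(
        term for terms in go_by_bicluster.values() for term in set(terms)
    )
    return {term for term, n in counts.items() if n <= cutoff}


cutoff = max_specific_count(164)
print(cutoff)  # 31, ie, terms appearing fewer than 32 times

# Toy input (hypothetical): three biclusters and placeholder term IDs.
toy = {1: ["GO:a", "GO:b"], 2: ["GO:a"], 3: ["GO:a", "GO:c"]}
print(sorted(specific_terms(toy, cutoff=1)))
```

With the real table, `go_by_bicluster` would map each of the 164 bicluster numbers to its enriched GO terms, and `cutoff` would be the value returned by `max_specific_count(164)`.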

Acknowledgments

We thank the Human Genome Center for providing the computational resources used to analyze all of the data, and the Ministry of Education, Culture, Sports, Science and Technology for a scholarship awarded to Sawannee Sutheeworapong. We acknowledge Prof Kenta Nakai for providing good facilities to Sawannee Sutheeworapong in the early stage of this work. We also thank Dr Takeshi Obayashi for useful discussions in the early stage of this work.

Footnotes

Disclosure

The authors report no conflicts of interest in this work.

Authors’ contributions

SS, KK, and MO contributed to the overall research and the manuscript preparation. KK, MO, and HO were responsible for the project direction and financial support.
