In all 24 datasets, the distribution of Pearson correlation coefficients appeared to be a bell-shaped curve centered at approximately zero. In what follows, we describe the results of the HC pairs unless otherwise noted. In all datasets, the CoER in the first bin was higher than that in a long CD range of 980 – 1000 kbp.
In Fig. , the CoERs obtained from the eleven human datasets are plotted as a function of CD: Fig. and show the results of the HC, ZC and NC pairs, respectively. The CoERs from the HC pairs decreased as the CD increased, although there were large swings in CoER in the CD range above 100 kbp. These swings were larger in smaller datasets because of stochastic effects and undersampling of the transcriptome. In contrast, in the CD range below 100 kbp, the CoERs from the ZC and NC groups were relatively flat (compare Fig. with and ).
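The way a CoER curve is built from binned chromosomal distances can be illustrated with a minimal sketch. It assumes that the CoER of a bin is the fraction of gene pairs in that bin whose Pearson correlation reaches a cutoff. This definition, the function name `coer_by_bin`, the 10 kbp bin width and the cutoff of 0.7 are all assumptions made for illustration and are not taken from this section.

```python
from collections import defaultdict


def coer_by_bin(pairs, bin_width_kbp=10.0, r_threshold=0.7):
    """Return {bin_start_kbp: fraction of pairs with r >= r_threshold}.

    pairs: iterable of (cd_kbp, pearson_r) tuples, one per gene pair.
    The CoER definition and the default parameters are assumptions
    for illustration, not values from the paper.
    """
    total = defaultdict(int)
    high = defaultdict(int)
    for cd_kbp, r in pairs:
        # Assign the pair to the bin that contains its chromosomal distance.
        bin_start = int(cd_kbp // bin_width_kbp) * bin_width_kbp
        total[bin_start] += 1
        if r >= r_threshold:
            high[bin_start] += 1
    # Report the bins in order of increasing distance.
    return {b: high[b] / total[b] for b in sorted(total)}


# Toy data (made up): three pairs in the 0-10 kbp bin, two in the 10-20 kbp bin.
toy_pairs = [(2.0, 0.9), (5.5, 0.1), (9.0, 0.8), (12.0, 0.75), (18.0, -0.2)]
rates = coer_by_bin(toy_pairs)
```

With the toy data, two of the three pairs in the first bin and one of the two pairs in the second bin exceed the cutoff, so the sketch yields a rate that falls with distance, as in the HC curves described above.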
Figure 1 The co-expression rates (CoERs) in the eleven human datasets are plotted as a function of chromosomal distance (CD): the results from the highly-correlated (HC) pairs (A), the zero-correlation (ZC) pairs (B) and the negatively-correlated (NC) pairs (C).
The weighted average and standard deviation over the eleven curves in (A), (B) and (C) are shown in (D), (E) and (F), respectively. The average in (D) showed the same tendency of gradual decrease to 0.2, whereas the averages in (E) and (F) showed no distance effect. The weighted averages for all six species are given in Fig. . The phenomena shown in Fig. were consistently observed in all species. These results strongly suggest that co-expression of neighboring genes is common in many eukaryotic species and also that the CD, especially a short CD below 10 kbp, is associated with increased frequency of co-expression.
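A weighted average and standard deviation over several curves can be computed bin by bin. The sketch below shows the calculation for a single bin. The weights are not specified in this section, so the use of per-dataset pair counts as weights is an assumption, as are the function name `weighted_mean_sd` and the toy numbers.

```python
import math


def weighted_mean_sd(values, weights):
    """Weighted mean and weighted (population-style) standard deviation."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * v for v, w in zip(values, weights)) / w_sum
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / w_sum
    return mean, math.sqrt(var)


# Toy example (made-up numbers): CoERs of three datasets in one CD bin,
# weighted by hypothetical numbers of gene pairs in that bin.
coers = [0.30, 0.20, 0.40]
pair_counts = [100, 300, 100]
mean, sd = weighted_mean_sd(coers, pair_counts)
```

Repeating this for every CD bin would give one averaged curve with an error band per group of pairs.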
Figure 2 A comparison of the six species using the weighted average of each species. (A) the highly correlated pairs, (B) the zero correlation pairs and (C) the negatively correlated pairs. Note that worm2 represents the CoERs of the worm dataset without the pairs ...
The results of the pair-wise analysis of GO categories differed between the yeast and human datasets (Table ). In yeast, only the HC pairs shared the same category and most of the pairs were not duplicates. Four pairs were found in a long CD range (980 – 1000 kbp). In the human data, more variety was obtained and pairs having the same category were found even in the NC group. One category, GO:5887 (integral to plasma membrane), was seen in all three groups, while GO:6954 (inflammatory response) was seen in the HC and ZC groups. In the long CD range, eleven pairs were found. The results suggest that the human genome involves more complicated functional relationships over a substantial CD.
Gene Ontology categories shared by the two genes in pairs with a CD below 20 kbp
Pair-wise protein BLAST
Table summarizes the BLAST results. In yeast, about 780 out of the 10,000 pairs were deemed to be duplicated pairs (E < 0.2). In the worm dataset, 8,370 pairs were available for the analysis and 2,658 (31.8 %) were regarded as duplicates. In the mouse datasets, out of about 3,500 analyzed pairs, 11.2 % were putative duplicates. However, in all three species, most pairs had expected values larger than 1. These results indicate that there are many non-duplicate pairs in the HC group, suggesting that the high CoERs in the HC group were due not only to duplicated pairs but also to non-duplicate effects. The distributions of the expected values in the five mouse datasets were almost the same (Table ), suggesting that differences between microarrays were not significant.
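The duplicate fractions quoted above follow from a simple threshold on the BLAST expected value. The sketch below applies the E < 0.2 criterion stated in the text to a list of expected values and checks the worm percentage from the reported counts. The function name `duplicate_fraction` and the toy expected values are hypothetical.

```python
def duplicate_fraction(e_values, threshold=0.2):
    """Fraction of pairs whose BLAST expected value is below `threshold`."""
    if not e_values:
        raise ValueError("e_values must not be empty")
    n_dup = sum(1 for e in e_values if e < threshold)
    return n_dup / len(e_values)


# Check of the worm percentage from the counts reported in the text.
worm_pct = round(100 * 2658 / 8370, 1)

# Toy expected values (made up): two of the five pairs fall below 0.2.
toy_e_values = [1e-30, 0.05, 0.5, 3.0, 9.0]
toy_fraction = duplicate_fraction(toy_e_values)
```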
Distribution of expected values obtained in pair-wise protein BLAST
Intra- and inter-species comparisons
In Fig. , the standard deviations for the first three bins are relatively large. For example, the CoERs in the first bin in the eleven datasets were in a range between 0.24 and 0.38. To investigate intra-species differences in humans, a multiple comparison with the Ryan procedure was carried out (α = 0.01). Forty-nine out of the 55 possible combinations were not significantly different. In the five mouse datasets, 12,778 HC pairs out of approximately 20,000 used for the BLAST analysis were commonly seen in two or more datasets. The distributions of the expected values in the five mouse datasets were almost the same (Table ). This suggests that there was no significant intra-species difference. Accordingly, the noise in the microarray data and the differences in microarray design appear to have minor influences on our results.
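The count of 55 combinations is the number of unordered pairs among the eleven human datasets. The short sketch below enumerates them and derives how many pairs differed significantly. The dataset labels are hypothetical placeholders, and the sketch only counts comparisons; it does not implement the Ryan procedure itself.

```python
from itertools import combinations
from math import comb

n_datasets = 11
# Hypothetical labels standing in for the eleven human datasets.
labels = [f"human_{i}" for i in range(1, n_datasets + 1)]

# Every unordered pair of datasets is one comparison.
dataset_pairs = list(combinations(labels, 2))
n_pairs = len(dataset_pairs)

# Counts reported in the text: 49 pairs were not significantly different.
n_not_significant = 49
n_significant = n_pairs - n_not_significant
```

Under these counts, six of the pair-wise comparisons among the human datasets were significant at the stated level.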
The CoERs in the weighted averages of the six species were compared. The results of the intra- and inter-species comparisons indicate that there are significant differences (p < 0.01) in the CoER in a short CD range (0 – 20 kbp) between any pair among worm, mammals (human, rat and mouse), fruit fly and yeast, except for two pairs: worm and rat, and mouse and fruit fly (Table ). Although the rat CoERs in the first three bins are almost the same as those of the worm (Fig. ), the results using the normalized distance (ND) (Fig. ) strongly suggest that the multicellular organisms except worm show similar CoERs. The CoERs of worm and yeast were much larger than the others for an ND range between 0.3 and 1. In yeast, which is a unicellular eukaryote with a compact genome, the organization of coordinated gene regulation is probably different from that in the other species with more dispersed genomes. As previously reported [3], the worm genome involved many more duplicated pairs than the yeast and mouse genomes (Table ). According to Blumenthal et al. [30], the worm genome involves at least 1,000 operons, which correspond to about 15 % of all C. elegans genes. After excluding the pairs in the duplicates and operons, the worm CoERs are similar to those of the other multicellular organisms (see worm2 in Fig. ), indicating that the duplicates and operons are the main reason for the larger CoERs in the worm. However, when the ND was used, the worm curves (both with and without duplicates and operons) were similar to the yeast data.
The results of a multiple comparison with the Ryan procedure
Figure 3 The co-expression rates shown in Figure were re-plotted against the normalized distance. A value on the horizontal axis can be smaller than 1 because all possible pairs next to each other were involved in the calculation (i.e. we did not exclude ...)
A comparison of Figs. and provides some information on the mechanisms behind the distance effect. If the physical distance has the dominant effect (chromatin remodeling is a possible cause), the CoERs in Fig. should be similar across the species. On the other hand, if the effect of the ND is dominant, Fig. should show similar curves. The actual results resemble the former case and suggest that the CD plays the dominant role. In the multicellular organisms, the CoERs were higher than 0.2 up to about 50 kbp, whereas the yeast curve was flat above 10 kbp. The mechanism behind this difference is currently not clear, but there are some clues. For example, several factors have been identified as controlling localized gene transcription [31]. These include the size of euchromatic chromosome territories and the spacing of chromatin "insulators", which impede non-specific enhancer activity upon neighboring genes [32]. Variation in these factors across species could explain our findings, although this has not been systematically studied to date. Further analysis is required to advance our understanding of the mechanisms.