The simple idea that the functionality of eukaryotic genomes is determined solely by the content of genes and their regulatory regions has gradually been replaced by a more complex view, which recognizes a crucial role for the way in which these functional elements are distributed and organized. The discovery that some groups of genes with particular organizations (normally neighboring genes) have been conserved over long periods confirms that, at least in some cases, proximity between genes is essential for their function. The fruit fly Drosophila melanogaster contains several examples of duplicated genes that are arranged in this fashion and are involved in embryonic patterning; these include the en-inv and eyg-toe pairs, the achaete-scute and iroquois complexes, and most significantly the Antennapedia and Bithorax Hox complexes, whose genomic organization has been conserved since the appearance of metazoans. The substantial overlap in expression patterns between genes within these groups suggests that these arrangements might have been first fixed and subsequently maintained by the need for certain shared regulatory regions.
Beyond these specific examples, a number of large-scale computational studies have attempted to detect and measure the level of gene organization within eukaryotic genomes. These analyses searched for significant correlation between gene order and co-expression, under the assumption that neighboring genes will be expressed in a concerted way (for a review, see [8]). However, the results of these studies, normally consisting of rather weak correlation signals, are insufficient to provide an understanding of the overall gene organization in eukaryotic genomes. This is the case not only when comparing gene order with co-expression, but also when comparing groups of genes belonging to different functional classes or involved in the same process or pathway [9]. In contrast to prokaryotes, where functionally related and co-expressed neighboring genes (mostly arranged in operons) are abundant and easily identified, eukaryotic genomes present an apparently much more complex organization, in which genes with no obviously ordered distribution predominate and co-exist with a smaller class of clustered, co-expressed genes. An important limitation of previous analyses is that, because of their global nature, they were unable to identify which genes require a particular genomic arrangement for their function and are, therefore, directly responsible for the detected correlation signal. Furthermore, many large-scale studies have deliberately not considered duplicated genes in order to exclude a disproportionate co-expression signal due to recently duplicated genes [12], despite the fact that most co-expressed neighboring genes in eukaryotes appear to have arisen by gene duplication. In fact, the identification of these clusters of duplicated genes underlines the importance of gene organization in eukaryotic genomes [8], and provides important information about how genes evolve after their duplication.
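To make the kind of global analysis discussed above concrete, the following is a minimal sketch of one way a correlation between gene order and co-expression could be tested: the mean Pearson correlation between adjacent genes is compared with the same statistic after shuffling gene order. This is an illustration only and not the method of any of the cited studies; the function names, the permutation scheme, and the toy expression values are all assumptions introduced here.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / (vx * vy) ** 0.5


def neighbor_coexpression(profiles):
    """Mean correlation between adjacent genes.

    `profiles` is a list of expression vectors ordered by chromosomal
    position (hypothetical input format).
    """
    return mean(
        pearson(profiles[i], profiles[i + 1]) for i in range(len(profiles) - 1)
    )


def permutation_pvalue(profiles, n_perm=1000, seed=0):
    """One-sided permutation test: is the neighbor signal above shuffled order?"""
    rng = random.Random(seed)
    observed = neighbor_coexpression(profiles)
    shuffled = list(profiles)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if neighbor_coexpression(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (invented for illustration): four genes, three conditions.
toy_profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]
observed, p_value = permutation_pvalue(toy_profiles, n_perm=200)
```

A single genome-wide statistic of this kind illustrates the limitation noted above: it reports whether a signal exists, but not which gene pairs produce it.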
Here, we have combined computational and experimental approaches to identify and characterize all detectable duplicated genes that have been conserved in close proximity in the Drosophila genome. Through analysis of available in situ expression data, we have also evaluated the expression patterns of the detected cases in order to determine their level of co-expression. We found that a number of duplicated genes have been retained as tandems over a longer period than would be expected in the absence of selective constraints, and that this gene set is enriched in genes involved in developmental processes as well as those encoding transcriptional regulators. Furthermore, we show that these ancient tandem duplicates are more highly co-expressed than other genes, including recently duplicated tandem pairs.
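As a rough illustration of what identifying duplicated genes in close proximity might involve computationally, the sketch below scans genes in chromosomal order and reports pairs that share a paralog family and are separated by at most a given number of intervening genes. It is not the pipeline used in this study: the `Gene` record, the `family` label (assumed to come from some prior homology clustering), the `max_intervening` parameter, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    family: str  # hypothetical paralog-family label from a prior clustering step


def tandem_pairs(genes, max_intervening=0):
    """Return (name, name) pairs of same-family genes lying close together.

    Two genes qualify when they are on the same chromosome, share a family
    label, and have at most `max_intervening` genes between them.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        ordered = sorted(chrom_genes, key=lambda g: g.start)
        for i, first in enumerate(ordered):
            # Look only at the next gene(s) within the allowed window.
            for second in ordered[i + 1 : i + 2 + max_intervening]:
                if first.family == second.family:
                    pairs.append((first.name, second.name))
    return pairs


# Toy annotation (invented names and coordinates, for illustration only).
toy_genes = [
    Gene("geneA1", "2R", 100, "famA"),
    Gene("geneA2", "2R", 200, "famA"),
    Gene("geneB1", "2R", 300, "famB"),
    Gene("geneA3", "2R", 400, "famA"),
    Gene("geneA4", "3L", 50, "famA"),
]
adjacent_only = tandem_pairs(toy_genes)
with_one_gap = tandem_pairs(toy_genes, max_intervening=1)
```

Allowing a small number of intervening genes is a design choice: a strict adjacency rule would miss duplicates separated by a later insertion, while a wide window would start to include paralogs that are merely on the same chromosome arm.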