In the past few years, several bioinformatic tools and approaches have been developed to assist medical genetic researchers in positional candidate disease gene identification (reviewed in [1]; see also [2]). Several tools use functional genomics to prioritize candidate genes located within disease-associated genomic loci by evaluating functional relationships between known disease genes and positional candidate genes [6]. These tools are based on the premise that genes involved in the same disease phenotype are likely to be functionally related [1]. This premise is supported by the fact that these tools all perform better than random expectation in the prediction or prioritization of candidate disease genes. Nevertheless, not all types of functional genomic data perform equally well in terms of sensitivity and specificity [2]. Microarray expression data have wider coverage than other high-throughput genomic data such as protein-protein interactions, as genome-scale expression analyses are readily and routinely performed with microarrays. Additionally, they are less biased toward better-studied genes than gene function annotation or literature mining, although the latter approaches fare better at prioritizing disease candidate genes [2]. Therefore, given the large coverage of co-expression data and their complementarity to functional annotation and literature mining, it is important to maximize the disease gene predictive value of this type of data.
Several bioinformatic candidate disease gene prioritization tools already incorporate microarray-based co-expression data [2]. This approach assumes that if two genes are functionally related, their expression should vary concordantly across tissues and under different conditions, so that their expression profiles should be correlated. For candidate disease gene prioritization, co-expression analysis is preferable to tissue-specific gene expression patterns, as it is a better predictor of functional relatedness between genes [13].
However, co-expression data can be applied more comprehensively than is currently implemented by these tools. One important and currently underexploited approach is to incorporate co-expression data from other species. Although human co-expression data are the most relevant for disease gene prioritization, evolutionary conservation of co-expression can be used to enhance the reliability of identified co-expression relationships. The premise is that co-expression relationships that are maintained across phylogenetically distant organisms must be under selective pressure, and should therefore be functional – a premise that has been confirmed in several previous studies [14]. Though one tool already includes multi-species co-expression data [11], the improvement in disease gene ranking performance due to the exploitation of evolutionary conservation has not yet been investigated.
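As a minimal sketch of the assumption that functionally related genes have correlated expression profiles, co-expression between two genes can be quantified as a correlation coefficient over matched samples. The use of Pearson correlation, the function name `pearson`, and the toy profiles below are illustrative assumptions, not the measure implemented by any of the tools cited here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile carries no co-expression signal; report 0 by convention.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values for three genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # varies concordantly with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # varies inversely with gene_a

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
```

Under this sketch, a pair such as `gene_a` and `gene_b` would count as co-expressed, whereas `gene_a` and `gene_c` would not.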
We therefore investigated the predictive value of conserved co-expression for candidate disease gene prioritization. To this end, we analyzed how well co-expression between known and candidate disease genes could prioritize positional candidate disease genes. We restricted our analysis to known disease genes from genetic diseases with at least two known causative genes. We constructed artificial loci of 100 candidate genes around the known disease-causing genes, and investigated the tendency of these causative genes to have higher co-expression with other known causative genes compared to the non-causative candidate genes from the same disease loci. Using co-expression data from five eukaryotic species – baker's yeast (Saccharomyces cerevisiae), nematode worm (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), mouse (Mus musculus) and human – we investigated the effect of evolutionary conservation on the ranking of the disease gene pairs, finding that evolutionary conservation of co-expression does indeed improve disease gene ranking. Therefore, exploiting evolutionary conservation could potentially improve the performance of co-expression data in existing disease candidate gene prioritization tools [2], which might in turn improve the prioritization of less well-studied genes.
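The ranking idea described above can be sketched as follows: each candidate gene in an artificial locus is scored by its co-expression with the other known causative genes of the same disease, and the rank of the true causative gene among the candidates is recorded. This is a hedged illustration only. The helper names `score_candidate` and `rank_in_locus`, the use of the maximum co-expression value as the score, and the toy data are assumptions made for this example rather than the procedure used in this study.

```python
def score_candidate(candidate, known_genes, coexpr):
    """Score a candidate by its strongest co-expression with any known gene.

    `coexpr` maps an unordered gene pair (a frozenset) to a co-expression
    value; missing pairs are treated as 0.0. Taking the maximum is an
    assumption of this sketch.
    """
    return max(
        (coexpr.get(frozenset((candidate, k)), 0.0)
         for k in known_genes if k != candidate),
        default=0.0,
    )


def rank_in_locus(locus_scores, causative_gene):
    """Return the 1-based rank of the causative gene within its locus.

    Candidates are ordered by descending score; tied genes share the
    best rank of the tie.
    """
    target = locus_scores[causative_gene]
    return 1 + sum(1 for s in locus_scores.values() if s > target)


# Toy example: a three-gene locus (a real artificial locus would hold 100).
# "A" is the causative gene in this locus; "B" is another known causative
# gene for the same disease, located elsewhere in the genome.
coexpr = {
    frozenset(("A", "B")): 0.8,
    frozenset(("X", "B")): 0.3,
    frozenset(("Y", "B")): 0.5,
}
locus = ["A", "X", "Y"]
other_known = ["B"]

scores = {g: score_candidate(g, other_known, coexpr) for g in locus}
rank_of_a = rank_in_locus(scores, "A")
```

In this toy locus the causative gene receives the highest score and is ranked first; repeating the procedure over many loci would give a distribution of ranks that can be compared between human-only and conserved co-expression data.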