We have examined the relationship between microsynteny conservation and the density of orthologs of human disease genes in the mouse genome. We found a correlation between regions of conserved microsynteny and the location of mouse orthologs of human disease genes, and this correlation is robust to variations in the window size used in our analyses. The correlation we observe suggests that regions of the mouse genome with a high density of disease gene orthologs undergo less rearrangement than regions of the genome with fewer disease gene orthologs. Genes associated with human disease are often orthologous to essential genes in other organisms [42]. Previous studies in both mammals [29] and other eukaryotes [24] have shown that essential genes are located in highly conserved genomic regions. Thus, disease-related genes, many of which perform essential functions, are more likely to be found in conserved regions of the genome.
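As an illustration of the kind of windowed comparison described above, the following minimal Python sketch counts disease gene orthologs in fixed-size windows and computes a Spearman rank correlation against a per-window conservation score. It is not the procedure used in this study: the function names, the toy coordinates, and the conservation scores are all hypothetical, and the use of a Spearman correlation is an assumption made for the example.

```python
from math import sqrt


def window_counts(positions_bp, chrom_length_bp, window_bp):
    """Count positions falling in each fixed-size, non-overlapping window.

    Positions are assumed to be 0-based and smaller than chrom_length_bp.
    """
    n_windows = -(-chrom_length_bp // window_bp)  # ceiling division
    counts = [0] * n_windows
    for pos in positions_bp:
        counts[pos // window_bp] += 1
    return counts


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: a chromosome of length 100 split into windows of 20.
disease_ortholog_positions = [5, 12, 18, 25, 47, 61, 63, 90]
conservation_per_window = [0.9, 0.4, 0.3, 0.7, 0.5]  # made-up scores

density = window_counts(disease_ortholog_positions, 100, 20)
rho = spearman(density, conservation_per_window)
print(density, round(rho, 3))
```

Repeating the calculation with different window sizes, with the conservation scores recomputed for each size, is one way to check that a correlation does not depend on a particular choice of window.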
Several studies have found that, at the sequence level, human disease genes are more conserved than non-disease genes [26]. The sequence conservation of human disease genes with essential C. elegans orthologs is higher than that of disease genes whose orthologs are not lethal when mutated [44]. Interestingly, genes with high polymorphism among humans, but no divergence between humans and chimpanzees, are highly associated with Mendelian disease [45]. Similarly, human disease genes under weak negative selection, where mutant alleles persist in the population, are more likely to cause diseases with Mendelian inheritance [46]. Mendelian disease genes are more constrained evolutionarily than disease genes with non-Mendelian inheritance patterns [45]. Together, these observations support our finding that the mouse orthologs of human disease genes are preferentially found in genomic regions with high microsynteny conservation.
Recombination may be mutagenic due to the possibility of unequal crossing-over. Faulty recombination events in regions with essential genes are therefore likely to be deleterious to the survival of the organism and may have been selected against during mammalian evolution. Studies of the human genome support this link between low recombination and essential genes. Regions of the human genome with high linkage disequilibrium, and thus low recombination, are enriched for genes associated with essential cellular functions such as response to DNA damage, cell cycle progression, or DNA and RNA metabolism [47]. Genes that show variation in populations, such as immune response genes, are often found in regions with low linkage disequilibrium, suggesting that recombination in these regions is not deleterious to the organism [47]. Likewise, human genes found in mutation cold spots tend to be genes involved in essential cellular processes, while those in mutation hot spots include immune response genes [48]. These findings extend to non-coding sequences as well, as human genomic regions that are highly conserved with the pufferfish have been found to contain enhancers for developmental genes [49].
The correlation between disease gene density and microsynteny conservation, although significant, is not perfect. Discrepancies may come from several sources. For example, annotation of human disease genes is incomplete. Many housekeeping genes, which are likely to be essential for mammalian development, are not annotated as human disease genes, probably because mutations in these genes are lethal early in development, and thus humans with such mutations are not viable [50]. The genomic region between 55 and 75 Mb on mouse chromosome 3 shows high conservation but a low density of disease gene orthologs. However, the genes Wwtr1 and Shox2 are located in this genomic region. A mouse knock-out of Wwtr1 displays a phenotype resembling human polycystic kidney disease [51], and the mouse knock-out of Shox2 is lethal with cleft palate [52], strongly suggesting that these genes are linked to human disease, although neither is annotated as a disease gene in OMIM.
Likewise, many genes that are annotated as human disease genes may not be strictly essential for survival, and thus these genes are not expected to have conserved microsynteny. The genomic region between 85 and 105 Mb on mouse chromosome 12 has a high density of disease gene orthologs but low conservation. Mutations in the human gene SERPINA10, whose ortholog is located in this region, are associated with susceptibility to deep vein thromboses [53]. Although SERPINA10 is annotated in OMIM as a disease gene, it is unlikely that inherited mutations in SERPINA10 present a challenge to survival of the individual, suggesting that SERPINA10 is not an essential gene. Finally, many diseases, especially cancers, are caused by translocation events that produce chimeric proteins. While a genomic region may have a high density of disease loci due to translocations, such a region would not show microsynteny conservation, as it is rich in rearrangements.
Discrepancies between microsynteny conservation and the density of disease-related gene orthologs may also arise because other factors contribute to selective pressure on genome evolution. For example, previous studies have suggested that mammalian genes are clustered into groups based on co-expression [54]. Co-expression has therefore been proposed as an evolutionary constraint on genome organization, although the effect is weak, as gene clusters are found only slightly more often than expected by chance [55]. There is also evidence that many overlapping gene pairs exist in mammalian genomes, and that these gene pairs are conserved in multiple species, probably because recombination or mutation in these regions of the genome would disrupt both genes [56]. Alternative mechanisms by which the presence of essential genes may constrain genome structure have also been proposed [24].