Probably the single most important conclusion from the comparative analyses of genome organization in prokaryotes and eukaryotes is the lack of uniformity and the plurality of evolutionary patterns and underlying mechanisms. With regard to genome architecture, what is true of E. coli definitely does not apply to the elephant or even to the fly. The operonic principle of gene arrangement in prokaryotes is the only indisputably strong trend of genome organization, but it affects only short-range gene order. Demonstrable long-range trends definitely exist, such as the preferential positioning of prokaryotic genes on the leading strand or the clustering of coexpressed genes in eukaryotes. However, all these trends are “statistical”, i.e., relatively weak, and also highly variable even between genomes of closely related organisms. In line with this lack of overriding trends in genome organization, synteny is generally not conserved over long evolutionary distances (that is, distances at which the amino acid sequences of most proteins substantially diverge). Major exceptions, such as the partial conservation of the ribosomal superoperon in bacteria and archaea and of the homeobox gene clusters in animals, are notable and can be attributed to functional constraints. However, these cases encompass only a small fraction of the genes in a genome and affect only relatively short-range synteny. In prokaryotes, where inversions around the origin of replication and HGT are both common, complete deterioration of long-range synteny is often observed even between organisms that share nearly complete sets of highly conserved orthologous genes.
Although, apparently owing to the absence of origin-centered inversions and low incidence of HGT, there is more synteny conservation in eukaryotes, almost none of it carries across phyla, and there definitely are no pan-eukaryotic gene clusters that would be comparable in their level of conservation to the ribosomal operons or ATPase operons in prokaryotes. Thus, in general, genome architecture is a highly variable, volatile feature of organisms.
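One simple way to quantify the deterioration of synteny discussed above is to count how many gene adjacencies in one genome are retained in another. The following Python sketch is an illustration only and is not the method used in the analyses cited here. The function names, the use of ortholog labels as gene identifiers, the assumption of single-copy orthologs, and the default treatment of the chromosome as circular are all assumptions made for the example.

```python
def adjacency_set(gene_order, circular=True):
    """Return the set of unordered neighbor pairs in a gene order."""
    pairs = set()
    n = len(gene_order)
    # On a circular chromosome the last gene is also adjacent to the first.
    last = n if circular else n - 1
    for i in range(last):
        a, b = gene_order[i], gene_order[(i + 1) % n]
        if a != b:
            pairs.add(frozenset((a, b)))
    return pairs


def conserved_adjacency_fraction(order_a, order_b, circular=True):
    """Fraction of adjacencies in genome A that are also adjacencies in genome B.

    Only genes (ortholog labels) present in both genomes are considered,
    so gene loss or gain does not by itself count as a loss of synteny.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    adj_a = adjacency_set(a, circular)
    adj_b = adjacency_set(b, circular)
    if not adj_a:
        return 0.0
    return len(adj_a & adj_b) / len(adj_a)


# Hypothetical toy genomes: one inversion versus an interleaved gene order.
ancestral = ["A", "B", "C", "D", "E", "F"]
inverted = ["A", "B", "C", "F", "E", "D"]
scrambled = ["A", "D", "B", "E", "C", "F"]
print(conserved_adjacency_fraction(ancestral, inverted))
print(conserved_adjacency_fraction(ancestral, scrambled))
```

In this toy case, a single inversion of a three-gene segment breaks only the two adjacencies at its boundaries (4 of 6 are retained), whereas the interleaved order retains 1 of 6. Because adjacencies are treated as unordered pairs, the measure ignores gene orientation, which a more careful analysis would probably take into account.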
What, then, are the evolutionary forces that shape genome architecture? Of course, there are many. Clearly, genome organization is neither random (no genome is simply an arbitrary string of genes) nor a fully optimized design selected to encode the optimal phenotype. The principal explanatory framework for understanding the evolution of genome organization can be drawn from the population-genetic theory of the evolution of genomic complexity that was recently expounded by Lynch (Lynch, 2007, Lynch and Conery, 2003). The theory maintains that genetic changes leading to an increase in complexity, such as gene duplications or intron insertions, are slightly deleterious and can be fixed only when purifying selection in a population is weak. Therefore, given that the strength of purifying selection is proportional to the effective population size, substantial genome complexification is possible only during population bottlenecks. Under this concept, genomic complexity is not adaptive but is brought about by neutral population-genetic processes under conditions when purifying selection is ineffective. Complexification starts off as a “genomic syndrome”, although complex features subsequently become subject to adaptive selection. By contrast, in “highly successful”, large populations, which are typical of prokaryotes, purifying selection is intense, so that the prevailing mode of evolution is genome contraction. Most prokaryotic genomes and the genomes of many unicellular eukaryotes do not pass the “complexification threshold”, the result being compact, streamlined genomes with a relatively small number of genes, short intergenic regions, and few selfish elements. By contrast, the genomes of multicellular eukaryotes are beyond the threshold, so fixation of multiple duplications as well as proliferation of transposable elements (TEs), the latter also facilitated by sex (Lynch, 2007), become possible.
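The dependence of fixation on effective population size can be made concrete with a short numerical sketch. The code below uses Kimura's classical diffusion approximation for the fixation probability of a new mutation, which is a standard population-genetic result and is not taken from the works cited above. It assumes a diploid population in which the effective size equals the census size, and a mutation with selective effect s in heterozygotes. The function names and the numerical values of s and of the population sizes are hypothetical and chosen only for illustration.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's approximation for a new mutation at frequency 1/(2N), with N = Ne.

    s is the selection coefficient (negative for deleterious mutations),
    n_e is the effective population size of a diploid population.
    """
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral expectation
    exponent = -4 * n_e * s
    if exponent > 700:
        # exp() would overflow; the fixation probability is effectively zero.
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4*Ne*s)), written with expm1 for accuracy.
    return math.expm1(-2 * s) / math.expm1(exponent)


def relative_to_neutral(s, n_e):
    """Fixation probability expressed as a multiple of the neutral value 1/(2Ne)."""
    return fixation_probability(s, n_e) * 2 * n_e


# Hypothetical slightly deleterious change, e.g. a duplication or intron gain.
s = -1e-4
for n_e in (1e3, 1e4, 1e5, 1e6):
    print(f"Ne = {n_e:.0e}: relative fixation rate = {relative_to_neutral(s, n_e):.3g}")
```

Under these assumptions, the same slightly deleterious change behaves almost like a neutral one when the effective population size is on the order of a thousand, but is practically never fixed when the effective population size is on the order of a million. This is the quantitative core of the “complexification threshold” argument.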
Of course, all these trends are far from being hard principles: there are bacterial genomes with more than 12,000 genes (Schneiker et al., 2007) as well as genomes of unicellular eukaryotes (e.g., Chlamydomonas (Merchant et al., 2007) and Trichomonas (Carlton et al., 2007)) that are at least as complex by any criteria as the genomes of multicellular animals or plants. Furthermore, some prokaryotic genomes (e.g., the crenarchaeon Sulfolobus solfataricus (She et al., 2001)) and genomes of unicellular eukaryotes (e.g., Trichomonas vaginalis (Carlton et al., 2007)) are among those with the highest content of TEs. Apparently, the evolution of even these relatively small genomes depends on the balance between the pressure of purifying selection, itself dependent on the population size and mutation rate, the intensity of recombination processes, and the activity of selfish genetic elements.
Where in the evolution of genome architecture can we see clear imprints of selection, in particular, positive selection? It seems that selection is an important factor in the evolution of operons. Operons can easily form by chance, in a completely neutral fashion, through genome compactification (streamlining), which leads to the formation of tightly spaced strings of codirectional genes, or directons (Salgado et al., 2000, Wolf et al., 2001). Those randomly assembled operons that consist of functionally linked genes provide a selective advantage to their carriers owing to the possibility of co-expression and co-regulation, so such operons are fixed in evolution and often become widespread via HGT. This view of operon evolution incorporates the selfish operon hypothesis, according to which operons are maintained as selfish elements via HGT (Lawrence, 1999, Lawrence, 1997, , 2003), but also includes a distinct effect of positive selection that is amplified by HGT. Thus, operons can be reasonably viewed as partially selfish elements whose survival depends both on their selective value for the carrier organisms and on random HGT.
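The notion of a directon, a string of consecutive codirectional genes, lends itself to a short sketch. The Python code below is a minimal illustration and not the procedure of the cited studies. The gene tuple layout, the function name, and the optional `max_gap` cutoff on intergenic distance, which is added here to reflect the “tightly spaced” condition, are all assumptions.

```python
def find_directons(genes, max_gap=None):
    """Split a gene list into directons.

    genes: list of (name, start, end, strand) tuples sorted by start coordinate.
    max_gap: optional maximum intergenic distance (in bp) allowed within a
             directon; None means that strand identity is the only criterion.
    """
    directons = []
    current = []
    for gene in genes:
        if current:
            prev = current[-1]
            same_strand = gene[3] == prev[3]
            gap = gene[1] - prev[2]  # distance from previous end to this start
            close = max_gap is None or gap <= max_gap
            if same_strand and close:
                current.append(gene)
                continue
            # Strand switch or large gap: close the current directon.
            directons.append(current)
        current = [gene]
    if current:
        directons.append(current)
    return directons


# Hypothetical annotation of six genes on one replicon.
genes = [
    ("a", 0, 900, "+"),
    ("b", 950, 1800, "+"),
    ("c", 1850, 2500, "+"),
    ("d", 2700, 3500, "-"),
    ("e", 3520, 4200, "-"),
    ("f", 5000, 5900, "-"),
]
for directon in find_directons(genes, max_gap=100):
    print([g[0] for g in directon])
```

With the hypothetical 100 bp cutoff, the toy annotation yields three groups (a, b, c; d, e; and f alone). Without the cutoff, genes d, e, and f form a single directon. Whether any such group is an actual operon would, of course, require evidence of co-transcription.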
The role of HGT in the persistence of operons is indirectly but, in my view, strongly supported by the fact that no strings of genes homologous to prokaryotic operons are detectable in eukaryotic genomes (Y.I. Wolf and EVK, unpublished results). Regardless of the exact scenario for the origin of eukaryotes, the genome of the last common ancestor of the extant eukaryotes must have acquired diverse operons, at least as part of the DNA transferred from the mitochondrial endosymbiont and possibly also from the archaeal (under the symbiotic hypotheses of eukaryotic origin (Embley and Martin, 2006, Martin and Koonin, 2006)) or protoeukaryotic (under the archaezoan or related hypotheses (Kurland et al., 2006, Poole and Penny, 2007)) host. The lack of any traces of such inherited operons in eukaryotic genomes suggests a ratchet-type scenario of operon elimination: once an operon is gone, in the absence of appreciable HGT, the loss is virtually irreversible.
Conversely, reconstruction of the evolutionary dynamics of operons in nematodes yielded an “easy come, slow go” scenario, with the rate of gain substantially exceeding the rate of loss (Qian and Zhang, 2008). Thus, it appears that operons that are randomly created by recombination are subsequently maintained by purifying selection.
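The contrast between the ratchet-like elimination of inherited operons and the “easy come, slow go” dynamics in nematodes can be illustrated with a deliberately simple two-state model. This model is an assumption introduced here for illustration and is not the reconstruction method of Qian and Zhang. Each eligible gene pair is taken to be either inside or outside an operon, with constant per-unit-time rates of gain and loss. The function name and all rate values are hypothetical.

```python
import math


def operon_fraction(t, gain, loss, f0=0.0):
    """Expected fraction of eligible gene pairs that are in operons at time t.

    Two-state model: pairs enter operons at rate `gain` and leave at rate
    `loss`; f0 is the starting fraction. The solution relaxes exponentially
    toward the equilibrium gain / (gain + loss).
    """
    total = gain + loss
    if total == 0:
        return f0
    f_eq = gain / total
    return f_eq + (f0 - f_eq) * math.exp(-total * t)


# Ratchet: no gain (no appreciable HGT), so inherited operons only decay.
print(operon_fraction(t=500, gain=0.0, loss=0.01, f0=0.5))

# "Easy come, slow go": gain exceeds loss, so operons accumulate.
print(operon_fraction(t=500, gain=0.03, loss=0.01, f0=0.0))
```

In this sketch, setting the gain rate to zero drives the operon fraction toward zero regardless of the starting point, which corresponds to the irreversible loss proposed above. A gain rate three times the loss rate leads to an equilibrium in which three quarters of the eligible pairs are in operons.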
In multicellular eukaryotes, the relatively small population size and relatively low characteristic mutation rates translate into comparatively weak purifying selection, so that various degrees of genome enlargement and complexification become possible. Hence the formation of large clusters of tandemly duplicated genes, a feature that can be viewed as an increase in genome ordering. However, the countertrend is also apparent, namely, the increased activity of transposable elements, which leads to an increase in genomic disorder. In vertebrates, this mobilization of transposable elements is particularly dramatic, so that the genomes consist mostly of TE-derived sequences (Makalowski, 2000). In an already familiar pattern that is a crucial part of the neutral paradigm of the evolution of genomic complexity (Lynch, 2007), the TEs constitute an important source for recruitment (exaptation) of new regulatory and, possibly, even structural sequences (Jordan et al., 2003, Thornburg et al., 2006).
To what extent gene clustering in eukaryotes is affected by selection and what the targets of this potential selection are remain wide open questions. By itself, co-expression of adjacent genes cannot be considered evidence of selection because, when genes are located in the same chromatin domain, up- or down-regulation of one gene can accidentally cause a concordant change in the expression of the other owing to the effect of chromatin remodeling (Spellman and Rubin, 2002). Such co-expression does not necessarily confer any benefits on the organism and might not be subject to selection (Hurst et al., 2004). Clustering of genes that are directly functionally associated, such as those encoding enzymes of the same pathway (Lee and Sonnhammer, 2003), is hard to explain without invoking selection. However, the lack of significant evolutionary conservation of such clusters is surprising and suggests that either the selective pressure that leads to fixation and persistence of these clusters is quite weak, or the relative importance of clustering (and the ensuing co-regulation) changes rapidly in the course of evolution (or both).