The two modes of genome evolution: the transitional mode
It should be recalled that a transitional (or shifting) mode in genome evolution was originally indicated by the Gaussian analysis of buoyant density profiles of DNA (and DNA fractions) from cold- and warm-blooded vertebrates [
11]. In this mode, large changes occur in isochore patterns. More specifically, GC-rich isochores were found to be absent or scarce in the carp and xenopus genomes, respectively, compared to the genomes of human, mouse and chicken [
4], and similar differences were seen in orthologous genes [
12]. Interestingly, the transitional mode of genome evolution could also be observed at the cytogenetic level because the compositional heterogeneity of genomes is reflected in the chromosomal banding patterns (see ref. [
1] for a review). These findings indicate the existence of correlations linking compositional heterogeneity, chromatin structure and banding patterns. The transitional mode of evolution could now be checked on several vertebrate genomes at the sequence level with a much higher degree of precision.
The compositional differences of the genomes of human and xenopus were originally attributed to the different body temperature of warm- and cold-blooded vertebrates [
13]. This "thermodynamic stability hypothesis" accounted for the higher GC level of DNA and RNA. Moreover, it was noted that GC-rich codons preferentially encode amino acids that confer thermal stability to the corresponding proteins. This latter point was recently confirmed by showing [
9] that, out of 18,795 human genes, those located in GC-rich isochores have an increased level of GC-rich codons leading to higher levels of stabilizing amino acids (such as arginine and alanine) and lower levels of destabilizing amino acids (such as lysine, isoleucine and asparagine). As expected, the opposite was found in genes located in GC-poor isochores.
The thermodynamic stability hypothesis is now supported by several new findings: (i) the isochore patterns of anolis, xenopus and fishes (except for pufferfish; but see (iv) below) lack the GC-richest isochores present in the human pattern; (ii) the predominant GC-poor isochores of opossum might be related, at least in part, to the lower body temperature (32°C) of this marsupial; (iii) the small shift to the GC-rich side of the isochore distribution of chicken and the presence of a small GC-richest H4 isochore family might be related to the higher body temperature (41–43.5°C) [
14] of birds compared to mammals; (iv) the shift to the GC-rich side in the genome of tetraodon, a fish living in tropical freshwater, contrasts with the isochore pattern of fugu, a fish (from the same family) living at a lower temperature in the Pacific Ocean [
14] (see also Additional File
9 and Supplementary Figure S6 from ref. [
6]); (v) the isochore patterns of reptiles, a class of vertebrates known to be characterized by different body temperatures and different thermal regulations, cover a broad spectrum; indeed, genomes may either be even more compositionally homogeneous than the xenopus genome (e.g., the anolis genome; see Figure ), or show the presence of GC-rich isochores (as in the case of
Testudo graeca and
Crocodylus niloticus [
15]); the latter point was recently confirmed by comparing GC
3 (the GC level of third codon position) of orthologous genes from
Alligator mississippiensis, human and chicken [
16]; (vi) both mammals and birds, two classes of vertebrates derived at different times from different ancestral reptiles (therapsids about 220 Mya and dinosaurs about 150 Mya, respectively [
17]), showed the formation of the same families of GC-rich isochores (compare the human and the chicken patterns of Figures and ), a clear indication of a convergent compositional evolution; likewise, a convergent evolution may be the explanation for the similarity of GC
3 values of orthologous genes from alligator and chicken; indeed, there is no compelling reason to consider common descent from archosaurs as the explanation [
16], given the large phylogenetic distance [
18,
19], the complex endo-ectothermic evolution of crocodiles [
20], and the contrasting data on the cold- or warm-bloodedness of the immediate ancestors of birds, the dinosaurs; (vii) the excess of AT → GC over GC → AT changes observed in the genes of
Gillichthys seta, a fish living at 40°C, compared to the orthologous genes of
Gillichthys mirabilis, a congeneric fish living at 20°C [
21]; interestingly, the former was characterized by positive selection on some genes and by an expansion of a GC-rich minisatellite in gene-rich regions.
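As an illustration of the GC3 measure used in point (v) above (the GC level of third codon positions), the following is a minimal sketch rather than the procedure used in the cited studies. The function name `gc3` and the toy coding sequence are hypothetical; the sketch assumes an in-frame coding sequence, ignores ambiguous bases, and counts the stop codon unless it is removed beforehand.

```python
def gc3(cds: str) -> float:
    """Return the fraction of G or C among third codon positions.

    Assumes `cds` is an in-frame coding sequence (hypothetical input);
    a trailing incomplete codon and ambiguous bases (e.g. N) are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
    thirds = seq[2:usable:3]              # every third base, starting at position 3
    counted = [b for b in thirds if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous third-position bases")
    return sum(b in "GC" for b in counted) / len(counted)


# Toy example: codons ATG GCC AAG TAA have third bases G, C, G, A.
print(gc3("ATGGCCAAGTAA"))  # 0.75
```

Applied to sets of orthologous genes, values of this kind are what the comparisons between alligator, human and chicken rest on.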
The explanation of why only the gene-rich regions of the genome, and not the whole genome, underwent a GC increase was provided by the finding that those regions have an open chromatin structure ([
22], as also shown by accessibility to DNase I [
23], and to apoptotic and MNase degradation [
24]), whereas the gene-poor regions could be stabilized by their own compact chromatin. This point is supported by the finding that, when the body temperature change is very rapid, as in the case of the divergence of
G. seta from
G. mirabilis (<0.66–0.75 million years ago), the gene-rich regions of the genome are stabilized by the regional expansion of a very GC-rich minisatellite (see above).
Since the thermodynamic stability hypothesis is based on general physical-chemical properties, it would be expected to be valid very widely. This is, indeed, the case as shown by the correlation of GC levels of paired sequences (stems) of ribosomal 18S RNAs with body temperature for vertebrates ranging from mammals to polar fishes (differences being seen even between eutherians, 37°C body temperature, and both marsupials and monotremes, 32°C body temperature, [
25]) and by the correlation of GC levels and optimal growth temperatures of prokaryotes [
26].
While body temperature seems to be a major determinant of the compositional properties of the genome, other factors may also play a role. This is clearly indicated by the different isochore patterns of fishes. In this case not only temperature, but also other environmental factors such as salinity, oxygen level and pH may be involved. The compositional differences found between eutherians and monotremes, which have different body temperatures (37°C vs. 32°C), require further investigation to be understood. We know, however, that CpG and 5mC values of monotremes are intermediate between the low values of eutherians and the high values of fishes [[
1,
27] and present work] (see Figure ), as expected from their body temperature.
It should be stressed that, while the original observations pointed to a shifting mode of genome evolution in the case of the compositional transition between cold- and warm-blooded vertebrates, which is now confirmed on a sequence basis, the present results indicate the existence of a shifting mode even within cold- (e.g., fishes) and warm-blooded vertebrates (e.g., marsupials vs. eutherians).
The two modes of genome evolution: the conservative mode
The other, conservative, mode, in which the isochore patterns are maintained over evolutionary time, was found in eutherian genomes that displayed the "general compositional pattern" (e.g., human, chimp and dog genomes; as opposed to the mouse pattern; see below). Some differences in the relative amounts of isochore families were observed, but they were within narrow limits and appeared to be essentially due to differences in the relative amounts of interspersed sequences, as well as to insertions/deletions. Moreover, when isochores from MHC loci of human and mouse [
28], or from syntenic chromosome regions of human and dog were examined (see Figure from ref. [
2] for an example), a high degree of conservation was found. Incidentally, the narrower isochore pattern of mouse was interpreted as due to an increased mutation rate [
29,
30] and a poor repair mechanism [
31], two phenomena leading to some decrease of compositional heterogeneity.
The conservative mode was found in the present work to be further characterized by two remarkable properties, namely the conservation, in each isochore family of all vertebrate genomes investigated, of (i) the average isochore size (with some limitations; see below); and (ii) the GC levels and dinucleotide frequencies. The conservation of the average isochore size may be correlated with the role of isochores in chromosome organization. Indeed, it should be recalled here that the number of isochores estimated by us for the human genome, ~3200, is in agreement with the maximum number, 3000, of the highest resolution bands as assessed by Yunis et al. [
32] and that the boundaries of isochores coincide with those of chromosomal bands as obtained at the resolution of 850 bands (see Figure from ref. [
5]). Moreover, isochores have been observed to coincide with replication units [
33].
As far as the larger size of the GC-poorest isochore families of vertebrates is concerned, this may be due to the preferred insertion in these families of interspersed repeated sequences, as well as to sequence expansion phenomena [
1]. Unfortunately, the presence of gaps (in medaka) or their surprising absence (in stickleback) may also contribute a possibly important artefactual component to the large size of GC-poor isochores [
6]. This implies that more complete sequence data will be needed in order to obtain reliable assessments of the GC-poorest isochore size of medaka, stickleback (and also of zebrafish, opossum and chimpanzee).
The conservation of GC level and dinucleotide frequencies of isochore families can be understood by recalling that these frequencies were consistently different in the different isochore families from the human genome [
9]. Such differences are likely to influence protein/DNA interactions and, therefore, chromatin structure, possibly through nucleosome positioning [
34]. In turn, the existence of five isochore families suggested that a discrete number of chromatin structures are present in eutherian mammals. The different DNase accessibility of chromatin corresponding to isochores from different families [
23,
24] may be viewed as an indication along this line.
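The two quantities discussed above, GC level and dinucleotide frequencies, can be computed for any sequence segment along the following lines. This is a minimal sketch under stated assumptions, not the pipeline used in the present work: the function names `gc_level` and `dinucleotide_frequencies` and the toy sequence are hypothetical, overlapping dinucleotides are counted on one strand only, and positions containing ambiguous bases are skipped.

```python
from collections import Counter


def gc_level(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def dinucleotide_frequencies(seq: str) -> dict:
    """Relative frequencies of overlapping dinucleotides on one strand."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in "ACGT" and pair[1] in "ACGT":  # skip pairs with N etc.
            counts[pair] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Toy example: ACGCGT contains the pairs AC, CG, GC, CG, GT.
freqs = dinucleotide_frequencies("ACGCGT")
print(freqs["CG"])         # 0.4
print(gc_level("ACGCGT"))  # 4 of 6 bases are G or C
```

Comparing such frequency profiles between segments assigned to the same isochore family in different genomes is one way to express the conservation described here.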
The conservative mode of evolution was originally explained by "negative selection acting at a regional (isochore) level to eliminate any strong deviation from the presumably functionally optimal composition of isochores" [
35]. A number of findings, accumulated during the past twenty years [
1,
2] and those presented in this paper, support this hypothesis.
An alternative proposal for the formation and maintenance of isochores was that "biased gene conversion (BGC) is probably the most likely cause of isochores" [
36]. This proposal has found a large number of supporters (see for example refs. [
37,
38]). While nobody disputes the existence and the importance of BGC, the link with the formation and maintenance of isochores has been the object of a debate. Indeed, there are some major problems with such a link. The first problem is that the randomness of a neutral process such as BGC and its changes in evolutionary time would lead to a tremendous variability of compositional patterns in vertebrate genomes. One would not expect, for instance, the conservation of isochore patterns in eutherian orders that diverged about one hundred million years ago and have changed about half of the nucleotides that form their genomes [
2]. The second problem is that entire vertebrate classes, orders and families (such as the class of amphibians, the vast majority of fish orders and a number of reptilian families) do not show the formation of GC-rich isochores and just show a conservative mode of evolution. The third problem is the lack of evidence, or even of models and hypotheses, concerning the expansion process from the rare, small-size BGC events (in the hundreds of bp scale) [
39] to megabase regions.
In other words, if isochores originated from BGC events, one should not expect the conservation of GC levels, sizes and (at least in eutherians and chicken) of the relative amounts of isochore families, nor the very high similarity of GC and GC3 levels in orthologous genes from eutherians and birds. Instead, one should see differences in compositional patterns, and such differences should concern individual classes, orders and families of vertebrates.