|Home | About | Journals | Submit | Contact Us | Français|
Previous investigations from our laboratory were largely focused on the genome organization of vertebrates. We showed that these genomes are mosaics of isochores, megabase-size DNA sequences that are fairly homogeneous in base composition yet belong to a small number of families that cover a wide compositional spectrum. A question raised by these results concerned how far back in evolution an isochore organization of the eukaryotic genome arose.
The present investigation deals with the compositional patterns of the invertebrates for which full genome sequences, or at least scaffolds, are available. We found that (i) a mosaic of isochores is the long-range organization of all the genomes that we investigated; (ii) the isochore families from the invertebrate genomes matched the corresponding families of vertebrates in GC levels; (iii) the relative amounts of isochore families were remarkably different for different genomes, except for those from phylogenetically close species, such as the Drosophilids.
This work demonstrates not only that an isochore organization is present in all metazoan genomes analyzed that included Nematodes, Arthropods among Protostomia, Echinoderms and Chordates among Deuterostomia, but also that the isochore families of invertebrates share GC levels with the corresponding families of vertebrates.
Recent investigations from our laboratory  showed that the isochore families of all vertebrate genomes explored are essentially conserved in GC levels (as well as in dinucleotide patterns) and in isochore sizes (with some exceptions in this case). Moreover, at least in eutherian mammals and birds, even the relative amounts of isochore families (i.e. the compositional patterns) are largely conserved. It is well established by our previous work that a number of very basic genome properties, such as the distribution of genes and interspersed repeats, DNA methylation, gene expression, replication timing and recombination, are different in GC-poor and GC-rich isochore families (see [2,3], for reviews). These results obviously support the idea of isochores being a "fundamental level of genome organization" , at least in vertebrates. We also suggested that the conservation of GC levels of isochore families may underlie chromatin structures , and that the conservation of isochore size may be associated with their role in the structure and replication of chromosomes .
The above points raise the question as to how general the isochore structure is in the genomes of metazoa. This is a pertinent question because evidence for a compositional heterogeneity was already obtained in our previous work on the genomes of trypanosomes, plasmodium and drosophila [see ref. ]
Here we approached this problem by investigating the compositional compartmentalization of genomes from invertebrates in all cases in which either full sequences or at least scaffolds were available. The case of Apis mellifera will be presented elsewhere because the presence in this genome of a very GC-poor family (in addition to the other isochore families), which was absent from other insects, required a special analysis. Expectedly, the present analysis broadens our view of the structure and evolution of isochores. In fact, it demonstrates that an isochore organization endowed with a number of shared features is found in the genomes of all metazoans explored in the present work.
Figure Figure1,1, which displays a simplified tree of life (derived from ) of metazoans, in order to outline the phylogenetic position of the genomes investigated. Figure Figure11 is shown to indicate the phylogenetic positions of invertebrate genomes explored. These comprised Nematodes, Arthropods, Echinoderma and Chordata. As shown in Additional File 1, the sizes of the genomes explored covered a 2.5-fold range, 95 to 240 Mb which, added to the 8-fold range of vertebrates, 400 to 3200 Mb, expand the overall range of investigated genome sizes up to a 33-fold value.
Table Table1A1A presents the relative amounts of the isochore families of all invertebrates investigated (along with those of human and three fishes that are shown for the sake of comparison). The salient features that appear in the isochore patterns (Figures (Figures22 and and3)3) will now be outlined. The isochore families of Ciona intestinalis, a Urochordate, which is the closest ancestral species to vertebrates  are presented in Figure Figure22 (along with those of three representative vertebrates, human, chicken and zebrafish) in order to show the existence of a major L1 and a minor L2 isochore family. The genome of the nematode Caenorhabditis elegans (Figure (Figure2)2) also displays a very GC-poor genome, consisting essentially of L1 isochores with only a very small amount (less than 10%) of L2 isochores.
Among insects (Figure (Figure3),3), the three Drosophila species studied and Anopheles gambiae display three isochore families: a minor L2 family, a predominant H1 family and a H2 family which is barely represented in Drosophila, but is rather abundant in Anopheles. A very minor (about 1%) L1 component appears to correspond to chromosome 4 in Drosophilids and is barely represented (0.5%) in A. gambiae. The isochore families of insects are shown along those of a fish, stickleback.
The compositional patterns of scaffolds or contigs from Branchiostoma floridae, Strongylocentrotus purpuratus, Aedes aegypti, Tribolium castaneum and Daphnia pulex are shown in Figure Figure4.4. Compositional distribution were generally narrow, covering a range of about 5% GC, with the exception of T. castaneum in which case the range was about 10% GC, the center of distribution being lower (33%) than in the other cases.
The average GC levels of the isochore families from the invertebrates investigated are very close to each other and to the corresponding values of vertebrates (see Table Table1B1B).
As far as dinucleotide patterns are concerned, C. intestinalis showed a remarkable observed/expected (O/E) pattern in that complementary dinucleotides belonged in two classes, higher and lower than statistical expectation, respectively (see Figure Figure5).5). Indeed, AA/TT, AC/GT, CA/TG, CC/GG were in the more frequent class, whereas AG/CT, GA/TC were in the less frequent class. The dinucleotides from all these pairs showed the same frequency. In contrast, TA and CG were less frequent than their complementary dinucleotides AT and GC. Moreover, AT/TA belonged to the less frequent class whereas CG/GC were close to statistical expectation.
Some features of the C. intestinalis, a Urochordate, were also found in C. elegans, a Nematode. Indeed in C. elegans AA/TT, CC/GG and CA/TG were also high, whereas AG/CT were also low, but, in contrast, GA/TC were low and AC/GT were high. Trinucleotides frequencies, showed the expected similarities and differences (see Additional File 2).
AA/TT, CC/GC and CA/TG were also high in the insects (and in vertebrates; see ref. ; for sake of comparison with human see also Additional File 3), whereas AG/CT was low in insects (but not in vertebrates), and AT/TA were low (TA especially) in all cases (see Figure Figure66).
Very interestingly, CG was remarkably low in S. purpuratus and B. floridae. The 0.6 values attained are, however, still much higher than the 0.2 values reached by mammals (see Figure Figure5B5B).
The average size from the most represented families of isochores (see Table Table1C)1C) is about 0.5 Mb (megabases) for C. intestinalis, but higher values were found for C. elegans (3.4 Mb) for Drosophilids (1.6-2 Mb) and A. gambiae (1.3 Mb). The GC-poorest isochore families comprise two size groups, a large one and a small one, as in the case of vertebrates [9-11]. In particular, C. elegans showed a number of extremely long GC-poor stretches (see Discussion). In contrast, the GC-richest isochores are characterized by one size group, the small one (see Additional File 4).
Figure Figure77 displays the gene densities of the isochore families from D. melanogaster and A. gambiae along with those of two vertebrates, human and stickleback, that are presented for the sake of comparison. In both cases, gene density increased with increasing GC of isochore families. Unfortunately, the data for C. intestinalis and C. elegans did not permit to establish a reliable ratio between gene densities in the L1 and L2 families because of the small amounts of DNA and number of genes of the minority family. In the case of the genomes only represented by scaffolds, genes were localized on the scaffolds and gene densities were shown to follow the general trend (see Figure Figure44).
C. intestinalis and C. elegans essentially show a very predominant L1 isochore family and a minor L2 family, a compositional pattern mimicking that of zebrafish. The case of C. intestinalis is of interest because a previous investigation by analytical ultracentrifugation in a CsCl density gradient had shown a remarkable homogeneity at an average molecular weight of 100 kb . The apparent discrepancy can, however, be explained by the fact that the CsCl investigation dealt with random fragments, whereas the present one with 100 kb unique segments. The latter show an average standard deviation of 1.3% GC (Cammarano R., Ph.D. Thesis) a value very slightly above the average standard deviation of 100 kb segments from human isochores from the L1 and L2 families. The dinucleotide patterns (observed/expected) present some significant differences (e.g. TA being much lower than AT in C. elegans, but not in C. intestinalis), which are also expectedly shown by trinucleotides.
Among insects, the Drosophilids exhibit similar isochore patterns that are intermediate between those of medaka and stickleback with a major H1 family and two minor L2 and H1 families. Another point of interest is the close similarity of GC levels of the isochore families as assessed on Drosophilids and Anopheles (see Table Table1B).1B). The compositional pattern of A. gambiae, although being mainly represented by H1 isochores, shows a substantial amount of H2 isochores. In fact, the GC-richer isochores of A. gambiae have probably been underestimated because of the presence of a large number of 100-kb GC-rich stretches that were pooled with the flanking regions because of the procedure used in assessing isochores (see below).
In the other cases, certainly a number of factors played a role and led to different compositional patterns. Indeed, we already noted that while the large evolutionary changes in isochore patterns occurring between mammals/birds and amphibians/fishes mainly depend upon body temperature, definitely other factors play a role as well in the case of fishes.
The average sizes exhibited by different isochore families of invertebrates showed a greater variability compared to those of the corresponding families from vertebrates . This may be due, however, to artefactual reasons, such as gaps, but also to the experimental approach used. Indeed, in the human genome , isochores were taken to be at least 200 kb in size, a condition linked to the need of assessing standard deviations of the 100 Kb segments used in the analysis. Expectedly, this occasionally led to standard deviation higher than 3 % GC within a given isochore. Since however this only concerned ~7% of the human genome such "transition isochores" were accepted.
In the case of C. elegans long stretches of DNA very low in GC and belonging to L1 isochores were present and interspersed short L2 isochores were neglected. If one accepts in this case isochores reaching a low size value of 100 Kb, the very long L1 structures are resolved into shorter stretches and the high size values are brought back to the 0.5-1 Mb range. For instance, in the case of C. elegans the large size (3.37 Mb) estimated according to the criteria of Costantini et al. , is reduced to 1.00 Mb, if 100 Kb segments belonging to the GC range of L2 isochores (and averaging 0.23 Mb in size) are considered separately. This considerations also applies to the large size of major families of insects as indicated by thin 100-kb lines appearing in the GC profiles.
The major conclusions of the present investigations are the following: (i) an isochore structure appears to be general for all metazoans explored; this raises the question whether, in fact, all eukaryotic genomes are characterized by an isochore structure; current work on plants and unicellular eukaryotes should clarify this point; (ii) the isochore families are generally characterized by GC levels that were identical or very close to those of vertebrates; (iii) differences in dinucleotide patterns (observed/expected values) were found among invertebrates, as well as between invertebrates and vertebrates; in the latter case, the most salient feature was the CpG shortage which is due to the methylation of C in CpG followed by its deamination to T; this feature was also found in S. purpuratus and B. floridae even if at a lower extent compared to mammals; (iv) the average size of isochore shows a certain variability, which is apparently due at least to a large extent to artefactual reasons, as discussed in the preceding section; (v) no correlation was found between isochore size and genome size in spite of the very large genome size range explored so far; this practical independence of isochore size on genome size stresses their possible correlation with the structure and replication of chromosomes, as suggested by Costantini and Bernardi ; (vi) the relative amounts of isochore families are different in different genomes, a situation due in our opinion to the different environmental factors that play a role in determining compositional patterns of genome (for example, if Anopheles has a higher body temperature than Drodophilids it could explain the higher amounts of GC-rich isochores in A. gambiae); (vii) gene concentration increases with increasing GC of isochore families, as previously found for vertebrates.
These conclusions are in keeping with some previous suggestions : that (i) the high similarity of GC levels of isochore families may be due to their composition being linked to chromatin structure; (ii) the increasing variability in isochore patterns from warm- to cold-blooded vertebrates and to invertebrates may be correlated with the increasing dependence from environmental factors that affect genome organization and functions; (iii) the distribution of genes seems to be dictated by the need of a certain genomic context, whose composition influences the transcriptional activity, and also the structure and function of the encoded proteins.
The sequences of the eukaryotic genomes as well as of the genes analyzed in this study were downloaded from different websites (see Additional File 5). Partial, putative, synthetic construct, predicted, not experimental, hypothetical protein, r-RNA, t-RNA, ribosomal and mitochondrial genes were eliminated and then the cleanup program  was applied for ridding nucleotide sequence databases of redundancies. For the remaining genes a script implemented by us was used in order to identify the coding sequences beginning with a start codon and ending with a stop codon. The coordinates of genes on the chromosomes were retrieved from the website used for downloading the chromosomes.
The entire chromosomal sequences of the finished genome assembly were partitioned into non-overlapping 100-kb windows, and their GC levels were calculated using the program draw_chromosome_gc.pl [14,15] (http://genomat.img.cas.cz; see Additional File 6).
The methodology used for isochore mapping was described by Costantini et al. . Briefly, isochores are defined by two sets of data, GC levels and their standard deviations. Indeed, when genome sequences are scanned using four window sizes ranging from 12.5 to 100 kb, the final choice of the window is determined by the plateau values reached by the standard deviations . Compositional jumps significantly larger than the standard deviations of isochore GC (1-2%) separate one isochore from the contiguous ones (see Additional File 7 for the coordinates, sizes, GC levels and GC standard deviations of the isochores identified in the analyzed genomes). When isochores are put in bins of 0.5 % or 1% GC, families appear as well defined peaks. In the case of the human genome, five isochore families were found which ranged in GC from 34% to 58% and decreased in relative amounts with increasing GC. These families were also detected by using four different approaches that are based, however, on our boundaries of isochore families [1,16].
In the case of Branchiostoma floridae, Strongylocentrotus purpuratus, Aedes aegypti, Tribolium castaneum and Daphnia pulex, the genomes have not yet been assembled and only the sequences of scaffolds or contigs (3032 scaffolds, 5477 scaffolds, 4758 supercontigs, 176 contigs, 5192 scaffolds, respectively) are available.
As far as the name of each isochore is concerned, we used here, as in previous work, a convention in which the first number in the name represents the chromosome number, the following two letters are the initials of the scientific (latin) name of the Eukaryotes under consideration, and the last number identifies the isochore.
Gb: (gigabases); GC: (molar fraction of guanine and cytosine in DNA); kb: (kilobases); Mb: (megabases).
The authors declare that they have no competing interests.
RC- Downloaded and analyzed the genome and the gene sequences of the Eukaryotes under analysis, performed the analysis on the dinucleotide frequencies and analyzed the data. MC- Conceived the research, identified and mapped the isochores on the analyzed chromosomes and drafted the manuscript. GB- Designed the research and wrote the paper. All contributed to the writing of the manuscript, and all read and approved the final manuscript.
Genome sizes. The figure shows the genome sizes of the invertebrates investigated.
Observed/expected frequencies for trinucleotides in C. intestinalis and C. elegans. Observed/expected frequencies for trinucleotides in 100-kb DNA segments in the isochore families from C. intestinalis and C. elegans.
Observed/expected frequencies for dinucleotides in human. Observed/expected frequencies for dinucleotides in 100-kb DNA segments in the human isochore families.
Isochore sizes. Size distributions of the isochores in multicellular eukaryotes. A vertical line at 3 Mb is reported as a reference.
Website and properties for genome and gene sequences. Genome websites, number of chromosomes, average GC, window sizes at which isochore borders are defined, gene websites and number of genes used to calculate gene density are reported for multicellular eukaryotes.
Overview of multicellular eukaryotic chromosomes. Compositional overview of multicellular eukaryotes The color-coded map shows 100 kb moving window plots using the program draw_chromosome_gc.pl [14,15]http://genomat.img.cas.cz. The color code spans the spectrum of GC levels in six steps, indicated by broken horizontal lines, from ultramarine blue (GC-poorest L1 isochores) to red (GC-richest H3 isochores).
Isochores in invertebrate genomes under analysis. Coordinates, sizes, GC levels and GC standard deviations of the isochores identified in the invertebrate genomes under analysis.
We thank Fabio Auletta and Giuseppe Torelli for their bioinformatic support.
The funding that has supported the work came from the authors Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.