The major functions of exons, namely carrying the sequence of mature mRNA and encoding proteins, were discovered around 40 years ago [1]. During the last decade there has been a breakthrough in understanding the function of introns [2]. Intron sequences were once considered to be junk DNA [5]; however, it has recently been recognized that some of them may be functional [6]. Introns may harbor a variety of elements that regulate gene expression, e.g., untranslated RNAs [8] and splicing control elements [9]. Because of these functional properties, at least a fraction of intronic regions is likely to be evolving under the influence of natural selection, mostly purifying selection [7]. In addition, structural features of first exons and introns, such as their length and GC content, are likely to be associated with functional elements in the large genomes of complex organisms [10]. The variety of functional elements in introns has also been shown to be associated with the function of adjacent exons [11]. Therefore, the evolution of the exon-intron structure of eukaryotic genes has become an emerging topic.
Recent studies have produced data that shed light on patterns of intron properties, e.g., variation in length, GC content, ordinal position within a gene (first intron, second intron, and so on) and sequence divergence. Various factors have been shown to influence intron size [6]. For example, the insertion of transposable elements alters intron size [13]. Similarly, the frequency and size of deletion events lead to changes in intron size [14], and the presence of regulatory elements and RNA genes influences intron length [15]. Alternative splicing can also change intron and exon size [16]. Consequently, the factors controlling gene expression and regulation impose a selective constraint on intron size [17]. Correlations among intron divergence, intron ordinal position and intron length have been reported, suggesting that intron structure may also be under selection [7]. However, the relationship between intron length and GC content appears to be complicated. Gazave et al. [7] showed strong negative correlations among intron length, GC content and divergence in primates, whereas Haddrill et al. [18] found that long introns had higher GC content and lower divergence than short introns in fruit fly.
Unlike for introns, little is known about patterns of exon properties or about variation in exon-intron architecture. Only a few studies have included basic statistical analyses, such as the distribution of exon length and the average number of exons per gene in eukaryotic model organisms [19], and the chromosomal distribution of exons [20]. A systematic investigation of the properties of both exons and introns would therefore provide a framework for understanding the mechanisms that determine exon-intron architecture. The availability of multiple complete eukaryotic genome sequences makes it possible to examine many fundamental evolutionary questions on a genome scale. Here, we performed an extensive analysis of the relationships among length, ordinal position, GC content and divergence of both introns and exons in 13 eukaryotic genomes: six mammals, two plant species, two fish species, chicken, fruit fly and worm. We selected these genomes because they cover a wide range of eukaryotic species.
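To make the per-feature quantities concrete, the following is a minimal sketch, not the authors' pipeline, of how length, ordinal position and GC content might be tabulated for the introns of one gene and how a rank correlation between length and GC content could be computed. It assumes introns are supplied as plain DNA strings in 5' to 3' order and that a Spearman rank correlation is an appropriate measure; the function names (`gc_content`, `intron_table`, `spearman`) and the toy sequences are hypothetical and are not data from this study. Divergence is omitted because it requires aligned orthologous sequences.

```python
from typing import List, Sequence, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def intron_table(introns: List[str]) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: (ordinal position, length, GC content) per intron.

    Assumes the introns of a single gene are given in 5' -> 3' order,
    so the first element is the first intron.
    """
    return [(pos, len(s), gc_content(s)) for pos, s in enumerate(introns, start=1)]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Toy, made-up introns chosen so that longer introns are AT-richer.
    toy_introns = ["GCGCGC", "GCGCATAT", "GCATATATAT", "ATATATATATAT"]
    table = intron_table(toy_introns)
    lengths = [length for _, length, _ in table]
    gcs = [gc for _, _, gc in table]
    print(table)
    print(spearman(lengths, gcs))
```

In a genome-scale analysis the same tabulation would be repeated over all genes and the correlation computed over the pooled introns (or exons); a library routine such as SciPy's `spearmanr` could replace the hand-written rank correlation.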
Our data revealed three consistent patterns that are present in almost all of the genomes we analyzed. Elucidating these common patterns provides a basis for understanding the factors responsible for the organization of eukaryotic genomes and for describing exon-intron architecture.