Non-coding DNA makes up a large proportion of the genomes of most eukaryotes, yet little is known about its functional significance and the forces affecting its evolution. The identification of functional regions of the genome has tended to concentrate on coding DNA, but the recent shift in focus towards non-coding DNA has revealed that introns and intergenic sequences may be subject to considerable levels of selective constraint, implying that they contain functional elements [1]. No consistent patterns have emerged from the relatively few studies that have so far investigated levels of constraint on intronic DNA; some conclude that such DNA is evolving under little or no selective constraint, while others find considerable levels of constraint (for examples, see [3]). Moreover, the mode of evolution of these sequences is still unclear.
Several recent studies have attempted to estimate the proportion of intronic sites that are subject to selective constraint. For example, Jareborg et al. estimate that 23% of intronic sites are evolutionarily conserved in mouse-rat genome comparisons. Similarly, Shabalina and Kondrashov [12] estimate (conservatively) that 17% of nucleotide sites within introns are selectively constrained between Caenorhabditis elegans and Caenorhabditis briggsae; at least part of this constraint reflects the role of these sites in splicing, because constraint appeared to be higher at the edges of introns. Likewise, Bergman and Kreitman [3] estimate that 22-26% of non-coding sequence (intergenic and intronic) is highly constrained between Drosophila melanogaster and Drosophila virilis. In contrast to these studies, Halligan et al. found that most intronic sites in Drosophila (excluding those necessary for correct splicing) were evolving approximately 17% faster than fourfold synonymous sites, and concluded that these sites were effectively evolving free from selective constraint. The discrepancies among these studies suggest that no clear conclusions can yet be drawn about the level of selective constraint in non-coding intronic DNA.
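One common way to turn relative substitution rates into a constraint estimate is to compare a test class of sites with a reference class assumed to evolve neutrally (such as fourfold synonymous sites), using C = 1 - d_test / d_neutral. The sketch below assumes that estimator; we do not claim that each study cited above used it, and the function name `constraint` and the divergence values are hypothetical. It shows why sites evolving about 17% faster than the reference give a negative estimate, which is read as no detectable constraint.

```python
def constraint(d_test: float, d_neutral: float) -> float:
    """Estimate the fraction of constrained sites as 1 - d_test / d_neutral.

    Assumes the reference class (e.g. fourfold synonymous sites) evolves
    neutrally, so any deficit of substitutions in the test class is
    attributed to purifying selection.
    """
    if d_neutral <= 0:
        raise ValueError("reference divergence must be positive")
    return 1.0 - d_test / d_neutral


# Hypothetical divergences per site (illustrative values only).
d_fourfold = 0.10               # putatively neutral reference class
d_intron = 1.17 * d_fourfold    # intronic sites evolving ~17% faster

c = constraint(d_intron, d_fourfold)
print(f"estimated constraint: {c:.2f}")
```

A value at or below zero, as here, corresponds to sites "effectively free from selective constraint" under this estimator; with real data, sampling error in both divergence estimates would need to be taken into account before drawing that conclusion.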
Intron size is one factor that may explain these conflicting results. Comeron and Kreitman [13] and others have noted an asymmetrical distribution of intron lengths in D. melanogaster: a large number of short introns clustered around a minimal intron length, and a broader distribution of longer introns (median intron length 86 base pairs (bp), mean intron length 1411 bp [14]). Using multi-species data for 15 introns (13 short and 2 long), Parsch [15] showed that the two longer introns had significantly fewer substitutions per site. He suggested that this pattern may arise because longer introns contain more regulatory elements subject to purifying selection.
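To make the asymmetry concrete, the sketch below computes the median and mean of a small set of toy intron lengths. The values are hypothetical, chosen only to mimic the shape described above (many introns near a minimal length plus a long tail); they are not data from [14]. A handful of very long introns pulls the mean far above the median, which is why a right-skewed length distribution can have a mean (1411 bp) so much larger than its median (86 bp).

```python
import statistics

# Hypothetical intron lengths in bp (NOT real data): most sit near a minimal
# length, while a few very long introns form a right-hand tail.
toy_lengths = [61, 63, 65, 68, 86, 110, 950, 2400, 6500]

median_len = statistics.median(toy_lengths)
mean_len = statistics.mean(toy_lengths)

# The long tail pulls the mean far above the median.
print(f"median = {median_len} bp, mean = {mean_len:.1f} bp")
```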
If regulatory elements occur frequently in introns and have some minimal size, it follows that intron size may be an important factor in intron evolution: an intron close to the minimal length has little room to harbour such elements, whereas a long intron can accommodate many. In agreement with this prediction, Marais et al. noted a marginally significant (P = 0.03) negative correlation between intron divergence and length for first introns (but not for other introns) in the dataset of Halligan et al. Marais et al. suggested that such a correlation may be expected for first introns because they are on average twice as long as other introns [17] and also tend to contain more known regulatory elements, at least in mammals [8]. Because that dataset consisted mostly of short introns, it is unclear whether the pattern is specific to first introns (owing to an association between first introns and regulatory elements) or whether the relationship between divergence and length arises primarily because first introns are longer. Here we revisit the relationship between intron length and evolutionary constraint (as measured by levels of divergence between D. melanogaster and D. simulans) by combining published data for 225 intron fragments sampled from a much broader distribution of intron lengths and positions within genes.
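A correlation between intron length and divergence of the kind discussed above can be assessed with a rank-based statistic such as Spearman's rho, which does not assume a linear relationship and is insensitive to the extreme lengths in the long tail. The sketch below implements Spearman's rho from scratch as the Pearson correlation of tie-averaged ranks; we do not assume this is the test Marais et al. used, and the lengths and divergences are hypothetical. It returns only the coefficient; `scipy.stats.spearmanr` additionally reports a P value.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical intron lengths (bp) and per-site divergences, illustrative only.
lengths = [60, 75, 90, 400, 1200, 3000]
divergence = [0.14, 0.15, 0.12, 0.10, 0.09, 0.07]

rho = spearman(lengths, divergence)
print(f"Spearman rho = {rho:.3f}")  # negative: longer introns diverge less here
```

Ranking first means that a single 3000 bp intron carries no more weight than one of 400 bp, which suits length data as skewed as that described for D. melanogaster.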