An important feature of the density functions based on the null hypothesis is a prominent peak of very short intergenic distances predicted for each of the four genomes. Thus, very short intergenic distances are expected to dominate density profiles of observed intergenic distances even if gene locations were distributed uniformly along chromosomes. This fact is exemplified by the B. distachyon and rice genomes, in which short intergenic distances dominate the density profiles, yet genes are not clustered in those genomes. Observation of a prominent peak of short distances in an empirical density profile, such as that observed by Choulet et al., is not in itself sufficient to draw a conclusion regarding the existence of gene clustering.
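The expectation of a short-distance peak under uniform gene placement can be illustrated with a small simulation. The sketch below is not the analysis used in this study; it assumes point-like genes, and the gene count and chromosome length are hypothetical values chosen only for illustration. Under uniform placement the gaps between neighbors are approximately exponentially distributed, so about 63% of them fall below the mean gap and only about 14% exceed twice the mean.

```python
import math
import random

def intergenic_distances(n_genes, chrom_length, seed=0):
    """Place point-like genes uniformly at random and return neighbor gaps."""
    rng = random.Random(seed)
    positions = sorted(rng.uniform(0, chrom_length) for _ in range(n_genes))
    return [b - a for a, b in zip(positions, positions[1:])]

def fraction_below(distances, cutoff):
    """Fraction of distances strictly smaller than cutoff."""
    return sum(d < cutoff for d in distances) / len(distances)

# Hypothetical values: 5,000 genes on a 100 Mb chromosome (not from the study).
gaps = intergenic_distances(5000, 100_000_000)
mean_gap = sum(gaps) / len(gaps)

frac_short = fraction_below(gaps, mean_gap)
frac_long = 1 - fraction_below(gaps, 2 * mean_gap)

# Exponential expectations: P(d < mean) = 1 - e^-1, P(d > 2 * mean) = e^-2.
print(f"below mean gap: {frac_short:.3f} (exponential: {1 - math.exp(-1):.3f})")
print(f"above twice the mean gap: {frac_long:.3f} (exponential: {math.exp(-2):.3f})")
```

Short gaps dominate in this simulation although no clustering was built into it, which is the point made in the paragraph above.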
Comparisons of the density functions based on the null hypothesis with the nonparametric density functions estimated from observed distances between neighboring genes divided the four genomes into two contrasting pairs. The first pair consisted of the small B. distachyon and rice genomes. In those genomes, the nonparametric and exponential density functions did not differ significantly from each other. The second pair consisted of the larger sorghum and Ae. tauschii genomes, for which the two density functions differed significantly.
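One way to quantify the departure of observed distances from the exponential null model is sketched below. This is an illustration only: it uses a Kolmogorov-Smirnov-type statistic on the empirical cumulative distribution, which is an assumption made here and not necessarily the comparison used in this study. Both data sets are simulated, and the mixture parameters are hypothetical.

```python
import math
import random

def ks_statistic_exponential(distances):
    """Largest gap between the empirical CDF and the CDF of an exponential
    distribution whose mean equals the sample mean."""
    xs = sorted(distances)
    n = len(xs)
    mean = sum(xs) / n
    d_max = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - math.exp(-x / mean)
        # Compare the model with the empirical CDF just after and just before x.
        d_max = max(d_max, abs((i + 1) / n - model), abs(i / n - model))
    return d_max

rng = random.Random(1)

# Simulated "uniform" genome: exponential gaps with a 10 kb mean (hypothetical).
uniform_like = [rng.expovariate(1 / 10_000) for _ in range(2000)]

# Simulated "insular" genome: 70% short gaps within clusters, 30% long gaps
# between clusters (hypothetical mixture).
clustered = [
    rng.expovariate(1 / 2_000) if rng.random() < 0.7 else rng.expovariate(1 / 200_000)
    for _ in range(2000)
]

d_uniform = ks_statistic_exponential(uniform_like)
d_clustered = ks_statistic_exponential(clustered)
print(f"uniform-like: D = {d_uniform:.3f}; clustered: D = {d_clustered:.3f}")
```

A small statistic corresponds to the situation described for B. distachyon and rice, a large one to that described for sorghum and Ae. tauschii. Because the exponential mean is estimated from the same sample, standard Kolmogorov-Smirnov critical values would not apply directly.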
Variation in genome size in grasses is principally caused by the accumulation or loss of TEs. Genome expansion takes place principally by the accumulation of LTR retroelements, which have the tendency to insert into LTR retroelements resident in the chromosomes. Genome contraction takes place principally by DNA deletion caused by unequal homologous recombination between LTRs of the same or related LTR retrotransposons and by illegitimate recombination within TEs and other non-essential DNA. Most of the observations that emerged here can be attributed to TE dynamics and the nonrandom insertion of LTR retroelements. In small, gene-dense genomes, such as those of B. distachyon and rice, the overall retroelement content is lower (21.4% in B. distachyon and 26% in rice) than in the larger genomes (54% in sorghum and at least 51% in Ae. tauschii). Genes are distributed uniformly and there is very little gene clustering in the small genomes. In the nine studied regions, LTR retrotransposons or remnants of LTR retrotransposons were present in only 12% of rice gene pairs (Table S2) compared to 54% of Ae. tauschii gene pairs (Table S1). When comparing only orthologous gene pairs, 12% of the rice gene pairs but 61% of the Ae. tauschii gene pairs were separated by sequences with homology to LTR retrotransposons (Table S6). All of the Ae. tauschii inter-insular spaces contained retrotransposons. This is consistent with the hypothesis that expansion of grass genomes, which takes place predominantly by the accumulation of LTR retroelements, occurs principally in regions already containing LTR retroelements. As a result of this nested insertion of retroelements, genes that are separated by LTR retroelements will be pushed further apart from each other during genome expansion, which will create the large arrays of LTR retroelements characteristic of inter-insular space. Genes that are not separated by LTR retroelements will tend to remain near each other and form insulae. Due to these LTR retroelement dynamics, gene distribution is largely homogeneous in small grass genomes but assumes insular organization as a consequence of genome expansion. As illustrated by the comparison of orthologous regions in the sorghum and Ae. tauschii genomes, insulae become less gene dense and are separated by greater spans of nested TEs as a genome expands.
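The nested-insertion argument can be made concrete with a toy model. The sketch below is a deliberate simplification and not a model fitted in this study: new elements land in an intergenic gap with probability proportional to the TE sequence already present there, deletions are ignored, and all sizes and counts (5 kb gaps, 8 kb elements, 500 insertions) are hypothetical.

```python
import random

def expand_genome(gaps, te_content, n_insertions, element_len=8_000, seed=0):
    """Insert elements into gaps with probability proportional to the TE
    sequence already present in each gap (nested insertion). Gaps without
    TE sequence never receive an insertion in this toy model."""
    rng = random.Random(seed)
    gaps = list(gaps)
    te = list(te_content)
    for _ in range(n_insertions):
        i = rng.choices(range(len(gaps)), weights=te)[0]
        gaps[i] += element_len
        te[i] += element_len
    return gaps

# Hypothetical starting genome: 100 intergenic gaps with 5 kb of non-TE
# sequence each; every eighth gap also carries one resident 8 kb element.
n_gaps = 100
start_te = [8_000 if i % 8 == 0 else 0 for i in range(n_gaps)]
start_gaps = [5_000 + t for t in start_te]

final_gaps = expand_genome(start_gaps, start_te, n_insertions=500)

te_free = [g for g, t in zip(final_gaps, start_te) if t == 0]
te_seeded = [g for g, t in zip(final_gaps, start_te) if t > 0]
print(f"TE-free gaps: mean {sum(te_free) / len(te_free):,.0f} bp")
print(f"TE-seeded gaps: mean {sum(te_seeded) / len(te_seeded):,.0f} bp")
```

In this simplified setting the TE-free gaps keep their original length while the seeded gaps absorb the entire expansion, which mirrors the separation into insulae and inter-insular space described above.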
The distal chromosome regions are typically more gene-dense than the proximal chromosome regions in Triticeae chromosomes. Regression analysis showed that gene number per insula and gene density within an insula were similar along Ae. tauschii chromosomes but that inter-insular distances were shorter in distal, high-recombination regions than in proximal, low-recombination regions. Therefore, the increase in gene density toward the distal regions of Ae. tauschii chromosomes is largely due to shortening of inter-insular distances.
Insulae in the distal, gene-rich regions of wheat chromosome 3B were reported to contain larger numbers of genes than insulae in the proximal, gene-poor chromosome regions. This was not observed in the Ae. tauschii genome. The nearly two-fold larger threshold assumed to delimit intra- and inter-insular spaces in this study (81 kb) compared to the 3B study (43 kb), and the exclusion of pseudogenes from our analysis, could have caused this difference. Our observation that insulae contain similar numbers of genes in distal chromosome regions and in the less gene-dense proximal chromosome regions is paralleled by similar numbers of genes per insula in the genome of sorghum (3.7 genes per insula) and in the less gene-dense genome of Ae. tauschii (3.9 genes per insula).
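The dependence of insula size on the chosen threshold can be seen in a minimal sketch of threshold-based grouping. The 81 kb and 43 kb thresholds are taken from the text; the gene coordinates, the function name, and the treatment of a gap exactly equal to the threshold as intra-insular are assumptions made here for illustration.

```python
def group_into_insulae(genes, threshold_bp=81_000):
    """Group (start, end) gene intervals into insulae. A new insula starts
    whenever the gap to the previous gene exceeds threshold_bp."""
    insulae = []
    for start, end in sorted(genes):
        if insulae and start - insulae[-1][-1][1] <= threshold_bp:
            insulae[-1].append((start, end))
        else:
            insulae.append([(start, end)])
    return insulae

# Hypothetical gene coordinates in bp (not from the study).
genes = [
    (0, 3_000),
    (10_000, 12_000),
    (70_000, 73_000),
    (400_000, 402_000),
    (420_000, 425_000),
]

sizes_81kb = [len(insula) for insula in group_into_insulae(genes, 81_000)]
sizes_43kb = [len(insula) for insula in group_into_insulae(genes, 43_000)]
print(sizes_81kb, sizes_43kb)
```

With these toy coordinates the 58 kb gap between the second and third genes is intra-insular at 81 kb but inter-insular at 43 kb, so the same genes form fewer, larger insulae under the larger threshold.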
The observation that the greater gene density in the distal regions of chromosome 3B is due to genes that are not syntenic with rice and B. distachyon is probably unrelated to the insular dynamics. It is more likely a reflection of a high proportion of novel genes in those regions due to greater incidence of duplicated genes in the distal regions of wheat chromosomes and those of wheat diploid ancestors.
If the evolution of gene insulae were driven entirely by insertions of LTR retroelements, it would be counter-intuitive to expect conservation of insulae over long periods of time. This is particularly true if the remarkably high rate of turnover of intergenic spaces in grass genomes is taken into account. In the Triticeae genomes, for example, virtually the entire intergenic space, equivalent to nearly 90% of the genome, is replaced within four to five million years. The four species studied here belong to grass subfamilies Pooideae (Ae. tauschii and B. distachyon), Ehrhartoideae (rice) and Panicoideae (sorghum). The Panicoideae subfamily diverged about 52.5 million years ago (MYA) from the ancestor of the Pooideae and Ehrhartoideae subfamilies. Within the subfamily Pooideae, Ae. tauschii and B. distachyon diverged 35.8 MYA, and their common ancestor diverged from rice 47.3 MYA. In spite of the antiquity of these divergence times, comparison of the four genomes suggested that heterogeneity in gene distribution is conserved to some degree. Neighboring genes that are close to each other in one genome tend to be close to each other in other genomes and, vice versa, neighboring genes that are far apart in one genome show a tendency to be far apart in other genomes. This observation remained true even when comparing intergenic distances in Ae. tauschii with those in maize (Table S5), a Panicoideae species with a haploid genome size of 2,500 Mb (P
The evolutionary conservation of gene distribution over long spans of time argues for a functional factor playing a role in the evolution of insulae, in addition to the process based on the dynamics of LTR retroelement insertions. It is possible that some gene pairs do not tolerate separation by LTR retroelements. Co-location of functionally related or co-expressed genes or genes encoding proteins in the same biochemical pathways has been reported in a number of plants. Such functional constraints may represent another factor playing a role in the formation and conservation of insulae.
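The tendency of orthologous neighbor distances to be conserved between genomes of different size can be expressed as a rank correlation. The sketch below uses Spearman's coefficient as an assumed choice of statistic, not necessarily the one used in this study, and the distances are hypothetical toy values. A rank-based measure is convenient here because absolute distances differ greatly between a small and an expanded genome.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical distances (kb) between the same five orthologous gene pairs
# in a small genome and in an expanded genome.
small_genome_kb = [1.2, 3.5, 0.8, 6.0, 2.1]
large_genome_kb = [9.0, 30.0, 2.5, 120.0, 4.0]

rho = spearman(small_genome_kb, large_genome_kb)
print(f"Spearman rho = {rho:.2f}")
```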