Comparative DNA sequence data are needed from large genomes to better understand the structural and functional features that influence genome size evolution. This study demonstrates that DNA sequence data can be sampled efficiently from the large genome of the Mexican axolotl using 454 DNA sequencing. Short DNA sequence reads (50–300 bp) from shotgun-sequenced BACs could be assembled de novo into complete contigs, and this information was then used to reveal the structure of genic regions of the genome. The results show that axolotl genic regions encode novel genes and make a significant contribution to genome size. In particular, axolotl introns are 5–10× longer than introns in other vertebrates, and this may be typical of salamander genomes [28].
Many different ideas have been proposed to explain genome size variation among organisms. The simplest explanation is a change in the ratio of protein-coding to non-protein-coding DNA [29]. Although variation in gene number may be important, this distinction is too simple because non-protein-coding DNA has been shown in recent years to encode a diversity of functional elements. For example, protein-coding sequences (exons) are associated with introns that encode a diversity of regulatory elements and non-coding RNAs that affect transcription, translation, and chromatin structure [30-32]. To understand the relationship between genome size and regulatory complexity, it is therefore critical to consider the proportion of DNA that resides in transcribed (genic) versus non-transcribed (i.e. intergenic) DNA. Changes in genome size that occur over relatively short evolutionary timeframes may not result in a correlated expansion of genic regions (i.e. introns), presumably because these regions are under greater evolutionary constraint [33, 34]. However, positive correlations are observed between genome size and the number and length of introns at a broader evolutionary scale [35-37]. Correlations observed at this broader scale are presumably the outcome of drift and selection as population sizes and functional constraints fluctuate over millions of generations [37]. Salamanders are particularly interesting in this regard because large genomes are the rule rather than the exception in this group. Very large genomes have likely been maintained within salamanders at least since the divergence of the ancestral salamander lineage >160 million years ago [38, 39]. Thus, salamanders can provide novel insight into the evolutionary potential of vertebrate genomes over deep evolutionary time.
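The genic versus intergenic distinction drawn above can be made concrete with a small interval calculation. The following is a minimal sketch, not the method used in this study: it assumes gene spans and exon spans are available as half-open (start, end) base-pair coordinates on a single assembled contig, assumes every exon lies inside a gene span, and counts all non-exonic genic sequence as intron. The function names (`merge_intervals`, `covered_bp`, `partition_contig`) and the example coordinates are hypothetical and do not represent axolotl data.

```python
from typing import Dict, List, Tuple

# Half-open [start, end) coordinates in base pairs on a single contig.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Total number of bases covered by the union of the intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def partition_contig(
    contig_length: int, genes: List[Interval], exons: List[Interval]
) -> Dict[str, int]:
    """Split a contig into exonic, intronic and intergenic base counts.

    Hypothetical helper. Assumes every exon lies inside some gene span,
    so genic bases that are not exonic are treated as intronic.
    """
    genic = covered_bp(genes)
    exonic = covered_bp(exons)
    return {
        "exonic": exonic,
        "intronic": genic - exonic,
        "intergenic": contig_length - genic,
    }


if __name__ == "__main__":
    # Hypothetical 100 kb contig carrying one two-exon gene (not real data).
    contig_length = 100_000
    counts = partition_contig(
        contig_length,
        genes=[(10_000, 60_000)],
        exons=[(10_000, 10_200), (59_800, 60_000)],
    )
    for category, bp in counts.items():
        print(f"{category}: {bp} bp ({bp / contig_length:.1%})")
```

In this toy contig almost all genic sequence is intronic, which is the kind of proportion that distinguishes a long-intron gene from a compact one. Overlapping gene models are merged before counting so that shared bases are not double-counted; real annotations would also need to handle strand, alternative transcripts, and exons that extend beyond a recorded gene span.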
At this point we can only speculate about why large introns evolved in A. mexicanum. In general, introns tend to be longer in genes with tissue-specific or developmentally relevant functions than in housekeeping or widely expressed genes [40-42]. This pattern may reflect the evolution of complex transcriptional regulatory mechanisms [43-46]. It is possible that salamanders maintain large introns in part because they encode information necessary to accomplish unique developmental processes. In particular, salamanders are capable of complex tissue regeneration, and a single genome can express both a metamorphic and a paedomorphic outcome [47, 48]. These processes involve transcriptional activation and silencing of thousands of genes, which may depend upon transcription factor binding sites and ncRNAs within introns. A functional role for large salamander introns is supported by the absence of shared repetitive sequences among introns and by the prediction of numerous miRNA and snoRNA genes in axolotl introns. It is also possible that long introns indirectly moderate cellular and developmental processes by influencing transcription and mitotic rates [49, 50]. We note, however, that the predicted repetitive DNAs and ncRNAs account for only a small proportion of total intron size. Characterization of additional axolotl genes, particularly genes that function in regeneration and metamorphosis, will help optimize searches for other functional and structural elements (e.g. matrix attachment sites or unknown functional classes) associated with large intron size, including "junk" DNA.