The field of comparative genomics is constantly enriched by the addition of newly sequenced genomes: by the end of 2010, about 1,300 bacterial and 150 eukaryotic genomes had been sequenced (http://www.genomesonline.org) with various degrees of precision and coverage. In particular, there is great interest in mammalian genomes, given their proximity to humans and, hence, their potential power for generating biomedically relevant data. Identification of conserved elements has been a central focus of comparative analyses and the driving force behind initiatives such as the 'Multiple Mammalian Genomes for Comparative Annotation', which initially included 24 mammalian species. The recent development of next-generation sequencing technologies [1] allows the comparative genomics community to contemplate incorporating high-coverage full genome sequences from many non-classical model organisms for a better understanding of how biological diversity and complexity evolved. For example, the 'Evolution of the Human Proteome' initiative aims to sequence the genomes of nine additional chordate species to complete the coverage of major lineages of the chordate phylogeny and uncover the genomic changes that correlate with key morphological and physiological transitions (http://www.genome.gov/25521740).
Among the groups under-represented in terms of genome sequence data are the major lineages of Sauropsida, which diverged 200 to 280 million years ago: Testudines (turtles), Lepidosauria (the tuatara, lizards, and snakes) and Archosauria (crocodiles and birds). Even if we exclude the 10,000 extant species of birds, Sauropsida still includes over 8,000 species (compared to 5,400 species of mammals) that display a remarkable range of life histories, sex-determining systems, reproductive modes, physiologies, and body plans [4]. For example, in squamates, limb reduction has evolved independently at least 25 times [5], and viviparity at least 100 times [6] (versus fewer than 15 times each in bony fishes, cartilaginous fishes, and amphibians, once in mammals, and never in birds); shifts between genetic and temperature-dependent sex determination have occurred multiple times as well [7]; and some lizards even exhibit ovulation of tiny eggs and placental nutrition of embryos [8]. Hence, comparative genomic analyses incorporating reptilian genomes promise to uncover evolutionary novelties more diverse in many respects than those revealed by genomic comparisons among mammals. Furthermore, non-avian reptilian genomes would greatly improve the comparison between mammals and birds by incorporating major missing nodes between these two lineages [9]. Thus far, only the genomes of the green anole lizard (Anolis carolinensis) and a handful of birds (the chicken, Gallus gallus; the zebra finch, Taeniopygia guttata; the duck, Anas platyrhynchos; and the turkey, Meleagris gallopavo) have been fully sequenced. Model reptilian species whose genomes should be sequenced as a priority need to be chosen pragmatically [11] by incorporating criteria such as phylogenetic position, nature of the ancestral/derived states of key morphological/physiological characters, level of diversity within the corresponding higher taxon, ease with which the species can be handled, housed and bred, and protection status.
Even if next-generation methods make the sequencing of a complex genome possible in a matter of weeks, such a project remains very costly and requires much additional time for assembly and annotation. For species that are considerably divergent from existing high-quality genomes, gene identification and annotation greatly benefit from transcriptome data. Again, next-generation sequencing will probably become the method of choice for generating high-quality transcriptome data and supplant other methods such as serial analysis of gene expression (SAGE), sequencing of expressed sequence tags (ESTs), subtractive hybridization, differential display, and even microarrays (at least for non-model species). Indeed, next-generation sequencing of transcriptomes has recently proven to be highly valuable for producing functional genome sequences, as well as gene polymorphism and expression data [13]. In addition, software has been developed for handling the massive amount of sequence data and for de novo assembly of contigs without the need for reference genomes [18].
Besides large-scale EST libraries available for several organs of the anole lizard (including a brain library, dbEST library #23338, yet to be analyzed), reptilian transcriptomes so far are quite limited: a few snake venom-gland partial transcriptomes (each consisting of 600 to 1,000 ESTs generally clustering into about 300 unique sequences [20]), a heart transcriptome of the Burmese python consisting of about 2,800 mRNAs [23], 3,064 assembled unique sequences of Alligator mississippiensis analyzed for their GC-content [24], and 833 assembled unique sequences available for the red-eared slider turtle, with a few related to brain development [25]. A notable, very recent exception is a large-scale, multi-individual and multi-organ transcriptome of the garter snake [27], which identified about 13,000 snake genes on the basis of homology assignment with other vertebrates, as well as thousands of transcripts of unidentified protein-coding genes.
Here, we used 454 technology for sequencing brain transcriptomes in four reptilian and one avian species: (i) the Nile crocodile (Crocodylus niloticus), whose development has recently been described [28], (ii) the oviparous corn snake (Elaphe guttata), as a better alternative (in the Evo-Devo context [11]) to the viviparous common garter snake (Thamnophis sirtalis), (iii) the bearded dragon (Pogona vitticeps), a lizard of the Agamidae family that diverged approximately 150 million years ago from the Iguanidae [29], to which Anolis belongs, (iv) the red-eared turtle (Trachemys scripta), and (v) the chicken (G. gallus) as a reference for the performed analyses. We chose to focus on the brain for one primary reason: it exhibits one of the most complex (that is, diverse) transcriptomes of all organs in vertebrates [30]; hence, it is a tissue of choice for sequencing a maximum number of transcripts while reducing the need for normalization. Note also that reptilian species have been incorporated in comparative analyses of the vertebrate brain [32] aimed at understanding the evolution of the sensory and cognitive novelties associated with the vertebrate central nervous system [33], a topic beyond the scope of the present paper.
We generated over 3,000,000 reads, which were fed into an automated and publicly available pipeline, 'LANE runner', that performs iterative BLAST searches and consensus assemblies. Between 20,000 and over 31,000 genes were identified per species, including transcripts that might be lineage specific. This new reptilian comparative transcriptomics dataset (available at http://www.reptilian-transcriptomes.org) should prove a useful resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics. We also identified thousands of microsatellite loci and SNPs that can be used in quantitative and population genetic analyses. Finally, we built the longest (2,012,759 amino acids (aa)) reptilian multiple alignment of homologous sequences to date (found in all five lineages of Sauropsida, three mammals, and two outgroup taxa) and performed extensive phylogenetic analyses to investigate the long-standing question of the position of the turtle lineage within the phylogeny of Amniotes. Although phylogenetic results must be taken with caution, as sequencing errors in low-coverage transcriptomes could generate artifacts during phylogeny inference, maximum likelihood analyses of a large dataset (about 250 thousand characters per species) devoid of paralogs hint at archosaurian affinities of Testudines.
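As an illustration of what screening assembled transcripts for microsatellite loci can involve, the following is a minimal sketch in Python. It is not the procedure used in this study: the function name `find_microsatellites`, the repeat-unit length range (2 to 6 nucleotides), and the minimum of five tandem copies are all assumptions chosen for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Hypothetical helper: the thresholds are illustrative defaults,
    not values taken from the study.
    """
    seq = seq.upper()
    # Group 1 captures a candidate repeat unit (shortest length tried first);
    # the backreference \1 then requires at least (min_repeats - 1)
    # additional consecutive copies of exactly that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip homopolymer runs such as AAAAAAAAAA, which would otherwise
        # be reported with a unit like "AA".
        if len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence containing an (AC)6 and a (CAG)5 repeat.
    example = "GGTT" + "AC" * 6 + "TTGA" + "CAG" * 5 + "T"
    for start, unit, copies in find_microsatellites(example):
        print(start, unit, copies)
```

Positions are 0-based offsets into the input sequence. A real screen would typically also tolerate imperfect repeats and record flanking sequence for primer design, which this sketch does not attempt.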