Selection of libraries for EST sequencing
Eleven cDNA libraries were constructed using a variety of tissues (Table ). Pilot sequencing of randomly selected clones revealed that the majority of the non-normalized libraries were moderately to highly redundant for relatively few transcripts. For example, hemoglobin-like transcripts represented 15–25% of the sampled clones from cDNA libraries V1, V2, and V6. Accordingly, we chose to focus our sequencing efforts on the non-normalized MATH library as well as the normalized AG library, which had lower levels of redundancy (5.5 and 0.25% globins, respectively). By concentrating our sequencing efforts on these two libraries, we obtained transcripts deriving primarily from regenerating larval tissues in A. mexicanum and several non-regenerating larval tissues in A. t. tigrinum.
Tissues selected to make cDNA libraries.
EST sequencing and clustering
A total of 46,064 cDNA clones were sequenced, yielding 39,982 high quality sequences for A. mexicanum and A. t. tigrinum (Table ). Of these, 3,745 corresponded to mtDNA and were removed from the dataset; complete mtDNA genome data for these and other ambystomatid species will be reported elsewhere. The remaining nuclear ESTs for each species were clustered and assembled separately. We included in our A. mexicanum assembly an additional 16,030 high quality ESTs that were generated recently for regenerating tail and neurula stage embryos [20]. Thus, a total of 32,891 and 19,376 ESTs were clustered for A. mexicanum and A. t. tigrinum, respectively. Using PaCE clustering and CAP3 assembly, a similar number of EST clusters and contigs were identified for each species (Table ). Overall contig totals were 11,190 and 9,901 for A. mexicanum and A. t. tigrinum, respectively. Thus, although 13,515 more A. mexicanum ESTs were assembled, a roughly equivalent number of contigs were obtained for both species. This indicates that EST development was more efficient for A. t. tigrinum, presumably because ESTs were obtained primarily from the normalized AG library; indeed, there were approximately twice as many ESTs on average per A. mexicanum contig (Table ). Thus, our EST project yielded an approximately equivalent number of contigs for A. mexicanum and A. t. tigrinum, and overall we identified > 21,000 different contigs. Assuming that 20% of the contigs correspond to redundant loci, which has been found generally in large EST projects [21], we identified transcripts for approximately 17,000 different ambystomatid loci. If ambystomatid salamanders have approximately the same number of loci as other vertebrates (e.g. [22]), we have isolated roughly half the expected number of genes in the genome.
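The bookkeeping behind these totals can be retraced in a few lines. The sketch below uses only the counts quoted in the paragraph above; the variable names are ours, and the 20% redundancy figure is the assumption stated in the text rather than a measured value.

```python
# Counts quoted in the text (variable names are illustrative only).
high_quality = 39_982          # high quality sequences from both species
mtdna = 3_745                  # mitochondrial sequences removed
extra_mexicanum = 16_030       # previously generated A. mexicanum ESTs [20]

clustered = {"A. mexicanum": 32_891, "A. t. tigrinum": 19_376}
contigs = {"A. mexicanum": 11_190, "A. t. tigrinum": 9_901}

# Nuclear ESTs from this project plus the additional A. mexicanum ESTs
# should equal the number of ESTs that entered clustering.
nuclear = high_quality - mtdna
total_clustered = sum(clustered.values())
assert nuclear + extra_mexicanum == total_clustered

# Difference in assembled ESTs between the two species.
est_difference = clustered["A. mexicanum"] - clustered["A. t. tigrinum"]

# Contig total and the loci estimate under the assumed 20% redundancy.
total_contigs = sum(contigs.values())
redundant_fraction = 0.20      # assumption taken from the text
estimated_loci = total_contigs * (1 - redundant_fraction)

print(nuclear, total_clustered, est_difference, total_contigs, round(estimated_loci))
```

Under these figures the estimate comes to just under 17,000 loci, in line with the rounded value given above.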
EST summary and assembly results.
Identification of vertebrate sequences similar to Ambystoma contigs
We searched all contigs against several vertebrate databases to identify sequences that exhibited significant sequence similarity. As our objective was to reliably annotate as many contigs as possible, we first searched against 19,804 sequences in the NCBI human RefSeq database (Figure ), which is actively reviewed and curated by biologists. This search revealed 5619 and 4973 'best hit' matches for the A. mexicanum and A. t. tigrinum EST datasets at a BLASTX threshold of E = 10^-7. The majority of contigs were supported at more stringent E-value thresholds (Table ). Non-matching contigs were subsequently searched against the Non-Redundant (nr) Protein database and Xenopus tropicalis and X. laevis UNIGENE ESTs (Figure ). These latter two searches yielded a few hundred more 'best hit' matches; however, a relatively large number of ESTs from both ambystomatid species were not similar to any sequences from the databases above. Presumably, these non-matching sequences were obtained from the non-coding regions of transcripts or they contain protein-coding sequences that are novel to salamander. Although the majority are probably of the former type, we did identify 3,273 sequences from the non-matching set that had open reading frames (ORFs) of at least 200 bp, and 911 of these were greater than 300 bp.
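As an illustration of the ORF length screen applied to the non-matching sequences, the following sketch reports the longest stop-free stretch across all six reading frames. The text does not say how ORFs were defined, so this definition (no start codon required, both strands scanned) and the function names are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_bp(seq):
    """Length in bp of the longest stop-free run of codons in any of six frames.

    Assumption: an ORF is any stretch of complete codons uninterrupted by a
    stop codon; a start codon is not required.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0          # a stop codon ends the current run
                else:
                    run += 3
                    best = max(best, run)
    return best


def passes_orf_screen(seq, min_bp=200):
    """True if the sequence has a stop-free stretch of at least min_bp."""
    return longest_orf_bp(seq) >= min_bp
```

Calling `passes_orf_screen(seq, min_bp=200)` and `passes_orf_screen(seq, min_bp=300)` on each non-matching contig would give the two tallies discussed above, under the stated definition.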
Results of BLASTX and TBLASTX searches to identify best BLAST hits for Ambystoma contigs searched against NCBI human RefSeq, nr, and Xenopus Unigene databases.
Ambystoma contig search of NCBI human RefSeq, nr, and Xenopus Unigene databases.
The distribution of ESTs among contigs can provide perspective on gene expression when clones are randomly sequenced from non-normalized cDNA libraries. In general, frequently sampled transcripts may be expressed at higher levels. We identified the 20 contigs from A. mexicanum and A. t. tigrinum that contained the most assembled ESTs (Table ). The largest A. t. tigrinum contigs contained fewer ESTs than the largest A. mexicanum contigs, probably because fewer A. t. tigrinum clones were sequenced overall, with the majority selected from a normalized library. However, we note that the contig with the most ESTs was identified for A. t. tigrinum: delta globin. In both species, transcripts corresponding to globin genes were sampled more frequently than all other loci. This may reflect the fact that amphibians, unlike mammals, have nucleated red blood cells that are transcriptionally active. In addition to globin transcripts, a few other house-keeping genes were identified in common from both species; however, the majority of the contigs were unique to each list. Overall, the strategy of sequencing cDNAs from a diverse collection of tissues (from normalized and non-normalized libraries) yielded different sets of highly redundant contigs. Only 25% and 28% of the A. mexicanum and A. t. tigrinum contigs, respectively, were identified in common (Figure ). We also note that several hundred contigs were identified in common between Xenopus and Ambystoma; this will help facilitate comparative studies among these amphibian models.
Top 20 contigs with the most assembled ESTs.
Figure 2. Venn diagram of BLAST comparisons among amphibian EST projects. Values provided are numbers of reciprocal best BLAST hits (E < 10^-20) among quality masked A. mexicanum and A. t. tigrinum assemblies and a publicly available X. tropicalis EST assembly.
For the 10,592 contigs that showed significant similarity to sequences from the human RefSeq database, we obtained Gene Ontology (23) information to describe ESTs in functional terms. Although there are hundreds of possible annotations, we chose a list of descriptors for molecular and biological processes that we believe are of interest for research programs currently utilizing salamanders as model organisms (Table ). In all searches, we counted each match between a contig and a RefSeq sequence as identifying a different ambystomatid gene, even when different contigs matched the same RefSeq reference. In almost all cases, approximately the same number of matches was found per functional descriptor for both species. This was not simply because the same loci were being identified for both species, as only 20% of the total number of searched contigs shared sufficient identity (BLASTN; E < 10^-80 or E < 10^-20) to be potential homologues. In this sense, the sequencing effort between these two species was complementary in yielding a more diverse collection of ESTs that were highly similar to human gene sequences.
Functional annotation of contigs
Informatic searches for regeneration probes
The value of a salamander model to regeneration research will ultimately rest on the ease with which data and results can be cross-referenced to other vertebrate models. For example, differences in the ability of mammals and salamanders to regenerate spinal cord may reflect differences in the way cells of the ependymal layer respond to injury. As is observed in salamanders, ependymal cells in adult mammals also proliferate and differentiate after spinal cord injury (SCI) [24]; immediately after contusion injury in adult rat, ependymal cell numbers increase and proliferation continues for at least 4 days [[26]; but see [27]]. Rat ependymal cells share some of the same gene expression and protein properties of embryonic stem cells [28]; however, no new neurons have been observed to derive from these cells in vivo after SCI [29]. Thus, although endogenous neural progenitors of the ependymal layer may have latent regenerative potential in adult mammals, this potential is not realized. Several recently completed microarray analyses of spinal cord injury in rat now make it possible to cross-reference information between amphibians and mammals. For example, we searched the complete list of significantly up- and down-regulated genes from Carmel et al. [30] and Song et al. [31] against all Ambystoma ESTs. Based upon amino acid sequence similarity of translated ESTs (TBLASTX; E ), we identified DNA sequences corresponding to 69 of these 164 SCI rat genes (Table ). It is likely that we have sequence corresponding to other presumptive orthologues from this list, as many of our ESTs only contain a portion of the coding sequence or the untranslated regions (UTR), and in many cases our searches identified closely related gene family members. Thus, many of the genes that show interesting expression patterns after SCI in rat can now be examined in salamander.
Ambystoma contigs that show sequence similarity to rat spinal cord injury genes.
Similar gene expression programs may underlie regeneration of vertebrate appendages such as fish fins and tetrapod limbs. Regeneration could depend on reiterative expression of genes that function in patterning, morphogenesis, and metabolism during normal development and homeostasis. Alternatively, regeneration could depend in part on novel genes that function exclusively in this process. We investigated these alternatives by searching A. mexicanum limb regeneration ESTs against UNIGENE zebrafish fin regeneration ESTs (Figure ). This search identified 1357 significant BLAST hits (TBLASTX; E < 10^-7) that corresponded to 1058 unique zebrafish ESTs. We then asked whether any of these potential regeneration homologues were represented uniquely in limb and fin regeneration databases (and not in databases derived from other zebrafish tissues). A search of the 1058 zebrafish ESTs against > 400,000 zebrafish ESTs that were sampled from non-regenerating tissues revealed 43 that were unique to the zebrafish regeneration database (Table ). Conceivably, these 43 ESTs may represent transcripts important to appendage regeneration. For example, our search identified several genes (e.g. hspc128, pre-B-cell colony enhancing factor 1, galectin 4, galectin 8) that may be expressed in progenitor cells that proliferate and differentiate during appendage regeneration. Overall, our results suggest that regeneration is achieved largely through the reiterative expression of genes having additional functions in other developmental contexts; however, a small number of genes may be expressed uniquely during appendage regeneration.
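The two-step filter described above (collapse hits to unique zebrafish ESTs, then discard any that also appear among ESTs from non-regenerating tissues) amounts to a set difference. The sketch below uses toy identifiers that are purely hypothetical; it is meant only to show the logic, not the actual pipeline used for the searches.

```python
def unique_subjects(hits):
    """Collapse (query, subject) BLAST hit pairs to the set of distinct subjects."""
    return {subject for _query, subject in hits}


def regeneration_only(candidates, seen_elsewhere):
    """Keep candidates that have no match among non-regenerating tissue ESTs."""
    return set(candidates) - set(seen_elsewhere)


# Toy data: hypothetical contig and EST identifiers.
hits = [
    ("Amex_c1", "zf_fin_001"),
    ("Amex_c2", "zf_fin_001"),   # two contigs can hit the same zebrafish EST
    ("Amex_c3", "zf_fin_002"),
    ("Amex_c4", "zf_fin_003"),
]
seen_elsewhere = {"zf_fin_001", "zf_fin_003"}   # also found in other tissues

candidates = unique_subjects(hits)
specific = regeneration_only(candidates, seen_elsewhere)
print(len(hits), len(candidates), sorted(specific))
```

In the terms of the text, `hits` corresponds to the 1357 significant hits, `candidates` to the 1058 unique zebrafish ESTs, and `specific` to the 43 ESTs found only in the regeneration database.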
Results of BLASTN and TBLASTX searches to identify best BLAST hits for A. mexicanum regeneration ESTs searched against zebrafish EST databases. A total of 14,961 A. mexicanum limb regeneration ESTs were assembled into 4485 contigs for this search.
Ambystoma limb regeneration contigs that show sequence similarity to zebrafish fin regeneration ESTs
DNA sequence polymorphisms within and between A. mexicanum and A. t. tigrinum
The identification of single nucleotide polymorphisms (SNPs) within and between orthologous sequences of A. mexicanum and A. t. tigrinum is needed to develop DNA markers for genome mapping [32], quantitative genetic analysis [33], and population genetics [34]. We estimated within-species polymorphism for both species by calculating the frequency of SNPs among ESTs within the 20 largest contigs (Table ). These analyses considered a total of 30,638 base positions for A. mexicanum and 18,765 base positions for A. t. tigrinum. Two classes of polymorphism were considered in this analysis: those occurring at moderate (identified in 10–30% of the EST sequences) and high frequencies (identified in at least 30% of the EST sequences). Within the A. mexicanum contigs, 0.49% and 0.06% of positions were polymorphic at moderate and high frequency, while higher levels of polymorphism were observed for A. t. tigrinum (1.41% and 0.20%). Higher levels of polymorphism are expected for A. t. tigrinum because they exist in larger, out-bred populations in nature.
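The two frequency classes can be made concrete with a small sketch that classifies a single alignment column from a contig. We assume here that "frequency" means the share of ESTs at that position carrying a base other than the most common one; the text does not spell this out, and the function name and class labels are ours.

```python
from collections import Counter


def classify_position(bases, moderate=0.10, high=0.30):
    """Classify one aligned column of EST bases.

    Assumption: the polymorphism frequency is the fraction of reads that
    differ from the most common base; gaps and ambiguous calls are ignored.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth == 0 or len(counts) < 2:
        return "monomorphic"
    minor = depth - counts.most_common(1)[0][1]
    freq = minor / depth
    if freq >= high:
        return "high"
    if freq >= moderate:
        return "moderate"
    return "below threshold"


def percent_polymorphic(columns, label):
    """Percentage of columns falling in the given class."""
    hits = sum(1 for col in columns if classify_position(col) == label)
    return 100.0 * hits / len(columns)


columns = ["AAAAAAAAAC", "AAAAAACCCC", "GGGGGGGGGG", "TTTT-TTTTN"]
print([classify_position(c) for c in columns])
```

Applying `percent_polymorphic` to every column of the 20 largest contigs would yield per-class percentages of the kind reported above, under the stated assumption about how frequency is measured.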
To identify SNPs between species, we first had to identify presumptive interspecific orthologues. We did this by performing BLASTN searches between the A. mexicanum and A. t. tigrinum assemblies, and the resulting alignments were filtered to retain only those alignments between sequences that were one another's reciprocal best BLAST hit. As expected, the number of reciprocal 'best hits' varied depending upon the E value threshold, although increasing the E threshold by several orders of magnitude had a disproportionately small effect on the overall total length of BLAST alignments. A threshold of E< yielded 2414 alignments encompassing a total of 1.25 Mbp from each species, whereas a threshold of E< yielded 2820 alignments encompassing a total of 1.32 Mbp. The percent sequence identity of alignments was very high among presumptive orthologues, ranging from 84–100% at the more stringent E threshold of E< . On average, A. mexicanum and A. t. tigrinum transcripts are estimated to be 97% identical at the nucleotide level, including both protein coding and UTR sequence. This estimate for nuclear sequence identity is surprisingly similar to estimates obtained from complete mtDNA reference sequences for these species (96%, unpublished data), and to estimates for partial mtDNA sequence data obtained from multiple natural populations [16]. These results are consistent with the idea that mitochondrial mutation rates are lower in cold- versus warm-blooded vertebrates [35]. From a resource perspective, the high level of sequence identity observed between these species suggests that informatics will enable the rapid development of probes between these and other species of the A. tigrinum complex.
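The reciprocal best hit filter used to define presumptive orthologues can be sketched as follows. The input format (tuples of query, subject, E value), the contig names, and the cutoff passed in the example are illustrative assumptions; ties are broken by first occurrence, which may differ from how the original analysis handled them.

```python
def best_hits(hits):
    """Map each query to the subject of its lowest E-value hit.

    hits: iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _e) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue):
    """Pairs (a, b) where b is a's best hit and a is b's best hit,
    considering only hits with E below max_evalue."""
    forward = best_hits(h for h in a_vs_b if h[2] < max_evalue)
    reverse = best_hits(h for h in b_vs_a if h[2] < max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy BLASTN results with hypothetical contig names.
mex_vs_tig = [
    ("Mex1", "Tig1", 1e-90),
    ("Mex1", "Tig2", 1e-30),
    ("Mex2", "Tig2", 1e-25),
    ("Mex3", "Tig3", 1e-10),   # fails the cutoff used below
]
tig_vs_mex = [
    ("Tig1", "Mex1", 1e-88),
    ("Tig2", "Mex1", 1e-31),   # Tig2 prefers Mex1, so Mex2/Tig2 is not reciprocal
    ("Tig2", "Mex2", 1e-24),
    ("Tig3", "Mex3", 1e-9),
]

pairs = reciprocal_best_hits(mex_vs_tig, tig_vs_mex, max_evalue=1e-20)
print(pairs)
```

Relaxing `max_evalue` can only add candidate hits, which is consistent with the observation above that a less stringent threshold yields more reciprocal pairs.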
Extending EST resources to other ambystomatid species
Relatively little DNA sequence has been obtained from species that are closely related to commonly used model organisms, and yet such extensions would greatly facilitate genetic studies of natural phenotypes, population structures, species boundaries, and conservatism and divergence of developmental mechanisms. Like many amphibian species that are threatened by extinction, many of these ambystomatid salamanders are currently in need of population genetic studies to inform conservation and management strategies [e.g. [13]]. We characterized SNPs from orthologous A. mexicanum and A. t. tigrinum ESTs and extended this information to develop informative molecular markers for a related species, A. ordinarium. Ambystoma ordinarium is a stream-dwelling paedomorph endemic to high elevation habitats in central Mexico [36]. This species is particularly interesting from an ecological and evolutionary standpoint because it harbors a high level of intraspecific mitochondrial variation and, as an independently derived stream paedomorph, is unique among the typically pond-breeding tiger salamanders. As a reference of molecular divergence, Ambystoma ordinarium shares approximately 98 and 97% mtDNA sequence identity with A. mexicanum and A. t. tigrinum, respectively.
To identify informative markers for A. ordinarium, A. mexicanum and A. t. tigrinum EST contigs were aligned to identify orthologous genes with species-specific sequence variations (SNPs or Insertion/Deletions = INDELs). Primer pairs corresponding to 123 ESTs (Table ) were screened by PCR using a pool of DNA template made from individuals of 10 A. ordinarium populations. Seventy-nine percent (N = 97) of the primer pairs yielded amplification products that were approximately the same size as corresponding A. mexicanum and A. t. tigrinum fragments, using only a single set of PCR conditions. To estimate the frequency of intraspecific DNA sequence polymorphism among this set of DNA marker loci, 43 loci were sequenced using a single individual sampled randomly from each of the 10 populations, which span the geographic range of A. ordinarium. At least one polymorphic site was observed for 20 of the sequenced loci, with the frequency of polymorphisms dependent upon the size of the DNA fragment amplified. Our results suggest that the vast majority of primer sets designed for A. mexicanum / A. t. tigrinum EST orthologues can be used to amplify the corresponding sequence in a related A. tigrinum complex species, and for small DNA fragments in the range of 150–500 bp, approximately half are expected to have informative polymorphisms.
EST loci used in a population-level PCR amplification screen in A. ordinarium
Comparative gene mapping
Salamanders occupy a pivotal phylogenetic position for reconstructing the ancestral tetrapod genome structure and for providing perspective on the extremely derived anuran Xenopus (37) that is currently providing the bulk of amphibian genome information. Here we show the utility of ambystomatid ESTs for identifying chromosomal regions that are conserved between salamanders and other vertebrates. A region of conserved synteny that corresponds to human chromosome (Hsa) 17q has been identified in several non-mammalian taxa including reptiles (38) and fishes (39). In a previous study Voss et al. (40) identified a region of conserved synteny between Ambystoma and Hsa 17q that included collagen type 1 alpha 1 (Col1a1), thyroid hormone receptor alpha (Thra), homeo box b13 (Hoxb13), and distal-less 3 (Dlx3) (Figure ). To evaluate both the technical feasibility of mapping ESTs and the likelihood that presumptive orthologues map to the same synteny group, we searched our assemblies for presumptive Hsa 17 orthologues and then developed a subset of these loci for genetic linkage mapping. Using a joint assembly of A. mexicanum and A. t. tigrinum contigs, 97 Hsa 17 presumptive orthologues were identified. We chose 15 genes from this list and designed PCR primers to amplify a short DNA fragment containing 1 or more presumptive SNPs that were identified in the joint assembly (Table ). All but two of these genes were mapped, indicating a high probability of mapping success using markers developed from the joint assembly of A. mexicanum and A. t. tigrinum contigs. All 6 ESTs that exhibited 'best hits' to loci within the previously defined human-Ambystoma synteny group did map to this region (Hspc009, Sui1, Krt17, Krt24, Flj13855, and Rpl19). Our results show that BLAST-based definitions of orthology are informative between salamanders and human. All other presumptive Hsa 17 loci mapped to Ambystoma chromosomal regions outside of the previously defined synteny group. 
It is interesting to note that two of these loci mapped to the same ambystomatid linkage group (Cgi-125, Flj20345), but in human the presumptive orthologues are 50 Mb apart and distantly flank the syntenic loci in Figure . Assuming orthology has been assigned correctly for these loci, this suggests a dynamic history for some Hsa 17 orthologues during vertebrate evolution.
Comparison of gene order between Ambystoma linkage group 1 and an 11 Mb region of Hsa17 (37.7 Mb to 48.7 Mb). Lines connect the positions of putatively orthologous genes.
Presumptive human chromosome 17 loci that were mapped in Ambystoma
Ambystomatid salamanders are classic model organisms that continue to inform biological research in a variety of areas. Their future importance in regenerative biology and metamorphosis will almost certainly escalate as genome resources and other molecular and cellular approaches become widely available. Among the genomic resources currently under development (see [41]) is a comparative genome map, which will allow mapping of candidate genes, QTL, and comparative anchors for cross-referencing the salamander genome to fully sequenced vertebrate models. In closing, we reiterate a second benefit to resource development in Ambystoma. Genome resources in Ambystoma can be extended to multiple, closely related species to explore the molecular basis of natural phenotypic variation. Such extensions can better inform our understanding of ambystomatid biodiversity in nature and draw attention to the need for conserving such naturalistic systems. Several paedomorphic species, including A. mexicanum, are on the brink of extinction. We can think of no better investment than one that simultaneously enhances research in all areas of biology and draws attention to the conservation needs of model organisms in their natural habitats.