Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish.
We report the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource, along with linkage mapping and the existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3%) had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5), of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds.
BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions is high, while a high level of chromosomal shuffling and rearrangement exists between the catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.
Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. Such an approach was initially demonstrated by Fujiyama et al. for the construction of the human-chimpanzee comparative map. In these closely related primate species, approximately 98% of chimpanzee BAC end sequences (BES) had significant BLAST hits to the human genome sequence, allowing putative orthologues to be identified. A similar approach was used for the construction of the human-mouse comparative map. Subsequently, this approach was used extensively in mammals, including construction of the human-cattle, human-horse, and human-porcine comparative maps [3-5]. Most recently, this approach was taken one step further for the construction of the comparative genome contig (CGC)-based physical map of the sheep genome, where CGCs are established by anchoring the sheep BES onto the genome sequences of dog, cow, and human. These successes depended on a high percentage of BLAST hits and/or high levels of genome collinearity.
Five teleost fish genomes have been fully sequenced http://www.ensembl.org/index.html including zebrafish (Danio rerio, from the order of Cypriniformes), Japanese pufferfish (Fugu rubripes, from the order of Tetraodontiformes), green spotted pufferfish (Tetraodon nigroviridis, from the order of Tetraodontiformes), medaka (Oryzias latipes, from the order of Beloniformes), and three-spined stickleback (Gasterosteus aculeatus, from the order of Gasterosteiformes), while whole genome sequencing is also underway for tilapia http://www.cichidgenome.org; http://www.broad.mit.edu/science/projects/mammals-models/vertebrates-invertebrates/tilapia/tilapia-genome-sequencing-project. The availability of these whole genome sequences lends great opportunities for comparative genome analysis. Recently, major genomic resources have been developed from a number of fish species such as Atlantic salmon (Salmo salar) [7-9], rainbow trout (Oncorhynchus mykiss) [10,11], tilapia [12,13], gilthead sea bream (Sparus auratus) [14-17], European sea bass (Dicentrarchus labrax) [18,19], and channel catfish (Ictalurus punctatus) (for a review, see [20,21]).
Catfish is the major aquaculture species in the United States. It is one of the six species included in the U.S. National Animal Genome Project NRSP-8. A number of genome resources have been developed in catfish, including a large number of molecular markers [22-25], genetic linkage maps [26-28], several hundred thousand ESTs ([29-33]; Z. Liu, unpublished data), microarray platforms [34-38], BAC libraries [39,40], and BAC-based physical maps [41,42]. To enable BAC end sequence-based comparative genome analysis, we previously reported the generation of 20,366 BES in catfish. In spite of the great value of those BES for the characterization of genome repeat structures and for the identification of microsatellite markers, our previous comparative genome analysis using BES revealed very limited conservation between the catfish and zebrafish genomes. Of the 141 mate-paired BES with genes on both ends of the BAC inserts, only 34 (24.1%) were found in nearby genomic locations in the zebrafish genome, suggesting high levels of chromosomal rearrangements. Such findings were in strong contrast to the medaka-sea bream, Tetraodon-sea bream, medaka-stickleback, Tetraodon-medaka, stickleback-sea bream, and Tetraodon-stickleback genome comparisons, where almost complete genome collinearity was found. We speculated that our earlier inability to discover a large extent of genome collinearity between catfish and zebrafish could be a result of the low number of BES and the lack of a physical map. Therefore, in this study, we extended our efforts in BAC end sequencing and generated an additional 43,021 BES, bringing the total to 63,387 (25,676 mate-paired). Using these catfish BES, the catfish BAC contig-based physical map, genetic linkage mapping of BAC end-anchored microsatellites, and the genome sequence of zebrafish, we conducted extensive comparative genome analysis.
We report the identification of conserved syntenies and demonstrate the construction of super scaffolds of contigs by genetic linkage mapping of BAC end-associated microsatellites.
As shown in Table 1, a total of 42,240 BAC inserts (6.13× clone-coverage of the channel catfish genome) were sequenced from both ends, resulting in 63,387 BES ≥ 200 bp in length (75% overall success rate), including the 20,366 BES we previously reported. Mate-paired BES were produced from 25,676 BAC clones, while only a single BES was obtained from 12,035 clones. The BES were of high quality, as the Q20 length ranged from 200 to 810 bp, with an average Q20 read length of 596 bp. The BES generated in this study have been deposited into the GenBank GSS database with consecutive accession numbers of [GenBank:FI857756-FI900776]. A total of 37,784,877 bp of genomic sequences was generated, representing approximately 4% of the catfish genome. Analysis using the 37,784,877 bp of BES resulted in 11.91% of base pairs masked using the Danio repeat database, with the most abundant type of repeat being the DNA transposons. We previously reported the assessment of repetitive elements in the catfish genome, and the additional 43,021 BES generated in this study confirmed our previous findings in general. The previously reported BES [GenBank:DX083364-DX103729] were also used for comparative genome analysis in this study.
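As a quick consistency check on the counts above, the following Python sketch recomputes the total number of BES, the overall success rate, and the mean read length from the figures quoted in the text. The function and variable names are illustrative only, and the sketch assumes that the 37,784,877 bp total covers all 63,387 BES.

```python
def bes_summary(clones_attempted, mate_paired_clones, single_end_clones, total_bp):
    """Recompute BES totals from clone counts (names are illustrative)."""
    # A mate-paired clone contributes two reads, a single-end clone one.
    total_bes = 2 * mate_paired_clones + single_end_clones
    # Each clone was attempted from both ends, so two reads were attempted per clone.
    success_rate = total_bes / (2 * clones_attempted)
    # Assumes total_bp is the summed length of all usable BES.
    mean_length_bp = total_bp / total_bes
    return total_bes, success_rate, mean_length_bp


total_bes, success_rate, mean_length_bp = bes_summary(
    clones_attempted=42_240,
    mate_paired_clones=25_676,
    single_end_clones=12_035,
    total_bp=37_784_877,
)
print(total_bes)              # 63387
print(f"{success_rate:.1%}")  # 75.0%
print(round(mean_length_bp))  # 596
```

The recomputed values agree with the 63,387 BES, 75% success rate, and 596 bp average read length reported above.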
TBLASTX searches using the 63,387 catfish BES against the ENSEMBL zebrafish cDNA database with chromosome information resulted in 5,066 significant hits (Table 2). Of the 5,066 significant hits, 2,197 unique zebrafish genes were hit by a single BES while 1,024 unique zebrafish genes were hit by two or more catfish BES, making a total of 3,221 unique zebrafish genes with significant hits from the catfish BES. The 3,221 genes cover all 25 zebrafish chromosomes, with the largest number of gene hits being located on chromosome 5 (224 significant hits), followed by chromosome 7 (191 significant hits), chromosome 20 (171 significant hits), chromosome 6 (151 significant hits) and chromosome 19 (134 significant hits); and the smallest number of gene hits on chromosome 24 with 78 hits (Table 2). The number of gene hits on various chromosomes was approximately proportional to the sizes of the zebrafish chromosomes, with some exceptions. When the size of chromosomes was taken into consideration, chromosome 25 had the largest number of gene hits with 3.5 hits per Mb or one hit per 286 kb on average, followed by chromosomes 5, 4, 20, 19, and 22 with 3.2, 3.1, 3.0, 2.9, and 2.9 hits per Mb, respectively (Table 2).
One particular finding of these BLAST searches is the observation of many highly repetitive genes. Out of 3,221 unique genes, 1,024 genes had hits from two or more BES. A single gene identity had hits from as many as 31 BES. A total of 14 genes had hits from at least 10 BES each (Table 3); an additional 139 genes had hits from 4-9 BES each; 230 genes had hits from 3 BES each, and 641 genes had hits from 2 BES each (Table 3). Some of the genes with hits from multiple BES may represent a whole array of related genes with similar functional domains. For instance, 18 BES hit the NOD3-like gene of channel catfish, which was just recently characterized; the NOD3 gene exists as a single copy gene in the catfish genome, and apparently the multiple BES contained many related genes harboring domains present within the NOD3 gene. Theoretically, a fraction of genes should have hits by more than one BES, simply because of the genome coverage of the BAC clones. We believe that overlapping (including identical) BAC clones do account for some of the observed hits of genes by more than one BES (data not shown), especially for those with 2-3 BES hits. However, the mathematical chances do not support multiple BES hits of a single gene unless the gene itself is repetitive in the catfish genome. Additional research is warranted to fully understand the nature of these genes/sequences in the catfish genome, but clearly many of these represent classes of repetitive gene families, such as the DNA polymerase gene that had hits from 31 BES.
Among the teleost genomes with high sequence coverage, zebrafish is the most closely related species to catfish. Our initial BLAST searches of the catfish BES against the genome of T. nigroviridis generated many fewer significant hits than those against the zebrafish genome. Therefore, we concentrated our comparative analysis efforts on the zebrafish genome in this study.
Conserved syntenies are most often established by comparing genome sequences of related species. However, the whole genome sequence is not yet available from catfish. In the absence of the whole genome sequence, we attempted to establish microsyntenies based on physical linkage of gene sequences. With the genome resources available in catfish, we have taken three approaches. First, if the genes were identified from both ends of a single BAC clone, they are physically linked with a distance of the BAC clone insert size. If the same two genes are found linked in the zebrafish genome in the same genome neighborhood, a microsynteny can then be established. These genes from mate-paired BES are physically linked with the average distances between them being the average insert size of the catfish BAC library, i.e., 161 kb. From the 63,387 BES, a total of 25,676 mate-paired BES were identified. Of these, 760 mate-paired BES had significant BLASTN hits against the zebrafish genome sequence. However, only 194 of the 760 significant hit pairs were on the same zebrafish chromosome, allowing syntenic comparison. Further tBLASTX searches against the ENSEMBL zebrafish cDNA database allowed identification of 95 mate-paired BES with genes on both sides. The genomic locations of these 95 mate-paired genes were determined from the zebrafish genome sequence. Fifty pairs were found to be present in neighboring genomic locations within one million base pairs, while the other 45 were present in more distant locations (> 1 Mb) on the same chromosomes. The vast majority of the 50 mate-paired genes were found to be within 500 kb on the zebrafish genome sequence; only 2 of the 50 pairs had a distance of 500-920 kb (Table 4), suggesting conserved syntenies of the involved genes.
We previously reported the relatively high levels of local region conservation. For instance, many genes within the bordering mate-paired genes were well conserved among catfish, zebrafish, and Tetraodon, as determined by direct sequencing of the catfish BAC DNA using primers predicted from known genes in zebrafish or Tetraodon. We did not extend this part of the study, but all known genomic information suggested high levels of local genome conservation.
In addition to the 50 microsyntenies, we attempted to determine if significant gene hits in the same catfish BAC contigs also fall on the same chromosome locations comparable to the contig sizes. As shown in Table 2, of the contigs with gene hits, 1,754 contigs had only one gene hit, while 472 contigs had two or more gene hits within each contig. Because the genes in the same contig are physically linked, their linkage in a comparable distance in the zebrafish genome would indicate a conserved synteny. As shown in Figures 1, 2, 3, 4 and 5, the vast majority of gene hits within the same contigs were found to be located on the same zebrafish chromosomes with comparable distances as estimated from the catfish BAC contigs. Using such an approach, a total of 336 conserved microsyntenies were identified (Table 2). Presence of multiple gene hits within large BAC contigs would allow identification of extended large conserved syntenic regions. Many of the microsyntenies were conserved with extended genomic distance to span over several million base pairs (Figures 1, 2, 3, 4 and 5; for additional details, see Additional file 1). For instance, large conserved syntenies were identified from chromosomes 12, 13, 14, 22, 23, 24, and 25 (Figures 1, 2, 3, 4 and 5). In spite of the identification of some relatively large conserved syntenic regions, the vast majority of the identified syntenies were microsyntenies. Such highly segmented microsyntenies are not very useful for genome-wide comparative analysis. However, if scaffolds can be established by determining the relationships among the microsyntenies, large-scale genome comparison should be possible. We, therefore, used two zebrafish chromosomes as the query to demonstrate whether super scaffolds can be established.
Chromosome 7, one of the chromosomes with the highest number of significant gene hits, and chromosome 13, one of the chromosomes with a large number of contigs having two or more hits (indicative of high level of syntenic conservation), were chosen for further analysis using genetic linkage mapping.
In order to extend the scope of conserved microsyntenies, microsyntenies identified on zebrafish chromosomes 7 and 13 were genetically mapped to determine their chromosomal locations in the catfish genome. There were 373 significant BLASTN hits to zebrafish chromosome 13 involving 178 unique catfish BAC contigs; and 505 significant hits to zebrafish chromosome 7 involving 314 unique catfish BAC contigs. We, therefore, first identified microsatellites from these involved catfish BAC contigs, and then mapped them to the linkage groups when the microsatellites were polymorphic in the resource family. A total of 548 pairs of microsatellite primers were tested, of which 296 from 188 contigs (the details of the polymorphic markers are shown in the Additional file 2) were polymorphic in the resource family. Further analysis using JoinMap 4.0 allowed mapping of 290 microsatellite markers, of which 161 microsatellites were from BES with significant similarity to zebrafish chromosome 7, and 129 microsatellites were from BES with significant similarity to zebrafish chromosome 13.
Mapping of microsatellites from contigs with hits to zebrafish chromosome 13 allowed identification of a highly conserved chromosome between catfish and zebrafish. As shown in Figure 6, of the 129 microsatellites from BES with high similarities to the zebrafish chromosome 13, 57 microsatellites from 43 contigs were mapped into a single linkage group, spanning approximately 90 centi-Morgans, suggesting the conservation of a large segment of this chromosome. However, the entire chromosome is not conserved. The 129 microsatellites were mapped to a total of 24 linkage groups, with seven of the 24 linkage groups containing 4-12 markers (see Additional file 2).
Similarly but to a much lesser extent, microsatellites from BES with similarities to the zebrafish chromosome 7 were mapped to three major linkage groups (Figure 7). Once again, many smaller syntenic regions were mapped to various linkage groups, suggesting high levels of local conservation and low levels of chromosomal conservation. Nonetheless, the significant aspect of this is that scaffolds can be established by linking various contigs together through linkage mapping. This will allow integration of genetic linkage and physical maps once microsatellites are identified from most contigs of the physical map. Such scaffolds should guide genome sequence assembly in the future, and should also provide molecular length measurements of various polymorphic markers along the genome of catfish, providing guidance for the development of the SNP chip technology in catfish. Apparently, SNP chips constructed from evenly distributed SNPs provide the best coverage of the catfish genome when conducting whole genome association studies.
Genetic linkage mapping of BAC end-anchored microsatellites provided a level of validation of the physical map. Discrepancies were found between the BAC assemblage and the linkage map. Of the 75 contigs with at least 2 markers, 54 contigs were mapped properly into the same linkage groups. However, 18 contigs were mapped into different linkage groups (see Additional file 2). Of these 18 contigs, 12 are large contigs with at least 40 BACs. Such discrepancies are most likely indicative of mistakes in the BAC assemblage. Mapping of additional BAC end-anchored microsatellites is under way to integrate the genetic linkage and physical maps, and to correct any additional mistakes in the assembly of the physical map.
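The validation logic described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in the study: the record layout, the function name `check_contigs`, and the example data are all hypothetical. A contig with at least two mapped markers is treated as consistent when all of its markers fall in one linkage group, and as conflicting otherwise.

```python
from collections import defaultdict


def check_contigs(marker_records, min_markers=2):
    """Split contigs into consistent and conflicting sets.

    marker_records: iterable of (contig_id, marker_id, linkage_group).
    """
    groups_by_contig = defaultdict(list)
    for contig_id, _marker_id, linkage_group in marker_records:
        groups_by_contig[contig_id].append(linkage_group)

    consistent, conflicting = [], []
    for contig_id, groups in sorted(groups_by_contig.items()):
        if len(groups) < min_markers:
            continue  # a single marker cannot reveal a conflict
        if len(set(groups)) == 1:
            consistent.append(contig_id)
        else:
            conflicting.append(contig_id)
    return consistent, conflicting


# Made-up example records, not data from the study.
records = [
    ("ctg1", "m1", "LG3"), ("ctg1", "m2", "LG3"),
    ("ctg2", "m3", "LG5"), ("ctg2", "m4", "LG9"),
    ("ctg3", "m5", "LG1"),
]
consistent, conflicting = check_contigs(records)
print(consistent, conflicting)  # ['ctg1'] ['ctg2']
```

Conflicting contigs flagged this way are candidates for misassembly, although a genotyping or marker-assignment error could produce the same signal.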
Some highly conserved chromosomes or chromosomal regions exist between catfish and zebrafish. High levels of local conservation were found, but a high level of chromosomal shuffling and rearrangements exists between catfish and zebrafish genomes. Comparative genome analysis using zebrafish genome sequence is highly useful for regional comparisons, but not so useful at the chromosomal levels. The significance of comparative genome analysis in catfish is that it will allow more cost-effective structural genomic analysis, but more importantly, orthologues established through comparative genome analysis should facilitate functional assignment of genes. Given that functional genomics is more difficult with non-model fish species, inference from orthologues should be one of the most efficient and reliable approaches for functional analysis of the catfish genome.
Overall, the evolutionary syntenic conservation appeared to be relatively low between the catfish and zebrafish genomes. This indicates that many chromosome breakages and rearrangements occurred among the fish genomes during evolution. These findings are consistent with our previous findings that high levels of conservation were found within small genomic regions, whereas high levels of large-scale genome reshuffling were evident when comparing the genomes of catfish and zebrafish [26,40]. These conclusions, however, are based on the assumption that the zebrafish genome assembly is correct. Some of the syntenic breaks may instead reflect mistakes in the still imperfect assembly of the zebrafish genome. We also acknowledge that comparative genome analysis using a partial bank of sequences in catfish and a more complete databank in zebrafish could potentially lead to a bias. Caution should be exercised when establishing concrete syntenic relations. Such limitations themselves justify the need for whole genome sequencing in catfish.
The CHORI-212 Channel Catfish BAC library was used for BAC-end sequencing. BAC culture and sequencing reactions were conducted using standard protocols, and as previously described [25,40]. Briefly, BAC clones were transferred from 384-well plates to 96-well culture blocks containing 1.5 ml of 2× YT medium with 12.5 μg/ml chloramphenicol and grown at 37°C overnight with shaking at 300 rpm. The blocks were centrifuged at 2000 × g for 10 min in an Eppendorf 5804R bench top centrifuge to collect bacteria. The culture supernatant was decanted and the blocks were inverted and tapped gently on paper towels to remove remaining liquid. BAC DNA was isolated using the Perfectprep™ BAC 96 kit (Eppendorf, Westbury, NY) according to the manufacturer's specifications. BAC DNA was collected in 96-well plates and stored at -20°C until use.
Sequencing of channel catfish BAC ends was conducted using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA), with modifications. Each sequencing reaction mix contained 2 μl of 5× sequencing buffer, 2 μl of primer (3 pmol/μl), 1.5 μl BigDye v3.1 dye terminator, and 4.5 μl of BAC DNA. BAC clones were sequenced from both ends using the primers T7 (5'-TAATACGACTCACTATAGGG-3') and SP6 (5'-ATTTAGGTGACACTATAG-3'). Cycle sequencing was carried out in 96-well plate format using PTC-200 thermal cyclers (MJ Research/Bio-Rad, Hercules, CA) under the following thermal profile: an initial denaturing at 95°C for 5 min, followed by 100 cycles of 95°C for 30 s, 53°C for 10 s, and 60°C for 4 min. Products were purified using ethanol/EDTA precipitation according to the BigDye protocol (Applied Biosystems), with the following modifications. After thermal cycling, 1 μl of 125 mM EDTA and 30 μl chilled (-80°C) 100% ethanol were added to each reaction. Plates were gently mixed and incubated at room temperature for 15 min. Plates were then centrifuged at 2,250 × g at 4°C for 30 min, followed by washing in 30 μl of 70% ethanol at 2,000 × g for 15 min. Ethanol was decanted and 8 μl Hi-Di™ formamide (Applied Biosystems) was added to each well to re-suspend DNA. Products were denatured at 95°C for 5 min and sequenced on a 3130xl genetic analyzer (Applied Biosystems).
Base calling of the raw BES was conducted using Phred [48,49] with Q20 as a cut-off. The Lucy program was used to remove vector sequences and sequences shorter than 200 bp. Repeats were masked using REPEATMASKER before BLAST analysis. In order to anchor the catfish BES to the zebrafish genome, TBLASTX searches of the repeat-masked BES were conducted against the ENSEMBL zebrafish cDNA database (Assembly 7).
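Once the searches are done, the significant hits can be tallied per zebrafish gene, which is how counts such as "genes hit by a single BES" versus "genes hit by two or more BES" arise. The Python sketch below illustrates that tally under stated assumptions: the input is taken to be one (BES, gene, E-value) record per hit, the e-5 cutoff is the one quoted in the text, and all names and example records are hypothetical rather than taken from the original pipeline.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # significance cutoff quoted in the text (e-5)


def count_gene_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Map each gene to the number of distinct BES hitting it significantly.

    hits: iterable of (bes_id, gene_id, e_value) records (assumed layout).
    """
    bes_by_gene = defaultdict(set)
    for bes_id, gene_id, e_value in hits:
        if e_value <= cutoff:
            bes_by_gene[gene_id].add(bes_id)
    return {gene: len(bes) for gene, bes in bes_by_gene.items()}


def single_vs_multiple(counts):
    """Return (genes hit by exactly one BES, genes hit by two or more BES)."""
    single = sum(1 for n in counts.values() if n == 1)
    multiple = sum(1 for n in counts.values() if n >= 2)
    return single, multiple


# Made-up example records, not data from the study.
hits = [
    ("BES001", "geneA", 1e-30),
    ("BES002", "geneA", 1e-12),
    ("BES003", "geneB", 1e-6),
    ("BES004", "geneC", 0.01),  # above the cutoff, so it is dropped
]
counts = count_gene_hits(hits)
print(counts)                      # {'geneA': 2, 'geneB': 1}
print(single_vs_multiple(counts))  # (1, 1)
```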
In the absence of the whole genome sequence, we attempted to establish microsyntenies based on physical linkage of gene sequences. First, if the genes were identified from both sides of a single BAC clone (mate-paired BES), then they are physically linked with a distance of the BAC clone insert size. If the same two genes were found to be linked on the zebrafish genome in the same genome neighborhood, a microsynteny was established.
Initially, BES were analyzed by BLASTN (E-value ≤ e-5) for the identification of mate-pairs with significant hits on both sides of the BAC insert. Mate-paired BES were analyzed by tBLASTX (E-value ≤ e-5) for the identification of genes on both sides of the BAC insert. After identification, the two mate-paired genes in each BAC were used as queries to search for their chromosomal locations on the zebrafish genome. Conserved microsyntenies were declared when the mate-paired genes existed within a distance of 1.0 Mb within the zebrafish genome.
Syntenies were also established using genes within contiguous sequences (contigs) based on the catfish physical map. Genes identified from BES were located along the catfish physical map. When genes identified within the same contig were located on the same zebrafish chromosome with comparable distances as estimated from the catfish BAC contig, an extended synteny was established.
In order to assess the scope of microsyntenies, two zebrafish chromosomes, chromosomes 7 and 13, were chosen for analysis. Chromosome 7 had one of the largest numbers of significant hits and chromosome 13 had a large number of contigs having two or more hits (suggestive of a high level of syntenic conservation). Syntenies were established using microsatellite-based linkage mapping. A total of 548 microsatellite loci in the contigs which had significant BLASTN hits to zebrafish chromosomes 7 and 13 were tested using a hybrid catfish resource family, F1-2 (female blue-channel catfish hybrid) × Ch-6 (male channel catfish) with 64 progeny.
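The 1.0 Mb rule for mate-paired genes can be expressed as a small classifier. The Python sketch below is an illustration only: the `GeneHit` type, the function name, and the category labels are hypothetical, and the distance is assumed to be the gap between the two gene intervals on the zebrafish chromosome, since the text does not specify how distance was measured.

```python
from typing import NamedTuple


class GeneHit(NamedTuple):
    """Zebrafish location of the gene hit by one end of a BAC (assumed layout)."""
    chrom: str
    start: int  # bp
    end: int    # bp


def classify_mate_pair(left, right, max_distance_bp=1_000_000):
    """Classify a mate pair by the zebrafish locations of its two genes."""
    if left.chrom != right.chrom:
        return "different_chromosomes"
    # Gap between the two intervals; overlapping intervals count as 0 bp.
    gap = max(left.start, right.start) - min(left.end, right.end)
    gap = max(gap, 0)
    if gap <= max_distance_bp:
        return "microsynteny"
    return "same_chromosome_distant"


# Made-up coordinates, not data from the study.
a = GeneHit("7", 1_000_000, 1_020_000)
b = GeneHit("7", 1_150_000, 1_180_000)
c = GeneHit("7", 5_000_000, 5_010_000)
d = GeneHit("13", 1_000_000, 1_020_000)
print(classify_mate_pair(a, b))  # microsynteny
print(classify_mate_pair(a, c))  # same_chromosome_distant
print(classify_mate_pair(a, d))  # different_chromosomes
```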
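A contig-based synteny test along these lines might look like the following Python sketch. The text does not define "comparable distances" numerically, so the `max_span_ratio` parameter (the zebrafish span may be at most this multiple of the estimated contig length) is purely an assumption introduced for illustration, as are the function name, the record layout, and the example data.

```python
from collections import defaultdict


def contig_syntenies(gene_hits, contig_length_bp, max_span_ratio=2.0):
    """Return contigs whose gene hits cluster on one zebrafish chromosome.

    gene_hits: iterable of (contig_id, zebrafish_chrom, zebrafish_pos_bp).
    contig_length_bp: dict of contig_id -> estimated contig length in bp.
    max_span_ratio: ASSUMED tolerance standing in for "comparable distance".
    """
    hits_by_contig = defaultdict(list)
    for contig_id, chrom, pos in gene_hits:
        hits_by_contig[contig_id].append((chrom, pos))

    syntenic = []
    for contig_id, hits in sorted(hits_by_contig.items()):
        if len(hits) < 2:
            continue  # one gene hit cannot define a synteny
        if len({chrom for chrom, _ in hits}) != 1:
            continue  # hits fall on different zebrafish chromosomes
        positions = [pos for _, pos in hits]
        span = max(positions) - min(positions)
        if span <= max_span_ratio * contig_length_bp[contig_id]:
            syntenic.append(contig_id)
    return syntenic


# Made-up example data, not data from the study.
gene_hits = [
    ("ctgA", "13", 10_000_000), ("ctgA", "13", 10_300_000),
    ("ctgB", "13", 2_000_000), ("ctgB", "7", 2_100_000),
    ("ctgC", "7", 1_000_000), ("ctgC", "7", 9_000_000),
    ("ctgD", "5", 4_000_000),
]
lengths = {"ctgA": 400_000, "ctgB": 300_000, "ctgC": 500_000, "ctgD": 200_000}
print(contig_syntenies(gene_hits, lengths))  # ['ctgA']
```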
Microsatellites were identified and analyzed using Msatfinder and Vector NTI 10.0 (Invitrogen, Carlsbad, CA) as we previously described. Polymerase chain reaction (PCR) primers were designed using Msatfinder. Mononucleotide repeats were manually excluded. PCR amplification was conducted as previously described. Briefly, each microsatellite PCR reaction contained 1× PCR buffer, 2 mM MgCl2, 0.2 mM of each dNTP, 4 ng upper primer, 6 ng lower primer, 1 pmol labeled primer, 0.25 U of JumpStart Taq polymerase (Sigma, St. Louis, MO), and 20 ng genomic DNA. PCR amplification was carried out using a touchdown program with the following thermal profile: an initial denaturation at 94°C for 3.5 min, followed by 94°C for 30 s, 57°C for 30 s, and 72°C for 30 s for 20 cycles as the first step, and at 94°C for 30 s, 53°C for 30 s, and 72°C for 30 s for 15 cycles as the second step. A final extension was performed at 72°C for 10 minutes. The PCR products were analyzed on 7% sequencing gels using the 4300 DNA Analyzer (LI-COR® Biosciences, Lincoln, NE). After gel electrophoresis, loci were manually genotyped to determine allele segregation patterns and polymorphisms in the resource family.
The catfish linkage map was constructed using JoinMap version 4.0 software as we previously described, using the cross-pollinating (CP) coding scheme, which handles data containing various genotype configurations with unknown linkage phases. Linkage between markers was examined by estimating LOD scores for recombination rate, and map distances were calculated using the Kosambi mapping function. Significance of marker linkage was determined at a final LOD threshold of 3.0.
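For reference, the Kosambi mapping function converts a recombination fraction r into a map distance of d = 25 ln((1 + 2r)/(1 − 2r)) centiMorgans. The short Python sketch below implements that standard formula; it is independent of JoinMap, and the function name is illustrative.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centiMorgans for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


print(f"{kosambi_cm(0.0):.2f}")  # 0.00
print(f"{kosambi_cm(0.1):.2f}")  # 10.14
```

At small recombination fractions the distance is close to 100·r cM, and it grows without bound as r approaches 0.5, which is why unlinked markers cannot be assigned a finite distance.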
The BES generated from this study have been deposited in GenBank under accession numbers [GenBank:FI857756] to [GenBank:FI900776]. The existing BES, [GenBank:DX083364] to [GenBank:DX103729], were also used for comparative genome analysis in this study.
HL and YJ contributed equally and their contribution accounts for the major part of this study. SW and JA participated in data analysis and manuscript preparation. PN and BS assisted in developing microsatellite markers. PX assisted in culturing the BAC clones and extracting DNA. HK contributed a major part of the linkage mapping analysis. ZL supervised the entire study and prepared the manuscript. All authors read and approved the final manuscript.
tBLASTX search results. A portion of the tBLASTX search results used for the identification of microsyntenies. Query ID are sequence identification of catfish BAC end sequences used as queries; Subject ID is the identification of the sequence with significant tBLASTX hits at the E-values (E-values are coded in this Table. For instance, 0.29 means e-29, 0.06 means e-6). Chromosome number is provided in the column Chr, with starting position (Chr SS) and ending position (Chr SE) provided. The potential gene identities are detailed under Description.
Primers used to map the BAC end-associated microsatellites. Ctg_ID is the contig number on the catfish physical map; Contig size refers to the number of BAC clones within the contig; BAC_ID is the BAC identification number; upper and lower primer sequences, and the linkage groups to which the microsatellites were mapped, are given in columns E, F, and G, respectively.
This project was supported by grants from USDA NRI Animal Genome Tools and Resources Program (USDA/NRICGP award# 2009-35205-05101, and partially award #2006-35616-16685).