Haemophilus parasuis (H. parasuis) is the etiological agent of Glässer's disease in pigs. Currently, the molecular basis of this infection is largely unknown. The innate immune response is the first line of defense against infectious disease. Systematic analysis of the host innate immune response to infection is important for understanding the pathogenesis of infectious microorganisms.
A total of 428 differentially expressed (DE) genes were identified in the porcine alveolar macrophages (PAMs) 6 days after H. parasuis infection. These genes were principally related to inflammatory response, immune response, microtubule polymerization, regulation of transcription and signal transduction. In the pathway analysis, the significant pathways mainly concerned cell adhesion molecules, cytokine-cytokine receptor interaction, complement and coagulation cascades, the toll-like receptor signaling pathway and the MAPK signaling pathway, suggesting that the host used different strategies to activate immune and inflammatory responses upon H. parasuis infection. The global interaction network and two subnetworks of the proteins encoded by DE genes were analyzed using STRING. Further immunostimulation analysis indicated that mRNA levels of S100 calcium-binding protein A4 (S100A4) and S100 calcium-binding protein A6 (S100A6) in porcine PK-15 cells increased within 48 h and were sustained after administration of lipopolysaccharide (LPS) and Poly (I:C), respectively. The s100a4 and s100a6 genes were found to be significantly up-regulated in the lungs, spleen and lymph nodes of H. parasuis-infected pigs. We cloned and sequenced the porcine coronin1a gene for the first time. Phylogenetic analysis showed that poCORONIN 1A belonged to the group containing the Bos taurus sequence. Structural analysis indicated that poCORONIN 1A contained putative domains of Trp-Asp (WD) repeats signature, Trp-Asp (WD) repeats profile and Trp-Asp (WD) repeats circular profile at the N-terminus.
Our present study is the first to focus on the response of porcine alveolar macrophages to H. parasuis. Our data demonstrate that a series of genes are activated upon H. parasuis infection. The observed gene expression profile could help in screening potential host agents for reducing the prevalence of H. parasuis and in further understanding the molecular pathogenesis associated with H. parasuis infection in pigs.
The microsporidian Nosema bombycis has received much attention because the pébrine disease it causes in domesticated silkworms results in great economic losses in the silkworm industry. So far, no effective treatment has been found for pébrine. Compared to other known Nosema parasites, N. bombycis is unusual in parasitizing a broad range of hosts. To gain insight into the genetic mechanisms underlying pathogenic ability and host range expansion in this parasite, we conducted a comparative genomic analysis. The genomes of two Nosema parasites, N. bombycis and N. antheraeae (an obligate parasite of the undomesticated silkworm Antheraea pernyi), were sequenced and compared with that of a distantly related species, N. ceranae (an obligate parasite of honey bees).
Our comparative genomic analysis shows that the N. bombycis genome has greatly expanded through the following three molecular mechanisms: 1) the proliferation of host-derived transposable elements, 2) the acquisition of many horizontally transferred genes from bacteria, and 3) the production of abundant gene duplications. To our knowledge, duplicated genes derived not only from small-scale events (e.g., tandem duplications) but also from large-scale events (e.g., segmental duplications) have never been seen in such abundance in any reported microsporidian genome. Our relative dating analysis further indicated that these duplication events arose recently, over a very short evolutionary time. Furthermore, several duplicated genes involved in the cytotoxic metabolic pathway were found to have undergone positive selection, suggesting a role for duplicated genes in the adaptive evolution of pathogenic ability.
Genome expansion is rarely considered as an evolutionary outcome for the highly reduced and compact genomes of parasitic microsporidia. This study demonstrates, for the first time, that parasitic genomes can expand, rather than shrink, through several common molecular mechanisms such as gene duplication, horizontal gene transfer, and transposable element expansion. We also show that duplicated genes can serve as raw material for evolutionary innovations, possibly contributing to increased pathogenic ability. Based on our research, we propose that the duplicated genes of N. bombycis should be treated as primary targets in designing treatments against pébrine.
Gene duplication; Horizontal gene transfer; Host-derived transposable element; Host adaptation; Microsporidian; Silkworms
Comparative mapping is a powerful tool for studying genome evolution. It allows the transfer of genome information from well-studied model species to non-model species. Catfish is an economically important aquaculture species in the United States. A large number of genome resources have been developed for catfish, including genetic linkage maps, physical maps, BAC end sequences (BES), integrated linkage and physical maps using BES-derived markers, physical map contig-specific sequences, and draft genome sequences. Application of such genome resources should allow comparative analysis at the genome scale with several other model fish species.
In this study, we conducted whole genome comparative analysis between channel catfish and four model fish species with fully sequenced genomes, zebrafish, medaka, stickleback and Tetraodon. A total of 517 Mb draft genome sequences of catfish were anchored to its genetic linkage map, which accounted for 62% of the total draft genome sequences. Based on the location of homologous genes, homologous chromosomes were determined among catfish and the four model fish species. A large number of conserved syntenic blocks were identified. Analysis of the syntenic relationships between catfish and the four model fishes supported that the catfish genome is most similar to the genome of zebrafish.
The organization of the catfish genome is similar to that of the four teleost species, zebrafish, medaka, stickleback, and Tetraodon such that homologous chromosomes can be identified. Within each chromosome, extended syntenic blocks were evident, but the conserved syntenies at the chromosome level involve extensive inter-chromosomal and intra-chromosomal rearrangements. This whole genome comparative map should facilitate the whole genome assembly and annotation in catfish, and will be useful for genomic studies of various other fish species.
Catfish; Genome; Comparative mapping; Linkage mapping; Conserved synteny
While multiple replication origins have been observed in archaea, considerably less is known about their evolutionary processes. Here, we performed a comparative analysis of the predicted (proved in part) orc/cdc6-associated replication origins in 15 completely sequenced haloarchaeal genomes to investigate the diversity and evolution of replication origins in halophilic Archaea.
Multiple orc/cdc6-associated replication origins were predicted in all of the analyzed haloarchaeal genomes following the identification of putative ORBs (origin recognition boxes) that are associated with orc/cdc6 genes. Five of these predicted replication origins in Haloarcula hispanica were experimentally confirmed via autonomous replication activities. Strikingly, several predicted replication origins in H. hispanica and Haloarcula marismortui are located in distinct regions of their highly homologous chromosomes, suggesting that these replication origins might have been introduced as parts of new genomic content. A comparison of the origin-associated Orc/Cdc6 homologs and the corresponding predicted ORB elements revealed that the replication origins in a given haloarchaeon are quite diverse, while different haloarchaea can share a few conserved origins. Phylogenetic and genomic context analyses suggested that there is an original replication origin (oriC1) that was inherited from the ancestor of archaea, and that several other origins likely evolved and/or were translocated within the haloarchaeal species.
This study provides detailed information about the diversity of multiple orc/cdc6-associated replication origins in haloarchaeal genomes, and provides novel insight into the evolution of multiple replication origins in Archaea.
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies. In order to provide the best genome coverage for the analysis of performance and production traits, a large number of relatively evenly distributed SNPs are needed. Gene-associated SNPs may fulfill these requirements of large numbers and genome wide distribution. In addition, gene-associated SNPs could themselves be causative SNPs for traits. The objective of this project was to identify large numbers of gene-associated SNPs using high-throughput next generation sequencing.
Transcriptome sequencing was conducted for channel catfish and blue catfish using Illumina next generation sequencing technology. Approximately 220 million reads (15.6 Gb) for channel catfish and 280 million reads (19.6 Gb) for blue catfish were obtained by sequencing gene transcripts derived from various tissues of multiple individuals from a diverse genetic background. A total of over 35 billion base pairs of expressed short read sequences were generated. Over two million putative SNPs were identified from channel catfish and almost 2.5 million putative SNPs were identified from blue catfish. Of these putative SNPs, a set of filtered SNPs were identified including 342,104 intra-specific SNPs for channel catfish, 366,269 intra-specific SNPs for blue catfish, and 420,727 inter-specific SNPs between channel catfish and blue catfish. These filtered SNPs are distributed within 16,562 unique genes in channel catfish and 17,423 unique genes in blue catfish.
For aquaculture species, transcriptome analysis of pooled RNA samples from multiple individuals using Illumina sequencing technology is both technically efficient and cost-effective for generating expressed sequences. Such an approach is most effective when coupled to existing EST resources generated using traditional sequencing approaches because the reference ESTs facilitate effective assembly of the expressed short reads. When multiple individuals with different genetic backgrounds are used, RNA-Seq is very effective for the identification of SNPs. The SNPs identified in this report will provide a much needed resource for genetic studies in catfish and will contribute to the development of a high-density SNP array. Validation and testing of these SNPs using SNP arrays will form the material basis for genome association studies and whole genome-based selection in catfish.
Salmonella paratyphi C, like S. typhi, is adapted to humans and causes typhoid fever. Previously we reported different genome structures between two strains of S. paratyphi C, which suggests that S. paratyphi C might have a plastic genome (large DNA segments being organized in different orders or orientations on the genome). As many but not all host-adapted Salmonella pathogens have large genomic insertions as well as the supposedly resultant genomic rearrangements, bacterial genome plasticity presents an extraordinary evolutionary phenomenon. Events contributing to genomic plasticity, especially large insertions, may be associated with the formation of particular Salmonella pathogens.
We constructed a high-resolution genome map of S. paratyphi C strain RKS4594 and located four insertions totaling 176 kb (including the 90 kb SPI7) and seven deletions totaling 165 kb relative to S. typhimurium LT2. Two rearrangements were revealed, including an inversion of 1602 kb covering the ter region and the translocation of the 43 kb I-CeuI F fragment. The 23 wild type strains analyzed in this study exhibited diverse genome structures, mostly as a result of recombination between rrn genes. In at least two cases, the rearrangements involved recombination between genomic sites other than the rrn genes, possibly homologous genes in prophages. Two strains had a 20 kb deletion between rrlA and rrlB, a highly conserved region in which no deletion has been reported in any other Salmonella lineage.
S. paratyphi C has diverse genome structures among different isolates, possibly as a result of large genomic insertions, e.g., SPI7. Although the Salmonella typhoid agents may not be more closely related to one another than each of them is to other Salmonella lineages, they may have evolved in similar ways, i.e., by acquiring typhoid-associated genes followed by genome structure rearrangements. Comparison of multiple Salmonella typhoid agents at both the single sequenced genome and population levels will facilitate studies on the evolutionary process of typhoid pathogenesis, especially the identification of typhoid-associated genes.
Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, and Brussels sprouts. This diversity, along with its phylogenetic membership in a group of three diploid and three tetraploid species and the recent availability of genome sequences within Brassica, provides an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives.
We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting.
Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase.
Brassica oleracea; Database; Genome sequence; Synteny; Comparative genomics
MicroRNAs (miRNAs) and other types of small regulatory RNAs play critical roles in the regulation of gene expression at the post-transcriptional level in plants. Cotton is one of the most economically important crops, but little is known about the roles of miRNAs during cotton fiber elongation.
Here, we combined high-throughput sequencing with computational analysis to identify small RNAs (sRNAs) related to cotton fiber elongation in Gossypium hirsutum L. (G. hirsutum). The sequence analysis confirmed the expression of 79 known miRNA families in elongating fiber cells and identified 257 novel miRNAs, primarily derived from corresponding specific loci in the Gossypium raimondii Ulbr. (G. raimondii) genome. Furthermore, a comparison of the miRNAomes revealed that 46 miRNA families were differentially expressed throughout the elongation period. Importantly, the predicted and experimentally validated targets of eight miRNAs were associated with fiber elongation, with obvious functional relationships with calcium and auxin signal transduction, fatty acid metabolism, anthocyanin synthesis and the xylem tissue differentiation. Moreover, one tasiRNA was also identified, and its target, ARF4, was experimentally validated in vivo.
This study not only facilitated the discovery of 257 novel low-abundance miRNAs in elongating cotton fiber cells but also revealed a potential regulatory network of nine sRNAs important for fiber elongation. The identification and characterization of miRNAs in elongating cotton fiber cells might promote the further study of fiber miRNA regulation mechanisms and provide insight into the importance of miRNAs in cotton.
Cotton; Comparative miRNAome analysis; Fiber cell elongation; High-throughput sequencing; miRNAs; tasiRNA
Cytokinins (CKs) have significant roles in various aspects of plant growth and development, and they are also involved in plant stress adaptations. The fine-tuning of the controlled CK levels in individual tissues, cells, and organelles is properly maintained by isopentenyl transferases (IPTs) and cytokinin oxidase/dehydrogenases (CKXs). Chinese cabbage is one of the most economically important vegetable crops worldwide. The whole genome sequencing of Brassica rapa enables us to perform the genome-wide identification and functional analysis of the IPT and CKX gene families.
In this study, a total of 13 BrIPT genes and 12 BrCKX genes were identified. The gene structures, conserved domains and phylogenetic relationships were analyzed. The isoelectric point, subcellular localization and glycosylation sites of the proteins were predicted. Segmental duplicates were found in both BrIPT and BrCKX gene families. We also analyzed evolutionary patterns and divergence of the IPT and CKX genes in the Cruciferae family. The transcription levels of BrIPT and BrCKX genes were analyzed to obtain an initial picture of the functions of these genes. Abiotic stress elements related to adverse environmental stimuli were found in the promoter regions of BrIPT and BrCKX genes and they were confirmed to respond to drought and high salinity conditions. The effects of 6-BA and ABA on the expressions of BrIPT and BrCKX genes were also investigated.
The expansion of BrIPT and BrCKX genes after speciation from Arabidopsis thaliana is mainly attributed to segmental duplication events during the whole genome triplication (WGT), and a substantial number of duplicated genes were lost during the long evolutionary history. Genes produced by segmental duplication events have changed their expression patterns or may have adopted new functions, and thus have been retained. BrIPT and BrCKX genes respond well to drought and high salinity stresses, and their transcripts are affected by exogenous hormones, such as 6-BA and ABA, suggesting potential roles in abiotic stress responses and in the regulatory mechanisms of plant hormone homeostasis. The appropriate modulation of endogenous CK levels by IPT and CKX genes is a promising approach for developing economically important high-yielding and high-quality stress-tolerant crops in agriculture.
Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations were analyzed.
A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits. The selection regions were not distributed randomly or uniformly throughout the genome. Instead, clusters of selection hotspots in certain genomic regions were observed. Moreover, a set of candidate genes (4.38% of the total annotated genes) significantly affected by selection underlying soybean domestication and genetic improvement were identified.
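The comparison of genetic diversity between populations can be illustrated with a small sketch. The abstract does not name the diversity statistic used, so the example below assumes nucleotide diversity (the mean number of pairwise differences per site), and the sequences are invented toy data rather than soybean genotypes.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences.

    Assumption: this statistic stands in for "genetic diversity"; the
    study's actual measure is not stated in the abstract.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Count mismatching sites over every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = len(seqs) * (len(seqs) - 1) / 2
    return total_diffs / n_pairs / length


# Hypothetical toy alignments (not real data).
wild = ["ACGTACGT", "ACGTACGA", "TCGTACGT"]
cultivated = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]

pi_wild = nucleotide_diversity(wild)
pi_cultivated = nucleotide_diversity(cultivated)
reduction = 1 - pi_cultivated / pi_wild
print(f"pi(wild)={pi_wild:.4f} pi(cultivated)={pi_cultivated:.4f} "
      f"reduction={reduction:.0%}")
```

Applying the same function to wild, landrace, and elite groups in turn would show at which transition most diversity is lost, which is the kind of comparison the results describe.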
Given the uniqueness of the soybean germplasm sequenced, this study drew a clear picture of human-mediated evolution of the soybean genomes. The genomic resources and information provided by this study would also facilitate the discovery of genes/loci underlying agronomically important traits.
Artificial selection; Evolution; Genetic diversity; Population genomics; Soybean
Bacteria are currently classified into arbitrary species, but whether they actually exist as discrete natural species has been unclear. To reveal genomic features that may unambiguously group bacteria into discrete genetic clusters, we carried out systematic genomic comparisons among representative bacteria.
We found that bacteria of Salmonella formed tight phylogenetic clusters separated by various genetic distances: whereas over 90% of the approximately four thousand shared genes had completely identical sequences among strains of the same lineage, the percentages dropped sharply to below 50% across the lineages, demonstrating the existence of clear-cut genetic boundaries by a steep turning point in nucleotide sequence divergence. Recombination assays supported the genetic boundary hypothesis, suggesting that genetic barriers had been formed between bacteria of even very closely related lineages. We found similar situations in bacteria of Yersinia and Staphylococcus.
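The measure described above, the percentage of shared genes with completely identical sequences between two strains, can be sketched as follows. The gene names and sequences are hypothetical, and the function assumes that orthologous genes have already been matched by name; it is an illustration of the idea, not the authors' pipeline.

```python
def identical_gene_fraction(genes_a, genes_b):
    """Fraction of genes shared by two strains whose sequences are identical.

    genes_a, genes_b: dicts mapping a gene name to its nucleotide sequence.
    Assumption: the same key in both dicts denotes the same orthologous gene.
    """
    shared = genes_a.keys() & genes_b.keys()
    if not shared:
        raise ValueError("the two strains share no genes")
    identical = sum(genes_a[g] == genes_b[g] for g in shared)
    return identical / len(shared)


# Hypothetical toy genomes: strains 1 and 2 stand for the same lineage,
# strain 3 for a different lineage.
strain1 = {"geneA": "ATGAAA", "geneB": "ATGCCC", "geneC": "ATGGGG", "geneD": "ATGTTT"}
strain2 = {"geneA": "ATGAAA", "geneB": "ATGCCC", "geneC": "ATGGGG", "geneD": "ATGTTA"}
strain3 = {"geneA": "ATGAAG", "geneB": "ATGCCC", "geneC": "ATGGGA", "geneD": "ATGTTA"}

within = identical_gene_fraction(strain1, strain2)
across = identical_gene_fraction(strain1, strain3)
print(f"within lineage: {within:.0%}, across lineages: {across:.0%}")
```

A sharp drop between the within-lineage and across-lineage values is the pattern the results interpret as a genetic boundary.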
Bacteria are genetically isolated into discrete clusters equivalent to natural species.
Natural species; Salmonella; Genetic boundary
Comparative genomics is a powerful tool to transfer genomic information from model species to related non-model species. Channel catfish (Ictalurus punctatus) is the primary aquaculture species in the United States. Its existing genome resources such as genomic sequences generated from next generation sequencing, BAC end sequences (BES), physical maps, linkage maps, and integrated linkage and physical maps using BES-associated markers provide a platform for comparative genomic analysis between catfish and other model teleost fish species. This study aimed to gain understanding of genome organizations and similarities among catfish and several sequenced teleost genomes using linkage group 8 (LG8) as a pilot study.
With existing genome resources, 287 unique genes were identified in LG8. Comparative genome analysis indicated that most of these 287 genes on catfish LG8 are located on two homologous chromosomes each of zebrafish, medaka, and stickleback, and on three chromosomes of green-spotted pufferfish. Large numbers of conserved syntenies were identified. Detailed analysis of the conserved syntenies in relation to chromosome-level similarities revealed extensive inter-chromosomal and intra-chromosomal rearrangements during evolution. Of the 287 genes, 35 genes were found to be duplicated in the catfish genome, with the vast majority of the duplications being inter-chromosomal.
Comparative genome analysis is a powerful tool even in the absence of a well-assembled whole genome sequence. In spite of sequence stacking due to low resolution of the linkage and physical maps, conserved syntenies can be identified although the exact gene order and orientation are unknown at present. Through chromosome-level comparative analysis, homologous chromosomes among teleosts can be identified. Syntenic analysis should facilitate annotation of the catfish genome, which in turn, should facilitate functional inference of genes based on their orthology.
Comparative mapping; Synteny; Genome; Chromosome; Linkage map; Physical map; Catfish; Fish
Aquaculture is the fastest-growing sector in agriculture. However, QTL for important traits have been identified in only a few aquaculture species. We conducted QTL mapping for growth traits in an Asian seabass F2 family with 359 individuals using 123 microsatellites and 22 SNPs, and performed association mapping in four populations with 881 individuals.
Twelve and nine significant QTL, as well as 14 and 10 suggestive QTL were detected for growth traits at six and nine months post hatch, respectively. These QTL explained 0.9-12.0% of the phenotypic variance. For body weight, two QTL intervals at two stages were overlapped while the others were mapped onto different positions. The IFABP-a gene located in a significant QTL interval for growth on LG5 was cloned and characterized. A SNP in exon 3 of the gene was significantly associated with growth traits in different populations.
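To make "phenotypic variance explained" concrete, the sketch below computes the R² of a single-marker linear regression of a trait on genotype. This is a deliberate simplification: the abstract does not describe the QTL mapping model, dedicated QTL software typically uses interval mapping rather than this calculation, and the genotype codes and body weights are invented.

```python
def variance_explained(genotypes, phenotypes):
    """R^2 of a simple linear regression of phenotype on genotype code.

    genotypes: allele counts coded 0, 1 or 2 (an additive coding assumed
    here for illustration only).
    phenotypes: trait values, one per individual.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    s_gp = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_pp = sum((p - mean_p) ** 2 for p in phenotypes)
    if s_gg == 0 or s_pp == 0:
        raise ValueError("genotypes and phenotypes must both vary")
    # For one predictor, R^2 equals the squared Pearson correlation.
    return s_gp ** 2 / (s_gg * s_pp)


# Hypothetical marker genotypes and body weights (grams).
marker = [0, 0, 1, 1, 2, 2]
weight = [100.0, 104.0, 103.0, 107.0, 106.0, 110.0]
r2 = variance_explained(marker, weight)
print(f"variance explained by this marker: {r2:.1%}")
```

A QTL reported as explaining 12.0% of the phenotypic variance corresponds to an R² of 0.12 under this kind of model.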
The results of QTL mapping for growth traits suggest that growth at different stages was controlled by some common QTL and some different QTL. Positional candidate genes and association mapping suggest that the IFABP-a is a strong candidate gene for growth. Our data supply a basis for fine mapping QTL, marker-assisted selection and further detailed analysis of the functions of the IFABP-a gene in fish growth.
Single nucleotide polymorphism; Growth trait; Candidate gene; Quantitative trait locus
The fertile and sterile plants were derived from the self-pollinated offspring of the F1 hybrid between the novel restorer line NR1 and the Nsa CMS line in Brassica napus. To elucidate gene expression and regulation caused by the A and C subgenomes of B. napus, as well as by the alien chromosome and cytoplasm from Sinapis arvensis, during the development of young floral buds, we performed genome-wide high-throughput transcriptome sequencing of young floral buds from sterile and fertile plants.
In this study, equal amounts of total RNA taken from young floral buds of sterile and fertile plants were sequenced using the Illumina/Solexa platform. After filtering out low-quality data, a total of 2,760,574 and 2,714,441 clean tags remained in the two libraries, from which 242,163 (Ste) and 253,507 (Fer) distinct tags were obtained. All distinct sequencing tags were annotated using all possible CATG+17-nt sequences of the genome and transcriptome of Brassica rapa and those of Brassica oleracea as the reference sequences, respectively. In total, 3231 genes of B. rapa and 3371 genes of B. oleracea were detected with significantly different expression levels. GO and pathway-based analyses were performed to determine and further understand the biological functions of these differentially expressed genes (DEGs). In addition, there were 1089 specifically expressed unknown tags in Fer that mapped to neither B. oleracea nor B. rapa, and these unique tags were presumed to arise mainly from the added alien chromosome of S. arvensis. Fifteen genes were randomly selected and their expression levels were examined by quantitative RT-PCR; fourteen of them showed expression patterns consistent with the digital gene expression (DGE) data.
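The reference set of "all possible CATG+17-nt sequences" can be sketched as an enumeration of every CATG site followed by 17 nucleotides. The function name and the example sequence below are hypothetical, and the sketch scans only the given strand; whether the original analysis also scanned the reverse complement is not stated in the abstract.

```python
def catg_tags(sequence, site="CATG", tag_len=17):
    """Return every site + tag_len-nt virtual tag found in one strand.

    Sites too close to the 3' end to yield a full-length tag are skipped.
    """
    seq = sequence.upper()
    tags = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site) + tag_len
        if end <= len(seq):
            tags.append(seq[start:end])
        # Resume one base downstream so overlapping sites are not missed.
        start = seq.find(site, start + 1)
    return tags


# Hypothetical reference sequence with two CATG sites; the second site is
# too close to the end to give a full 21-nt tag.
reference = "AACATG" + "ACGTACGTACGTACGTA" + "GGCATGTT"
tags = catg_tags(reference)
print(tags)
```

Sequenced tags can then be annotated by exact lookup in a dictionary that maps each virtual tag to the gene it came from.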
A number of genes were differentially expressed between the young floral buds of sterile and fertile plants. Some of these genes may be candidates for future research on CMS in the Nsa line, and on fertility restoration and improved agronomic traits in the NR1 line. Further study of the unknown tags that were specifically expressed in Fer will help in exploring desirable agronomic traits from wild species.
MicroRNAs (miRNAs) have been implicated in the regulation of milk protein synthesis and development of the mammary gland (MG). However, the specific functions of miRNAs in these regulations are not clear. Therefore, the elucidation of miRNA expression profiles in the MG is an important step towards understanding the mechanisms of lactogenesis.
Two miRNA libraries were constructed from MG tissues taken from a lactating and a non-lactating Holstein dairy cow, respectively, and the short RNA sequences (18–30 nt) in these libraries were sequenced by the Solexa sequencing method. The libraries included 885 pre-miRNAs encoding 921 miRNAs, of which 884 miRNAs were unique sequences and 544 (61.5%) were expressed in both periods. A custom-designed microarray assay was then performed to compare miRNA expression patterns in the MG of lactating and non-lactating dairy cows. A total of 56 miRNAs in the lactating MG showed significant differences in expression compared to the non-lactating MG (P<0.05). Integrative miRNA target prediction and network analysis approaches were employed to construct an interaction network of lactation-related miRNAs and their putative targets. Using a cell-based model, six miRNAs (miR-125b, miR-141, miR-181a, miR-199b, miR-484 and miR-500) were studied to reveal their possible biological significance.
Our study provides a broad view of the bovine MG miRNA expression profile characteristics. Eight hundred and eighty-four miRNAs were identified in bovine MG. Differences in types and expression levels of miRNAs were observed between lactating and non-lactating bovine MG. Systematic predictions aided in the identification of lactation-related miRNAs, providing insight into the types of miRNAs and their possible mechanisms in regulating lactation.
Bud dormancy is a critical developmental process that allows perennial plants to survive unfavorable environmental conditions. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms regulating bud dormancy in this species are unknown. Because genomic information for pear is currently unavailable, transcriptome and digital gene expression data for this species would be valuable resources to better understand the molecular and biological mechanisms regulating its bud dormancy.
We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of ‘Suli’ pear (Pyrus pyrifolia white pear group) using the Illumina RNA-seq system. RNA-Seq generated approximately 100 M high-quality reads that were assembled into 69,393 unigenes (mean length = 853 bp), including 14,531 clusters and 34,194 singletons. A total of 51,448 (74.1%) unigenes were annotated using public protein databases with an E-value cut-off of 10⁻⁵. We mainly compared gene expression levels at four time-points during bud dormancy. Between Nov. 15 and Dec. 15, Dec. 15 and Jan. 15, and Jan. 15 and Feb. 15, 1,978, 1,024, and 3,468 genes were differentially expressed, respectively. Hierarchical clustering analysis arranged 190 significantly differentially expressed genes into seven groups. Seven genes were randomly selected to confirm their expression levels using quantitative real-time PCR.
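The annotation step can be illustrated with a short filter over similarity-search hits. The sketch assumes the hits are in the 12-column tab-separated BLAST tabular layout (E-value in the 11th column) and that a hit is kept when its E-value is at or below 1e-5; the abstract confirms neither the tool nor the exact rule, and the unigene and protein identifiers are invented.

```python
def best_hits(lines, max_evalue=1e-5):
    """Keep, per query, the lowest-E-value hit that passes the cut-off.

    lines: iterable of tab-separated records in the assumed 12-column
    tabular layout (query, subject, ..., evalue, bitscore).
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # hit too weak to support an annotation
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def record(query, subject, evalue):
    """Build one hypothetical 12-column record for the demonstration."""
    return "\t".join(
        [query, subject, "90.0", "100", "10", "0", "1", "100", "1", "100",
         evalue, "200"]
    )


hits = [
    record("unigene1", "protA", "1e-30"),
    record("unigene1", "protB", "1e-50"),
    record("unigene2", "protC", "0.5"),
]
annotated = best_hits(hits)
print(annotated)

# Share of annotated unigenes reported in the text.
annotated_pct = round(100 * 51448 / 69393, 1)
print(annotated_pct)
```

With the counts from the text, 51,448 of 69,393 unigenes gives the reported 74.1%.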
The new transcriptomes offer comprehensive sequence and DGE profiling data for a dynamic view of transcriptomic variation during bud dormancy in pear. These data provided a basis for future studies of metabolism during bud dormancy in non-model but economically-important perennial species.
‘Suli’ pear (Pyrus pyrifolia white pear group); Transcriptome; Bud dormancy; RNA-Seq
The evolutionary history and relationships of the mud shrimps (Crustacea: Decapoda: Gebiidea and Axiidea) are contentious, with previous attempts revealing mixed results. The mud shrimps were once classified in the infraorder Thalassinidea. Recent molecular phylogenetic analyses, however, suggest separation of the group into two individual infraorders, Gebiidea and Axiidea. Mitochondrial (mt) genome sequence and structure can be especially powerful in resolving higher systematic relationships that may offer new insights into the phylogeny of the mud shrimps and the other decapod infraorders, and test the hypothesis of dividing the mud shrimps into two infraorders.
We present the complete mitochondrial genome sequences of five mud shrimps, Austinogebia edulis, Upogebia major, Thalassina kelanang (Gebiidea), Nihonotrypaea thermophilus and Neaxius glyptocercus (Axiidea). All five genomes encode a standard set of 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a putative control region. Except for T. kelanang, mud shrimp mitochondrial genomes exhibited rearrangements and novel patterns compared to the pancrustacean ground pattern. Each of the two Gebiidea species (A. edulis and U. major) and two Axiidea species (N. glyptocercus and N. thermophilus) share a unique gene order specific to their infraorders, and analyses further suggest these two derived gene orders have evolved independently. Phylogenetic analyses based on the concatenated nucleotide and amino acid sequences of 13 protein-coding genes indicate the possible polyphyly of mud shrimps, supporting the division of the group into two infraorders. However, the infraordinal relationships among the Gebiidea and Axiidea, and other reptants are poorly resolved. The inclusion of mt genomes from more taxa, in particular the reptant infraorders Polychelida and Glypheidea, is required in further analyses.
Phylogenetic analyses of the mt genome sequences and the distinct gene orders provide further evidence for the divergence between the two mud shrimp infraorders, Gebiidea and Axiidea, corroborating previous molecular phylogeny and justifying their infraordinal status. Mitochondrial genome sequences appear to be promising markers for resolving phylogenetic issues concerning decapod crustaceans and warrant further investigation. Our present study also provides further information concerning the mt genome evolution of the Decapoda.
Mud shrimps; Mitochondrial genome; Gene order; Evolution; Phylogenetics
Upon the completion of whole genome sequencing, thorough genome annotation that associates genome sequences with biological meanings is essential. Genome annotation depends on the availability of transcript information as well as orthology information. In teleost fish, genome annotation is seriously hindered by genome duplication. Because of gene duplications, one cannot establish orthologies simply by homology comparisons. Rather, intense phylogenetic analysis or structural analysis of orthologies is required for the identification of genes. To conduct phylogenetic analysis and orthology analysis, full-length transcripts are essential. Generation of large numbers of full-length transcripts using traditional transcript sequencing is very difficult and extremely costly.
In this work, we took advantage of a doubled haploid catfish, which has two sets of identical chromosomes and therefore, in theory, no allelic variation. As such, transcript sequences generated from next-generation sequencing can be favorably assembled into full-length transcripts. Deep sequencing of the doubled haploid channel catfish transcriptome was performed using the Illumina HiSeq 2000 platform, yielding over 300 million high-quality trimmed reads totaling 27 Gbp. Assembly of these reads generated 370,798 non-redundant transcript-derived contigs. Functional annotation of the assembly allowed identification of 25,144 unique protein-encoding genes. A total of 2,659 unique genes were identified as putative duplicated genes in the catfish genome because the assembly of the corresponding transcripts harbored paralogous sequence variants (PSVs) or multisite variants (MSVs), which appear as pseudo-SNPs in the assembly. Of the 25,144 contigs with unique protein hits, around 20,000 contigs matched 50% of the length of reference proteins, and over 14,000 transcripts were identified as full-length with complete open reading frames. The characterization of consensus sequences surrounding the start codon and the stop codon confirmed the correct assembly of the full-length transcripts.
The large set of transcripts assembled in this study is the most comprehensive set of genome resources ever developed from catfish. It will provide much needed resources for functional genome research in catfish, serving as a reference transcriptome for genome annotation, analysis of gene duplication, gene family structures, and digital gene expression analysis. The putative set of duplicated genes provides a starting point for genome-scale analysis of gene duplication in the catfish genome, and should be a valuable resource for comparative genome analysis, genome evolution, and genome function studies.
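As an illustration of what "full-length with complete open reading frames" can mean operationally, the following is a minimal sketch that searches the forward strand of a contig for the longest ATG-to-stop reading frame. The function name `longest_complete_orf` and the choice of scanning only the forward strand are assumptions made for this example; the abstract does not state the criteria actually used in the study.

```python
# Hypothetical helper: the study's own full-length criteria are not given in the abstract.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_complete_orf(sequence):
    """Return the longest forward-strand ORF that starts with ATG and ends
    with an in-frame stop codon (stop included), or '' if none exists."""
    seq = sequence.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best

# Toy contig (not real catfish sequence): one complete ORF in frame 2.
print(longest_complete_orf("CCATGAAATAGGG"))
```

A contig whose longest ORF lacks either the start or the stop codon would return an empty string here and would not be counted as full-length under this simplified rule.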
Brassica oleracea encompasses a family of vegetables, including cabbage, that are among the most widely cultivated crops. In 2009, the B. oleracea Genome Sequencing Project was launched using next generation sequencing technology. None of the available maps were detailed enough to anchor the sequence scaffolds for the Genome Sequencing Project. This report describes the development of a large number of SSR and SNP markers from the whole genome shotgun sequence data of B. oleracea, and the construction of a high-density genetic linkage map using a double haploid mapping population.
The B. oleracea high-density genetic linkage map that was constructed includes 1,227 markers in nine linkage groups spanning a total of 1197.9 cM with an average of 0.98 cM between adjacent loci. There were 602 SSR markers and 625 SNP markers on the map. The chromosome with the highest number of markers (186) was C03, and the chromosome with the smallest number of markers (99) was C09.
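The reported density can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are chosen for this example, and the two ways of averaging (per marker, or per interval between adjacent markers within a linkage group) are shown side by side because the abstract does not say which was used. Both round to the reported 0.98 cM.

```python
# Figures taken from the abstract; variable names are illustrative.
total_markers = 1227
ssr_markers = 602
snp_markers = 625
linkage_groups = 9
map_length_cm = 1197.9

# Map length divided by the number of markers.
cm_per_marker = map_length_cm / total_markers

# Map length divided by the number of intervals between adjacent markers
# (each linkage group with k markers has k - 1 intervals).
intervals = total_markers - linkage_groups
cm_per_interval = map_length_cm / intervals

print(round(cm_per_marker, 2), round(cm_per_interval, 2))
```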
This first high-density map allowed the assembled scaffolds to be anchored to pseudochromosomes. The map also provides useful information for positional cloning, molecular breeding, and integration of information of genes and traits in B. oleracea. All the markers on the map will be transferable and could be used for the construction of other genetic maps.
Cabbage; Brassica; Genetic linkage map; SSR; SNP; Genome
Lymphocytes are a major component of the adaptive immune system and play a crucial role in immunity. Differences in the proportions of T-cell subpopulations in peripheral blood among individuals under the same conditions provide evidence of genetic control of these traits, but little is known about the underlying genetic mechanism, especially in swine. Identifying the genetic control of these variations may help the genetic improvement of immune capacity through selection.
To identify genomic regions responsible for these immune traits in swine, a genome-wide association study was conducted. A total of 675 pigs of three breeds were involved in the study. At 21 days of age, all individuals were vaccinated with modified live classical swine fever vaccine. Blood samples were collected when the piglets were 20 and 35 days of age, respectively. Seven traits, including the proportions of CD4+, CD8+, CD4+CD8+, CD4+CD8−, CD4−CD8+, CD4−CD8− and the ratio of CD4+ to CD8+ T cells were measured at the two ages. All the samples were genotyped for 62,163 single nucleotide polymorphisms (SNPs) using the Illumina porcineSNP60k BeadChip. After quality control, 40,833 SNPs were selected for association tests between SNPs and each immune trait based on a single-locus regression model. To tackle the issue of multiple testing in GWAS, 10,000 permutations were performed to determine the chromosome-wise and genome-wise significance levels of association tests. In total, 61 SNPs with chromosome-wise significance level and 3 SNPs with genome-wise significance level were identified. Twenty-seven significant SNPs were located within the immune-related QTL regions reported in previous studies. Furthermore, several significant SNPs fell into regions harboring known immunity-related genes; 14 of them fell into regions that harbor known T cell-related genes.
Our study demonstrated that genome-wide association studies are a feasible way to reveal the potential genetic variants affecting T-cell subpopulations. The results lay a preliminary foundation for identifying the causal mutations underlying swine immune capacity in follow-up studies.
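The permutation approach to multiple testing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the statistic (squared correlation from a simple regression of trait on allele count), the max-statistic form of the permutation test, and the names `single_locus_r2` and `permutation_threshold` are all choices made for this example. A genome-wise threshold would use all SNPs; a chromosome-wise threshold would pass only the SNPs on one chromosome.

```python
import random

def single_locus_r2(genotypes, phenotypes):
    """Squared Pearson correlation between allele counts (0/1/2) and a trait,
    i.e. the variance explained by a single-locus linear regression."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant trait: no association signal
    return sxy * sxy / (sxx * syy)

def permutation_threshold(snp_matrix, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical significance threshold: roughly the (1 - alpha) quantile of the
    maximum statistic over all supplied SNPs when phenotypes are shuffled."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-trait link
        max_stats.append(max(single_locus_r2(snp, shuffled) for snp in snp_matrix))
    max_stats.sort()
    index = min(len(max_stats) - 1, int((1 - alpha) * len(max_stats)))
    return max_stats[index]

# Toy data (simulated, not from the study): 20 SNPs, 30 animals.
data_rng = random.Random(1)
toy_snps = [[data_rng.choice([0, 1, 2]) for _ in range(30)] for _ in range(20)]
toy_trait = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
toy_threshold = permutation_threshold(toy_snps, toy_trait, n_perm=200)
significant = [i for i, snp in enumerate(toy_snps)
               if single_locus_r2(snp, toy_trait) > toy_threshold]
```

Taking the maximum over SNPs in each permutation is what controls the error rate across the whole set of tests rather than per SNP.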
T lymphocyte subpopulations; Genome-wide association study; Swine
The ovine Major Histocompatibility Complex (MHC) harbors genes involved in overall resistance/susceptibility of the host to infectious diseases. Compared to human and mouse, the ovine MHC is interrupted by a large piece of autosome insertion via a hypothetical chromosome inversion that constitutes ~25% of ovine chromosome 20. The evolutionary consequence of such an inversion and an insertion (inversion/insertion) in relation to MHC function remains unknown. We previously constructed a BAC clone physical map for the ovine MHC exclusive of the insertion region. Here we report the construction of a high-density physical map covering the autosome insertion in order to address the question of what the inversion/insertion had to do with ruminants during the MHC evolution.
A total of 119 pairs of comparative bovine oligo primers were utilized to screen an ovine BAC library for positive clones, and the orders and overlapping relationships of the identified clones were determined by DNA fingerprinting, BAC-end sequencing, and sequence-specific PCR. A total of 368 positive BAC clones were identified and 108 of the effective clones were ordered into an overlapping BAC contig to cover the consensus region between ovine MHC class IIa and IIb. Therefore, a continuous physical map covering the entire ovine autosome inversion/insertion region was successfully constructed. The map confirmed the bovine sequence assembly for the same homologous region. The DNA sequences of 185 BAC-ends have been deposited into the NCBI database with the accession numbers HR309252 through HR309068, corresponding to dbGSS ID 30164010 through 30163826.
We have constructed a high-density BAC clone physical map for the ovine autosome inversion/insertion between the MHC class IIa and IIb. The entire ovine MHC region is now fully covered by a continuous BAC clone contig. The physical map we generated will facilitate MHC functional studies in the ovine, as well as the comparative MHC evolution in ruminants.
Ovine; MHC; OLA; Physical map; BAC; Comparative mapping
Multi-objective optimization (MOO) involves optimization problems with multiple objectives. Generally, these objectives are used to estimate very different aspects of the solutions, and these aspects are often in conflict with each other. MOO first obtains a Pareto set, and then looks for both commonality and systematic variations across the set. For large-scale data sets, heuristic search algorithms such as evolutionary algorithms (EA) combined with MOO techniques are ideal. DNA microarray technology can measure the transcriptional response of a complete genome to different experimental conditions and yields many large-scale datasets. Biclustering techniques can simultaneously cluster rows and columns of a dataset, and help to extract more accurate information from those datasets. Biclustering needs to optimize several conflicting objectives, and can be solved with MOO methods. As a heuristics-based optimization approach, particle swarm optimization (PSO) simulates the movements of a bird flock finding food. The shuffled frog-leaping algorithm (SFL) is a population-based cooperative search metaphor combining the benefits of the local search of PSO and the global shuffling of information of the complex evolution technique. SFL is used to solve optimization problems on large-scale datasets.
This paper integrates a dynamic population strategy and the shuffled frog-leaping algorithm into biclustering of microarray data, and proposes a novel multi-objective dynamic population shuffled frog-leaping biclustering (MODPSFLB) algorithm to mine maximum biclusters from microarray data. Experimental results show that the proposed MODPSFLB algorithm can effectively find significant biological structures in terms of related biological processes, components and molecular functions.
The proposed MODPSFLB algorithm has good diversity and fast convergence of Pareto solutions and will become a powerful tool for systematic functional analysis in genome research.
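The notion of a Pareto set used above can be made concrete with a short sketch. It assumes every objective is to be minimized and represents each candidate solution as a tuple of objective values; the names `dominates` and `pareto_front` are chosen for this example and are not taken from the paper.

```python
def dominates(a, b):
    """True if solution a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the solutions that no other solution dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy objective vectors (two conflicting objectives, values are made up).
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(candidates))
```

Here (3, 4) is dropped because (2, 3) is better on both objectives, and (5, 5) is dropped because (1, 5) is better on the first and equal on the second. The remaining solutions represent different trade-offs, none of which can be improved on one objective without worsening the other.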
Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money.
To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others.
On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.
gene selection; microarray; classification; supervised-learning; similarity
Speckles in ultrasound imaging affect image quality and can make post-processing difficult. Speckle reduction technologies have been employed for removing speckles for some time. One of the effective speckle reduction technologies is anisotropic diffusion. Anisotropic diffusion can remove speckles effectively while preserving the edges of the image and thus has drawn great attention from image processing scientists. However, previously proposed methods have various disadvantages, such as being sensitive to the number of iterations or having a low capability of preserving the details of ultrasound images. Thus, a detail-preserving anisotropic diffusion method for speckle reduction that is less sensitive to the number of iterations is needed. This paper aims to develop such a method.
In this paper, we propose a robust detail preserving anisotropic diffusion filter (RDPAD) for speckle reduction. In order to get robust diffusion, the proposed method integrates the Tukey error norm function into the recently developed detail preserving anisotropic diffusion filter (DPAD). The proposed method can prevent over-diffusion and thus is less sensitive to the number of iterations.
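To illustrate why a Tukey-type error norm limits over-diffusion, the sketch below applies the Tukey biweight edge-stopping function (written here up to a constant scale factor) in a plain one-dimensional diffusion loop. This is not the RDPAD filter itself: the abstract does not give its equations, so the 1D setting, the fixed boundary samples, and the names `tukey_edge_stop` and `diffuse_1d` are assumptions made for this example.

```python
def tukey_edge_stop(gradient, sigma):
    """Tukey biweight edge-stopping weight: smooth falloff inside [-sigma, sigma],
    exactly zero outside, so large differences are not diffused at all."""
    if abs(gradient) > sigma:
        return 0.0
    ratio = gradient / sigma
    return (1.0 - ratio * ratio) ** 2

def diffuse_1d(signal, sigma, step=0.25, iterations=10):
    """Explicit 1D diffusion with Tukey weights; the two end samples are kept fixed."""
    values = list(signal)
    for _ in range(iterations):
        updated = values[:]
        for i in range(1, len(values) - 1):
            east = values[i + 1] - values[i]
            west = values[i - 1] - values[i]
            flow = (tukey_edge_stop(east, sigma) * east
                    + tukey_edge_stop(west, sigma) * west)
            updated[i] = values[i] + step * flow
        values = updated
    return values

# A step edge larger than sigma is left untouched regardless of iteration count,
# while a small bump (noise-like) is smoothed.
edge = diffuse_1d([0, 0, 0, 10, 10, 10], sigma=5, iterations=50)
bump = diffuse_1d([0, 1, 0], sigma=5, iterations=1)
```

Because the weight is exactly zero beyond `sigma`, running more iterations cannot blur an edge whose contrast exceeds `sigma`, which is the intuition behind reduced sensitivity to the iteration count.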
The proposed anisotropic diffusion can preserve the important structure information of the original image while reducing speckles. It is also less sensitive to the number of iterations. Experimental results on real ultrasound images show the effectiveness of the proposed anisotropic diffusion filter.
Recent advances in next-generation sequencing technologies have drastically increased throughput and significantly reduced sequencing costs. However, the average read lengths of next-generation sequencing technologies are short compared with those of traditional Sanger sequencing. The short sequence reads pose great challenges for de novo sequence assembly. As a pilot project for whole genome sequencing of the catfish genome, here we attempt to determine the proper sequence coverage, the proper software for assembly, and various parameters used for the assembly of a BAC physical map contig spanning approximately one million base pairs.
A combination of low-coverage 454 sequencing and Illumina sequencing appeared to provide effective assembly, as reflected by a high N50 value. Using 454 sequencing alone, a sequencing depth of 18 X was sufficient to obtain a good quality assembly, whereas with Illumina sequencing alone a depth of 70 X appeared to be sufficient. Additional sequencing coverage beyond 18 X of 454 or 70 X of Illumina sequencing did not significantly improve the assembly. Considering the cost of sequencing, 2 X 454 sequencing, when coupled with 70 X Illumina sequencing, provided an assembly of reasonably good quality. Of the several software packages tested, Newbler with a seed length of 16 and ABySS with a K-value of 60 appeared to be appropriate for the assembly of 454 reads alone and Illumina paired-end reads alone, respectively. Using both 454 and Illumina paired-end reads, a hybrid assembly strategy using Newbler for initial 454 sequence assembly and Velvet for initial Illumina sequence assembly, followed by a second-step assembly using MIRA, provided the best assembly of the physical map contig, resulting in 193 contigs with an N50 value of 13,123 bp.
A hybrid sequencing strategy using a low sequencing depth of 454 and a high sequencing depth of Illumina provided a good quality assembly with a high N50 value at relatively low cost. A combination of Newbler, Velvet, and MIRA can be used to assemble the 454 sequence reads and the Illumina reads effectively. The assembled sequence can serve as a resource for comparative genome analysis. Additional long reads from third generation sequencing platforms are needed to sequence through repetitive genome regions, which should further enhance the sequence assembly.
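Since assembly quality is judged here by N50, a minimal sketch of how that statistic is usually computed may help. The function name `n50` and the toy contig lengths are made up for this example; they are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50: the length L of the contig at which, going from longest
    to shortest, the running total first reaches half of the assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")

# Toy contig lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why it is used above to compare sequencing depths and assemblers.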