
Results 1-25 (65)

author:("Hu, songjiang")
1.  The genomes of four tapeworm species reveal adaptations to parasitism 
Nature  2013;496(7443):57-63.
Tapeworms cause debilitating neglected diseases that can be deadly and often require surgery due to ineffective drugs. Here we present the first analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115-141 megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have species-specific expansions of non-canonical heat shock proteins and families of known antigens; specialised detoxification pathways, and metabolism finely tuned to rely on nutrients scavenged from their hosts. We identify new potential drug targets, including those on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
PMCID: PMC3964345  PMID: 23485966
HSP70; parasitism; Cestoda; cysticercosis; echinococcosis; Platyhelminthes
2.  RiceWiki: a wiki-based database for community curation of rice genes 
Nucleic Acids Research  2013;42(D1):D1222-D1228.
Rice is the most important staple food for a large part of the world’s human population and also a key model organism for biological studies of crops and other related plants. Here we present RiceWiki, a wiki-based, publicly editable and open-content platform for community curation of rice genes. Most existing related biological databases are based on expert curation; with the exponentially growing volume of rice knowledge and other relevant data, however, expert curation becomes increasingly laborious and time-consuming, because keeping knowledge up-to-date, accurate and comprehensive requires a large number of people to be involved. Unlike existing databases, RiceWiki harnesses collective intelligence in community curation of rice genes, quantifies users' contributions to each curated gene and provides explicit authorship for each contributor to any given gene, with the aim of exploiting the full potential of the scientific community for rice knowledge curation. Based on community curation, RiceWiki has the potential to become a rice encyclopedia built by and for the scientific community, one that harnesses community intelligence for collaborative knowledge curation, covers all aspects of biological knowledge and keeps evolving with novel knowledge.
PMCID: PMC3964990  PMID: 24136999
3.  Regulation of MIR Genes in Response to Abiotic Stress in Hevea brasiliensis 
Increasing demand for natural rubber (NR) calls for an increase in latex yield and also an extension of rubber plantations in marginal zones. Both harvesting and abiotic stresses lead to tapping panel dryness through the production of reactive oxygen species. Many microRNAs regulated during abiotic stress modulate growth and development. The objective of this paper was to study the regulation of microRNAs in response to different types of abiotic stress and hormone treatments in Hevea. Regulation of MIR genes differs depending on the tissue and abiotic stress applied. A negative co-regulation between HbMIR398b with its chloroplastic HbCuZnSOD target messenger is observed in response to salinity. The involvement of MIR gene regulation during latex harvesting and tapping panel dryness (TPD) occurrence is further discussed.
PMCID: PMC3821574  PMID: 24084713
gene expression; miRNA; MIR gene; abiotic stress; rubber tree; tapping panel dryness
4.  Dose-finding study on adjuvant chemotherapy with S-1 plus oxaliplatin for gastric cancer 
Gastric cancer (GC) is the fourth most common type of cancer, accounting for an estimated one million new cases annually worldwide. Locally advanced GC often recurs, even following curative surgical resection. Therefore, there is a need for an effective adjuvant chemotherapy regimen. The aim of this trial was to investigate the maximum tolerated dose (MTD) of S-1 when administered in combination with oxaliplatin in postoperative GC patients. Oxaliplatin was administered at a fixed dose of 130 mg/m² on day 1. S-1 was administered from day 1 to 14 of a 3-week cycle and escalated by 10 mg/m²/day from 60 to 80 mg/m²/day. A total of 15 patients were enrolled in this study. No dose-limiting toxicities (DLTs) occurred at level 1 (S-1, 60 mg/m²; n=3). One case of DLT (grade 3 vomiting) occurred at level 2 (S-1, 70 mg/m²; n=6), whereas 2 cases of grade 3 vomiting were observed at level 3 (S-1, 80 mg/m²; n=6). Based on these results, the MTD of S-1 was initially determined to be 70 mg/m². Furthermore, we observed that cytochrome P450 2A6 (CYP2A6) 41349640C>G was associated with severe neutropenia (C/C vs. C/G vs. G/G = 0 vs. 33.33 vs. 100%; P=0.03297, Fisher’s exact test) during the entire course of the treatment.
PMCID: PMC3915807  PMID: 24649314
S-1; oxaliplatin; adjuvant chemotherapy; maximum-tolerated dose; cytochrome P450 2A6
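The dose-escalation reasoning in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the abstract does not name its escalation design, so the conventional "3+3" tolerance rule used here (no DLT in 3 patients, or at most 1 DLT in 6) is an assumption, and the function names are hypothetical.

```python
# Cohorts as reported in the abstract:
# (S-1 dose in mg/m2/day, patients treated, dose-limiting toxicities observed).
cohorts = [(60, 3, 0), (70, 6, 1), (80, 6, 2)]


def tolerated(n_patients: int, n_dlt: int) -> bool:
    """Assumed conventional 3+3 rule (not stated in the abstract):
    a level is tolerated with 0 DLTs in at least 3 patients,
    or at most 1 DLT in at least 6 patients."""
    return (n_patients >= 3 and n_dlt == 0) or (n_patients >= 6 and n_dlt <= 1)


def highest_tolerated_dose(levels):
    """Walk the levels in escalation order and return the last tolerated dose."""
    best = None
    for dose, n_patients, n_dlt in levels:
        if not tolerated(n_patients, n_dlt):
            break  # stop escalating at the first level that is too toxic
        best = dose
    return best


print(highest_tolerated_dose(cohorts))  # 70
```

Under that assumed rule, the reported cohorts give 70 mg/m²/day, which matches the MTD stated in the abstract.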
5.  The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes 
PLoS ONE  2013;8(8):e69476.
Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes.
Methodology/Principal Findings
We sequenced the Gossypium hirsutum mt genome using 454 technology, combined with screening of a fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes.
The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
PMCID: PMC3734230  PMID: 23940520
6.  Insight into the specific virulence related genes and toxin-antitoxin virulent pathogenicity islands in swine streptococcosis pathogen Streptococcus equi ssp. zooepidemicus strain ATCC35246 
BMC Genomics  2013;14:377.
Streptococcus equi ssp. zooepidemicus (S. zooepidemicus) is an important pathogen causing swine streptococcosis in China. Pathogenicity islands (PAIs) of S. zooepidemicus have been transferred among bacteria through horizontal gene transfer (HGT) and play important roles in the adaptation and increased virulence of S. zooepidemicus. The present study used comparative genomics to examine the different pathogenicities of S. zooepidemicus.
The genome of S. zooepidemicus ATCC35246 (Sz35246) comprises a single circular chromosome of 2,167,264 bp with a GC content of 41.65%. Comparative genome analysis of Sz35246, S. zooepidemicus MGCS10565 (Sz10565), Streptococcus equi ssp. equi 4047 (Se4047) and S. zooepidemicus H70 (Sz70) identified 320 Sz35246-specific genes, clustered into three toxin-antitoxin (TA) system PAIs and one restriction modification system (RM system) PAI. These four acquired PAIs encode proteins that may contribute to the overall pathogenic capacity and fitness of this bacterium to adapt to different hosts. Analysis of the in vivo and in vitro transcriptomes of this bacterium revealed differentially expressed PAI genes and non-PAI genes, suggesting that Sz35246 possesses mechanisms for infecting animals and adapting to a wide range of host environments. Analysis of the genome identified potential Sz35246 virulence genes. Genes of the Fim III operon were presumed to be involved in breaking the host-restriction of Sz35246.
Genome wide comparisons of Sz35246 with three other strains and transcriptome analysis revealed novel genes related to bacterial virulence and breaking the host-restriction. Four specific PAIs, which were judged to have been transferred into Sz35246 genome through HGT, were identified for the first time. Further analysis of the TA and RM systems in the PAIs will improve our understanding of the pathogenicity of this bacterium and could lead to the development of diagnostics and vaccines.
PMCID: PMC3750634  PMID: 23742619
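The GC content quoted in this abstract is the percentage of G and C bases in the chromosome. A minimal Python sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions, not data or code from the study.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of a sequence.

    Ambiguous symbols such as N are ignored rather than counted.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy sequence for illustration only (4 of 10 bases are G or C).
print(gc_content("ATGCGCATTA"))  # 40.0
```

For a real chromosome, the same function could be applied to the sequence string read from a FASTA file.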
7.  Mining genes involved in the stratification of Paris Polyphylla seeds using high-throughput embryo Transcriptome sequencing 
BMC Genomics  2013;14:358.
Paris polyphylla var. yunnanensis is an important medicinal plant. Seed dormancy is one of the main factors restricting artificial cultivation. The molecular mechanisms of seed dormancy remain unclear, and little genomic or transcriptome data are available for this plant.
In this study, massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform was used to generate a substantial sequence dataset for the P. polyphylla embryo. 369,496 high quality reads were obtained, ranging from 50 to 1146 bp, with a mean of 219 bp. These reads were assembled into 47,768 unigenes, which included 16,069 contigs and 31,699 singletons. Using BLASTX searches of public databases, 15,757 (32.3%) unique transcripts were identified. Gene Ontology and Cluster of Orthologous Groups of proteins annotations revealed that these transcripts were broadly representative of the P. polyphylla embryo transcriptome. The Kyoto Encyclopedia of Genes and Genomes assigned 5961 of the unique sequences to specific metabolic pathways. Relative expression levels analysis showed that eleven phytohormone-related genes and five other genes have different expression patterns in the embryo and endosperm in the seed stratification process.
Gene annotation and quantitative RT-PCR expression analysis identified 464 transcripts that may be involved in phytohormone catabolism and biosynthesis, hormone signaling, seed dormancy, seed maturation, cell wall growth and circadian rhythms. In particular, the relative expression analysis of sixteen genes (CYP707A, NCED, GA20ox2, GA20ox3, ABI2, PP2C, ARP3, ARP7, IAAH, IAAS, BRRK, DRM, ELF1, ELF2, SFR6, and SUS) in embryo and endosperm and at two temperatures indicated that these related genes may be candidates for clarifying the molecular basis of seed dormancy in P. polyphylla var. yunnanensis.
PMCID: PMC3679829  PMID: 23718911
Embryo; Stratification; Seed dormancy; High-throughput sequencing; Paris polyphylla
8.  Detection and genotyping of restriction fragment associated polymorphisms in polyploid crops with a pseudo-reference sequence: a case study in allotetraploid Brassica napus 
BMC Genomics  2013;14:346.
The presence of homoeologous sequences and absence of a reference genome sequence make discovery and genotyping of single nucleotide polymorphisms (SNPs) more challenging in polyploid crops.
To address this challenge, we constructed reduced representation libraries (RRLs) for two Brassica napus inbred lines and their 91 doubled haploid (DH) progenies using a modified ddRADseq technique. A bioinformatics pipeline termed RFAPtools was developed to discover and genotype SNPs and presence/absence variations (PAVs). Using this pipeline, a pseudo-reference sequence (PRF) containing 180,991 sequence tags was constructed. By aligning sequence reads to the pseudo-reference sequence, allelic SNPs as well as PAVs were identified and genotyped with RFAPtools. Two parallel linkage maps, one SNP bin map containing 8,780 SNP loci and one PAV linkage map containing 12,423 dominant loci, were constructed. By aligning marker sequences to B. rapa sequence scaffolds, whose genome is available, we assigned 44 unassembled sequence scaffolds comprising 8.15 Mb onto the B. rapa chromosomes, and also identified 14 instances of misassembly and eight instances of mis-ordering sequence scaffolds.
These results indicate that the modified ddRADseq approach is a cost-effective and simple method to genotype tens of thousands of SNP and PAV markers in a polyploid plant species. The results also demonstrate that RFAPtools, developed in this study, is a powerful tool for mining allelic SNPs from homoeologous sequences in polyploids and is therefore generally applicable in either diploid or polyploid species with or without a reference genome sequence.
PMCID: PMC3665465  PMID: 23706002
Polyploid crops; Brassica napus; Pseudo-reference sequence; Single nucleotide polymorphism; Presence/absence variation
9.  Digital Gene Expression Tag Profiling Analysis of the Gene Expression Patterns Regulating the Early Stage of Mouse Spermatogenesis 
PLoS ONE  2013;8(3):e58680.
Detailed characterization of the gene expression patterns in spermatogonia and primary spermatocytes is critical to understanding the processes that occur prior to meiosis during normal spermatogenesis. The genome-wide expression profiles of mouse type B spermatogonia and primary spermatocytes were investigated using the Solexa/Illumina digital gene expression (DGE) system, a tag-based high-throughput transcriptome sequencing method, and the developmental processes that occur during early spermatogenesis were systematically analyzed. Gene expression patterns vary significantly between mouse type B spermatogonia and primary spermatocytes. The functional analysis revealed that genes related to junction assembly, regulation of the actin cytoskeleton and pluripotency were the most significantly differentially expressed. Pathway analysis indicated that the Wnt non-canonical signaling pathway played a central role and interacted with the actin filament organization pathway during the development of spermatogonia. This study provides a foundation for further analysis of the gene expression patterns and signaling pathways that regulate the molecular mechanisms of early spermatogenesis.
PMCID: PMC3598852  PMID: 23554914
10.  Complete Genome Sequence of the Metabolically Versatile Halophilic Archaeon Haloferax mediterranei, a Poly(3-Hydroxybutyrate-co-3-Hydroxyvalerate) Producer 
Journal of Bacteriology  2012;194(16):4463-4464.
Haloferax mediterranei, an extremely halophilic archaeon, has shown promise for production of poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (PHBV) from unrelated cheap carbon sources. Here we report the complete genome (3,904,707 bp) of H. mediterranei CGMCC 1.2087, consisting of one chromosome and three megaplasmids.
PMCID: PMC3416209  PMID: 22843593
11.  Comparative Transcriptome Analysis of the Accessory Sex Gland and Testis from the Chinese Mitten Crab (Eriocheir sinensis) 
PLoS ONE  2013;8(1):e53915.
The accessory sex gland (ASG) is an important component of the male reproductive system, which functions to enhance the fertility of spermatozoa during male reproduction. Certain proteins secreted by the ASG are known to bind to the spermatozoa membrane and affect its function. The ASG gene expression profile in Chinese mitten crab (Eriocheir sinensis) has not been extensively studied, and limited genetic research has been conducted on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for the ASG of E. sinensis using Illumina sequencing technology. This analysis yielded a total of 33,221,284 sequencing reads, including 2.6 Gb of total nucleotides. Reads were assembled into 85,913 contigs (average 218 bp), or 58,567 scaffold sequences (average 292 bp), that identified 37,955 unigenes (average 385 bp). We assembled all unigenes and compared them with the published testis transcriptome from E. sinensis. In order to identify which genes may be involved in ASG function, as it pertains to modification of spermatozoa, we compared the ASG and testis transcriptome of E. sinensis. Our analysis identified specific genes with both higher and lower tissue expression levels in the two tissues, and the functions of these genes were analyzed to elucidate their potential roles during maturation of spermatozoa. Availability of detailed transcriptome data from ASG and testis in E. sinensis can assist our understanding of the molecular mechanisms involved with spermatozoa conservation, transport, maturation and capacitation and potentially acrosome activation.
PMCID: PMC3547057  PMID: 23342039
12.  Gene and Genome Parameters of Mammalian Liver Circadian Genes (LCGs) 
PLoS ONE  2012;7(10):e46961.
The mammalian circadian system controls various physiological processes and behavioral responses by regulating thousands of circadian genes with rhythmic expression. In this study, we redefined circadian-regulated genes based on published results in the mouse liver and compared them with other gene groups defined relative to circadian regulation, especially the non-circadian-regulated genes expressed in liver, at multiple molecular levels from gene position to protein expression, based on integrative analyses of different datasets from the literature. Based on the intra-tissue analysis, the liver circadian genes, or LCGs, show unique features when compared to other gene groups. First, LCGs in general have fewer neighboring genes and are larger in both genomic and 3′-UTR lengths but shorter in CDS (coding sequence) lengths. Second, LCGs have higher mRNA and protein abundance, higher temporal expression variations, and shorter mRNA half-life. Third, more than 60% of LCGs form major co-expression clusters centered in four temporal windows: dawn, day, dusk, and night. In addition, larger and smaller LCGs are found mainly expressed in the day and night temporal windows, respectively, and we believe that LCGs are well-partitioned into the gene expression regulatory network that takes advantage of gene size, expression constraint, and chromosomal architecture. Based on inter-tissue analysis, more than half of LCGs are ubiquitously expressed in multiple tissues but show rhythmic expression in only one or a limited number of tissues. LCGs show at least three-fold lower expression variations across the temporal windows than among different tissues, and this observation suggests that temporal expression variations regulated by the circadian system are relatively subtle compared with the tissue expression variations formed during development.
Taken together, we suggest that the circadian system selects gene parameters in a cost effective way to improve tissue-specific functions by adapting temporal variations from the environment over evolutionary time scales.
PMCID: PMC3468600  PMID: 23071677
13.  Replication-Associated Mutational Pressure (RMP) Governs Strand-Biased Compositional Asymmetry (SCA) and Gene Organization in Animal Mitochondrial Genomes 
Current Genomics  2012;13(1):28-36.
The nucleotide composition of the light (L-) and heavy (H-) strands of animal mitochondrial genomes is known to exhibit strand-biased compositional asymmetry (SCA). One possible explanation is the existence of a replication-associated mutational pressure (RMP) that may introduce characteristic nucleotide changes among mitochondrial genomes of different animal lineages. Here, we discuss the influence of RMP on nucleotide and amino acid compositions as well as gene organization. Among animal mitochondrial genomes, RMP may represent the major force that compels the evolution of mitochondrial protein-coding genes, coupled with other process-based selective pressures, such as those on components of the translation machinery (tRNAs and their anticodons). Through comparative analyses of sequenced mitochondrial genomes among diverse animal lineages and literature reviews, we suggest a strong RMP effect, observed among invertebrate mitochondrial genes as compared to those of vertebrates, that is either a result of positive selection on the invertebrate or a relaxed selective pressure on the vertebrate mitochondrial genes.
PMCID: PMC3269014  PMID: 22942673
Function-based selection; mitochondrion genome; replication-associated mutational pressure; strand-biased compositional asymmetry.
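Strand-biased compositional asymmetry is commonly summarized with skew statistics, GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T), computed on one strand. The Python sketch below shows that calculation; it illustrates a widely used measure and is not claimed to be the specific analysis performed in the review. The function name and toy sequence are assumptions.

```python
from collections import Counter


def strand_skews(seq: str) -> dict:
    """Compute AT skew and GC skew for a single DNA strand.

    A skew of 0 means the two bases are balanced on this strand;
    positive values mean an excess of A (or G), negative values
    an excess of T (or C).
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[b] for b in "ACGT")

    def skew(x: int, y: int) -> float:
        # Guard against division by zero for sequences lacking both bases.
        return (x - y) / (x + y) if (x + y) else 0.0

    return {"AT_skew": skew(a, t), "GC_skew": skew(g, c)}


# Toy strand for illustration only: A=3, T=1, G=3, C=1.
print(strand_skews("AAATGGGC"))  # {'AT_skew': 0.5, 'GC_skew': 0.5}
```

The complementary strand has skews of the same magnitude and opposite sign, which is why the measure is reported per strand.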
14.  Comparative Analysis of the Genomes of Two Field Isolates of the Rice Blast Fungus Magnaporthe oryzae 
PLoS Genetics  2012;8(8):e1002869.
Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis of nonsynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but fewer than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.
Author Summary
Magnaporthe oryzae is the causal agent of rice blast, which is mainly controlled with resistant cultivars. However, genetic variations in the pathogen often lead to overcoming R gene-mediated resistance in rice cultivars. In this study we sequenced two field isolates from China and Japan. In comparison with the laboratory strain that was previously sequenced, the field isolates have a similar genome size and overall genome structure. However, they have slightly more genes and contain a number of expanded gene families that are likely involved in plant-fungal interactions. Each of the isolates has specific genes, some of which affect virulence, while others are important for asexual development. The three strains differ noticeably in the distribution of transposon-like elements. Many of the transposable elements tend to be associated with isolate-specific or duplicated sequences. This study revealed genetic factors involved in genome variation of the rice blast fungus.
PMCID: PMC3410873  PMID: 22876203
15.  The Organelle Genomes of Hassawi Rice (Oryza sativa L.) and Its Hybrid in Saudi Arabia: Genome Variation, Rearrangement, and Origins 
PLoS ONE  2012;7(7):e42041.
Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNPs (single nucleotide polymorphisms) between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV), in the rice cp genomes. Two and four RCVs were identified in Hassawi-1 when compared to 93–11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed that there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, the longest being 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicates ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta, since genetic diversity between the two Hassawi cultivars is very low, although the historic origin of the wild-type Hassawi rice is unknown.
PMCID: PMC3409126  PMID: 22870184
16.  Metagenomic Insights into the Fibrolytic Microbiome in Yak Rumen 
PLoS ONE  2012;7(7):e40430.
The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. A total of 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them, 150 were annotated as glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that a mechanism involving SusC/SusD, instead of one based on cellulosomes or the free-enzyme system, plays a major role in lignocellulose degradation in the yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by the genes displayed moderate Avicelase activity in addition to endoglucanase activity, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen.
PMCID: PMC3396655  PMID: 22808161
17.  EvolView, an online tool for visualizing, annotating and managing phylogenetic trees 
Nucleic Acids Research  2012;40(Web Server issue):W569-W572.
EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at:
PMCID: PMC3394307  PMID: 22695796
18.  A Complete Sequence and Transcriptomic Analyses of Date Palm (Phoenix dactylifera L.) Mitochondrial Genome 
PLoS ONE  2012;7(5):e37164.
Based on next-generation sequencing data, we assembled the mitochondrial (mt) genome of date palm (Phoenix dactylifera L.) into a circular molecule of 715,001 bp in length. The mt genome of P. dactylifera encodes 38 proteins, 30 tRNAs, and 3 ribosomal RNAs, which constitute a gene content of 6.5% (46,770 bp) over the full length. The rest, 93.5% of the genome sequence, is comprised of cp (chloroplast)-derived (10.3% with respect to the whole genome length) and non-coding sequences. In the non-coding regions, there are 0.33% tandem and 2.3% long repeats. Our transcriptomic data from eight tissues (root, seed, bud, fruit, green leaf, yellow leaf, female flower, and male flower) showed higher gene expression levels in male flower, root, bud, and female flower, as compared to four other tissues. We identified 120 potential SNPs among three date palm cultivars (Khalas, Fahal, and Sukry), and successfully found seven SNPs in the coding sequences. A phylogenetic analysis, based on 22 conserved genes of 15 representative plant mitochondria, showed that P. dactylifera positions at the root of all sequenced monocot mt genomes. In addition, consistent with previous discoveries, there are three co-transcribed gene clusters–18S-5S rRNA, rps3-rpl16 and nad3-rps12–in P. dactylifera, which are highly conserved among all known mitochondrial genomes of angiosperms.
PMCID: PMC3360038  PMID: 22655034
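The gene-content percentage quoted in this abstract can be checked directly from the two lengths it reports. The short Python sketch below does that arithmetic; the variable names are illustrative, and only the two base-pair figures come from the abstract.

```python
# Figures reported in the abstract.
GENOME_BP = 715_001  # full length of the date palm mt genome
GENIC_BP = 46_770    # total length of annotated genes

# Share of the genome occupied by genes, and the remainder.
genic_pct = 100 * GENIC_BP / GENOME_BP
non_genic_pct = 100 - genic_pct

print(f"genic: {genic_pct:.1f}%")          # genic: 6.5%
print(f"non-genic: {non_genic_pct:.1f}%")  # non-genic: 93.5%
```

Both values agree with the 6.5% and 93.5% stated in the abstract.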
19.  Transposable-Element Associated Small RNAs in Bombyx mori Genome 
PLoS ONE  2012;7(5):e36599.
Small RNAs are a group of regulatory RNA molecules that control gene expression at transcriptional or post-transcriptional levels among eukaryotes. The silkworm, Bombyx mori L., genome harbors abundant repetitive sequences derived from families of retrotransposons and transposons, which together constitute almost half of the genome space and provide ample resource for biogenesis of the three major small RNA families. We systematically discovered transposable-element (TE)-associated small RNAs in B. mori genome based on a deep RNA-sequencing strategy and the effort yielded 182, 788 and 4,990 TE-associated small RNAs in the miRNA, siRNA and piRNA species, respectively. Our analysis suggested that the three small RNA species preferentially associate with different TEs to create sequence and functional diversity, and we also show evidence that a Bombyx non-LTR retrotransposon, bm1645, alone contributes to the generation of TE-associated small RNAs in a very significant way. The fact that bm1645-associated small RNAs partially overlap with each other implies a possibility that this element may be modulated by different mechanisms to generate different products with diverse functions. Taken together, these discoveries expand the small RNA pool in B. mori genome and lead to new knowledge on the diversity and functional significance of TE-associated small RNAs.
PMCID: PMC3359762  PMID: 22662121
20.  Complete Genome and Transcriptomes of Streptococcus parasanguinis FW213: Phylogenic Relations and Potential Virulence Mechanisms 
PLoS ONE  2012;7(4):e34769.
Streptococcus parasanguinis, a primary colonizer of the tooth surface, is also an opportunistic pathogen for subacute endocarditis. In this study, the complete genome of strain FW213 was determined using the traditional shotgun sequencing approach and further refined by the transcriptomes of cells in early exponential and early stationary growth phases. The transcriptomes also revealed 10 transcripts encoding known hypothetical proteins, one pseudogene, five transcripts matching Rfam entries, and an additional 87 putative small RNAs within the intergenic regions defined by the GLIMMER analysis. The genome contains five acquired genomic islands (GIs) encoding proteins which potentially contribute to the overall pathogenic capacity and fitness of this microbe. The differential expression of the GIs and various open reading frames outside the GIs at the two growth phases suggested that FW213 possesses a range of mechanisms to avoid host immune clearance, to colonize host tissues, to survive within oral biofilms and to overcome various environmental insults. Furthermore, the comparative genome analysis of five S. parasanguinis strains indicates that, although S. parasanguinis strains are highly conserved, variations in genome content exist. These variations may reflect differences in pathogenic potential between the strains.
PMCID: PMC3329508  PMID: 22529932
21.  Genomic Analysis of the Multidrug-Resistant Acinetobacter baumannii Strain MDR-ZJ06 Widely Spread in China 
Antimicrobial Agents and Chemotherapy  2011;55(10):4506-4512.
We previously reported that the multidrug-resistant (MDR) Acinetobacter baumannii strain MDR-ZJ06, belonging to European clone II, was widely spread in China. In this study, we report the whole-genome sequence of this clinically important strain. A 38.6-kb AbaR-type genomic resistance island (AbaR22) was identified in MDR-ZJ06. AbaR22 has a structure similar to those of the resistance islands found in A. baumannii strains AYE and AB0057, but it contains only a few antibiotic resistance genes. The previously described region of resistance gene accumulation was not found in AbaR22. In the chromosome of strain MDR-ZJ06, we identified the gene blaOXA-23 in a composite transposon (Tn2009). Tn2009 shares its backbone with other A. baumannii transposons that harbor blaOXA-23, but it is bracketed by two ISAba1 elements that are transcribed in the same orientation. MDR-ZJ06 also expressed the armA gene on its plasmid pZJ06, and this gene has the same genetic environment as the armA gene of the Enterobacteriaceae. These results suggest variability of resistance acquisition even in closely related A. baumannii strains.
PMCID: PMC3187012  PMID: 21788470
22.  Complete Genome Analysis of Sulfobacillus acidophilus Strain TPY, Isolated from a Hydrothermal Vent in the Pacific Ocean 
Journal of Bacteriology  2011;193(19):5555-5556.
Sulfobacillus acidophilus strain TPY is a moderately thermoacidophilic bacterium originally isolated from a hydrothermal vent in the Pacific Ocean. Ferrous iron and sulfur oxidation in acidic environments in strain TPY have been confirmed. Here we report the genome sequence and annotation of the strain TPY, which is the first complete genome of Sulfobacillus acidophilus.
PMCID: PMC3187392  PMID: 21914875
23.  Genome Sequence of Agrobacterium tumefaciens Strain F2, a Bioflocculant-Producing Bacterium 
Journal of Bacteriology  2011;193(19):5531.
Agrobacterium tumefaciens F2 is an efficient bioflocculant-producing bacterium. However, the genes related to the metabolic pathway of bioflocculant biosynthesis in strain F2 are unknown. We present the draft genome of A. tumefaciens F2, which could provide further insight into the biosynthetic mechanism of the polysaccharide-like bioflocculant in strain F2.
PMCID: PMC3187402  PMID: 21914861
24.  Complete Genome Sequence of Alicyclobacillus acidocaldarius Strain Tc-4-1 
Journal of Bacteriology  2011;193(19):5602-5603.
Alicyclobacillus acidocaldarius strain Tc-4-1 was initially isolated from a hot spring in Tengchong, China. This organism is both thermophilic and acidophilic. It can produce heat- and acid-stable enzymes, such as amylase and esterase, which may be important in industry. Here we report the whole genome sequence of the strain.
PMCID: PMC3187405  PMID: 21914900
25.  Complete Genome Sequence of the Ureolytic Streptococcus salivarius Strain 57.I 
Journal of Bacteriology  2011;193(19):5596-5597.
Streptococcus salivarius 57.I is one of the most abundant and highly ureolytic bacteria in the human mouth. It can utilize urea as the sole nitrogen source via the activity of urease. Complete genome sequencing of S. salivarius 57.I revealed a chromosome and a phage which are absent in strain SK126.
PMCID: PMC3187406  PMID: 21914897
