Results 1-25 (73)

author:("Hu, songjiang")
1.  Evaluation of the optimal dosage of S-1 in adjuvant SOX chemotherapy for gastric cancer 
Oncology Letters  2014;9(3):1451-1457.
Gastric cancer (GC) is the second leading cause of cancer-related mortality worldwide. The usual treatment of GC consists of surgery with additional adjuvant chemotherapy. In the present study, the feasibility and safety of adjuvant S-1 plus oxaliplatin (SOX) chemotherapy for patients with GC and the optimal dosage of S-1 were determined. Eligible patients were randomly assigned to either arm A (30 cases) receiving 70 mg/m2 S-1 (in two separate half doses) daily or arm B (30 cases) receiving 80 mg/m2 S-1 (in two separate half doses) daily. The S-1 was administered twice daily for 14 days followed by a 7-day rest period for the third week. A total of 130 mg/m2 oxaliplatin was administered on day 1 every 3 weeks for each arm. The cumulative rate of the relative total administration dose of S-1 at 100% in the 6th treatment course was 71.4% [95% confidence interval (CI), 56.5–90.3%] in arm A, which was significantly higher than 21.4% (95% CI, 10.5–43.6%) in arm B (P=0.001). The most common grade 3/4 toxicities were neutropenia (19.6%), thrombocytopenia (19.6%) and vomiting (16.1%). Grade 3/4 thrombocytopenia was observed in 7.1% of patients in arm A and in 32.1% of patients in arm B (P=0.019). With regard to the adverse events induced by S-1 administration, the incidence of diarrhea (3.6 vs. 42.9%; P<0.001) was significantly higher in arm B than in arm A, as anticipated. Collectively, adjuvant SOX therapy for GC is feasible and safe, and when combined with 130 mg/m2 oxaliplatin, 70 mg/m2/day S-1 appears to be the optimal dose.
PMCID: PMC4315063  PMID: 25663930
adjuvant chemotherapy; gastric cancer; S-1; oxaliplatin
2.  Transcriptome and Expression Profiling Analysis of the Hemocytes Reveals a Large Number of Immune-Related Genes in Mud Crab Scylla paramamosain during Vibrio parahaemolyticus Infection 
PLoS ONE  2014;9(12):e114500.
Mud crab Scylla paramamosain is an economically important marine species in China. However, frequent outbreaks of infectious diseases caused by marine bacteria, such as Vibrio parahaemolyticus, result in great economic losses.
Methodology/Principal Findings
Comparative de novo transcriptome analysis of S. paramamosain infected with V. parahaemolyticus was carried out to investigate the molecular mechanisms underlying the immune response to pathogenic bacteria by using the Illumina paired-end sequencing platform. A total of 52,934,042 clean reads from the hemocytes of V. parahaemolyticus-infected mud crabs and controls were obtained and assembled into 186,193 contigs. 59,120 unigenes were identified from 81,709 consensus sequences of mud crabs and 48,934 unigenes were matched to proteins in the Nr or Swissprot databases. Among these, 10,566 unigenes belong to 3 categories of Gene Ontology, 25,349 to 30 categories of KEGG, and 15,191 to 25 categories of COG database, covering almost all functional categories. By using the Solexa/Illumina's DGE platform, 1213 differentially expressed genes (P<0.05), including 538 significantly up-regulated and 675 down-regulated, were detected in V. parahaemolyticus-infected crabs compared with the controls. Transcript levels of randomly-chosen genes were further measured by quantitative real-time PCR to confirm the expression profiles. Many differentially expressed genes are involved in various immune processes, including stimulation of the Toll pathway, Immune Deficiency (IMD) pathway, Ras-regulated endocytosis, and proPO-activating system.
Analysis of the expression profile of crabs under infection provides invaluable new data for biological research in S. paramamosain, such as the identification of novel genes in the hemocytes during V. parahaemolyticus infection. These results will facilitate our comprehensive understanding of the mechanisms involved in the immune response to bacterial infection and will be helpful for disease prevention in crab aquaculture.
PMCID: PMC4259333  PMID: 25486443
3.  Cassava genome from a wild ancestor to cultivated varieties 
Nature Communications  2014;5:5110.
Cassava is a major tropical food crop in the Euphorbiaceae family that has high carbohydrate production potential and adaptability to diverse environments. Here we present the draft genome sequences of a wild ancestor and a domesticated variety of cassava and comparative analyses with a partial inbred line. We identify 1,584 and 1,678 gene models specific to the wild and domesticated varieties, respectively, and discover high heterozygosity and millions of single-nucleotide variations. Our analyses reveal that genes involved in photosynthesis, starch accumulation and abiotic stresses have been positively selected, whereas those involved in cell wall biosynthesis and secondary metabolism, including cyanogenic glucoside formation, have been negatively selected in the cultivated varieties, reflecting the result of natural selection and domestication. Differences in microRNA genes and retrotransposon regulation could partly explain an increased carbon flux towards starch accumulation and reduced cyanogenic glucoside accumulation in domesticated cassava. These results may contribute to genetic improvement of cassava through better understanding of its biology.
Cassava is a major source of food in tropical and subtropical regions. Here the authors sequence the genomes of wild and domesticated cassava varieties and identify genes that have been selected for and against during the evolution and domestication of this economically important crop.
PMCID: PMC4214410  PMID: 25300236
4.  Transcriptome dynamics during human erythroid differentiation and development 
Genomics  2013;102(0):431-441.
To explore the mechanisms controlling erythroid differentiation and development, we analyzed the genome-wide transcription dynamics occurring during the differentiation of human embryonic stem cells (HESCs) into the erythroid lineage and development of embryonic to adult erythropoiesis using high throughput sequencing technology. HESCs and erythroid cells at three developmental stages: ESER (embryonic), FLER (fetal), and PBER (adult) were analyzed. Our findings revealed that the number of expressed genes decreased during differentiation, whereas the total expression intensity increased. At each of the three transitions (HESCs–ESERs, ESERs–FLERs, and FLERs–PBERs), many differentially expressed genes were observed, which were involved in maintaining pluripotency, early erythroid specification, rapid cell growth, and cell–cell adhesion and interaction. We also discovered dynamic networks and their central nodes in each transition. Our study provides a fundamental basis for further investigation of erythroid differentiation and development, and has implications in using ESERs for transfusion product in clinical settings.
PMCID: PMC4151266  PMID: 24121002
High-throughput RNA sequencing; Erythropoiesis; Cell differentiation; Development; Gene regulatory networks
5.  XELIRI compared with FOLFIRI as a second-line treatment in patients with metastatic colorectal cancer 
Oncology Letters  2014;8(4):1864-1872.
The aim of this study was to compare the efficacy, safety and survival rate of a treatment regimen comprising capecitabine plus irinotecan (XELIRI) to those of a standard regimen comprising leucovorin, fluorouracil and irinotecan (FOLFIRI), and to determine the correlation among the inherited genetic variations in UGT1A1, UGT1A7 and UGT1A9. A total of 84 consecutive patients with histologically confirmed metastatic colorectal cancer (mCRC) were included in the study. All patients were treated with FOLFIRI or XELIRI. The median progression-free survival time was 4.4 months for FOLFIRI and 5.7 months for XELIRI (hazard ratio=1.35; 95% confidence interval, 0.83–2.21; P=0.22). When compared with FOLFIRI (6.34%), XELIRI was associated with lower rates of severe toxicity (3.29%; P=0.026) and similar disease control rates (69.57% for FOLFIRI and 61.11% for XELIRI; P=0.49). In total, 17 single nucleotide polymorphisms were identified, five of which revealed an association with grade 3/4 neutropenia, including UGT1A7*4; however, UGT1A1*28 and UGT1A1*6, which have been previously reported, were not significant. Additionally, H2 haplotypes, which include UGT1A9*22, and H5 and H7 haplotypes, which include UGT1A7*2, UGT1A7*3 and UGT1A7*4, were associated with a higher risk of severe neutropenia. In conclusion, XELIRI is an effective treatment regimen with acceptable response rates and tolerability for mCRC patients as a second-line treatment. Furthermore, inherited genetic variations in UGT1A1, UGT1A7 and UGT1A9 are associated with grade 3/4 neutropenia.
PMCID: PMC4156196  PMID: 25202427
FOLFIRI; XELIRI; UGT1A; polymorphism; haplotype
6.  Sequence and Expression Analyses of Ethylene Response Factors Highly Expressed in Latex Cells from Hevea brasiliensis 
PLoS ONE  2014;9(6):e99367.
The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from groups I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors.
PMCID: PMC4074046  PMID: 24971876
7.  De Novo Characterization of the Spleen Transcriptome of the Large Yellow Croaker (Pseudosciaena crocea) and Analysis of the Immune Relevant Genes and Pathways Involved in the Antiviral Response 
PLoS ONE  2014;9(5):e97471.
The large yellow croaker (Pseudosciaena crocea) is an economically important marine fish in China. To understand the molecular basis for antiviral defense in this species, we used Illumina paired-end sequencing to characterize the spleen transcriptome of polyriboinosinic:polyribocytidylic acid [poly(I:C)]-induced large yellow croakers. The library produced 56,355,728 reads, which were assembled into 108,237 contigs. As a result, 15,192 unigenes were found from this transcriptome. Gene ontology analysis showed that 4,759 genes were involved in three major functional categories: biological process, cellular component, and molecular function. We further ascertained that numerous consensus sequences were homologous to known immune-relevant genes. Kyoto Encyclopedia of Genes and Genomes orthology mapping annotated 5,389 unigenes and identified numerous immune-relevant pathways. These immune-relevant genes and pathways revealed major antiviral immunity effectors, including but not limited to: pattern recognition receptors, adaptors and signal transducers, the interferons and interferon-stimulated genes, inflammatory cytokines and receptors, complement components, and B-cell and T-cell antigen activation molecules. Moreover, real-time polymerase chain reaction (PCR) analysis showed that the expression of some genes of the Toll-like receptor signaling pathway, RIG-I-like receptors signaling pathway, Janus kinase-Signal Transducer and Activator of Transcription (JAK-STAT) signaling pathway, and T-cell receptor (TCR) signaling pathway changed after poly(I:C) induction, suggesting that these signaling pathways may be regulated by poly(I:C), a viral mimic. Overall, the antivirus-related genes and signaling pathways that were identified in response to poly(I:C) challenge provide valuable leads for further investigation of the antiviral defense mechanism in the large yellow croaker.
PMCID: PMC4018400  PMID: 24820969
8.  The genomes of four tapeworm species reveal adaptations to parasitism 
Nature  2013;496(7443):57-63.
Tapeworms cause debilitating neglected diseases that can be deadly and often require surgery due to ineffective drugs. Here we present the first analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115-141 megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have species-specific expansions of non-canonical heat shock proteins and families of known antigens; specialised detoxification pathways, and metabolism finely tuned to rely on nutrients scavenged from their hosts. We identify new potential drug targets, including those on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
PMCID: PMC3964345  PMID: 23485966
HSP70; parasitism; Cestoda; cysticercosis; echinococcosis; Platyhelminthes
9.  RiceWiki: a wiki-based database for community curation of rice genes 
Nucleic Acids Research  2013;42(Database issue):D1222-D1228.
Rice is the most important staple food for a large part of the world’s human population and also a key model organism for biological studies of crops as well as other related plants. Here we present RiceWiki, a wiki-based, publicly editable and open-content platform for community curation of rice genes. Most existing related biological databases are based on expert curation; with the exponentially growing volume of rice knowledge and other relevant data, however, expert curation becomes increasingly laborious and time-consuming as it tries to keep knowledge up-to-date, accurate and comprehensive, struggling with the flood of data and requiring a large number of people to get involved in rice knowledge curation. Unlike extant relevant databases, RiceWiki harnesses collective intelligence in community curation of rice genes, quantifying users' contributions in each curated gene and providing explicit authorship for each contributor in any given gene, with the aim of exploiting the full potential of the scientific community for rice knowledge curation. Based on community curation, RiceWiki has the potential to build a rice encyclopedia by and for the scientific community that harnesses community intelligence for collaborative knowledge curation, covers all aspects of biological knowledge and keeps evolving with novel knowledge.
PMCID: PMC3964990  PMID: 24136999
10.  Regulation of MIR Genes in Response to Abiotic Stress in Hevea brasiliensis 
Increasing demand for natural rubber (NR) calls for an increase in latex yield and also an extension of rubber plantations in marginal zones. Both harvesting and abiotic stresses lead to tapping panel dryness through the production of reactive oxygen species. Many microRNAs regulated during abiotic stress modulate growth and development. The objective of this paper was to study the regulation of microRNAs in response to different types of abiotic stress and hormone treatments in Hevea. Regulation of MIR genes differs depending on the tissue and abiotic stress applied. A negative co-regulation between HbMIR398b with its chloroplastic HbCuZnSOD target messenger is observed in response to salinity. The involvement of MIR gene regulation during latex harvesting and tapping panel dryness (TPD) occurrence is further discussed.
PMCID: PMC3821574  PMID: 24084713
gene expression; miRNA; MIR gene; abiotic stress; rubber tree; tapping panel dryness
11.  Dose-finding study on adjuvant chemotherapy with S-1 plus oxaliplatin for gastric cancer 
Gastric cancer (GC) is the fourth most common type of cancer, accounting for an estimated one million new cases annually worldwide. Locally advanced GC often recurs, even following curative surgical resection. Therefore, there is a need for an effective adjuvant chemotherapy regimen. The aim of this trial was to investigate the maximum tolerated dose (MTD) of S-1 when administered in combination with oxaliplatin in postoperative GC patients. Oxaliplatin was administered at a fixed dose of 130 mg/m2 on day 1. S-1 was administered from day 1 to 14 of a 3-week cycle and escalated by 10 mg/m2/day from 60 to 80 mg/m2/day. A total of 15 patients were enrolled in this study. No dose-limiting toxicities (DLTs) occurred at level 1 (S-1, 60 mg/m2; n=3). One case of DLT (grade 3 vomiting) occurred at level 2 (S-1, 70 mg/m2; n=6), whereas 2 cases of grade 3 vomiting were observed at level 3 (S-1, 80 mg/m2; n=6). Based on these results, the MTD of S-1 was initially determined to be 70 mg/m2. Furthermore, we observed that cytochrome P450 2A6 (CYP2A6) 41349640C>G was associated with severe neutropenia (C/C vs. C/G vs. G/G = 0 vs. 33.33 vs. 100%; P=0.03297, Fisher’s exact test) during the entire course of the treatment.
PMCID: PMC3915807  PMID: 24649314
S-1; oxaliplatin; adjuvant chemotherapy; maximum-tolerated dose; cytochrome P450 2A6
12.  The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes 
PLoS ONE  2013;8(8):e69476.
Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes.
Methodology/Principal Findings
We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes.
The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
PMCID: PMC3734230  PMID: 23940520
13.  Insight into the specific virulence related genes and toxin-antitoxin virulent pathogenicity islands in swine streptococcosis pathogen Streptococcus equi ssp. zooepidemicus strain ATCC35246 
BMC Genomics  2013;14:377.
Streptococcus equi ssp. zooepidemicus (S. zooepidemicus) is an important pathogen causing swine streptococcosis in China. Pathogenicity islands (PAIs) of S. zooepidemicus have been transferred among bacteria through horizontal gene transfer (HGT) and play important roles in the adaptation and increased virulence of S. zooepidemicus. The present study used comparative genomics to examine the different pathogenicities of S. zooepidemicus.
The genome of S. zooepidemicus ATCC35246 (Sz35246) comprises a single circular chromosome of 2,167,264 bp, with a GC content of 41.65%. Comparative genome analysis of Sz35246, S. zooepidemicus MGCS10565 (Sz10565), Streptococcus equi ssp. equi 4047 (Se4047) and S. zooepidemicus H70 (Sz70) identified 320 Sz35246-specific genes, clustered into three toxin-antitoxin (TA) system PAIs and one restriction modification system (RM system) PAI. These four acquired PAIs encode proteins that may contribute to the overall pathogenic capacity and fitness of this bacterium to adapt to different hosts. Analysis of the in vivo and in vitro transcriptomes of this bacterium revealed differentially expressed PAI genes and non-PAI genes, suggesting that Sz35246 possesses mechanisms for infecting animals and adapting to a wide range of host environments. Analysis of the genome identified potential Sz35246 virulence genes. Genes of the Fim III operon were presumed to be involved in breaking the host-restriction of Sz35246.
Genome wide comparisons of Sz35246 with three other strains and transcriptome analysis revealed novel genes related to bacterial virulence and breaking the host-restriction. Four specific PAIs, which were judged to have been transferred into Sz35246 genome through HGT, were identified for the first time. Further analysis of the TA and RM systems in the PAIs will improve our understanding of the pathogenicity of this bacterium and could lead to the development of diagnostics and vaccines.
PMCID: PMC3750634  PMID: 23742619
14.  Mining genes involved in the stratification of Paris Polyphylla seeds using high-throughput embryo Transcriptome sequencing 
BMC Genomics  2013;14:358.
Paris polyphylla var. yunnanensis is an important medicinal plant. Seed dormancy is one of the main factors restricting artificial cultivation. The molecular mechanisms of seed dormancy remain unclear, and little genomic or transcriptome data are available for this plant.
In this study, massive parallel pyrosequencing on the Roche 454-GS FLX Titanium platform was used to generate a substantial sequence dataset for the P. polyphylla embryo. 369,496 high quality reads were obtained, ranging from 50 to 1146 bp, with a mean of 219 bp. These reads were assembled into 47,768 unigenes, which included 16,069 contigs and 31,699 singletons. Using BLASTX searches of public databases, 15,757 (32.3%) unique transcripts were identified. Gene Ontology and Cluster of Orthologous Groups of proteins annotations revealed that these transcripts were broadly representative of the P. polyphylla embryo transcriptome. The Kyoto Encyclopedia of Genes and Genomes assigned 5961 of the unique sequences to specific metabolic pathways. Relative expression levels analysis showed that eleven phytohormone-related genes and five other genes have different expression patterns in the embryo and endosperm in the seed stratification process.
Gene annotation and quantitative RT-PCR expression analysis identified 464 transcripts that may be involved in phytohormone catabolism and biosynthesis, hormone signaling, seed dormancy, seed maturation, cell wall growth and circadian rhythms. In particular, the relative expression analysis of sixteen genes (CYP707A, NCED, GA20ox2, GA20ox3, ABI2, PP2C, ARP3, ARP7, IAAH, IAAS, BRRK, DRM, ELF1, ELF2, SFR6, and SUS) in embryo and endosperm and at two temperatures indicated that these related genes may be candidates for clarifying the molecular basis of seed dormancy in P. polyphylla var. yunnanensis.
PMCID: PMC3679829  PMID: 23718911
Embryo; Stratification; Seed dormancy; High-throughput sequencing; Paris polyphylla
15.  Detection and genotyping of restriction fragment associated polymorphisms in polyploid crops with a pseudo-reference sequence: a case study in allotetraploid Brassica napus 
BMC Genomics  2013;14:346.
The presence of homoeologous sequences and absence of a reference genome sequence make discovery and genotyping of single nucleotide polymorphisms (SNPs) more challenging in polyploid crops.
To address this challenge, we constructed reduced representation libraries (RRLs) for two Brassica napus inbred lines and their 91 doubled haploid (DH) progenies using a modified ddRADseq technique. A bioinformatics pipeline termed RFAPtools was developed to discover and genotype SNPs and presence/absence variations (PAVs). Using this pipeline, a pseudo-reference sequence (PRF) containing 180,991 sequence tags was constructed. By aligning sequence reads to the pseudo-reference sequence, allelic SNPs as well as PAVs were identified and genotyped with RFAPtools. Two parallel linkage maps, one SNP bin map containing 8,780 SNP loci and one PAV linkage map containing 12,423 dominant loci, were constructed. By aligning marker sequences to B. rapa sequence scaffolds, whose genome is available, we assigned 44 unassembled sequence scaffolds comprising 8.15 Mb onto the B. rapa chromosomes, and also identified 14 instances of misassembly and eight instances of mis-ordering sequence scaffolds.
These results indicate that the modified ddRADseq approach is a cost-effective and simple method to genotype tens of thousands of SNPs and PAV markers in a polyploid plant species. The results also demonstrate that RFAPtools, developed in this study, is a powerful tool for mining allelic SNPs from homoeologous sequences in polyploids, and is therefore generally applicable in either diploid or polyploid species with or without a reference genome sequence.
PMCID: PMC3665465  PMID: 23706002
Polyploid crops; Brassica napus; Pseudo-reference sequence; Single nucleotide polymorphism; Presence/absence variation
16.  Digital Gene Expression Tag Profiling Analysis of the Gene Expression Patterns Regulating the Early Stage of Mouse Spermatogenesis 
PLoS ONE  2013;8(3):e58680.
Detailed characterization of the gene expression patterns in spermatogonia and primary spermatocytes is critical to understanding the processes which occur prior to meiosis during normal spermatogenesis. The genome-wide expression profiles of mouse type B spermatogonia and primary spermatocytes were investigated using the Solexa/Illumina digital gene expression (DGE) system, a tag-based high-throughput transcriptome sequencing method, and the developmental processes which occur during early spermatogenesis were systematically analyzed. Gene expression patterns vary significantly between mouse type B spermatogonia and primary spermatocytes. The functional analysis revealed that genes related to junction assembly, regulation of the actin cytoskeleton and pluripotency were the most significantly differentially expressed. Pathway analysis indicated that the Wnt non-canonical signaling pathway played a central role and interacted with the actin filament organization pathway during the development of spermatogonia. This study provides a foundation for further analysis of the gene expression patterns and signaling pathways which regulate the molecular mechanisms of early spermatogenesis.
PMCID: PMC3598852  PMID: 23554914
17.  Complete Genome Sequence of the Metabolically Versatile Halophilic Archaeon Haloferax mediterranei, a Poly(3-Hydroxybutyrate-co-3-Hydroxyvalerate) Producer 
Journal of Bacteriology  2012;194(16):4463-4464.
Haloferax mediterranei, an extremely halophilic archaeon, has shown promise for production of poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (PHBV) from unrelated cheap carbon sources. Here we report the complete genome (3,904,707 bp) of H. mediterranei CGMCC 1.2087, consisting of one chromosome and three megaplasmids.
PMCID: PMC3416209  PMID: 22843593
18.  Comparative Transcriptome Analysis of the Accessory Sex Gland and Testis from the Chinese Mitten Crab (Eriocheir sinensis) 
PLoS ONE  2013;8(1):e53915.
The accessory sex gland (ASG) is an important component of the male reproductive system, which functions to enhance the fertility of spermatozoa during male reproduction. Certain proteins secreted by the ASG are known to bind to the spermatozoa membrane and affect its function. The ASG gene expression profile in Chinese mitten crab (Eriocheir sinensis) has not been extensively studied, and limited genetic research has been conducted on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for the ASG of E. sinensis using Illumina sequencing technology. This analysis yielded a total of 33,221,284 sequencing reads, including 2.6 Gb of total nucleotides. Reads were assembled into 85,913 contigs (average 218 bp), or 58,567 scaffold sequences (average 292 bp), that identified 37,955 unigenes (average 385 bp). We assembled all unigenes and compared them with the published testis transcriptome from E. sinensis. In order to identify which genes may be involved in ASG function, as it pertains to modification of spermatozoa, we compared the ASG and testis transcriptome of E. sinensis. Our analysis identified specific genes with both higher and lower tissue expression levels in the two tissues, and the functions of these genes were analyzed to elucidate their potential roles during maturation of spermatozoa. Availability of detailed transcriptome data from ASG and testis in E. sinensis can assist our understanding of the molecular mechanisms involved with spermatozoa conservation, transport, maturation and capacitation and potentially acrosome activation.
PMCID: PMC3547057  PMID: 23342039
19.  Gene and Genome Parameters of Mammalian Liver Circadian Genes (LCGs) 
PLoS ONE  2012;7(10):e46961.
The mammalian circadian system controls various physiological processes and behavioral responses by regulating thousands of circadian genes with rhythmic expression. In this study, we redefined circadian-regulated genes based on published results in the mouse liver and compared them with other gene groups defined relative to circadian regulation, especially the non-circadian-regulated genes expressed in liver, at multiple molecular levels from gene position to protein expression, based on integrative analyses of different datasets from the literature. Based on the intra-tissue analysis, the liver circadian genes (LCGs) show unique features when compared to other gene groups. First, LCGs in general have fewer neighboring genes and are larger in both genomic and 3′-UTR lengths but shorter in CDS (coding sequence) lengths. Second, LCGs have higher mRNA and protein abundance, higher temporal expression variations, and shorter mRNA half-life. Third, more than 60% of LCGs form major co-expression clusters centered in four temporal windows: dawn, day, dusk, and night. In addition, larger and smaller LCGs are found mainly expressed in the day and night temporal windows, respectively, and we believe that LCGs are well-partitioned into the gene expression regulatory network that takes advantage of gene size, expression constraint, and chromosomal architecture. Based on inter-tissue analysis, more than half of LCGs are ubiquitously expressed in multiple tissues but only show rhythmical expression in one or a limited number of tissues. LCGs show at least three-fold lower expression variations across the temporal windows than those among different tissues, and this observation suggests that temporal expression variations regulated by the circadian system are relatively subtle as compared with the tissue expression variations formed during development.
Taken together, we suggest that the circadian system selects gene parameters in a cost effective way to improve tissue-specific functions by adapting temporal variations from the environment over evolutionary time scales.
PMCID: PMC3468600  PMID: 23071677
20.  Replication-Associated Mutational Pressure (RMP) Governs Strand-Biased Compositional Asymmetry (SCA) and Gene Organization in Animal Mitochondrial Genomes 
Current Genomics  2012;13(1):28-36.
The nucleotide composition of the light (L-) and heavy (H-) strands of animal mitochondrial genomes is known to exhibit strand-biased compositional asymmetry (SCA). One possible explanation is the existence of a replication-associated mutational pressure (RMP) that may introduce characteristic nucleotide changes among mitochondrial genomes of different animal lineages. Here, we discuss the influence of RMP on nucleotide and amino acid compositions as well as gene organization. Among animal mitochondrial genomes, RMP may represent the major force that compels the evolution of mitochondrial protein-coding genes, coupled with other process-based selective pressures, such as those on components of the translation machinery, namely tRNAs and their anticodons. Through comparative analyses of sequenced mitochondrial genomes among diverse animal lineages and literature reviews, we suggest a strong RMP effect, observed among invertebrate mitochondrial genes as compared to those of vertebrates, that is either a result of positive selection on the invertebrate or a relaxed selective pressure on the vertebrate mitochondrial genes.
PMCID: PMC3269014  PMID: 22942673
Function-based selection; mitochondrion genome; replication-associated mutational pressure; strand-biased compositional asymmetry.
21.  Comparative Analysis of the Genomes of Two Field Isolates of the Rice Blast Fungus Magnaporthe oryzae 
PLoS Genetics  2012;8(8):e1002869.
Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis of nonsynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but fewer than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.
Author Summary
Magnaporthe oryzae is the causal agent of rice blast, which is mainly controlled with resistant cultivars. However, genetic variations in the pathogen often lead to overcoming R gene-mediated resistance in rice cultivars. In this study we sequenced two field isolates from China and Japan. In comparison with the laboratory strain that was previously sequenced, the field isolates have a similar genome size and overall genome structure. However, they have slightly more genes and contain a number of expanded gene families that are likely involved in plant-fungal interactions. Each of the isolates has specific genes, some of which affect virulence, while others are important for asexual development. The three strains differ noticeably in the distribution of transposon-like elements. Many of the transposable elements tend to be associated with isolate-specific or duplicated sequences. This study revealed genetic factors involved in genome variation of the rice blast fungus.
PMCID: PMC3410873  PMID: 22876203
22.  The Organelle Genomes of Hassawi Rice (Oryza sativa L.) and Its Hybrid in Saudi Arabia: Genome Variation, Rearrangement, and Origins 
PLoS ONE  2012;7(7):e42041.
Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNPs (single nucleotide polymorphisms) between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV), in the rice cp genomes. There are two and four RCVs identified in Hassawi-1 when compared to 93-11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, with the longest length of 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicates ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta, because genetic diversity between the two Hassawi cultivars is very low, although the historic origin of the wild-type Hassawi rice is unknown.
PMCID: PMC3409126  PMID: 22870184
23.  Metagenomic Insights into the Fibrolytic Microbiome in Yak Rumen 
PLoS ONE  2012;7(7):e40430.
The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. In total, 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them 150 were annotated as the glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 Kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes-contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that the SusC/SusD-involving mechanism, instead of one based on cellulosomes or the free-enzyme system, serves a major role in lignocellulose degradation in yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by the genes displayed moderate Avicelase activity in addition to endoglucanase activity, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen.
PMCID: PMC3396655  PMID: 22808161
24.  EvolView, an online tool for visualizing, annotating and managing phylogenetic trees 
Nucleic Acids Research  2012;40(Web Server issue):W569-W572.
EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at:
PMCID: PMC3394307  PMID: 22695796
25.  A Complete Sequence and Transcriptomic Analyses of Date Palm (Phoenix dactylifera L.) Mitochondrial Genome 
PLoS ONE  2012;7(5):e37164.
Based on next-generation sequencing data, we assembled the mitochondrial (mt) genome of date palm (Phoenix dactylifera L.) into a circular molecule of 715,001 bp in length. The mt genome of P. dactylifera encodes 38 proteins, 30 tRNAs, and 3 ribosomal RNAs, which constitute a gene content of 6.5% (46,770 bp) over the full length. The rest, 93.5% of the genome sequence, is comprised of cp (chloroplast)-derived (10.3% with respect to the whole genome length) and non-coding sequences. In the non-coding regions, there are 0.33% tandem and 2.3% long repeats. Our transcriptomic data from eight tissues (root, seed, bud, fruit, green leaf, yellow leaf, female flower, and male flower) showed higher gene expression levels in male flower, root, bud, and female flower, as compared to four other tissues. We identified 120 potential SNPs among three date palm cultivars (Khalas, Fahal, and Sukry), and successfully found seven SNPs in the coding sequences. A phylogenetic analysis, based on 22 conserved genes of 15 representative plant mitochondria, showed that P. dactylifera positions at the root of all sequenced monocot mt genomes. In addition, consistent with previous discoveries, there are three co-transcribed gene clusters–18S-5S rRNA, rps3-rpl16 and nad3-rps12–in P. dactylifera, which are highly conserved among all known mitochondrial genomes of angiosperms.
PMCID: PMC3360038  PMID: 22655034
