Search tips
Search criteria

Results 1-11 (11)

Clipboard (0)

Select a Filter Below

Year of Publication
Document Types
1.  Overexpression of the soybean transcription factor GmDof4 significantly enhances the lipid content of Chlorella ellipsoidea 
Biotechnology for Biofuels  2014;7(1):128.
The lipid content of microalgae is regarded as an important indicator for biodiesel. Many attempts have been made to increase the lipid content of microalgae through biochemical and genetic engineering. Significant lipid accumulation in microalgae has been achieved using biochemical engineering, such as nitrogen starvation, but the cell growth was severely limited. However, enrichment of lipid content in microalgae by genetic engineering is anticipated. In this study, GmDof4 from soybean (Glycine max), a transcription factor affecting the lipid content in Arabidopsis, was transferred into Chlorella ellipsoidea. We then investigated the molecular mechanism underlying the enhancement of the lipid content of transformed C. ellipsoidea.
We constructed a plant expression vector, pGmDof4, and transformed GmDof4 into C. ellipsoidea by electroporation. The resulting expression of GmDof4 significantly enhanced the lipid content by 46.4 to 52.9%, but did not affect the growth rate of the host cells under mixotrophic culture conditions. Transcriptome profiles indicated that 1,076 transcripts were differentially regulated: of these, 754 genes were significantly upregulated and 322 genes were significantly downregulated in the transgenic strains under mixotrophic culture conditions. There are 22 significantly regulated genes (|log2 ratio| >1) involved in lipid and fatty acid metabolism. Quantitative real-time PCR and an enzyme activity assay revealed that GmDof4 significantly up-regulated the gene expression and enzyme activity of acetyl-coenzyme A carboxylase, a key enzyme for fatty acid synthesis, in transgenic C. ellipsoidea cells.
The hetero-expression of a transcription factor GmDof4 gene from soybean can significantly increase the lipid content but not affect the growth rate of C. ellipsoidea under mixotrophic culture conditions. The increase in lipid content could be attributed to the large number of genes with regulated expression. In particular, the acetyl-coenzyme A carboxylase gene expression and enzyme activity were significantly upregulated in the transgenic cells. Our research provides a new way to increase the lipid content of microalgae by introducing a specific transcription factor to microalgae strains that can be used for the biofuel and food industries.
Electronic supplementary material
The online version of this article (doi:10.1186/s13068-014-0128-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4159510  PMID: 25246944
Microalgae; Chlorella ellipsoidea; Transcription factor; Lipid accumulation; RNA-seq; Acetyl-coenzyme A carboxylase
2.  The sequence and de novo assembly of the giant panda genome 
Li, Ruiqiang | Fan, Wei | Tian, Geng | Zhu, Hongmei | He, Lin | Cai, Jing | Huang, Quanfei | Cai, Qingle | Li, Bo | Bai, Yinqi | Zhang, Zhihe | Zhang, Yaping | Wang, Wen | Li, Jun | Wei, Fuwen | Li, Heng | Jian, Min | Li, Jianwen | Zhang, Zhaolei | Nielsen, Rasmus | Li, Dawei | Gu, Wanjun | Yang, Zhentao | Xuan, Zhaoling | Ryder, Oliver A. | Leung, Frederick Chi-Ching | Zhou, Yan | Cao, Jianjun | Sun, Xiao | Fu, Yonggui | Fang, Xiaodong | Guo, Xiaosen | Wang, Bo | Hou, Rong | Shen, Fujun | Mu, Bo | Ni, Peixiang | Lin, Runmao | Qian, Wubin | Wang, Guodong | Yu, Chang | Nie, Wenhui | Wang, Jinhuan | Wu, Zhigang | Liang, Huiqing | Min, Jiumeng | Wu, Qi | Cheng, Shifeng | Ruan, Jue | Wang, Mingwei | Shi, Zhongbin | Wen, Ming | Liu, Binghang | Ren, Xiaoli | Zheng, Huisong | Dong, Dong | Cook, Kathleen | Shan, Gao | Zhang, Hao | Kosiol, Carolin | Xie, Xueying | Lu, Zuhong | Zheng, Hancheng | Li, Yingrui | Steiner, Cynthia C. | Lam, Tommy Tsan-Yuk | Lin, Siyuan | Zhang, Qinghui | Li, Guoqing | Tian, Jing | Gong, Timing | Liu, Hongde | Zhang, Dejin | Fang, Lin | Ye, Chen | Zhang, Juanbin | Hu, Wenbo | Xu, Anlong | Ren, Yuanyuan | Zhang, Guojie | Bruford, Michael W. | Li, Qibin | Ma, Lijia | Guo, Yiran | An, Na | Hu, Yujie | Zheng, Yang | Shi, Yongyong | Li, Zhiqiang | Liu, Qing | Chen, Yanling | Zhao, Jing | Qu, Ning | Zhao, Shancen | Tian, Feng | Wang, Xiaoling | Wang, Haiyin | Xu, Lizhi | Liu, Xiao | Vinar, Tomas | Wang, Yajun | Lam, Tak-Wah | Yiu, Siu-Ming | Liu, Shiping | Zhang, Hemin | Li, Desheng | Huang, Yan | Wang, Xia | Yang, Guohua | Jiang, Zhi | Wang, Junyi | Qin, Nan | Li, Li | Li, Jingxiang | Bolund, Lars | Kristiansen, Karsten | Wong, Gane Ka-Shu | Olson, Maynard | Zhang, Xiuqing | Li, Songgang | Yang, Huanming | Wang, Jian | Wang, Jun
Nature  2009;463(7279):311-317.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
PMCID: PMC3951497  PMID: 20010809
3.  Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology 
BMC Genomics  2013;14(1):711.
Usually, next generation sequencing (NGS) technology has the property of ultra-high throughput but the read length is remarkably short compared to conventional Sanger sequencing. Paired-end NGS could computationally extend the read length but with a lot of practical inconvenience because of the inherent gaps. Now that Illumina paired-end sequencing has the ability of read both ends from 600 bp or even 800 bp DNA fragments, how to fill in the gaps between paired ends to produce accurate long reads is intriguing but challenging.
We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It tries to fill in the gaps between paired ends and could generate near error-free sequences equivalent to the conventional Sanger reads in length but with the high throughput of the Next Generation Sequencing. The major novelty of PS method lies on that the gap filling is based on local assembly of paired-end reads which have overlaps with at either end. Thus, we are able to fill in the gaps in repetitive genomic region correctly. The PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has 3 advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5 fold improvement over the existing de novo assembly methods and a 3 fold advantage over the assembly of long reads from 454 sequencing.
Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly could benefit a lot from these Sanger-like reads. Besides, the characteristic of the long reads could be applied to such applications as structural variations detection and metagenomics.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.
PMCID: PMC4046676  PMID: 24134808
Next-generation sequencing; Gap filling; Genome assembly
4.  Identification of medium-sized genomic deletions with low coverage, mate-paired restricted tags 
BMC Genomics  2013;14:51.
Genomic deletions are known to be widespread in many species. Variant sequencing-based approaches for identifying deletions have been developed, but their powers to detect those deletions that affect medium-sized regions are limited when the sequencing coverage is low.
We present a cost-effective method for identifying medium-sized deletions in genomic regions with low genomic coverage. Two mate-paired libraries were separately constructed from human cancerous tissue to generate paired short reads (ditags) from restriction fragments digested with a 4-base restriction enzyme. A total of 3 Gb of paired reads (1.0× genome size) was collected, and 175 deletions were inferred by identifying the ditags with disorder alignments to the reference genome sequence. Sanger sequencing results confirmed an overall detection accuracy of 95%. Good reproducibility was verified by the deletions that were detected by both libraries.
We provide an approach to accurately identify medium-sized deletions in large genomes with low sequence coverage. It can be applied in studies of comparative genomics and in the identification of germline and somatic variants.
PMCID: PMC3608957  PMID: 23347462
Medium-sized deletion; Restriction enzymes; Next generation sequencing; Structural variation
5.  Reference-Free Comparative Genomics of 174 Chloroplasts 
PLoS ONE  2012;7(11):e48995.
Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxanomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
PMCID: PMC3502452  PMID: 23185288
6.  Deciphering neo-sex and B chromosome evolution by the draft genome of Drosophila albomicans 
BMC Genomics  2012;13:109.
Drosophila albomicans is a unique model organism for studying both sex chromosome and B chromosome evolution. A pair of its autosomes comprising roughly 40% of the whole genome has fused to the ancient X and Y chromosomes only about 0.12 million years ago, thereby creating the youngest and most gene-rich neo-sex system reported to date. This species also possesses recently derived B chromosomes that show non-Mendelian inheritance and significantly influence fertility.
We sequenced male flies with B chromosomes at 124.5-fold genome coverage using next-generation sequencing. To characterize neo-Y specific changes and B chromosome sequences, we also sequenced inbred female flies derived from the same strain but without B's at 28.5-fold.
We assembled a female genome and placed 53% of the sequence and 85% of the annotated proteins into specific chromosomes, by comparison with the 12 Drosophila genomes. Despite its very recent origin, the non-recombining neo-Y chromosome shows various signs of degeneration, including a significant enrichment of non-functional genes compared to the neo-X, and an excess of tandem duplications relative to other chromosomes. We also characterized a B-chromosome linked scaffold that contains an actively transcribed unit and shows sequence similarity to the subcentromeric regions of both the ancient X and the neo-X chromosome.
Our results provide novel insights into the very early stages of sex chromosome evolution and B chromosome origination, and suggest an unprecedented connection between the births of these two systems in D. albomicans.
PMCID: PMC3353239  PMID: 22439699
Drosophila albomicans; neo-sex chromosome; B chromosome
7.  The diploid genome sequence of an Asian individual 
Nature  2008;456(7218):60-65.
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
PMCID: PMC2716080  PMID: 18987735
8.  The Sequence Alignment/Map format and SAMtools 
Bioinformatics  2009;25(16):2078-2079.
Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
PMCID: PMC2723002  PMID: 19505943
9.  TreeFam: 2008 Update 
Nucleic Acids Research  2007;36(Database issue):D735-D740.
TreeFam ( was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new perl API lets users easily extract data from the TreeFam mysql database.
PMCID: PMC2238856  PMID: 18056084
10.  PigGIS: Pig Genomic Informatics System 
Nucleic Acids Research  2006;35(Database issue):D654-D657.
Pig Genomic Information System (PigGIS) is a web-based depository of pig (Sus scrofa) genomic learning mainly engineered for biomedical research to locate pig genes from their human homologs and position single nucleotide polymorphisms (SNPs) in different pig populations. It utilizes a variety of sequence data, including whole genome shotgun (WGS) reads and expressed sequence tags (ESTs), and achieves a successful mapping solution to the low-coverage genome problem. With the data presently available, we have identified a total of 15 700 pig consensus sequences covering 18.5 Mb of the homologous human exons. We have also recovered 18 700 SNPs and 20 800 unique 60mer oligonucleotide probes for future pig genome analyses. PigGIS can be freely accessed via the web at and .
PMCID: PMC1669765  PMID: 17090590
11.  TreeFam: a curated database of phylogenetic trees of animal gene families 
Nucleic Acids Research  2005;34(Database issue):D572-D580.
TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins from UniProt; ∼40–85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available at and .
PMCID: PMC1347480  PMID: 16381935

Results 1-11 (11)