1.  The Seventh Asia Pacific Bioinformatics Conference (APBC2009) 
BMC Bioinformatics  2009;10(Suppl 1):S1.
PMCID: PMC2648764  PMID: 19208108
2.  OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds 
Nucleic Acids Research  2013;41(10):5149-5163.
A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (∼14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows–Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at
PMCID: PMC3664805  PMID: 23571760
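The seed-and-extend idea in the OLego abstract can be illustrated with a minimal sketch. This is not OLego's algorithm: it substitutes a plain k-mer hash index for the Burrows–Wheeler transform, handles only ungapped, unspliced alignment, and uses hypothetical names (`build_index`, `map_read`) and an illustrative mismatch limit.

```python
from collections import defaultdict

def build_index(ref, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def map_read(read, ref, index, k, max_mismatches=2):
    """Return sorted reference start positions where the read aligns ungapped."""
    hits = set()
    # Take non-overlapping seeds of length k along the read.
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(ref):
                continue
            # Extend: compare the full read against the reference window.
            window = ref[start:start + len(read)]
            if sum(a != b for a, b in zip(read, window)) <= max_mismatches:
                hits.add(start)
    return sorted(hits)

ref = "GATTACAGATTTCCAGGTACCA"
index = build_index(ref, 4)
print(map_read("CTGATTTCCAGG", ref, index, 4))
```

Using several short seeds means a read is still found when one seed contains a mismatch, which is the sensitivity argument the abstract makes for small seeds.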
3.  A common set of distinct features that characterize noncoding RNAs across multiple species 
Nucleic Acids Research  2014;43(1):104-114.
To find signature features shared by various ncRNA sub-types and characterize novel ncRNAs, we have developed a method, RNAfeature, to investigate >600 sets of genomic and epigenomic data with various evolutionary and biophysical scores. RNAfeature utilizes a fine-tuned intra-species wrapper algorithm that is followed by a novel feature selection strategy across species. It considers long distance effect of certain features (e.g. histone modification at the promoter region). We finally narrow down on 10 informative features (including sequences, structures, expression profiles and epigenetic signals). These features are complementary to each other and as a whole can accurately distinguish canonical ncRNAs from CDSs and UTRs (accuracies: >92% in human, mouse, worm and fly). Moreover, the feature pattern is conserved across multiple species. For instance, the supervised 10-feature model derived from animal species can predict ncRNAs in Arabidopsis (accuracy: 82%). Subsequently, we integrate the 10 features to define a set of noncoding potential scores, which can identify, evaluate and characterize novel noncoding RNAs. The score covers all transcribed regions (including unconserved ncRNAs), without requiring assembly of the full-length transcripts. Importantly, the noncoding potential allows us to identify and characterize potential functional domains with feature patterns similar to canonical ncRNAs (e.g. tRNA, snRNA, miRNA, etc) on ∼70% of human long ncRNAs (lncRNAs).
PMCID: PMC4288202  PMID: 25505163
4.  Chd5 orchestrates chromatin remodeling during sperm development 
Nature communications  2014;5:3812.
One of the most remarkable chromatin remodeling processes occurs during spermiogenesis, the post-meiotic phase of sperm development during which histones are replaced with sperm-specific protamines to repackage the genome into the highly compact chromatin structure of mature sperm. Here we identify Chromodomain helicase DNA binding protein 5 (Chd5) as a master regulator of the histone-to-protamine chromatin remodeling process. Chd5 deficiency leads to defective sperm chromatin compaction and male infertility in mice, mirroring the observation of low CHD5 expression in testes of infertile men. Chd5 orchestrates a cascade of molecular events required for histone removal and replacement, including histone 4 (H4) hyperacetylation, histone variant expression, nucleosome eviction, and DNA damage repair. Chd5 deficiency also perturbs expression of transition proteins (Tnp1/Tnp2) and protamines (Prm1/2). These findings define Chd5 as a multi-faceted mediator of histone-to-protamine replacement and depict the cascade of molecular events underlying chromatin remodeling during this process of extensive chromatin remodeling.
PMCID: PMC4151132  PMID: 24818823
5.  CDKN2A/p16 inactivation mechanisms and their relationship to smoke exposure and molecular features in non-small cell lung cancer 
CDKN2A(p16) inactivation is common in lung cancer and occurs via homozygous deletions (HD), methylation of promoter region, or point mutations. While p16 promoter methylation has been linked to KRAS mutation and smoking, the associations between p16 inactivation mechanisms and other common genetic mutations and smoking status are still controversial or unknown.
We determined all three p16 inactivation mechanisms using multiple methodologies for genomic status, methylation, RNA and protein expression, and correlated them with EGFR, KRAS, STK11 mutations and smoking status in 40 cell lines and 45 tumor samples of primary NSCLC. We also performed meta-analyses to investigate the impact of smoke exposure on p16 inactivation.
p16 inactivation was the major mechanism of RB pathway perturbation in NSCLC, with HD being the most frequent method, followed by methylation and the rarer point mutations. Inactivating mechanisms were tightly correlated with loss of mRNA and protein expression. p16 inactivation occurred at comparable frequencies regardless of mutational status of EGFR, KRAS and STK11, however, the major inactivation mechanism of p16 varied. p16 methylation was linked to KRAS mutation but was mutually exclusive with EGFR mutation. Cell lines and tumor samples demonstrated similar results. Our meta-analyses confirmed a modest positive association between p16 promoter methylation and smoking.
Our results confirm that all of the inactivation mechanisms are truly associated with loss of gene product and identify specific associations between p16 inactivation mechanisms and other genetic changes and smoking status.
PMCID: PMC3951422  PMID: 24077454
p16; CDKN2A; inactivation; homozygous deletion; methylation; lung cancer; adenocarcinoma; meta-analysis
6.  The Transcription Factor Foxo1 Controls Central Memory CD8+ T Cell Responses to Infection 
Immunity  2013;39(2):10.1016/j.immuni.2013.07.013.
Memory T cells protect hosts from pathogen reinfection, but how these cells emerge from a pool of antigen-experienced T cells is unclear. Here we show that mice lacking the transcription factor Foxo1 in activated CD8+ T cells had defective secondary, but not primary, responses to Listeria monocytogenes infection. Compared to short-lived effector T cells, memory precursor T cells expressed higher amounts of Foxo1, which promoted their generation and maintenance. Chromatin immunoprecipitation sequencing experiments revealed the transcription factor Tcf7 and the chemokine receptor Ccr7 as Foxo1-bound target genes, which have critical functions in central memory T cell differentiation and trafficking. These findings demonstrate that Foxo1 is selectively incorporated into the genetic program that regulates memory CD8+ T cell responses to infection.
PMCID: PMC3809840  PMID: 23932570
7.  Epigenomic Analysis of Multi-lineage Differentiation of Human Embryonic Stem Cells 
Cell  2013;153(5):1134-1148.
Epigenetic mechanisms have been proposed to play crucial roles in mammalian development, but their precise functions are only partially understood. To investigate epigenetic regulation of embryonic development, we differentiated human embryonic stem cells into mesendoderm, neural progenitor cells, trophoblast-like cells, and mesenchymal stem cells, and systematically characterized DNA methylation, chromatin modifications, and the transcriptome in each lineage. We found that promoters that are active in early developmental stages tend to be CG rich and mainly engage H3K27me3 upon silencing in non-expressing lineages. By contrast, promoters for genes expressed preferentially at later stages are often CG poor and primarily employ DNA methylation upon repression. Interestingly, the early developmental regulatory genes are often located in large genomic domains that are generally devoid of DNA methylation in most lineages, which we termed DNA methylation valleys (DMVs). Our results suggest that distinct epigenetic mechanisms regulate early and late stages of ES cell differentiation.
PMCID: PMC3786220  PMID: 23664764
8.  Assembly and Validation of Versatile Transcription Activator-Like Effector Libraries 
Scientific Reports  2014;4:4857.
The ability to perturb individual genes in genome-wide experiments has been instrumental in unraveling cellular and disease properties. Here we introduce, describe the assembly, and demonstrate the use of comprehensive and versatile transcription activator-like effector (TALE) libraries. As a proof of principle, we built an 11-mer library that covers all possible combinations of the nucleotides that determine the TALE-DNA binding specificity. We demonstrate the versatility of the methodology by constructing a constraint library, customized to bind to a known p53 motif. To verify the functionality in assays, we applied the 11-mer library in yeast-one-hybrid screens to discover TALEs that activate human SCN9A and miR-34b respectively. Additionally, we performed a genome-wide screen using the complete 11-mer library to confirm known genes that confer cycloheximide resistance in yeast. Considering the highly modular nature of TALEs and the versatility and ease of constructing these libraries we envision broad implications for high-throughput genomic assays.
PMCID: PMC4010924  PMID: 24798576
9.  ModuleRole: A Tool for Modulization, Role Determination and Visualization in Protein-Protein Interaction Networks 
PLoS ONE  2014;9(5):e94608.
Rapidly increasing amounts of (physical and genetic) protein-protein interaction (PPI) data are produced by various high-throughput techniques, and interpretation of these data remains a major challenge. To gain insight into the organization and structure of the resulting large, complex networks formed by interacting molecules, we developed ModuleRole, a user-friendly web server tool that uses simulated annealing, a method based on node connectivity, to find modules in a PPI network, define the role of every node, and produce files for visualization in Cytoscape and Pajek. For given proteins, it analyzes the PPI network from the BioGRID database, finds and visualizes the modules these proteins form, and then defines the role every node plays in this network based on two topological parameters, Participation Coefficient and Z-score. This is the first program that provides an interactive and friendly interface for biologists to find and visualize modules and roles of proteins in a PPI network. It can be tested online at the website, which is free and open to all users with no login requirement, with demo data provided by “User Guide” in the Help menu. A non-server version of this program should be considered for high-throughput data with more than 200 nodes or for users’ own interaction datasets. Users can bookmark the web link to the result page and access it at a later time. As an interactive and highly customizable application, ModuleRole requires no expert knowledge of graph theory on the user side and can be used on both Linux and Windows systems, making it a very useful tool for biologists to analyze and visualize PPI networks from databases such as BioGRID.
ModuleRole is implemented in Java and C, and is freely available at Supplementary information (user guide, demo data) is also available at this website. API for ModuleRole used for this program can be obtained upon request.
PMCID: PMC4006751  PMID: 24788790
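The two topological parameters named in the ModuleRole abstract can be sketched as follows. This assumes the commonly used definitions (within-module degree z-score, and participation coefficient P = 1 - sum over modules of (k_s / k)^2); the abstract does not spell out its formulas, and the function name `node_roles` and its inputs are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def node_roles(edges, module_of):
    """Return {node: (z, P)} for an undirected graph.

    edges: iterable of (u, v) pairs; module_of: dict node -> module label.
    Every node appearing in edges is assumed to be present in module_of.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Count links from each node into each module.
    links = {n: defaultdict(int) for n in module_of}
    for n in module_of:
        for m in neighbors[n]:
            links[n][module_of[m]] += 1

    # Collect within-module degrees, grouped by module.
    within = defaultdict(list)
    for n, mod in module_of.items():
        within[mod].append(links[n][mod])

    roles = {}
    for n, mod in module_of.items():
        k = len(neighbors[n])
        ks = within[mod]
        mean = sum(ks) / len(ks)
        sd = sqrt(sum((x - mean) ** 2 for x in ks) / len(ks))
        # z: how well connected the node is inside its own module.
        z = (links[n][mod] - mean) / sd if sd > 0 else 0.0
        # P: how evenly the node's links are spread across modules.
        p = 1.0 - sum((c / k) ** 2 for c in links[n].values()) if k else 0.0
        roles[n] = (z, p)
    return roles

# Two triangles joined by one edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
print(node_roles(edges, modules))
```

A node with all links inside its own module has P = 0, while a node whose links are spread across modules has P closer to 1.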
10.  Nucleosome eviction and multiple co-factor binding predict estrogen-receptor-alpha-associated long-range interactions 
Nucleic Acids Research  2014;42(11):6935-6944.
Many enhancers regulate their target genes via long-distance interactions. High-throughput experiments like ChIA-PET have been developed to map such largely cell-type-specific interactions between cis-regulatory elements genome-wide. In this study, we integrated multiple types of data in order to reveal the general hidden patterns embedded in the ChIA-PET data. We found characteristic distance features related to promoter–promoter, enhancer–enhancer and insulator–insulator interactions. Although a protein may have many binding sites along the genome, our hypothesis is that those sites that share certain open chromatin structure can accommodate relatively larger protein complexes consisting of specific regulatory and ‘bridging’ factors, and may be more likely to form robust long-range deoxyribonucleic acid (DNA) loops. This hypothesis was validated in the estrogen receptor alpha (ERα) ChIA-PET data. An efficient classifier was built to predict ERα-associated long-range interactions solely from the related ChIP-seq data, hence linking distal ERα-dependent enhancers to their target genes. We further applied the classifier to generate additional novel interactions, which were undetected in the original ChIA-PET paper but were validated by other independent experiments. Our work provides new insight into long-range chromatin interactions through deeper and integrative ChIA-PET data analysis and demonstrates DNA looping predictability from ordinary ChIP-seq data.
PMCID: PMC4066761  PMID: 24782518
11.  Computational comparison of two mouse draft genomes and the human golden path 
Genome Biology  2002;4(1):R1.
A comparison of the newly completed, publicly available, genome sequence of the mouse with the prior sequence of the mouse from Celera Genomics Inc. and with the human genome provides a consensus view of the mouse and important insights into human gene numbers.
The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.
We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes.
The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome clone (BAC) regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics.
PMCID: PMC151282  PMID: 12537546
12.  Characterizing the strand-specific distribution of non-CpG methylation in human pluripotent cells 
Nucleic Acids Research  2013;42(5):3009-3016.
DNA methylation is an important defense and regulatory mechanism. In mammals, most DNA methylation occurs at CpG sites, and asymmetric non-CpG methylation has only been detected at appreciable levels in a few cell types. We are the first to systematically study the strand-specific distribution of non-CpG methylation. With the divide-and-compare strategy, we show that CHG and CHH methylation are not intrinsically different in human embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs). We also find that non-CpG methylation is skewed between the two strands in introns, especially at intron boundaries and in highly expressed genes. Controlling for the proximal sequences of non-CpG sites, we show that the skew of non-CpG methylation in introns is mainly guided by sequence skew. By studying subgroups of transposable elements, we also found that non-CpG methylation is distributed in a strand-specific manner in both short interspersed nuclear elements (SINE) and long interspersed nuclear elements (LINE), but not in long terminal repeats (LTR). Finally, we show that on the antisense strand of Alus, a non-CpG site just downstream of the A-box is highly methylated. Together, the divide-and-compare strategy leads us to identify regions with strand-specific distributions of non-CpG methylation in humans.
PMCID: PMC3950701  PMID: 24343027
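The CHG and CHH contexts discussed in the abstract above follow the standard convention that H is any base other than G. A minimal sketch of strand-aware context classification is given below; the function name `cytosine_context` and the 0-based coordinate convention are assumptions made for illustration, not part of the paper's method.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def cytosine_context(genome, pos, strand="+"):
    """Classify the cytosine at 0-based position pos as CG, CHG or CHH.

    Returns None if there is no cytosine on that strand at pos, if the
    trinucleotide contains N, or if pos is too close to the sequence end.
    """
    seq = genome.upper()
    if strand == "+":
        tri = seq[pos:pos + 3]
    else:
        # On the minus strand, read the reverse complement ending at pos.
        start = pos - 2
        if start < 0:
            return None
        tri = seq[start:pos + 1].translate(COMPLEMENT)[::-1]
    if len(tri) < 3 or tri[0] != "C" or "N" in tri:
        return None
    if tri[1] == "G":
        return "CG"
    return "CHG" if tri[2] == "G" else "CHH"

genome = "ACGTCAGCTT"
print(cytosine_context(genome, 4, "+"), cytosine_context(genome, 6, "-"))
```

CG sites are symmetric between strands, whereas a CHH site on one strand has no cytosine opposite it, which is why non-CpG methylation can be analyzed per strand.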
13.  BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data 
BMC Genomics  2013;14:774.
DNA methylation is an important epigenetic modification involved in many biological processes. Bisulfite treatment coupled with high-throughput sequencing provides an effective approach for studying genome-wide DNA methylation at base resolution. Libraries such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, demanding efficient and versatile tools for aligning bisulfite sequencing data.
We have developed BS-Seeker2, an updated version of BS Seeker, as a full pipeline for mapping bisulfite sequencing data and generating DNA methylomes. BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS library by building special indexes with improved efficiency and accuracy. Moreover, BS-Seeker2 provides additional function for filtering out reads with incomplete bisulfite conversion, which is useful in minimizing the overestimation of DNA methylation levels. We also defined CGmap and ATCGmap file formats for full representations of DNA methylomes, as part of the outputs of BS-Seeker2 pipeline together with BAM and WIG files.
Our evaluations on the performance show that BS-Seeker2 works efficiently and accurately for both WGBS data and RRBS data. BS-Seeker2 is freely available at and the Galaxy server.
PMCID: PMC3840619  PMID: 24206606
DNA methylation; Bisulfite sequencing aligner; WGBS; RRBS; BS Seeker; Bisulfite conversion failure; Galaxy toolshed
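The idea of filtering reads with incomplete bisulfite conversion can be illustrated with a simple heuristic. Because CHH cytosines are rarely methylated in most mammalian cell types, a read that retains many of them is a candidate for conversion failure. The sketch below is not BS-Seeker2's implementation: the function name `looks_unconverted`, the ungapped forward-strand input, and both thresholds are assumptions chosen for illustration.

```python
def looks_unconverted(read, ref, min_sites=5, min_ratio=0.5):
    """Flag a forward-strand read whose CHH cytosines are mostly unconverted.

    read, ref: equal-length, ungapped strings (read aligned to ref).
    min_sites, min_ratio: illustrative thresholds, not taken from any tool.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must have equal length")
    read, ref = read.upper(), ref.upper()
    kept = total = 0
    for i in range(len(ref) - 2):
        # CHH context: C followed by two non-G bases.
        if ref[i] == "C" and ref[i + 1] in "ACT" and ref[i + 2] in "ACT":
            if read[i] in "CT":  # ignore sequencing errors at the site
                total += 1
                kept += read[i] == "C"  # C retained means not converted
    return total > 0 and kept >= min_sites and kept / total >= min_ratio

ref = "CAT" * 6
print(looks_unconverted(ref, ref), looks_unconverted("TAT" * 6, ref))
```

Removing such reads before calling methylation avoids counting unconverted cytosines as methylated, which is the overestimation the abstract refers to.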
14.  Integrated omics study delineates the dynamics of lipid droplets in Rhodococcus opacus PD630 
Nucleic Acids Research  2013;42(2):1052-1064.
Rhodococcus opacus strain PD630 (R. opacus PD630) is an oleaginous bacterium and one of the few prokaryotic organisms that contain lipid droplets (LDs). The LD is an important organelle for lipid storage and for intercellular communication regarding energy metabolism, yet it remains poorly understood. To understand the dynamics of LDs using a simple model organism, we conducted a series of comprehensive omics studies of R. opacus PD630 including complete genome, transcriptome and proteome analysis. The genome of R. opacus PD630 encodes 8947 genes that are significantly enriched in lipid transport, synthesis and metabolism, indicating a strong capacity for carbon source biosynthesis and catabolism. The comparative transcriptome analysis from three culture conditions revealed the landscape of altered gene expression responsible for lipid accumulation. The LD proteomes further identified the proteins that mediate lipid synthesis, storage and other biological functions. Integrating these three omics uncovered 177 proteins that may be involved in lipid metabolism and LD dynamics. An LD structure-like protein, LPD06283, was further verified to affect LD morphology. Our omics studies provide not only the first integrated omics study of the prokaryotic LD organelle, but also a systematic platform for facilitating further prokaryotic LD research and biofuel development.
PMCID: PMC3902926  PMID: 24150943
15.  Neural Potential of a Stem Cell Population in the Hair Follicle 
Cell cycle (Georgetown, Tex.)  2007;6(17):2161-2170.
The bulge region of the hair follicle serves as a repository for epithelial stem cells that can regenerate the follicle in each hair growth cycle and contribute to epidermis regeneration upon injury. Here we describe a population of multipotential stem cells in the hair follicle bulge region; these cells can be identified by fluorescence in transgenic nestin-GFP mice. The morphological features of these cells suggest that they maintain close associations with each other and with the surrounding niche. Upon explantation, these cells can give rise to neurosphere-like structures in vitro. When these cells are permitted to differentiate, they produce several cell types, including cells with neuronal, astrocytic, oligodendrocytic, smooth muscle, adipocytic, and other phenotypes. Furthermore, upon implantation into the developing nervous system of chick, these cells generate neuronal cells in vivo. We used transcriptional profiling to assess the relationship between these cells and embryonic and postnatal neural stem cells and to compare them with other stem cell populations of the bulge. Our results show that nestin-expressing cells in the bulge region of the hair follicle have stem cell-like properties, are multipotent, and can effectively generate cells of neural lineage in vitro and in vivo.
PMCID: PMC3789384  PMID: 17873521
stem cells; hair follicle; bulge; neurogenesis; transcriptional profiling
16.  Chromatin state and microRNA determine different gene expression dynamics responsive to TNF stimulation 
Genomics  2012;100(5):297-302.
Gene expression is a dynamic process, and what factors influence gene expression changes upon external stimulus have not been clearly understood. We studied gene expression profiles in human umbilical vein endothelial cells (HUVEC) after the Tumor Necrosis Factor (TNF) stimulus, and found that: the promoters of fast-response up-regulated genes were enriched with several “active” chromatin markers like H3K27ac and H3K4me3, and also preferentially bound by Pol II and c-Myc; the core-promoter regions of slow-response up-regulated genes were frequently occupied by nucleosomes; down-regulated genes were more intensively regulated by microRNAs. Moreover, the Gene Ontology and motif analysis of the promoter regions revealed that gene clusters with different response behaviors had different functions and were regulated by different sets of transcription factors. Our observations suggested that the different gene expression patterns upon external stimulus were regulated by a combination of multi-layer regulators.
PMCID: PMC3771509  PMID: 22824656
TNF; Gene expression profiles; Chromatin; Histone code; MicroRNAs
17.  Novel Foxo1–dependent transcriptional programs control Treg cell function 
Nature  2012;491(7425):554-559.
Regulatory T (Treg) cells, characterized by expression of the transcription factor forkhead box P3 (Foxp3), maintain immune homeostasis by suppressing self-destructive immune responses1–4. Foxp3 operates as a late-acting differentiation factor controlling Treg cell homeostasis and function5, whereas the early Treg-cell-lineage commitment is regulated by the Akt kinase and the forkhead box O (Foxo) family of transcription factors6–10. However, whether Foxo proteins act beyond the Treg-cell-commitment stage to control Treg cell homeostasis and function remains largely unexplored. Here we show that Foxo1 is a pivotal regulator of Treg cell function. Treg cells express high amounts of Foxo1 and display reduced T-cell-receptor-induced Akt activation, Foxo1 phosphorylation and Foxo1 nuclear exclusion. Mice with Treg-cell-specific deletion of Foxo1 develop a fatal inflammatory disorder similar in severity to that seen in Foxp3-deficient mice, but without the loss of Treg cells. Genome-wide analysis of Foxo1 binding sites reveals ~300 Foxo1-bound target genes, including the pro-inflammatory cytokine Ifng, that do not seem to be directly regulated by Foxp3. These findings show that the evolutionarily ancient Akt–Foxo1 signalling module controls a novel genetic program indispensable for Treg cell function.
PMCID: PMC3771531  PMID: 23135404
18.  FastDMA: An Infinium HumanMethylation450 Beadchip Analyzer 
PLoS ONE  2013;8(9):e74275.
DNA methylation is vital for many essential biological processes and human diseases. Illumina Infinium HumanMethylation450 Beadchip is a recently developed platform for studying genome-wide DNA methylation state on more than 480,000 CpG sites and a few CHG sites with high data quality. To analyze the data of this promising platform, we developed FastDMA, which can be used to identify significantly differentially methylated probes. Besides single probe analysis, FastDMA can also do region-based analysis for identifying differentially methylated regions (DMRs). A unified statistical model, analysis of covariance (ANCOVA), is used for all the analyses in FastDMA. We apply FastDMA to three large-scale DNA methylation datasets from The Cancer Genome Atlas (TCGA) and find many differentially methylated genomic sites in different types of cancer. On the testing datasets, FastDMA shows much higher computational efficiency than current tools. FastDMA can benefit the data analyses of large-scale DNA methylation studies with an integrative pipeline and a high computational efficiency. The software is freely available via
PMCID: PMC3764200  PMID: 24040221
19.  Epigenomic Analysis of Multi-lineage Differentiation of Human Embryonic Stem Cells 
Epigenetic mechanisms have been proposed as crucial for regulating mammalian development, but their precise function is only partially understood. To investigate the epigenetic control of embryonic development, we differentiated human embryonic stem cells into mesendoderm, neural progenitor cells, trophoblast-like cells, and mesenchymal stem cells and systematically characterized DNA methylation, chromatin modifications, and the transcriptome in each lineage. Strikingly, we found that promoters that are active in early developmental stages tend to be CG rich and mainly engage H3K27me3 upon silencing in non-expressing lineages. By contrast, promoters for genes expressed preferentially at later stages are often CG poor and employ DNA methylation upon repression. Interestingly, the early developmental regulatory genes are often located in large genomic domains that are generally devoid of DNA methylation in most lineages, which we termed DNA methylation valleys (DMVs). Our results suggest that distinct epigenetic mechanisms regulate early and late stages of ES cell differentiation.
PMCID: PMC3635352
20.  Cell-type based analysis of microRNA profiles in the mouse brain 
Neuron  2012;73(1):35-48.
MicroRNAs (miRNA) are implicated in brain development and function but the underlying mechanisms have been difficult to study in part due to the cellular heterogeneity in neural circuits. To systematically analyze miRNA expression in neurons, we have established a miRNA tagging and affinity purification (miRAP) method that is targeted to cell types through the Cre-loxP binary system in mice. Our studies of the neocortex and cerebellum reveal the expression of a large fraction of known miRNAs with distinct profiles in glutamatergic and GABAergic neurons, and subtypes of GABAergic neurons. We further detected putative novel miRNAs, tissue or cell type-specific strand selection of miRNAs, and miRNA editing. Our method thus will facilitate a systematic analysis of miRNA expression and regulation in specific neuron types in the context of neuronal development, physiology, plasticity, pathology and disease models, and is generally applicable to other cell types and tissues.
PMCID: PMC3270494  PMID: 22243745
21.  New Fusion Transcripts Identified in Normal Karyotype Acute Myeloid Leukemia 
PLoS ONE  2012;7(12):e51203.
Genetic aberrations contribute to acute myeloid leukemia (AML). However, half of AML cases do not contain the well-known aberrations detectable mostly by cytogenetic analysis, and these cases are classified as normal karyotype AML. Different outcomes of normal karyotype AML suggest that this subgroup of AML could be genetically heterogeneous, but the lack of genetic markers makes it difficult to study this subgroup further. Using a paired-end RNAseq method, we performed a transcriptome analysis in 45 AML cases including 29 normal karyotype AML, 8 abnormal karyotype AML and 8 AML without karyotype information. Our study identified 134 fusion transcripts, all of which were formed between partner genes adjacent on the same chromosome and distributed at different frequencies in the AML cases. Seven fusions are exclusively present in normal karyotype AML, and the remaining fusions are shared between normal karyotype AML and abnormal karyotype AML. CIITA, a master regulator of MHC class II gene expression that is truncated in B-cell lymphoma and Hodgkin disease, is found to fuse with DEXI in 48% of normal karyotype AML cases. The fusion transcripts formed between adjacent genes highlight the possibility that certain such fusions could be involved in the oncological process in AML, and provide a new source to identify genetic markers for normal karyotype AML.
PMCID: PMC3520980  PMID: 23251452
22.  SpliceTrap: a method to quantify alternative splicing under single cellular conditions 
Bioinformatics  2011;27(21):3010-3016.
Motivation: Alternative splicing (AS) is a pre-mRNA maturation process leading to the expression of multiple mRNA variants from the same primary transcript. More than 90% of human genes are expressed via AS. Therefore, quantifying the inclusion level of every exon is crucial for generating accurate transcriptomic maps and studying the regulation of AS.
Results: Here we introduce SpliceTrap, a method to quantify exon inclusion levels using paired-end RNA-seq data. Unlike other tools, which focus on full-length transcript isoforms, SpliceTrap approaches the expression-level estimation of each exon as an independent Bayesian inference problem. In addition, SpliceTrap can identify major classes of alternative splicing events under a single cellular condition, without requiring a background set of reads to estimate relative splicing changes. We tested SpliceTrap both by simulation and real data analysis, and compared it to state-of-the-art tools for transcript quantification. SpliceTrap demonstrated improved accuracy, robustness and reliability in quantifying exon-inclusion ratios.
Conclusions: SpliceTrap is a useful tool to study alternative splicing regulation, especially for accurate quantification of local exon-inclusion ratios from RNA-seq data.
Availability and Implementation: SpliceTrap can be implemented online through the CSH Galaxy server and is also available for download and installation at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3198574  PMID: 21896509
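For orientation, the quantity SpliceTrap estimates, the exon inclusion ratio, can be approximated naively from junction read counts. The sketch below is not SpliceTrap's Bayesian model: it is a plain count ratio, and the function name `inclusion_ratio` and its three inputs are hypothetical.

```python
def inclusion_ratio(upstream_jn, downstream_jn, skip_jn):
    """Naive exon inclusion ratio from junction-spanning read counts.

    upstream_jn, downstream_jn: reads spanning the two junctions that
    include the exon; skip_jn: reads spanning the exon-skipping junction.
    Returns None when no informative reads are available.
    """
    # Inclusion is supported by two junctions, skipping by one, so the
    # inclusion counts are averaged to keep the two sides comparable.
    inclusion = (upstream_jn + downstream_jn) / 2
    total = inclusion + skip_jn
    if total == 0:
        return None
    return inclusion / total

print(inclusion_ratio(30, 50, 10))
```

A raw ratio like this is unstable at low coverage, which is one motivation for treating each exon as a statistical inference problem as the abstract describes.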
23.  Regulatory elements of Caenorhabditis elegans ribosomal protein genes 
BMC Genomics  2012;13:433.
Ribosomal protein genes (RPGs) are essential, tightly regulated, and highly expressed during embryonic development and cell growth. Even though their protein sequences are strongly conserved, their mechanism of regulation is not conserved across yeast, Drosophila, and vertebrates. A recent investigation of genomic sequences conserved across both nematode species and associated with different gene groups indicated the existence of several elements in the upstream regions of C. elegans RPGs, providing a new insight regarding the regulation of these genes in C. elegans.
In this study, we performed an in-depth examination of C. elegans RPG regulation and found nine highly conserved motifs in the upstream regions of C. elegans RPGs using the motif discovery algorithm DME. Four motifs were partially similar to transcription factor binding sites from C. elegans, Drosophila, yeast, and human. One pair of these motifs was found to co-occur in the upstream regions of 250 transcripts including 22 RPGs. The distance between the two motifs displayed a complex frequency pattern that was related to their relative orientation.
We tested the impact of three of these motifs on the expression of rpl-2 using a series of reporter gene constructs and showed that all three motifs are necessary to maintain the high natural expression level of this gene. One of the motifs was similar to the binding site of an orthologue of POP-1, and we showed that RNAi knockdown of pop-1 impacts the expression of rpl-2. We further determined the transcription start site of rpl-2 by 5′ RACE and found that the motifs lie 40–90 bases upstream of the start site. We also found evidence that a noncoding RNA, contained within the outron of rpl-2, is co-transcribed with rpl-2 and cleaved during trans-splicing.
Our results indicate that C. elegans RPGs are regulated by a complex novel series of regulatory elements that is evolutionarily distinct from those of all other species examined up until now.
PMCID: PMC3575287  PMID: 22928635
24.  Bivalent-Like Chromatin Markers Are Predictive for Transcription Start Site Distribution in Human 
PLoS ONE  2012;7(6):e38112.
Deep sequencing of 5′ capped transcripts has revealed a variety of transcription initiation patterns, from narrow, focused promoters to wide, broad promoters. Attempts have already been made to model empirically classified patterns, but virtually no quantitative models for transcription initiation have been reported. Even though both genetic and epigenetic elements have been associated with such patterns, the organization of regulatory elements is largely unknown. Here, linear regression models were derived from a pool of regulatory elements, including genomic DNA features, nucleosome organization, and histone modifications, to predict the distribution of transcription start sites (TSS). Importantly, models including both active and repressive histone modification markers, e.g. H3K4me3 and H4K20me1, were consistently found to be much more predictive than models with only single-type histone modification markers, indicating the possibility of “bivalent-like” epigenetic control of transcription initiation. The nucleosome positions are proposed to be coded in the active component of such bivalent-like histone modification markers. Finally, we demonstrated that models trained on one cell type could successfully predict TSS distribution in other cell types, suggesting that these models may have a broader application range.
PMCID: PMC3387189  PMID: 22768038
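The comparison described in this abstract, a linear model with both an active and a repressive marker versus a model with a single marker type, can be sketched with ordinary least squares. Everything below is an assumption for illustration: the helper names, the toy signal values, and the response variable (a hypothetical TSS-spread score) are invented and are not taken from the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, target):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(features, target, coefs):
    """Fraction of variance in target explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in features]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - p) ** 2 for y, p in zip(target, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


# Toy, made-up signals for six promoters (not real data).
active = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # stand-in for an active mark
repressive = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]  # stand-in for a repressive mark
# Hypothetical response constructed to depend on both marks.
tss_spread = [0.5 + 2.0 * a - r for a, r in zip(active, repressive)]

both = [[a, r] for a, r in zip(active, repressive)]
active_only = [[a] for a in active]

coefs_both = fit_ols(both, tss_spread)
r2_both = r_squared(both, tss_spread, coefs_both)
r2_active = r_squared(active_only, tss_spread, fit_ols(active_only, tss_spread))
```

Because the toy response is built from both signals, the two-marker fit explains more variance than the single-marker fit. This shows only the mechanics of the comparison and says nothing about the reported results.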
25.  The pro-longevity gene FoxO3 is a direct target of the p53 tumor suppressor 
Oncogene  2011;30(29):3207-3221.
FoxO transcription factors play a conserved role in longevity and act as tissue-specific tumor suppressors in mammals. Several nodes of interaction have been identified between FoxO transcription factors and p53, a major tumor suppressor in humans and mice. However, the extent and importance of the functional interaction between FoxO and p53 have not been fully explored. Here, we show that p53 transactivates the expression of FoxO3, one of the four mammalian FoxO genes, in response to DNA damaging agents in both mouse embryonic fibroblasts and in thymocytes. We show that p53 transactivates FoxO3 in cells by binding to a site in the second intron of the FoxO3 gene, a genomic region recently found to be associated with extreme longevity in humans. While FoxO3 is not necessary for p53-dependent cell cycle arrest, FoxO3 appears to modulate p53-dependent apoptosis. We also find that FoxO3 loss does not interact with p53 loss for tumor development in vivo, although the tumor spectrum of p53 deficient mice may be affected by FoxO3 loss. Our findings indicate that FoxO3 is a p53 target gene, and suggest that FoxO3 and p53 are part of a regulatory transcriptional network that may play an important role during aging and cancer.
PMCID: PMC3136551  PMID: 21423206
