Results 1-25 (816)

1.  H2B ubiquitination regulates meiotic recombination by promoting chromatin relaxation 
Nucleic Acids Research  2016;44(20):9681-9697.
Meiotic recombination is essential for fertility in most sexually reproducing species, but the molecular mechanisms underlying this process remain poorly understood in mammals. Here, we show that RNF20-mediated H2B ubiquitination is required for meiotic recombination. A germ cell-specific knockout of the H2B ubiquitination E3 ligase RNF20 results in complete male infertility. The Stra8-Rnf20−/− spermatocytes arrest at the pachytene stage because of impaired programmed double-strand break (DSB) repair. Further investigations reveal that the depletion of RNF20 in the germ cells affects chromatin relaxation, thus preventing programmed DSB repair factors from being recruited to proper positions on the chromatin. The gametogenetic defects of the H2B ubiquitination deficient cells could be partially rescued by forced chromatin relaxation. Taken together, our results demonstrate that RNF20/Bre1p-mediated H2B ubiquitination regulates meiotic recombination by promoting chromatin relaxation, and suggest an old drug may provide a new way to treat some oligo- or azoospermia patients with chromatin relaxation disorders.
PMCID: PMC5175339  PMID: 27431324
2.  A SnoRNA-derived piRNA interacts with human interleukin-4 pre-mRNA and induces its decay in nuclear exosomes 
Nucleic Acids Research  2015;43(21):10474-10491.
PIWI interacting RNAs (piRNAs) are highly expressed in germline cells and are involved in maintaining genome integrity by silencing transposons. These are also involved in DNA/histone methylation and gene expression regulation in somatic cells of invertebrates. The functions of piRNAs in somatic cells of vertebrates, however, remain elusive. We found that snoRNA-derived and C (C′)/D′ (D)-box conserved piRNAs are abundant in human CD4 primary T-lymphocytes. piRNA (piR30840) significantly downregulated interleukin-4 (IL-4) via sequence complementarity binding to pre-mRNA intron, which subsequently inhibited the development of Th2 T-lymphocytes. Piwil4 and Ago4 are associated with this piRNA, and this complex further interacts with Trf4-Air2-Mtr4 Polyadenylation (TRAMP) complex, which leads to the decay of targeted pre-mRNA through nuclear exosomes. Taken together, we demonstrate a novel piRNA mechanism in regulating gene expression in highly differentiated somatic cells and a possible novel target for allergy therapeutics.
PMCID: PMC4666397  PMID: 26405199
3.  AnimalTFDB 2.0: a resource for expression, prediction and functional study of animal transcription factors 
Nucleic Acids Research  2014;43(Database issue):D76-D81.
Transcription factors (TFs) are key regulators of gene expression. Here we updated the animal TF database AnimalTFDB to version 2.0. Using the improved prediction pipeline, we identified 72 336 TF genes, 21 053 transcription co-factor genes and 6502 chromatin remodeling factor genes from 65 species covering main animal lineages. Besides the abundant annotations (basic information, gene model, protein functional domain, gene ontology, pathway, protein interaction, ortholog and paralog, etc.) in the previous version, we added several new features and functions in the updated version. These new features are: (i) gene expression from RNA-Seq for nine model species, (ii) gene phenotype information, (iii) multiple sequence alignment of TF DNA-binding domains, and the weblogo and phylogenetic tree based on the alignment, (iv) a TF prediction server to identify new TFs from input sequences and (v) a BLAST server to search against TFs in AnimalTFDB. A new web interface was designed for AnimalTFDB 2.0, allowing users to browse and search all data in the database. We aim to maintain the AnimalTFDB as a solid resource for TF identification and studies of transcription regulation and comparative genomics.
PMCID: PMC4384004  PMID: 25262351
4.  AMPK regulates histone H2B O-GlcNAcylation 
Nucleic Acids Research  2014;42(9):5594-5604.
Histone H2B O-GlcNAcylation is an important post-translational modification of chromatin during gene transcription. However, how this epigenetic modification is regulated remains unclear. Here we found that the energy-sensing adenosine-monophosphate-activated protein kinase (AMPK) could suppress histone H2B O-GlcNAcylation. AMPK directly phosphorylates O-linked β-N-acetylglucosamine (O-GlcNAc) transferase (OGT). Although this phosphorylation does not regulate the enzymatic activity of OGT, it inhibits OGT–chromatin association, histone O-GlcNAcylation and gene transcription. Conversely, OGT also O-GlcNAcylates AMPK and positively regulates AMPK activity, creating a feedback loop. Taken together, these results reveal a crosstalk between the LKB1-AMPK and the hexosamine biosynthesis (HBP)-OGT pathways, which coordinate together for the sensing of nutrient state and regulation of gene transcription.
PMCID: PMC4027166  PMID: 24692660
5.  Integrated omics study delineates the dynamics of lipid droplets in Rhodococcus opacus PD630 
Nucleic Acids Research  2013;42(2):1052-1064.
Rhodococcus opacus strain PD630 (R. opacus PD630) is an oleaginous bacterium and one of the few prokaryotic organisms that contain lipid droplets (LDs). The LD is an important organelle for lipid storage and for intercellular communication regarding energy metabolism, yet it remains poorly understood. To understand the dynamics of LD using a simple model organism, we conducted a series of comprehensive omics studies of R. opacus PD630 including complete genome, transcriptome and proteome analysis. The genome of R. opacus PD630 encodes 8947 genes that are significantly enriched in lipid transport, synthesis and metabolism, indicating a strong capacity for carbon source biosynthesis and catabolism. The comparative transcriptome analysis from three culture conditions revealed the landscape of altered gene expression responsible for lipid accumulation. The LD proteomes further identified the proteins that mediate lipid synthesis, storage and other biological functions. Integrating these three omics uncovered 177 proteins that may be involved in lipid metabolism and LD dynamics. An LD structure-like protein, LPD06283, was further verified to affect the LD morphology. Our omics studies provide not only the first integrated omics study of the prokaryotic LD organelle, but also a systematic platform for facilitating further prokaryotic LD research and biofuel development.
PMCID: PMC3902926  PMID: 24150943
6.  Zinc-finger-nucleases mediate specific and efficient excision of HIV-1 proviral DNA from infected and latently infected human T cells 
Nucleic Acids Research  2013;41(16):7771-7782.
HIV-infected individuals currently cannot be completely cured because existing antiviral therapy regimens do not address HIV provirus DNA, flanked by long terminal repeats (LTRs), already integrated into host genome. Here, we present a possible alternative therapeutic approach to specifically and directly mediate deletion of the integrated full-length HIV provirus from infected and latently infected human T cell genomes by using specially designed zinc-finger nucleases (ZFNs) to target a sequence within the LTR that is well conserved across all clades. We designed and screened one pair of ZFN to target the highly conserved HIV-1 5′-LTR and 3′-LTR DNA sequences, named ZFN-LTR. We found that ZFN-LTR can specifically target and cleave the full-length HIV-1 proviral DNA in several infected and latently infected cell types and also HIV-1 infected human primary cells in vitro. We observed that the frequency of excision was 45.9% in infected human cell lines after treatment with ZFN-LTR, without significant host-cell genotoxicity. Taken together, our data demonstrate that a single ZFN-LTR pair can specifically and effectively cleave integrated full-length HIV-1 proviral DNA and mediate antiretroviral activity in infected and latently infected cells, suggesting that this strategy could offer a novel approach to eradicate the HIV-1 virus from the infected host in the future.
PMCID: PMC3763554  PMID: 23804764
7.  High-efficiency and heritable gene targeting in mouse by transcription activator-like effector nucleases 
Nucleic Acids Research  2013;41(11):e120.
Transcription activator-like effector nucleases (TALENs) are a powerful new approach for targeted gene disruption in various animal models, but little is known about their activities in Mus musculus, the widely used mammalian model organism. Here, we report that direct injection of in vitro transcribed messenger RNA of TALEN pairs into mouse zygotes induced somatic mutations, which were stably passed to the next generation through germ-line transmission. With one TALEN pair constructed for each of 10 target genes, mutant F0 mice for each gene were obtained, with mutation rates ranging from 13 to 67% and averaging ∼40% of total healthy newborns, with no significant differences between the C57BL/6 and FVB/N genetic backgrounds. One TALEN pair with a single mismatch to its intended target sequence on each side failed to yield any mutation. Furthermore, highly efficient germ-line transmission was obtained, as all the F0 founders tested transmitted the mutations to F1 mice. In addition, we observed that one bi-allelic mutant founder of the Lepr gene, encoding the Leptin receptor, had a diabetic phenotype similar to that of the db/db mouse. Together, our results suggest that TALENs are an effective genetic tool for rapid gene disruption with high efficiency and heritability in mice with distinct genetic backgrounds.
PMCID: PMC3675477  PMID: 23630316
8.  CADgene: a comprehensive database for coronary artery disease genes 
Nucleic Acids Research  2010;39(Database issue):D991-D996.
Coronary artery disease (CAD) is a complex, multifactorial disease and a leading cause of mortality worldwide. Over the past decades, great efforts have been made to elucidate the underlying genetic basis of CAD and massive data have been accumulated. To integrate these data and to provide a useful resource for researchers, we developed CADgene, a comprehensive database for CAD genes. We manually extracted CAD-related evidence for more than 300 candidate genes for CAD from over 1300 publications of genetic studies. We classified these candidate genes into 12 functional categories based on their roles in CAD. For each gene, we extracted detailed information from related studies (e.g. the size of case–control, population, SNP, odds ratio, P-value, etc.) and made useful annotations, which include general gene information, Gene Ontology annotations, KEGG pathways, protein–protein interactions and others. Besides the statistical number of studies for each gene, CADgene also provides tools to search and show the most frequently studied candidate genes. In addition, CADgene provides cumulative data from 11 publications of CAD-related genome-wide association studies. CADgene has a user-friendly web interface with multiple browse and search functions. It is freely available online.
PMCID: PMC3013698  PMID: 21045063
9.  Human Bex2 interacts with LMO2 and regulates the transcriptional activity of a novel DNA-binding complex 
Nucleic Acids Research  2005;33(20):6555-6565.
Human Bex2 (brain expressed X-linked, hBex2) is highly expressed in the embryonic brain, but its function remains unknown. We have identified that LMO2, a LIM-domain containing transcriptional factor, specifically interacts with hBex2 but not with mouse Bex1 and Bex2. The interaction was confirmed both by pull-down with GST-hBex2 and by coimmunoprecipitation assays in vivo. Using electrophoretic mobility shift assay, we have demonstrated the physical interaction of hBex2 and LMO2 as part of a DNA-binding protein complex. We have also shown that hBex2 can enhance the transcriptional activity of LMO2 in vivo. Furthermore, using mammalian two-hybrid analysis, we have identified a neuronal bHLH protein, NSCL2, as a novel binding partner for LMO2. We then showed that LMO2 could up-regulate NSCL2-dependent transcriptional activity, and hBex2 augmented this effect. Thus, hBex2 may act as a specific regulator during embryonic development by modulating the transcriptional activity of a novel E-box sequence-binding complex that contains hBex2, LMO2, NSCL2 and LDB1.
PMCID: PMC1298925  PMID: 16314316
10.  A suite of web-based programs to search for transcriptional regulatory motifs 
Nucleic Acids Research  2004;32(Web Server issue):W204-W207.
The identification of regulatory motifs is important for the study of gene expression. Here we present a suite of programs that we have developed to search for regulatory sequence motifs: (i) BioProspector, a Gibbs-sampling-based program for predicting regulatory motifs from co-regulated genes in prokaryotes or lower eukaryotes; (ii) CompareProspector, an extension to BioProspector which incorporates comparative genomics features to be used for higher eukaryotes; (iii) MDscan, a program for finding protein–DNA interaction sites from ChIP-on-chip targets. All three programs examine a group of sequences that may share common regulatory motifs and output a list of putative motifs as position-specific probability matrices, the individual sites used to construct the motifs and the location of each site on the input sequences. The web servers and executables can be accessed online.
PMCID: PMC441599  PMID: 15215381
11.  Cistrome Data Browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse 
Nucleic Acids Research  2016;45(Database issue):D658-D662.
Chromatin immunoprecipitation, DNase I hypersensitivity and transposase-accessibility assays combined with high-throughput sequencing enable the genome-wide study of chromatin dynamics, transcription factor binding and gene regulation. Although rapidly accumulating publicly available ChIP-seq, DNase-seq and ATAC-seq data are a valuable resource for the systematic investigation of gene regulation processes, a lack of standardized curation, quality control and analysis procedures has hindered extensive reuse of these data. To overcome this challenge, we built the Cistrome database, a collection of ChIP-seq and chromatin accessibility data (DNase-seq and ATAC-seq) published before January 1, 2016, including 13 366 human and 9953 mouse samples. All the data have been carefully curated and processed with a streamlined analysis pipeline and evaluated with comprehensive quality control metrics. We have also created a user-friendly web server for data query, exploration and visualization. The resulting Cistrome DB (Cistrome Data Browser), available online, is expected to become a valuable resource for transcriptional and epigenetic regulation studies.
PMCID: PMC5210658  PMID: 27789702
12.  LNCediting: a database for functional effects of RNA editing in lncRNAs 
Nucleic Acids Research  2016;45(Database issue):D79-D84.
RNA editing is a widespread post-transcriptional mechanism that can make a single base change at a specific nucleotide position in an RNA transcript. RNA editing events can result in missense codon changes and modulation of alternative splicing in mRNA, and modification of regulatory RNAs and their binding sites in noncoding RNAs. Recent computational studies accurately detected more than 2 million A-to-I RNA editing sites from next-generation sequencing (NGS). However, the vast majority of these RNA editing sites have unknown functions and are in noncoding regions of the genome. To provide a useful resource for the functional effects of RNA editing in long noncoding RNAs (lncRNAs), we systematically analyzed the A-to-I editing sites in lncRNAs across human, rhesus, mouse, and fly, and observed an appreciable number of RNA editing sites which can significantly impact the secondary structures of lncRNAs and lncRNA–miRNA interactions. All the data were compiled into LNCediting, a user-friendly database. LNCediting provides customized tools to predict functional effects of novel editing sites in lncRNAs. We hope that it will become an important resource for exploring functions of RNA editing sites in lncRNAs.
PMCID: PMC5210611  PMID: 27651464
13.  Evoking picomolar binding in RNA by a single phosphorodithioate linkage 
Nucleic Acids Research  2016;44(17):8052-8064.
RNA aptamers are synthetic oligonucleotide-based affinity molecules that utilize unique three-dimensional structures for their affinity and specificity to a target such as a protein. They hold the promise of numerous advantages over biologically produced antibodies; however, the binding affinity and specificity of RNA aptamers are often insufficient for successful implementation in diagnostic assays or as therapeutic agents. Strong binding affinity is important to improve the downstream applications. We report here the use of the phosphorodithioate (PS2) substitution on a single nucleotide of RNA aptamers to dramatically improve target binding affinity by ∼1000-fold (from nanomolar to picomolar). An X-ray co-crystal structure of the α-thrombin:PS2-aptamer complex reveals a localized induced-fit rearrangement of the PS2-containing nucleotide which leads to enhanced target interaction. High-level quantum mechanical calculations for model systems that mimic the PS2 moiety and phenylalanine demonstrate that an edge-on interaction between sulfur and the aromatic ring is quite favorable, and also confirm that the sulfur analogs are much more polarizable than the corresponding phosphates. This favorable interaction involving the sulfur atom is likely even more significant in the full aptamer-protein complexes than in the model systems.
PMCID: PMC5041495  PMID: 27566147
14.  SOX9 is targeted for proteasomal degradation by the E3 ligase FBW7 in response to DNA damage 
Nucleic Acids Research  2016;44(18):8855-8869.
SOX9 encodes a transcription factor that governs cell fate specification throughout development and tissue homeostasis. Elevated SOX9 is implicated in the genesis and progression of human tumors by increasing cell proliferation and epithelial-mesenchymal transition. We found that in response to UV irradiation or genotoxic chemotherapeutics, SOX9 is actively degraded in various cancer types and in normal epithelial cells, through a pathway independent of p53, ATM, ATR and DNA-PK. SOX9 is phosphorylated by GSK3β, facilitating the binding of SOX9 to the F-box protein FBW7α, an E3 ligase that functions in the DNA damage response pathway. The binding of FBW7α to the SOX9 K2 domain at T236-T240 targets SOX9 for subsequent ubiquitination and proteasomal destruction. Exogenous overexpression of SOX9 after genotoxic stress increases cell survival. Our findings reveal a novel regulatory mechanism for SOX9 stability and uncover a unique function of SOX9 in the cellular response to DNA damage. This new mechanism underlying a FBW7-SOX9 axis in cancer could have implications in therapy resistance.
PMCID: PMC5062998  PMID: 27566146
15.  Analysis of C. elegans muscle transcriptome using trans-splicing-based RNA tagging (SRT) 
Nucleic Acids Research  2016;44(21):e156.
Current approaches to profiling tissue-specific gene expression in C. elegans require delicate manipulation and are difficult under certain conditions, e.g. from dauer or aging worms. We have developed an easy and robust method for tissue-specific RNA-seq by taking advantage of the endogenous trans-splicing process. In this method, transgenic worms are generated in which a spliced leader (SL) RNA gene is fused with a sequence tag and driven by a tissue-specific promoter. Only in the tissue of interest, the tagged SL RNA gene is transcribed and then trans-spliced onto mRNAs. The tag allows enrichment and sequencing of mRNAs from that tissue only. As a proof of principle, we profiled the muscle transcriptome, which showed high coverage and efficient enrichment of muscle specific genes, with low background noise. To demonstrate the robustness of our method, we profiled muscle gene expression in dauer larvae and aging worms, revealing gene expression changes consistent with the physiology of these stages. The resulting muscle transcriptome also revealed 461 novel RNA transcripts, likely muscle-expressed long non-coding RNAs. In summary, the splicing-based RNA tagging (SRT) method provides a convenient and robust tool to profile trans-spliced genes and identify novel transcripts in a tissue-specific manner, with a low false positive rate.
PMCID: PMC5137427  PMID: 27557708
16.  NanoStringDiff: a novel statistical method for differential expression analysis based on NanoString nCounter data 
Nucleic Acids Research  2016;44(20):e151.
The advanced medium-throughput NanoString nCounter technology has been increasingly used for mRNA or miRNA differential expression (DE) studies due to its advantages including direct measurement of molecule expression levels without amplification, digital readout and superior applicability to formalin fixed paraffin embedded samples. However, the analysis of nCounter data is hampered because most methods developed are based on t-tests, which do not fit the count data generated by the NanoString nCounter system. Furthermore, data normalization procedures of current methods are either not suitable for counts or not specific for NanoString nCounter data. We develop a novel DE detection method based on NanoString nCounter data. The method, named NanoStringDiff, considers a generalized linear model of the negative binomial family to characterize count data and allows for multifactor design. Data normalization is incorporated in the model framework through data normalization parameters, which are estimated from positive controls, negative controls and housekeeping genes embedded in the nCounter system. We propose an empirical Bayes shrinkage approach to estimate the dispersion parameter in the model and a likelihood ratio test to identify differentially expressed genes. Simulations and real data analysis demonstrate that the proposed method performs better than existing methods.
PMCID: PMC5175344  PMID: 27471031
17.  Multiple P-TEFbs cooperatively regulate the release of promoter-proximally paused RNA polymerase II 
Nucleic Acids Research  2016;44(14):6853-6867.
The association of DSIF and NELF with initiated RNA Polymerase II (Pol II) is the general mechanism for inducing promoter-proximal pausing of Pol II. However, it remains largely unclear how the paused Pol II is released in response to stimulation. Here, we show that the release of the paused Pol II is cooperatively regulated by multiple P-TEFbs which are recruited by bromodomain-containing protein Brd4 and super elongation complex (SEC) via different recruitment mechanisms. Upon stimulation, Brd4 recruits P-TEFb to Spt5/DSIF via a recruitment pathway consisting of Med1, Med23 and Tat-SF1, whereas SEC recruits P-TEFb to NELF-A and NELF-E via Paf1c and Med26, respectively. P-TEFb-mediated phosphorylation of Spt5, NELF-A and NELF-E results in the dissociation of NELF from Pol II, thereby transiting transcription from pausing to elongation. Additionally, we demonstrate that P-TEFb-mediated Ser2 phosphorylation of Pol II is dispensable for pause release. Therefore, our studies reveal a co-regulatory mechanism of Brd4 and SEC in modulating the transcriptional pause release by recruiting multiple P-TEFbs via a Mediator- and Paf1c-coordinated recruitment network.
PMCID: PMC5001612  PMID: 27353326
18.  hTERT promotes tumor angiogenesis by activating VEGF via interactions with the Sp1 transcription factor 
Nucleic Acids Research  2016;44(18):8693-8703.
Angiogenesis is recognized as an important hallmark of cancer. Although telomerase is thought to be involved in tumor angiogenesis, the evidence and underlying mechanism remain elusive. Here, we demonstrate that human telomerase reverse transcriptase (hTERT) activates vascular endothelial growth factor (VEGF) gene expression through interactions with the VEGF promoter and the transcription factor Sp1. hTERT binds to Sp1 in vitro and in vivo and stimulates angiogenesis in a manner dependent on Sp1. Deletion of the mTert gene in the first generation of Tert null mice compromised tumor growth, with reduced VEGF expression. In addition, we show that hTERT expression levels are positively correlated with those of VEGF in human gastric tumor samples. Together, our results demonstrate that hTERT facilitates tumor angiogenesis by up-regulating VEGF expression through direct interactions with the VEGF gene and the Sp1 transcription factor. These results provide novel insights into hTERT function in tumor progression in addition to its role in telomere maintenance.
PMCID: PMC5062966  PMID: 27325744
19.  CTLPScanner: a web server for chromothripsis-like pattern detection 
Nucleic Acids Research  2016;44(Web Server issue):W252-W258.
Chromothripsis is a recently observed phenomenon in cancer cells in which one or several chromosomes shatter into pieces with subsequent inaccurate reassembly and clonal propagation. This type of event generates a potentially vast number of mutations within a relatively short time period, and has been considered a new paradigm in cancer development. Despite recent advances, much work is still required to better understand the molecular mechanisms of this phenomenon, and thus an easy-to-use tool for automatically detecting and annotating chromothripsis is urgently needed. Here we present CTLPScanner, a web server for detection of chromothripsis-like pattern (CTLP) in genomic array data. The output interface presents intuitive graphical representations of detected chromosome pulverization regions, as well as detailed results in table format. CTLPScanner also provides additional information for associated genes in chromothripsis regions to help identify the potential candidates involved in tumorigenesis. To assist in performing meta-data analysis, we integrated over 50 000 pre-processed genomic arrays from The Cancer Genome Atlas and Gene Expression Omnibus into CTLPScanner. The server allows users to explore the presence of chromothripsis signatures from public data resources, without carrying out any local data processing. CTLPScanner is freely available online.
PMCID: PMC4987951  PMID: 27185889
20.  Synergistic action of master transcription factors controls epithelial-to-mesenchymal transition 
Nucleic Acids Research  2016;44(6):2514-2527.
Epithelial-to-mesenchymal transition (EMT) is a complex multistep process in which phenotype switches are mediated by a network of transcription factors (TFs). Systematic characterization of all dynamic TFs controlling EMT state transitions, especially for the intermediate partial-EMT state, represents a highly relevant yet largely unexplored task. Here, we performed a computational analysis that integrated time-course EMT transcriptomic data with public cistromic data and identified three synergistic master TFs (ETS2, HNF4A and JUNB) that regulate the transition through the partial-EMT state. Overexpression of these regulators predicted a poor clinical outcome, and their elimination readily abolished TGF-β-induced EMT. Importantly, these factors utilized a clique motif, physically interact and their cumulative binding generally characterized EMT-associated genes. Furthermore, analyses of H3K27ac ChIP-seq data revealed that ETS2, HNF4A and JUNB are associated with super-enhancers and the administration of BRD4 inhibitor readily abolished TGF-β-induced EMT. These findings have implications for systematic discovery of master EMT regulators and super-enhancers as novel targets for controlling metastasis.
PMCID: PMC4824118  PMID: 26926107
21.  Integrative analysis identifies targetable CREB1/FoxA1 transcriptional co-regulation as a predictor of prostate cancer recurrence 
Nucleic Acids Research  2016;44(9):4105-4122.
Identifying prostate cancer-driving transcription factors (TFs) in addition to the androgen receptor promises to improve our ability to effectively diagnose and treat this disease. We employed an integrative genomics analysis of master TFs CREB1 and FoxA1 in androgen-dependent prostate cancer (ADPC) and castration-resistant prostate cancer (CRPC) cell lines, primary prostate cancer tissues and circulating tumor cells (CTCs) to investigate their role in defining prostate cancer gene expression profiles. Combining genome-wide binding site and gene expression profiles we define CREB1 as a critical driver of pro-survival, cell cycle and metabolic transcription programs. We show that CREB1 and FoxA1 co-localize and mutually influence each other's binding to define disease-driving transcription profiles associated with advanced prostate cancer. Gene expression analysis in human prostate cancer samples found that CREB1/FoxA1 target gene panels predict prostate cancer recurrence. Finally, we showed that this signaling pathway is sensitive to compounds that inhibit the transcription co-regulatory factor MED1. These findings not only reveal a novel, global transcriptional co-regulatory function of CREB1 and FoxA1, but also suggest CREB1/FoxA1 signaling is a targetable driver of prostate cancer progression and serves as a biomarker of poor clinical outcomes.
PMCID: PMC4872073  PMID: 26743006
22.  Systematic identification and annotation of human methylation marks based on bisulfite sequencing methylomes reveals distinct roles of cell type-specific hypomethylation in the regulation of cell identity genes 
Nucleic Acids Research  2015;44(1):75-94.
DNA methylation is a key epigenetic mark that is critical for gene regulation in multicellular eukaryotes. Although various human cell types may have the same genome, these cells have different methylomes. The systematic identification and characterization of methylation marks across cell types are crucial to understand the complex regulatory network for cell fate determination. In this study, we proposed an entropy-based framework termed SMART to integrate the whole genome bisulfite sequencing methylomes across 42 human tissues/cells and identified 757 887 genome segments. Nearly 75% of the segments showed uniform methylation across all cell types. From the remaining 25% of the segments, we identified cell type-specific hypo/hypermethylation marks that were specifically hypo/hypermethylated in a minority of cell types using a statistical approach and presented an atlas of the human methylation marks. Further analysis revealed that the cell type-specific hypomethylation marks were enriched through H3K27ac and transcription factor binding sites in cell type-specific manner. In particular, we observed that the cell type-specific hypomethylation marks are associated with the cell type-specific super-enhancers that drive the expression of cell identity genes. This framework provides a complementary, functional annotation of the human genome and helps to elucidate the critical features and functions of cell type-specific hypomethylation.
PMCID: PMC4705665  PMID: 26635396
23.  The long noncoding RNA Gm15055 represses Hoxa gene expression by recruiting PRC2 to the gene cluster 
Nucleic Acids Research  2015;44(6):2613-2627.
The Hox genes encode transcription factors that determine embryonic pattern formation. In embryonic stem cells, the Hox genes are silenced by PRC2. Recent studies have reported a role for long noncoding RNAs in PRC2 recruitment in vertebrates. However, little is known about how PRC2 is recruited to the Hox genes in ESCs. Here, we used stable knockdown and knockout strategies to characterize the function of the long noncoding RNA Gm15055 in the regulation of Hoxa genes in mouse ESCs. We found that Gm15055 is highly expressed in mESCs and its expression is maintained by OCT4. Gm15055 represses Hoxa gene expression by recruiting PRC2 to the cluster and maintaining the H3K27me3 modification on Hoxa promoters. A chromosome conformation capture assay revealed the close physical association of the Gm15055 locus to multiple sites at the Hoxa gene cluster in mESCs, which may facilitate the in cis targeting of Gm15055 RNA to the Hoxa genes. Furthermore, an OCT4-responsive positive cis-regulatory element is found in the Gm15055 gene locus, which potentially regulates both Gm15055 itself and the Hoxa gene activation. This study suggests how PRC2 is recruited to the Hoxa locus in mESCs, and implies an elaborate mechanism for Hoxa gene regulation in mESCs.
PMCID: PMC4824075  PMID: 26615201
24.  InsectBase: a resource for insect genomes and transcriptomes 
Nucleic Acids Research  2015;44(Database issue):D801-D807.
The genomes and transcriptomes of hundreds of insects have been sequenced. However, the insect community lacks an integrated, up-to-date collection of insect gene data. Here, we introduce the first release of InsectBase, available online. The database encompasses 138 insect genomes, 116 insect transcriptomes, 61 insect gene sets, 36 gene families of 60 insects, 7544 miRNAs of 69 insects, 96 925 piRNAs of Drosophila melanogaster and Chilo suppressalis, 2439 lncRNAs of Nilaparvata lugens, 22 536 pathways of 78 insects, 678 881 untranslated regions (UTR) of 84 insects and 160 905 coding sequences (CDS) of 70 insects. This release contains over 12 million sequences and provides search functionality, a BLAST server, GBrowse, insect pathway construction, a Facebook-like network for the insect community (iFacebook), and phylogenetic analysis of selected genes.
PMCID: PMC4702856  PMID: 26578584
25.  A gating mechanism for Pi release governs the mRNA unwinding by eIF4AI during translation initiation 
Nucleic Acids Research  2015;43(21):10157-10167.
Eukaryotic translation initiation factor eIF4AI, the founding member of DEAD-box helicases, undergoes ATP hydrolysis-coupled conformational changes to unwind mRNA secondary structures during translation initiation. However, the mechanism of its coupled enzymatic activities remains unclear. Here we report that a gating mechanism for Pi release controlled by the inter-domain linker of eIF4AI regulates the coupling between ATP hydrolysis and RNA unwinding. Molecular dynamic simulations and experimental results revealed that, through forming a hydrophobic core with the conserved SAT motif of the N-terminal domain and I357 from the C-terminal domain, the linker gated the release of Pi from the hydrolysis site, which avoided futile hydrolysis cycles of eIF4AI. Further mutagenesis studies suggested this linker also plays an auto-inhibitory role in the enzymatic activity of eIF4AI, which may be essential for its function during translation initiation. Overall, our results reveal a novel regulatory mechanism that controls eIF4AI-mediated mRNA unwinding and can guide further mechanistic studies on other DEAD-box helicases.
PMCID: PMC4666354  PMID: 26464436
