1.  AMPK regulates histone H2B O-GlcNAcylation 
Nucleic Acids Research  2014;42(9):5594-5604.
Histone H2B O-GlcNAcylation is an important post-translational modification of chromatin during gene transcription. However, how this epigenetic modification is regulated remains unclear. Here we found that the energy-sensing adenosine-monophosphate-activated protein kinase (AMPK) could suppress histone H2B O-GlcNAcylation. AMPK directly phosphorylates O-linked β-N-acetylglucosamine (O-GlcNAc) transferase (OGT). Although this phosphorylation does not regulate the enzymatic activity of OGT, it inhibits OGT–chromatin association, histone O-GlcNAcylation and gene transcription. Conversely, OGT also O-GlcNAcylates AMPK and positively regulates AMPK activity, creating a feedback loop. Taken together, these results reveal a crosstalk between the LKB1-AMPK and the hexosamine biosynthesis (HBP)-OGT pathways, which coordinate together for the sensing of nutrient state and regulation of gene transcription.
PMCID: PMC4027166  PMID: 24692660
2.  Integrated omics study delineates the dynamics of lipid droplets in Rhodococcus opacus PD630 
Nucleic Acids Research  2013;42(2):1052-1064.
Rhodococcus opacus strain PD630 (R. opacus PD630) is an oleaginous bacterium and one of the few prokaryotic organisms that contain lipid droplets (LDs). The LD is an important organelle for lipid storage and for intercellular communication regarding energy metabolism, yet it remains poorly understood. To understand the dynamics of LDs using a simple model organism, we conducted a series of comprehensive omics studies of R. opacus PD630, including complete genome, transcriptome and proteome analyses. The genome of R. opacus PD630 encodes 8947 genes that are significantly enriched in lipid transport, synthesis and metabolism, indicating a strong capacity for carbon source biosynthesis and catabolism. The comparative transcriptome analysis of three culture conditions revealed the landscape of altered gene expression responsible for lipid accumulation. The LD proteomes further identified the proteins that mediate lipid synthesis, storage and other biological functions. Integrating these three omics data sets uncovered 177 proteins that may be involved in lipid metabolism and LD dynamics. An LD structure-like protein, LPD06283, was further verified to affect LD morphology. Our work provides not only the first integrated omics study of the prokaryotic LD organelle, but also a systematic platform for facilitating further prokaryotic LD research and biofuel development.
PMCID: PMC3902926  PMID: 24150943
3.  Zinc-finger-nucleases mediate specific and efficient excision of HIV-1 proviral DNA from infected and latently infected human T cells 
Nucleic Acids Research  2013;41(16):7771-7782.
HIV-infected individuals currently cannot be completely cured because existing antiviral therapy regimens do not address HIV provirus DNA, flanked by long terminal repeats (LTRs), already integrated into host genome. Here, we present a possible alternative therapeutic approach to specifically and directly mediate deletion of the integrated full-length HIV provirus from infected and latently infected human T cell genomes by using specially designed zinc-finger nucleases (ZFNs) to target a sequence within the LTR that is well conserved across all clades. We designed and screened one pair of ZFN to target the highly conserved HIV-1 5′-LTR and 3′-LTR DNA sequences, named ZFN-LTR. We found that ZFN-LTR can specifically target and cleave the full-length HIV-1 proviral DNA in several infected and latently infected cell types and also HIV-1 infected human primary cells in vitro. We observed that the frequency of excision was 45.9% in infected human cell lines after treatment with ZFN-LTR, without significant host-cell genotoxicity. Taken together, our data demonstrate that a single ZFN-LTR pair can specifically and effectively cleave integrated full-length HIV-1 proviral DNA and mediate antiretroviral activity in infected and latently infected cells, suggesting that this strategy could offer a novel approach to eradicate the HIV-1 virus from the infected host in the future.
PMCID: PMC3763554  PMID: 23804764
4.  High-efficiency and heritable gene targeting in mouse by transcription activator-like effector nucleases 
Nucleic Acids Research  2013;41(11):e120.
Transcription activator-like effector nucleases (TALENs) are a powerful new approach for targeted gene disruption in various animal models, but little is known about their activities in Mus musculus, the widely used mammalian model organism. Here, we report that direct injection of in vitro transcribed messenger RNA of TALEN pairs into mouse zygotes induced somatic mutations, which were stably passed to the next generation through germ-line transmission. With one TALEN pair constructed for each of 10 target genes, mutant F0 mice for each gene were obtained, with mutation rates ranging from 13 to 67% and averaging ∼40% of total healthy newborns, with no significant differences between the C57BL/6 and FVB/N genetic backgrounds. One TALEN pair with a single mismatch to its intended target sequence on each side failed to yield any mutation. Furthermore, highly efficient germ-line transmission was obtained, as all the F0 founders tested transmitted the mutations to F1 mice. In addition, we observed that one bi-allelic mutant founder for the Lepr gene, which encodes the leptin receptor, had a diabetic phenotype similar to that of the db/db mouse. Together, our results suggest that TALENs are an effective genetic tool for rapid gene disruption with high efficiency and heritability in mice of distinct genetic backgrounds.
PMCID: PMC3675477  PMID: 23630316
5.  CADgene: a comprehensive database for coronary artery disease genes 
Nucleic Acids Research  2010;39(Database issue):D991-D996.
Coronary artery disease (CAD) is a complex, multifactorial disease and a leading cause of mortality worldwide. Over the past decades, great efforts have been made to elucidate the underlying genetic basis of CAD, and a massive amount of data has accumulated. To integrate these data and to provide a useful resource for researchers, we developed CADgene, a comprehensive database for CAD genes. We manually extracted CAD-related evidence for more than 300 candidate genes for CAD from over 1300 publications of genetic studies. We classified these candidate genes into 12 functional categories based on their roles in CAD. For each gene, we extracted detailed information from related studies (e.g. the size of case–control, population, SNP, odds ratio, P-value, etc.) and made useful annotations, which include general gene information, Gene Ontology annotations, KEGG pathways, protein–protein interactions and others. Besides the number of studies for each gene, CADgene also provides tools to search and show the most frequently studied candidate genes. In addition, CADgene provides cumulative data from 11 publications of CAD-related genome-wide association studies. CADgene has a user-friendly web interface with multiple browse and search functions. It is freely available at
PMCID: PMC3013698  PMID: 21045063
6.  Human Bex2 interacts with LMO2 and regulates the transcriptional activity of a novel DNA-binding complex 
Nucleic Acids Research  2005;33(20):6555-6565.
Human Bex2 (brain expressed X-linked, hBex2) is highly expressed in the embryonic brain, but its function remains unknown. We have identified that LMO2, a LIM-domain containing transcriptional factor, specifically interacts with hBex2 but not with mouse Bex1 and Bex2. The interaction was confirmed both by pull-down with GST-hBex2 and by coimmunoprecipitation assays in vivo. Using electrophoretic mobility shift assay, we have demonstrated the physical interaction of hBex2 and LMO2 as part of a DNA-binding protein complex. We have also shown that hBex2 can enhance the transcriptional activity of LMO2 in vivo. Furthermore, using mammalian two-hybrid analysis, we have identified a neuronal bHLH protein, NSCL2, as a novel binding partner for LMO2. We then showed that LMO2 could up-regulate NSCL2-dependent transcriptional activity, and hBex2 augmented this effect. Thus, hBex2 may act as a specific regulator during embryonic development by modulating the transcriptional activity of a novel E-box sequence-binding complex that contains hBex2, LMO2, NSCL2 and LDB1.
PMCID: PMC1298925  PMID: 16314316
7.  A suite of web-based programs to search for transcriptional regulatory motifs 
Nucleic Acids Research  2004;32(Web Server issue):W204-W207.
The identification of regulatory motifs is important for the study of gene expression. Here we present a suite of programs that we have developed to search for regulatory sequence motifs: (i) BioProspector, a Gibbs-sampling-based program for predicting regulatory motifs from co-regulated genes in prokaryotes or lower eukaryotes; (ii) CompareProspector, an extension to BioProspector which incorporates comparative genomics features to be used for higher eukaryotes; (iii) MDscan, a program for finding protein–DNA interaction sites from ChIP-on-chip targets. All three programs examine a group of sequences that may share common regulatory motifs and output a list of putative motifs as position-specific probability matrices, the individual sites used to construct the motifs and the location of each site on the input sequences. The web servers and executables can be accessed at
PMCID: PMC441599  PMID: 15215381
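As an illustration of the output format mentioned in entry 7, a position-specific probability matrix can be built from a set of aligned sites by counting bases column by column. The sketch below is a minimal example under stated assumptions: the function name, the optional pseudocount and the example sites are hypothetical, and it does not reproduce how BioProspector, CompareProspector or MDscan find or score motifs.

```python
from collections import Counter

ALPHABET = "ACGT"


def position_probability_matrix(sites, pseudocount=0.0):
    """Return one {base: probability} dict per column of equal-length sites.

    Hypothetical helper for illustration only; not part of any published tool.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    matrix = []
    # zip(*sites) walks the alignment column by column.
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append(
            {base: (counts[base] + pseudocount) / total for base in ALPHABET}
        )
    return matrix


# Made-up aligned sites, not taken from any real data set.
sites = ["ACGT", "ACGA", "ACCT", "TCGT"]
ppm = position_probability_matrix(sites)
```

Each entry of `ppm` corresponds to one motif position, and its four probabilities sum to 1.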
8.  Human fetal globin gene expression is regulated by LYAR 
Nucleic Acids Research  2014;42(15):9740-9752.
Human globin gene expression during development is modulated by transcription factors in a stage-dependent manner. However, the mechanisms controlling the process are still largely unknown. In this study, we found that a nuclear protein, LYAR (human homologue of mouse Ly-1 antibody reactive clone) directly interacted with the methyltransferase PRMT5 which triggers the histone H4 Arg3 symmetric dimethylation (H4R3me2s) mark. We found that PRMT5 binding on the proximal γ-promoter was LYAR-dependent. The LYAR DNA-binding motif (GGTTAT) was identified by performing CASTing (cyclic amplification and selection of targets) experiments. Results of EMSA and ChIP assays confirmed that LYAR bound to a DNA region corresponding to the 5′-untranslated region of the γ-globin gene. We also found that LYAR repressed human fetal globin gene expression in both K562 cells and primary human adult erythroid progenitor cells. Thus, these data indicate that LYAR acts as a novel transcription factor that binds the γ-globin gene, and is essential for silencing the γ-globin gene.
PMCID: PMC4150809  PMID: 25092918
9.  GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs 
Nucleic Acids Research  2014;42(Web Server issue):W325-W330.
Small ubiquitin-like modifiers (SUMOs) regulate a variety of cellular processes through two distinct mechanisms: covalent sumoylation and non-covalent SUMO interaction. The complexity of SUMO regulation has greatly hampered the large-scale identification of SUMO substrates and interaction partners at the proteome-wide level. In this work, we developed a new tool called GPS-SUMO for the prediction of both sumoylation sites and SUMO-interaction motifs (SIMs) in proteins. To achieve accurate performance, a new-generation group-based prediction system (GPS) algorithm integrated with a Particle Swarm Optimization approach was applied. By critical evaluation and comparison, GPS-SUMO was demonstrated to be substantially superior to other existing tools and methods. With the help of GPS-SUMO, it is now possible to further investigate the relationship between sumoylation and SUMO interaction processes. A web service of GPS-SUMO was implemented in PHP + JavaScript and is freely available at
PMCID: PMC4086084  PMID: 24880689
10.  Boronic acid-mediated polymerase chain reaction for gene- and fragment-specific detection of 5-hydroxymethylcytosine 
Nucleic Acids Research  2014;42(9):e81.
The gene- or fragment-specific detection of newly recognized deoxyribonucleic acid (DNA) base 5-hydroxymethylcytosine (5hmC) will provide insights into its critical functions in development and diseases, and is also important for screening 5hmC-rich genes as an indicator of epigenetic states, pathogenic processes and pharmacological responses. Current analytical technologies for gene-specific detection of 5hmC are heavily dependent on glucosylated 5hmC-resistant restriction endonuclease cleavage. Here, we find that boronic acid (BA) can inhibit the amplification activity of Taq DNA polymerase for replicating glucosylated 5hmC bases in template DNA by interacting with their glucose moiety. On the basis of this finding, we propose for the first time a BA-mediated polymerase chain reaction (PCR) assay for rapid and sensitive detection of gene- or fragment-specific 5hmC without restriction-assay-like sequence limitations. To optimize the BA-mediated PCR assay, we further tested BA derivatives and show that one BA derivative, 2-(2′-chlorobenzyloxy) phenylboronic acid, displays the highest inhibitory efficiency. Using the optimized assay, we demonstrate the enrichment of 5hmC in an intron region of Pax5 gene (a member of the paired box family of transcription factors) in mouse embryonic stem cells. Our work potentially opens a new way for the screening and identification of 5hmC-rich genes and for high throughput analysis of 5hmC in mammalian cells.
PMCID: PMC4027215  PMID: 24682822
11.  Deciphering the rules by which dynamics of mRNA secondary structure affect translation efficiency in Saccharomyces cerevisiae 
Nucleic Acids Research  2014;42(8):4813-4822.
Messenger RNA (mRNA) secondary structure decreases the elongation rate, as ribosomes must unwind every structure they encounter during translation. Therefore, the strength of mRNA secondary structure is assumed to be reduced in highly translated mRNAs. However, previous studies in vitro reported a positive correlation between mRNA folding strength and protein abundance. The counterintuitive finding suggests that mRNA secondary structure affects translation efficiency in an undetermined manner. Here, we analyzed the folding behavior of mRNA during translation and its effect on translation efficiency. We simulated translation process based on a novel computational model, taking into account the interactions among ribosomes, codon usage and mRNA secondary structures. We showed that mRNA secondary structure shortens ribosomal distance through the dynamics of folding strength. Notably, when adjacent ribosomes are close, mRNA secondary structures between them disappear, and codon usage determines the elongation rate. More importantly, our results showed that the combined effect of mRNA secondary structure and codon usage in highly translated mRNAs causes a short ribosomal distance in structural regions, which in turn eliminates the structures during translation, leading to a high elongation rate. Together, these findings reveal how the dynamics of mRNA secondary structure coupling with codon usage affect translation efficiency.
PMCID: PMC4005662  PMID: 24561808
12.  Three-tiered role of the pioneer factor GATA2 in promoting androgen-dependent gene expression in prostate cancer 
Nucleic Acids Research  2014;42(6):3607-3622.
In prostate cancer, androgen receptor (AR) binding and androgen-responsive gene expression are defined by hormone-independent binding patterns of the pioneer factors FoxA1 and GATA2. Insufficient evidence of the mechanisms by which GATA2 contributes to this process precludes complete understanding of a key determinant of tissue-specific AR activity. Our observations suggest that GATA2 facilitates androgen-responsive gene expression by three distinct modes of action. By occupying novel binding sites within the AR gene locus, GATA2 positively regulates AR expression before and after androgen stimulation. Additionally, GATA2 engages AR target gene enhancers prior to hormone stimulation, producing an active and accessible chromatin environment via recruitment of the histone acetyltransferase p300. Finally, GATA2 functions in establishing and/or sustaining basal locus looping by recruiting the Mediator subunit MED1 in the absence of androgen. These mechanisms may contribute to the generally positive role of GATA2 in defining AR genome-wide binding patterns that determine androgen-responsive gene expression profiles. We also find that GATA2 and FoxA1 exhibit both independent and codependent co-occupancy of AR target gene enhancers. Identifying these determinants of AR transcriptional activity may provide a foundation for the development of future prostate cancer therapeutics that target pioneer factor function.
PMCID: PMC3973339  PMID: 24423874
13.  High-resolution functional annotation of human transcriptome: predicting isoform functions by a novel multiple instance-based label propagation method 
Nucleic Acids Research  2013;42(6):e39.
Alternative transcript processing is an important mechanism for generating functional diversity in genes. However, little is known about the precise functions of individual isoforms. In fact, proteins (translated from transcript isoforms), not genes, are the function carriers. By integrating multiple human RNA-seq data sets, we carried out the first systematic prediction of isoform functions, enabling high-resolution functional annotation of human transcriptome. Unlike gene function prediction, isoform function prediction faces a unique challenge: the lack of the training data—all known functional annotations are at the gene level. To address this challenge, we modelled the gene–isoform relationships as multiple instance data and developed a novel label propagation method to predict functions. Our method achieved an average area under the receiver operating characteristic curve of 0.67 and assigned functions to 15 572 isoforms. Interestingly, we observed that different functions have different sensitivities to alternative isoform processing, and that the function diversity of isoforms from the same gene is positively correlated with their tissue expression diversity. Finally, we surveyed the literature to validate our predictions for a number of apoptotic genes. Strikingly, for the famous ‘TP53’ gene, we not only accurately identified the apoptosis regulation function of its five isoforms, but also correctly predicted the precise direction of the regulation.
PMCID: PMC3973446  PMID: 24369432
14.  ASD v2.0: updated content and novel features focusing on allosteric regulation 
Nucleic Acids Research  2013;42(Database issue):D510-D516.
Allostery is the most direct and efficient way for regulation of biological macromolecule function and is induced by the binding of a ligand at an allosteric site topographically distinct from the orthosteric site. AlloSteric Database (ASD) has been developed to provide comprehensive information on allostery. Owing to the inherent high receptor selectivity and lower target-based toxicity, allosteric regulation is expected to assume a more prominent role in drug discovery and bioengineering, leading to the rapid growth of allosteric findings. In this updated version, ASD v2.0 has expanded to 1286 allosteric proteins, 565 allosteric diseases and 22 008 allosteric modulators. A total of 907 allosteric site-modulator structural complexes and >200 structural pairs of orthosteric/allosteric sites in the allosteric proteins were constructed for researchers to develop allosteric site and pathway tools in response to community demands. Up-to-date allosteric pathways were manually curated in the updated version. In addition, both the front-end and the back-end of ASD have been redesigned and enhanced to allow more efficient access. Taken together, these updates are useful for facilitating the investigation of allosteric mechanisms, allosteric target identification and allosteric drug discovery.
PMCID: PMC3965017  PMID: 24293647
15.  Multiple replication origins with diverse control mechanisms in Haloarcula hispanica 
Nucleic Acids Research  2013;42(4):2282-2294.
The use of multiple replication origins in archaea is not well understood. In particular, little is known about their specific control mechanisms. Here, we investigated the active replication origins in the three replicons of a halophilic archaeon, Haloarcula hispanica, by extensive gene deletion, DNA mutation and genome-wide marker frequency analyses. We revealed that individual origins are specifically dependent on their co-located cdc6 genes, and a single active origin/cdc6 pairing is essential and sufficient for each replicon. Notably, we demonstrated that the activities of oriC1 and oriC2, the two origins on the main chromosome, are differently controlled. A G-rich inverted repeat located in the internal region between the two inverted origin recognition boxes (ORBs) plays as an enhancer for oriC1, whereas the replication initiation at oriC2 is negatively regulated by an ORB-rich region located downstream of oriC2-cdc6E, likely via Cdc6E-titrating. The oriC2 placed on a plasmid is incompatible with the wild-type (but not the ΔoriC2) host strain, further indicating that strict control of the oriC2 activity is important for the cell. This is the first report revealing diverse control mechanisms of origins in haloarchaea, which has provided novel insights into the use and coordination of multiple replication origins in the domain of Archaea.
PMCID: PMC3936714  PMID: 24271389
16.  CR Cistrome: a ChIP-Seq database for chromatin regulators and histone modification linkages in human and mouse 
Nucleic Acids Research  2013;42(D1):D450-D458.
Diversified histone modifications (HMs) are essential epigenetic features. They play important roles in fundamental biological processes including transcription, DNA repair and DNA replication. Chromatin regulators (CRs), which are indispensable in epigenetics, can mediate HMs to adjust chromatin structures and functions. With the development of ChIP-Seq technology, there is an opportunity to study CR and HM profiles at the whole-genome scale. However, no specific resource for the integration of CR ChIP-Seq data or CR-HM ChIP-Seq linkage pairs is currently available. Therefore, we constructed the CR Cistrome database, available online, to further elucidate CR functions and CR-HM linkages. Within this database, we collected all publicly available ChIP-Seq data on CRs in human and mouse and categorized the data into four cohorts: the reader, writer, eraser and remodeler cohorts, together with curated introductions and ChIP-Seq data analysis results. For the HM readers, writers and erasers, we provided further ChIP-Seq analysis data for the targeted HMs and schematized the relationships between them. We believe CR Cistrome is a valuable resource for the epigenetics community.
PMCID: PMC3965064  PMID: 24253304
17.  SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database 
Nucleic Acids Research  2013;42(Database issue):D478-D484.
The Small Molecule Pathway Database (SMPDB) is a comprehensive, colorful, fully searchable and highly interactive database for visualizing human metabolic, drug action, drug metabolism, physiological activity and metabolic disease pathways. SMPDB contains >600 pathways with nearly 75% of its pathways not found in any other database. All SMPDB pathway diagrams are extensively hyperlinked and include detailed information on the relevant tissues, organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Since its last release in 2010, SMPDB has undergone substantial upgrades and significant expansion. In particular, the total number of pathways in SMPDB has grown by >70%. Additionally, every previously entered pathway has been completely redrawn, standardized, corrected, updated and enhanced with additional molecular or cellular information. Many SMPDB pathways now include transporter proteins as well as much more physiological, tissue, target organ and reaction compartment data. Thanks to the development of a standardized pathway drawing tool (called PathWhiz) all SMPDB pathways are now much more easily drawn and far more rapidly updated. PathWhiz has also allowed all SMPDB pathways to be saved in a BioPAX format. Significant improvements to SMPDB’s visualization interface now make the browsing, selection, recoloring and zooming of pathways far easier and far more intuitive. Because of its utility and breadth of coverage, SMPDB is now integrated into several other databases including HMDB and DrugBank.
PMCID: PMC3965088  PMID: 24203708
18.  A regulatory circuit comprising GATA1/2 switch and microRNA-27a/24 promotes erythropoiesis 
Nucleic Acids Research  2013;42(1):442-457.
Transcriptional networks orchestrate complex developmental processes, and such networks are commonly instigated by master regulators for development. By now, considerable progress has been made in elucidating GATA factor-dependent genetic networks that control red blood cell development. Here we reported that GATA-1 and GATA-2 co-regulated the expression of two microRNA genes, microRNA-27a and microRNA-24, with critical roles in regulating erythroid differentiation. In general, GATA-2 occupied the miR-27a∼24 promoter and repressed their transcription in immature erythroid progenitor cells. As erythropoiesis proceeded, GATA-1 directly activated miR-27a∼24 transcription, and this involved a GATA-1-mediated displacement of GATA-2 from chromatin, a process termed ‘GATA switch’. Furthermore, the mature miR-27a and miR-24 cooperatively inhibited GATA-2 translation and favoured the occupancy switch from GATA-2 to GATA-1, thus completing a positive feedback loop to promote erythroid maturation. In line with the essential role of GATA factors, ectopic expression of miR-27a or miR-24 promoted erythropoiesis in human primary CD34+ haematopoietic progenitor cells and mice, whereas attenuated miR-27 or miR-24 level led to impaired erythroid phenotypes in haematopoietic progenitor cells and zebrafish. Taken together, these data integrated micro RNA expression and function into GATA factor coordinated networks and provided mechanistic insight into a regulatory circuit that comprised GATA1/2 switch and miR-27a/24 in erythropoiesis.
PMCID: PMC3874166  PMID: 24049083
19.  Mismatch repair protein MSH2 regulates translesion DNA synthesis following exposure of cells to UV radiation 
Nucleic Acids Research  2013;41(22):10312-10322.
Translesion DNA synthesis (TLS) can use specialized DNA polymerases to insert and/or extend nucleotides across lesions, thereby limiting stalled replication fork collapse and the potential for cell death. Recent studies have shown that monoubiquitinated proliferating cell nuclear antigen (PCNA) plays an important role in recruitment of Y-family TLS polymerases to stalled replication forks after DNA damage treatment. To explore the possible roles of other factors that regulate the ultraviolet (UV)-induced assembly of specialized DNA polymerases at arrested replication forks, we performed immunoprecipitation experiments combined with mass spectrometry and established that DNA polymerase kappa (Polκ) can partner with MSH2, an important mismatch repair protein associated with hereditary non-polyposis colorectal cancer. We found that depletion of MSH2 impairs PCNA monoubiquitination and the formation of foci containing Polκ and other TLS polymerases after UV irradiation of cells. Interestingly, expression of MSH2 in Rad18-deficient cells increased UV-induced Polκ and REV1 focus formation without detectable changes in PCNA monoubiquitination, indicating that MSH2 can regulate post-UV focus formation by specialized DNA polymerases in both PCNA monoubiquitination-dependent and -independent fashions. Moreover, we observed that MSH2 can facilitate TLS across cyclobutane pyrimidine dimers photoproducts in living cells, presenting a novel role of MSH2 in post-UV cellular responses.
PMCID: PMC3905884  PMID: 24038355
20.  RecJ-like protein from Pyrococcus furiosus has 3′–5′ exonuclease activity on RNA: implications for proofreading of 3′-mismatched RNA primers in DNA replication 
Nucleic Acids Research  2013;41(11):5817-5826.
Replicative DNA polymerases require an RNA primer for leading and lagging strand DNA synthesis, and primase is responsible for the de novo synthesis of this RNA primer. However, the archaeal primase from Pyrococcus furiosus (Pfu) frequently incorporates mismatched nucleoside monophosphate, which stops RNA synthesis. Pfu DNA polymerase (PolB) cannot elongate the resulting 3′-mismatched RNA primer because it cannot remove the 3′-mismatched ribonucleotide. This study demonstrates the potential role of a RecJ-like protein from P. furiosus (PfRecJ) in proofreading 3′-mismatched ribonucleotides. PfRecJ hydrolyzes single-stranded RNA and the RNA strand of RNA/DNA hybrids in the 3′–5′ direction, and the kinetic parameters (Km and Kcat) of PfRecJ during RNA strand digestion are consistent with a role in proofreading 3′-mismatched RNA primers. Replication protein A, the single-stranded DNA–binding protein, stimulates the removal of 3′-mismatched ribonucleotides of the RNA strand in RNA/DNA hybrids, and Pfu DNA polymerase can extend the 3′-mismatched RNA primer after the 3′-mismatched ribonucleotide is removed by PfRecJ. Finally, we reconstituted the primer-proofreading reaction of a 3′-mismatched ribonucleotide RNA/DNA hybrid using PfRecJ, replication protein A, Proliferating cell nuclear antigen (PCNA) and PolB. Given that PfRecJ is associated with the GINS complex, a central nexus in archaeal DNA replication fork, we speculate that PfRecJ proofreads the RNA primer in vivo.
PMCID: PMC3675489  PMID: 23605041
21.  HMDB 3.0—The Human Metabolome Database in 2013 
Nucleic Acids Research  2012;41(Database issue):D801-D807.
The Human Metabolome Database (HMDB) is a resource dedicated to providing scientists with the most current and comprehensive coverage of the human metabolome. Since its first release in 2007, the HMDB has been used to facilitate research for nearly 1000 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 3.0) has been significantly expanded and enhanced over the 2009 release (version 2.0). In particular, the number of annotated metabolite entries has grown from 6500 to more than 40 000 (a 600% increase). This enormous expansion is a result of the inclusion of both ‘detected’ metabolites (those with measured concentrations or experimental confirmation of their existence) and ‘expected’ metabolites (those for which biochemical pathways are known or human intake/exposure is frequent but the compound has yet to be detected in the body). The latest release also has greatly increased the number of metabolites with biofluid or tissue concentration data, the number of compounds with reference spectra and the number of data fields per entry. In addition to this expansion in data quantity, new database visualization tools and new data content have been added or enhanced. These include better spectral viewing tools, more powerful chemical substructure searches, an improved chemical taxonomy and better, more interactive pathway maps. This article describes these enhancements to the HMDB, which was previously featured in the 2009 NAR Database Issue. (Note to referees, HMDB 3.0 will go live on 18 September 2012.).
PMCID: PMC3531200  PMID: 23161693
22.  DiffSplice: the genome-wide detection of differential splicing events with RNA-seq 
Nucleic Acids Research  2012;41(2):e39.
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at
PMCID: PMC3553996  PMID: 23155066
23.  The novel long non-coding RNA CRG regulates Drosophila locomotor behavior 
Nucleic Acids Research  2012;40(22):11714-11727.
Long non-coding RNAs (lncRNAs) that have no protein-coding capacity make up a large proportion of the transcriptome of various species. Many lncRNAs are expressed within the animal central nervous system in spatial- and temporal-specific patterns, indicating that lncRNAs play important roles in cellular processes, neural development, and even in cognitive and behavioral processes. However, relatively little is known about their in vivo functions and underlying molecular mechanisms in the nervous system. Here, we report a neural-specific Drosophila lncRNA, CASK regulatory gene (CRG), which participates in locomotor activity and climbing ability by positively regulating its neighboring gene CASK (Ca2+/calmodulin-dependent protein kinase). CRG deficiency led to reduced locomotor activity and a defective climbing ability, phenotypes that are often seen in CASK mutants. CRG mutants also showed a reduced CASK expression level and, reciprocally, CASK over-expression could rescue the CRG mutant phenotypes. At the molecular level, CRG was required for the recruitment of RNA polymerase II to the CASK promoter regions, which in turn enhanced CASK expression. Our work has revealed new functional roles of lncRNAs and has provided insights for exploring the pathogenesis of neurological diseases associated with movement disorders.
PMCID: PMC3526303  PMID: 23074190
24.  RecOR complex including RecR N-N dimer and RecO monomer displays a high affinity for ssDNA 
Nucleic Acids Research  2012;40(21):11115-11125.
RecR is an important recombination mediator protein in the RecFOR pathway. RecR together with RecO and RecF facilitates RecA nucleoprotein filament formation and homologous pairing. Structural and biochemical studies of Thermoanaerobacter tengcongensis RecR (TTERecR) and a series of its mutants revealed that TTERecR uses the N-N dimer as a basic functional unit to interact with the TTERecO monomer. Two TTERecR N-N dimers form a ring-shaped tetramer via an interaction between their C-terminal regions. The tetramer is a result of crystallization only. Hydrophobic interactions between the entire helix-hairpin-helix domains within the N-terminal regions of two TTERecR monomers are necessary for formation of a functional RecR N-N dimer. The TTERecR N-N dimer conformation also affects formation of a hydrophobic patch, which creates a binding site for TTERecO in the TTERecR Toprim domain. In addition, we demonstrate that TTERecR does not bind single-stranded DNA (ssDNA) and binds double-stranded DNA very weakly, whereas the TTERecOR complex can stably bind DNA, with a higher affinity for ssDNA than double-stranded DNA. Based on these results, we propose an interaction model for the RecOR:ssDNA complex.
PMCID: PMC3510498  PMID: 23019218
25.  Structural insight of a concentration-dependent mechanism by which YdiV inhibits Escherichia coli flagellum biogenesis and motility 
Nucleic Acids Research  2012;40(21):11073-11085.
YdiV is a negative regulator of cell motility. It interacts with the FlhD4C2 complex, a product of the flagellar master operon, which works as the transcription activator of all other flagellar operons. Here, we report the crystal structures of YdiV and the YdiV2–FlhD2 complex at 1.9 Å and 2.9 Å resolutions, respectively. Interestingly, YdiV formed multiple types of complexes with FlhD4C2. YdiV1–FlhD4C2 and YdiV2–FlhD4C2 still bound to DNA, while YdiV3–FlhD4C2 and YdiV4–FlhD4C2 did not. DNA bound FlhD4C2 by wrapping around the FlhC subunit rather than the FlhD subunit. Structural analysis showed that only two peripheral FlhD subunits were accessible for YdiV binding, forming the YdiV2–FlhD4C2 complex without affecting the integrity of the ring-like structure. The YdiV2–FlhD2 structure and the negative-staining electron microscopy reconstruction of YdiV4–FlhD4C2 suggested that the third and fourth YdiV molecules bound to the FlhD4C2 complex by squeezing into the ring-like structure of FlhD4C2 between the two internal D subunits. Consequently, the ring-like structure opened up, and the complex lost DNA-binding ability. Thus, YdiV inhibits FlhD4C2 only at relatively high concentrations.
PMCID: PMC3510510  PMID: 23002140
