1.  Global intron retention mediated gene regulation during CD4+ T cell activation 
Nucleic Acids Research  2016;44(14):6817-6829.
T cell activation is a well-established model for studying cellular responses to exogenous stimulation. Using strand-specific RNA-seq, we observed that intron retention is prevalent in polyadenylated transcripts in resting CD4+ T cells and is significantly reduced upon T cell activation. Several lines of evidence suggest that intron-retained transcripts are less stable than fully spliced transcripts. Strikingly, the decrease in intron retention (IR) levels correlates with the increase in steady-state mRNA levels. Further, the majority of the genes upregulated in activated T cells are accompanied by a significant reduction in IR. Of these 1583 genes, 185 are predominantly regulated at the IR level and are highly enriched in the proteasome pathway, which is essential for proper T cell proliferation and cytokine release. These observations were corroborated in both human and mouse CD4+ T cells. Our study reveals a novel post-transcriptional regulatory mechanism that may contribute to coordinated and/or rapid cellular responses to extracellular stimuli such as an acute infection.
PMCID: PMC5001615  PMID: 27369383
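The abstract above does not define how IR levels were computed. As a hedged illustration only, one common style of metric (assumed here, not necessarily the one used in this study) scores an intron by the fraction of reads supporting the unspliced form: intronic read coverage divided by intronic coverage plus reads spanning the spliced exon-exon junction. The function names and input counts below are hypothetical.

```python
def intron_retention_ratio(intron_reads: float, junction_reads: float) -> float:
    """Fraction of reads supporting the intron-retained form (assumed metric).

    intron_reads: normalized read coverage over the intron body
    junction_reads: reads spanning the exon-exon junction that removes the intron
    """
    total = intron_reads + junction_reads
    if total == 0:
        return float("nan")  # no evidence for either isoform
    return intron_reads / total


def delta_ir(resting: tuple[float, float], activated: tuple[float, float]) -> float:
    """Change in IR ratio from resting to activated; negative means less retention."""
    return intron_retention_ratio(*activated) - intron_retention_ratio(*resting)


# Hypothetical counts for a single intron (not data from the study)
print(round(delta_ir(resting=(30, 70), activated=(10, 90)), 2))  # -0.2
```

A negative change for an intron, paired with higher steady-state mRNA for its gene, is the kind of pattern the abstract describes for activated T cells.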
2.  CCL20 mediates RANK/RANKL-induced epithelial-mesenchymal transition in endometrial cancer cells 
Oncotarget  2016;7(18):25328-25339.
RANK/RANKL facilitates migration/invasion via epithelial-mesenchymal transition (EMT) in certain malignant tumors. However, the relationship between RANK/RANKL and EMT in endometrial cancer (EC) cells, and its underlying mechanism, remain unclear. In this study, we first showed that RANK/RANKL activation was correlated with EC staging and EMT markers in human EC tissue specimens. RANK/RANKL promoted migration/invasion and initiated EMT in EC cell lines. Protein chip analysis and enzyme-linked immunosorbent assay (ELISA) then revealed that the expression and secretion of chemokine ligand 20 (CCL20) were dramatically enhanced in RANKL-treated, RANK-overexpressing EC cells. Moreover, higher levels of CCL20 were detected in both serum and tumor tissue in orthotopic transplantation mouse models. Finally, we confirmed that CCL20 contributed to invasion and EMT of RANK-overexpressing EC cells. In summary, the data support the hypothesis that RANK/RANKL elevates the expression and secretion of CCL20 in EC cells, which promotes cancer progression through EMT.
PMCID: PMC5041907  PMID: 27015366
CCL20; RANK; RANKL; epithelial-mesenchymal transition; endometrial cancer
3.  GxE Interactions between FOXO Genotypes and Tea Drinking Are Significantly Associated with Cognitive Disability at Advanced Ages in China 
Logistic regression analysis based on data from 822 Han Chinese oldest-old individuals aged 92+ demonstrated that interactions between carrying FOXO1A-266, FOXO3-310, or FOXO3-292 and tea drinking at around age 60 or at present were significantly associated with a lower risk of cognitive disability at advanced ages. Associations between tea drinking and reduced cognitive disability were much stronger among carriers of the FOXO1A-266, FOXO3-310, or FOXO3-292 genotypes than among noncarriers, and this was reconfirmed by analysis of three-way interactions across FOXO genotypes, tea drinking at around age 60, and tea drinking at present. Based on prior findings from animal and human cell models, we postulate that intake of tea compounds may activate FOXO gene expression, which in turn may positively affect cognitive function in the oldest-old population. Our empirical findings imply that the health benefits of particular nutritional interventions, including tea drinking, may in part depend on individual genetic profiles.
PMCID: PMC4447795  PMID: 24895270
FOXO genotypes; Tea drinking; GxE interactions; Cognitive disability; Oldest old.
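To make the G×E interaction in the preceding abstract concrete, consider a logistic model of the assumed form logit(p) = b0 + b1·G + b2·T + b3·G·T, with G a carrier indicator and T a tea-drinking indicator. Under that model the odds ratio for tea drinking is exp(b2) among noncarriers and exp(b2 + b3) among carriers, so a negative interaction coefficient b3 corresponds to a stronger protective association in carriers. The sketch uses made-up coefficients and a hypothetical function name; it is not the study's fitted model.

```python
import math


def tea_odds_ratio(b_tea: float, b_interaction: float, carrier: bool) -> float:
    """Odds ratio for tea drinking vs. not, within one genotype stratum.

    Assumes logit(p) = b0 + b1*G + b2*T + b3*G*T, where G=1 for carriers
    and T=1 for tea drinkers; b0 and b1 cancel out of this ratio.
    """
    log_or = b_tea + (b_interaction if carrier else 0.0)
    return math.exp(log_or)


# Hypothetical coefficients (not estimates from the study)
b2, b3 = -0.2, -0.5
print(f"noncarriers: OR = {tea_odds_ratio(b2, b3, carrier=False):.3f}")
print(f"carriers:    OR = {tea_odds_ratio(b2, b3, carrier=True):.3f}")
```

Both odds ratios fall below 1 here, but the carrier ratio is smaller, which is how an interaction term can make an association "much stronger" in one genotype group.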
4.  Oroxylin A modulates mitochondrial function and apoptosis in human colon cancer cells by inducing mitochondrial translocation of wild-type p53 
Oncotarget  2016;7(13):17009-17020.
Oroxylin A is a flavonoid extracted from the root of Scutellaria baicalensis Georgi. We previously demonstrated that oroxylin A induced apoptosis in human colon cancer cells via the mitochondrial pathway. In the present study, we investigated the underlying mechanisms responsible for the mitochondrial apoptotic pathway triggered by oroxylin A. p53 regulates mitochondrial survival, mitochondrial DNA integrity, and protection from oxidative stress. We determined that oroxylin A induces p53 mitochondrial translocation and inhibits SOD2 activity. Additionally, our studies demonstrate that oroxylin A promotes the formation and mitochondrial translocation of the p53-Recql4 complex in HCT-116 cells. Finally, we showed that oroxylin A triggers cytosolic p53 activation, thereby promoting apoptosis. Mitochondrial translocation of p53 was also validated in vivo. Thus, oroxylin A induces mitochondrial translocation of p53 and leads to mitochondrial dysfunction in human colon cancer cells.
PMCID: PMC4941367  PMID: 26958937
p53; translocation; oroxylin A; oxidative stress
5.  Novel loci and pathways significantly associated with longevity 
Scientific Reports  2016;6:21243.
Only two genome-wide significant loci associated with longevity have been identified so far, probably because of insufficient sample sizes of centenarians, whose genomes may harbor genetic variants associated with health and longevity. Here we report a genome-wide association study (GWAS) of Han Chinese with a sample size 2.7 times that of the largest previously published GWAS on centenarians. We identified 11 independent loci associated with longevity and replicated across southern and northern regions of China, including two novel loci (rs2069837-IL6; rs2440012-ANKRD20A9P) with genome-wide significance and the rest with suggestive significance (P < 3.65 × 10⁻⁵). Eight independent SNPs overlapped across Han Chinese, European and U.S. populations, and APOE and 5q33.3 were replicated as longevity loci. Integrated analysis indicates four pathways (starch, sucrose and xenobiotic metabolism; immune response and inflammation; MAPK; calcium signaling) highly associated with longevity (P ≤ 0.006) in Han Chinese. The association with longevity of three of these four pathways (MAPK; immunity; calcium signaling) is supported by findings in other human cohorts. Our novel finding of an association between the starch, sucrose and xenobiotic metabolism pathway and longevity is consistent with previous results from Drosophila. This study suggests that protective mechanisms, including immunity and nutrient metabolism, and their interactions with environmental stress play key roles in human longevity.
PMCID: PMC4766491  PMID: 26912274
6.  The Protein Interaction of RNA Helicase B (RhlB) and Polynucleotide Phosphorylase (PNPase) Contributes to the Homeostatic Control of Cysteine in Escherichia coli
The Journal of Biological Chemistry  2015;290(50):29953-29963.
PNPase, one of the major enzymes with 3′ to 5′ single-stranded RNA degradation and processing activities, can interact with the RNA helicase RhlB independently of RNA degradosome formation in Escherichia coli. Here, we report that loss of interaction between RhlB and PNPase impacts cysteine homeostasis in E. coli. By random mutagenesis, we identified a mutant RhlBP238L that loses 75% of its ability to interact with PNPase but retains normal interaction with RNase E and RNA, in addition to exhibiting normal helicase activity. Applying microarray analyses to an E. coli strain with impaired RNA degradosome formation, we investigated the biological consequences of a weakened interaction between RhlB and PNPase. We found significant increases in 11 of 14 genes involved in cysteine biosynthesis. Subsequent Northern blot analyses showed that the up-regulated transcripts were the result of stabilization of the cysB transcript encoding a transcriptional activator for the cys operons. Furthermore, Northern blots of PNPase or RhlB mutants showed that RhlB-PNPase plays both a catalytic and structural role in regulating cysB degradation. Cells expressing the RhlBP238L mutant exhibited an increase in intracellular cysteine and an enhanced anti-oxidative response. Collectively, this study suggests a mechanism by which bacteria use the PNPase-RhlB exosome-like complex to combat oxidative stress by modulating cysB mRNA degradation.
PMCID: PMC4705995  PMID: 26494621
bacterial metabolism; Escherichia coli (E. coli); exosome complex; oxidative stress; ribonuclease; RNA degradation; Cysteine homeostasis; PNPase; RhlB
7.  An intriguing RNA species—perspectives of circularized RNA 
Protein & Cell  2015;6(12):871-880.
Circular RNAs (circRNAs), a class of covalently closed RNA molecules, were long considered by-products of mis-splicing events and were discovered only sporadically owing to technological limitations in the early years. With technological advances such as high-throughput next-generation sequencing, numerous circRNAs have recently been detected in many species. CircRNAs are expressed in a spatiotemporally specific manner, suggesting that their regulatory potential had previously been overlooked. Intriguingly, some circRNAs have indeed been found to have critical physiological functions in certain circumstances. CircRNAs have a more stable molecular structure that resists exoribonucleases better than their linear counterparts, and their molecular functions include acting as microRNA sponges, regulating transcription, serving as mRNA traps that compete with linear splicing, acting as templates for translation, and possibly other presently unknown roles. Here, we review the discovery and characterization of circRNAs, their origin and formation mechanisms, their physiological functions and molecular roles, and the methods for detecting circRNAs. We further look into the future and propose key questions to be answered for these magical RNA molecules.
PMCID: PMC4656206  PMID: 26349458
circular RNA; back splice; gene regulation
8.  MitoRCA-seq reveals unbalanced cytocine to thymine transition in Polg mutant mice 
Scientific Reports  2015;5:12049.
Mutations in mitochondrial DNA (mtDNA) can lead to a wide range of human diseases. We have developed a deep sequencing strategy, mitoRCA-seq, to detect low-frequency mtDNA point mutations starting with as little as 1 ng of total DNA. It employs rolling circle amplification, which enriches full-length circular mtDNA using either custom mtDNA-specific primers or a commercial kit, and minimizes contamination by nuclear encoded mitochondrial DNA (Numts). By analyzing the mutation profiles of wild-type and Polg (mitochondrial DNA polymerase γ) mutant mice, we found that mice with the proofreading-deficient mtDNA polymerase have a significantly higher mutation load, mainly through an increased number of mutation sites and to a lesser extent through elevated mutation frequency at existing sites, even before the premature aging phenotypes appear. Strikingly, cytosine (C) to thymine (T) transitions are overrepresented in the mtDNA of Polg mutant mice. Compared with other types of mutations, the C → T transition tends to increase the hydrophobicity of the affected amino acids, and may contribute to impaired protein function in Polg mutant mice. Taken together, our findings may provide clues for further investigation of the molecular mechanism underlying the premature aging phenotype in Polg mutant mice.
PMCID: PMC4648470  PMID: 26212336
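Tallying a mutation spectrum like the one summarized in the mitoRCA-seq abstract can be sketched as follows. This is a hypothetical helper, not part of mitoRCA-seq: it assumes variant calls are already available as (reference base, alternate base, alt-supporting reads, depth) tuples on the reference strand, classifies each substitution as a transition or transversion, and reports per-site mutation frequency. A C → T change on one strand appears as G → A on the complementary strand; this sketch does not collapse the two.

```python
from collections import Counter

# Purine<->purine or pyrimidine<->pyrimidine substitutions
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def summarize_calls(calls):
    """calls: iterable of (ref, alt, alt_reads, depth) tuples (hypothetical format)."""
    spectrum = Counter()
    frequencies = []
    transitions = 0
    for ref, alt, alt_reads, depth in calls:
        spectrum[f"{ref}>{alt}"] += 1
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        # Per-site mutation frequency: fraction of reads carrying the alternate base
        frequencies.append(alt_reads / depth if depth else 0.0)
    transversions = sum(spectrum.values()) - transitions
    return spectrum, transitions, transversions, frequencies


# Hypothetical calls (not data from the study)
calls = [("C", "T", 12, 4000), ("C", "T", 5, 3800), ("G", "A", 8, 4200), ("A", "C", 3, 3900)]
spectrum, ti, tv, freqs = summarize_calls(calls)
print(spectrum.most_common(1), ti, tv)
```

Separating "number of mutated sites" (the length of `calls`) from "frequency at each site" (`freqs`) mirrors the two ways the abstract says mutation load can rise.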
9.  Lactate promotes PGE2 synthesis and gluconeogenesis in monocytes to benefit the growth of inflammation-associated colorectal tumor 
Oncotarget  2015;6(18):16198-16214.
Reprogramming of energy metabolism, such as enhanced glycolysis, is an Achilles' heel in cancer treatment. Most studies have been performed on isolated cancer cells. Here, we studied the energy-transfer mechanism in the inflammatory tumor microenvironment. We found that human THP-1 monocytes took up lactate secreted from tumor cells through monocarboxylate transporter 1. In THP-1 monocytes, pyruvate, the oxidation product of lactate, competed with the substrate of prolyl hydroxylase and inhibited its activity, resulting in the stabilization of hypoxia-inducible factor 1-α (HIF-1α) under normoxia. Mechanistically, activated HIF-1α in THP-1 monocytes promoted the transcription of prostaglandin-endoperoxide synthase 2 and phosphoenolpyruvate carboxykinase, the key enzymes of prostaglandin E2 synthesis and gluconeogenesis, respectively, and thereby promoted the growth of human colon cancer HCT116 cells. Interestingly, lactate did not accelerate the growth of colon cancer directly in vivo. Instead, lactate-exposed human monocytic cells appeared to play a critical role in ‘feeding’ the colon cancer cells. Thus, this study reports the recycling of lactate for glucose regeneration in cancer metabolism. The anabolic metabolism of monocytes in the inflammatory tumor microenvironment may be a critical event during tumor development, allowing accelerated tumor growth.
PMCID: PMC4594635  PMID: 25938544
lactate; HIF-1α; gluconeogenesis; inflammation; microenvironment
10.  Oroxylin A promotes PTEN-mediated negative regulation of MDM2 transcription via SIRT3-mediated deacetylation to stabilize p53 and inhibit glycolysis in wt-p53 cancer cells 
p53 plays important roles in regulating the metabolic reprogramming of cancer, such as aerobic glycolysis. Oroxylin A is a natural active flavonoid with strong anticancer effects both in vitro and in vivo.
Wild-type p53 (wt-p53) cancer cells (MCF-7 and HCT116) and p53-null H1299 cancer cells were used. Glucose uptake and lactate production were analyzed using the Lactic Acid production Detection kit and the Amplex Red Glucose Assay Kit. The protein and RNA levels of p53, mouse double minute 2 (MDM2), and p53-targeted glycolytic enzymes were then quantified by Western blotting and quantitative polymerase chain reaction (PCR), respectively. Immunoprecipitation was performed to assess the binding among p53, MDM2, and sirtuin-3 (SIRT3), and the deacetylation of phosphatase and tensin homolog (PTEN). Reporter assays were performed to assess the transcriptional activity of PTEN. In vivo, the effects of oroxylin A were investigated in nude mice bearing MCF-7 or HCT116 xenograft tumors.
Here, we analyzed the mechanisms by which oroxylin A regulates p53 levels and glycolytic metabolism in wt-p53 cancer cells, and found that oroxylin A inhibited glycolysis by upregulating p53 levels. Oroxylin A did not directly affect the transcription of wt-p53, but suppressed MDM2-mediated degradation of p53 by downregulating MDM2 transcription in wt-p53 cancer cells. In further studies, we found that oroxylin A reduced MDM2 transcription by promoting the lipid phosphatase activity of PTEN, which was upregulated via SIRT3-mediated deacetylation. In vivo, oroxylin A inhibited the growth of MCF-7 or HCT116 xenograft tumors in nude mice. Oroxylin A also downregulated MDM2 protein expression in tumor tissue.
These results reveal a p53-independent mechanism of MDM2 transcription and the potential of oroxylin A to regulate glycolysis in both wt-p53 and mut-p53 cancer cells. The studies have important implications for investigating the anticancer effects of oroxylin A, and provide a basis for clinical trials of oroxylin A in cancer patients.
Electronic supplementary material
The online version of this article (doi:10.1186/s13045-015-0137-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4419472  PMID: 25902914
Oroxylin A; Glycolysis; MDM2; PTEN; SIRT3
11.  A Novel Role of CDX1 in Embryonic Epicardial Development 
PLoS ONE  2014;9(7):e103271.
The molecular mechanism that regulates epicardial development remains poorly understood. In this study, we explored the function of CDX1, a Caudal-related family member, in epicardial epithelial-to-mesenchymal transition (EMT) and in the migration and differentiation of epicardium-derived progenitors into vascular smooth muscle cells. We detected transient expression of CDX1 in murine embryonic hearts at 11.5 days post coitum (dpc). Using a doxycycline-inducible CDX1 mouse model, primary epicardium, and ex vivo heart culture, we further demonstrated that ectopic expression of CDX1 promoted epicardial EMT. In addition, low-dose CDX1 induction led to enhanced migration and differentiation of epicardium-derived cells (EPDCs) into α-SMA+ vascular smooth muscle cells. In contrast, either continued high-level induction of CDX1 or CDX1 deficiency attenuated the ability of EPDCs to migrate and to mature into smooth muscle cells upon TGF-β1 induction. Further RNA-seq analyses showed that CDX1 induction altered the transcript levels of genes involved in neuronal development, angiogenesis, and cell adhesion required for EMT. Our data reveal a previously undefined role of CDX1 during epicardial development, and suggest that transient expression of CDX1 promotes epicardial EMT, whereas subsequent down-regulation of CDX1 after 11.5 dpc in mice is necessary for further subepicardial invasion of EPDCs and their contribution to coronary vascular endothelium or smooth muscle cells.
PMCID: PMC4113346  PMID: 25068460
12.  Certain Adenylated Non-Coding RNAs, Including 5′ Leader Sequences of Primary MicroRNA Transcripts, Accumulate in Mouse Cells following Depletion of the RNA Helicase MTR4 
PLoS ONE  2014;9(6):e99430.
RNA surveillance plays an important role in posttranscriptional regulation. Seminal work in this field has largely focused on yeast as a model system, whereas exploration of RNA surveillance in mammals has only recently begun. The increased transcriptional complexity of mammalian systems provides a wider array of targets for RNA surveillance, and, while many questions remain unanswered, emerging data suggest that the nuclear RNA surveillance machinery exhibits increased complexity as well. We used a small interfering RNA in mouse N2A cells to target the homolog of a yeast protein that functions in RNA surveillance (Mtr4p). We used high-throughput sequencing of polyadenylated RNAs (PA-seq) to quantify the effects of the mMtr4 knockdown (KD) on RNA surveillance. We demonstrate that the overall abundance of polyadenylated protein-coding mRNAs is not affected, but several targets of RNA surveillance predicted from work in yeast accumulate as adenylated RNAs in the mMtr4KD. MicroRNAs are an added layer of transcriptional complexity not found in yeast. After Drosha cleavage separates the pre-miRNA from the microRNA's primary transcript, the byproducts of that transcript are generally thought to be degraded. We have identified the 5′ leading segments of pri-miRNAs as novel targets of mMtr4-dependent RNA surveillance.
PMCID: PMC4057207  PMID: 24926684
13.  HnRNP L and HnRNP A1 Induce Extended U1 snRNA Interactions with an Exon to Repress Spliceosome Assembly 
Molecular Cell  2013;49(5):972-982.
Pre-mRNA splicing is catalyzed by the spliceosome, a dynamic enzymatic complex. Forcing aberrant interactions within the spliceosome can reduce splicing efficiency and alter splice site choice; however, it is unknown whether such alterations are naturally exploited mechanisms of splicing regulation. Here we demonstrate that hnRNP L represses CD45 exon 4 by recruiting hnRNP A1 to a sequence upstream of the 5′ splice site. Together, hnRNP L and A1 induce extended contacts between the 5′ splice site-bound U1 snRNA and neighboring exonic sequences, which in turn inhibit stable association of U6 snRNA and subsequent catalysis. Importantly, analysis of several exons regulated by hnRNP L shows a clear relationship between the potential for binding of hnRNP A1 and U1 snRNA, and the effect of hnRNP L on splicing. Together, our results demonstrate that conformational perturbations within the spliceosome are a naturally occurring and generalizable mechanism for controlling alternative splicing decisions.
PMCID: PMC3595347  PMID: 23394998
alternative splicing; U1 snRNA; tri-snRNP; spliceosome assembly; hnRNP L; hnRNP A1
14.  Human Papillomavirus Type 58 Genome Variations and RNA Expression in Cervical Lesions 
Journal of Virology  2013;87(16):9313-9322.
Human papillomavirus type 58 (HPV58) is relatively prevalent in China and other Asian countries. In this study, the HPV58 genome in cervical lesions was decoded from five grade 2 or 3 cervical intraepithelial neoplasia lesion (CIN2/3) samples and five cervical cancer tissues using rolling-circle amplification of total cell DNA and deep sequencing and verified by whole-genome cloning and sequencing. HPV58 isolates from China feature a total of 52 nucleotide substitutions (0.66%) from the reference HPV58 sequence, which appear mainly in two regions, with 12 from nucleotides (nt) 3430 to 4136 covering the E2/E4/E5 open reading frames (ORFs) and 13 from nt 4621 to 5540 covering the L2 ORF; these could be grouped as HPV58 Chinese Zhejiang-1, -2, and -3 (CNZJ-1, -2, and -3) according to their sequence similarities and restriction enzyme digestion. Phylogenetically, CNZJ-3 is similar to the reference HPV58 sublineage A1 sequence. The other two are close to sublineage A2. Analysis of cervical lesion-derived RNA revealed abundant HPV58 early transcripts spliced at the E6 and E1/E2 ORFs, where two 5′ splice sites at nt 232 and nt 898 and two 3′ splice sites at nt 510 and nt 3355 can be identified. Thus, our study represents the first genome-wide analysis of HPV58 and its expression in cervical lesions.
PMCID: PMC3754072  PMID: 23785208
15.  Genome sequencing accuracy by RCA-seq versus long PCR template cloning and sequencing in identification of human papillomavirus type 58 
Cell & Bioscience  2014;4:5.
Genome variations in human papillomaviruses (HPVs) are common and have been widely investigated in the past two decades. HPV genotyping depends on identifying variations in the viral L1 ORF. Variations in other parts of the viral genome have also been implicated as possible genetic factors in viral pathogenesis and/or oncogenicity.
In this study, the HPV58 genome in cervical lesions was completely sequenced both by rolling-circle amplification of total cell DNA and deep sequencing (RCA-seq) and by long PCR template cloning and sequencing. By comparing three HPV58 genome sequences decoded from three clinical samples with the reference HPV58 sequence, we demonstrated that RCA-seq is much more accurate than long PCR template cloning and sequencing in decoding the HPV58 genome. The three HPV58 genomes decoded by RCA-seq displayed a total of 52 nucleotide substitutions from the reference HPV58, which could be verified by long PCR template cloning and sequencing. However, long PCR template cloning and sequencing introduced additional nucleotide substitutions, insertions, and deletions relative to the authentic HPV58 genome in a clinical sample, and these varied from one cloned sequence to another. Because of the inherently error-prone nature of the Tgo DNA polymerase used to prepare the long PCR templates of the HPV58 genome from the clinical samples, the measured error rate of nucleotide incorporation into an elongating DNA strand was about 0.149% ± 0.038% in our studies.
Since PCR template cloning and sequencing is widely used to identify single nucleotide polymorphisms (SNPs), our data indicate that serious caution should be taken in identifying true SNPs in various genetic studies.
PMCID: PMC3903022  PMID: 24410913
Human papillomaviruses; HPV58; Cervical cancer; Single nucleotide polymorphism; Genotyping; Genome variations; Rolling circle amplification; DNA deep sequencing
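The polymerase error rate reported in the HPV58 RCA-seq abstract is a per-base quantity. A hedged sketch of that arithmetic, assuming each cloned sequence is aligned to the RCA-seq consensus and every mismatch, insertion, or deletion counts as one error, computes a per-clone percentage and summarizes it as mean ± sample standard deviation. The clone counts and the function name are invented for illustration, and the study's exact procedure may differ.

```python
from statistics import mean, stdev


def clone_error_rates(clones):
    """Percent error per clone: errors / aligned bases * 100.

    clones: list of (errors, aligned_bases) pairs, where errors counts
    substitutions, insertions, and deletions relative to the consensus.
    """
    return [100.0 * errors / bases for errors, bases in clones]


# Hypothetical clones (not data from the study)
clones = [(12, 8000), (9, 8000), (15, 8000)]
rates = clone_error_rates(clones)
print(f"error rate = {mean(rates):.4f}% ± {stdev(rates):.4f}%")
```

Because the errors differ from clone to clone, any single cloned sequence can carry spurious variants, which is why the abstract cautions against calling SNPs from cloned long-PCR templates.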
16.  PfSETvs methylation of histone H3K36 represses virulence genes in Plasmodium falciparum 
Nature  2013;499(7457):223-227.
The variant antigen Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of P. falciparum-infected red blood cells (iRBCs), is a critical virulence factor for malaria [1]. Each parasite encodes 60 antigenically distinct var genes encoding PfEMP1s, but during infection the clonal parasite population expresses only one gene at a time before switching to the expression of a new variant antigen, an immune evasion mechanism to avoid the host's antibody responses [2,3]. The mechanism by which 59 of the 60 var genes are silenced remains largely unknown [4–7]. Here we show that knocking out the P. falciparum variant-silencing SET gene (PfSETvs), which encodes an ortholog of Drosophila melanogaster ASH1 and controls histone H3 lysine 36 trimethylation (H3K36me3) on var genes, results in the transcription of virtually all var genes in single parasite nuclei and their expression as proteins on the surface of individual iRBCs. PfSETvs-dependent H3K36me3 is present along the entire gene body, including the transcription start site (TSS), to silence var genes. With low occupancy of PfSETvs at both the TSS of var genes and the intronic promoter, expression of var genes coincides with transcription of their corresponding antisense long non-coding RNA (lncRNA). These results uncover a novel role of PfSETvs-dependent H3K36me3 in silencing var genes in P. falciparum that might provide a general mechanism by which orthologs of PfSETvs repress gene expression in other eukaryotes. PfSETvs knockout parasites expressing all PfEMP1s may also be applied to the development of a malaria vaccine.
PMCID: PMC3770130  PMID: 23823717
17.  A Viral Genome Landscape of RNA Polyadenylation from KSHV Latent to Lytic Infection 
PLoS Pathogens  2013;9(11):e1003749.
RNA polyadenylation (pA) is one of the major steps in the regulation of gene expression at the posttranscriptional level. In this report, a genome landscape of pA sites of viral transcripts in B lymphocytes with Kaposi sarcoma-associated herpesvirus (KSHV) infection was constructed using a modified PA-seq strategy. We identified 67 unique pA sites, of which 55 could be assigned to the expression of the ∼90 annotated KSHV genes. Among the assigned pA sites, twenty are used for expression of individual single genes and the rest for multiple genes (average 2.7 genes per pA site) in cluster-gene loci of the genome. A few novel viral pA sites that could not be assigned to any known KSHV gene are often positioned antisense to ORF8, ORF21, ORF34, K8 and ORF50, and their associated antisense mRNAs to ORF21, ORF34 and K8 could be verified by 3′RACE. The usage of each mapped pA site correlates with its peak size: the larger (broader and wider) the peak, the greater the usage and thus the higher the expression of the pA site-associated gene(s). Similar to mammalian transcripts, KSHV RNA polyadenylation employs two major poly(A) signals, AAUAAA and AUUAAA, and is regulated by conservation of cis-elements flanking the mapped pA sites. Moreover, we found two or more alternative pA sites downstream of ORF54, K2 (vIL6), K9 (vIRF1), K10.5 (vIRF3), K11 (vIRF2), K12 (Kaposin A), T1.5, and PAN genes and experimentally validated the alternative polyadenylation for the expression of KSHV ORF54, K11, and T1.5 transcripts. Together, our data provide not only a comprehensive pA site landscape for understanding KSHV genome structure and gene expression, but also the first evidence of alternative polyadenylation as another layer of posttranscriptional regulation in viral gene expression.
Author Summary
A genome-wide polyadenylation landscape in the expression of human herpesviruses has not been reported. In this study, we provide the first genome landscape of viral RNA polyadenylation sites in B cells from KSHV latent to lytic infection using a modified PA-seq protocol, with selected sites validated by 3′ RACE. We found that the KSHV genome contains 67 active pA sites for the expression of its ∼90 genes and a few antisense transcripts. Among the mapped pA sites, a large fraction are used for the expression of cluster genes and the production of bicistronic or polycistronic transcripts from the KSHV genome, and only one-third are used for the expression of single genes. We found that the size of individual PA peaks is positively correlated with the usage of the corresponding pA site, which is determined by the number of reads within the PA peak from latent to lytic KSHV infection, and that the strength of cis-elements surrounding each KSHV pA site determines the expression level of viral genes. Lastly, we identified and experimentally validated alternative polyadenylation of KSHV ORF54, T1.5, and K11 during viral lytic infection. To our knowledge, this is the first report on alternative polyadenylation events in KSHV infection.
PMCID: PMC3828183  PMID: 24244170
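The two major poly(A) signals named in the KSHV abstract, AAUAAA and AUUAAA, can be located with a simple hexamer scan upstream of each mapped cleavage site. In the hedged sketch below, the 40-nt upstream window, the function name `find_pas`, and the 0-based cleavage coordinate convention are all assumptions for illustration. The poly(A) signal usually lies roughly 10 to 30 nt upstream of the cleavage site, so a real analysis would tune the window and might also score auxiliary cis-elements.

```python
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")  # the two major poly(A) signals


def find_pas(transcript: str, cleavage_pos: int, window: int = 40):
    """Scan `window` nt upstream of a cleavage site for poly(A) signal hexamers.

    transcript: RNA or DNA sequence (T is converted to U).
    cleavage_pos: assumed 0-based index of the last nucleotide before the poly(A) tail.
    Returns (hexamer, distance) pairs, where distance is cleavage_pos minus
    the index of the hexamer's last base.
    """
    seq = transcript.upper().replace("T", "U")
    start = max(0, cleavage_pos - window)
    region = seq[start:cleavage_pos + 1]
    hits = []
    for i in range(len(region) - 5):
        hexamer = region[i:i + 6]
        if hexamer in PAS_HEXAMERS:
            hexamer_end = start + i + 5  # index of the hexamer's last base
            hits.append((hexamer, cleavage_pos - hexamer_end))
    return hits


# Hypothetical 3' end: AAUAAA followed by 20 nt up to the cleavage site
print(find_pas("GCGC" + "AAUAAA" + "C" * 20, cleavage_pos=29))  # [('AAUAAA', 20)]
```

Reporting the distance as well as the hexamer makes it easy to flag signals that sit unusually far from the mapped site.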
18.  Distinct polyadenylation landscapes of diverse human tissues revealed by a modified PA-seq strategy 
BMC Genomics  2013;14:615.
Polyadenylation is a key regulatory step in eukaryotic gene expression and one of the major contributors to transcriptome diversity. Aberrant polyadenylation is often associated with expression defects and leads to human diseases.
To better understand global polyadenylation regulation, we have developed a polyadenylation sequencing (PA-seq) approach. By profiling polyadenylation events in 13 human tissues, we found that alternative cleavage and polyadenylation (APA) is prevalent in both protein-coding and noncoding genes. In addition, APA usage, similar to gene expression profiling, exhibits tissue-specific signatures and is sufficient for determining tissue origin. A 3′ untranslated region shortening index (USI) was further developed for genes with tandem APA sites. Strikingly, the results showed that different tissues exhibit distinct patterns of shortening and/or lengthening of 3′ untranslated regions, suggesting the intimate involvement of APA in establishing tissue or cell identity.
This study provides a comprehensive resource to uncover regulated polyadenylation events in human tissues and to characterize the underlying regulatory mechanism.
PMCID: PMC3848854  PMID: 24025092
19.  Cellular RNA Binding Proteins NS1-BP and hnRNP K Regulate Influenza A Virus RNA Splicing 
PLoS Pathogens  2013;9(6):e1003460.
Influenza A virus is a major human pathogen with a genome composed of eight single-stranded, negative-sense RNA segments. Two viral RNA segments, NS1 and M, undergo alternative splicing and yield several proteins, including NS1, NS2, M1 and M2. However, the mechanisms and players involved in splicing of these viral RNA segments have not been fully studied. Here, by investigating the interacting partners and function of the cellular protein NS1-binding protein (NS1-BP), we revealed novel players in the splicing of the M1 segment. Using a proteomics approach, we identified a complex of RNA binding proteins containing NS1-BP and heterogeneous nuclear ribonucleoproteins (hnRNPs), including hnRNPs involved in host pre-mRNA splicing. We found that low levels of NS1-BP specifically impaired proper alternative splicing of the viral M1 mRNA segment to yield the M2 mRNA without affecting splicing of mRNA3, M4, or the NS mRNA segments. Further biochemical analysis by formaldehyde and UV cross-linking demonstrated that NS1-BP did not interact directly with viral M1 mRNA, but its interacting partners, hnRNPs A1, K, L, and M, directly bound M1 mRNA. Among these hnRNPs, we identified hnRNP K as a major mediator of M1 mRNA splicing. The M1 mRNA segment generates the matrix protein M1 and the M2 ion channel, which are essential proteins involved in viral trafficking, release into the cytoplasm, and budding. Thus, reduction of NS1-BP and/or hnRNP K levels altered M2/M1 mRNA and protein ratios, decreasing M2 levels and inhibiting virus replication. These results indicate that the NS1-BP/hnRNP K complex is a key mediator of influenza A virus gene expression.
Author Summary
Influenza A virus is a major human pathogen, which causes approximately 500,000 deaths/year worldwide. In pandemic years, influenza infection can lead to even higher mortality, as in 1918, when ∼30–50 million deaths occurred worldwide. In this manuscript, we identified a novel function for the cellular protein termed NS1-BP as a regulator of the influenza A virus life cycle. We found that NS1-BP, together with other host factors, mediates the expression of a key viral protein termed M2. NS1-BP and its interacting partner hnRNP K specifically regulate alternative splicing of the viral M1 mRNA segment, which generates the M2 mRNA that is translated into the essential viral M2 protein. The M2 protein is key for viral uncoating and entry into the host cell cytoplasm. Altogether, inhibition of NS1-BP and hnRNP K function alters influenza A virus gene expression and replication. In sum, these studies revealed new functions for the cellular proteins NS1-BP and hnRNP K during viral RNA expression, which facilitate the influenza A virus life cycle.
PMCID: PMC3694860  PMID: 23825951
20.  Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation 
Bioinformatics  2013;29(13):i108-i116.
Motivation: Pre-mRNA cleavage and polyadenylation are essential steps for 3′-end maturation and for the subsequent stability and degradation of mRNAs. This process is tightly controlled by cis-regulatory elements surrounding the cleavage/polyadenylation sites (polyA sites), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (alternative polyadenylation, APA) results in isoforms with variable 3′-untranslated regions, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA site preferences across cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites and by the lack of appropriate tests for identifying APA events that differ significantly across multiple libraries.
Results: We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three human adult tissue types. We specified a linear-effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive models successfully distinguished constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong and highly informative hallmark of constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical polyadenylation signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
Availability: Raw data are deposited on SRA, accession numbers: brain SRX208132, kidney SRX208087 and liver SRX208134. Processed datasets as well as model code are published on our website:
PMCID: PMC3694680  PMID: 23812974
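Entry 20 assesses tissue differences in polyA site usage with a permutation test. The sketch below is a simplified, hypothetical stand-in, not the authors' method: it replaces their linear-effects regression model with a plain difference in mean proximal-site usage between two groups of libraries and enumerates every relabelling of the libraries exactly. The two-site (proximal/distal) setup, the function names, and the input values are assumptions for illustration.

```python
import itertools
from statistics import mean


def proximal_usage(proximal: int, distal: int) -> float:
    """Fraction of 3'-end reads at the proximal site (assumes only two polyA sites)."""
    total = proximal + distal
    return proximal / total if total else 0.0


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in mean usage.

    group_a, group_b: per-library usage fractions for two tissues (hypothetical inputs).
    Returns (observed_difference, p_value).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a libraries to the first group.
    for idx in itertools.combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(a) - mean(b)
        # Small tolerance so floating-point ties with the observed split are counted.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


if __name__ == "__main__":
    tissue_1 = [0.90, 0.85, 0.88]  # hypothetical proximal-site usage per library
    tissue_2 = [0.20, 0.25, 0.30]
    print(exact_permutation_test(tissue_1, tissue_2))
```

Exact enumeration is practical only for a handful of libraries: with three libraries per group there are just 20 relabellings, so the smallest attainable two-sided p-value is 0.1, which shows how replicate numbers bound what such a test can detect.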
21.  Ancient gene transfer from algae to animals: Mechanisms and evolutionary significance 
Horizontal gene transfer (HGT) is traditionally considered to be rare in multicellular eukaryotes such as animals. Recently, many genes of miscellaneous algal origins were discovered in choanoflagellates. Considering that choanoflagellates are the closest living relatives of animals, we speculated that ancient HGT might have occurred in the unicellular ancestor of animals and affected the long-term evolution of animals.
Through genome screening, phylogenetic and domain analyses, we identified 14 gene families, including 92 genes, in the tunicate Ciona intestinalis that are likely derived from miscellaneous photosynthetic eukaryotes. Almost all of these gene families are distributed in diverse animals, suggesting that they were mostly acquired by the common ancestor of animals. Their miscellaneous origins also suggest that these genes are not derived from a particular algal endosymbiont. In addition, most genes identified in our analyses are functionally related to molecule transport, cellular regulation and methylation signaling, suggesting that the acquisition of these genes might have facilitated intercellular communication in the ancestral animal.
Our findings provide additional evidence that algal genes in aplastidic eukaryotes are not exclusively derived from historical plastids, a finding that is important for interpreting the evolution of eukaryotic photosynthesis. Most importantly, our data represent the first evidence that more anciently acquired genes might exist in animals and that ancient HGT events have played an important role in animal evolution.
PMCID: PMC3494510  PMID: 22690978
Gene transfer; Endosymbiosis; Plastids; Animal evolution
23.  A paired-end sequencing strategy to map the complex landscape of transcription initiation 
Nature Methods  2010;7(7):521-527.
Recent high-throughput sequencing protocols have uncovered the complexity of mammalian transcription by RNA polymerase II, helping to define several initiation patterns in which transcription start sites (TSSs) cluster within both narrow and broad genomic windows. Here, we describe a paired-end sequencing strategy that enables more robust mapping and characterization of capped transcripts. This strategy was applied to explore the transcription initiation landscape in the Drosophila melanogaster embryo. Extending previous findings in mammals, we found that fly promoters exhibit distinct initiation patterns, which are linked to specific promoter sequence motifs. Furthermore, we identified a large number of 5′ capped transcripts originating from coding exons; our analyses suggest that they are unlikely to arise from alternative TSSs and are instead the products of post-transcriptional modifications. Taken together, paired-end TSS analysis is demonstrated to be a powerful method to uncover the transcriptional complexity of eukaryotic genomes.
PMCID: PMC3197272  PMID: 20495556
24.  Transcription Initiation Patterns Indicate Divergent Strategies for Gene Regulation at the Chromatin Level 
PLoS Genetics  2011;7(1):e1001274.
The application of deep sequencing to map 5′ capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: “focused” promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and “dispersed” promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters are more strongly associated with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome-free region upstream, while focused promoters have a less organized nucleosome structure, yet greater RNA polymerase II occupancy. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, the differences are conserved across mammals and flies, and they provide a clearer separation of promoter architectures than the presence or absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support a stronger contribution of chromatin features to the definition of dispersed promoters than of focused start sites. Our results show that promoter classes defined from 5′ capped transcripts not only reflect differences in the initiation process at the core promoter but are also indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.
Author Summary
How are genes transcribed at the right levels and under the right conditions? Transcription regulation in eukaryotes has long been proposed to work by a division of labor: ubiquitous DNA sequence features in the core promoter region, close to the transcription start site (TSS) of genes, were thought to generically encode information to recruit RNA polymerase to initiate transcription, while specific sequence features, often distal from the genes, were thought to boost expression under the right conditions. Supporting the generic function of core promoters, genome-wide chromatin maps showed a stereotypical arrangement of well-spaced nucleosomes providing access to the TSS. High-throughput sequencing has generated genome-wide TSS maps at high resolution, which show that promoters exhibit different initiation patterns, ranging from focused start sites to dispersed regions. Linking these patterns to chromatin maps, we now find distinct core promoter classes: those in which the TSS location is defined broadly at the chromatin level and those in which the TSS is defined by precisely positioned sequence features. Notably, these architectures are deeply conserved across eukaryotes and are used for different functional classes of genes. Our work adds to the growing understanding that core promoters contribute significantly to the complexity of eukaryotic gene expression.
PMCID: PMC3020932  PMID: 21249180
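Entry 24 separates "focused" promoters, whose TSSs occupy a narrow genomic span, from "dispersed" promoters, whose TSSs spread over a wider window. The abstract does not say how span was measured, so the sketch below assumes one plausible metric: the interquantile width between the positions holding the 10th and 90th percentiles of cumulative 5′ capped tag counts. The metric, the 10 bp cutoff, and the function names are hypothetical choices for illustration, not the authors' definitions.

```python
from typing import Dict


def interquantile_width(tss_counts: Dict[int, int], lower: float = 0.1, upper: float = 0.9) -> int:
    """Width (bp, inclusive) between the positions where the cumulative tag
    fraction first reaches `lower` and `upper` (assumed span metric)."""
    positions = sorted(tss_counts)
    total = sum(tss_counts[p] for p in positions)
    if total == 0:
        raise ValueError("promoter has no TSS tags")
    cumulative = 0
    q_low = None
    q_high = None
    for pos in positions:
        cumulative += tss_counts[pos]
        frac = cumulative / total
        if q_low is None and frac >= lower:
            q_low = pos
        if frac >= upper:
            q_high = pos
            break
    return q_high - q_low + 1


def classify_promoter(tss_counts: Dict[int, int], max_focused_width: int = 10) -> str:
    """Label a promoter by its TSS span; the 10 bp cutoff is a hypothetical threshold."""
    return "focused" if interquantile_width(tss_counts) <= max_focused_width else "dispersed"


if __name__ == "__main__":
    sharp = {100: 50, 101: 40, 102: 10}            # hypothetical tags piled on 3 bp
    broad = {p: 1 for p in range(0, 100, 2)}       # hypothetical tags spread over ~100 bp
    print(classify_promoter(sharp), classify_promoter(broad))
```

Using quantiles rather than the full min-to-max span keeps a few stray tags far from the main peak from turning a focused promoter into a dispersed one.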
25.  The Prevalence and Regulation of Antisense Transcripts in Schizosaccharomyces pombe 
PLoS ONE  2010;5(12):e15271.
A strand-specific transcriptome sequencing strategy, directional ligation sequencing (DeLi-seq), was employed to profile the antisense transcriptome of Schizosaccharomyces pombe. Under both normal and heat shock conditions, we found that polyadenylated antisense transcripts are broadly expressed, while distinct expression patterns were observed for protein-coding and non-coding loci. Dominant antisense expression is enriched in protein-coding genes involved in meiosis or stress response pathways. Detailed analyses further suggest that antisense transcripts are regulated independently of their sense transcripts, and that diverse mechanisms might be involved in the biogenesis and degradation of antisense RNAs. Taken together, antisense transcription may have profound impacts on global gene regulation in S. pombe.
PMCID: PMC3004915  PMID: 21187966

