1.  Regulation of pluripotency and self-renewal of ES cells through epigenetic-threshold modulation and mRNA pruning 
Cell  2012;151(3):576-589.
ES cell pluripotency requires bivalent epigenetic modifications of key developmental genes regulated by various transcription factors and chromatin modifying enzymes. How these factors coordinate with one another to maintain the bivalent chromatin state so that ES cells can undergo rapid self-renewal while retaining pluripotency is poorly understood. We report that Utf1, a target of Oct4 and Sox2, is a bivalent chromatin component that buffers poised states of bivalent genes. By limiting PRC2 loading and histone H3 lysine-27 trimethylation, Utf1 sets proper activation thresholds for bivalent genes. It also promotes nuclear tagging of mRNAs transcribed from insufficiently silenced bivalent genes for cytoplasmic degradation through mRNA de-capping. These opposing functions of Utf1 promote coordinated differentiation. The mRNA degradation function also ensures rapid cell proliferation by blocking the Myc-Arf feedback control. Thus, Utf1 couples the core pluripotency factors with Myc and PRC2 networks to promote the pluripotency and proliferation of ES cells.
PMCID: PMC3575637  PMID: 23101626
Utf1; PRC2; Myc; bivalency; pluripotency; self-renewal; ES cells; epigenetics; mRNA degradation; differentiation
2.  Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes 
BMC Genomics  2011;12:361.
The Shine-Dalgarno (SD) signal has long been viewed as the dominant translation initiation signal in prokaryotes. Recently, leaderless genes, which lack 5'-untranslated regions (5'-UTRs) on their mRNAs, have been shown to be abundant in archaea. However, current large-scale in silico analyses of initiation mechanisms in bacteria mainly consider SD-led initiation rather than leaderless initiation. The study of leaderless genes in bacteria remains open, which leaves the understanding of translation initiation mechanisms in prokaryotes uncertain.
Here, we study signals in the translation initiation regions of all genes across 953 bacterial and 72 archaeal genomes, then attempt to construct an evolutionary scenario for leaderless genes in bacteria. With an algorithm designed to identify multiple signals in the upstream regions of the genes of a genome, we classify all genes as SD-led, TA-led or atypical according to the category of the most probable signal in their upstream sequences. In particular, the occurrence of TA-like signals about 10 bp upstream of the translation initiation site (TIS) in bacteria most probably indicates leaderless genes.
Our analysis reveals that leaderless genes are widespread, although not dominant, in a variety of bacteria. In Actinobacteria and Deinococcus-Thermus in particular, more than twenty percent of genes are leaderless. For closely related bacterial genomes, our results imply that the change of translation initiation mechanism between genes derived from a common ancestor is linearly dependent on the phylogenetic relationship. Analysis of the macroevolution of leaderless genes further shows that the proportion of leaderless genes in bacteria has a decreasing trend in evolution.
PMCID: PMC3160421  PMID: 21749696
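The three-way classification described in the abstract above (SD-led, TA-led, atypical) can be illustrated with a deliberately simplified sketch. The paper's algorithm infers the most probable signal per gene from the genome itself; the version below instead uses fixed motifs, which is an assumption made only for illustration. The names `SD_PATTERN`, `TA_PATTERN` and `classify_gene`, the 20-nt search window, and the rule that an SD-like match takes priority over a TA-like match are all hypothetical choices, not details taken from the study.

```python
import re

# Hypothetical fixed motifs (assumption): the study learns signals per genome
# instead of matching hard-coded patterns.
SD_PATTERN = re.compile(r"GGAG|GAGG|AGGA")  # 4-nt cores of the AGGAGG consensus
TA_PATTERN = re.compile(r"TA[ACGT]{3}T")    # -10-box-like TANNNT signal


def classify_gene(upstream: str) -> str:
    """Classify a gene from the DNA sequence immediately upstream of its TIS.

    `upstream` is read 5'->3' and ends at the base just before the start codon.
    """
    seq = upstream.upper()
    window = seq[-20:]  # assumed search window: last 20 nt before the TIS
    # Assumed priority: an SD-like match wins over a TA-like match.
    if SD_PATTERN.search(window):
        return "SD-led"
    if TA_PATTERN.search(window):
        return "TA-led"
    return "atypical"
```

For example, `classify_gene("GCGCGCGCGCTATAATGCGCGCG")` finds no SD-like core but does find a TANNNT match, so under these assumptions the gene would be labelled TA-led, that is, a candidate leaderless gene.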
3.  Computational evaluation of TIS annotation for prokaryotic genomes 
BMC Bioinformatics  2008;9:160.
Accurate annotation of translation initiation sites (TISs) is essential for understanding the translation initiation mechanism. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain due to the lack of experimental benchmarks.
Based on a homogeneity assumption that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for a large-scale quantitative assessment of the reliability of TIS annotations for any prokaryotic genome. The method consists of modeling a positional weight matrix (PWM) of aligned sequences around predicted TISs in terms of a linear combination of three elementary PWMs, one for true TISs and the other two for false TISs. The three elementary PWMs are obtained using a reference set with highly reliable TIS predictions. A generalized least squares estimator determines the weighting of the true TIS in the observed PWM, from which the accuracy of the prediction is derived. The validity of the method and the extent of the limitation of the assumptions are explicitly addressed by testing on experimentally verified TISs with variable accuracy of the reference sets. The method is applied to estimate the accuracy of TIS annotations that are provided on public databases such as RefSeq and ProTISA and by programs such as EasyGene, GeneMarkS, Glimmer 3 and TiCo. It is shown that RefSeq's TIS prediction is significantly less accurate than two recent predictors, TiCo and ProTISA. With supporting evidence, we show two general preferential biases in the RefSeq annotation, i.e. over-annotating the longest open reading frame (LORF) and under-annotating the ATG start codon. Finally, we have established a new TIS database, SupTISA, based on the best prediction of all the predictors; SupTISA has achieved an average accuracy of 92% over all 532 complete genomes.
Large-scale computational evaluation of TIS annotation has been achieved. A new TIS database that is more accurate than RefSeq has been constructed, and it provides a valuable resource for further TIS studies.
PMCID: PMC2362131  PMID: 18366730
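The core idea in the abstract above, writing the observed PWM as a linear combination of three elementary PWMs and reading the annotation accuracy off the weight of the true-TIS component, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses ordinary least squares via the normal equations rather than the generalized least squares estimator named in the abstract, and the function names (`flatten`, `solve_linear`, `mixture_weights`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def flatten(pwm: Matrix) -> List[float]:
    """Flatten a PWM (positions x bases) into one vector."""
    return [p for row in pwm for p in row]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("elementary PWMs are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def mixture_weights(observed: Matrix, components: List[Matrix]) -> List[float]:
    """Ordinary least-squares weights w with observed ~ sum_k w[k] * components[k].

    Assumption: plain (unweighted) least squares stands in for the
    generalized least squares estimator described in the abstract.
    """
    y = flatten(observed)
    cols = [flatten(c) for c in components]
    # Normal equations: (C^T C) w = C^T y
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve_linear(gram, rhs)
```

If the first component is the true-TIS PWM, the first returned weight plays the role of the estimated fraction of correctly annotated TISs. The sketch does not constrain the weights to be non-negative or to sum to one, which a fuller treatment would likely need.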
4.  ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes 
Nucleic Acids Research  2007;36(Database issue):D114-D119.
Correct annotation of translation initiation sites (TISs) is essential for both experimental and bioinformatics studies of the prokaryotic translation initiation mechanism, as well as for understanding gene regulation and gene structure. Here we describe a comprehensive database, ProTISA, which collects TISs confirmed through a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, literature, conserved domain hits and sequence alignment between orthologous genes. Moreover, by combining the predictions from our recently developed TIS post-processor, ProTISA provides a refined annotation for the public database RefSeq. Furthermore, the database annotates the potential regulatory signals associated with translation initiation in the region upstream of the TIS. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at
PMCID: PMC2238952  PMID: 17942412

