Results 1-11 (11)

author:("He, simiao")
1.  A New Noncoding RNA Arranges Bacterial Chromosome Organization 
mBio  2015;6(4):e00998-15.
Repeated extragenic palindromes (REPs) in enterobacterial genomes are usually composed of individual palindromic units separated by linker sequences. A total of 355 annotated REPs are distributed along the Escherichia coli genome. RNA sequencing (RNA-seq) analysis showed that almost 80% of the REPs in E. coli are transcribed. The DNA sequence of REP325 showed that it is a cluster of six repeats, each with two palindromic units capable of forming cruciform structures in supercoiled DNA. Here, we report that components of the REP325 element and at least one of its RNA products play a role in bacterial nucleoid DNA condensation. These RNAs are not only present in the purified nucleoid but also bind to the bacterial nucleoid-associated protein HU, as revealed by RNA immunoprecipitation followed by microarray analysis (RIP-Chip). Deletion of REP325 resulted in a dramatic increase in nucleoid size as observed using transmission electron microscopy (TEM), and expression of one of the REP325 RNAs, nucleoid-associated noncoding RNA 4 (naRNA4), from a plasmid restored the wild-type condensed structure. Independently, chromosome conformation capture (3C) analysis demonstrated physical connections among various REP elements around the chromosome. These connections depend in some way on the presence of HU and the REP325 element; deletion of HU genes and/or the REP325 element removed the connections. Finally, naRNA4 together with HU condensed DNA in vitro by connecting REP325 or other DNA sequences that contain cruciform structures in a pairwise manner, as observed by atomic force microscopy (AFM). On the basis of our results, we propose molecular models to explain connections of remote cruciform structures mediated by HU and naRNA4.
Nucleoid organization in bacteria is being studied extensively, and several models have been proposed. However, the molecular nature of the structural organization is not well understood. Here we characterized the role of a novel nucleoid-associated noncoding RNA, naRNA4, in nucleoid structures both in vivo and in vitro. We propose models to explain how naRNA4 together with nucleoid-associated protein HU connects remote DNA elements for nucleoid condensation. We present the first evidence of a noncoding RNA together with a nucleoid-associated protein directly condensing nucleoid DNA.
PMCID: PMC4550694  PMID: 26307168
2.  GABPα Binding to Overlapping ETS and CRE DNA Motifs Is Enhanced by CREB1: Custom DNA Microarrays 
G3: Genes|Genomes|Genetics  2015;5(9):1909-1918.
To achieve proper spatiotemporal control of gene expression, transcription factors cooperatively assemble onto specific DNA sequences. The ETS domain protein monomer of GABPα and the B-ZIP domain protein dimer of CREB1 cooperatively bind DNA only when the ETS (C/GCGGAAGT) and CRE (GTGACGTCAC) motifs overlap precisely, producing the ETS↔CRE motif (C/GCGGAAGTGACGTCAC). We designed a Protein Binding Microarray (PBM) with 60-bp DNAs containing four identical sectors, each with 177,440 features that explore the cooperative interactions between GABPα and CREB1 upon binding the ETS↔CRE motif. The DNA sequences include all 15-mers of the form C/GCGGA—–CG—, the ETS↔CRE motif, and all single nucleotide polymorphisms (SNPs), and occurrences in the human and mouse genomes. CREB1 enhanced GABPα binding to the canonical ETS↔CRE motif CCGGAAGT two-fold, and up to 23-fold for several SNPs at the beginning and end of the ETS motif, which is suggestive of two separate and distinct allosteric mechanisms of cooperative binding. We show that the ETS-CRE array data can be used to identify regions likely cooperatively bound by GABPα and CREB1 in vivo, and demonstrate their ability to identify human genetic variants that might inhibit cooperative binding.
PMCID: PMC4555227  PMID: 26185160
ETS; CRE; GABPα; CREB1; cooperative DNA binding
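The ETS↔CRE composite described in this abstract can be illustrated as a plain string overlap. The Python sketch below is only an illustration, not the authors' analysis: the helper name `merge_overlapping_motifs` is hypothetical, and it assumes the motif spellings quoted above, ETS (C/GCGGAAGT, taking the C variant) and CRE (GTGACGTCAC). Joining them on the longest suffix of one that is also a prefix of the other reproduces the 16-nt ETS↔CRE motif and shows that two nucleotides (GT) belong to both sites.

```python
def merge_overlapping_motifs(left, right):
    """Join two motifs on the longest suffix of `left` that is also a prefix of `right`.

    Hypothetical helper for illustration only.
    """
    # Try the longest possible overlap first, then shrink it one base at a time.
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no shared nucleotides: plain concatenation


ets = "CCGGAAGT"    # ETS motif as quoted in the abstract (C variant of C/G)
cre = "GTGACGTCAC"  # CRE motif as quoted in the abstract

composite = merge_overlapping_motifs(ets, cre)
shared = len(ets) + len(cre) - len(composite)  # nucleotides counted in both motifs
print(composite, shared)  # CCGGAAGTGACGTCAC 2
```

Swapping in the G variant ("GCGGAAGT") gives GCGGAAGTGACGTCAC, so both spellings of the C/G position yield the same two-base overlap.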
3.  Nucleosomes are enriched at the boundaries of hypomethylated regions (HMRs) in mouse dermal fibroblasts and keratinocytes 
The interplay between epigenetic modifications and chromatin structure is integral to our understanding of genome function. Methylation of cytosine (5mC) at CG dinucleotides, traditionally associated with transcriptional repression, is the most highly studied chemical modification of DNA, occurring at over 70% of all CG dinucleotides in the genome. Hypomethylated regions (HMRs) often occur in CG islands (CGIs); however, they also occur outside of CGIs and function as cell-type-specific enhancers. During differentiation, chromatin and nucleosome arrangement at regulatory regions is thought to be reorganized to establish cell-type-specific transcriptional programs. However, the organization of nucleosomes at HMRs and the potential mechanisms regulating nucleosome occupancy in these regions are unknown. Here, we have investigated nucleosome organization around HMRs identified in two types of mouse primary cells.
Micrococcal nuclease (MNase)-digested mononucleosomes from primary cultures of newborn female mouse dermal fibroblasts and keratinocytes were mapped and compared to the HMRs obtained from single-base-pair-resolution methylomes. In both cell types, we find that nucleosomes are enriched at HMR boundaries. In contrast to the nucleosomes found at boundaries of HMRs in CGIs, HMRs outside of CGIs are calculated to be preferentially bound by nucleosomes, with phased nucleosomes propagating into the methylated region. Nucleosomes are enriched at the boundaries of tissue-specific HMRs (TS-HMRs) in both cell types, suggesting that nucleosome organization surrounding HMR boundaries is independent of methylation status. In addition, we find potential transcription factor (TF) binding sites (E-box motifs) enriched at non-CGI TS-HMR boundaries.
Our results show that intrinsic nucleosome occupancy scores (INOS) positively correlate with the nucleosome organization surrounding non-CGI TS-HMRs, suggesting that DNA sequence plays a role in the establishment of HMRs in the genome. Since nucleosomes impact all processes involving the genome, our results provide a link between epigenetic modifications, chromatin structure, and regulatory function.
Electronic supplementary material
The online version of this article (doi:10.1186/1756-8935-7-34) contains supplementary material, which is available to authorized users.
PMCID: PMC4265496  PMID: 25506399
CG methylation; Hypomethylated regions; HMR; Nucleosomes; Epigenomics; Keratinocytes; Fibroblasts
4.  High-resolution genome-wide DNA methylation maps of mouse primary female dermal fibroblasts and keratinocytes 
Genome-wide DNA methylation maps at single-nucleotide resolution in different mammalian primary cells help to determine the characteristics and functions of tissue-specific hypomethylated regions (TS-HMRs). We determined genome-wide cytosine methylation maps at 91X and 36X coverage for newborn female mouse primary dermal fibroblasts and keratinocytes and compared them with mRNA-seq gene expression data.
These high-coverage methylation maps were used to identify HMRs in both cell types. A total of 2.91% of the genome is in keratinocyte HMRs and 2.15% is in fibroblast HMRs, with 1.75% being common to both. Half of the TS-HMRs are extensions of common HMRs, and the remainder are unique TS-HMRs. Four levels of CG methylation are observed: 1) total unmethylation for CG dinucleotides in HMRs in CGIs that are active in all tissues; 2) 10% to 40% methylation for TS-HMRs; 3) 60% methylation for TS-HMRs in the cell type where they are not HMRs; and 4) 70% methylation for the nonfunctioning part of the genome. SINE elements are depleted inside the TS-HMRs but highly enriched in the surrounding regions. Hypomethylation at the last exon is associated with gene repression, while demethylation toward the gene body positively correlates with gene expression. The overlapping HMRs have a more complex relationship with gene expression. The common HMRs and TS-HMRs are each enriched for distinct transcription factor binding sites (TFBS). C/EBPβ binds to methylated regions outside of HMRs, while CTCF prefers to bind in HMRs, highlighting these two parts of the genome and their potential interactions.
Keratinocytes and fibroblasts are of epithelial and mesenchymal origin, respectively. High-resolution methylation maps in these two cell types can be used as reference methylomes for analyzing epigenetic mechanisms in several diseases, including cancer.
Electronic supplementary material
The online version of this article (doi:10.1186/1756-8935-7-35) contains supplementary material, which is available to authorized users.
PMCID: PMC4333159  PMID: 25699092
CG methylation; Hypomethylated regions; HMR; Methylome; CTCF; C/EBPβ; Keratinocytes; Fibroblasts
5.  A Single-Nucleotide Polymorphism of Human Neuropeptide S Gene Originated from Europe Shows Decreased Bioactivity 
PLoS ONE  2013;8(12):e83009.
Using accumulating SNP (single-nucleotide polymorphism) data, we performed a genome-wide search for polypeptide hormone ligands showing changes in their mature regions to elucidate genotype/phenotype diversity among various human populations. Neuropeptide S (NPS), a brain peptide hormone highly conserved in vertebrates, has diverse physiological effects on anxiety, fear, hyperactivity, food intake, and sleeping time through its cognate receptor, NPSR. Here, we report a SNP, rs4751440 (L6-NPS), that causes a non-synonymous substitution (V to L) at the 6th position of the NPS mature peptide region. L6-NPS has a higher allele frequency in Europeans than in other populations and probably originated from European ancestors ∼25,000 years ago, based on haplotype analysis and Approximate Bayesian Computation. Functional analyses indicate that L6-NPS exhibits significantly lower bioactivity than wild-type NPS, with ∼20-fold higher EC50 values in the stimulation of NPSR. Additional evolutionary and mutagenesis studies further demonstrate the importance of the valine residue at the 6th position for NPS function. Given the known physiological roles of the NPS receptor in inflammatory bowel diseases, asthma pathogenesis, macrophage immune responses, and brain functions, our study provides a basis for elucidating NPS evolution and signaling diversity among human populations.
PMCID: PMC3873911  PMID: 24386135
6.  Contribution of nucleosome binding preferences and co-occurring DNA sequences to transcription factor binding 
BMC Genomics  2013;14:428.
Chromatin plays a critical role in regulating the binding of transcription factors (TFs) to their canonical transcription factor binding sites (TFBS). Recent studies in vertebrates show that many TFs preferentially bind to genomic regions that are well bound by nucleosomes in vitro. Co-occurring secondary motifs are sometimes correlated with functional TFBS.
We used logistic regression to evaluate how well the propensity for nucleosome binding and the co-occurrence of a secondary motif identify which canonical motifs are bound in vivo. We used ChIP-seq data for three transcription factors binding to their canonical motifs: c-Jun binding the AP-1 motif (TGAC/GTCA), GR (glucocorticoid receptor) binding the GR motif (G-ACA---T/CGT-C), and Hoxa2 (homeobox a2) binding the Pbx (Pre-B-cell leukemia homeobox) motif (TGATTGAT). For all canonical TFBS in the mouse genome, we calculated intrinsic nucleosome occupancy scores (INOS) for the surrounding 150 bp of DNA and examined the relationship with in vivo TF binding. In mouse mammary 3134 cells, c-Jun and GR proteins preferentially bound regions calculated to be well bound by nucleosomes in vitro, with the canonical AP-1 and GR motifs themselves contributing to the high INOS. Functional GR motifs are enriched for AP-1 motifs if the two are within a nucleosome-sized 150-bp region. GR and Hoxa2 also bind motifs with low INOS, perhaps indicating a different mechanism of action.
Our analysis quantified the contribution of INOS and co-occurring sequence to the identification of functional canonical motifs in the genome. This analysis revealed an inherent competition between some TFs and nucleosomes for binding canonical TFBS. GR and c-Jun cooperate if their motifs are within 150 bp. Binding of Hoxa2 and a fraction of GR to motifs with low INOS values suggests that they are not in competition with nucleosomes and may function using different mechanisms.
PMCID: PMC3700821  PMID: 23805837
TFBS; Nucleosome; GR; c-Jun
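The 150-bp co-occurrence criterion in this abstract can be sketched as a window test on motif coordinates. This is a hypothetical helper, not the authors' pipeline: it assumes each motif is represented by its start coordinate on a single chromosome and treats "within 150 bp" as a start-to-start distance of at most 150 bp, which may differ from the definition used in the paper. The coordinates in the usage lines are made-up toy values, not data from the study.

```python
from bisect import bisect_left


def has_partner_within(anchor_starts, partner_starts, window=150):
    """For each anchor motif start, return True if any partner motif start
    lies within +/- `window` bp (hypothetical start-to-start criterion)."""
    partners = sorted(partner_starts)
    flags = []
    for pos in anchor_starts:
        # Index of the first partner whose start is >= pos - window.
        i = bisect_left(partners, pos - window)
        # That partner is the closest candidate from the left edge; check the right edge.
        flags.append(i < len(partners) and partners[i] <= pos + window)
    return flags


# Toy coordinates (hypothetical): GR motif starts and AP-1 motif starts on one chromosome.
gr_starts = [1000, 5000, 9000]
ap1_starts = [8860, 1120, 5400]
print(has_partner_within(gr_starts, ap1_starts))  # [True, False, True]
```

Sorting the partner list once and using binary search keeps each lookup logarithmic, which matters when the anchors are every canonical motif in a genome.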
7.  Overlapping ETS and CRE Motifs (G/CCGGAAGTGACGTCA) Preferentially Bound by GABPα and CREB Proteins 
G3: Genes|Genomes|Genetics  2012;2(10):1243-1256.
Previously, we identified 8-bp-long DNA sequences (8-mers) that localize in human proximal promoters and grouped them into known transcription factor binding sites (TFBS). We now examine split 8-mers consisting of two 4-mers separated by 1 to 30 bp (X4-N1-30-X4) to identify pairs of TFBS that localize in proximal promoters at a precise distance. These include two overlapping TFBS: the ETS⇔ETS motif (C/GCCGGAAGCGGAA) and the ETS⇔CRE motif (C/GCGGAAGTGACGTCAC). The nucleotides in bold are part of both TFBS. Molecular modeling shows that the ETS⇔CRE motif can be bound simultaneously by both the ETS and the B-ZIP domains without protein-protein clashes. The electrophoretic mobility shift assay (EMSA) shows that the ETS protein GABPα and the B-ZIP protein CREB preferentially bind to the ETS⇔CRE motif only when the two TFBS overlap precisely. In contrast, the ETS domain of ETV5 and CREB interfere with each other when binding the ETS⇔CRE motif. The 11-mer CGGAAGTGACG, the conserved part of the ETS⇔CRE motif, occurs 226 times in the human genome, and 83% of these occurrences are in known regulatory regions. In vivo GABPα and CREB ChIP-seq peaks identified the ETS⇔CRE as the most enriched motif, occurring in promoters of genes involved in mRNA processing, cellular catabolic processes, and stress response, suggesting that a specific class of genes is regulated by this composite motif.
PMCID: PMC3464117  PMID: 23050235
proximal promoters; transcription factor binding sites; co-localization; transcriptional start site; EMSA
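Searching for a split 8-mer of the form X4-N1-30-X4 amounts to matching two fixed 4-mers separated by a variable spacer of 1 to 30 bp. The sketch below is a hypothetical illustration of that pattern; the function name, the toy sequence and the choice of 4-mers are assumptions, not taken from the paper. It scans each spacer length separately so that every distance is reported, and uses a zero-width lookahead so overlapping matches are not skipped.

```python
import re


def find_split_kmer(seq, left, right, min_gap=1, max_gap=30):
    """Return sorted (start, gap) pairs where `left`, then `gap` arbitrary
    bases, then `right` occur in `seq` (hypothetical X4-N(gap)-X4 scan)."""
    seq = seq.upper()
    hits = []
    for gap in range(min_gap, max_gap + 1):
        # Zero-width lookahead: reports overlapping matches and keeps each start position.
        pattern = re.compile(f"(?={re.escape(left)}[ACGT]{{{gap}}}{re.escape(right)})")
        hits.extend((m.start(), gap) for m in pattern.finditer(seq))
    return sorted(hits)


# Toy sequence containing the ETS<->CRE composite; 4-mers chosen for illustration only.
print(find_split_kmer("AAACGGAAGTGACGTCACAAA", "CGGA", "GACG"))  # [(3, 3)]
```

Looping over explicit spacer lengths, rather than using a single `{1,30}` quantifier, is what lets the scan report every distinct spacing at a given start instead of only the first one the regex engine finds.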
8.  Systematic evaluation of genome-wide methylated DNA enrichment using a CpG island array 
BMC Genomics  2011;12:10.
Recent progress in high-throughput technologies has greatly contributed to the development of DNA methylation profiling. Although several reports describe methylome detection by whole-genome bisulfite sequencing, the high cost and heavy demand on bioinformatics analysis prevent its extensive application. Thus, current strategies for the study of mammalian DNA methylomes are still based primarily on genome-wide methylated DNA enrichment combined with DNA microarray detection or sequencing. Methylated DNA enrichment is a key step in microarray-based genome-wide methylation profiling studies, and it will remain one even for future high-throughput sequencing-based methylome analysis.
In order to evaluate the sensitivity and accuracy of methylated DNA enrichment, we investigated and optimized a number of important parameters to improve the performance of several enrichment assays, including differential methylation hybridization (DMH), microarray-based methylation assessment of single samples (MMASS), and methylated DNA immunoprecipitation (MeDIP). With advantages and disadvantages unique to each approach, we found that assays based on methylation-sensitive enzyme digestion and those based on immunoprecipitation detected different methylated DNA fragments, indicating that they are complementary in their relative ability to detect methylation differences.
Our study provides the first comprehensive evaluation of widely used methodologies for methylated DNA enrichment and could be helpful for developing a cost-effective approach to DNA methylation profiling.
PMCID: PMC3023747  PMID: 21211017
9.  MethyCancer: the database of human DNA methylation and cancer 
Nucleic Acids Research  2007;36(Database issue):D836-D841.
Cancer is ranked as one of the top killers among all human diseases and continues to have a devastating effect on populations around the globe. Current research efforts aim to accelerate our understanding of the molecular basis of cancer and to develop effective means for cancer diagnostics, treatment and prognosis. An altered pattern of epigenetic modifications, most importantly DNA methylation events, plays a critical role in tumorigenesis through regulating oncogene activation, tumor suppressor gene silencing and chromosomal instability. To study the interplay of DNA methylation, gene expression and cancer, we developed a publicly accessible database for human DNA Methylation and Cancer (MethyCancer). MethyCancer hosts both highly integrated data on DNA methylation, cancer-related genes, mutations and cancer information from public resources, and the CpG island (CGI) clones derived from our large-scale sequencing. Interconnections between different data types were analyzed and presented. Furthermore, a powerful search tool was developed to provide user-friendly access to all the data and data connections. The graphical MethyView shows DNA methylation in the context of genomic and genetic data, helping cancer researchers understand the genetic and epigenetic mechanisms that drive dramatic changes in gene expression in tumor cells.
PMCID: PMC2238864  PMID: 17890243
10.  BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics 
Nucleic Acids Research  2004;32(Database issue):D377-D382.
Rice is a major food staple for the world’s population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long devoted itself to sequencing, information analysis and biological research on rice and other crop genomes. In order to facilitate the application of rice genomic information and to provide a foundation for functional and evolutionary studies of other important cereal crops, we implemented our Rice Information System (BGI-RIS), the most up-to-date integrated information resource as well as a workbench for comparative genomic analysis. In addition to comprehensive data from Oryza sativa L. ssp. indica sequenced by BGI, BGI-RIS also hosts carefully curated genome information from Oryza sativa L. ssp. japonica and EST sequences available from other cereal crops. In this resource, sequence contigs of indica (93-11) have been further assembled into Mbp-sized scaffolds and anchored onto the rice chromosomes with reference to physical/genetic markers, cDNAs and BAC-end sequences. We have annotated the rice genomes for gene content, repetitive elements, gene duplications (tandem and segmental) and single nucleotide polymorphisms between rice subspecies. Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies.
PMCID: PMC308819  PMID: 14681438
11.  The Genomes of Oryza sativa: A History of Duplications 
Yu, Jun | Wang, Jun | Lin, Wei | Li, Songgang | Li, Heng | Zhou, Jun | Ni, Peixiang | Dong, Wei | Hu, Songnian | Zeng, Changqing | Zhang, Jianguo | Zhang, Yong | Li, Ruiqiang | Xu, Zuyuan | Li, Shengting | Li, Xianran | Zheng, Hongkun | Cong, Lijuan | Lin, Liang | Yin, Jianning | Geng, Jianing | Li, Guangyuan | Shi, Jianping | Liu, Juan | Lv, Hong | Li, Jun | Wang, Jing | Deng, Yajun | Ran, Longhua | Shi, Xiaoli | Wang, Xiyin | Wu, Qingfa | Li, Changfeng | Ren, Xiaoyu | Wang, Jingqiang | Wang, Xiaoling | Li, Dawei | Liu, Dongyuan | Zhang, Xiaowei | Ji, Zhendong | Zhao, Wenming | Sun, Yongqiao | Zhang, Zhenpeng | Bao, Jingyue | Han, Yujun | Dong, Lingli | Ji, Jia | Chen, Peng | Wu, Shuming | Liu, Jinsong | Xiao, Ying | Bu, Dongbo | Tan, Jianlong | Yang, Li | Ye, Chen | Zhang, Jingfen | Xu, Jingyi | Zhou, Yan | Yu, Yingpu | Zhang, Bing | Zhuang, Shulin | Wei, Haibin | Liu, Bin | Lei, Meng | Yu, Hong | Li, Yuanzhe | Xu, Hao | Wei, Shulin | He, Ximiao | Fang, Lijun | Zhang, Zengjin | Zhang, Yunze | Huang, Xiangang | Su, Zhixi | Tong, Wei | Li, Jinhong | Tong, Zongzhong | Li, Shuangli | Ye, Jia | Wang, Lishun | Fang, Lin | Lei, Tingting | Chen, Chen | Chen, Huan | Xu, Zhao | Li, Haihong | Huang, Haiyan | Zhang, Feng | Xu, Huayong | Li, Na | Zhao, Caifeng | Li, Shuting | Dong, Lijun | Huang, Yanqing | Li, Long | Xi, Yan | Qi, Qiuhui | Li, Wenjie | Zhang, Bo | Hu, Wei | Zhang, Yanling | Tian, Xiangjun | Jiao, Yongzhi | Liang, Xiaohu | Jin, Jiao | Gao, Lei | Zheng, Weimou | Hao, Bailin | Liu, Siqi | Wang, Wen | Yuan, Longping | Cao, Mengliang | McDermott, Jason | Samudrala, Ram | Wang, Jian | Wong, Gane Ka-Shu | Yang, Huanming
PLoS Biology  2005;3(2):e38.
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes
PMCID: PMC546038  PMID: 15685292