Background
Genome-wide comparisons of transcription factor binding sites in different species can be used to evaluate evolutionary constraints that shape gene regulatory circuits and to understand how the interaction between transcription factors shapes their binding landscapes over evolution.
Results
We have compared the PPARG binding landscapes in macrophages to investigate the evolutionary impact on PPARG binding diversity in mouse and humans for this important nuclear receptor. Of note, only 5% of the PPARG binding sites were shared between the two species. In contrast, at the gene level, PPARG target genes conserved between both species constitute more than 30% of the target genes regulated by PPARG ligand in human macrophages. Moreover, the majority of all PPARG binding sites (55–60%) in macrophages show co-occupancy of the lineage-specification factor PU.1 in both species. Exploring the evolutionary dynamics of PPARG binding sites, we observed that PU.1 co-binding to PPARG sites appears to be important for possible PPARG ancestral functions such as lipid metabolism. Thus we speculate that PU.1 may have guided utilization of these species-specific PPARG conserved binding sites in macrophages during evolution.
Conclusions
We propose a model in which PU.1 sites may have served as “anchor” loci for the formation of new and functionally relevant PPARG binding sites throughout evolution. As PU.1 is an essential factor in macrophage biology, such an evolutionary mechanism would allow for the establishment of relevant PPARG regulatory modules in a PU.1-dependent manner and yet permit nuanced regulatory changes in individual species.
doi:10.1371/journal.pone.0048102
PMCID: PMC3485280
PMID: 23118933
Yao, Fei | Ariyaratne, Pramila N. | Hillmer, Axel M. | Lee, Wah Heng | Li, Guoliang | Teo, Audrey S. M. | Woo, Xing Yi | Zhang, Zhenshui | Chen, Jieqi P. | Poh, Wan Ting | Zawack, Kelson F. B. | Chan, Chee Seng | Leong, See Ting | Neo, Say Chuan | Choi, Poh Sum D. | Gao, Song | Nagarajan, Niranjan | Thoreau, Hervé | Shahab, Atif | Ruan, Xiaoan | Cacheux-Rataboul, Valère | Wei, Chia-Lin | Bourque, Guillaume | Sung, Wing-Kin | Liu, Edison T. | Ruan, Yijun | Aerts, Jan
Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10–20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
doi:10.1371/journal.pone.0046152
PMCID: PMC3461012
PMID: 23029419
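The physical-coverage argument in the DNA-PET abstract above can be illustrated with a short calculation. This is a minimal sketch, not the authors' pipeline: it assumes physical coverage is simply the number of mapped pairs times the insert span divided by the genome size. The genome size (3 Gb) and the number of pairs (30 million) are hypothetical round figures chosen for illustration; only the 1 kb and 10 kb insert sizes come from the abstract.

```python
def physical_coverage(num_pairs: int, insert_size_bp: int, genome_size_bp: int) -> float:
    """Average number of inserts spanning any given genomic position."""
    return num_pairs * insert_size_bp / genome_size_bp


# Assumptions for illustration only (not taken from the paper).
GENOME_SIZE_BP = 3_000_000_000  # rough human genome size
NUM_PAIRS = 30_000_000          # same sequencing effort for both libraries

# Insert sizes quoted in the abstract: 1 kb (short) and 10 kb (lower end of large).
short_cov = physical_coverage(NUM_PAIRS, 1_000, GENOME_SIZE_BP)
large_cov = physical_coverage(NUM_PAIRS, 10_000, GENOME_SIZE_BP)

print(f"short insert: {short_cov:.1f}x, large insert: {large_cov:.1f}x")
```

Under these assumptions the same number of sequenced pairs yields ten times the physical coverage with 10 kb inserts, which is the sense in which large inserts span more of the genome, including repeats, for the same sequencing cost.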
Handoko, Lusy | Xu, Han | Li, Guoliang | Ngan, Chew Yee | Chew, Elaine | Schnapp, Marie | Lee, Charlie Wah Heng | Ye, Chaopeng | Ping, Joanne Lim Hui | Mulawadi, Fabianus | Wong, Eleanor | Sheng, Jianpeng | Zhang, Yubo | Poh, Thompson | Chan, Chee Seng | Kunarso, Galih | Shahab, Atif | Bourque, Guillaume | Cacheux-Rataboul, Valere | Sung, Wing-Kin | Ruan, Yijun | Wei, Chia-Lin
Mammalian genomes are viewed as functional organizations that orchestrate spatial and temporal gene regulation. CTCF, the most characterized insulator-binding protein, has been implicated as a key genome organizer. Yet, little is known about CTCF-associated higher order chromatin structures at a global scale. Here, we applied Chromatin Interaction Analysis by Paired-End-Tag sequencing to elucidate the CTCF-chromatin interactome in pluripotent cells. From this analysis, 1,480 cis and 336 trans interacting loci were identified with high reproducibility and precision. Associating these chromatin interaction loci with their underlying epigenetic states, promoter activities, enhancer binding and nuclear lamina occupancy, we uncovered five distinct chromatin domains that suggest potential new models of CTCF function in chromatin organization and transcriptional control. Specifically, CTCF interactions demarcate chromatin-nuclear membrane attachments and influence proper gene expression through extensive crosstalk between promoters and regulatory elements. This highly complex nuclear organization offers insights towards the unifying principles governing genome plasticity and function.
doi:10.1038/ng.857
PMCID: PMC3436933
PMID: 21685913
insulator; enhancer; chromatin organization; epigenetic regulation; nuclear lamina
Khuong-Quang, Dong-Anh | Buczkowicz, Pawel | Rakopoulos, Patricia | Liu, Xiao-Yang | Fontebasso, Adam M. | Bouffet, Eric | Bartels, Ute | Albrecht, Steffen | Schwartzentruber, Jeremy | Letourneau, Louis | Bourgey, Mathieu | Bourque, Guillaume | Montpetit, Alexandre | Bourret, Genevieve | Lepage, Pierre | Fleming, Adam | Lichter, Peter | Kool, Marcel | von Deimling, Andreas | Sturm, Dominik | Korshunov, Andrey | Faury, Damien | Jones, David T. | Majewski, Jacek | Pfister, Stefan M. | Jabado, Nada | Hawkins, Cynthia
Pediatric glioblastomas (GBM) including diffuse intrinsic pontine gliomas (DIPG) are devastating brain tumors with no effective therapy. Here, we investigated clinical and biological impacts of histone H3.3 mutations. Forty-two DIPGs were tested for H3.3 mutations. Wild-type versus mutated (K27M-H3.3) subgroups were compared for HIST1H3B, IDH, ATRX and TP53 mutations, copy number alterations and clinical outcome. K27M-H3.3 occurred in 71%, TP53 mutations in 77% and ATRX mutations in 9% of DIPGs. ATRX mutations were more frequent in older children (p < 0.0001). No G34V/R-H3.3, IDH1/2 or H3.1 mutations were identified. K27M-H3.3 DIPGs showed specific copy number changes, including all gains/amplifications of PDGFRA and MYC/PVT1 loci. Notably, all long-term survivors were H3.3 wild type and this group of patients had better overall survival. K27M-H3.3 mutation defines clinically and biologically distinct subgroups and is prevalent in DIPG, which will impact future therapeutic trial design. K27M- and G34V-H3.3 have location-based incidence (brainstem/cortex) and potentially play distinct roles in pediatric GBM pathogenesis. K27M-H3.3 is universally associated with short survival in DIPG, while patients wild-type for H3.3 show improved survival. Based on prognostic and therapeutic implications, our findings argue for H3.3-mutation testing at diagnosis, which should be rapidly integrated into the clinical decision-making algorithm, particularly in atypical DIPG.
Electronic supplementary material
The online version of this article (doi:10.1007/s00401-012-0998-0) contains supplementary material, which is available to authorized users.
doi:10.1007/s00401-012-0998-0
PMCID: PMC3422615
PMID: 22661320
DIPG; H3.3; ATRX; TP53; Survival; Targeted therapy
Background
Identifying DNA sequences (enhancers) that direct the precise spatial and temporal expression of developmental control genes remains a significant challenge in the annotation of vertebrate genomes. Locating these sequences, which in many cases lie at a great distance from the transcription start site, has been a major obstacle in deciphering gene regulation. Coupling comparative genomics with functional validation has been a successful method for locating many such regulatory elements. However, most of these studies looked either at a single gene or at the whole genome without focusing on any particular process. The pressing need is to integrate the tools of comparative genomics with knowledge of developmental biology to validate enhancers for developmental transcription factors in greater detail.
Results
Our results show that near four different genes (nkx3.2, pax9, otx1b and foxa2) in zebrafish, only 20–30% of highly conserved DNA sequences can act as developmental enhancers, irrespective of the tissue in which the gene is expressed. We find that some genes also have multiple conserved enhancers active in the same tissue at the same or different time points in development. We also located non-conserved enhancers for two of the genes (pax9 and otx1b). Our modified bacterial artificial chromosome (BAC) studies for these four genes revealed that many of these enhancers work in a synergistic fashion, which cannot be captured by individual DNA constructs, and are not conserved at the sequence level. Our detailed biochemical and transgenic analysis revealed that Foxa1 binds to the otx1b non-conserved enhancer to direct its activity in the forebrain and otic vesicle of zebrafish at 24 hpf.
Conclusion
Our results clearly indicate that a high level of functional conservation of genes is not necessarily associated with sequence conservation of their regulatory elements. Moreover, certain non-conserved DNA elements might have a role in gene regulation. Multiple approaches need to be brought to bear upon individual genes to decipher all of their regulatory elements.
doi:10.1186/1471-213X-11-63
PMCID: PMC3210094
PMID: 22011226
The formation of new transcription factor–binding sites (TFBSs) has a major impact on the evolution of gene regulatory networks. Clearly, single nucleotide mutations arising within genomic DNA can lead to the creation of TFBSs. Are molecular processes inducing single nucleotide mutations contributing equally to the creation of TFBSs? In the human genome, spontaneous deamination of methylated cytosine in the context of CpG dinucleotides results in the creation of thymine (C → T), and this mutation has the highest rate among all base substitutions. CpG deamination has been ascribed a role in silencing of transposons and induction of variation in regional methylation. We have previously shown that CpG deamination created thousands of p53-binding sites within genomic sequences of Alu transposons. Interestingly, we have defined a ∼30 bp region in the Alu sequence, which, depending on the pattern of CpG deamination, can be converted to functional p53-, PAX-6-, and Myc-binding sites. Here, we have studied single nucleotide mutational events leading to creation of TFBSs in promoters of human genes and in genomic regions bound by such key transcription factors as Oct4, NANOG, and c-Myc. We document that CpG deamination events can create TFBSs with much higher efficiency than other types of mutational events. Our findings add a new role to CpG methylation: We propose that deamination of methylated CpGs constitutes one of the evolutionary forces acting on mutational trajectories of TFBS formation, contributing to variability in gene regulation.
doi:10.1093/gbe/evr107
PMCID: PMC3228489
PMID: 22016335
CpG methylation; CpG deamination; evolution of transcription factor–binding sites; evolution of gene regulatory elements; Alu transposon
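The idea that a single CpG deamination event can turn a non-binding sequence into a candidate binding site can be sketched in a few lines. This is an illustrative toy, not the analysis used in the paper: it enumerates the two products of deaminating a methylated CpG (TpG when the forward-strand cytosine is hit, CpA when the reverse-strand cytosine is hit) and checks each variant against a motif pattern. The pattern used here, the commonly cited p53 half-site consensus RRRCWWGYYY written as a regular expression, and the 10 bp input sequence are assumptions chosen only for the example.

```python
import re
from typing import List

# Assumed toy pattern: p53 half-site consensus RRRCWWGYYY (R = A/G, W = A/T, Y = C/T).
P53_HALF_SITE = re.compile(r"[AG]{3}C[AT]{2}G[CT]{3}")


def cpg_deamination_variants(seq: str) -> List[str]:
    """Return every sequence reachable by one CpG deamination event.

    Each CpG yields two variants: CG -> TG (forward-strand C deaminated)
    and CG -> CA (reverse-strand C deaminated, read on the forward strand).
    """
    seq = seq.upper()
    variants = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            variants.append(seq[:i] + "TG" + seq[i + 2:])
            variants.append(seq[:i] + "CA" + seq[i + 2:])
    return variants


def sites_gained(seq: str, pattern: "re.Pattern[str]") -> List[str]:
    """Variants that match the pattern when the original sequence does not."""
    if pattern.search(seq.upper()):
        return []
    return [v for v in cpg_deamination_variants(seq) if pattern.search(v)]


# Hypothetical input: one CpG, no match to the pattern before deamination.
example = "GGGCACGTCC"
variants = cpg_deamination_variants(example)
gained = sites_gained(example, P53_HALF_SITE)
print(variants, gained)
```

A real analysis would use position weight matrices and both strands rather than a single regular expression; the sketch only shows the mutational step itself.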
Lam, Siew Hong | Lee, Serene GP | Lin, Chin Y | Thomsen, Jane S | Fu, Pan Y | Murthy, Karuturi RK | Li, Haixia | Govindarajan, Kunde R | Nick, Lin CH | Bourque, Guillaume | Gong, Zhiyuan | Lufkin, Thomas | Liu, Edison T | Mathavan, Sinnakaruppan
Background
The zebrafish is recognized as a versatile cancer and drug screening model. However, it is not known whether the estrogen-responsive genes and signaling pathways that are involved in estrogen-dependent carcinogenesis and human cancer are operating in zebrafish. In order to determine the potential of zebrafish model for estrogen-related cancer research, we investigated the molecular conservation of estrogen responses operating in both zebrafish and human cancer cell lines.
Methods
A microarray experiment was performed on zebrafish exposed to estrogen (17β-estradiol; a classified carcinogen) and an anti-estrogen (ICI 182,780). Zebrafish estrogen-responsive genes sensitive to both estrogen and anti-estrogen were identified and validated using real-time PCR. Human homolog mapping and knowledge-based data mining were performed on zebrafish estrogen-responsive genes, followed by estrogen receptor binding site analysis and comparative transcriptome analysis with estrogen-responsive human cancer cell lines (MCF7, T47D and Ishikawa).
Results
Our transcriptome analysis captured multiple estrogen-responsive genes and signaling pathways that increased cell proliferation, promoted DNA damage and genome instability, and decreased tumor suppressing effects, suggesting a common mechanism for estrogen-induced carcinogenesis. Comparative analysis revealed a core set of conserved estrogen-responsive genes that demonstrate enrichment of estrogen receptor binding sites and cell cycle signaling pathways. Knowledge-based and network analysis led us to propose that the mechanism involving estrogen-activated estrogen receptor mediated down-regulation of human homolog HES1, followed by up-regulation of cell cycle-related genes (human homologs E2F4, CDK2, CCNA, CCNB, CCNE), is highly conserved, and this mechanism may involve novel crosstalk with basal AHR. We also identified mitotic roles of polo-like kinase as a conserved signaling pathway with multiple entry points for estrogen regulation.
Conclusion
The findings demonstrate the use of zebrafish for characterizing estrogen-like environmental carcinogens and anti-estrogen drug screening. From an evolutionary perspective, our findings suggest that estrogen regulation of cell cycle is perhaps one of the earliest forms of steroidal-receptor controlled cellular processes. Our study provides the first evidence of molecular conservation of estrogen-responsiveness between zebrafish and human cancer cell lines, hence demonstrating the potential of zebrafish for estrogen-related cancer research.
doi:10.1186/1755-8794-4-41
PMCID: PMC3114699
PMID: 21575170
zebrafish; microarray; estrogen; anti-estrogen ICI 182,780; estrogen-responsive genes; signaling pathways; carcinogenesis; human cancer cell lines; molecular conservation; model organism
Introduction
Octamer-binding transcription factor 4 (Oct4) is a master regulator of early mammalian development. Its expression begins from the oocyte stage, becomes restricted to the inner cell mass of the blastocyst and eventually remains only in primordial germ cells. Unearthing the interactions of Oct4 would provide insight into how this transcription factor is central to cell fate and stem cell pluripotency.
Methods
In the present study, affinity-tagged endogenous Oct4 cell lines were established via homologous recombination gene targeting in embryonic stem (ES) cells to express tagged Oct4. This allows tagged Oct4 to be expressed without altering the total Oct4 levels from their physiological levels.
Results
Modified ES cells remained pluripotent. However, when modified ES cells were tested for their functionality, cells with a large tag failed to produce viable homozygous mice. Use of a smaller tag resulted in mice with normal development, viability and fertility. This indicated that the choice of tags can affect the performance of Oct4. Also, different tags produce a different repertoire of Oct4 interactors.
Conclusions
Using a total of four different tags, we found 33 potential Oct4 interactors, of which 30 are novel. In addition to transcriptional regulation, the molecular function associated with these Oct4-associated proteins includes various other catalytic activities, suggesting that, aside from chromosome remodeling and transcriptional regulation, Oct4 function extends more widely to other essential cellular mechanisms. Our findings show that multiple purification approaches are needed to uncover a comprehensive Oct4 protein interaction network.
doi:10.1186/scrt67
PMCID: PMC3218817
PMID: 21569470
Chromatin immunoprecipitation combined with massively parallel sequencing methods (ChIP-seq) is becoming the standard approach to study interactions of transcription factors (TF) with genomic sequences. Using public STAT1 ChIP-seq data sets as an example, we present novel approaches for the interpretation of ChIP-seq data.
We compare recently developed approaches to determine STAT1 binding sites from ChIP-seq data. Assessing the content of the established consensus sequence for STAT1 binding sites, we find that the usage of “negative control” ChIP-seq data fails to provide substantial advantages. We derive a single refined probabilistic model of STAT1 binding sequences from these ChIP-seq data. Contrary to previous claims, we find no evidence that STAT1 binds to multiple distinct motifs upon interferon-gamma stimulation in vivo. While a large majority of genomic sites with high ChIP-seq signal is associated with a nucleotide sequence resembling a STAT1 binding site, only a very small subset of the over 5 million potential STAT1 binding sites in the human genome is covered by ChIP-seq data. Furthermore, a surprisingly large fraction of the ChIP-seq signal (5%) is absorbed by a small family of repetitive sequences (MER41).
The observation of the binding of activated STAT1 protein to a specific repetitive element bolsters similar reports concerning p53 and other TFs, and strengthens the notion of an involvement of repeats in gene regulation. Incidentally, MER41 elements are specific to primates; consequently, regulatory mechanisms in the IFN-STAT pathway might fundamentally differ between primates and rodents.
From a methodological standpoint, the presence of large numbers of nearly identical binding sites in repetitive sequences may lead to wrong conclusions about the intrinsic binding preferences of TFs, as illustrated by the spacing analysis of STAT1 tandem motifs. Therefore, ChIP-seq data should be analyzed independently within repetitive and non-repetitive sequences.
doi:10.1371/journal.pone.0011425
PMCID: PMC2897888
PMID: 20625510
Fullwood, Melissa J. | Liu, Mei Hui | Pan, You Fu | Liu, Jun | Han, Xu | Mohamed, Yusoff Bin | Orlov, Yuriy L. | Velkov, Stoyan | Ho, Andrea | Mei, Poh Huay | Chew, Elaine G. Y. | Huang, Phillips Yao Hui | Welboren, Willem-Jan | Han, Yuyuan | Ooi, Hong-Sain | Ariyaratne, Pramila N. | Vega, Vinsensius B. | Luo, Yanquan | Tan, Peck Yean | Choy, Pei Ye | Wansa, K. D. Senali Abayratna | Zhao, Bing | Lim, Kar Sian | Leow, Shi Chi | Yow, Jit Sin | Joseph, Roy | Li, Haixia | Desai, Kartiki V. | Thomsen, Jane S. | Lee, Yew Kok | Karuturi, R. Krishna Murthy | Herve, Thoreau | Bourque, Guillaume | Stunnenberg, Hendrik G. | Ruan, Xiaoan | Cacheux-Rataboul, Valere | Sung, Wing-Kin | Liu, Edison T. | Wei, Chia-Lin | Cheung, Edwin | Ruan, Yijun
Genomes are organized into high-level 3-dimensional structures, and DNA elements separated by long genomic distances could functionally interact. Many transcription factors bind to regulatory DNA elements distant from gene promoters. While distal binding sites have been shown to regulate transcription by long-range chromatin interactions at a few loci, chromatin interactions and their impact on transcription regulation have not been investigated in a genome-wide manner. Therefore, we developed Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) for de novo detection of global chromatin interactions, and comprehensively mapped the chromatin interaction network bound by oestrogen receptor α (ERα) in the human genome. We found that most high-confidence remote ERα binding sites are anchored at gene promoters through long-range chromatin interactions, suggesting that ERα functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation. We propose that chromatin interactions constitute a primary mechanism for regulating transcription in mammalian genomes.
doi:10.1038/nature08497
PMCID: PMC2774924
PMID: 19890323
Our group produced the best predictions overall in the DREAM3 signaling response challenge, being tops by a substantial margin in the cytokine sub-challenge and nearly tied for best in the phosphoprotein sub-challenge. We achieved this success using a simple interpolation strategy. For each combination of a stimulus and inhibitor for which predictions were required, we had noted there were six other datasets using the same stimulus (but different inhibitor treatments) and six other datasets using the same inhibitor (but different stimuli). Therefore, for each treatment combination for which values were to be predicted, we calculated rank correlations for the data that were in common between the treatment combination and each of the 12 related combinations. The data from the 12 related combinations were then used to calculate missing values, weighting the contributions from each experiment based on the rank correlation coefficients. The success of this simple method suggests that the missing data were largely over-determined by similarities in the treatments. We offer some thoughts on the current state and future development of DREAM that are based on our success in this challenge, our success in the earlier DREAM2 transcription factor target challenge, and our experience as the data provider for the gene expression challenge in DREAM3.
doi:10.1371/journal.pone.0008417
PMCID: PMC2811179
PMID: 20126276
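The interpolation strategy described in the DREAM3 abstract above can be sketched as follows. This is a minimal reconstruction from the prose, not the authors' code: the abstract does not say how the weights were normalised or how negative correlations were handled, so clipping negative Spearman coefficients to zero and using a weighted mean are assumptions made here. All function names and the example numbers are hypothetical.

```python
from typing import List, Optional, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # no usable ordering information
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def predict_missing(target: List[Optional[float]],
                    related: List[List[float]]) -> List[float]:
    """Fill None entries of `target` from complete `related` profiles.

    Each related profile is weighted by its rank correlation with the
    target over the measurements they share (assumption: negative
    correlations get weight zero).
    """
    shared = [i for i, v in enumerate(target) if v is not None]
    weights = []
    for profile in related:
        rho = spearman([target[i] for i in shared], [profile[i] for i in shared])
        weights.append(max(rho, 0.0))
    total = sum(weights)
    filled = list(target)
    for i, value in enumerate(target):
        if value is None:
            if total > 0:
                filled[i] = sum(w * p[i] for w, p in zip(weights, related)) / total
            else:  # fall back to an unweighted mean
                filled[i] = sum(p[i] for p in related) / len(related)
    return filled


# Hypothetical toy data: one concordant and one discordant related profile.
prediction = predict_missing([1.0, 2.0, 3.0, None],
                             [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 10.0]])
print(prediction)
```

In the challenge setting described in the abstract, `related` would hold the 12 datasets sharing either the stimulus or the inhibitor with the combination being predicted.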
Background
A key goal of systems biology is to understand how genomewide mRNA expression levels are controlled by transcription factors (TFs) in a condition-specific fashion. TF activity is frequently modulated at the post-translational level through ligand binding, covalent modification, or changes in sub-cellular localization. In this paper, we demonstrate how prior information about regulatory network connectivity can be exploited to infer condition-specific TF activity as a hidden variable from the genomewide mRNA expression pattern in the yeast Saccharomyces cerevisiae.
Methodology/Principal Findings
We first validate experimentally that by scoring differential expression at the level of gene sets or “regulons” comprised of the putative targets of a TF, we can accurately predict modulation of TF activity at the post-translational level. Next, we create an interactive database of inferred activities for a large number of TFs across a large number of experimental conditions in S. cerevisiae. This allows us to perform TF-centric analysis of the yeast regulatory network.
Conclusions/Significance
We analyze the degree to which the mRNA expression level of each TF is predictive of its regulatory activity. We also organize TFs into “co-modulation networks” based on their inferred activity profile across conditions, and find that this reveals functional and mechanistic relationships. Finally, we present evidence that the PAC and rRPE motifs antagonize TBP-dependent regulation, and function as core promoter elements governed by the transcription regulator NC2. Regulon-based monitoring of TF activity modulation is a powerful tool for analyzing regulatory network function that should be applicable in other organisms. Tools and results are available online at http://bussemakerlab.org/RegulonProfiler/.
doi:10.1371/journal.pone.0003112
PMCID: PMC2518834
PMID: 18769540
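Scoring differential expression at the level of a regulon, as described in the abstract above, can be illustrated with a simple two-sample statistic. The abstract does not specify the scoring function, so the Welch-style t-statistic below (regulon genes versus all other genes) is an assumption standing in for whatever the authors used; gene names and log-ratios are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, Set


def regulon_score(log_ratios: Dict[str, float], regulon: Set[str]) -> float:
    """Welch t-statistic comparing regulon genes with the remaining genes.

    A large positive score suggests the regulon is induced as a set, a
    large negative score that it is repressed.
    """
    inside = [v for gene, v in log_ratios.items() if gene in regulon]
    outside = [v for gene, v in log_ratios.items() if gene not in regulon]
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("need at least two genes inside and outside the regulon")
    std_error = (variance(inside) / len(inside)
                 + variance(outside) / len(outside)) ** 0.5
    if std_error == 0:
        raise ValueError("no variation in expression values")
    return (mean(inside) - mean(outside)) / std_error


# Hypothetical log2 expression ratios for one condition.
expression = {"g1": 2.0, "g2": 2.2, "g3": 1.8, "g4": 0.1, "g5": -0.1, "g6": 0.0}
induced = regulon_score(expression, {"g1", "g2", "g3"})
repressed = regulon_score({g: -v for g, v in expression.items()}, {"g1", "g2", "g3"})
print(round(induced, 2), round(repressed, 2))
```

Computing such a score for every TF regulon in every condition gives an activity matrix of the kind the abstract describes, from which co-modulation between TFs could then be assessed.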
Dinov, Ivo D. | Rubin, Daniel | Lorensen, William | Dugan, Jonathan | Ma, Jeff | Murphy, Shawn | Kirschner, Beth | Bug, William | Sherman, Michael | Floratos, Aris | Kennedy, David | Jagadish, H. V. | Schmidt, Jeanette | Athey, Brian | Califano, Andrea | Musen, Mark | Altman, Russ | Kikinis, Ron | Kohane, Isaac | Delp, Scott | Parker, D. Stott | Toga, Arthur W. | Bourque, Guillaume
The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources: data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management.
We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.
doi:10.1371/journal.pone.0002265
PMCID: PMC2386255
PMID: 18509477
Networks of regulatory relations between transcription factors (TF) and their target genes (TG), implemented through TF binding sites (TFBS), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. Particularly, for CRM-modelling we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM-prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary, we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
doi:10.1371/journal.pone.0001115
PMCID: PMC2047340
PMID: 17973026
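For readers unfamiliar with position weight matrices, the starting point named in the abstract above, the following sketch builds a log-odds PWM from aligned sites and scans a sequence for the best-scoring window. It is a generic textbook construction under stated assumptions (uniform 0.25 background, pseudocount of 1), not one of the benchmarked tools; the aligned sites and the scanned sequence are hypothetical.

```python
import math
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]


def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> PWM:
    """Log2-odds matrix from equal-length aligned sites (A, C, G, T only)."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = {}
        for base in "ACGT":
            count = sum(1 for site in sites if site[pos] == base)
            freq = (count + pseudocount) / (len(sites) + 4 * pseudocount)
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def best_hit(pwm: PWM, seq: str) -> Tuple[int, float]:
    """Start position and score of the highest-scoring window in `seq`."""
    width = len(pwm)
    scored = [
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned binding sites and target sequence.
known_sites = ["TAAT", "TAAT", "TAAT", "TAAC"]
pwm = build_pwm(known_sites)
start, score = best_hit(pwm, "GGGTAATGGG")
print(start, round(score, 3))
```

In this framing, a phyloPWM as described in the abstract would correspond to calling `build_pwm` on the known sites plus their orthologous sites from other genomes, though how the authors weight orthologous sites is not stated here.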
Background
Promoter-associated CpG islands (PCIs) mediate methylation-dependent gene silencing, yet tend to co-locate to transcriptionally active genes. To address this paradox, we used data mining to assess the behavior of PCI-positive (PCI+) genes in the human genome.
Results
PCI+ genes exhibit a bimodal distribution: (1) a ‘housekeeping-like’ subset characterized by higher GC content and lower intron length/number, and (2) a ‘pseudogene paralog’ subset characterized by lower GC content and higher intron length/number (p<0.001). These subsets are functionally distinguishable, with the former gene group characterized by higher expression levels and lower evolutionary rate (p<0.001). PCI-negative (PCI-) genes exhibit higher evolutionary rate and narrower expression breadth than PCI+ genes (p<0.001), consistent with more frequent tissue-specific inactivation.
Conclusions
Adaptive evolution of the human genome appears driven in part by declining transcription of a subset of PCI+ genes, predisposing to both CpG→TpA mutation and intron insertion. We propose a model of evolving biological complexity in which environmentally-selected gains or losses of PCI methylation respectively favor positive or negative selection, thus polarizing PCI+ gene structures around a genomic core of ancestral PCI- genes.
doi:10.1371/journal.pone.0000603
PMCID: PMC1904255
PMID: 17622348
Lin, Chin-Yo | Vega, Vinsensius B | Thomsen, Jane S | Zhang, Tao | Kong, Say Li | Xie, Min | Chiu, Kuo Ping | Lipovich, Leonard | Barnett, Daniel H | Stossi, Fabio | Yeo, Ailing | George, Joshy | Kuznetsov, Vladimir A | Lee, Yew Kok | Charn, Tze Howe | Palanisamy, Nallasivam | Miller, Lance D | Cheung, Edwin | Katzenellenbogen, Benita S | Ruan, Yijun | Bourque, Guillaume | Wei, Chia-Lin | Liu, Edison T | Kim, Stuart K
Using a chromatin immunoprecipitation-paired end diTag cloning and sequencing strategy, we mapped estrogen receptor α (ERα) binding sites in MCF-7 breast cancer cells. We identified 1,234 high confidence binding clusters of which 94% are projected to be bona fide ERα binding regions. Only 5% of the mapped estrogen receptor binding sites are located within 5 kb upstream of the transcriptional start sites of adjacent genes, regions containing the proximal promoters, whereas the vast majority of the sites are mapped to intronic or distal locations (>5 kb from 5′ and 3′ ends of adjacent transcript), suggesting transcriptional regulatory mechanisms over significant physical distances. Of all the identified sites, 71% harbored putative full estrogen response elements (EREs), 25% bore ERE half sites, and only 4% had no recognizable ERE sequences. Genes in the vicinity of ERα binding sites were enriched for regulation by estradiol in MCF-7 cells, and their expression profiles in patient samples segregate ERα-positive from ERα-negative breast tumors. The expression dynamics of the genes adjacent to ERα binding sites suggest a direct induction of gene expression through binding to ERE-like sequences, whereas transcriptional repression by ERα appears to be through indirect mechanisms. Our analysis also indicates a number of candidate transcription factor binding sites adjacent to occupied EREs at frequencies much greater than by chance, including the previously reported FOXA1 sites, and demonstrates the potential involvement of one such putative adjacent factor, Sp1, in the global regulation of ERα target genes. Unexpectedly, we found that only 22%–24% of the bona fide human ERα binding sites were overlapping conserved regions in whole genome vertebrate alignments, which suggests limited conservation of functional binding sites. Taken together, this genome-scale analysis suggests complex but definable rules governing ERα binding and gene regulation.
Author Summary
Estrogen receptors (ERs) play key roles in facilitating the transcriptional effects of hormone functions in target tissues. To obtain a genome-wide view of ERα binding sites, we applied chromatin immunoprecipitation coupled with a cloning and sequencing strategy using chromatin immunoprecipitation pair end-tagging technology to map ERα binding sites in MCF-7 human breast cancer cells. We identified 1,234 high quality ERα binding sites in the human genome and demonstrated that the binding sites are frequently adjacent to genes significantly associated with breast cancer disease status and outcome. The mapping results also revealed that ERα can influence gene expression across distances of up to 100 kilobases or more, that genes that are induced or repressed utilize sites in different regions relative to the transcript (suggesting different mechanisms of action), and that ERα binding sites are only modestly conserved in evolution. Using computational approaches, we identified potential interactions with other transcription factor binding sites adjacent to the ERα binding elements. Taken together, these findings suggest complex but definable rules governing ERα binding and gene regulation and provide a valuable dataset for mapping the precise control nodes for one of the most important nuclear hormone receptors in breast cancer biology.
doi:10.1371/journal.pgen.0030087
PMCID: PMC1885282
PMID: 17542648
Vega, Vinsensius B | Lin, Chin-Yo | Lai, Koon Siew | Li Kong, Say | Xie, Min | Su, Xiaodi | Teh, Huey Fang | Thomsen, Jane S | Li Yeo, Ai | Sung, Wing Kin | Bourque, Guillaume | Liu, Edison T
Refinement of the functional human estrogen receptor binding site model using a multi-platform genome-wide approach reveals extended binding specificity signal.
Background
Transcription factor binding sites (TFBSs) impart specificity to cellular transcriptional responses and have largely been defined by consensus motifs derived from a handful of validated sites. The low specificity of computational TFBS predictions has been attributed to the ubiquity of the motifs and the relaxed sequence requirements for binding. We posited that the inadequacy is instead due to the limited input of empirically verified sites, and we demonstrate a multiplatform approach to constructing a robust model.
Results
Using the TFBS for the estrogen receptor (ER)α (estrogen response element [ERE]) as a model system, we extracted EREs from multiple molecular and genomic platforms whose binding to ERα has been experimentally confirmed or rejected. In silico analyses revealed significant sequence information flanking the standard binding consensus, discriminating ERE-like sequences that bind ERα from those that are nonbinders. We extended the ERE consensus by three bases, bearing a terminal G at the third position 3' and an initiator C at the third position 5', which were further validated using surface plasmon resonance spectroscopy. Our functional human ERE prediction algorithm (h-ERE) outperformed existing predictive algorithms and produced fewer than 5% false negatives upon experimental validation.
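The idea of an ERE consensus extended by three flanking bases on each side can be sketched in Python as follows. This is not the h-ERE algorithm: it uses exact matching against the canonical palindromic core GGTCAnnnTGACC, and it assumes that "the third position 5′" and "the third position 3′" mean the third base outside the core on each side. The names `CORE_ERE`, `EXTENDED_ERE`, and `find_eres` are hypothetical.

```python
import re

# Canonical palindromic ERE core with a 3-bp spacer.
CORE_ERE = re.compile(r"GGTCA[ACGT]{3}TGACC")
# Assumed reading of the extended consensus: C at the third base 5' of
# the core and G at the third base 3' of the core.
EXTENDED_ERE = re.compile(r"C[ACGT]{2}GGTCA[ACGT]{3}TGACC[ACGT]{2}G")


def find_eres(seq: str) -> list[tuple[int, str, bool]]:
    """Return (core start, core sequence, has extended flanks) per hit."""
    seq = seq.upper()
    hits = []
    for m in CORE_ERE.finditer(seq):
        lo, hi = m.start() - 3, m.end() + 3
        # The extended check needs three bases on each side of the core.
        extended = (
            lo >= 0
            and hi <= len(seq)
            and EXTENDED_ERE.fullmatch(seq[lo:hi]) is not None
        )
        hits.append((m.start(), m.group(), extended))
    return hits


if __name__ == "__main__":
    print(find_eres("AAACTTGGTCAAAATGACCTTGAAA"))
```

A realistic predictor would tolerate mismatches in the core and weight each position, which exact regular expressions cannot express.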
Conclusion
Building upon a larger experimentally validated ERE set, the h-ERE algorithm better demarcates the universe of ERE-like sequences that are potential ER binders. Only 14% of the predicted optimal binding sites were utilized under the experimental conditions employed, pointing to other selective criteria not related to EREs. Factors other than primary nucleotide sequence will ultimately determine binding site selection.
doi:10.1186/gb-2006-7-9-r82
PMCID: PMC1794554
PMID: 16961928
Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
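The notions of conserved segments and inversions can be made concrete with a small Python sketch that counts breakpoints between two signed marker orders. This is not the MGR algorithm; it is a much simpler pairwise illustration that assumes a single linear chromosome and the same marker set in both genomes. The function name `count_breakpoints` is hypothetical.

```python
def count_breakpoints(order_a: list[int], order_b: list[int]) -> int:
    """Count adjacencies in order_a that are not preserved in order_b.

    Markers are signed integers; the sign encodes orientation.
    """
    preserved = set()
    for x, y in zip(order_b, order_b[1:]):
        preserved.add((x, y))
        # The same adjacency read from the opposite strand.
        preserved.add((-y, -x))
    return sum((x, y) not in preserved for x, y in zip(order_a, order_a[1:]))


if __name__ == "__main__":
    ancestor = [1, 2, 3, 4, 5]
    # One inversion of the segment containing markers 2 and 3.
    descendant = [1, -3, -2, 4, 5]
    breaks = count_breakpoints(ancestor, descendant)
    print("breakpoints:", breaks)
    print("conserved segments:", breaks + 1)
```

Under these assumptions the number of conserved segments is the breakpoint count plus one, and a single inversion introduces at most two breakpoints.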
doi:10.1186/1479-7364-1-1-30
PMCID: PMC3525001
PMID: 15601531
genome evolution; synteny; mammals; ancestral genome