Infectious diseases are responsible for over 25% of deaths globally, but many more individuals are exposed to deadly pathogens. The outcome of infection results from a set of diverse factors including pathogen virulence factors, the environment, and the genetic make-up of the host. The completion of the human reference genome sequence in 2004, along with technological advances, has tremendously accelerated and renewed the tools used to study the genetic etiology of infectious diseases in humans and in their best-characterized mammalian model, the mouse. Advancements in mouse genomic resources have accelerated genome-wide functional approaches, such as gene-driven and phenotype-driven mutagenesis, bringing to the fore the use of mouse models that accurately reproduce many aspects of the pathogenesis of human infectious diseases. Treatment with the mutagen N-ethyl-N-nitrosourea (ENU) has become the most popular phenotype-driven approach. Our team and others have employed mouse ENU mutagenesis to identify host genes that directly impact susceptibility to pathogens of global significance. In this review, we first describe the strategies and tools used in mouse genetics to understand immunity to infection, with special emphasis on chemical mutagenesis of the mouse germ-line together with current strategies to efficiently identify functional mutations using next-generation sequencing. Then, we highlight illustrative examples of genes, proteins, and cellular signatures that have been revealed by ENU screens and have been shown to be involved in susceptibility or resistance to infectious diseases caused by parasites, bacteria, and viruses.
infectious diseases; ENU; immunity; mouse genetic models
Diffuse Intrinsic Pontine Glioma (DIPG) is a fatal brain cancer that arises in the brainstem of children, with no effective treatment and near 100% fatality. The failure of most therapies can be attributed to the delicate location of these tumors and to the choice of therapies based on the assumption that DIPGs are molecularly similar to adult disease. Recent studies have unraveled the unique genetic make-up of this brain cancer, with nearly 80% harboring a K27M-H3.3 or K27M-H3.1 mutation. However, DIPGs are still thought of as one disease, with limited understanding of the genetic drivers of these tumors. To understand what drives DIPGs, we integrated whole-genome sequencing with methylation, expression and copy-number profiling, discovering that DIPGs comprise three molecularly distinct subgroups (H3-K27M, Silent, MYCN) and uncovering a novel recurrent activating mutation in the activin receptor ACVR1 in 20% of DIPGs. Mutations in ACVR1 were constitutively activating, leading to SMAD phosphorylation and increased expression of the downstream activin signaling targets ID1 and ID2. Our results highlight distinct molecular subgroups and novel therapeutic targets for this incurable pediatric cancer.
PMID: 24705254 CAMSID: cams4215
DIPG; H3F3A; K27M-H3.3; ALT; ACVR1; MYCN; ID2; PDGFRA
Human embryonic stem cells (hESCs) harbour the ability to undergo lineage-specific differentiation into clinically relevant cell types. Transcription factors and epigenetic modifiers are known to play important roles in the maintenance of pluripotency of hESCs. However, little is known about regulation of pluripotency through splicing. In this study, we identify the spliceosome-associated factor SON as a factor essential for the maintenance of hESCs. Depletion of SON in hESCs results in the loss of pluripotency and cell death. Using genome-wide RNA profiling, we identified transcripts that are regulated by SON. Importantly, we confirmed that SON regulates the proper splicing of transcripts encoding for pluripotency regulators such as OCT4, PRDM14, E4F1 and MED24. Furthermore, we show that SON is bound to these transcripts in vivo. In summary, we connect a splicing-regulatory network for accurate transcript production to the maintenance of pluripotency and self-renewal of hESCs.
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily–transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV–derived sequences were activated in a cell type–specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type–specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
Nearly half of the human genome is composed of repetitive sequences, most of which were derived from transposable elements that have replicated in the genome during the evolution of our species. There is growing evidence that some of these transposon-derived sequences have been a source of new binding sites for various mammalian transcription factors. Considering that previous studies targeted only a few transcription factors and cell types, a key question that remains is to what extent transposable elements have contributed to human transcriptional networks. To systematically survey this contribution, we used datasets generated by the international Encyclopedia of DNA Elements (ENCODE) consortium, identifying the location of active regulatory elements in more than 40 distinct human cell types. Using this resource, we measured the contribution of all classes of repetitive sequences and systematically characterized the impact that transposable elements have had on the human chromatin landscape. Our results demonstrate that transposon-derived sequences have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
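The headline figure in the abstract above (the share of open chromatin regions that lie in TEs) is, at its core, an interval-overlap count. The following is only a minimal sketch of that kind of calculation, assuming that a region counts as "in a TE" when it overlaps an annotated element by at least one base; the study's actual overlap criterion is not stated here, and the function names and toy coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_in_elements(regions, elements):
    """Fraction of regions overlapping any annotated element.

    Assumption: a single overlapping base is enough to count a region.
    Uses a simple quadratic scan, which is fine for a toy example only.
    """
    if not regions:
        return 0.0
    hits = sum(1 for r in regions if any(overlaps(r, e) for e in elements))
    return hits / len(regions)


# Toy data (hypothetical coordinates, not from ENCODE)
open_regions = [("chr1", 100, 200), ("chr1", 500, 600),
                ("chr2", 100, 200), ("chr2", 900, 950)]
te_annotations = [("chr1", 150, 300), ("chr2", 940, 1000)]
print(fraction_in_elements(open_regions, te_annotations))
```

For genome-scale data a sorted-interval or interval-tree approach would replace the nested scan; the quadratic version is kept here only for readability.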
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non–TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known about the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements, through their capacity to move and spread in genomes in a lineage-specific fashion, as well as their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
Gastric cancer is the second highest cause of global cancer mortality. To explore the complete repertoire of somatic alterations in gastric cancer, we combined massively parallel short read and DNA paired-end tag sequencing to present the first whole-genome analysis of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability.
Integrative analysis and de novo assemblies revealed the architecture of a wild-type KRAS amplification, a common driver event in gastric cancer. We discovered three distinct mutational signatures in gastric cancer: against a genome-wide backdrop of oxidative and microsatellite instability-related mutational signatures, we identified the first exome-specific mutational signature. Further characterization of the impact of these signatures, by combining sequencing data from 40 complete gastric cancer exomes with targeted screening of an additional 94 independent gastric tumors, uncovered ACVR2A, RPL22 and LMAN1 as recurrently mutated genes in microsatellite instability-positive gastric cancer and PAPPA as a recurrently mutated gene in TP53 wild-type gastric cancer.
These results highlight how whole-genome cancer sequencing can uncover information relevant to tissue-specific carcinogenesis that would otherwise be missed from exome-sequencing data.
Genome-wide comparisons of transcription factor binding sites in different species can be used to evaluate evolutionary constraints that shape gene regulatory circuits and to understand how the interaction between transcription factors shapes their binding landscapes over evolution.
We have compared the PPARG binding landscapes in macrophages to investigate the evolutionary impact on PPARG binding diversity in mice and humans for this important nuclear receptor. Of note, only 5% of the PPARG binding sites were shared between the two species. In contrast, at the gene level, PPARG target genes conserved between both species constitute more than 30% of the target genes regulated by PPARG ligand in human macrophages. Moreover, the majority of all PPARG binding sites (55–60%) in macrophages show co-occupancy of the lineage-specification factor PU.1 in both species. Exploring the evolutionary dynamics of PPARG binding sites, we observed that PU.1 co-binding to PPARG sites appears to be important for possible PPARG ancestral functions such as lipid metabolism. Thus we speculate that PU.1 may have guided utilization of these species-specific PPARG conserved binding sites in macrophages during evolution.
We propose a model in which PU.1 sites may have served as “anchor” loci for the formation of new and functionally relevant PPARG binding sites throughout evolution. As PU.1 is an essential factor in macrophage biology, such an evolutionary mechanism would allow for the establishment of relevant PPARG regulatory modules in a PU.1-dependent manner and yet permit nuanced regulatory changes in individual species.
Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10–20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
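The claim that large inserts give higher physical coverage for the same sequencing cost follows from simple arithmetic: physical coverage scales with the span between paired tags, whereas sequencing cost scales with the number of tag pairs read. The following is a minimal sketch of that relationship; the pair count and genome size are illustrative assumptions, not values from the study.

```python
def physical_coverage(n_pairs, insert_size_bp, genome_size_bp):
    """Average number of paired-end inserts spanning each base of the genome."""
    return n_pairs * insert_size_bp / genome_size_bp


# Illustrative assumptions: 3 million mapped tag pairs, 3 Gb genome
n_pairs = 3_000_000
genome_size = 3_000_000_000

short_cov = physical_coverage(n_pairs, 1_000, genome_size)   # 1 kb inserts
large_cov = physical_coverage(n_pairs, 10_000, genome_size)  # 10 kb inserts
print(short_cov, large_cov)  # → 1.0 10.0
```

With the number of sequenced pairs held fixed, a tenfold larger insert yields tenfold higher physical coverage, which is why rearrangement breakpoints are more likely to be spanned by at least one insert.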
Mammalian genomes are viewed as functional organizations that orchestrate spatial and temporal gene regulation. CTCF, the most characterized insulator-binding protein, has been implicated as a key genome organizer. Yet, little is known about CTCF-associated higher order chromatin structures at a global scale. Here, we applied Chromatin Interaction Analysis by Paired-End-Tag sequencing to elucidate the CTCF-chromatin interactome in pluripotent cells. From this analysis, 1,480 cis and 336 trans interacting loci were identified with high reproducibility and precision. Associating these chromatin interaction loci with their underlying epigenetic states, promoter activities, enhancer binding and nuclear lamina occupancy, we uncovered five distinct chromatin domains that suggest potential new models of CTCF function in chromatin organization and transcriptional control. Specifically, CTCF interactions demarcate chromatin-nuclear membrane attachments and influence proper gene expression through extensive crosstalk between promoters and regulatory elements. This highly complex nuclear organization offers insights towards the unifying principles governing genome plasticity and function.
insulator; enhancer; chromatin organization; epigenetic regulation; nuclear lamina
Pediatric glioblastomas (GBM) including diffuse intrinsic pontine gliomas (DIPG) are devastating brain tumors with no effective therapy. Here, we investigated clinical and biological impacts of histone H3.3 mutations. Forty-two DIPGs were tested for H3.3 mutations. Wild-type versus mutated (K27M-H3.3) subgroups were compared for HIST1H3B, IDH, ATRX and TP53 mutations, copy number alterations and clinical outcome. K27M-H3.3 occurred in 71%, TP53 mutations in 77% and ATRX mutations in 9% of DIPGs. ATRX mutations were more frequent in older children (p < 0.0001). No G34V/R-H3.3, IDH1/2 or H3.1 mutations were identified. K27M-H3.3 DIPGs showed specific copy number changes, including all gains/amplifications of PDGFRA and MYC/PVT1 loci. Notably, all long-term survivors were H3.3 wild type and this group of patients had better overall survival. K27M-H3.3 mutation defines clinically and biologically distinct subgroups and is prevalent in DIPG, which will impact future therapeutic trial design. K27M- and G34V-H3.3 have location-based incidence (brainstem/cortex) and potentially play distinct roles in pediatric GBM pathogenesis. K27M-H3.3 is universally associated with short survival in DIPG, while patients wild-type for H3.3 show improved survival. Based on prognostic and therapeutic implications, our findings argue for H3.3-mutation testing at diagnosis, which should be rapidly integrated into the clinical decision-making algorithm, particularly in atypical DIPG.
Electronic supplementary material
The online version of this article (doi:10.1007/s00401-012-0998-0) contains supplementary material, which is available to authorized users.
DIPG; H3.3; ATRX; TP53; Survival; Targeted therapy
Identifying DNA sequences (enhancers) that direct the precise spatial and temporal expression of developmental control genes remains a significant challenge in the annotation of vertebrate genomes. Locating these sequences, which in many cases lie at a great distance from the transcription start site, has been a major obstacle in deciphering gene regulation. Coupling comparative genomics with functional validation has been a successful method for locating many such regulatory elements. However, most of these studies looked either at a single gene only or at the whole genome without focusing on any particular process. There is a pressing need to integrate the tools of comparative genomics with knowledge of developmental biology to validate enhancers for developmental transcription factors in greater detail.
Our results show that near four different genes (nkx3.2, pax9, otx1b and foxa2) in zebrafish, only 20-30% of highly conserved DNA sequences can act as developmental enhancers, irrespective of the tissue in which the gene is expressed. We find that some genes also have multiple conserved enhancers that drive expression in the same tissue at the same or different time points in development. We also located non-conserved enhancers for two of the genes (pax9 and otx1b). Our studies with modified bacterial artificial chromosomes (BACs) for these four genes revealed that many of these enhancers work in a synergistic fashion, which cannot be captured by individual DNA constructs, and are not conserved at the sequence level. Our detailed biochemical and transgenic analysis revealed that Foxa1 binds to the otx1b non-conserved enhancer to direct its activity in the forebrain and otic vesicle of zebrafish at 24 hpf.
Our results clearly indicate that a high level of functional conservation of genes is not necessarily associated with sequence conservation of their regulatory elements. Moreover, certain non-conserved DNA elements might have a role in gene regulation. Multiple approaches need to be brought to bear upon individual genes to decipher all of their regulatory elements.
The formation of new transcription factor–binding sites (TFBSs) has a major impact on the evolution of gene regulatory networks. Clearly, single nucleotide mutations arising within genomic DNA can lead to the creation of TFBSs. Are molecular processes inducing single nucleotide mutations contributing equally to the creation of TFBSs? In the human genome, a spontaneous deamination of methylated cytosine in the context of CpG dinucleotides results in the creation of thymine (C → T), and this mutation has the highest rate among all base substitutions. CpG deamination has been ascribed a role in silencing of transposons and induction of variation in regional methylation. We have previously shown that CpG deamination created thousands of p53-binding sites within genomic sequences of Alu transposons. Interestingly, we have defined a ∼30 bp region in Alu sequence, which, depending on a pattern of CpG deamination, can be converted to functional p53-, PAX-6-, and Myc-binding sites. Here, we have studied single nucleotide mutational events leading to creation of TFBSs in promoters of human genes and in genomic regions bound by such key transcription factors as Oct4, NANOG, and c-Myc. We document that CpG deamination events can create TFBSs with much higher efficiency than other types of mutational events. Our findings add a new role to CpG methylation: We propose that deamination of methylated CpGs constitutes one of the evolutionary forces acting on mutational trajectories of TFBSs formation contributing to variability in gene regulation.
CpG methylation; CpG deamination; evolution of transcription factor–binding sites; evolution of gene regulatory elements; Alu transposon
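The mutational mechanism described in the abstract above can be illustrated with a short enumeration. Deamination of a methylated cytosine in a CpG yields TpG on the affected strand; when the event occurs on the opposite strand, it appears on the reference strand as CpG to CpA. The following is a minimal sketch, not the authors' analysis pipeline: the function names are hypothetical, and the canonical E-box CACGTG (a Myc-type binding motif) is used purely as an illustrative target.

```python
def cpg_deamination_variants(seq):
    """List single-base variants that CpG deamination could produce.

    Returns (position, change, mutated_sequence) tuples, 0-based positions.
    C>T models deamination on the given strand; G>A models deamination
    of the paired cytosine on the opposite strand.
    """
    seq = seq.upper()
    variants = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            variants.append((i, "C>T", seq[:i] + "T" + seq[i + 1:]))
            variants.append((i + 1, "G>A", seq[:i + 1] + "A" + seq[i + 2:]))
    return variants


def variants_creating_motif(seq, motif):
    """Variants that contain the motif when the original sequence does not."""
    seq, motif = seq.upper(), motif.upper()
    if motif in seq:
        return []
    return [v for v in cpg_deamination_variants(seq) if motif in v[2]]


# Toy example: one deamination event turns CACGCG into the E-box CACGTG
print(variants_creating_motif("CACGCG", "CACGTG"))
# → [(4, 'C>T', 'CACGTG')]
```

A real analysis would score variants against position weight matrices and compare creation rates across mutation types; exact string matching is used here only to keep the example small.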
The zebrafish is recognized as a versatile cancer and drug screening model. However, it is not known whether the estrogen-responsive genes and signaling pathways that are involved in estrogen-dependent carcinogenesis and human cancer are operating in zebrafish. In order to determine the potential of the zebrafish model for estrogen-related cancer research, we investigated the molecular conservation of estrogen responses operating in both zebrafish and human cancer cell lines.
A microarray experiment was performed on zebrafish exposed to estrogen (17β-estradiol; a classified carcinogen) and an anti-estrogen (ICI 182,780). Zebrafish estrogen-responsive genes sensitive to both estrogen and anti-estrogen were identified and validated using real-time PCR. Human homolog mapping and knowledge-based data mining were performed on zebrafish estrogen-responsive genes, followed by estrogen receptor binding site analysis and comparative transcriptome analysis with estrogen-responsive human cancer cell lines (MCF7, T47D and Ishikawa).
Our transcriptome analysis captured multiple estrogen-responsive genes and signaling pathways that increased cell proliferation, promoted DNA damage and genome instability, and decreased tumor suppressing effects, suggesting a common mechanism for estrogen-induced carcinogenesis. Comparative analysis revealed a core set of conserved estrogen-responsive genes that demonstrate enrichment of estrogen receptor binding sites and cell cycle signaling pathways. Knowledge-based and network analysis led us to propose that the mechanism involving estrogen-activated, estrogen receptor-mediated down-regulation of human homolog HES1, followed by up-regulation of cell cycle-related genes (human homologs E2F4, CDK2, CCNA, CCNB, CCNE), is highly conserved, and that this mechanism may involve novel crosstalk with basal AHR. We also identified mitotic roles of polo-like kinase as a conserved signaling pathway with multiple entry points for estrogen regulation.
The findings demonstrate the use of zebrafish for characterizing estrogen-like environmental carcinogens and for anti-estrogen drug screening. From an evolutionary perspective, our findings suggest that estrogen regulation of the cell cycle is perhaps one of the earliest forms of steroidal-receptor controlled cellular processes. Our study provides the first evidence of molecular conservation of estrogen-responsiveness between zebrafish and human cancer cell lines, hence demonstrating the potential of zebrafish for estrogen-related cancer research.
zebrafish; microarray; estrogen; anti-estrogen ICI 182,780; estrogen-responsive genes; signaling pathways; carcinogenesis; human cancer cell lines; molecular conservation; model organism
Octamer-binding transcription factor 4 (Oct4) is a master regulator of early mammalian development. Its expression begins from the oocyte stage, becomes restricted to the inner cell mass of the blastocyst and eventually remains only in primordial germ cells. Unearthing the interactions of Oct4 would provide insight into how this transcription factor is central to cell fate and stem cell pluripotency.
In the present study, affinity-tagged endogenous Oct4 cell lines were established via homologous recombination gene targeting in embryonic stem (ES) cells to express tagged Oct4. This allows tagged Oct4 to be expressed without altering the total Oct4 levels from their physiological levels.
Modified ES cells remained pluripotent. However, when modified ES cells were tested for their functionality, cells with a large tag failed to produce viable homozygous mice. Use of a smaller tag resulted in mice with normal development, viability and fertility. This indicated that the choice of tags can affect the performance of Oct4. Also, different tags produce a different repertoire of Oct4 interactors.
Using a total of four different tags, we found 33 potential Oct4 interactors, of which 30 are novel. In addition to transcriptional regulation, the molecular function associated with these Oct4-associated proteins includes various other catalytic activities, suggesting that, aside from chromosome remodeling and transcriptional regulation, Oct4 function extends more widely to other essential cellular mechanisms. Our findings show that multiple purification approaches are needed to uncover a comprehensive Oct4 protein interaction network.
Genomes are organized into high-level 3-dimensional structures, and DNA elements separated by long genomic distances could functionally interact. Many transcription factors bind to regulatory DNA elements distant from gene promoters. While distal binding sites have been shown to regulate transcription by long-range chromatin interactions at a few loci, chromatin interactions and their impact on transcription regulation have not been investigated in a genome-wide manner. Therefore, we developed Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) for de novo detection of global chromatin interactions, and comprehensively mapped the chromatin interaction network bound by oestrogen receptor α (ERα) in the human genome. We found that most high-confidence remote ERα binding sites are anchored at gene promoters through long-range chromatin interactions, suggesting that ERα functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation. We propose that chromatin interactions constitute a primary mechanism for regulating transcription in mammalian genomes.
Our group produced the best predictions overall in the DREAM3 signaling response challenge, ranking first by a substantial margin in the cytokine sub-challenge and nearly tying for best in the phosphoprotein sub-challenge. We achieved this success using a simple interpolation strategy. For each combination of a stimulus and inhibitor for which predictions were required, we noted that there were six other datasets using the same stimulus (but different inhibitor treatments) and six other datasets using the same inhibitor (but different stimuli). Therefore, for each treatment combination for which values were to be predicted, we calculated rank correlations for the data that were in common between the treatment combination and each of the 12 related combinations. The data from the 12 related combinations were then used to calculate missing values, weighting the contributions from each experiment based on the rank correlation coefficients. The success of this simple method suggests that the missing data were largely over-determined by similarities in the treatments. We offer some thoughts on the current state and future development of DREAM that are based on our success in this challenge, our success in the earlier DREAM2 transcription factor target challenge, and our experience as the data provider for the gene expression challenge in DREAM3.
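The interpolation strategy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the group's actual implementation: it uses the Spearman coefficient as the rank correlation, clips negative correlations to zero weight, and takes a plain weighted average; the abstract does not specify any of these details, and all function names and toy values are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Average 1-based ranks, with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def predict_missing(target, related, missing_key):
    """Weighted average of related datasets' values for missing_key.

    target: dict of measurements available for the condition to predict.
    related: list of dicts for related conditions (same stimulus or inhibitor).
    Assumption: weight = max(rank correlation on shared measurements, 0).
    """
    weighted_sum, weight_total = 0.0, 0.0
    for other in related:
        shared = [k for k in target if k in other]
        if len(shared) < 2 or missing_key not in other:
            continue
        rho = spearman([target[k] for k in shared], [other[k] for k in shared])
        weight = max(rho, 0.0)
        weighted_sum += weight * other[missing_key]
        weight_total += weight
    return None if weight_total == 0 else weighted_sum / weight_total


# Toy data (hypothetical readouts a-d; d is missing for the target condition)
target = {"a": 1.0, "b": 2.0, "c": 3.0}
related = [{"a": 10, "b": 20, "c": 30, "d": 40},   # perfectly concordant
           {"a": 3, "b": 2, "c": 1, "d": 100}]     # anti-correlated, weight 0
print(predict_missing(target, related, "d"))  # → 40.0
```

In the setting described in the abstract, `related` would hold the 12 datasets sharing either the stimulus or the inhibitor with the condition being predicted.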
Using a chromatin immunoprecipitation-paired end diTag cloning and sequencing strategy, we mapped estrogen receptor α (ERα) binding sites in MCF-7 breast cancer cells. We identified 1,234 high confidence binding clusters of which 94% are projected to be bona fide ERα binding regions. Only 5% of the mapped estrogen receptor binding sites are located within 5 kb upstream of the transcriptional start sites of adjacent genes, regions containing the proximal promoters, whereas the vast majority of the sites are mapped to intronic or distal locations (>5 kb from 5′ and 3′ ends of adjacent transcript), suggesting transcriptional regulatory mechanisms over significant physical distances. Of all the identified sites, 71% harbored putative full estrogen response elements (EREs), 25% bore ERE half sites, and only 4% had no recognizable ERE sequences. Genes in the vicinity of ERα binding sites were enriched for regulation by estradiol in MCF-7 cells, and their expression profiles in patient samples segregate ERα-positive from ERα-negative breast tumors. The expression dynamics of the genes adjacent to ERα binding sites suggest a direct induction of gene expression through binding to ERE-like sequences, whereas transcriptional repression by ERα appears to be through indirect mechanisms. Our analysis also indicates a number of candidate transcription factor binding sites adjacent to occupied EREs at frequencies much greater than by chance, including the previously reported FOXA1 sites, and demonstrates the potential involvement of one such putative adjacent factor, Sp1, in the global regulation of ERα target genes. Unexpectedly, we found that only 22%–24% of the bona fide human ERα binding sites overlapped conserved regions in whole genome vertebrate alignments, which suggests limited conservation of functional binding sites. Taken together, this genome-scale analysis suggests complex but definable rules governing ERα binding and gene regulation.
Estrogen receptors (ERs) play key roles in facilitating the transcriptional effects of hormone functions in target tissues. To obtain a genome-wide view of ERα binding sites, we applied chromatin immunoprecipitation coupled with a cloning and sequencing strategy using chromatin immunoprecipitation pair end-tagging technology to map ERα binding sites in MCF-7 human breast cancer cells. We identified 1,234 high quality ERα binding sites in the human genome and demonstrated that the binding sites are frequently adjacent to genes significantly associated with breast cancer disease status and outcome. The mapping results also revealed that ERα can influence gene expression across distances of up to 100 kilobases or more, that genes that are induced or repressed utilize sites in different regions relative to the transcript (suggesting different mechanisms of action), and that ERα binding sites are only modestly conserved in evolution. Using computational approaches, we identified potential interactions with other transcription factor binding sites adjacent to the ERα binding elements. Taken together, these findings suggest complex but definable rules governing ERα binding and gene regulation and provide a valuable dataset for mapping the precise control nodes for one of the most important nuclear hormone receptors in breast cancer biology.
Refinement of the functional human estrogen receptor binding site model using a multi-platform genome-wide approach reveals extended binding specificity signal.
Transcription factor binding sites (TFBS) impart specificity to cellular transcriptional responses and have largely been defined by consensus motifs derived from a handful of validated sites. The low specificity of the computational predictions of TFBSs has been attributed to ubiquity of the motifs and the relaxed sequence requirements for binding. We posited that the inadequacy is due to limited input of empirically verified sites, and demonstrated a multiplatform approach to constructing a robust model.
Using the TFBS for the estrogen receptor (ER)α (estrogen response element [ERE]) as a model system, we extracted EREs from multiple molecular and genomic platforms whose binding to ERα has been experimentally confirmed or rejected. In silico analyses revealed significant sequence information flanking the standard binding consensus, discriminating ERE-like sequences that bind ERα from those that are nonbinders. We extended the ERE consensus by three bases, bearing a terminal G at the third position 3' and an initiator C at the third position 5', which were further validated using surface plasmon resonance spectroscopy. Our functional human ERE prediction algorithm (h-ERE) outperformed existing predictive algorithms and produced fewer than 5% false negatives upon experimental validation.
Building upon a larger experimentally validated ERE set, the h-ERE algorithm is able to demarcate better the universe of ERE-like sequences that are potential ER binders. Only 14% of the predicted optimal binding sites were utilized under the experimental conditions employed, pointing to other selective criteria not related to EREs. Other factors, in addition to primary nucleotide sequence, will ultimately determine binding site selection.
Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
genome evolution; synteny; mammals; ancestral genome
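Rearrangement distances of the kind reported above build on comparing marker orders between genomes. The following sketch is not the MGR algorithm; it shows only the simpler, standard notion of counting breakpoints between two signed marker orders on a single chromosome, under the assumption that markers are labeled 1..n and that sign encodes orientation. Because one inversion can remove at most two breakpoints, half the breakpoint count (rounded up) is a lower bound on the number of inversions separating the two orders.

```python
def adjacencies(order):
    """Set of oriented adjacencies in a signed marker order, read both ways."""
    framed = [0] + list(order) + [len(order) + 1]  # add chromosome end caps
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # same adjacency read from the opposite strand
    return adj


def breakpoints(order_a, order_b):
    """Number of adjacencies in order_a that are absent from order_b."""
    adj_b = adjacencies(order_b)
    framed = [0] + list(order_a) + [len(order_a) + 1]
    return sum(1 for pair in zip(framed, framed[1:]) if pair not in adj_b)


# Toy example: markers 2 and 3 inverted relative to the reference order
reference = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
print(breakpoints(inverted, reference))  # → 2
```

Multi-chromosome comparisons with translocations, as in the study, require a richer model than this single-chromosome count.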