Nucleosomes are an essential component of eukaryotic chromosomes. The impact of nucleosomes is seen not just on processes that directly access the genome such as transcription, but also on an evolutionary timescale. Recent studies in a number of organisms have provided high-resolution maps of nucleosomes throughout the genome. Computational analysis, in conjunction with many other kinds of data, has shed light on several aspects of nucleosome biology. Nucleosomes are positioned by several means, including intrinsic sequence biases, stacking against a fixed barrier, DNA-binding proteins and chromatin remodelers. These studies underscore the critical organizational role of nucleosomes in all eukaryotic genomes. Here, I review recent genomic studies that shed light on the determinants of nucleosome positioning and their impact on the genome.
Nucleosome; chromatin; remodeling; epigenetic; genome packaging
The transition of mammalian cells from quiescence to proliferation is accompanied by the differential expression of several microRNAs (miRNAs) and transcription factors. However, the interplay between transcription factors and miRNAs in modulating gene regulatory networks involved in human cell proliferation is largely unknown. Here we show that the miRNA miR-22 promotes proliferation in primary human cells, and through a combination of Argonaute-2 immunoprecipitation and reporter assays, we identified multiple novel targets of miR-22, including several cell-cycle arrest genes that mediate the effects of the tumor-suppressor p53. In addition, we found that miR-22 suppresses interferon gene expression by directly targeting high mobility group box-1 and interferon regulatory factor (IRF)-5, preventing activation of IRF3 and NF-κB, which are activators of interferon genes. The expression of interferon genes is elevated in quiescent cells and their expression is inhibitory for cell proliferation. In addition, we find that miR-22 is activated by the transcription factor Myc when quiescent cells enter proliferation and that miR-22 inhibits the Myc transcriptional repressor MXD4, mediating a feed-forward loop to elevate Myc expression levels. Our results implicate miR-22 in downregulating the anti-proliferative p53 and interferon pathways and reveal a new transcription factor–miRNA network that regulates the transition of primary human cells from quiescence to proliferation.
Single nucleotide polymorphisms (SNPs) have been associated with many aspects of human development and disease, and many non-coding SNPs associated with disease risk are presumed to affect gene regulation. We have previously shown that SNPs within transcription factor binding sites can affect transcription factor binding in an allele-specific and heritable manner. However, such analysis has relied on prior whole-genome genotypes provided by large external projects such as HapMap and the 1000 Genomes Project. This requirement limits the study of allele-specific effects of SNPs in primary patient samples from diseases of interest, where complete genotypes are not readily available.
In this study, we show that we are able to identify SNPs de novo and accurately from ChIP-seq data generated in the ENCODE Project. Our de novo identified SNPs from ChIP-seq data are highly concordant with published genotypes. Independent experimental verification of more than 100 sites estimates our false discovery rate at less than 5%. Analysis of transcription factor binding at de novo identified SNPs revealed widespread heritable allele-specific binding, confirming previous observations. SNPs identified from ChIP-seq datasets were significantly enriched for disease-associated variants, and we identified dozens of allele-specific binding events in non-coding regions that could distinguish between disease and normal haplotypes.
Our approach combines SNP discovery, genotyping and allele-specific analysis, but is selectively focused on functional regulatory elements occupied by transcription factors or epigenetic marks, and will therefore be valuable for identifying the functional regulatory consequences of non-coding SNPs in primary disease samples.
SNPs; Transcription factors; ChIP-seq; Genotyping; Allele-specific
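The allele-specific analysis described above can be illustrated with a small sketch. The abstract does not say which statistical test was used, so the exact two-sided binomial test below, the null proportion of 0.5, the function name, and the read counts are all assumptions chosen for illustration, not the authors' method.

```python
from math import comb

def allelic_imbalance_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for read counts at one heterozygous SNP.

    Under the null hypothesis, a ChIP-seq read covering the SNP carries the
    reference allele with probability p_null (assumed to be 0.5 here).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Probability of every possible reference-allele count from 0 to n.
    probs = [comb(n, k) * p_null**k * (1 - p_null)**(n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Two-sided p-value: total probability of outcomes no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Hypothetical read counts at two SNPs inside a transcription factor peak.
balanced = allelic_imbalance_p(5, 5)    # equal allele coverage
skewed = allelic_imbalance_p(10, 0)     # all reads from one allele
```

In practice the p-values from many SNPs would need a multiple-testing correction, and very deep coverage would call for a log-space or library implementation rather than this direct sum.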
Next-generation sequencing-based assays to detect gene regulatory elements are enabling the analysis of individual-to-individual and allele-specific variation of chromatin status and transcription factor binding in humans. Recently, a number of studies have explored this area, using lymphoblastoid cell lines. Around 10% of chromatin sites show either individual-level differences or allele-specific behavior. Future studies are likely to be limited by cell line accessibility, meaning that white-blood-cell-based studies are likely to continue to be the main source of samples. A detailed understanding of the relationship between normal genetic variation and chromatin variation can shed light on how polymorphisms in non-coding regions in the human genome might underlie phenotypic variation and disease.
The E2F family of transcription factors has important roles in cell cycle progression. E2F4 is an E2F family member that has been proposed to be primarily a repressor of transcription, but the scope of its binding activity and functions in transcriptional regulation is not fully known. We used ChIP sequencing (ChIP-seq) to identify around 16 000 E2F4 binding sites which potentially regulate 7346 downstream target genes with wide-ranging functions in DNA repair, cell cycle regulation, apoptosis, and other processes. While more than half of all E2F4 binding sites (56%) occurred near transcription start sites (TSSs), ∼20% of sites occurred more than 20 kb away from any annotated TSS. These distal sites showed histone modifications suggesting that E2F4 may function as a long-range regulator, which we confirmed by functional experimental assays on a subset. Overexpression of E2F4 and its transcriptional cofactors of the retinoblastoma (Rb) family and its binding partner DP-1 revealed that E2F4 acts as an activator as well as a repressor. E2F4 binding sites also occurred near regulatory elements for miRNAs such as let-7a and mir-17, suggestive of regulation of miRNAs by E2F4. Taken together, our genome-wide analysis provided evidence of versatile roles of E2F4 and insights into its functions.
The extent to which variation in chromatin structure and transcription factor binding may influence gene expression, and thus underlie or contribute to variation in phenotype, is unknown. To address this question, we cataloged both individual-to-individual variation and differences between homologous chromosomes within the same individual (allele-specific variation) in chromatin structure and transcription factor binding in lymphoblastoid cells derived from individuals of geographically diverse ancestry. Ten percent of active chromatin sites were individual-specific; a similar proportion were allele-specific. Both individual-specific and allele-specific sites were commonly transmitted from parent to child, which suggests that they are heritable features of the human genome. Our study shows that heritable chromatin status and transcription factor binding differ as a result of genetic variation and may underlie phenotypic variation in humans.
ArrayPlex is a software package that centrally provides a large number of flexible toolsets useful for functional genomics, including microarray data storage, quality assessments, data visualization, gene annotation retrieval, statistical tests, genomic sequence retrieval and motif analysis. It uses a client-server architecture based on open source components, provides graphical, command-line, and programmatic access to all needed resources, and is extensible by virtue of a documented application programming interface. ArrayPlex is available at .
Although chromatin structure is known to affect transcriptional activity, it is not clear how broadly patterns of changes in histone modifications and nucleosome occupancy affect the dynamic regulation of transcription in response to perturbations. The identity and role of chromatin remodelers that mediate some of these changes are also unclear. Here, we performed temporal genome-wide analyses of gene expression, nucleosome occupancy, and histone H4 acetylation during the response of yeast (Saccharomyces cerevisiae) to different stresses and report several findings. First, a large class of predominantly ribosomal protein genes, whose transcription was repressed during both heat shock and stationary phase, showed strikingly contrasting histone acetylation patterns. Second, the SWI/SNF complex was required for normal activation as well as repression of genes during heat shock, and loss of SWI/SNF delayed chromatin remodeling at the promoters of activated genes. Third, Snf2 was recruited to ribosomal protein genes and Hsf1 target genes, and its occupancy of this large set of genes was altered during heat shock. Our results suggest a broad and direct dual role for SWI/SNF in chromatin remodeling, during heat shock activation as well as repression, at promoters and coding regions.
Regulation of cell cycle progression is fundamental to cell health and reproduction, and failures in this process are associated with many human diseases. Much of our knowledge of cell cycle regulators derives from loss-of-function studies. To reveal new cell cycle regulatory genes that are difficult to identify in loss-of-function studies, we performed a near-genome-wide flow cytometry assay of yeast gene overexpression-induced cell cycle delay phenotypes. We identified 108 genes whose overexpression significantly delayed the progression of the yeast cell cycle at a specific stage. Many of the genes are newly implicated in cell cycle progression, for example SKO1, RFA1, and YPR015C. The overexpression of RFA1 or YPR015C delayed the cell cycle at G2/M phases by disrupting spindle attachment to chromosomes and activating the DNA damage checkpoint, respectively. In contrast, overexpression of the transcription factor SKO1 arrests cells at G1 phase by activating the pheromone response pathway, revealing new cross-talk between osmotic sensing and mating. More generally, 92%–94% of the genes exhibit distinct phenotypes when overexpressed as compared to their corresponding deletion mutants, supporting the notion that many genes may gain functions upon overexpression. This work thus implicates new genes in cell cycle progression, complements previous screens, and lays the foundation for future experiments to define the roles of these genes in cell cycle progression more precisely.
All cells require proper cell cycle regulation; failure leads to numerous human diseases. Cell cycle mechanisms are broadly conserved across eukaryotes, with many key regulatory genes known. Nonetheless, our knowledge of regulators is incomplete. Many classic studies have analyzed yeast loss-of-function mutants to identify cell cycle genes. Studies have also implicated genes based upon their overexpression phenotypes, but the effects of gene overexpression on the cell cycle have not been quantified for all yeast genes. We individually quantified the effect of overexpression on cell cycle progression for nearly all (91%) yeast genes, and we report the 108 genes causing the most significant and reproducible cell cycle defects, most of which have not been previously observed. We characterize three genes in more detail, implicating one in chromosomal segregation and mitotic spindle formation. A second affects mitotic stability and the DNA damage checkpoint. Curiously, overexpression of a third gene, SKO1, arrests the cell cycle by activating the pheromone response pathway, with cells mistakenly behaving as if mating pheromone is present. These results establish a basis for future experiments elucidating precise cell cycle roles for these genes. Similar assays in human cells could help further clarify the many connections between cell cycle control and cancers.
The Myc oncoprotein is a transcription factor involved in a variety of human cancers. Overexpression of Myc is associated with malignant transformation. In normal cells, Myc is induced by mitogenic signals, and in turn, it regulates the expression of downstream target genes. Although diverse roles of Myc have been predicted from many previous studies, detailed functions of Myc targets are still unclear. By combining chromatin immunoprecipitation (ChIP) and promoter microarrays, we identified a total of 1469 Myc direct target genes, the majority of which are novel, in HeLa cells and human primary fibroblasts. We observed dramatic changes of Myc occupancy at its target promoters in foreskin fibroblasts in response to serum stimulation. Among the targets of Myc, 107 were nuclear encoded genes involved in mitochondrial biogenesis. Genes with important roles in mitochondrial replication and biogenesis, such as POLG, POLG2, and NRF1 were identified as direct targets of Myc, confirming a direct role for Myc in regulating mitochondrial biogenesis. Analysis of target promoter sequences revealed a strong preference for Myc occupancy at promoters containing one of several described consensus sequences, CACGTG, in vivo. This study thus sheds light on the transcriptional regulatory networks mediated by Myc in vivo.
The eukaryotic genome is packaged into chromatin, and chromatin modification and remodeling play an important role in transcriptional regulation, DNA replication, recombination and repair. Recent findings have shown that various post-translational histone modifications cooperate to recruit different effector proteins that bring about mobilization of the nucleosomes and cause distinct downstream consequences. The combination of chromatin immunoprecipitation (ChIP) using antibodies directed against the core histones or specific histone modifications, with high-resolution tiling microarray analysis allows the examination of nucleosome occupancy and histone modification status genome-wide. Comparing genome-wide chromatin status with global gene expression patterns can reveal causal connections between specific patterns of histone modifications and the resulting gene expression. Here, we describe current methods based on recent advances in microarray technology to conduct such studies.
S. cerevisiae; chromatin remodeling; chromatin immunoprecipitation; tiling microarray
The eukaryotic genome is packaged as chromatin with nucleosomes comprising its basic structural unit, but the detailed structure of chromatin and its dynamic remodeling in terms of individual nucleosome positions has not been completely defined experimentally for any genome. We used ultra-high–throughput sequencing to map the remodeling of individual nucleosomes throughout the yeast genome before and after a physiological perturbation that causes genome-wide transcriptional changes. Nearly 80% of the genome is covered by positioned nucleosomes occurring in a limited number of stereotypical patterns in relation to transcribed regions and transcription factor binding sites. Chromatin remodeling in response to physiological perturbation was typically associated with the eviction, appearance, or repositioning of one or two nucleosomes in the promoter, rather than broader region-wide changes. Dynamic nucleosome remodeling tends to increase the accessibility of binding sites for transcription factors that mediate transcriptional changes. However, specific nucleosomal rearrangements were also evident at promoters even when there was no apparent transcriptional change, indicating that there is no simple, globally applicable relationship between chromatin remodeling and transcriptional activity. Our study provides a detailed, high-resolution, dynamic map of single-nucleosome remodeling across the yeast genome and its relation to global transcriptional changes.
The eukaryotic genome is packed in a systematic hierarchy to accommodate it within the confines of the cell's nucleus. This packing, however, presents an impediment to the transcription machinery when it must access genomic DNA to regulate gene expression. A fundamental aspect of genome packing is the spooling of DNA around nucleosomes, structures formed from histone proteins, which must be dislodged during transcription. In this study, we identified all the nucleosome displacements associated with a physiological perturbation causing genome-wide transcriptional changes in the eukaryote Saccharomyces cerevisiae. We isolated nucleosomal DNA before and after subjecting cells to heat shock, then identified the ends of these DNA fragments and, thereby, the location of nucleosomes along the genome, using ultra-high–throughput sequencing. We identified localized patterns of nucleosome displacement at gene promoters in response to heat shock, and found that nucleosome eviction was generally associated with gene activation and nucleosome appearance with gene repression. Nucleosome remodeling generally improved the accessibility of DNA to transcriptional regulators mediating the response to stresses like heat shock. However, not all nucleosomal remodeling was associated with transcriptional changes, indicating that the relationship between nucleosome repositioning and transcriptional activity is not merely a reflection of competing access to DNA.
Ultra-high-throughput sequencing is used to show that distinct, localized patterns of nucleosome repositioning at promoters underlie the genome-wide transcriptional response to a physiological stimulus.
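The idea of locating nucleosomes from the ends of sequenced nucleosomal DNA fragments can be sketched roughly as follows. This is not the pipeline used in the study: the 147 bp nucleosome footprint is a standard figure, but the greedy peak-picking rule, the `min_reads` threshold, the function names, and the read coordinates are assumptions made for illustration.

```python
from collections import Counter

NUCLEOSOME_LENGTH = 147            # bp of DNA wrapped around one histone octamer
HALF = NUCLEOSOME_LENGTH // 2      # 73 bp from a fragment end to the dyad (center)

def dyad_counts(read_ends):
    """Count estimated nucleosome centers from (position, strand) fragment ends.

    A '+' strand read marks the left edge of a fragment, so the center lies
    downstream; a '-' strand read marks the right edge, so it lies upstream.
    """
    counts = Counter()
    for position, strand in read_ends:
        dyad = position + HALF if strand == "+" else position - HALF
        counts[dyad] += 1
    return counts

def call_positions(counts, min_reads=3, exclusion=NUCLEOSOME_LENGTH):
    """Greedily pick well-supported centers that do not overlap one another."""
    accepted = []
    # Visit candidates from best supported to least supported.
    for dyad, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_reads:
            break
        if all(abs(dyad - other) >= exclusion for other in accepted):
            accepted.append(dyad)
    return sorted(accepted)

# Hypothetical fragment ends on one chromosome.
reads = [(100, "+")] * 3 + [(246, "-")] * 2 + [(127, "+")] * 3 + [(300, "+")] * 3
positions = call_positions(dyad_counts(reads))
```

Comparing the positions called before and after a perturbation such as heat shock would then show which nucleosomes were evicted, gained, or shifted; a real analysis would also smooth the counts and model positional fuzziness.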
Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment.
Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
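The association rule measures mentioned above (support and confidence) can be sketched for motif pairs as follows. The segment data and motif names are invented for illustration, the function name is hypothetical, and lift is included as a common companion measure that the abstract does not mention; the authors' own mining software and significance test are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def mine_motif_pairs(segments, min_support=0.0):
    """Score every co-occurring motif pair across genome segments.

    segments: one set of motif names per genome segment (presence/absence).
    Returns one rule per pair (a, b) with:
      support    = fraction of segments containing both a and b
      confidence = fraction of segments with a that also contain b
      lift       = support / (P(a) * P(b)); values above 1 mean the pair
                   co-occurs more often than expected under independence
    """
    n = len(segments)
    single = Counter()
    pair = Counter()
    for motifs in segments:
        present = sorted(set(motifs))
        single.update(present)
        pair.update(combinations(present, 2))  # each unordered pair once
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / single[a]
        lift = support / ((single[a] / n) * (single[b] / n))
        rules.append({"pair": (a, b), "support": support,
                      "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)

# Hypothetical presence/absence calls for four genome segments.
segments = [{"E2F", "NFY"}, {"E2F", "NFY"}, {"E2F"}, {"SP1"}]
rules = mine_motif_pairs(segments)
```

With 83 motifs there are only 3,403 possible pairs, so exhaustive counting like this is feasible; dedicated algorithms such as Apriori matter more when mining larger itemsets.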
Cell lines have been used to study cancer for decades, but truly quantitative assessment of their performance as models is often lacking. We used gene expression profiling to quantitatively assess the gene expression of nine cell line models of cervical cancer.
We find a wide variation in the extent to which different cell culture models mimic late-stage invasive cervical cancer biopsies. The lowest agreement was from monolayer HeLa cells, a common cervical cancer model; the highest agreement was from primary epithelial cells, C4-I, and C4-II cell lines. In addition, HeLa and SiHa cell lines cultured in an organotypic environment increased their correlation to cervical cancer significantly. We also find wide variation in agreement when we considered how well individual biological pathways model cervical cancer. Cell lines with an anti-correlation to cervical cancer were also identified and should be avoided.
Using gene expression profiling and quantitative analysis, we have characterized nine cell lines with respect to how well they serve as models of cervical cancer. Applying this method to individual pathways, we identified the appropriateness of particular cell lines for studying specific pathways in cervical cancer. This study will allow researchers to choose a cell line with the highest correlation to cervical cancer at a pathway level. This method is applicable to other cancers and could be used to identify the appropriate cell line and growth condition to employ when studying other cancers.
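As a rough illustration of ranking cell line models by their agreement with tumor expression, the sketch below scores each model by its correlation with a reference tumor profile. The abstract does not specify the correlation measure, so the use of Pearson correlation is an assumption, and the function names, model names, and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_models(tumor_profile, cell_line_profiles):
    """Return (name, correlation) pairs, best-matching model first."""
    scores = {name: pearson(tumor_profile, profile)
              for name, profile in cell_line_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical log-expression values for the same four genes in each sample.
tumor = [1.0, 2.0, 3.0, 4.0]
models = {
    "model_A": [2.0, 4.0, 6.0, 8.0],   # tracks the tumor profile
    "model_B": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with the tumor profile
}
ranking = rank_models(tumor, models)
```

Restricting the gene list to the members of one pathway before calling `rank_models` gives a pathway-level ranking of the kind the abstract describes.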
Global transcriptional profiling of human fibroblasts from two different tissue sources reveals distinct as well as conserved responses to different growth stimuli.
Serum treatment of quiescent human dermal fibroblasts induces proliferation, coupled with a complex physiological response that is indicative of their normal role in wound-healing. However, it is not known to what extent such complex transcriptional events are specific to a given cell type and signal, and how these global changes are coordinately regulated. We have profiled the global transcriptional program of human fibroblasts from two different tissue sources in response to distinct growth stimuli, and identified a striking conservation in their gene-expression signatures.
We found that the wound-healing program of gene expression was not specific to the response of dermal fibroblasts to serum but was regulated more broadly. However, there were specific differences among different stimuli with regard to signaling pathways that mediate these transcriptional programs. Our data suggest that the PI3-kinase pathway is differentially involved in mediating the responses of cells to serum as compared with individual peptide growth factors. Expression profiling indicated that let7 and other miRNAs with similar expression profiles may be involved in regulating the transcriptional program in response to proliferative signals.
This study provides insights into how different stimuli use distinct as well as conserved signaling and regulatory mechanisms to mediate genome-wide transcriptional reprogramming during cell proliferation. Our results indicate that conservation of transcriptional programs and their regulation among different cell types may be much broader than previously appreciated.
Spotted cell microarrays were developed for measuring cellular phenotypes on a large scale and used to identify genes involved in the response of yeast to mating pheromone.
We have developed spotted cell microarrays for measuring cellular phenotypes on a large scale. Collections of cells are printed, stained for subcellular features, then imaged via automated, high-throughput microscopy, allowing systematic phenotypic characterization. We used this technology to identify genes involved in the response of yeast to mating pheromone. Besides morphology assays, cell microarrays should be valuable for high-throughput in situ hybridization and immunoassays, enabling new classes of genetic assays based on cell imaging.
The recruitment of TATA box-binding protein (TBP) to promoters is one of the rate-limiting steps during transcription initiation. However, the global importance of TBP recruitment in determining the absolute and changing levels of transcription across the genome is not known. We used a genomic approach to explore the relationship between TBP recruitment to promoters and global gene expression profiles in Saccharomyces cerevisiae. Our data indicate that first, RNA polymerase III promoters are the most prominent binding targets of TBP in vivo. Second, the steady-state transcript levels of genes throughout the genome are proportional to the occupancy of their promoters by TBP, and changes in the expression levels of these genes are closely correlated with changes in TBP recruitment to their promoters. Third, a consensus TATA element does not appear to be a major determinant of either TBP binding or gene expression throughout the genome. Our results indicate that the recruitment of TBP to promoters in vivo is of universal importance in determining gene expression levels in yeast, regardless of the nature of the core promoter or the type of activator or repressor that may mediate changes in transcription. The primary data reported here are available at http://www.iyerlab.org/tbp.
Heat shock transcription factor (HSF) and the promoter heat shock element (HSE) are among the most highly conserved transcriptional regulatory elements in nature. HSF mediates the transcriptional response of eukaryotic cells to heat, infection and inflammation, pharmacological agents, and other stresses. While HSF is essential for cell viability in Saccharomyces cerevisiae, oogenesis and early development in Drosophila melanogaster, extended life span in Caenorhabditis elegans, and extraembryonic development and stress resistance in mammals, little is known about its full range of biological target genes. We used whole-genome analyses to identify virtually all of the direct transcriptional targets of yeast HSF, representing nearly 3% of the genomic loci. The majority of the identified loci are heat-inducibly bound by yeast HSF, and the target genes encode proteins that have a broad range of biological functions including protein folding and degradation, energy generation, protein trafficking, maintenance of cell integrity, small molecule transport, cell signaling, and transcription. This genome-wide identification of HSF target genes provides novel insights into the role of HSF in growth, development, disease, and aging and in the complex metabolic reprogramming that occurs in all cells in response to stress.
The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms.
The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at
Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data.
A report on the Genomics, Proteomics and Bioinformatics Thematic Meeting during the 2003 American Society for Biochemistry and Molecular Biology (ASBMB) Annual Meeting, San Diego, USA, 11-15 April 2003.
We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures synchronized by three independent methods: α factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant. Using periodicity and correlation algorithms, we identified 800 genes that meet an objective minimum criterion for cell cycle regulation. In separate experiments, designed to examine the effects of inducing either the G1 cyclin Cln3p or the B-type cyclin Clb2p, we found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins. Furthermore, we analyzed our set of cell cycle–regulated genes for known and new promoter elements and show that several known elements (or variations thereof) contain information predictive of cell cycle regulation. A full description and complete data sets are available at http://cellcycle-www.stanford.edu
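One simple way to score periodicity in a time course is to measure how strongly an expression profile projects onto a sine and cosine at the expected cell cycle period. The sketch below is a minimal illustration of that idea only: the abstract does not describe the scoring formula, so the function, the assumed 60-minute period, and the synthetic profiles are all assumptions.

```python
from math import cos, pi, sin, sqrt

def periodicity_score(times, values, period):
    """Magnitude of the Fourier component of a profile at a given period.

    times:  sampling times (same units as period)
    values: expression measurements (e.g. log ratios) at those times
    """
    omega = 2 * pi / period
    mean = sum(values) / len(values)
    # Project the mean-centered profile onto cosine and sine at the period.
    a = sum((v - mean) * cos(omega * t) for t, v in zip(times, values))
    b = sum((v - mean) * sin(omega * t) for t, v in zip(times, values))
    return sqrt(a * a + b * b) / len(values)

# Synthetic example: samples every 10 minutes over two assumed 60-minute cycles.
times = list(range(0, 120, 10))
cycling = [sin(2 * pi * t / 60) for t in times]   # perfectly periodic profile
flat = [0.3 for _ in times]                       # constant profile

cycling_score = periodicity_score(times, cycling, period=60)
flat_score = periodicity_score(times, flat, period=60)
```

The phase of the same component, `atan2(b, a)`, would indicate when in the cycle a gene peaks. Combining scores across several synchronization experiments and choosing a threshold are separate steps not shown here.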
Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized to be a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these findings indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species.
The human genome shares a remarkable amount of genomic sequence with our closest living primate relatives. Researchers have long sought to understand what regions of the genome are responsible for unique species-specific traits. Previous studies have shown that many genes are differentially expressed between species, but the regulatory elements contributing to these differences are largely unknown. Here we report a genome-wide comparison of active gene regulatory elements in human, chimpanzee, and macaque, and we identify hundreds of regulatory elements that have been gained or lost in the human or chimpanzee genomes since their evolutionary divergence. These elements contain evidence of natural selection and correlate with species-specific changes in gene expression. Polymorphic DNA bases in transcription factor motifs that we found in these regulatory elements may be responsible for the varied biological functions across species. This study directly links phenotypic and transcriptional differences between species with changes in chromatin structure.