Reprogramming of somatic cells produces induced pluripotent stem cells (iPSCs) that are invaluable resources for biomedical research. Here, we extended the previous transcriptome studies by performing RNA-seq on cells defined by a combination of multiple cellular surface markers. We found that transcriptome changes during early reprogramming occur independently from the opening of closed chromatin by OCT4, SOX2, KLF4, and MYC (OSKM). Furthermore, our data identify multiple spliced forms of genes uniquely expressed at each progressive stage of reprogramming. In particular, we found a pluripotency-specific spliced form of CCNE1 that is specific to human and significantly enhances reprogramming. In addition, single nucleotide polymorphism (SNP) expression analysis reveals that monoallelic gene expression is induced in the intermediate stages of reprogramming, while biallelic expression is recovered upon completion of reprogramming. Our transcriptome data provide unique opportunities in understanding human iPSC reprogramming.
•Initial transcriptional change relies on histone modifications in fibroblast•Allele-specific gene expression is manifested during reprogramming•A large number of spliced forms of genes are identified during reprogramming•Pluripotent-specific splicing of CCNE1 (pCCNE1) enhances reprogramming
In this article, Park and colleagues provide robust transcriptome data acquired in cells undergoing human iPSC reprogramming. Significant changes in cell signaling pathways, allelic-specific expression, and an uncharacterized splicing of CCNE1 (pCCNE1) were identified during the progressive cell fate change of fibroblasts to iPSCs. The data will broaden the understanding of the reprogramming process and human-specific gene regulation.
As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast, and embryonic stem cell lines. By combining the genome-wide TF occupancy repertoires, associated epigenetic signals, and TF co-association patterns, we deduced several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF occupied sequences (TF OSs). However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Importantly, occupancy conserved TF OSs tend to be pleiotropic; they function in multiple tissues and also co-associate with multiple TFs. Single nucleotide variants (SNVs) at sites with potential regulatory functions are enriched in occupancy conserved TF OSs.
Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized with a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA-sequencing of the corpus callosum from patients with autism exhibited extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology.
autism spectrum disorders; corpus callosum; functional modules; oligodendrocytes; protein interaction network
Transcription factors (TFs) bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 TFs in 458 ChIP-Seq experiments. We found the combinatorial, co-association of TFs to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the TF binding into a hierarchy and integrated it with other genomic information (e.g. miRNA regulation), forming a dense meta-network. Factors at different levels have different properties: for instance, top-level TFs more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs -- e.g. noise-buffering feed-forward loops. Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (i.e., differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
Combined Immunodeficiency with Multiple Intestinal Atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects.
We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequence of 5 patients and their healthy direct relatives from 5 unrelated families.
We performed whole exome sequencing on 5 CID-MIA patients and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene TTC7A on 3 additional CID-MIA patients.
Through analysis and comparison of the exomic sequence of the individuals from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in two of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from two patients with CID-MIA.
We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.
Damaging mutations in the gene TTC7A should be scrutinized in patients with CID-MIA. Characterization of the role of this protein in the immune system and intestinal development, as well as in thymic epithelial cells may have important therapeutic implications.
Combined Immunodeficiency with Multiple Intestinal Atresias; Tetracopeptide Repeat Domain 7A; Whole Exome Sequencing; Thymus
Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. Presented here is a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. The Nimblegen platform, which is the only one to use high-density overlapping baits, provides increased efficiency of enrichment and sensitivity for detecting variants but covers fewer genomic regions than the other platforms. As a result, Nimblegen requires the least amount of sequencing to sensitively detect small variants, but Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina in particular captures the untranslated regions, which are missing from the Nimblegen and Agilent platforms. Exome sequencing and whole genome sequencing (WGS) of the same sample were also compared, demonstrating that exome-seq allows for the detection of additional small variants missed by WGS. These data suggest that WGS experiments benefit from being supplemented with targeted exome-seq data. This study serves to assist the community in selecting the optimal exome-seq platform for their experiments, as well as proving that exome-seq is capable of identifying important coding variations that are missed by a typical WGS experiment.
The Y chromosome and the mitochondrial genome (mtDNA) have been used to estimate when the common patrilineal and matrilineal ancestors of humans lived. We sequenced the genomes of 69 males from nine populations, including two in which we find basal branches of the Y chromosome tree. We identify ancient phylogenetic structure within African haplogroups and resolve a long-standing ambiguity deep within the tree. Applying equivalent methodologies to the Y and mtDNA, we estimate the time to the most recent common ancestor (TMRCA) of the Y chromosome to be 120–156 thousand years and the mtDNA TMRCA to be 99–148 ky. Our findings suggest that, contrary to prior claims, male lineages do not coalesce significantly more recently than female lineages.
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here we present an integrative Personal Omics Profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. Our iPOP analysis revealed various medical risks, including Type II diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high coverage genomic and transcriptomic data, which provide the basis of our iPOP, discovered extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and disease states by connecting genomic information with additional dynamic omics activity.
Higher-order chromosomal organization for transcription regulation is poorly understood in eukaryotes. Using genome-wide Chromatin Interaction Analysis with Paired-End-Tag sequencing (ChIA-PET), we mapped long-range chromatin interactions associated with RNA polymerase II in human cells and uncovered widespread promoter-centered intra-genic, extra-genic and inter-genic interactions. These interactions further aggregated into higher-order clusters, wherein proximal and distal genes were engaged through promoter-promoter interactions. Most genes with promoter-promoter interactions were active and transcribed cooperatively, and some interacting promoters could influence each other implying combinatorial complexity of transcriptional controls. Comparative analyses of different cell lines showed that cell-specific chromatin interactions could provide structural frameworks for cell-specific transcription, and suggested significant enrichment of enhancer-promoter interactions for cell-specific functions. Furthermore, genetically-identified disease-associated non-coding elements were found to be spatially engaged with corresponding genes through long-range interactions. Overall, our study provides insights into the transcription regulation by three-dimensional chromatin interactions for both housekeeping and cell-specific genes in human cells.
Chromatin-remodeling enzymes play essential roles in many biological processes, including gene expression, DNA replication and repair, and cell division. Although one such complex, SWI/SNF, has been extensively studied, new discoveries are still being made. Here, we review SWI/SNF biochemistry; highlight recent genomic and proteomic advances; and address the role of SWI/SNF in human diseases, including cancer and viral infections. These studies have greatly increased our understanding of complex nuclear processes.
Cancer; Chromatin; Chromatin Immunoprecipitation (ChIP); Chromatin Remodeling; DNA Sequencing; HIV-1; Mass Spectrometry (MS); Transcriptional Regulation; Viral Transcription; SWI/SNF
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5′ ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. Taken together the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.
Genetic information and programming are not entirely contained in DNA sequence but are also governed by chromatin structure. Gaining a greater understanding of chromatin remodeling complexes can bridge gaps between processes in the genome and the epigenome and can offer insights into diseases such as cancer. We identified targets of the chromatin remodeling complex, SWI/SNF, on a genome-wide scale using ChIP-Seq. We also identify proteins that co-purify with its various components via immunoprecipitation combined with mass spectrometry. By integrating these newly-identified regions with a combination of novel and published data sources, we identify pathways and cellular compartments in which SWI/SNF plays a major role as well as discern general characteristics of SWI/SNF target sites. Our parallel evaluations of multiple SWI/SNF factors indicate that these subunits are found in highly dynamic and combinatorial assemblies. Our study presents the first genome-wide and unified view of multiple SWI/SNF components and also provides a valuable resource to the scientific community as an important data source to be integrated with future genomic and epigenomic studies.
Chromatin immunoprecipitation followed by tag sequencing (ChIP-Seq) using high-throughput next-generation instrumentation is replacing ChIP-chip for mapping of sites of transcription-factor binding and chromatin modification. To develop a scoring approach for this new technique, we produce two deeply sequenced datasets for human RNA polymerase II and STAT1 with matching input-DNA controls. In these, we observe that signal peaks corresponding to sites of potential binding are strongly correlated with peaks in the control, likely revealing features of open chromatin. Based on these observations, we develop a two-pass approach for scoring ChIP-Seq relative to controls. The first pass identifies putative binding sites and compensates for genomic variation in the mappability of sequences. The second pass filters sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Using our scoring we investigate optimal experimental design – i.e. depth of sequencing and value of replicas (showing marginal information gain beyond two).
Acinetobacter baumannii is a common pathogen whose recent resistance to drugs has emerged as a major health problem. Ethanol has been found to increase the virulence of A. baumannii in Dictyostelium discoideum and Caenorhabditis elegans models of infection. To better understand the causes of this effect, we examined the transcriptional profile of A. baumannii grown in the presence or absence of ethanol using RNA-Seq. Using the Illumina/Solexa platform, a total of 43,453,960 reads (35 nt) were obtained, of which 3,596,474 mapped uniquely to the genome. Our analysis revealed that ethanol induces the expression of 49 genes that belong to different functional categories. A strong induction was observed for genes encoding metabolic enzymes, indicating that ethanol is efficiently assimilated. In addition, we detected the induction of genes encoding stress proteins, including upsA, hsp90, groEL and lon as well as permeases, efflux pumps and a secreted phospholipase C. In stationary phase, ethanol strongly induced several genes involved with iron assimilation and a high-affinity phosphate transport system, indicating that A. baumannii makes a better use of the iron and phosphate resources in the medium when ethanol is used as a carbon source. To evaluate the role of phospholipase C (Plc1) in virulence, we generated and analyzed a deletion mutant for plc1. This strain exhibits a modest, but reproducible, reduction in the cytotoxic effect caused by A. baumannii on epithelial cells, suggesting that phospholipase C is important for virulence. Overall, our results indicate the power of applying RNA-Seq to identify key modulators of bacterial pathogenesis. We suggest that the effect of ethanol on the virulence of A. baumannii is multifactorial and includes a general stress response and other specific components such as phospholipase C.
Acinetobacter baumannii has recently emerged as a frequent opportunistic pathogen. In the presence of ethanol A. baumannii increases its pathogenicity towards Dictyostelium discoideum and Caenorhabditis elegans, and community-acquired infections of A. baumannii are associated with alcoholism. Ethanol negatively affects both epithelial cells and alters the bacterial physiology. To explore the underlying basis for the increased virulence of A. baumannii in the presence of ethanol we examined the transcriptional profile of this bacterium using the novel methodology known as RNA-Seq. We show that ethanol induces the expression of a phospholipase C, which contributes to A. baumannii cytotoxicity. We also show that many proteins related to stress were induced and that ethanol is efficiently assimilated as a carbon source leading to induction in stationary phase of two different Fe uptake systems and a phosphate transport system. Interestingly, a previous study showed that a mutant in the high-affinity phosphate uptake system was avirulent. Our work contributes to the understanding of A. baumannii pathogenesis and provides a powerful approach that can be extended to other pathogenic bacteria.
We assess the role of intrinsic histone-DNA interactions by mapping nucleosomes assembled in vitro on genomic DNA. Nucleosomes strongly prefer yeast DNA over E. coli DNA, indicating that the yeast genome evolved to favor nucleosome formation. Many yeast promoter and terminator regions intrinsically disfavor nucleosome formation, and nucleosomes assembled in vitro display strong rotational positioning. Nucleosome arrays generated by the ACF assembly factor display fewer nucleosome-free regions, reduced rotational positioning, and less translational positioning than obtained by intrinsic histone-DNA interactions. Importantly, in vitro assembled nucleosomes display only a limited preference for specific translational positions and do not show the pattern observed in vivo. Our results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.
Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA PolII) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs.
We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for PolII. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously.
We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.
RACE sequencing of ENCODE regions shows that much of the human genome is represented in poly(A)+ RNA.
Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced.
We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins.
We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.
The cyclic AMP-responsive element-binding protein (CREB) is an important transcription factor that can be activated by hormonal stimulation and regulates neuronal function and development. An unbiased, global analysis of where CREB binds has not been performed. We have mapped for the first time the binding distribution of CREB along an entire human chromosome. Chromatin immunoprecipitation of CREB-associated DNA and subsequent hybridization of the associated DNA to a genomic DNA microarray containing all of the nonrepetitive DNA of human chromosome 22 revealed 215 binding sites corresponding to 192 different loci and 100 annotated potential gene targets. We found binding near or within many genes involved in signal transduction and neuronal function. We also found that only a small fraction of CREB binding sites lay near well-defined 5′ ends of genes; the majority of sites were found elsewhere, including introns and unannotated regions. Several of the latter lay near novel unannotated transcriptionally active regions. Few CREB targets were found near full-length cyclic AMP response element sites; the majority contained shorter versions or close matches to this sequence. Several of the CREB targets were altered in their expression by treatment with forskolin; interestingly, both induced and repressed genes were found. Our results provide novel molecular insights into how CREB mediates its functions in humans.
Previously, antibodies were raised against a nuclear envelope-enriched fraction of yeast, and the essential gene NNF1 was cloned by reverse genetics. Here it is shown that the conditional nnf1-17 mutant has decreased stability of a minichromosome in addition to mitotic spindle defects. I have identified the novel essential genes DSN1, DSN3, and NSL1 through genetic interactions with nnf1-17. Dsn3p was found to be equivalent to the kinetochore protein Mtw1p. By indirect immunofluorescence, all four proteins, Nnf1p, Mtw1p, Dsn1p, and Nsl1p, colocalize and are found in the region of the spindle poles. Based on the colocalization of these four proteins, the minichromosome instability and the spindle defects seen in nnf1 mutants, I propose that Nnf1p is part of a new group of proteins necessary for chromosome segregation.