Results 1-14 (14)

1.  Apoptotic Lymphocytes of H. sapiens Lose Nucleosomes in GC-Rich Promoters 
PLoS Computational Biology  2014;10(7):e1003760.
We analyzed two sets of human CD4+ nucleosomal DNA directly sequenced by the Illumina (Solexa) high-throughput sequencing method. The first set has ∼40 M sequences and was produced from normal CD4+ T lymphocytes by micrococcal nuclease. The second set has ∼44 M sequences and was obtained from peripheral blood lymphocytes by apoptotic nucleases. The different nucleosome sets showed similar dinucleotide positioning patterns for AA/TT, GG/CC, and RR/YY (R is purine, Y is pyrimidine), with periods of 10–10.4 bp. Peaks of the GG/CC and AA/TT patterns were shifted by 5 bp from each other. Two types of promoters were identified in H. sapiens: AT-rich and GC-rich. AT-rich promoters in apoptotic cells had +1 nucleosomes shifted 50–60 bp downstream relative to those in normal lymphocytes. GC-rich promoters in apoptotic cells lost 80% of nucleosomes around transcription start sites as well as in total DNA. Nucleosome positioning was predicted by a combination of {AA, TT}, {GG, CC}, {WW, SS} and {RR, YY} patterns. In our study we found that the combinations of {AA, TT} and {GG, CC} provide the best results and successfully mapped 33% of nucleosomes 147 bp long with precision ±15 bp (only 31/147 or 21% is expected).
Author Summary
We analyzed nucleosomal DNA of normal and apoptotic human CD4+ T lymphocytes. The dinucleotide positioning patterns of AA/TT, GG/CC, WW/SS (W is adenine or thymine, S is guanine or cytosine) and RR/YY (R is purine, Y is pyrimidine) in nucleosome sequences are similar in both cell conditions and have a period of 10–10.4 bp. We successfully mapped 33% of nucleosomes with precision ±15 bp by a combination of {AA, TT}, {GG, CC}, {WW, SS} and {RR, YY} patterns. We identified two types of promoters in H. sapiens: AT-rich and GC-rich. AT-rich promoters keep nucleosomes around the transcription start site, whereas GC-rich promoters lose 80% of nucleosomes in the same region during apoptosis.
PMCID: PMC4117428  PMID: 25077608
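The positional dinucleotide patterns and the chance expectation quoted above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `dinucleotide_profile` and `chance_fraction` are hypothetical, the input is assumed to be equal-length aligned fragments, and the 31/147 figure is assumed to come from a ±15 bp window covering 31 of 147 positions.

```python
def dinucleotide_profile(seqs, dinucs):
    """Fraction of aligned sequences carrying one of `dinucs` at each position.

    Assumes all sequences have the same length (e.g. 147 bp fragments).
    """
    length = len(seqs[0])
    counts = [0] * (length - 1)
    for s in seqs:
        for i in range(length - 1):
            if s[i:i + 2] in dinucs:
                counts[i] += 1
    return [c / len(seqs) for c in counts]


def chance_fraction(window, length=147):
    """Share of positions inside a +/- `window` tolerance around one site."""
    return (2 * window + 1) / length


# Toy input, far shorter than real nucleosomal fragments.
profile = dinucleotide_profile(["AATT", "AAGG"], {"AA", "TT"})
expected = chance_fraction(15)  # 31 / 147
```

On real data the profile would be computed over millions of 147 bp fragments and inspected for peaks spaced roughly 10 bp apart.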
2.  Snf2h-mediated chromatin organization and histone H1 dynamics govern cerebellar morphogenesis and neural maturation 
Nature Communications  2014;5:4181.
Chromatin compaction mediates progenitor to post-mitotic cell transitions and modulates gene expression programs, yet the mechanisms are poorly defined. Snf2h and Snf2l are ATP-dependent chromatin remodelling proteins that assemble, reposition and space nucleosomes, and are robustly expressed in the brain. Here we show that mice conditionally inactivated for Snf2h in neural progenitors have reduced levels of histone H1 and H2A variants that compromise chromatin fluidity and transcriptional programs within the developing cerebellum. Disorganized chromatin limits Purkinje and granule neuron progenitor expansion, resulting in abnormal post-natal foliation, while deregulated transcriptional programs contribute to altered neural maturation, motor dysfunction and death. However, mice survive to young adulthood, in part from Snf2l compensation that restores Engrailed-1 expression. Similarly, Purkinje-specific Snf2h ablation affects chromatin ultrastructure and dendritic arborization, but alters cognitive skills rather than motor control. Our studies reveal that Snf2h controls chromatin organization and histone H1 dynamics for the establishment of gene expression programs underlying cerebellar morphogenesis and neural maturation.
The chromatin remodelling proteins Snf2h and Snf2l regulate nucleosome spacing. Here, the authors show that Snf2h ablation impairs chromatin organization of neuronal lineages during mouse embryonic and post-natal cerebellar development.
PMCID: PMC4083431  PMID: 24946904
3.  Optimized Position Weight Matrices in Prediction of Novel Putative Binding Sites for Transcription Factors in the Drosophila melanogaster Genome 
PLoS ONE  2013;8(8):e68712.
Position weight matrices (PWMs) have become a tool of choice for the identification of transcription factor binding sites in DNA sequences. DNA-binding proteins often show degeneracy in their binding requirement and thus the overall binding specificity of many proteins is unknown and remains an active area of research. Although existing PWMs are more reliable predictors than consensus string matching, they generally result in a high number of false positive hits. Our previous study introduced a promising approach to PWM refinement in which known motifs are used to computationally mine putative binding sites directly from aligned promoter regions using composition of similar sites. In the present study, we extended this technique originally tested on single examples of transcription factors (TFs) and showed its capability to optimize PWM performance to predict new binding sites in the fruit fly genome. We propose refined PWMs in mono- and dinucleotide versions similarly computed for a large variety of transcription factors of Drosophila melanogaster. Along with the addition of many auxiliary sites the optimization includes variation of the PWM motif length, the binding sites location on the promoters and the PWM score threshold. To assess the predictive performance of the refined PWMs we compared them to conventional TRANSFAC and JASPAR sources. The results have been verified using performed tests and literature review. Overall, the refined PWMs containing putative sites derived from real promoter content processed using optimized parameters had better general accuracy than conventional PWMs.
PMCID: PMC3735551  PMID: 23936309
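For readers unfamiliar with PWM scanning, the basic idea of scoring sequence windows against a matrix and applying a score threshold can be sketched as follows. This is a generic log-odds illustration under stated assumptions (uniform background of 0.25, a pseudocount of 1, sequences containing only A, C, G, T); `build_pwm` and `scan` are hypothetical names and this is not the refinement procedure described in the abstract.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds mononucleotide PWM from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy TATA-like sites; the threshold of 5.0 is arbitrary.
pwm = build_pwm(["TATAAA", "TATAAA", "TATATA", "TATAAA"])
hits = scan(pwm, "GGCTATAAAGGC", threshold=5.0)
```

Raising the threshold trades sensitivity for fewer false positive hits, which is the trade-off the optimization of motif length, site location and score threshold is meant to tune.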
4.  Identification of cis-regulatory modules in promoters of human genes exploiting mutual positioning of transcription factors 
Nucleic Acids Research  2013;41(19):8822-8841.
In higher organisms, gene regulation is controlled by the interplay of non-random combinations of multiple transcription factors (TFs). Although numerous attempts have been made to identify these combinations, important details, such as mutual positioning of the factors that have an important role in the TF interplay, are still missing. The goal of the present work is in silico mapping of some of such associating factors based on their mutual positioning, using computational screening. We have selected the process of myogenesis as a study case, and we focused on TF combinations involving master myogenic TF Myogenic differentiation (MyoD) with other factors situated at specific distances from it. The results of our work show that some muscle-specific factors occur together with MyoD within the range of ±100 bp in a large number of promoters. We confirm co-occurrence of the MyoD with muscle-specific factors as described in earlier studies. However, we have also found novel relationships of MyoD with other factors not specific for muscle. Additionally, we have observed that MyoD tends to associate with different factors in proximal and distal promoter areas. The major outcome of our study is establishing the genome-wide connection between biological interactions of TFs and close co-occurrence of their binding sites.
PMCID: PMC3799424  PMID: 23913413
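The notion of mutual positioning within ±100 bp can be made concrete with a short Python sketch. It assumes site positions have already been predicted on a single promoter and are given as integer coordinates; the function name `cooccurring_sites` and the example coordinates are hypothetical.

```python
def cooccurring_sites(anchor_positions, partner_positions, max_distance=100):
    """Pair each anchor site (e.g. MyoD) with partner sites within max_distance bp.

    Returns (anchor, partner, offset) tuples; a negative offset means the
    partner site lies upstream of the anchor.
    """
    pairs = []
    for a in anchor_positions:
        for p in partner_positions:
            offset = p - a
            if abs(offset) <= max_distance:
                pairs.append((a, p, offset))
    return pairs


# One anchor site at position 500 and four candidate partner sites.
pairs = cooccurring_sites([500], [380, 450, 600, 601])
```

Collecting the offsets over many promoters gives a distance distribution in which preferred spacings would appear as peaks.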
5.  Human milk metagenome: a functional capacity analysis 
BMC Microbiology  2013;13:116.
Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk.
The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes.
Our results further expand the complexity of the human milk metagenome and reinforce the benefits of human milk ingestion for the microbial colonization of the infant gut and for immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates that more exhaustive analyses of the functionality of the human milk metagenome are warranted.
PMCID: PMC3679945  PMID: 23705844
Human milk; Microbiome; Metagenome; Bacteria; Illumina; DNA; Open reading frames; Immune-modulatory motifs; Infant feces
6.  Optimizing the GATA-3 position weight matrix to improve the identification of novel binding sites 
BMC Genomics  2012;13:416.
Identifying binding sites for transcription factors is a key component of gene regulatory network analysis. This is often done using position-weight matrices (PWMs). Because of the importance of in silico mapping of tentative binding sites, we previously developed an approach for PWM optimization that substantially improves the accuracy of such mapping.
The present work implements the optimization algorithm applied to the existing PWM for GATA-3 transcription factor and builds a new di-nucleotide PWM. The existing available PWM is based on experimental data adopted from Jaspar. The optimized PWM substantially improves the sensitivity and specificity of the TF mapping compared to the conventional applications. The refined PWM also facilitates in silico identification of novel binding sites that are supported by experimental data. We also describe uncommon positioning of binding motifs for several T-cell lineage specific factors in human promoters.
Our proposed di-nucleotide PWM approach outperforms the conventional mono-nucleotide PWM approach with respect to GATA-3. Therefore our new di-nucleotide PWM provides new insight into plausible transcriptional regulatory interactions in human promoters.
PMCID: PMC3481455  PMID: 22913572
Transcription factor; Binding sites; GATA-3; Human promoter; Position weight matrix; Optimization
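Sensitivity and specificity, the two measures used above to compare PWMs, can be computed from sets of predicted and reference sites. The following Python sketch assumes that reference positive and negative sites are available as hashable identifiers; the function name and the toy identifiers are hypothetical.

```python
def sensitivity_specificity(predicted, positives, negatives):
    """Return (sensitivity, specificity) for a set of predicted sites.

    `positives` are known true sites and `negatives` are known non-sites.
    """
    predicted = set(predicted)
    positives = set(positives)
    negatives = set(negatives)
    tp = len(predicted & positives)   # true sites that were found
    fn = len(positives - predicted)   # true sites that were missed
    fp = len(predicted & negatives)   # non-sites wrongly reported
    tn = len(negatives - predicted)   # non-sites correctly rejected
    return tp / (tp + fn), tn / (tn + fp)


sens, spec = sensitivity_specificity(
    predicted={1, 2, 5},
    positives={1, 2, 3, 4},
    negatives={5, 6, 7, 8},
)
```

In this toy case two of the four true sites are recovered and one of the four non-sites is reported.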
7.  Discovery, optimization and validation of an optimal DNA-binding sequence for the Six1 homeodomain transcription factor 
Nucleic Acids Research  2012;40(17):8227-8239.
The Six1 transcription factor is a homeodomain protein involved in controlling gene expression during embryonic development. Six1 establishes gene expression profiles that enable skeletal myogenesis and nephrogenesis, among others. While several homeodomain factors have been extensively characterized with regards to their DNA-binding properties, relatively little is known of the properties of Six1. We have used the genomic binding profile of Six1 during the myogenic differentiation of myoblasts to obtain a better understanding of its preferences for recognizing certain DNA sequences. DNA sequence analyses on our genomic binding dataset, combined with biochemical characterization using binding assays, reveal that Six1 has a much broader DNA-binding sequence spectrum than had been previously determined. Moreover, using a position weight matrix optimization algorithm, we generated a highly sensitive and specific matrix that can be used to predict novel Six1-binding sites with highest accuracy. Furthermore, our results support the idea of a mode of DNA recognition by this factor where Six1 itself is sufficient for sequence discrimination, and where Six1 domains outside of its homeodomain contribute to binding site selection. Together, our results provide new light on the properties of this important transcription factor, and will enable more accurate modeling of Six1 function in bioinformatic studies.
PMCID: PMC3458543  PMID: 22730291
8.  Nucleosome organization in the Drosophila genome 
Nature  2008;453(7193):358-362.
Comparative genomics of nucleosome positions provides a powerful means for understanding how the organization of chromatin and the transcription machinery co-evolve. Here we produce a high resolution reference map of H2A.Z and bulk nucleosome locations across the genome of the fly D. melanogaster, and compare it to that from the yeast S. cerevisiae. Like Saccharomyces, Drosophila nucleosomes are organized around active transcription start sites in a canonical −1, NFR (nucleosome-free region), +1 arrangement. However, Drosophila does not incorporate H2A.Z into the −1 nucleosome and does not bury its transcriptional start site in the +1 nucleosome. At thousands of genes, RNA polymerase II engages the +1 nucleosome and pauses. How the transcription initiation machinery contends with the +1 nucleosome appears to be fundamentally different between lower and higher eukaryotes.
PMCID: PMC2735122  PMID: 18408708
9.  Restriction Landmark Genomic Scanning (RLGS) spot identification by second generation virtual RLGS in multiple genomes with multiple enzyme combinations 
BMC Genomics  2007;8:446.
Restriction landmark genomic scanning (RLGS) is one of the most successfully applied methods for the identification of aberrant CpG island hypermethylation in cancer, as well as the identification of tissue specific methylation of CpG islands. However, a limitation to the utility of this method has been the ability to assign specific genomic sequences to RLGS spots, a process commonly referred to as "RLGS spot cloning."
We report the development of a virtual RLGS method (vRLGS) that allows for RLGS spot identification in any sequenced genome and with any enzyme combination. We report significant improvements in predicting DNA fragment migration patterns by incorporating sequence information into the migration models, and demonstrate a median Euclidean distance between actual and predicted spot migration of 0.18 centimeters for the most complex human RLGS pattern. We report the confirmed identification of 795 human and 530 mouse RLGS spots for the most commonly used enzyme combinations. We also developed a method to filter the virtual spots to reduce the number of extra spots seen on a virtual profile for both the mouse and human genomes. We demonstrate use of this filter to simplify spot cloning and to assist in the identification of spots exhibiting tissue-specific methylation.
The new vRLGS system reported here is highly robust for the identification of novel RLGS spots. The migration models developed are not specific to the genome being studied or the enzyme combination being used, making this tool broadly applicable. The identification of hundreds of mouse and human RLGS spot loci confirms the strong bias of RLGS studies to focus on CpG islands and provides a valuable resource to rapidly study their methylation.
PMCID: PMC2235865  PMID: 18053125
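The accuracy measure quoted above, a median Euclidean distance between actual and predicted spot positions, can be computed as in the following Python sketch. It assumes spots are given as matched (x, y) coordinate pairs in centimeters; the function name and the toy coordinates are hypothetical and do not reproduce the published data.

```python
import math
import statistics


def median_migration_error(actual, predicted):
    """Median Euclidean distance between matched actual and predicted spots."""
    distances = [math.dist(a, p) for a, p in zip(actual, predicted)]
    return statistics.median(distances)


actual = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
predicted = [(3.0, 4.0), (1.0, 1.0), (2.0, 3.0)]
error = median_migration_error(actual, predicted)
```

The median is less affected than the mean by a few badly predicted spots, such as the first pair in this toy example.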
10.  Global dynamics of newly constructed oligonucleosomes of conventional and variant H2A.Z histone 
Complexes of nucleosomes, which often occur in the gene promoter areas, are one of the fundamental levels of chromatin organization and thus are important for transcription regulation. Investigating the dynamic structure of a single nucleosome as well as nucleosome complexes is important for understanding transcription within chromatin. In a previous work, we highlighted the influence of histone variants on the functional dynamics of a single nucleosome using normal mode analysis developed by Bahar et al. The present work further analyzes the dynamics of nucleosome complexes (nucleosome oligomers or oligonucleosomes) such as dimer, trimer and tetramer (beads on a string model) with conventional core histones as well as with the H2A.Z histone variant using normal mode analysis.
The global dynamics of oligonucleosomes reveal larger amplitude of motion within the nucleosomes that contain the H2A.Z variant with in-planar and out-of-planar fluctuations as the common mode of relaxation. The docking region of H2A.Z and the L1:L1 interactions between H2A.Z monomers of nucleosome (that are responsible for the highly stable nucleosome containing variant H2A.Z-histone) are highly dynamic throughout the first two dynamic modes.
Dissection of the dynamics of oligonucleosomes discloses in-plane as well as out-of-plane fluctuations as the common mode of relaxation throughout the global motions. The dynamics of individual nucleosomes and the combination of the relaxation mechanisms expressed by the individual nucleosome are quite interesting and highly dependent on the number of nucleosome fragments present in the complexes. Distortions generated by the non-planar dynamics influence the DNA conformation, and hence the histone-DNA interactions significantly alter the dynamics of the DNA. The variant H2A.Z histone is a major source of weaker intra- and inter-molecular correlations resulting in more disordered motions.
PMCID: PMC2216022  PMID: 17996059
11.  The features of Drosophila core promoters revealed by statistical analysis 
BMC Genomics  2006;7:161.
Experimental investigation of transcription is still a very labor- and time-consuming process. Only a few transcription initiation scenarios have been studied in detail. The mechanism of interaction between basal machinery and promoter, in particular core promoter elements, is not known for the majority of identified promoters. In this study, we reveal various transcription initiation mechanisms by statistical analysis of 3393 nonredundant Drosophila promoters.
Using Drosophila-specific position-weight matrices, we identified promoters containing TATA box, Initiator, Downstream Promoter Element (DPE), and Motif Ten Element (MTE), as well as core elements discovered in Human (TFIIB Recognition Element (BRE) and Downstream Core Element (DCE)). Promoters utilizing known synergetic combinations of two core elements (TATA_Inr, Inr_MTE, Inr_DPE, and DPE_MTE) were identified. We also establish the existence of promoters with potentially novel synergetic combinations: TATA_DPE and TATA_MTE. Our analysis revealed several motifs with the features of promoter elements, including possible novel core promoter element(s). Comparison of Human and Drosophila showed consistent percentages of promoters with TATA, Inr, DPE, and synergetic combinations thereof, as well as most of the same functional and mutual positions of the core elements. No statistical evidence of MTE utilization in Human was found. Distinct nucleosome positioning in particular promoter classes was revealed.
We present lists of promoters that potentially utilize the aforementioned elements/combinations. The number of these promoters is two orders of magnitude larger than the number of promoters in which transcription initiation was experimentally studied. The sequences are ready to be experimentally tested or used for further statistical analysis. The developed approach may be utilized for other species.
PMCID: PMC1538597  PMID: 16790048
13.  Functional Characterization of Core Promoter Elements: the Downstream Core Element Is Recognized by TAF1 
Molecular and Cellular Biology  2005;25(21):9674-9686.
Downstream elements are a newly appreciated class of core promoter elements of RNA polymerase II-transcribed genes. The downstream core element (DCE) was discovered in the human β-globin promoter, and its sequence composition is distinct from that of the downstream promoter element (DPE). We show here that the DCE is a bona fide core promoter element present in a large number of promoters and with high incidence in promoters containing a TATA motif. Database analysis indicates that the DCE is found in diverse promoters, supporting its functional relevance in a variety of promoter contexts. The DCE consists of three subelements, and DCE function is recapitulated in a TFIID-dependent manner. Subelement 3 can function independently of the other two and shows a TFIID requirement as well. UV photo-cross-linking results demonstrate that TAF1/TAFII250 interacts with the DCE subelement DNA in a sequence-dependent manner. These data show that downstream elements consist of at least two types, those of the DPE class and those of the DCE class; they function via different DNA sequences and interact with different transcription activation factors. Finally, these data argue that TFIID is, in fact, a core promoter recognition complex.
PMCID: PMC1265815  PMID: 16227614
14.  Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites 
Nucleic Acids Research  2005;33(7):2290-2301.
Position-weight matrices (PWMs) are broadly used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden–Bucher approach, that improves the PWM. We applied the proposed technique to the PWM of the GC-box, the binding site for Sp1. The comparison of old and new PWMs shows that the latter increases both sensitivity and specificity. The statistical parameters of GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. The majority of commonly used PWMs are 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices, and preliminary results show that such matrices provide better results than 4-row matrices.
PMCID: PMC1084321  PMID: 15849315
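The difference between a 4-row and a 16-row matrix is easiest to see in code. The Python sketch below builds a 16-row dinucleotide count matrix from aligned sites; it illustrates only the data structure, not the modified Staden–Bucher algorithm, and the function name and toy GC-box-like sites are hypothetical.

```python
from itertools import product

# All 16 dinucleotides, AA through TT, one matrix row each.
DINUCS = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_counts(sites):
    """16-row count matrix from equal-length aligned sites.

    A site of length n yields n - 1 overlapping dinucleotide columns.
    """
    width = len(sites[0]) - 1
    matrix = {d: [0] * width for d in DINUCS}
    for site in sites:
        for i in range(width):
            matrix[site[i:i + 2]][i] += 1
    return matrix


matrix = dinucleotide_counts(["GGGCGG", "GGGCGG", "GGGTGG"])
```

Because each column records adjacent base pairs, the matrix can capture dependencies between neighboring positions that a mononucleotide matrix treats as independent.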
