Related Articles

1.  PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny 
PLoS Computational Biology  2005;1(7):e67.
A central problem in the bioinformatics of gene regulation is to find the binding sites for regulatory proteins. One of the most promising approaches toward identifying these short and fuzzy sequence patterns is the comparative analysis of orthologous intergenic regions of related species. This analysis is complicated by various factors. First, one needs to take the phylogenetic relationship between the species into account in order to distinguish conservation that is due to the occurrence of functional sites from spurious conservation that is due to evolutionary proximity. Second, one has to deal with the complexities of multiple alignments of orthologous intergenic regions, and one has to consider the possibility that functional sites may occur outside of conserved segments. Here we present a new motif sampling algorithm, PhyloGibbs, that runs on arbitrary collections of multiple local sequence alignments of orthologous sequences. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors (TFs) can be assigned to the multiple sequence alignments. These binding site configurations are scored by a Bayesian probabilistic model that treats aligned sequences by a model for the evolution of binding sites and “background” intergenic DNA. This model takes the phylogenetic relationship between the species in the alignment explicitly into account. The algorithm uses simulated annealing and Monte Carlo Markov-chain sampling to rigorously assign posterior probabilities to all the binding sites that it reports. In tests on synthetic data and real data from five Saccharomyces species our algorithm performs significantly better than four other motif-finding algorithms, including algorithms that also take phylogeny into account. Our results also show that, in contrast to the other algorithms, PhyloGibbs can make realistic estimates of the reliability of its predictions. 
Our tests suggest that, running on the five-species multiple alignment of a single gene's upstream region, PhyloGibbs on average recovers over 50% of all binding sites in S. cerevisiae at a specificity of about 50%, and 33% of all binding sites at a specificity of about 85%. We also tested PhyloGibbs on collections of multiple alignments of intergenic regions that were recently annotated, based on ChIP-on-chip data, to contain binding sites for the same TF. We compared PhyloGibbs's results with the previous analysis of these data using six other motif-finding algorithms. For 16 of 21 TFs for which all other motif-finding methods failed to find a significant motif, PhyloGibbs did recover a motif that matches the literature consensus. In 11 cases where there was disagreement in the results we compiled lists of known target genes from the literature, and found that running PhyloGibbs on their regulatory regions yielded a binding motif matching the literature consensus in all but one of the cases. Interestingly, these literature gene lists had little overlap with the targets annotated based on the ChIP-on-chip data. The PhyloGibbs code can be downloaded from http://www.biozentrum.unibas.ch/~nimwegen/cgi-bin/phylogibbs.cgi or http://www.imsc.res.in/~rsidd/phylogibbs. The full set of predicted sites from our tests on yeast is available at http://www.swissregulon.unibas.ch.
Synopsis
Computational discovery of regulatory sites in intergenic DNA is one of the central problems in bioinformatics. Up until recently motif finders would typically take one of the following two general approaches. Given a known set of co-regulated genes, one searches their promoter regions for significantly overrepresented sequence motifs. Alternatively, in a “phylogenetic footprinting” approach one searches multiple alignments of orthologous intergenic regions for short segments that are significantly more conserved than expected based on the phylogeny of the species.
In this work the authors present an algorithm, PhyloGibbs, that combines these two approaches into one integrated Bayesian framework. The algorithm searches over all ways in which an arbitrary number of binding sites for an arbitrary number of transcription factors can be assigned to arbitrary collections of multiple sequence alignments while taking into account the phylogenetic relations between the sequences.
The authors perform a number of tests on synthetic data and real data from Saccharomyces genomes in which PhyloGibbs significantly outperforms other existing methods. Finally, a novel anneal-and-track strategy allows PhyloGibbs to make accurate estimates of the reliability of its predictions.
doi:10.1371/journal.pcbi.0010067
PMCID: PMC1309704  PMID: 16477324
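The core sampling idea behind a Gibbs motif finder can be sketched in a few lines. The following is a minimal illustration only, assuming one site per sequence, a single motif of fixed width, and a uniform background; it leaves out everything that distinguishes PhyloGibbs (phylogeny, alignments, multiple TFs, annealing and tracking). All function and parameter names are hypothetical.

```python
import math
import random

BASES = "ACGT"

def column_probs(sites, width, pseudocount=1.0):
    """Per-column base probabilities estimated from aligned sites."""
    probs = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        probs.append({b: counts[b] / total for b in BASES})
    return probs

def gibbs_motif_sampler(seqs, width, sweeps=200, seed=0):
    """Sample one site start per sequence; returns the list of start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the held-out one.
            others = [s[p:p + width]
                      for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
            probs = column_probs(others, width)
            # Likelihood ratio of the motif model vs. a uniform background
            # (0.25 per base, an assumption of this sketch).
            weights = []
            for p in range(len(seq) - width + 1):
                log_ratio = sum(math.log(probs[j][seq[p + j]] / 0.25)
                                for j in range(width))
                weights.append(math.exp(log_ratio))
            # Draw the new start position in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `gibbs_motif_sampler(seqs, 6)` on a list of upper-case DNA strings returns one start index per sequence. Because the procedure is stochastic, repeated runs with different seeds are a simple way to see how stable a reported site is.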
2.  Combinatorial depletion analysis to assemble the network architecture of the SAGA and ADA chromatin remodeling complexes 
A combinatorial depletion strategy is combined with biochemistry, quantitative proteomics and computational approaches to elucidate the structure of the SAGA/ADA complexes. The analysis reveals five connected functional modules capable of independent assembly.
A combinatorial approach of gene depletions with multiple bait proteins coupled with biochemical, proteomic and computational approaches can experimentally determine modules of stable multi-protein complexes. SAGA is a 19-subunit complex consisting of five connected modules with Spt20 being particularly important for the assembly of the intact complex. One of the modules, the HAT/Core module, is also shared with the distinct six-subunit complex ADA. Architectural models of large multi-protein complexes can be assembled using our approach, which is an alternative method to generate novel insight into the organization and architecture of multi-protein complexes.
Determining the architectures of protein complexes improves our understanding of protein cellular functions. In order to efficiently characterize the subunits of protein complexes assembled in vivo, affinity purification followed by proteomics mass spectrometry (APMS) strategies have been devised. Partial or whole protein complexes are first biochemically isolated using tagged components of the complex, followed by an identification of all co-purified proteins using mass spectrometry. However, those approaches are insufficient to provide information about the spatial arrangement and the interrelationship of the proteins of the respective complex.
In this study, we developed and applied a novel method utilizing biochemistry, quantitative proteomics and computational approaches in order to characterize the organization of proteins in a complex. The key of our method is the systematic purification of several tagged components of the protein complex in multiple genetic deletion strains, which serve to compromise the integrity of the complex. Using a series of computational methods, these raw quantitative values are next interpreted in order to determine the modular organization of the complex as well as the interrelationships between its subunits, which in turn can be used to predict a macromolecular model of the complex.
We tested this approach to obtain novel insights into the architecture of multi-protein complexes on the Saccharomyces cerevisiae Spt–Ada–Gcn5 histone acetyltransferase (HAT) (SAGA) and ADA complexes, which are conserved complexes involved in chromatin remodeling (Koutelou et al, 2010). Regular quantitative APMS strategies in wild-type backgrounds were not sufficient to separate tight protein complexes like SAGA/ADA into their distinct modules. However, after perturbing the system using genetic deletions of several subunits located in different topological parts of SAGA, hierarchical cluster analysis performed on 34 purifications (generated using 10 different TAP-tagged baits) resulted in a dissociation of the Gcn5 HAT complexes into five modules: (1) the SA_TAF module, (2) the SA_SPT module, (3) the DUB module, (4) the HAT/Core module and (5) the ADA module (Figure 2A and B).
The approach of purifying a protein in a deletion strain furthermore provides valuable information about the influence of the deleted subunit on the association and interdependency of the bait and the remaining preys. In order to quantify these associations, we calculated a probability between every prey and bait in the deletion strain purifications based on Bayes' theorem (Sardiu et al, 2008). In conjunction with preexisting interaction data obtained from yeast two-hybrid and genetic complementation assays, we finally used these probabilities to predict a low-resolution model for the architecture of the SAGA and ADA complexes (Figure 4).
This novel approach revealed that the SAGA/ADA complexes are composed of five distinct functional modules, of which two were not previously described (SA_SPT and SA_TAF). These modules, which are responsible for different functions of the SAGA complex, are capable of assembling independently from the remaining modules of the complex. Furthermore, we identified a novel subunit of the ADA complex, termed Ahc2, and characterized Sgf29 as an ADA family protein present in all Gcn5 HAT complexes. Compared with other structural studies, which mapped 9 of the 19 known SAGA subunits using single EM reconstruction (Wu et al, 2004) or resolved the structure of the 4 subunits of the DUB module using X-ray crystallography (Kohler et al, 2010; Samara et al, 2010), our approach is not limited to a maximum number of complex subunits. Consequently, we were able to construct a macromolecular model consisting of all 21 SAGA/ADA subunits, which bridges the gap between the previous limited EM analysis and focused X-ray crystallography analysis.
Despite the availability of several large-scale proteomics studies aiming to identify protein interactions on a global scale, little is known about how proteins interact and are organized within macromolecular complexes. Here, we describe a technique that consists of a combination of biochemistry approaches, quantitative proteomics and computational methods using wild-type and deletion strains to investigate the organization of proteins within macromolecular protein complexes. We applied this technique to determine the organization of two well-studied complexes, Spt–Ada–Gcn5 histone acetyltransferase (SAGA) and ADA, for which no comprehensive high-resolution structures exist. This approach revealed that SAGA/ADA is composed of five distinct functional modules, which can persist separately. Furthermore, we identified a novel subunit of the ADA complex, termed Ahc2, and characterized Sgf29 as an ADA family protein present in all Gcn5 histone acetyltransferase complexes. Finally, we propose a model for the architecture of the SAGA and ADA complexes, which predicts novel functional associations within the SAGA complex and provides mechanistic insights into phenotypical observations in SAGA mutants.
doi:10.1038/msb.2011.40
PMCID: PMC3159981  PMID: 21734642
ADA; architecture; protein interaction network; quantitative proteomics; SAGA
3.  Metamotifs - a generative model for building families of nucleotide position weight matrices 
BMC Bioinformatics  2010;11:348.
Background
Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data and sensitively detecting regulatory motifs similar to previously known ones from novel sequence.
Results
We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input PWM motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain.
Conclusions
We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity of motif inference. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with the motif. The metamotif based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
doi:10.1186/1471-2105-11-348
PMCID: PMC2906491  PMID: 20579334
4.  MYBS: a comprehensive web server for mining transcription factor binding sites in yeast 
Nucleic Acids Research  2007;35(Web Server issue):W221-W226.
Correct interactions between transcription factors (TFs) and their binding sites (TFBSs) are of central importance to gene regulation. Recently developed chromatin-immunoprecipitation DNA chip (ChIP-chip) techniques and the phylogenetic footprinting method provide ways to identify TFBSs with high precision. In this study, we constructed a user-friendly interactive platform for dynamic binding site mapping using ChIP-chip data and phylogenetic footprinting as two filters. MYBS (Mining Yeast Binding Sites) is a comprehensive web server that integrates an array of both experimentally verified and predicted position weight matrixes (PWMs) from eleven databases, including 481 binding motif consensus sequences and 71 PWMs that correspond to 183 TFs. MYBS users can search within this platform for motif occurrences (possible binding sites) in the promoters of genes of interest via simple motif or gene queries in conjunction with the above two filters. In addition, MYBS enables users to visualize in parallel the potential regulators for a given set of genes, a feature useful for finding potential regulatory associations between TFs. MYBS also allows users to identify target gene sets of each TF pair, which could be used as a starting point for further explorations of TF combinatorial regulation. MYBS is available at http://cg1.iis.sinica.edu.tw/~mybs/.
doi:10.1093/nar/gkm379
PMCID: PMC1933147  PMID: 17537814
5.  Alteration of the carboxyl-terminal domain of Ada protein influences its inducibility, specificity, and strength as a transcriptional activator. 
Journal of Bacteriology  1988;170(11):5263-5271.
The ada gene of Escherichia coli K-12 encodes the regulatory protein for the adaptive response to alkylating agents. A set of plasmids carrying ordered deletions from the 3' end of the ada gene were isolated and characterized. These ada deletions encode fusion proteins that derive their amino termini from ada and their carboxyl termini from the downstream vector sequence that occurs before an in-frame stop codon. Several of these ada deletions encode Ada derivatives that constitutively activate ada transcription to very high levels. A second class of ada deletions encode Ada derivatives that are dominant inhibitors of the inducible transcription of ada but are inducible activators of alkA transcription. In addition, we found that two Ada derivatives containing the same ada sequences but fused to different vector-derived tails have strikingly different properties. One Ada derivative constitutively activates both ada and alkA expression to very high levels. In contrast, the other Ada derivative is an inducible activator of ada expression, like the wild-type Ada protein, but is not an inducible activator of alkA transcription. Our data suggest that the carboxyl terminus of the Ada protein plays a key role in modulating the ability of the Ada protein to function as a transcriptional activator.
PMCID: PMC211600  PMID: 3141384
6.  Increasing Coverage of Transcription Factor Position Weight Matrices through Domain-level Homology 
PLoS ONE  2012;7(8):e42779.
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of represented species with PWMs over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.
doi:10.1371/journal.pone.0042779
PMCID: PMC3428306  PMID: 22952610
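Since several of the abstracts in this list rely on PWMs, a small worked example of how such a matrix is built and used to scan a sequence may help. This is a minimal sketch, assuming a log2-odds matrix estimated from aligned sites with a pseudocount and a uniform background; it is not the pipeline of any article listed here, and the function names are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds matrix (one dict per column) from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        # Log-odds of the column frequency against the assumed uniform background.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def scan(pwm, seq):
    """Return (start, score) for every window; a score is the sum of column log-odds."""
    width = len(pwm)
    return [(p, sum(pwm[j][seq[p + j]] for j in range(width)))
            for p in range(len(seq) - width + 1)]

def best_hit(pwm, seq):
    """Highest-scoring window as a (start, score) pair."""
    return max(scan(pwm, seq), key=lambda hit: hit[1])
```

Summing per-column scores is exactly the independence assumption that PWMs make: each position contributes to the score without regard to its neighbours.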
7.  Identification of human proteins functionally conserved with the yeast putative adaptors ADA2 and GCN5. 
Molecular and Cellular Biology  1996;16(2):593-602.
Transcriptional adaptor proteins are required for full function of higher eukaryotic acidic activators in the yeast Saccharomyces cerevisiae, suggesting that this pathway of activation is evolutionarily conserved. Consistent with this view, we have identified possible human homologs of yeast ADA2 (yADA2) and yeast GCN5 (yGCN5), components of a putative adaptor complex. While there is overall sequence similarity between the yeast and human proteins, perhaps more significant is conservation of key sequence features with other known adaptors. We show several functional similarities between the human and yeast adaptors. First, as shown for yADA2 and yGCN5, human ADA2 (hADA2) and human GCN5 (hGCN5) interacted in vivo in a yeast two-hybrid assay. Moreover, hGCN5 interacted with yADA2 in this assay, suggesting that the human proteins form similar complexes. Second, both yADA2 and hADA2 contain cryptic activation domains. Third, hGCN5 and yGCN5 had similar stabilizing effects on yADA2 in vivo. Furthermore, the region of yADA2 that interacted with yGCN5 mapped to the amino terminus of yADA2, which is highly conserved in hADA2. Most striking is the behavior of the human proteins in human cells. First, GAL4-hADA2 activated transcription in HeLa cells, and second, either hADA2 or hGCN5 augmented GAL4-VP16 activation. These data indicated that the human proteins correspond to functional homologs of the yeast adaptors, suggesting that these cofactors play a key role in transcriptional activation.
PMCID: PMC231038  PMID: 8552087
8.  PiDNA: predicting protein–DNA interactions with structural models 
Nucleic Acids Research  2013;41(Web Server issue):W523-W530.
Predicting binding sites of a transcription factor in the genome is an important, but challenging, issue in studying gene regulation. In the past decade, a large number of protein–DNA co-crystallized structures available in the Protein Data Bank have facilitated the understanding of interacting mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to deliver position weight matrices (PWMs) that are consistent with the experimental data. To further use the available structural models, the proposed Web server, PiDNA, aims at first constructing reliable PWMs by applying an atomic-level knowledge-based scoring function on numerous in silico mutated complex structures, and then using the PWM constructed by the structure models with small energy changes to predict the interaction between proteins and DNA sequences. With PiDNA, the users can easily predict the relative preference of all the DNA sequences with limited mutations from the native sequence co-crystallized in the model in a single run. More predictions on sequences with unlimited mutations can be realized by additional requests or file uploading. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed by the favourable mutated structures, and (iii) any mutated protein–DNA complex structure models specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or literature. Second, the prediction accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing the ranked lists against in vitro experiments from protein-binding microarrays. Finally, PiDNA is shown to be able to select the experimentally validated binding sites from 10 000 random sites with high accuracy. 
With PiDNA, the users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. As well, it is expected that PiDNA can be incorporated with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
doi:10.1093/nar/gkt388
PMCID: PMC3692134  PMID: 23703214
9.  Discovering Motifs in Ranked Lists of DNA Sequences 
PLoS Computational Biology  2007;3(3):e39.
Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
Author Summary
A computational problem with many applications in molecular biology is to identify short DNA sequence patterns (motifs) that are significantly overrepresented in a target set of genomic sequences relative to a background set of genomic sequences. One example is a target set that contains DNA sequences to which a specific transcription factor protein was experimentally measured as bound while the background set contains sequences to which the same transcription factor was not bound. Overrepresented sequence motifs in the target set may represent a subsequence that is molecularly recognized by the transcription factor. An inherent limitation of the above formulation of the problem lies in the fact that in many cases data cannot be clearly partitioned into distinct target and background sets in a biologically justified manner. We describe a statistical framework for discovering motifs in a list of genomic sequences that are ranked according to a biological parameter or measurement (e.g., transcription factor to sequence binding measurements). Our approach circumvents the need to partition the data into target and background sets using arbitrarily set parameters. The framework is implemented in a software tool called DRIM. The application of DRIM led to the identification of novel putative transcription factor binding sites in yeast and to the discovery of previously unknown motifs in CpG methylation regions in human cancer cell lines.
doi:10.1371/journal.pcbi.0030039
PMCID: PMC1829477  PMID: 17381235
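The idea of scoring motif enrichment in a ranked list without fixing a target/background split in advance can be illustrated with a minimum hypergeometric score: the hypergeometric tail probability is computed at every cutoff and the smallest value is kept. The abstract above does not spell out the statistic, so treat this as a sketch of the general approach with hypothetical names rather than the DRIM implementation.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n of N items without replacement, B of which are hits."""
    total = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return total / comb(N, n)

def min_hypergeom(labels):
    """labels: 0/1 flags in rank order (1 = sequence contains the motif).

    Returns (score, cutoff): the smallest tail probability over all prefixes
    of the ranked list and the prefix length at which it occurs.
    """
    N, B = len(labels), sum(labels)
    best_score, best_cutoff, hits = 1.0, 0, 0
    for n, flag in enumerate(labels, start=1):
        hits += flag
        score = hypergeom_tail(N, B, n, hits)
        if score < best_score:
            best_score, best_cutoff = score, n
    return best_score, best_cutoff
```

Because the minimum is taken over many cutoffs, the returned score is not itself a p-value; it still has to be corrected for the multiple cutoffs tested, which is the kind of issue the abstract raises when it calls for an exact p-value.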
10.  Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles 
PLoS ONE  2011;6(9):e24210.
Most of the position weight matrix (PWM) based bioinformatics methods developed to predict transcription factor binding sites (TFBS) assume each nucleotide in the sequence motif contributes independently to the interaction between protein and DNA sequence, which usually produces many false-positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependent structure and accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between TF and its binding site. The whole tree-structured PWM could be considered as a mixture of different conditional-PWMs. We propose a discriminative approach, called TPD (TPWM based Discriminative Approach), to construct the TPWM from the ChIP-Seq data with a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated with respect to accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach on several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both the simulated and real ChIP-Seq data show that the proposed method starting from an existing PWM has consistently better performance than existing tools in detecting the TFBSs. The improved accuracy is the result of modelling the complete dependent structure of the motifs and better prediction of true positive rate. The findings could lead to better understanding of the mechanisms of TF-DNA interactions.
doi:10.1371/journal.pone.0024210
PMCID: PMC3166302  PMID: 21912677
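Choosing a score cutoff by maximizing the Matthews correlation coefficient, as mentioned in the abstract above, can be sketched as follows. The MCC formula is standard; the exhaustive search over observed scores and the function names are assumptions made for illustration and are not taken from the TPD method.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_cutoff(pos_scores, neg_scores):
    """Pick the cutoff (predict 'bound' when score >= cutoff) that maximizes MCC.

    Returns (cutoff, mcc_value). Candidate cutoffs are the observed scores.
    """
    best = (None, -2.0)  # MCC lies in [-1, 1], so -2.0 is always beaten
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= cutoff for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= cutoff for s in neg_scores)
        tn = len(neg_scores) - fp
        value = mcc(tp, fp, tn, fn)
        if value > best[1]:
            best = (cutoff, value)
    return best
```

MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative sets differ greatly in size, which is a common reason to prefer it over plain accuracy for cutoff selection.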
11.  Activity of the adenosine deaminase promoter in transgenic mice. 
Nucleic Acids Research  1988;16(21):10083-10097.
The promoter of the human gene for adenosine deaminase (ADA) is extremely G/C-rich, contains several G/C-box motifs (GGGCGGG) and lacks any apparent TATA or CAAT boxes. These features are commonly found in promoters of genes that lack strong tissue specificity, which are referred to as "housekeeping genes". Like other housekeeping genes, the ADA gene is expressed in all tissues. However, there is a considerable variation in the levels of expression of the ADA protein in different tissues. In order to study the activity of the ADA promoter, transgenic mice were generated that harbor a chimeric gene composed of the ADA promoter linked to a reporter gene encoding the bacterial enzyme Chloramphenicol Acetyl Transferase (CAT). These mice reproducibly showed CAT expression in all tissues examined, including the hemopoietic organs (spleen, thymus and bone marrow). However, examination of the actual cell types expressing the CAT gene revealed the ADA promoter to be inactive in the hemopoietic cells. This was substantiated by a transplantation experiment in which bone marrow from ADA-CAT transgenic mice was used to reconstitute the hemopoietic compartment of lethally irradiated mice. The engrafted recipients revealed strongly reduced CAT activity in their hemopoietic organs. The lack of expression in hemopoietic cells was further shown to be correlated with a hypermethylated state of the transgene. Combined, our data suggest that the ADA promoter sequences tested can direct expression in a wide variety of tissues as expected for a regular housekeeping gene promoter. However, the activity of the ADA promoter fragment did not reflect the tissue-specific variations in expression levels of the endogenous ADA gene. Additional regulatory elements are needed for expression in the hemopoietic cells.
PMCID: PMC338838  PMID: 3057438
12.  Linear fuzzy gene network models obtained from microarray data by exhaustive search 
BMC Bioinformatics  2004;5:108.
Background
Recent technological advances in high-throughput data collection allow for experimental study of increasingly complex systems on the scale of the whole cellular genome and proteome. Gene network models are needed to interpret the resulting large and complex data sets. Rationally designed perturbations (e.g., gene knock-outs) can be used to iteratively refine hypothetical models, suggesting an approach for high-throughput biological system analysis. We introduce an approach to gene network modeling based on a scalable linear variant of fuzzy logic: a framework with greater resolution than Boolean logic models, but which, while still semi-quantitative, does not require the precise parameter measurement needed for chemical kinetics-based modeling.
Results
We demonstrated our approach with exhaustive search for fuzzy gene interaction models that best fit transcription measurements by microarray of twelve selected genes regulating the yeast cell cycle. Applying an efficient, universally applicable data normalization and fuzzification scheme, the search converged to a small number of models that individually predict experimental data within an error tolerance. Because only gene transcription levels are used to develop the models, they include both direct and indirect regulation of genes.
Conclusion
Biological relationships in the best-fitting fuzzy gene network models successfully recover direct and indirect interactions predicted from previous knowledge to result in transcriptional correlation. Fuzzy models fit on one yeast cell cycle data set robustly predict another experimental data set for the same system. Linear fuzzy gene networks and exhaustive rule search are the first steps towards a framework for an integrated modeling and experiment approach to high-throughput "reverse engineering" of complex biological systems.
doi:10.1186/1471-2105-5-108
PMCID: PMC514698  PMID: 15304201
13.  Toward an Integrated Model of Capsule Regulation in Cryptococcus neoformans 
PLoS Pathogens  2011;7(12):e1002411.
Cryptococcus neoformans is an opportunistic fungal pathogen that causes serious human disease in immunocompromised populations. Its polysaccharide capsule is a key virulence factor which is regulated in response to growth conditions, becoming enlarged in the context of infection. We used microarray analysis of cells stimulated to form capsule over a range of growth conditions to identify a transcriptional signature associated with capsule enlargement. The signature contains 880 genes, is enriched for genes encoding known capsule regulators, and includes many uncharacterized sequences. One uncharacterized sequence encodes a novel regulator of capsule and of fungal virulence. This factor is a homolog of the yeast protein Ada2, a member of the Spt-Ada-Gcn5 Acetyltransferase (SAGA) complex that regulates transcription of stress response genes via histone acetylation. Consistent with this homology, the C. neoformans null mutant exhibits reduced histone H3 lysine 9 acetylation. It is also defective in response to a variety of stress conditions, demonstrating phenotypes that overlap with, but are not identical to, those of other fungi with altered SAGA complexes. The mutant also exhibits significant defects in sexual development and virulence. To establish the role of Ada2 in the broader network of capsule regulation we performed RNA-Seq on strains lacking either Ada2 or one of two other capsule regulators: Cir1 and Nrg1. Analysis of the results suggested that Ada2 functions downstream of both Cir1 and Nrg1 via components of the high osmolarity glycerol (HOG) pathway. To identify direct targets of Ada2, we performed ChIP-Seq analysis of histone acetylation in the Ada2 null mutant. These studies supported the role of Ada2 in the direct regulation of capsule and mating responses and suggested that it may also play a direct role in regulating capsule-independent antiphagocytic virulence factors. 
These results validate our experimental approach to dissecting capsule regulation and provide multiple targets for future investigation.
Author Summary
Cryptococcus neoformans is a fungal pathogen that causes serious disease in immunocompromised individuals, killing over 600,000 people per year worldwide. A major factor in the ability of this microbe to cause disease is an extensive polysaccharide capsule that surrounds the cell and interferes with the host immune response to infection. This capsule expands dramatically in certain growth conditions, including those found in the mammalian host. We grew cells in multiple conditions and assessed gene expression and capsule size. This allowed us to identify a ‘transcriptional signature’ of genes whose expression correlates with capsule size; we speculated that a subset of these genes acts in capsule regulation. To test this hypothesis, we characterized one previously unstudied gene in this signature and found it to be a novel regulator of capsule expansion, fungal virulence, and mating. This gene encodes cryptococcal Ada2, a well-conserved protein that regulates genes involved in stress response and development. We used phenotypic analysis, RNA sequencing, and chromatin-immunoprecipitation sequencing (ChIP-Seq) to situate Ada2 in the complex network of genes that regulate capsule and other cryptococcal virulence factors. This approach, which yielded insights into the regulation of a critical fungal virulence factor, is applicable to similar questions in other pathogens.
doi:10.1371/journal.ppat.1002411
PMCID: PMC3234223  PMID: 22174677
14.  Assessment of clusters of transcription factor binding sites in relationship to human promoter, CpG islands and gene expression 
BMC Genomics  2004;5:16.
Background
Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. Also unclear is the identification of TFs that have TFBS concentrated in promoters and to what level this occurs. This study hopes to answer some of these questions.
Results
We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI.
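As a rough illustration of the partial-correlation step mentioned above, the sketch below uses the standard first-order partial correlation to ask how strongly two properties are associated once a third is held fixed. The abstract does not give the exact formulation used in the study, so the formula choice, the function names and the toy 0/1 vectors are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical per-sequence indicators (made-up values, not data from the paper).
promoter = [1, 1, 1, 1, 0, 0, 0, 0]   # sequence is a promoter
cgi      = [1, 1, 1, 0, 1, 0, 0, 0]   # sequence overlaps a CpG island
cluster  = [1, 1, 0, 0, 1, 0, 0, 0]   # sequence contains a predicted TFBS cluster

raw = pearson(promoter, cluster)
adjusted = partial_corr(promoter, cluster, cgi)
print(f"raw r = {raw:.3f}, r controlling for CGI = {adjusted:.3f}")
```

In this toy case the apparent promoter/cluster association weakens once CGI is controlled for, which is the kind of confounding the partial correlation is meant to expose.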
Conclusion
Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
doi:10.1186/1471-2164-5-16
PMCID: PMC375527  PMID: 15053842
promoter; tissue-specific gene expression; position weight matrix; regulatory motif
15.  The Next Generation of Transcription Factor Binding Site Prediction 
PLoS Computational Biology  2013;9(9):e1003214.
Finding where transcription factors (TFs) bind to the DNA is of key importance to decipher gene regulation at a transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs) which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction and do not account for flexible length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible, and can model both position interdependence within TFBSs and variable length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that conveys properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. 
These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
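To make the classical baseline concrete, the following is a minimal sketch of PWM scoring as described above: a log-odds matrix is built from a set of aligned sites, and a candidate window is scored by summing one independent term per position. It is not the TFFM method. The pseudocount, the uniform background, the function names and the toy sites are assumptions chosen for illustration.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from equal-length sites (uniform background assumed)."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score_window(pwm, window):
    # Independence assumption: the score is a plain sum over positions.
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Toy aligned sites (made up for illustration, not taken from any database).
sites = ["CACGTG", "CACGTG", "CACGTT", "CATGTG"]
pwm = build_pwm(sites)
score, offset = best_hit(pwm, "TTTCACGTGAAA")
print(f"best window starts at offset {offset} with score {score:.2f}")
```

Because each position contributes its own term, a matrix of this kind cannot express a dependence between neighbouring positions or a variable spacer, which is the limitation the abstract attributes to basic PWMs.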
Author Summary
Transcription factors are critical proteins for sequence-specific control of transcriptional regulation. Finding where these proteins bind to DNA is of key importance for global efforts to decipher the complex mechanisms of gene regulation. Greater understanding of the regulation of transcription promises to improve human genetic analysis by specifying critical gene components that have eluded investigators. Classically, computational prediction of transcription factor binding sites (TFBS) is based on models giving weights to each nucleotide at each position. We introduce a novel statistical model for the prediction of TFBS tolerant of a broader range of TFBS configurations than can be conveniently accommodated by existing methods. The new models are designed to address the confounding properties of nucleotide composition, inter-positional sequence dependence and variable lengths (e.g. variable spacing between half-sites) observed in the more comprehensive experimental data now emerging. The new models generate scores consistent with DNA-protein affinities measured experimentally and can be represented graphically, retaining desirable attributes of past methods. It demonstrates the capacity of the new approach to accurately assess DNA-protein interactions. With the rich experimental data generated from chromatin immunoprecipitation experiments, a greater diversity of TFBS properties has emerged that can now be accommodated within a single predictive approach.
doi:10.1371/journal.pcbi.1003214
PMCID: PMC3764009  PMID: 24039567
16.  Tissue-specific prediction of directly regulated genes 
Bioinformatics  2011;27(17):2354-2360.
Direct binding by a transcription factor (TF) to the proximal promoter of a gene is strong evidence that the TF regulates the gene. Assaying the genome-wide binding of every TF in every cell type and condition is currently impractical. Histone modifications correlate with tissue/cell/condition-specific (‘tissue specific’) TF binding, so histone ChIP-seq data can be combined with traditional position weight matrix (PWM) methods to make tissue-specific predictions of TF–promoter interactions.
Results: We use supervised learning to train a naïve Bayes predictor of TF–promoter binding. The predictor's features are the histone modification levels and a PWM-based score for the promoter. Training and testing uses sets of promoters labeled using TF ChIP-seq data, and we use cross-validation on 23 such datasets to measure the accuracy. A PWM+histone naïve Bayes predictor using a single histone modification (H3K4me3) is substantially more accurate than a PWM score or a conservation-based score (phylogenetic motif model). The naïve Bayes predictor is more accurate (on average) at all sensitivity levels, and makes only half as many false positive predictions at sensitivity levels from 10% to 80%. On average, it correctly predicts 80% of bound promoters at a false positive rate of 20%. Accuracy does not diminish when we test the predictor in a different cell type (and species) from training. Accuracy is barely diminished even when we train the predictor without using TF ChIP-seq data.
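The idea of combining a PWM-based score with a histone modification level in a naïve Bayes predictor can be sketched as follows. This is not the Dr Gene implementation: the Gaussian likelihoods, the function names and the six training rows are assumptions made for illustration, since the abstract does not specify how the feature distributions are modelled.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Per-class prior plus per-feature mean and variance."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for feature in zip(*members):
            mean = sum(feature) / len(feature)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in feature) / len(feature) + 1e-6
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model

def log_posterior(model, row):
    """Unnormalised log posterior per class; features treated as independent."""
    out = {}
    for cls, (prior, stats) in model.items():
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        out[cls] = total
    return out

def predict(model, row):
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)

# Toy promoters (made-up values): (PWM score, H3K4me3 signal); label 1 = bound.
rows = [(7.1, 9.0), (6.4, 8.2), (6.9, 7.5), (2.0, 1.1), (3.2, 0.8), (6.8, 0.9)]
labels = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(rows, labels)

# Same strong motif match, different chromatin context.
print(predict(model, (6.8, 8.0)), predict(model, (6.8, 1.0)))
```

The last training row has a high PWM score but a low histone signal and is labelled unbound, so the toy model illustrates how the histone feature can veto a motif match that a PWM threshold alone would accept.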
Availability: Our tissue-specific predictor of promoters bound by a TF is called Dr Gene and is available at http://bioinformatics.org.au/drgene.
Contact: t.bailey@imb.uq.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr399
PMCID: PMC3157924  PMID: 21724591
17.  The Drosophila Histone Acetyltransferase Gcn5 and Transcriptional Adaptor Ada2a Are Involved in Nucleosomal Histone H4 Acetylation
Molecular and Cellular Biology  2006;26(24):9413-9423.
The histone acetyltransferase (HAT) Gcn5 plays a role in chromatin structure and gene expression regulation as a catalytic component of multiprotein complexes, some of which also contain Ada2-type transcriptional coactivators. Data obtained mostly from studies on yeast (Saccharomyces cerevisiae) suggest that Ada2 potentiates Gcn5 activity and substrate recognition. dAda2b, one of two related Ada2 proteins of Drosophila melanogaster, was recently found to play a role in complexes acetylating histone 3 (H3). Evidence of an in vivo functional link between the related coactivator dAda2a and dGcn5, however, is lacking. Here we present data on the genetic interaction of dGcn5 and dAda2a. The loss of either dGcn5 or dAda2a function results in similar chromosome structural and developmental defects. In dAda2a mutants, the nucleosomal H4 acetylation at lysines 12 and 5 is significantly reduced, while the acetylation established by dAda2b-containing Gcn5 complexes at H3 lysines 9 and 14 is unaffected. The data presented here, together with our earlier data on the function of dAda2b, provide evidence that related Ada2 proteins of Drosophila, together with Gcn5 HAT, are involved in the acetylation of specific lysine residues in the N-terminal tails of nucleosomal H3 and H4. Our data suggest dAda2a involvement in both uniformly distributed H4 acetylation and gene-specific transcription regulation.
doi:10.1128/MCB.01401-06
PMCID: PMC1698533  PMID: 17030603
18.  Rule-Based Cell Systems Model of Aging using Feedback Loop Motifs Mediated by Stress Responses 
PLoS Computational Biology  2010;6(6):e1000820.
Investigating the complex systems dynamics of the aging process requires integration of a broad range of cellular processes describing damage and functional decline co-existing with adaptive and protective regulatory mechanisms. We evolve an integrated generic cell network to represent the connectivity of key cellular mechanisms structured into positive and negative feedback loop motifs centrally important for aging. The conceptual network is cast into a fuzzy-logic, hybrid-intelligent framework based on interaction rules assembled from a priori knowledge. Based upon a classical homeostatic representation of cellular energy metabolism, we first demonstrate how positive-feedback loops accelerate damage and decline consistent with a vicious cycle. This model is iteratively extended towards an adaptive response model by incorporating protective negative-feedback loop circuits. Time-lapse simulations of the adaptive response model uncover how transcriptional and translational changes, mediated by stress sensors NF-κB and mTOR, counteract accumulating damage and dysfunction by modulating mitochondrial respiration, metabolic fluxes, biosynthesis, and autophagy, crucial for cellular survival. The model allows consideration of lifespan optimization scenarios with respect to fitness criteria using a sensitivity analysis. Our work establishes a novel extendable and scalable computational approach capable of connecting tractable molecular mechanisms with cellular network dynamics underlying the emerging aging phenotype.
Author Summary
The global process of aging disturbs a broad range of cellular mechanisms in a complex fashion and is not well understood. One important goal of computational approaches in aging is to develop integrated models in terms of a unifying aging theory, predicting progression of aging phenotypes grounded on molecular mechanisms. However, current experimental data incoherently reflects many isolated processes from a large diversity of approaches, biological model systems, and species, which makes such integration a challenging task. In an attempt to close this gap, we iteratively develop a fuzzy-logic cell systems model considering the interplay of damage, metabolism, and signaling by positive and negative feedback-loop motifs using relationships drawn from literature data. Because cellular biodynamics may be considered a complex control system, this approach seems particularly suitable. Here, we demonstrate that rule-based fuzzy-logic models provide semi-quantitative predictions that enhance our understanding of complex and interlocked molecular mechanisms and their implications on the aging physiome.
doi:10.1371/journal.pcbi.1000820
PMCID: PMC2887462  PMID: 20585546
19.  Human ADA3 regulates RARα transcriptional activity through direct contact between LxxLL motifs and the receptor coactivator pocket 
Nucleic Acids Research  2010;38(16):5291-5303.
The alteration/deficiency in activation-3 (ADA3) is an essential component of the human p300/CBP-associated factor (PCAF) and yeast Spt-Ada-Gcn5-acetyltransferase (SAGA) histone acetyltransferase complexes. These complexes facilitate transactivation of target genes by association with transcription factors and modification of local chromatin structure. It is known that the yeast ADA3 is required for nuclear receptor (NR)-mediated transactivation in yeast cells; however, the role of mammalian ADA3 in NR signaling remains elusive. In this study, we have investigated how the human (h) ADA3 regulates retinoic acid receptor (RAR) α-mediated transactivation. We show that hADA3 interacts directly with RARα in a hormone-dependent manner and this interaction contributes to RARα transactivation. Intriguingly, this interaction involves classical LxxLL motifs in hADA3, as demonstrated by both ‘loss’ and ‘gain’ of function mutations, as well as a functional coactivator pocket of the receptor. Additionally, we show that hADA3 associates with RARα target gene promoter in a hormone-dependent manner and ADA3 knockdown impairs RARβ2 expression. Furthermore, a structural model was established to illustrate an interaction network within the ADA3/RARα complex. These results suggest that hADA3 is a bona fide transcriptional coactivator for RARα, acting through a conserved mechanism involving direct contacts between NR boxes and the receptor’s co-activator pocket.
doi:10.1093/nar/gkq269
PMCID: PMC2938230  PMID: 20413580
20.  TRStalker: an efficient heuristic for finding fuzzy tandem repeats 
Bioinformatics  2010;26(12):i358-i366.
Motivation: Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for TRs with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.
Results: We have developed an algorithm (christened TRStalker) with the aim of efficiently detecting TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the ‘generalized median string’ that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than the current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences.
Availability: TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it.
Contact: marco.pellegrini@iit.cnr.it
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq209
PMCID: PMC2881393  PMID: 20529928
21.  Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data 
BMC Genomics  2014;15(1):80.
Background
ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TF), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold to avoid recognition of too many false-positive BSs, and to compare the actual performance of different models.
Results
Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated the affinity of 64 predicted FoxA BSs using EMSA, which reliably distinguishes sequences able to bind the TF. As a result we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. Conventional position weight matrix (PWM) models performed worst, with the highest false positive rate. In contrast, the best recognition efficiency was achieved by the combination of SiteGA & diChIPMunk/ChIPMunk models, properly identifying FoxA BSs in up to 90% of loci for both mouse and human ChIP-Seq datasets.
Conclusions
The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS-recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. A combination of models based on different principles allowed identification of proper TFBSs.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-80) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-80
PMCID: PMC4234207  PMID: 24472686
ChIP-Seq; EMSA; Transcription factor binding sites; FoxA; SiteGA; PWM; Transcription factor binding model; Dinucleotide frequencies
22.  Transcriptional adaptor and histone acetyltransferase proteins in Arabidopsis and their interactions with CBF1, a transcriptional activator involved in cold-regulated gene expression 
Nucleic Acids Research  2001;29(7):1524-1533.
The Arabidopsis CBF transcriptional activators bind to the CRT/DRE regulatory element present in the promoters of many cold-regulated genes and stimulate their transcription. Expression of the CBF1 proteins in yeast activates reporter genes carrying a minimal promoter with the CRT/DRE as an upstream regulatory element. Here we report that this ability of CBF1 is dependent upon the activities of three key components of the yeast Ada and SAGA complexes, namely the histone acetyltransferase (HAT) Gcn5 and the transcriptional adaptor proteins Ada2 and Ada3. This result suggested that CBF1 might function through the action of similar complexes in Arabidopsis. In support of this hypothesis we found that Arabidopsis has a homolog of the GCN5 gene and two homologs of ADA2, the first report of multiple ADA2 genes in an organism. The Arabidopsis GCN5 protein has intrinsic HAT activity and can physically interact in vitro with both the Arabidopsis ADA2a and ADA2b proteins. In addition, the CBF1 transcriptional activator can interact with the Arabidopsis GCN5 and ADA2 proteins. We conclude that Arabidopsis encodes HAT-containing adaptor complexes that are related to the Ada and SAGA complexes of yeast and propose that the CBF1 transcriptional activator functions through the action of one or more of these complexes.
PMCID: PMC31267  PMID: 11266554
23.  ADA1, a novel component of the ADA/GCN5 complex, has broader effects than GCN5, ADA2, or ADA3. 
Molecular and Cellular Biology  1997;17(6):3220-3228.
The ADA genes encode factors which are proposed to function as transcriptional coactivators. Here we describe the cloning, sequencing, and initial characterization of a novel ADA gene, ADA1. Similar to the previously isolated ada mutants, ada1 mutants display decreases in transcription from various reporters. Furthermore, ADA1 interacts with the other ADAs in the ADA/GCN5 complex as demonstrated by partial purification of the complex and immunoprecipitation experiments. We estimate that the complex has a molecular mass of approximately 2 MDa. Previously, it had been demonstrated that ada5 mutants displayed more severe phenotypic defects than the other ada mutants (G. A. Marcus, J. Horiuchi, N. Silverman, and L. Guarente, Mol. Cell. Biol. 16:3197-3205, 1996; S. M. Roberts and F. Winston, Mol. Cell. Biol. 16:3206-3213, 1996). ada1 mutants display defects similar to those of ada5 mutants and different from those of the other mutants with respect to promoters affected, inositol auxotrophy, and Spt- phenotypes. Thus, the ADAs can be separated into two classes, suggesting that the ADA/GCN5 complex may have two separate functions. We present a speculative model on the possible roles of the ADA/GCN5 complex.
PMCID: PMC232175  PMID: 9154821
24.  Optimized Position Weight Matrices in Prediction of Novel Putative Binding Sites for Transcription Factors in the Drosophila melanogaster Genome 
PLoS ONE  2013;8(8):e68712.
Position weight matrices (PWMs) have become a tool of choice for the identification of transcription factor binding sites in DNA sequences. DNA-binding proteins often show degeneracy in their binding requirement and thus the overall binding specificity of many proteins is unknown and remains an active area of research. Although existing PWMs are more reliable predictors than consensus string matching, they generally result in a high number of false positive hits. Our previous study introduced a promising approach to PWM refinement in which known motifs are used to computationally mine putative binding sites directly from aligned promoter regions using composition of similar sites. In the present study, we extended this technique originally tested on single examples of transcription factors (TFs) and showed its capability to optimize PWM performance to predict new binding sites in the fruit fly genome. We propose refined PWMs in mono- and dinucleotide versions similarly computed for a large variety of transcription factors of Drosophila melanogaster. Along with the addition of many auxiliary sites the optimization includes variation of the PWM motif length, the binding sites location on the promoters and the PWM score threshold. To assess the predictive performance of the refined PWMs we compared them to conventional TRANSFAC and JASPAR sources. The results have been verified using performed tests and literature review. Overall, the refined PWMs containing putative sites derived from real promoter content processed using optimized parameters had better general accuracy than conventional PWMs.
doi:10.1371/journal.pone.0068712
PMCID: PMC3735551  PMID: 23936309
25.  The STAGA Subunit ADA2b Is an Important Regulator of Human GCN5 Catalysis
Molecular and Cellular Biology  2008;29(1):266-280.
Human STAGA is a multisubunit transcriptional coactivator containing the histone acetyltransferase GCN5L. Previous studies of the related yeast SAGA complex have shown that the yeast Gcn5, Ada2, and Ada3 components form a heterotrimer that is important for the enzymatic function of SAGA. Here, we report that ADA2a and ADA2b, two human homologues of yeast Ada2, each have the ability to form a heterotrimer with ADA3 and GCN5L but that only the ADA2b homologue is found in STAGA. By comparing the patterns of acetylation of several substrates, we found context-dependent requirements for ADA2b and ADA3 for the efficient acetylation of histone tails by GCN5. With human proteins, unlike yeast proteins, the acetylation of free core histones by GCN5 is unaffected by ADA2b or ADA3. In contrast, the acetylation of mononucleosomal substrates by GCN5 is enhanced by ADA2b, with no significant additional effect of ADA3, and the efficient acetylation of nucleosomal arrays (chromatin) by GCN5 requires both ADA2b and ADA3. Thus, ADA2b and ADA3 appear to act at two different levels of histone organization within chromatin to facilitate GCN5 function. Interestingly, although ADA2a forms a complex(es) with GCN5 and ADA3 both in vitro and in vivo, ADA2a-containing complexes are unable to acetylate nucleosomal H3. We have also shown the preferential recruitment of ADA2b, relative to ADA2a, to p53-dependent genes. This finding indicates that the previously demonstrated presence and function of GCN5 on these promoters reflect the action of STAGA and that the ADA2a and ADA2b paralogues have nonredundant functional roles.
doi:10.1128/MCB.00315-08
PMCID: PMC2612497  PMID: 18936164
