Results 1-11 (11)

1.  Increasing Coverage of Transcription Factor Position Weight Matrices through Domain-level Homology 
PLoS ONE  2012;7(8):e42779.
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
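To make the PWM idea concrete, here is a minimal Python sketch of how a PWM can be built from per-position base counts and then used to scan a sequence for candidate binding sites. The count matrix, the pseudocount of 0.5, the uniform 0.25 background and the score threshold are illustrative assumptions, not values taken from JASPAR, TRANSFAC or the paper.

```python
import math

# Hypothetical count matrix for a 4-bp motif (rows: positions; columns: A, C, G, T).
# Illustrative numbers only; not taken from JASPAR or TRANSFAC.
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]
ALPHABET = "ACGT"

def counts_to_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts to log2-odds scores against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append([math.log2(((c + pseudocount) / total) / background) for c in row])
    return pwm

def scan(sequence, pwm, threshold):
    """Return (offset, score) for every window whose summed log-odds score >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(pwm[j][ALPHABET.index(base)] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits

pwm = counts_to_pwm(COUNTS)
hits = scan("TTACGTAAACGTT", pwm, threshold=4.0)
```

The log-odds form lets a window's score be a simple sum over positions; the threshold (here in log2 units) trades sensitivity against false positives and would need calibrating on real data.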
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of species represented by PWMs by over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit DNA binding specificity predictions comparable to those of transcription factor pairs with completely identical sequences.
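The core of a domain-level transfer can be sketched as an exact-match lookup on DNA binding domain (DBD) sequences. In this hypothetical Python sketch, each transcription factor is represented by the tuple of its DBD sequences, assumed to have been extracted already by a domain-annotation step that is not shown; a PWM is transferred only when that tuple matches an annotated factor exactly. All identifiers and sequences are invented for illustration.

```python
from collections import defaultdict

# Hypothetical inputs: TF id -> tuple of DBD sequences (N- to C-terminal order).
ANNOTATED = {
    "tfA_human": ("RKRGRQTYTRYQTLELEKEF",),
    "tfB_mouse": ("KKPRTAFSSEQLARLKREF",),
}
PWMS = {"tfA_human": ["PWM_001"], "tfB_mouse": ["PWM_002"]}
UNANNOTATED = {
    "tfA_chicken": ("RKRGRQTYTRYQTLELEKEF",),  # identical DBD: inherits PWM_001
    "tfC_fish": ("RKRGRQTYTRYQTLELEKEW",),     # one substitution: no transfer
}

def transfer_pwms(annotated, pwms, unannotated):
    """Assign PWMs to TFs whose full tuple of DBD sequences exactly matches an annotated TF."""
    by_domains = defaultdict(set)
    for tf, domains in annotated.items():
        by_domains[domains].update(pwms.get(tf, []))
    # The membership test runs before indexing, so no empty entries are created.
    return {tf: sorted(by_domains[d]) for tf, d in unannotated.items() if d in by_domains}

transferred = transfer_pwms(ANNOTATED, PWMS, UNANNOTATED)
```

Keying on the whole tuple is a conservative choice: a factor with several DBDs inherits a PWM only if every domain matches, in the same order.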
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at
PMCID: PMC3428306  PMID: 22952610
2.  A data integration framework for prediction of transcription factor targets: a BCL6 case study 
We present a computational framework for predicting targets of transcription factor regulation. The framework is based on the integration of a number of sources of evidence, derived from DNA sequence and gene expression data, using a weighted sum approach. Sources of evidence are prioritized based on a training set, and their relative contributions are then optimized. The performance of the proposed framework is demonstrated in the context of BCL6 target prediction. We show that this framework is able to uncover BCL6 targets reliably when biological prior information is utilized effectively, particularly in the case of sequence analysis. The framework results in a considerable gain in performance over scores in which sequence information was not incorporated. This analysis shows that with assessment of the quality and biological relevance of the data, reliable predictions can be obtained with this computational framework.
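A weighted-sum integration of evidence can be sketched in a few lines of Python. The three evidence sources, their [0, 1] scaling, the training data, and the choice to optimize weights by grid search on training-set AUC are all assumptions for illustration; the abstract states only that the sources' relative contributions are optimized on a training set.

```python
# Hypothetical evidence sources, each pre-scaled to [0, 1] per candidate gene.
SOURCES = ("motif", "expression", "conservation")

def combined_score(evidence, weights):
    """Weighted sum of the per-source evidence for one candidate gene."""
    return sum(w * evidence[s] for s, w in zip(SOURCES, weights))

def auc(scores, labels):
    """Fraction of (target, non-target) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_weights(training, labels, steps=10):
    """Grid search over non-negative weights summing to 1, maximizing training AUC."""
    best_auc, best_weights = -1.0, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            value = auc([combined_score(g, weights) for g in training], labels)
            if value > best_auc:
                best_auc, best_weights = value, weights
    return best_auc, best_weights

TRAINING = [
    {"motif": 0.9, "expression": 0.2, "conservation": 0.5},
    {"motif": 0.8, "expression": 0.1, "conservation": 0.4},
    {"motif": 0.2, "expression": 0.9, "conservation": 0.5},
    {"motif": 0.1, "expression": 0.8, "conservation": 0.6},
]
LABELS = [1, 1, 0, 0]  # 1 = known target in this hypothetical training set
best_auc, best_weights = fit_weights(TRAINING, LABELS)
```

In this toy set only the motif source separates targets from non-targets, so any weighting that reaches a perfect training AUC must give it non-zero weight.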
PMCID: PMC2771581  PMID: 19348642
network inference; transcription factor binding site prediction; data integration
3.  Identification of Tuberculosis Susceptibility Genes with Human Macrophage Gene Expression Profiles 
PLoS Pathogens  2008;4(12):e1000229.
Although host genetics influences susceptibility to tuberculosis (TB), few genes determining disease outcome have been identified. We hypothesized that macrophages from individuals with different clinical manifestations of Mycobacterium tuberculosis (Mtb) infection would have distinct gene expression profiles and that polymorphisms in these genes may also be associated with susceptibility to TB. We measured gene expression levels of >38,500 genes from ex vivo Mtb-stimulated macrophages in 12 subjects with 3 clinical phenotypes: latent, pulmonary, and meningeal TB (n = 4 per group). After identifying differentially expressed genes, we confirmed these results in 34 additional subjects by real-time PCR. We also used a case-control study design to examine whether polymorphisms in differentially regulated genes were associated with susceptibility to these different clinical forms of TB. We compared gene expression profiles in Mtb-stimulated and unstimulated macrophages and identified 1,608 and 199 genes that were differentially expressed by >2- and >5-fold, respectively. In an independent sample set of 34 individuals and a subset of highly regulated genes, 90% of the microarray results were confirmed by RT-PCR, including expression levels of CCL1, which distinguished the 3 clinical groups. Furthermore, 6 single nucleotide polymorphisms (SNPs) in CCL1 were found to be associated with TB in a case-control genetic association study with 273 TB cases and 188 controls. To our knowledge, this is the first identification of CCL1 as a gene involved in host susceptibility to TB and the first study to combine microarray and DNA polymorphism studies to identify genes associated with TB susceptibility. These results suggest that genome-wide studies can provide an unbiased method to identify critical macrophage response genes that are associated with different clinical outcomes and that variation in innate immune response genes regulates susceptibility to TB.
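The >2- and >5-fold cut-offs can be illustrated with a short Python sketch that counts genes exceeding each fold change between stimulated and unstimulated conditions. The expression values are invented, and counting both up- and down-regulation (absolute log2 ratio) is an assumption; the abstract does not say how direction or normalization were handled.

```python
import math

def count_regulated(stimulated, unstimulated, folds=(2, 5)):
    """Count genes whose stimulated/unstimulated ratio exceeds each fold cut-off
    in either direction, compared on the log2 scale."""
    log_ratios = {
        gene: math.log2(stimulated[gene] / unstimulated[gene])
        for gene in stimulated
        if gene in unstimulated
    }
    return {
        fold: sum(1 for r in log_ratios.values() if abs(r) > math.log2(fold))
        for fold in folds
    }

# Hypothetical normalized expression values (must be positive).
STIMULATED = {"g1": 80.0, "g2": 30.0, "g3": 10.0, "g4": 2.0}
UNSTIMULATED = {"g1": 10.0, "g2": 10.0, "g3": 10.0, "g4": 12.0}
counts = count_regulated(STIMULATED, UNSTIMULATED)
```

Working on the log2 scale makes a 6-fold decrease (g4) and a 6-fold increase count equally against the same cut-off.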
Author Summary
Although TB is a leading cause of death worldwide, the vast majority of infected individuals are asymptomatic and contain the bacillus in a latent form. Among those with active disease, 80% have localized pulmonary disease and 20% have disseminated forms. TB meningitis (TBM) is the most severe form of TB: 20–25% of patients die, and many survivors are left with disabilities. We currently do not understand the host factors that regulate this diverse spectrum of clinical outcomes. We hypothesized that variation in innate immune gene function is an important regulator of TB clinical outcomes. We measured the mRNA expression levels of >38,500 genes in macrophages taken from people with a history of latent, pulmonary, or meningeal TB and found genes with unique activation patterns among the clinical groups. We then studied one of these genes in more detail and found that CCL1 polymorphisms were associated with pulmonary TB (PTB) but not other types of TB disease. To our knowledge, this is the first study to combine mRNA expression studies with genetic studies to discover a novel gene that is associated with different clinical outcomes in TB. We speculate that this approach can be used to discover novel strategies for modulating immune function to prevent adverse outcomes in TB.
PMCID: PMC2585058  PMID: 19057661
6.  Uncovering a Macrophage Transcriptional Program by Integrating Evidence from Motif Scanning and Expression Dynamics 
PLoS Computational Biology  2008;4(3):e1000021.
Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation.
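The time-lagged correlation can be sketched as a Pearson correlation between a regulator's profile and a target's profile shifted later in time. This Python sketch assumes evenly spaced time points and requires at least three overlapping points (both assumptions); it does not reproduce the paper's significance test, and the profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def lagged_correlation(regulator, target, lag):
    """Correlate regulator(t) with target(t + lag); lag is in sampling steps."""
    if len(regulator) != len(target):
        raise ValueError("profiles must have the same number of time points")
    if lag < 0 or len(regulator) - lag < 3:
        raise ValueError("lag leaves too few overlapping time points")
    return pearson(regulator[: len(regulator) - lag], target[lag:])

def best_lag(regulator, target, max_lag):
    """Lag in 0..max_lag with the highest lagged correlation."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(regulator, target, k))

# Hypothetical profiles: the target echoes the regulator two time points later.
REGULATOR = [0, 1, 4, 9, 4, 1, 0, 0]
TARGET = [0, 0, 0, 1, 4, 9, 4, 1]
```

A positive best lag suggests, but does not prove, that the regulator's change precedes the target's; larger lags also shrink the overlap, which is one reason a dedicated significance test is needed.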
Author Summary
Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.
PMCID: PMC2265556  PMID: 18369420
7.  The Innate Immune Database (IIDB) 
BMC Immunology  2008;9:7.
As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57BL/6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site . Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens.
We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data are available, over 50% of our predicted binding sites coincide with genome-wide chromatin immunoprecipitation (ChIP-chip) results. Our database can be interrogated via a web interface. Genomic annotations and binding site predictions can be automatically viewed with a customized version of the Argo genome browser.
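The "over 50% coincide" figure is a simple interval-overlap fraction. In this hypothetical Python sketch, "coincide" is taken to mean that a predicted site overlaps a ChIP-chip region by at least one base, using half-open coordinates; both the definition and the coordinates are assumptions for illustration.

```python
from collections import defaultdict

def fraction_overlapping(sites, regions):
    """Fraction of predicted sites that overlap at least one ChIP-chip region.

    Both inputs are (chromosome, start, end) tuples using half-open [start, end)
    coordinates. A linear scan per site keeps the sketch short; genome-scale
    data would call for sorted intervals or an interval tree.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    hits = sum(
        1
        for chrom, start, end in sites
        if any(start < r_end and r_start < end for r_start, r_end in by_chrom.get(chrom, []))
    )
    return hits / len(sites) if sites else 0.0

# Hypothetical predicted sites and ChIP-chip regions.
SITES = [("chr1", 100, 110), ("chr1", 500, 510), ("chr2", 50, 60), ("chr3", 0, 10)]
REGIONS = [("chr1", 90, 200), ("chr2", 55, 300)]
overlap = fraction_overlapping(SITES, REGIONS)
```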
We present the Innate Immune Database (IIDB) as a community resource for immunologists interested in gene regulatory systems underlying innate responses to pathogens. The database website can be freely accessed at .
PMCID: PMC2268913  PMID: 18321385
8.  Prediction of phenotype and gene expression for combinations of mutations 
Molecular interactions provide paths for information flows. Genetic interactions reveal active information flows and reflect their functional consequences. We integrated these complementary data types to model the transcription network controlling cell differentiation in yeast. Genetic interactions were inferred from linear decomposition of gene expression data and were used to direct the construction of a molecular interaction network mediating these genetic effects. This network included both known and novel regulatory influences, and predicted genetic interactions. For corresponding combinations of mutations, the network model predicted quantitative gene expression profiles and precise phenotypic effects. Multiple predictions were tested and verified.
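One simple way to see how a genetic interaction can be read off expression data is to compare a double mutant against an additive expectation built from the single mutants. The Python sketch below does this per gene in log-expression space; it is a simplified stand-in for illustration, not the linear decomposition used in the paper, and the tolerance and expression values are hypothetical.

```python
def interaction_residuals(wild_type, mut_a, mut_b, double_ab):
    """Per-gene deviation of the double mutant from the additive expectation
    wt + (a - wt) + (b - wt) = a + b - wt, all in log-expression units."""
    return [d - (a + b - w) for w, a, b, d in zip(wild_type, mut_a, mut_b, double_ab)]

def interacting_genes(residuals, tolerance):
    """Indices of genes whose residual magnitude exceeds the tolerance."""
    return [i for i, r in enumerate(residuals) if abs(r) > tolerance]

# Hypothetical log2 expression of three genes under four genotypes.
WT = [0.0, 0.0, 0.0]
A = [1.0, 0.0, 2.0]
B = [0.0, 1.0, -1.0]
AB = [1.0, 3.0, 1.0]
residuals = interaction_residuals(WT, A, B, AB)
flagged = interacting_genes(residuals, tolerance=0.5)
```

Here genes 0 and 2 behave additively, while gene 1 responds far more strongly to the double mutant than the single mutants predict, which is the kind of non-additive effect a genetic interaction denotes.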
PMCID: PMC1847951  PMID: 17389876
computational biology; data integration; gene expression; genetic interaction; network model
9.  The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo 
Genome Biology  2006;7(5):R36.
The Inferelator, a method for deriving genome-wide transcriptional regulatory interactions, successfully predicted global expression in Halobacterium under novel perturbations.
We present a method (the Inferelator) for deriving genome-wide transcriptional regulatory interactions, and apply the method to predict a large portion of the regulatory network of the archaeon Halobacterium NRC-1. The Inferelator uses regression and variable selection to identify transcriptional influences on genes based on the integration of genome annotation and expression data. The learned network successfully predicted Halobacterium's global expression under novel perturbations with predictive power similar to that seen over training data. Several specific regulatory predictions were experimentally tested and verified.
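Regression with variable selection can be illustrated with L1-penalized least squares (the lasso), fitted here by cyclic coordinate descent in plain Python. This is a swapped-in stand-in for the idea of selecting a small set of regulators per gene; the Inferelator's published model form, including how it treats time-course data, is not reproduced, and the data and regulator names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the proximal step for an L1 penalty)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (conditions x candidate regulators); X columns and y are
    assumed centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(row[j] * (r + row[j] * beta[j]) for row, r in zip(X, residual)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                residual = [r - row[j] * delta for row, r in zip(X, residual)]
                beta[j] = new
    return beta

# Hypothetical centered data: 5 conditions, 2 candidate regulators.
REGULATORS = ["tf1", "tf2"]
X = [[1, 1], [-1, 1], [2, -1], [-2, -1], [0, 0]]
y = [2, -2, 4, -4, 0]  # target = 2 * tf1
beta = lasso(X, y, lam=0.1)
selected = [name for name, b in zip(REGULATORS, beta) if b != 0.0]
```

The penalty drives the irrelevant regulator's coefficient exactly to zero (variable selection) and shrinks the true coefficient slightly, from 2 to about 1.95; larger `lam` values give sparser, more parsimonious networks.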
PMCID: PMC1779511  PMID: 16686963
10.  Tools enabling the elucidation of molecular pathways active in human disease: Application to Hepatitis C virus infection 
BMC Bioinformatics  2005;6:154.
The extraction of biological knowledge from genome-scale data sets requires its analysis in the context of additional biological information. The importance of integrating experimental data sets with molecular interaction networks has been recognized and applied to the study of model organisms, but its systematic application to the study of human disease has lagged behind due to the lack of tools for performing such integration.
We have developed techniques and software tools for simplifying and streamlining the integration of diverse experimental data types into molecular networks, as well as for the analysis of these networks. We applied these techniques to extract, from genomic expression data from Hepatitis C virus-infected liver tissue, potentially useful hypotheses related to the onset of this disease. Our integration of the expression data with large-scale molecular interaction networks and subsequent analyses identified molecular pathways that appear to be induced or repressed in the response to Hepatitis C viral infection.
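A minimal way to overlay expression data on an interaction network is to score each gene's immediate neighborhood by the average fold change of its members. The Python sketch below does this and splits neighborhoods into induced and repressed sets; it is a hypothetical illustration of the general idea, not the authors' tools, and the gene names, edges, fold changes and threshold are placeholders.

```python
from collections import defaultdict

def neighborhood_scores(edges, log_fc):
    """Mean log2 fold change over each measured gene and its measured direct partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {}
    for gene in log_fc:
        members = [g for g in neighbors.get(gene, set()) | {gene} if g in log_fc]
        scores[gene] = sum(log_fc[g] for g in members) / len(members)
    return scores

def classify(scores, threshold):
    """Split genes into induced and repressed neighborhoods by a score cut-off."""
    induced = sorted(g for g, s in scores.items() if s >= threshold)
    repressed = sorted(g for g, s in scores.items() if s <= -threshold)
    return induced, repressed

# Hypothetical interaction edges and log2 fold changes (infected vs. control).
EDGES = [("A", "B"), ("B", "C"), ("D", "E")]
LOG_FC = {"A": 2.0, "B": 2.0, "C": 2.0, "D": -2.0, "E": -1.0, "F": 0.1}
induced, repressed = classify(neighborhood_scores(EDGES, LOG_FC), threshold=1.0)
```

Averaging over neighbors favors coordinated changes across interacting genes over isolated changes in single genes, which is the intuition behind looking for responsive pathways rather than individual hits.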
The methods and tools we have implemented allow for the efficient dynamic integration and analysis of diverse data in a major human disease system. This integrated data set in turn enabled simple analyses to yield hypotheses related to the response to Hepatitis C viral infection.
PMCID: PMC1181626  PMID: 15967031
11.  Derivation of genetic interaction networks from quantitative phenotype data 
Genome Biology  2005;6(4):R38.
Genetic interaction networks were derived from quantitative phenotype data by analyzing agar-invasion phenotypes of mutant yeast strains, which showed specific modes of genetic interaction with specific biological processes.
We have generalized the derivation of genetic-interaction networks from quantitative phenotype data. Familiar and unfamiliar modes of genetic interaction were identified and defined. A network was derived from agar-invasion phenotypes of mutant yeast. Mutations showed specific modes of genetic interaction with specific biological processes. Mutations formed cliques of significant mutual information in their large-scale patterns of genetic interaction. These local and global interaction patterns reflect the effects of gene perturbations on biological processes and pathways.
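Mutual information quantifies how much one mutation's large-scale pattern of genetic interaction tells us about another's. This Python sketch computes it, in bits, between two mutations' interaction-mode labels across the same set of partner mutations; the mode labels and data are hypothetical, and the significance testing and clique-finding steps are not shown.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length label sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b)).
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

# Hypothetical interaction modes of two mutations against the same four partners.
MUT1 = ["suppressing", "none", "suppressing", "none"]
MUT2 = ["enhancing", "none", "enhancing", "none"]
mi = mutual_information(MUT1, MUT2)
```

The two profiles use different labels but vary in lockstep, so knowing one fully determines the other (1 bit here); a mutation with the same mode against every partner carries no information about any other profile.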
PMCID: PMC1088966  PMID: 15833125
