Results 1-2 (2)

1.  Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions 
BMC Bioinformatics  2007;8:481.
Reliable transcription factor binding site (TFBS) prediction methods are essential for the computational annotation of large amounts of genome sequence data. However, current TFBS prediction methods are hampered by high false-positive rates when only sequence conservation at the core binding sites is considered.
To improve this situation, we quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation; in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1); in obesity and lipid metabolism (PPAR, SREBP, HNF4); and in the regulation of steroidogenic (SF-1) and cell-cycle (E2F) gene expression. We also gained additional specificity using a method called SiteGA, which takes into account structural interactions within the TFBS core and flanking regions by using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies.
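The exhaustive search for an optimal PWM length and position can be illustrated with a minimal Python sketch. It assumes aligned, equal-length training sequences (core plus flanks), a log-odds PWM with a pseudocount and a uniform background, and ROC AUC against a negative set as the optimization criterion. These choices, and the names `build_pwm`, `score`, `auc` and `best_window`, are illustrative assumptions, not the authors' implementation.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from aligned, equal-length sites (uniform background assumed)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds weights for a sequence of the PWM's length."""
    return sum(col[base] for col, base in zip(pwm, seq))


def auc(pos, neg):
    """ROC AUC: probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_window(pos, neg, min_len=6):
    """Try every (start, length) sub-window; return (auc, start, length) of the first best one."""
    total = len(pos[0])
    best = None
    for start in range(total - min_len + 1):
        for length in range(min_len, total - start + 1):
            pos_win = [s[start:start + length] for s in pos]
            neg_win = [s[start:start + length] for s in neg]
            pwm = build_pwm(pos_win)
            a = auc([score(pwm, w) for w in pos_win], [score(pwm, w) for w in neg_win])
            if best is None or a > best[0]:
                best = (a, start, length)
    return best


# Toy aligned sequences: the positives share the core GGGAA at offsets 2-6.
pos = ["ACGGGAATCA", "TTGGGAAGCT", "CAGGGAACTG", "GTGGGAATAC"]
neg = ["ACTTCAATCA", "TTCATCAGCT", "CATCTTACTG", "GTACCTATAC"]
print(best_window(pos, neg, min_len=5))
```

Because the same positives are used both to build and to evaluate each PWM, this toy criterion is optimistic. In practice, a held-out evaluation such as the jackknife mentioned in the next paragraph would be needed.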
To increase confidence in our approach, we used resampling tests (jackknife and bootstrap) for the comparison; these indicate that optimized PWMs and SiteGA have similar recognition performance. We then applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites (BSs) with the web tool SiteGA.
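The resampling comparison can be sketched in plain Python. This sketch assumes that `train(sites)` builds a model and `score(model, site)` scores one site; both are placeholders for any recognizer, whether PWM or SiteGA. It uses a paired percentile bootstrap for the difference in ROC AUC between two methods scored on the same sequences. The function names and the 95% interval are illustrative choices, not details taken from the paper.

```python
import random


def jackknife_scores(sites, train, score):
    """Leave-one-out: train on all sites except one, then score the held-out site."""
    return [score(train(sites[:i] + sites[i + 1:]), sites[i]) for i in range(len(sites))]


def auc(pos, neg):
    """ROC AUC: probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(pos_a, neg_a, pos_b, neg_b, n_boot=1000, seed=0):
    """Paired percentile bootstrap 95% interval for AUC(A) - AUC(B).

    pos_a[i] and pos_b[i] score the same positive sequence under methods A and B
    (likewise for negatives), so the same resampled indices are used for both.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ip = [rng.randrange(len(pos_a)) for _ in pos_a]
        ineg = [rng.randrange(len(neg_a)) for _ in neg_a]
        a = auc([pos_a[i] for i in ip], [neg_a[j] for j in ineg])
        b = auc([pos_b[i] for i in ip], [neg_b[j] for j in ineg])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * (n_boot - 1))], diffs[int(0.975 * (n_boot - 1))]
```

Under this sketch, an interval that contains zero would mean that the two recognizers cannot be distinguished on that data. That is the kind of outcome summarized above as similar recognition performance.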
Analysis of the dependencies between close and distant LPDs revealed by SiteGA models showed that the most significant correlations are between close LPDs and are generally located in the core (footprint) region. A greater number of less significant correlations occur mainly between distant LPDs, which span both core and flanking regions. Applying SiteGA and optimized PWM models together substantially reduced false positives, at least at higher stringencies.
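One way to read "applied together" is that a candidate site must pass both models' thresholds. Under that assumption, the combined prediction set is the intersection of the two individual sets, so its false-positive count cannot exceed that of either model alone. The sketch below assumes that each model has already scored every candidate position. The position-to-score dictionaries, the threshold values, and the function names are hypothetical.

```python
def predicted(scores, threshold):
    """Positions whose score reaches the threshold."""
    return {pos for pos, s in scores.items() if s >= threshold}


def combined(pwm_scores, ga_scores, t_pwm, t_ga):
    """Hypothetical AND-rule: keep a position only if both models accept it."""
    return predicted(pwm_scores, t_pwm) & predicted(ga_scores, t_ga)


def false_positives(calls, true_sites):
    """Number of predicted positions that are not real sites."""
    return len(calls - true_sites)


# Toy scores for ten candidate positions; positions 2 and 7 are the real sites.
true_sites = {2, 7}
pwm_scores = {0: 0.2, 1: 0.9, 2: 0.95, 3: 0.4, 4: 0.85, 5: 0.1, 6: 0.3, 7: 0.9, 8: 0.7, 9: 0.2}
ga_scores = {0: 0.8, 1: 0.3, 2: 0.9, 3: 0.2, 4: 0.4, 5: 0.9, 6: 0.1, 7: 0.85, 8: 0.75, 9: 0.3}

for name, calls in [
    ("PWM", predicted(pwm_scores, 0.7)),
    ("SiteGA", predicted(ga_scores, 0.7)),
    ("both", combined(pwm_scores, ga_scores, 0.7, 0.7)),
]:
    print(name, sorted(calls), false_positives(calls, true_sites))
```

Raising either threshold (higher stringency) can only shrink each set further. The intersection rule trades some sensitivity for specificity, which is consistent with the reduction in false positives reported above.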
Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It extends the range of techniques available for TFBS prediction, and the EPD analysis has produced a list of genes that appear to be regulated by the above TFs.
PMCID: PMC2265442  PMID: 18093302
2.  Recognition of interferon-inducible sites, promoters, and enhancers 
BMC Bioinformatics  2007;8:56.
Computational analysis of gene regulatory regions is important for predicting the functions of many uncharacterized genes. With this in mind, searching for target genes of interferon (IFN) induction is of interest. IFNs are multifunctional cytokines with immunomodulatory, antiviral, antibacterial, and antitumor effects. The interaction of IFNs with their cell-surface receptors activates several transcription factors. Four regulatory factors, ISGF3, STAT1, IRF1, and NF-κB, are essential for the function of the IFN system. The aim of this work is to develop computational approaches for recognizing the DNA binding sites of these factors, and computer programs for predicting IFN-inducible regions.
We developed computational approaches for recognizing the binding sites of ISGF3, STAT1, IRF1, and NF-κB. Analysis of the distribution of these binding sites demonstrated that, in IFN-inducible genes, the regions within 500 bp upstream of the transcription start site (positions −500 to 0) are enriched in putative binding sites for these transcription factors. Using selected combinations of sites whose frequencies were significantly higher than in other functional gene groups, we developed methods for predicting IFN-inducible promoters and enhancers. We analyzed 1,004 sequences of IFN-inducible genes compiled from microarray data analyses, as well as about 10,000 human gene sequences from the EPD and RefSeq databases; 74 of the 1,664 human genes annotated in EPD were significantly IFN-inducible.
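The site-distribution comparison and the combination-based promoter rule can be sketched as follows. As an assumption, each promoter is represented as a mapping from factor name to the positions of predicted binding sites relative to the transcription start site, with negative values upstream. The window [−500, 0) reflects the upstream region described above. The factor combinations passed to `is_ifn_inducible` are hypothetical placeholders, because the selected combinations are not listed here, and all function names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical representation: factor name -> predicted site positions relative to the TSS.
Promoter = Dict[str, List[int]]


def factors_in_window(promoter: Promoter, start: int = -500, end: int = 0) -> FrozenSet[str]:
    """Factors with at least one predicted site in the half-open window [start, end)."""
    return frozenset(f for f, sites in promoter.items() if any(start <= p < end for p in sites))


def fraction_with_factor(promoters: List[Promoter], factor: str) -> float:
    """Share of promoters carrying a site for `factor` in the upstream window."""
    return sum(factor in factors_in_window(p) for p in promoters) / len(promoters)


def is_ifn_inducible(promoter: Promoter, combinations: Iterable[FrozenSet[str]]) -> bool:
    """Predict IFN inducibility if any selected factor combination is fully present."""
    present = factors_in_window(promoter)
    return any(combo <= present for combo in combinations)


# Toy data only: two "IFN-inducible" promoters and two from another functional group.
ifn_group = [{"ISGF3": [-120], "IRF1": [-300]}, {"ISGF3": [-80], "NF-κB": [-450]}]
other_group = [{"NF-κB": [-900]}, {"STAT1": [-60]}]
combos = [frozenset({"ISGF3", "IRF1"}), frozenset({"ISGF3", "NF-κB"})]  # hypothetical

for factor in ("ISGF3", "IRF1", "STAT1", "NF-κB"):
    print(factor, fraction_with_factor(ifn_group, factor), fraction_with_factor(other_group, factor))
print([is_ifn_inducible(p, combos) for p in ifn_group + other_group])
```

A real analysis would also test whether the between-group differences in these fractions are statistically significant before selecting combinations. This toy comparison only shows the bookkeeping.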
Analyses of several control datasets demonstrated that the developed methods predict IFN-inducible genes with high accuracy. Application of these methods to several datasets suggested that the human genome contains approximately 1,500–2,000 IFN-inducible genes.
PMCID: PMC1810324  PMID: 17309789