Results 1-9 (9)
 

1.  Genomic Promoter Analysis Predicts Functional Transcription Factor Binding 
Advances in Bioinformatics  2008;2008:369830.
Background
The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology.
Results
We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of those pairs, 9107 genes contained conserved TFBS in the 3 kb proximal promoter and first intron. To attempt to predict in vivo occupancy of transcription factor binding sites, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering and predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84%.
Conclusion
Our analyses show that integration of chromatin immunoprecipitation data with conserved TFBS analysis can be used to generate accurate predictions of functional TFBS. They also show that TFBS cooccurrence can be used to predict transcription factor binding to promoters in vivo.
doi:10.1155/2008/369830
PMCID: PMC2768302  PMID: 19865592
2.  Genomic Promoter Analysis Predicts Functional Transcription Factor Binding 
Advances in Bioinformatics  2008;2008:369830.
Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of those pairs, 9107 genes contained conserved TFBS in the 3 kb proximal promoter and first intron. To attempt to predict in vivo occupancy of transcription factor binding sites, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering and predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84%. Conclusion. Our analyses show that integration of chromatin immunoprecipitation data with conserved TFBS analysis can be used to generate accurate predictions of functional TFBS. They also show that TFBS cooccurrence can be used to predict transcription factor binding to promoters in vivo.
doi:10.1155/2008/369830
PMCID: PMC2768302  PMID: 19865592
3.  A Pathway Analysis Tool for Analyzing Microarray Data of Species with Low Physiological Information 
Advances in Bioinformatics  2008;2008:719468.
Pathway information provides insight into the biological processes underlying microarray data. Pathway information is widely available in online databases for humans and laboratory animals, but less so for other species such as livestock. Many software packages use species-specific gene IDs and therefore cannot handle genomics data from other species. We developed a species-independent method to search pathway databases to analyse microarray data. Three Perl scripts were developed that use the names of the genes on the microarray. The scripts (1) add synonyms of gene names by searching the Gene Ontology (GO) database, (2) search the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database for pathway information using this GO-enriched gene list, and (3) combine the pathway data with the microarray data and visualize the results using color codes indicating regulation. To demonstrate the power of the method, we used a previously reported chicken microarray experiment investigating line-specific reactions to Salmonella infection as an example.
doi:10.1155/2008/719468
PMCID: PMC2775695  PMID: 19920988
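The three steps listed in the abstract above (synonym expansion, pathway lookup, color-coded combination) can be illustrated with a minimal Python sketch. It is not the authors' Perl code: the in-memory dictionaries stand in for the GO and KEGG databases, every gene, synonym, and pathway name is hypothetical, and the red/green/grey color scheme and the threshold of 1.0 are assumptions made only for illustration.

```python
# Toy stand-ins for the GO synonym lookup and the KEGG pathway lookup.
# All gene, synonym, and pathway names below are hypothetical.
GO_SYNONYMS = {"geneA": ["geneA_alias"], "geneB": []}
KEGG_PATHWAYS = {"geneA_alias": ["pathway_1"], "geneB": ["pathway_1", "pathway_2"]}


def add_synonyms(gene_names, synonyms):
    """Step 1: expand each gene name with its known synonyms."""
    return {g: [g] + synonyms.get(g, []) for g in gene_names}


def find_pathways(names_by_gene, pathways):
    """Step 2: collect pathways matched by any name or synonym of a gene."""
    result = {}
    for gene, names in names_by_gene.items():
        found = set()
        for name in names:
            found.update(pathways.get(name, []))
        result[gene] = sorted(found)
    return result


def color_for(log_ratio, threshold=1.0):
    """Step 3 helper: map a log expression ratio to an (assumed) color code."""
    if log_ratio >= threshold:
        return "red"    # up-regulated
    if log_ratio <= -threshold:
        return "green"  # down-regulated
    return "grey"       # unchanged


def annotate(expression, pathways_by_gene):
    """Step 3: combine pathway membership with expression colors."""
    table = {}
    for gene, log_ratio in expression.items():
        for pathway in pathways_by_gene.get(gene, []):
            table.setdefault(pathway, []).append((gene, color_for(log_ratio)))
    return table


# Hypothetical log expression ratios keyed by the gene name on the array.
expression = {"geneA": 2.3, "geneB": -1.5}
names = add_synonyms(expression, GO_SYNONYMS)
pathways = find_pathways(names, KEGG_PATHWAYS)
report = annotate(expression, pathways)
```

In this toy run, geneA reaches a pathway only through its synonym, which is the situation the synonym-expansion step is meant to cover.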
4.  NCR-PCOPGene: An Exploratory Tool for Analysis of Sample-Classes Effect on Gene-Expression Relationships 
Advances in Bioinformatics  2008;2008:789026.
Background. Microarray technology is both expensive and powerful, so it is essential to extract maximum value from microarray data. Our tools allow researchers to formulate and test anything from a single hypothesis to entire models. Results. The objective of NCR-PCOPGene is to study the relationships among gene expressions under different conditions, to classify these conditions, and to study their effect on the different relationships. The web application makes it easier to define the sample classes, grouping the microarray experiments either by using (a) biological, statistical, or any other previous knowledge or (b) their effect on the expression relationship maintained among specific genes of interest. By means of the type (a) class definition, the researcher can add biological information to the gene-expression relationships. The type (b) class definition allows for linking genes correlated neither linearly nor nonlinearly. Conclusions. The PCOPGene tools are especially suitable for microarrays with large sample series. This application helps to identify cellular states and the genes involved in them in a flexible way. The application takes advantage of the ability of our system to relate gene expressions, even when these relationships are noncontinuous and cannot be found using linear or nonlinear analytical methods.
doi:10.1155/2008/789026
PMCID: PMC2775662  PMID: 19920990
5.  Metagenome Fragment Classification Using N-Mer Frequency Profiles 
Advances in Bioinformatics  2008;2008:205969.
A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, that is, all the DNA present in an environmental sample. A major obstacle in metagenomics is the difficulty of achieving accurate classification with technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.
doi:10.1155/2008/205969
PMCID: PMC2777009  PMID: 19956701
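The idea in the abstract above, training a naive Bayes classifier on per-genome N-mer frequency profiles and assigning each fragment to the best-scoring genome, can be sketched in a few lines of Python. This is a toy illustration rather than the authors' NBC implementation: the two "genomes" are made-up repeats, the add-one smoothing and uniform genome priors are assumptions, and the alphabet is assumed to be A, C, G, T.

```python
import math
from collections import Counter


def nmer_counts(seq, n):
    """Count overlapping N-mers in a DNA sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def train(genomes, n):
    """Build a log-probability N-mer profile per genome (add-one smoothing)."""
    vocab = 4 ** n  # number of possible N-mers over A, C, G, T
    profiles = {}
    for name, seq in genomes.items():
        counts = nmer_counts(seq, n)
        log_denom = math.log(sum(counts.values()) + vocab)
        log_probs = {k: math.log(c + 1) - log_denom for k, c in counts.items()}
        # Second item is the smoothed log-probability of an unseen N-mer.
        profiles[name] = (log_probs, -log_denom)
    return profiles


def classify(fragment, profiles, n):
    """Return the genome with the highest log-likelihood (uniform priors)."""
    fragment_counts = nmer_counts(fragment, n)
    scores = {}
    for name, (log_probs, unseen) in profiles.items():
        scores[name] = sum(
            count * log_probs.get(nmer, unseen)
            for nmer, count in fragment_counts.items()
        )
    return max(scores, key=scores.get)


# Hypothetical toy genomes; real profiles would come from full sequences.
genomes = {
    "genome_at": "ATATATATATATATATATAT",
    "genome_gc": "GCGCGCGCGCGCGCGCGCGC",
}
profiles = train(genomes, n=3)
label_at = classify("ATATATAT", profiles, n=3)
label_gc = classify("GCGCGC", profiles, n=3)
```

Because every genome receives a real-valued log-likelihood, exact ties are unlikely in practice, which is consistent with the abstract's remark about avoiding the ambiguity of tied top scores.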
6.  Comparing Quantitative Trait Loci and Gene Expression Data 
Advances in Bioinformatics  2008;2008:719818.
We develop methods to compare the positions of quantitative trait loci (QTL) with a set of genes selected by other methods, such as microarray experiments, from a sequenced genome. We apply our methods to QTL for addictive behavior in mouse and to a set of genes upregulated in the nucleus accumbens (NA), a region of the brain associated with addictive behavior. The association between the QTL and NA genes is not significantly stronger than expected by chance. However, chromosomes 2 and 16 do show strong associations, suggesting that genes on these chromosomes might be associated with addictive behavior. The statistical methodology developed for this study can be applied to similar studies to assess the mutual information in microarray and QTL analyses.
doi:10.1155/2008/719818
PMCID: PMC2775685  PMID: 19920989
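One generic way to ask whether a gene set overlaps QTL regions more than expected by chance, as in the abstract above, is a permutation test. The Python sketch below is not the statistical methodology of the paper, which the abstract does not specify. It assumes a single chromosome, uniformly random gene placement under the null hypothesis, and made-up coordinates.

```python
import random


def genes_in_qtl(gene_positions, qtl_intervals):
    """Count genes whose position lies inside any QTL interval (inclusive)."""
    return sum(
        any(start <= pos <= end for start, end in qtl_intervals)
        for pos in gene_positions
    )


def permutation_p_value(gene_positions, qtl_intervals, chrom_length,
                        n_perm=1000, seed=0):
    """Estimate how often random gene placement matches the observed overlap."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = genes_in_qtl(gene_positions, qtl_intervals)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.randint(1, chrom_length) for _ in gene_positions]
        if genes_in_qtl(shuffled, qtl_intervals) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical coordinates on a chromosome of length 1000.
gene_positions = [10, 20, 30, 500]
qtl_intervals = [(5, 35)]
observed, p_value = permutation_p_value(gene_positions, qtl_intervals, 1000)
```

A genome-wide version would repeat this per chromosome or sample positions across all chromosomes, which is how a chromosome-specific signal could appear even when the overall association is not significant.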
7.  Genevestigator V3: A Reference Expression Database for the Meta-Analysis of Transcriptomes 
Advances in Bioinformatics  2008;2008:420747.
The Web-based software tool Genevestigator provides powerful tools for biologists to explore gene expression across a wide variety of biological contexts. Its first releases, however, were limited by the scaling ability of the system architecture, multiorganism data storage and analysis capability, and availability of computationally intensive analysis methods. Genevestigator V3 is a novel meta-analysis system resulting from new algorithmic and software development using a client/server architecture, large-scale manual curation and quality control of microarray data for several organisms, and curation of pathway data for mouse and Arabidopsis. In addition to improved querying features, Genevestigator V3 provides new tools to analyze the expression of genes in many different contexts, to identify biomarker genes, to cluster genes into expression modules, and to model expression responses in the context of metabolic and regulatory networks. Being a reference expression database with user-friendly tools, Genevestigator V3 facilitates discovery research and hypothesis validation.
doi:10.1155/2008/420747
PMCID: PMC2777001  PMID: 19956698
8.  A Tutorial of the Poisson Random Field Model in Population Genetics 
Advances in Bioinformatics  2008;2008:257864.
Population genetics is the study of allele frequency changes driven by various evolutionary forces such as mutation, natural selection, and random genetic drift. Although natural selection is widely recognized as a bona fide phenomenon, the extent to which it drives evolution remains unclear and controversial. Various qualitative techniques, or so-called “tests of neutrality”, have been introduced to detect signatures of natural selection. A decade and a half ago, Stanley Sawyer and Daniel Hartl provided a mathematical framework, referred to as the Poisson random field (PRF), with which to determine quantitatively the intensity of selection on a particular gene or genomic region. The recent availability of large-scale genetic polymorphism data has sparked widespread interest in genome-wide investigations of natural selection. To that end, the original PRF model is of particular interest to geneticists and evolutionary genomicists. In this article, we provide a tutorial on the mathematical derivation of the original Sawyer and Hartl PRF model.
doi:10.1155/2008/257864
PMCID: PMC2775679  PMID: 19920987
9.  Automated Quantitative Assessment of Proteins' Biological Function in Protein Knowledge Bases 
Advances in Bioinformatics  2008;2008:897019.
Primary protein sequence data are archived in databases together with information regarding corresponding biological functions. In this respect, UniProt/Swiss-Prot is currently the most comprehensive collection and it is routinely cross-examined when trying to unravel the biological role of hypothetical proteins. Bioscientists frequently extract single entries and further evaluate those on a subjective basis. In lieu of a standardized procedure for scoring the existing knowledge regarding individual proteins, we report here a computer-assisted method, which we applied to score the present knowledge about any given Swiss-Prot entry. Applying this quantitative score allows the comparison of proteins with respect to their sequence yet highlights the comprehension of functional data. pfs analysis may also be applied for quality control of individual entries or for database management in order to rank entry listings.
doi:10.1155/2008/897019
PMCID: PMC2774577  PMID: 19920991
