PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-5 (5)
 

Clipboard (0)
None
Journals
Authors
more »
Year of Publication
Document Types
1.  CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments 
Bioinformatics  2010;27(5):715-717.
Summary:CompleteMOTIFs (cMOTIFs) is an integrated web tool developed to facilitate systematic discovery of overrepresented transcription factor binding motifs from high-throughput chromatin immunoprecipitation experiments. Comprehensive annotations and Boolean logic operations on multiple peak locations enable users to focus on genomic regions of interest for de novo motif discovery using tools such as MEME, Weeder and ChIPMunk. The pipeline incorporates a scanning tool for known motifs from TRANSFAC and JASPAR databases, and performs an enrichment test using local or precalculated background models that significantly improve the motif scanning result. Furthermore, using the cMOTIFs pipeline, we demonstrated that multiple transcription factors could cooperatively bind to the upstream of important stem cell differentiation regulators.
Availability: http://cmotifs.tchlab.org
Contact: sekwon.kong@childrens.harvard.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq707
PMCID: PMC3105477  PMID: 21183585
2.  CrossChip: a system supporting comparative analysis of different generations of Affymetrix arrays 
Summary
To increase compatibility between different generations of Affymetrix GeneChip arrays, we propose a method of filtering probes based on their sequences. Our method is implemented as a web-based service for downloading necessary materials for converting the raw data files (*.CEL) for comparative analysis. The user can specify the appropriate level of filtering by setting the criteria for the minimum overlap length between probe sequences and the minimum number of usable probe pairs per probe set. Our website supports a within-species comparison for human and mouse GeneChip arrays.
doi:10.1093/bioinformatics/bti288
PMCID: PMC2819168  PMID: 15684227
3.  A Multivariate Approach for Integrating Genome-wide Expression Data and Biological Knowledge 
Motivation
Several statistical methods that combine analysis of differential gene expression with biological knowledge databases have been proposed for a more rapid interpretation of expression data. However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions.
Results
We present a simple yet effective multivariate statistical procedure for assessing the correlation between a subspace defined by a group of genes and a binary phenotype. A subspace is deemed significant if the samples corresponding to different phenotypes are well separated in that subspace. The separation is measured using Hotelling’s T2 statistic, which captures the covariance structure of the subspace. When the dimension of the subspace is larger than that of the sample space, we project the original data to a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by Reactome, KEGG, BioCarta, and Gene Ontology. To demonstrate its performance, we apply this method to the data from two published studies, and we visualize the results in the principal component space.
doi:10.1093/bioinformatics/btl401
PMCID: PMC2813864  PMID: 16877751
4.  Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance 
Bioinformatics  2009;25(23):3121-3127.
Motivation: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the DIabetes Genome Anatomy Project (DGAP) was launched to profile gene expression in a variety of related animal models and human subjects. We asked whether these heterogeneous models can be integrated to provide consistent and robust biological insights into the biology of insulin resistance.
Results: We perform integrative analysis of the 16 DGAP data sets that span multiple tissues, conditions, array types, laboratories, species, genetic backgrounds and study designs. For each data set, we identify differentially expressed genes compared with control. Then, for the combined data, we rank genes according to the frequency with which they were found to be statistically significant across data sets. This analysis reveals RetSat as a widely shared component of mechanisms involved in insulin resistance and sensitivity and adds to the growing importance of the retinol pathway in diabetes, adipogenesis and insulin resistance. Top candidates obtained from our analysis have been confirmed in recent laboratory studies.
Contact: Isaac_kohane@harvard.edu
doi:10.1093/bioinformatics/btp559
PMCID: PMC2778339  PMID: 19786482
5.  Integrative analysis reveals the direct and indirect interactions between DNA copy number aberrations and gene expression changes 
Bioinformatics (Oxford, England)  2008;24(7):889-896.
Motivation
DNA copy number aberrations (CNAs) and gene expression (GE) changes provide valuable information for studying chromosomal instability and its consequences in cancer. While it is clear that the structural aberrations and the transcript levels are intertwined, their relationship is more complex and subtle than initially suspected. Most studies so far have focused on how a CNA affects the expression levels of those genes contained within that CNA.
Results
To better understand the impact of CNAs on expression, we investigated the correlation of each CNA to all other genes in the genome. The correlations are computed over multiple patients that have both expression and copy number measurements in brain, bladder, and breast cancer data sets. We find that a CNA has a direct impact on the gene amplified or deleted, but it also has a broad, indirect impact elsewhere. To identify a set of CNAs that is coordinately associated with the expression changes of a set of genes, we used a biclustering algorithm on the correlation matrix. For each of the three cancer types examined, the aberrations in several loci are associated with cancer-type specific biological pathways that have been described in the literature: CNAs of chromosome (chr) 7p13 were significantly correlated with epidermal growth factor receptor signaling pathway in glioblastoma multiforme, chr 13q with NF-kappaB cascades in bladder cancer, and chr 11p with Reck pathway in breast cancer. In all three data sets, gene sets related to cell cycle/division such as M phase, DNA replication, and cell division were also associated with CNAs. Our results suggest that CNAs are both directly and indirectly correlated with changes in expression and that it is beneficial to examine the indirect effects of CNAs.
doi:10.1093/bioinformatics/btn034
PMCID: PMC2600603  PMID: 18263644

Results 1-5 (5)