
Results 1-15 (15)

1.  Improved breast cancer prognosis through the combination of clinical and genetic markers 
Accurate prognosis of breast cancer can spare a significant number of breast cancer patients from receiving unnecessary adjuvant systemic treatment and its related expensive medical costs. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical disease recurrence. However, these studies all attempt to develop genetic marker-based prognostic systems to replace the existing clinical criteria, while ignoring the rich information contained in established clinical markers. Given the complexity of breast cancer prognosis, a more practical strategy would be to utilize both clinical and genetic marker information that may be complementary.
A computational study is performed on publicly available microarray data, which has spawned a 70-gene prognostic signature. The recently proposed I-RELIEF algorithm is used to identify a hybrid signature through the combination of both genetic and clinical markers. A rigorous experimental protocol is used to estimate the prognostic performance of the hybrid signature and other prognostic approaches. Survival data analysis is performed to compare different prognostic approaches.
The hybrid signature performs significantly better than other methods, including the 70-gene signature, clinical markers alone and the St. Gallen consensus criterion. At the 90% sensitivity level, the hybrid signature achieves 67% specificity, as compared to 47% for the 70-gene signature and 48% for the clinical markers. The odds ratio of the hybrid signature for developing distant metastases within five years between the patients with a good prognosis signature and the patients with a bad prognosis signature is 21.0 (95% CI: 6.5–68.3), far higher than either genetic or clinical markers alone.
PMCID: PMC3431620  PMID: 17130137
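The two summary measures quoted in this abstract (specificity at a fixed sensitivity, and an odds ratio with a 95% confidence interval) can be illustrated with a minimal sketch. The function names, the score convention (higher score means higher risk) and the Woolf log-based interval are assumptions made for illustration; the abstract does not say how the authors computed their interval, and no data from the study are used here.

```python
import math


def specificity_at_sensitivity(scores, labels, target=0.90):
    """Specificity at the highest threshold that still reaches the target sensitivity.

    scores: higher means higher predicted risk (assumed convention).
    labels: 1 = event (e.g. distant metastasis within five years), 0 = no event.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Number of positives that must be flagged to reach the target sensitivity.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    threshold = pos[k - 1]  # flag as poor prognosis when score >= threshold
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a: poor-prognosis signature, event      b: poor-prognosis signature, no event
    c: good-prognosis signature, event      d: good-prognosis signature, no event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(odds_ratio_ci(20, 10, 5, 50))
```

Fixing sensitivity first and then reporting specificity, as the abstract does, makes classifiers comparable at the operating point that matters clinically: few patients who will relapse are missed, and the remaining question is how many low-risk patients are spared treatment.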
2.  Comments on ‘Bayesian hierarchical error model for analysis of gene expression data’ 
Bioinformatics (Oxford, England)  2006;22(19):2446-2452.
PMCID: PMC2904753  PMID: 16731698
3.  Building chromosome-wide LD maps 
Bioinformatics (Oxford, England)  2006;22(16):1933-1934.
BMapBuilder builds maps of pairwise linkage disequilibrium (LD) in either two or three dimensions. The optimized resolution allows for graphical display of LD for single nucleotide polymorphisms (SNPs) in a whole chromosome.
PMCID: PMC2893229  PMID: 16782726
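For readers unfamiliar with pairwise LD, the standard measures D, D' and r² for two biallelic SNPs can be computed from haplotype counts as sketched below. This uses the textbook definitions only; it is not BMapBuilder's code, and the function name and argument order are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from the four haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second SNP
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")

    d = p_AB - p_A * p_B
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2
```

A chromosome-wide map is then a matrix of one of these values for every SNP pair, which is why display resolution becomes the limiting factor for whole-chromosome plots.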
4.  Bio-Ontologies and Text: Bridging the Modeling Gap Between 
Bioinformatics (Oxford, England)  2006;22(19):2421-2429.
Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. To facilitate the computational reuse and integration of information buried in unstructured text, we propose a schema that represents a comprehensive set of biological entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides links from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination, and knowledge management of heterogeneous information. A comprehensive representation for otherwise heterogeneous datasets, such as the one proposed, is critical for advancing systems biology because it allows for acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text.
A novel representational schema, PGschema, was developed that enables translation of information in textual narratives to a well-defined data structure comprising genotypic and phenotypic concepts from established ontologies along with modifiers and relationships. Initial evaluation for coverage of a selected set of entities showed that 85% of the information could be represented. Moreover, PGschema can be realized automatically in an XML format by using natural language techniques to process the text.
PMCID: PMC2879055  PMID: 16870928
5.  NvMap: automated analysis of NMR chemical shift perturbation data 
Bioinformatics (Oxford, England)  2006;23(3):378-380.
NMR chemical shift perturbation experiments are widely used to define binding sites in biomolecular complexes. Especially in the case of high throughput screening of ligands, rapid analysis of NMR spectra is essential. NvMap extends NMRViewJ and provides a means for rapid assignments and book-keeping of NMR titration spectra. Our module offers options to analyze multiple titration spectra both separately and sequentially, where the sequential spectra are analyzed either two at a time or all simultaneously. The first option is suitable for slow or intermediate exchange rates between free and bound proteins. The latter option is particularly useful for fast exchange situations and can compensate for the lack of indicators for overlapped peaks. Our module also provides a simple user interface to automate the analysis process from dataset to peak list. We demonstrate the effectiveness of our program using NMR spectra of SUMO in complexes with three different peptides.
PMCID: PMC2862991  PMID: 17118956
6.  Ribostral: an RNA 3D alignment analyzer and viewer based on basepair isostericities 
Bioinformatics (Oxford, England)  2006;22(17):2168-2170.
RNA atomic resolution structures have revealed the existence of different families of basepair interactions, each with its own isosteric sub-families. Ribostral (Ribonucleic Structural Aligner) is a user-friendly framework for analyzing, evaluating and viewing RNA sequence alignments with at least one available atomic resolution structure. It is the first of its kind to superpose the isostericity matrices of basepairs observed in the structure directly onto sequence alignments, clearly indicating allowed and unallowed substitutions at each basepair (BP) position. Potential mistakes in the alignments can then be corrected using other sequence editing software. Ribostral has been developed and tested under Windows XP, and can run on any PC or Mac platform with MATLAB 7.1 (SP3) or higher installed. A stand-alone version is also available for the PC platform.
PMCID: PMC2837919  PMID: 16820430
7.  It is time to end the patenting of software 
Bioinformatics (Oxford, England)  2006;22(12):1416-1417.
PMCID: PMC2836512  PMID: 16766564
8.  A Multivariate Approach for Integrating Genome-wide Expression Data and Biological Knowledge 
Several statistical methods that combine analysis of differential gene expression with biological knowledge databases have been proposed for a more rapid interpretation of expression data. However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions.
We present a simple yet effective multivariate statistical procedure for assessing the correlation between a subspace defined by a group of genes and a binary phenotype. A subspace is deemed significant if the samples corresponding to different phenotypes are well separated in that subspace. The separation is measured using Hotelling’s T² statistic, which captures the covariance structure of the subspace. When the dimension of the subspace is larger than that of the sample space, we project the original data to a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by Reactome, KEGG, BioCarta, and Gene Ontology. To demonstrate its performance, we apply this method to the data from two published studies, and we visualize the results in the principal component space.
PMCID: PMC2813864  PMID: 16877751
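The separation measure named in this abstract, the two-sample Hotelling's T² statistic, can be sketched in plain Python as follows. This is the textbook pooled-covariance form, T² = (n1·n2/(n1+n2)) · dᵀ S⁻¹ d, where d is the difference of the group mean vectors and S the pooled covariance. The helper names are hypothetical, and the projection step the authors use when the number of genes exceeds the number of samples is not implemented; in that situation the sketch raises an error instead.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the group mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance is singular; reduce the dimension first")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def hotelling_t2(group1, group2):
    """Two-sample Hotelling's T^2; each group is a list of samples (lists of gene values)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    p = len(m1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    weighted = solve(pooled, diff)  # S^-1 d without forming the inverse explicitly
    return (n1 * n2) / (n1 + n2) * sum(d * w for d, w in zip(diff, weighted))
```

With a single gene the statistic reduces to the square of the ordinary two-sample t statistic, which is a convenient sanity check; with several genes the pooled covariance is what lets correlated genes contribute jointly rather than through separate univariate tests.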
9.  PrepMS: TOF MS Data Graphical Preprocessing Tool 
Bioinformatics (Oxford, England)  2006;23(2):264-265.
We introduce a simple-to-use graphical tool that enables researchers to easily prepare time-of-flight mass spectrometry data for analysis. For ease of use, the graphical executable provides default parameter settings experimentally determined to work well in most situations. These values can be changed by the user if desired. PrepMS is a stand-alone application made freely available (open source), and is under the General Public License (GPL). Its graphical user interface, default parameter settings, and display plots allow PrepMS to be used effectively for data preprocessing, peak detection, and visual data quality assessment.
PMCID: PMC2633108  PMID: 17121773
10.  THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures 
Bioinformatics (Oxford, England)  2006;22(17):2171-2172.
THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble.
PMCID: PMC2584349  PMID: 16777907
11.  Babel's tower revisited: A universal resource for cross-referencing across annotation databases 
Bioinformatics (Oxford, England)  2006;22(23):2934-2939.
Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological entities. As we show in this paper, incoherent name spaces between various databases represent a serious impediment to using the existing annotations at their full potential. Navigating between various such name spaces by mapping IDs from one database to another is a very important issue which is not properly addressed at the moment.
We have developed a web-based resource, Onto-Translate (OT), which effectively addresses this problem. OT can map different types of biological entities onto each other across the following annotation databases: Swiss-Prot, TrEMBL, NREF, PIR, Gene Ontology, KEGG, Entrez Gene, GenBank, GenPept, IMAGE, RefSeq, UniGene, OMIM, PDB, Eukaryotic Promoter Database, HUGO Gene Nomenclature Committee and NetAffx. Currently, OT is able to perform 462 types of mappings between 29 different types of IDs from 17 databases concerning 53 organisms. Among these, over 300 types of translations and 15 types of IDs are not currently supported by any other tool or resource. On average, OT is able to correctly map between 96% and 99% of the biological entities provided as input. In terms of speed, sets of approximately 20,000 IDs can be translated in under 30 seconds, in most cases.
Onto-Translate is a part of Onto-Tools, which is freely available at
PMCID: PMC2435247  PMID: 17068090
12.  High-resolution spatial normalization for microarrays containing embedded technical replicates 
Bioinformatics (Oxford, England)  2006;22(24):3054-3060.
Microarray data are susceptible to a wide range of artifacts, many of which occur on physical scales comparable to the spatial dimensions of the array. These artifacts introduce biases that are spatially correlated. The ability of current methodologies to detect and correct such biases is limited.
We introduce a new approach for analyzing spatial artifacts, termed ‘conditional residual analysis for microarrays’ (CRAM). CRAM requires a microarray design that contains technical replicates of representative features and a limited number of negative controls, but is free of the assumptions that constrain existing analytical procedures. The key idea is to extract residuals from sets of matched replicates to generate residual images. The residual images reveal spatial artifacts with single-feature resolution. Surprisingly, spatial artifacts were found to coexist independently as additive and multiplicative errors. Efficient procedures for bias estimation were devised to correct the spatial artifacts on both intensity scales. In a survey of 484 published single-channel datasets, variance fell 4- to 12-fold in 5% of the datasets after bias correction. Thus, inclusion of technical replicates in a microarray design affords benefits far beyond what one might expect with a conventional ‘n = 5’ averaging, and should be considered when designing any microarray for which randomization is feasible.
PMCID: PMC2262854  PMID: 17060357
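The key idea described in this abstract, extracting residuals from sets of matched replicates and placing them at their array positions, can be illustrated with a simplified sketch. It is not the published CRAM procedure: centering each replicate set on its median, the input tuple layout and the function name are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def replicate_residuals(features):
    """Map (row, col) -> residual of a feature relative to its replicate set.

    features: iterable of (probe_id, row, col, log_intensity); features that share
    a probe_id are treated as technical replicates placed at different positions.
    """
    by_probe = defaultdict(list)
    for probe_id, row, col, value in features:
        by_probe[probe_id].append((row, col, value))

    residuals = {}
    for replicates in by_probe.values():
        if len(replicates) < 2:
            continue  # a lone feature has nothing to be compared against
        center = median(value for _, _, value in replicates)
        for row, col, value in replicates:
            # Replicates measure the same quantity, so what remains after
            # centering reflects position-dependent bias plus noise.
            residuals[(row, col)] = value - center
    return residuals
```

Plotting the returned values on the array grid gives a residual image in which spatially correlated biases show up as contiguous regions of same-signed residuals.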
13.  Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset 
Bioinformatics (Oxford, England)  2006;22(14):1737-1744.
Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional “noise” introduced by non-informative measurements.
We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters.
The open-source package gimm is available at
PMCID: PMC1617036  PMID: 16709591
14.  Global topological features of cancer proteins in the human interactome 
Bioinformatics (Oxford, England)  2006;22(18):2291-2297.
The study of interactomes, or networks of protein-protein interactions, is increasingly providing valuable information on biological systems. Here we report a study of cancer proteins in an extensive human protein-protein interaction network constructed by computational methods.
We show that human proteins translated from known cancer genes exhibit a network topology that is different from that of proteins not documented as being mutated in cancer. In particular, cancer proteins show an increase in the number of proteins they interact with. They also appear to participate in central hubs rather than peripheral ones, mirroring their greater centrality and participation in networks that form the backbone of the proteome. Moreover, we show that cancer proteins contain a high ratio of highly promiscuous structural domains, i.e., domains with a high propensity for mediating protein interactions. These observations indicate an underlying evolutionary distinction between the two groups of proteins, reflecting the central roles of proteins, whose mutations lead to cancer.
PMCID: PMC1865486  PMID: 16844706
15.  COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations 
Bioinformatics (Oxford, England)  2006;22(7):779-788.
Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein–protein interactions. Unfortunately, determining orthology relations through computational methods is not straightforward due to the presence of paralogs. Traditional approaches have relied on pairwise sequence comparisons to construct graphs, which were then partitioned into putative clusters of orthologous groups. These methods do not attempt to preserve the non-transitivity and hierarchic nature of the orthology relation.
We propose a new method, COCO-CL, for hierarchical clustering of homology relations and identification of orthologous groups of genes. Unlike previous approaches, which are based on pairwise sequence comparisons, our method explores the correlation of evolutionary histories of individual genes in a more global context. COCO-CL can be used as a semi-independent method to delineate the orthology/paralogy relation for a refined set of homologous proteins obtained using a less-conservative clustering approach, or as a refiner that removes putative out-paralogs from clusters computed using a more inclusive approach. We analyze our clustering results manually, with support from literature and functional annotations. Since our orthology determination procedure does not employ a species tree to infer duplication events, it can be used in situations when the species tree is unknown or uncertain.
PMCID: PMC1620014  PMID: 16434444
