PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-5 (5)
 

Clipboard (0)
None

Select a Filter Below

Journals
Authors
more »
Year of Publication
Document Types
1.  Mapping between the OBO and OWL ontology languages 
Journal of Biomedical Semantics  2011;2(Suppl 1):S3.
Background
Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiment, taxonomies etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is by transforming OBO ontologies to OWL.
Results
We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a similar layer cake. Thus we were able to decompose the problem into two parts. Most OBO constructs have easy and obvious equivalence to a construct in OWL. A small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source.
Conclusions
Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language independent manner.
doi:10.1186/2041-1480-2-S1-S3
PMCID: PMC3105495  PMID: 21388572
2.  Mining gene functional networks to improve mass-spectrometry-based protein identification 
Bioinformatics  2009;25(22):2955-2961.
Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present and confidence in individual protein identification can be updated accordingly.
Results: We develop a method that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. Our method, MSNet, improves protein identification in shotgun proteomics experiments by considering information on functional associations from a gene functional network. MSNet substantially increases the number of proteins identified in the sample at a given error rate. We identify 8–29% more proteins than the original MS experiment when applied to yeast grown in different experimental conditions analyzed on different MS/MS instruments, and 37% more proteins in a human sample. We validate up to 94% of our identifications in yeast by presence in ground-truth reference sets.
Availability and Implementation: Software and datasets are available at http://aug.csres.utexas.edu/msnet
Contact: miranker@cs.utexas.edu, marcotte@icmb.utexas.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp461
PMCID: PMC2773251  PMID: 19633097
3.  Integrating shotgun proteomics and mRNA expression data to improve protein identification 
Bioinformatics  2009;25(11):1397-1403.
Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other information available, e.g. the probability of a protein's presence is likely to correlate with its mRNA concentration.
Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate, e.g. in yeast, MSpresso increases the number of proteins identified by ∼40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions and organisms (Escherichia coli, human), and predict 19–63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun proteomics experiments can substantially improve protein identification scores.
Availability and Implementation: Software is available upon request from the authors. Mass spectrometry datasets and supplementary information are available from http://www.marcottelab.org/MSpresso/.
Contact: marcotte@icmb.utexas.edu; miranker@cs.utexas.edu
Supplementary Information: Supplementary data website: http://www.marcottelab.org/MSpresso/.
doi:10.1093/bioinformatics/btp168
PMCID: PMC2682515  PMID: 19318424
4.  Biclustering as a method for RNA local multiple sequence alignment 
Bioinformatics (Oxford, England)  2007;23(24):3289-3296.
Motivations
Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address.
Results
We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Also, alignments were evaluated of two large datasets of current biological interest, T box sequences and Group IC1 Introns. The results were compared with alignments computed by ClustalW, MAFFT, MUCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count.
Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions.
doi:10.1093/bioinformatics/btm485
PMCID: PMC2228335  PMID: 17921494
5.  Predicting combinatorial binding of transcription factors to regulatory elements in the human genome by association rule mining 
BMC Bioinformatics  2007;8:445.
Background
Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment.
Results
Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Conclusion
Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
doi:10.1186/1471-2105-8-445
PMCID: PMC2211755  PMID: 18005433

Results 1-5 (5)