Results 1-13 (13)
 

1.  BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains 
Katayama, Toshiaki | Wilkinson, Mark D | Aoki-Kinoshita, Kiyoko F | Kawashima, Shuichi | Yamamoto, Yasunori | Yamaguchi, Atsuko | Okamoto, Shinobu | Kawano, Shin | Kim, Jin-Dong | Wang, Yue | Wu, Hongyan | Kano, Yoshinobu | Ono, Hiromasa | Bono, Hidemasa | Kocbek, Simon | Aerts, Jan | Akune, Yukie | Antezana, Erick | Arakawa, Kazuharu | Aranda, Bruno | Baran, Joachim | Bolleman, Jerven | Bonnal, Raoul JP | Buttigieg, Pier Luigi | Campbell, Matthew P | Chen, Yi-an | Chiba, Hirokazu | Cock, Peter JA | Cohen, K Bretonnel | Constantin, Alexandru | Duck, Geraint | Dumontier, Michel | Fujisawa, Takatomo | Fujiwara, Toyofumi | Goto, Naohisa | Hoehndorf, Robert | Igarashi, Yoshinobu | Itaya, Hidetoshi | Ito, Maori | Iwasaki, Wataru | Kalaš, Matúš | Katoda, Takeo | Kim, Taehong | Kokubu, Anna | Komiyama, Yusuke | Kotera, Masaaki | Laibe, Camille | Lapp, Hilmar | Lütteke, Thomas | Marshall, M Scott | Mori, Takaaki | Mori, Hiroshi | Morita, Mizuki | Murakami, Katsuhiko | Nakao, Mitsuteru | Narimatsu, Hisashi | Nishide, Hiroyo | Nishimura, Yosuke | Nystrom-Persson, Johan | Ogishima, Soichi | Okamura, Yasunobu | Okuda, Shujiro | Oshita, Kazuki | Packer, Nicki H | Prins, Pjotr | Ranzinger, Rene | Rocca-Serra, Philippe | Sansone, Susanna | Sawaki, Hiromichi | Shin, Sung-Ho | Splendiani, Andrea | Strozzi, Francesco | Tadaka, Shu | Toukach, Philip | Uchiyama, Ikuo | Umezaki, Masahito | Vos, Rutger | Whetzel, Patricia L | Yamada, Issaku | Yamasaki, Chisato | Yamashita, Riu | York, William S | Zmasek, Christian M | Kawamoto, Shoko | Takagi, Toshihisa
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
doi:10.1186/2041-1480-5-5
PMCID: PMC3978116  PMID: 24495517
BioHackathon; Bioinformatics; Semantic Web; Web services; Ontology; Visualization; Knowledge representation; Databases; Semantic interoperability; Data models; Data sharing; Data integration
2.  The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies 
Background
BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.
Results
The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization.
Conclusion
We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.
doi:10.1186/2041-1480-4-6
PMCID: PMC3598643  PMID: 23398680
BioHackathon; Open source; Software; Semantic Web; Databases; Data integration; Data visualization; Web services; Interfaces
3.  Towards linked open gene mutations data 
BMC Bioinformatics  2012;13(Suppl 4):S7.
Background
With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework.
In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data.
Methods
A version of the IARC TP53 Mutation database implemented in a relational database was used as a first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest.
Since D2RQ query performance is lower than that achievable with an RDF archive, the generated data was also loaded into a dedicated system based on tools from the Jena software suite.
Results
We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application.
Conclusions
This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development.
The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.
doi:10.1186/1471-2105-13-S4-S7
PMCID: PMC3303732  PMID: 22536974
4.  Semantic Web Applications and Tools for the Life Sciences: SWAT4LS 2010 
BMC Bioinformatics  2012;13(Suppl 1):S1.
As Semantic Web technologies mature and new releases of key elements, such as SPARQL 1.1 and OWL 2.0, become available, the Life Sciences continue to push the boundaries of these technologies with ever more sophisticated tools and applications. Unsurprisingly, therefore, interest in the SWAT4LS (Semantic Web Applications and Tools for the Life Sciences) activities has remained high, as was evident during the third international SWAT4LS workshop held in Berlin in December 2010. Contributors to this workshop were invited to submit extended versions of their papers, the best of which are now made available in the special supplement of BMC Bioinformatics. The papers reflect the wide range of work in this area, covering the storage and querying of Life Sciences data in RDF triple stores, tools for the development of biomedical ontologies and the semantics-based integration of Life Sciences as well as clinical data.
doi:10.1186/1471-2105-13-S1-S1
PMCID: PMC3471345  PMID: 22373274
5.  Gauging triple stores with actual biological data 
BMC Bioinformatics  2012;13(Suppl 1):S3.
Background
Semantic Web technologies have been developed to overcome the limitations of the current Web and conventional data integration solutions. The Semantic Web is expected to link all the data present on the Internet instead of linking just documents. One of the foundations of the Semantic Web technologies is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. The Semantic Web technologies should allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries as applied to a number of different triple store implementations.
Results
Previously we developed a library of queries to guide the use of our knowledge base Cell Cycle Ontology implemented as a triple store. We have now compared the performance of these queries on five non-commercial triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined three performance aspects: the data uploading time, the query execution time and the scalability. The queries we had chosen addressed diverse ontological or biological questions, and we found that individual store performance was quite query-specific. We identified three groups of queries displaying similar behaviour across the different stores: 1) relatively short response time queries, 2) moderate response time queries and 3) relatively long response time queries. SwiftOWLIM proved to be a winner in the first group, 4Store in the second one and Virtuoso in the third one.
Conclusions
Our analysis showed that some queries behaved idiosyncratically, in a triple-store-specific manner, mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, displayed a very balanced performance: its load time and its response time for all the tested queries were better than average among the selected stores, and it showed very good scalability and reasonable run-to-run reproducibility. Jena SDB and Jena TDB were consistently slower than the other three implementations. Our analysis demonstrated that most queries developed for Virtuoso could be successfully used for other implementations.
doi:10.1186/1471-2105-13-S1-S3
PMCID: PMC3471352  PMID: 22373359
6.  Knowledge sharing and collaboration in translational research, and the DC-THERA Directory 
Briefings in Bioinformatics  2011;12(6):562-575.
Biomedical research relies increasingly on large collections of data sets and knowledge whose generation, representation and analysis often require large collaborative and interdisciplinary efforts. This dimension of ‘big data’ research calls for the development of computational tools to manage such a vast amount of data, as well as tools that can improve communication and access to information from collaborating researchers and from the wider community. Whenever research projects have a defined temporal scope, an additional issue of data management arises, namely how the knowledge generated within the project can be made available beyond its boundaries and life-time. DC-THERA is a European ‘Network of Excellence’ (NoE) that spawned a very large collaborative and interdisciplinary research community, focusing on the development of novel immunotherapies derived from fundamental research in dendritic cell immunobiology. In this article we introduce the DC-THERA Directory, which is an information system designed to support knowledge management for this research community and beyond. We present how the use of metadata and Semantic Web technologies can effectively help to organize the knowledge generated by modern collaborative research, how these technologies can enable effective data management solutions during and beyond the project lifecycle, and how resources such as the DC-THERA Directory fit into the larger context of e-science.
doi:10.1093/bib/bbr051
PMCID: PMC3220873  PMID: 21969471
semantic web; ontology; immunology; eScience; data integration
7.  BioPAX – A community standard for pathway data sharing 
Demir, Emek | Cary, Michael P. | Paley, Suzanne | Fukuda, Ken | Lemer, Christian | Vastrik, Imre | Wu, Guanming | D’Eustachio, Peter | Schaefer, Carl | Luciano, Joanne | Schacherer, Frank | Martinez-Flores, Irma | Hu, Zhenjun | Jimenez-Jacinto, Veronica | Joshi-Tope, Geeta | Kandasamy, Kumaran | Lopez-Fuentes, Alejandra C. | Mi, Huaiyu | Pichler, Elgar | Rodchenkov, Igor | Splendiani, Andrea | Tkachev, Sasha | Zucker, Jeremy | Gopinath, Gopal | Rajasimha, Harsha | Ramakrishnan, Ranjani | Shah, Imran | Syed, Mustafa | Anwar, Nadia | Babur, Ozgun | Blinov, Michael | Brauner, Erik | Corwin, Dan | Donaldson, Sylva | Gibbons, Frank | Goldberg, Robert | Hornbeck, Peter | Luna, Augustin | Murray-Rust, Peter | Neumann, Eric | Reubenacker, Oliver | Samwald, Matthias | van Iersel, Martijn | Wimalaratne, Sarala | Allen, Keith | Braun, Burk | Whirl-Carrillo, Michelle | Dahlquist, Kam | Finney, Andrew | Gillespie, Marc | Glass, Elizabeth | Gong, Li | Haw, Robin | Honig, Michael | Hubaut, Olivier | Kane, David | Krupa, Shiva | Kutmon, Martina | Leonard, Julie | Marks, Debbie | Merberg, David | Petri, Victoria | Pico, Alex | Ravenscroft, Dean | Ren, Liya | Shah, Nigam | Sunshine, Margot | Tang, Rebecca | Whaley, Ryan | Letovksy, Stan | Buetow, Kenneth H. | Rzhetsky, Andrey | Schachter, Vincent | Sobral, Bruno S. | Dogrusoz, Ugur | McWeeney, Shannon | Aladjem, Mirit | Birney, Ewan | Collado-Vides, Julio | Goto, Susumu | Hucka, Michael | Le Novère, Nicolas | Maltsev, Natalia | Pandey, Akhilesh | Thomas, Paul | Wingender, Edgar | Karp, Peter D. | Sander, Chris | Bader, Gary D.
Nature biotechnology  2010;28(9):935-942.
BioPAX (Biological Pathway Exchange) is a standard language to represent biological pathways at the molecular and cellular level. Its major use is to facilitate the exchange of pathway data (http://www.biopax.org). Pathway data captures our understanding of biological processes, but its rapid growth necessitates development of databases and computational tools to aid interpretation. However, the current fragmentation of pathway information across many databases with incompatible formats presents barriers to its effective use. BioPAX solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. BioPAX was created through a community process. Through BioPAX, millions of interactions organized into thousands of pathways across many organisms, from a growing number of sources, are available. Thus, large amounts of pathway data are available in a computable form to support visualization, analysis and biological discovery.
doi:10.1038/nbt.1666
PMCID: PMC3001121  PMID: 20829833
pathway data integration; pathway database; standard exchange format; ontology; information system
8.  Biomedical semantics in the Semantic Web 
Journal of Biomedical Semantics  2011;2(Suppl 1):S1.
The Semantic Web offers an ideal platform for representing and linking biomedical information, which is a prerequisite for the development and application of analytical tools to address problems in data-intensive areas such as systems biology and translational medicine. As for any new paradigm, the adoption of the Semantic Web offers opportunities and poses questions and challenges to the life sciences scientific community: which technologies in the Semantic Web stack will be more beneficial for the life sciences? Is biomedical information too complex to benefit from simple interlinked representations? What are the implications of adopting a new paradigm for knowledge representation? What are the incentives for the adoption of the Semantic Web, and who are the facilitators? Is there going to be a Semantic Web revolution in the life sciences?
We report here a few reflections on these questions, following discussions at the SWAT4LS (Semantic Web Applications and Tools for Life Sciences) workshop series, of which this Journal of Biomedical Semantics special issue presents selected papers from the 2009 edition, held in Amsterdam on November 20th.
doi:10.1186/2041-1480-2-S1-S1
PMCID: PMC3105493  PMID: 21388570
9.  DC-ATLAS: a systems biology resource to dissect receptor specific signal transduction in dendritic cells 
Immunome Research  2010;6:10.
Background
The advent of Systems Biology has been accompanied by the blooming of pathway databases. Currently pathways are defined generically with respect to the organ or cell type where a reaction takes place. The cell type specificity of the reactions is the foundation of immunological research, and capturing this specificity is of paramount importance when using pathway-based analyses to decipher complex immunological datasets. Here, we present DC-ATLAS, a novel and versatile resource for the interpretation of high-throughput data generated by perturbing the signaling network of dendritic cells (DCs).
Results
Pathways are annotated using a novel data model, the Biological Connection Markup Language (BCML), an SBGN-compliant data format developed to store the large amount of information collected. The application of DC-ATLAS to pathway-based analysis of the transcriptional program of DCs stimulated with agonists of the toll-like receptor family allows an integrated description of the flow of information from the cellular sensors to the functional outcome, capturing the temporal series of activation events by grouping sets of reactions that occur at different time points in well-defined functional modules.
Conclusions
The initiative significantly improves our understanding of DC biology and regulatory networks. Developing a systems biology approach for the immune system holds the promise of translating knowledge on the immune system into more successful immunotherapy strategies.
doi:10.1186/1745-7580-6-10
PMCID: PMC3000836  PMID: 21092113
10.  Semantic Web Applications and Tools for Life Sciences, 2008 – Introduction 
BMC Bioinformatics  2009;10(Suppl 10):S1.
doi:10.1186/1471-2105-10-S10-S1
PMCID: PMC2755817  PMID: 19796393
11.  RDFScape: Semantic Web meets Systems Biology 
BMC Bioinformatics  2008;9(Suppl 4):S6.
Background
The recent availability of high-throughput data in molecular biology has increased the need for a formal representation of this knowledge domain. New ontologies are being developed to formalize knowledge, e.g. about the functions of proteins. As the Semantic Web is being introduced into the Life Sciences, the basis for a distributed knowledge-base that can foster biological data analysis is laid. However, there still is a dichotomy, in tools and methodologies, between the use of ontologies in biological investigation, that is, in relation to experimental observations, and their use as a knowledge-base.
Results
RDFScape is a plugin that has been developed to extend software for biological analysis with support for reasoning on ontologies in the Semantic Web framework. We show with this plugin how the use of ontological knowledge in biological analysis can be extended through the use of inference. In particular, we present two examples relating to ontologies representing biological pathways: we demonstrate how these can be abstracted and visualized as interaction networks, and how reasoning on causal dependencies within elements of pathways can be implemented.
Conclusions
The use of ontologies for the interpretation of high-throughput biological data can be improved through the use of inference. This allows the use of ontologies not only as annotations, but as a knowledge-base from which new information relevant for specific analysis can be derived.
doi:10.1186/1471-2105-9-S4-S6
PMCID: PMC2367633  PMID: 18460179
12.  The Genopolis Microarray Database 
BMC Bioinformatics  2007;8(Suppl 1):S21.
Background
Gene expression databases are key resources for microarray data management and analysis and the importance of a proper annotation of their content is well understood.
Public repositories exist, as do microarray database systems that can be implemented by single laboratories. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common, highly coherent content. The aim of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic cell and macrophage functions and host-parasite interactions.
Results
The Genopolis Database system allows the community to build an object-based, MIAME-compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform. It supports dynamic definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows precise control of the visibility of the database content to different subgroups in the community and facilitates exports of its content to public repositories. It provides an interactive user interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of the genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users.
Conclusion
The Genopolis Database supports a community in building a common coherent knowledge base and analysing it. This fills a gap between a local database and a public repository, where the development of a common coherent annotation is important. In its current implementation, it provides a uniform, coherently annotated dataset on dendritic cells and macrophage differentiation.
doi:10.1186/1471-2105-8-S1-S21
PMCID: PMC1885851  PMID: 17430566
13.  A power law global error model for the identification of differentially expressed genes in microarray data 
BMC Bioinformatics  2004;5:203.
Background
High-density oligonucleotide microarray technology enables the discovery of genes that are transcriptionally modulated in different biological samples due to physiology, disease or intervention. Methods for the identification of these so-called "differentially expressed genes" (DEG) would benefit greatly from a deeper knowledge of the intrinsic measurement variability. Though it is clear that the variance of repeated measures is highly dependent on the average expression level of a given gene, there is still a lack of consensus on how signal reproducibility is linked to signal intensity. The aim of this study was to empirically model the variance versus mean dependence in microarray data to improve the performance of existing methods for identifying DEG.
Results
In the present work we used data generated by our lab as well as publicly available data sets to show that dispersion of repeated measures depends on location of the measures themselves following a power law. This enables us to construct a power law global error model (PLGEM) that is applicable to various Affymetrix GeneChip data sets. A new DEG identification method is therefore proposed, consisting of a statistic designed to make explicit use of model-derived measurement spread estimates and a resampling-based hypothesis testing algorithm.
Conclusions
The new method provides control of the false positive rate, a good sensitivity vs. specificity trade-off, and consistent results with a varying number of replicates, even when using single samples.
doi:10.1186/1471-2105-5-203
PMCID: PMC545082  PMID: 15606915
