Related Articles
Monitoring global gene expression provides insight into how genes and regulatory signals work together to guide embryo
development. The fields of developmental biology and teratology are now confronted with the need for automated access to a
reference library of gene-expression signatures that benchmark programmed (genetic) and adaptive (environmental) regulation of the
embryonic transcriptome. Such a library must be constructed from highly-distributed microarray data. Birth Defects Systems Manager
(BDSM), an open access knowledge management system, provides custom software to mine public microarray data focused on
developmental health and disease. The present study describes tools for seamless data integration in the BDSM library (MetaSample,
MetaChip, CIAeasy) using the QueryBDSM module. A field test of the prototype was run using published microarray data series
derived from a variety of laboratories, experiments, microarray platforms, organ systems, and developmental stages. The datasets
focused on several developing systems in the mouse embryo, including preimplantation stages, heart and nerve development, testis
and ovary development, and craniofacial development. Using BDSM data integration tools, a gene-expression signature for 346 genes
was resolved that accurately classified samples by organ system and developmental sequence. The module builds a potential for the
BDSM approach to decipher a large number of developmental processes through comparative bioinformatics analysis of embryological
systems at-risk for specific defects, using multiple scenarios to define the range of probabilities leading from molecular phenotype to
clinical phenotype. We conclude that an integrative analysis of global gene-expression of the developing embryo can form the
foundation for constructing a reference library of signaling pathways and networks for normal and abnormal regulation of the
embryonic transcriptome. These tools are available free of charge from the web-site http://systemsanalysis.louisville.edu requiring
only a short registration process.
PMCID: PMC1896055
PMID: 17597930
transcriptome; mouse; expression; embryo; integrative analysis; birth defects
CisGenome is a software system for analyzing genome-wide chromatin immunoprecipitation (ChIP) data. It is designed to meet all basic needs of ChIP data analyses, including visualization, data normalization, peak detection, false discovery rate (FDR) computation, gene-peak association, and sequence and motif analysis. In addition to implementing previously published ChIP-chip analysis methods, the software contains new statistical methods designed specifically for ChIP-seq data. CisGenome has a modular design so that it supports interactive analyses through a graphic user interface as well as customized batch-mode computation for advanced data mining. A built-in browser allows visualization of array images, signals, gene structure, conservation, and DNA sequence and motif information. We illustrate the use of these tools by a comparative analysis of ChIP-chip and ChIP-seq data for the transcription factor NRSF/REST, a study of ChIP-seq analysis without negative control sample, and an analysis of a novel motif in Nanog- and Sox2-binding regions.
doi:10.1038/nbt.1505
PMCID: PMC2596672
PMID: 18978777
A well-designed microarray database can provide valuable information on gene expression levels. However, designing an efficient microarray database with minimum space usage is not an easy task, since designers need to integrate the microarray data with gene information, probe annotation, and the descriptions of each microarray experiment. Developing better methods to store microarray data can greatly improve the efficiency and usefulness of such data. A new schema is proposed to store microarray data by using the array data type in an object-relational database management system – PostgreSQL. The implemented database uses PostgreSQL's variable-length array data type to store all the microarray data from the same chip in a single array data structure. The implementation of our schema can help increase data retrieval and space efficiency.
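The array-per-chip idea described above can be pictured with a minimal Python sketch. It is not the schema from the paper: the function names (`to_long_rows`, `to_array_row`, `probe_value`) and the sample values are hypothetical, and the packed `array` object only stands in for a PostgreSQL array column (for example one of type `real[]`). The sketch contrasts one row per measurement with one row per chip.

```python
from array import array


def to_long_rows(chip_id, values):
    """Conventional layout: one (chip, probe index, value) row per measurement."""
    return [(chip_id, idx, value) for idx, value in enumerate(values)]


def to_array_row(chip_id, values):
    """Array layout: one row per chip, all measurements packed in one array."""
    return (chip_id, array("f", values))


def probe_value(array_row, probe_index):
    """Read a single probe's value back out of the packed row."""
    return array_row[1][probe_index]


# Hypothetical expression values for a three-probe chip.
values = [7.5, 8.25, 6.0]
long_rows = to_long_rows("chip_A", values)
packed_row = to_array_row("chip_A", values)
```

In the long layout the chip identifier and probe index are repeated for every value, whereas the packed row stores the identifier once and relies on array position to identify the probe, which is where the space saving would come from.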
doi:10.1016/j.compbiolchem.2007.01.004
PMCID: PMC2709412
PMID: 17392028
Microarray; database schema; array data type; PostgreSQL
Motivation: Modern data acquisition methods in biology allow the procurement of different types of data in increasing quantity, facilitating a comprehensive view of biological systems. As data are usually gathered and interpreted by separate domain scientists, it is hard to grasp multidomain properties and structures. Consequently, there is a need for the integration of biological data from different sources and of different types in one application, providing various visualization approaches.
Results: In this article, methods for the integration and visualization of multimodal biological data are presented. This is achieved based on two graphs representing the meta-relations between biological data and the measurement combinations, respectively. Both graphs are linked and serve as different views of the integrated data with navigation and exploration possibilities. Data can be combined and visualized multifariously, resulting in views of the integrated biological data.
Availability: http://vanted.ipk-gatersleben.de/hive/.
Contact: rohn@ipk-gatersleben.de
doi:10.1093/bioinformatics/btr282
PMCID: PMC3117374
PMID: 21551150
Objective: The aim of the project ARIANE is to model and
implement seamless, natural, and easy-to-use interfaces with various kinds of
heterogeneous biomedical information databases.
Design: A conceptual model of some of the Unified Medical Language
System (UMLS) knowledge sources has been developed to help end-users to query
information databases. A query is represented by a conceptual graph that
translates the deep structure of an end-user's interest in a topic. A
computational model exploits this conceptual model to build a query
interactively represented as query graph. A query graph is then matched to the
data graph built with data issued from each record of a database by means of a
pattern-matching (projection) rule that applies to conceptual graphs.
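The projection step described above can be sketched in Python under simplifying assumptions. This is not the ARIANE implementation: graphs are reduced to typed nodes plus labelled edges, the type hierarchy is a plain child-to-parent dictionary, and the concept types used (`Drug`, `Aspirin`, `Disease`, `Headache`) and the relation `treats` are hypothetical examples rather than UMLS content. A query graph projects onto a data graph when every query node can be mapped to a data node of the same or a more specific type while preserving every edge.

```python
def is_subtype(concept_type, ancestor, parent_of):
    """True if concept_type equals ancestor or specialises it in the hierarchy."""
    while concept_type is not None:
        if concept_type == ancestor:
            return True
        concept_type = parent_of.get(concept_type)
    return False


def project(query, data, parent_of):
    """Return a mapping query node -> data node, or None if no projection exists.

    query and data are (types, edges) pairs: types maps node -> concept type,
    edges is a list of (source, relation, target) triples.
    """
    q_types, q_edges = query
    d_types, d_edges = data
    d_edge_set = set(d_edges)
    q_nodes = list(q_types)

    def extend(i, mapping):
        if i == len(q_nodes):
            # All nodes assigned: every query edge must exist in the data graph.
            ok = all((mapping[s], r, mapping[t]) in d_edge_set
                     for s, r, t in q_edges)
            return dict(mapping) if ok else None
        q = q_nodes[i]
        for d, d_type in d_types.items():
            if is_subtype(d_type, q_types[q], parent_of):
                mapping[q] = d
                found = extend(i + 1, mapping)
                if found is not None:
                    return found
                del mapping[q]
        return None

    return extend(0, {})


# Hypothetical hierarchy and graphs.
parent_of = {"Aspirin": "Drug", "Headache": "Disease"}
query = ({"x": "Drug", "y": "Disease"}, [("x", "treats", "y")])
record = ({"a": "Aspirin", "h": "Headache"}, [("a", "treats", "h")])
reversed_record = ({"a": "Aspirin", "h": "Headache"}, [("h", "treats", "a")])
```

Here `project(query, record, parent_of)` finds a mapping because the record is a specialisation of the query, while `project(query, reversed_record, parent_of)` returns `None` because the relation points the wrong way. The exhaustive search is only suitable for small graphs.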
Results: Prototypes have been implemented to test the feasibility of
the model with different kinds of information databases. Three cases are
studied: 1) information in records is structured according to the UMLS
knowledge sources; 2) information is able to be structured without error in
the frame of the UMLS knowledge; 3) information cannot be structured. In each
case the pattern-matching is processed by the projection rule according to the
structure of information that has been implemented in the databases.
Conclusion: The conceptual graphs theory provides a homogeneous
and powerful formalism able to represent both concepts, instances of concepts
in medical contexts, and associations by means of relationships, and to
represent data at different levels of details. The conceptual-graphs formalism
allows powerful capabilities to operate a semantic integration of information
databases using the UMLS knowledge sources.
PMCID: PMC61275
PMID: 9452985
We report the design and operation of a Virtual Instrument (VI) system based on LabVIEW 2009 for laser-induced fluorescence experiments. This system achieves synchronous control of equipment and acquisition of real-time fluorescence data communicating with a single computer via GPIB, USB, RS232, and parallel ports. The reported VI system can also accomplish data display, saving, and analysis, and printing the results. The VI system performs sequences of operations automatically, and this system has been successfully applied to obtain the excitation and dispersion spectra of α-methylnaphthalene. The reported VI system opens up new possibilities for researchers and increases the efficiency and precision of experiments. The design and operation of the VI system are described in detail in this paper, and the advantages that this system can provide are highlighted.
doi:10.1155/2011/457156
PMCID: PMC3195440
PMID: 22013388
FitzGerald, Thomas J. | Bishop-Jodoin, Maryann | Bosch, Walter R. | Curran, Walter J. | Followill, David S. | Galvin, James M. | Hanusik, Richard | King, Steven R. | Knopp, Michael V. | Laurie, Fran | O'Meara, Elizabeth | Michalski, Jeff M. | Saltz, Joel H. | Schnall, Mitchell D. | Schwartz, Lawrence | Ulin, Kenneth | Xiao, Ying | Urie, Marcia
The National Cancer Institute clinical cooperative groups have been instrumental over the past 50 years in developing clinical trials and evidence-based process improvements for clinical oncology patient care. The cooperative groups are undergoing a transformation process as we further integrate molecular biology into personalized patient care and move to incorporate international partners in clinical trials. To support this vision, data acquisition and data management informatics tools must become both nimble and robust to support transformational research at an enterprise level. Information, including imaging, pathology, molecular biology, radiation oncology, surgery, systemic therapy, and patient outcome data needs to be integrated into the clinical trial charter using adaptive clinical trial mechanisms for design of the trial. This information needs to be made available to investigators using digital processes for real-time data analysis. Future clinical trials will need to be designed and completed in a timely manner facilitated by nimble informatics processes for data management. This paper discusses both past experience and future vision for clinical trials as we move to develop data management and quality assurance processes to meet the needs of the modern trial.
doi:10.3389/fonc.2013.00031
PMCID: PMC3598226
radiation oncology; quality assurance; oncology clinical trials; National Cancer Institute; Clinical Trials Cooperative Group Program
Background
The goal of information integration in systems biology is to combine information from a number of databases and data sets, which are obtained from both high and low throughput experiments, under one data management scheme such that the cumulative information provides greater biological insight than is possible with individual information sources considered separately.
Results
Here we present PathSys, a graph-based system for creating a combined database of networks of interaction for generating an integrated view of biological mechanisms. We used PathSys to integrate over 14 curated and publicly contributed data sources for the budding yeast (S. cerevisiae) and Gene Ontology. A number of exploratory questions were formulated as a combination of relational and graph-based queries to the integrated database. Thus, PathSys is a general-purpose, scalable, graph-data warehouse of biological information, complete with a graph manipulation and query language, a storage mechanism and a generic data-importing mechanism through schema-mapping.
Conclusion
Results from several test studies demonstrate the effectiveness of the approach in retrieving biologically interesting relations between genes and proteins and the networks connecting them, and the utility of PathSys as a scalable graph-based warehouse for interaction-network integration and as a hypothesis generator system. The PathSys client software, named BiologicalNetworks, developed for navigation and analyses of molecular networks, is available as a Java Web Start application at .
doi:10.1186/1471-2105-7-55
PMCID: PMC1409799
PMID: 16464251
EvolView is a web application for visualizing, annotating and managing phylogenetic trees. First, EvolView is a phylogenetic tree viewer and customization tool; it visualizes trees in various formats, customizes them through built-in functions that can link information from external datasets, and exports the customized results to publication-ready figures. Second, EvolView is a tree and dataset management tool: users can easily organize related trees into distinct projects, add new datasets to trees and edit and manage existing trees and datasets. To make EvolView easy to use, it is equipped with an intuitive user interface. With a free account, users can save data and manipulations on the EvolView server. EvolView is freely available at: http://www.evolgenius.info/evolview.html.
doi:10.1093/nar/gks576
PMCID: PMC3394307
PMID: 22695796
The Ocean Bottom Seismometer (OBS) is a key instrument for the geophysical study of sea sub-bottom layers. At present, more reliable autonomous instruments capable of recording underwater for long periods of time and therefore handling large data storage are needed. This paper presents a new Ocean Bottom Seismometer designed to be used in long duration seismic surveys. Power consumption and noise level of the acquisition system are the key points to optimize the autonomy and the data quality. To achieve our goals, a new low power data logger with high resolution and Signal-to-Noise Ratio (SNR) based on a Compact Flash memory card is designed to enable continuous data acquisition. The equipment represents the achievement of joint work from different scientific and technological disciplines such as electronics, mechanics, acoustics, communications, information technology, marine geophysics, etc. This easy to handle and sophisticated equipment allows the recording of useful controlled source and passive seismic data, as well as other time varying data, with multiple applications in marine environment research. We have been working on a series of prototypes for ten years to improve many of the aspects that make the equipment easy to handle and useful to work in deep-water areas. Ocean Bottom Seismometers (OBS) have received growing attention from the geoscience community during the last forty years. OBS sensors recording motion of the ocean floor hold key information in order to study offshore seismicity and to explore the Earth’s crust. In a seismic survey, a series of OBSs are placed on the seabed of the area under study, where they record either natural seismic activity or acoustic signals generated by compressed air-guns on the ocean surface. The resulting data sets are subsequently used to model both the earthquake locations and the crustal structure.
doi:10.3390/s120303693
PMCID: PMC3376630
PMID: 22737032
ocean bottom seismometer; geophone; sensor modeling; refraction seismicity; clock synchronization; precision time protocol
The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and the information flow among these components requires the augmentation of biological insight with the power of a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe state of the art of the in silico support of synthetic biology, from the specific data exchange formats, to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow, and discuss genetic fidelity in DNA manipulation, development strategies of biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.
doi:10.1093/bib/bbp054
PMCID: PMC2810114
PMID: 19906839
informatics; synthetic biology; systems biology; networks
The growing availability of continuous data from medical devices in diabetes management makes it crucial to define novel information technology architectures for efficient data storage, data transmission, and data visualization. The new paradigm of care demands the sharing of information in interoperable systems as the only way to support patient care in a continuum of care scenario. The technological platforms should support all the services required by the actors involved in the care process, located in different scenarios and managing diverse information for different purposes. This article presents basic criteria for defining flexible and adaptive architectures that are capable of interoperating with external systems, and integrating medical devices and decision support tools to extract all the relevant knowledge to support diabetes care.
PMCID: PMC2769800
PMID: 19885276
diabetes; continuous data management; device interoperability; software architecture
The rapid expansion of biomedical research has brought substantial scientific and administrative data management challenges to modern core facilities. Scientifically, a core facility must be able to manage experimental workflow and the corresponding set of large and complex scientific data. It must also disseminate experimental data to relevant researchers in a secure and expedient manner that facilitates collaboration and provides support for data interpretation and analysis. Administratively, a core facility must be able to manage the scheduling of its equipment and to maintain a flexible and effective billing system to track material, resource, and personnel costs and charge for services to sustain its operation. It must also have the ability to regularly monitor the usage and performance of its equipment and to provide summary statistics on resources spent on different categories of research. To address these informatics challenges, we introduce a comprehensive system called MIMI (multimodality, multiresource, information integration environment) that integrates the administrative and scientific support of a core facility into a single web-based environment. We report the design, development, and deployment experience of a baseline MIMI system at an imaging core facility and discuss the general applicability of such a system in other types of core facilities. These initial results suggest that MIMI will be a unique, cost-effective approach to addressing the informatics infrastructure needs of core facilities and similar research laboratories.
doi:10.1007/s10278-007-9083-y
PMCID: PMC3043715
PMID: 17999114
Computer system; data collection; data extraction; image data; image distribution; information management; information storage and retrieval; information system; internet; management information systems; open source; cost analysis; biomedical core facilities; unified modeling language
Background
Emerging information technologies present new opportunities to reduce the burden of malaria, dengue and other infectious diseases. For example, use of a data management system software package can help disease control programs to better manage and analyze their data, and thus enhances their ability to carry out continuous surveillance, monitor interventions and evaluate control program performance.
Methods and Findings
We describe a novel multi-disease data management system platform (hereinafter referred to as the system) with current capacity for dengue and malaria that supports data entry, storage and query. It also allows for production of maps and both standardized and customized reports. The system is comprised exclusively of software components that can be distributed without the user incurring licensing costs. It was designed to maximize the ability of the user to adapt the system to local conditions without involvement of software developers. Key points of system adaptability include 1) customizable functionality content by disease, 2) configurable roles and permissions, 3) customizable user interfaces and display labels and 4) configurable information trees including a geographical entity tree and a term tree. The system includes significant portions of functionality that is entirely or in large part re-used across diseases, which provides an economy of scope as new diseases downstream are added to the system at decreased cost.
Conclusions
We have developed a system with great potential for aiding disease control programs in their task to reduce the burden of dengue and malaria, including the implementation of integrated vector management programs. Next steps include evaluations of operational implementations of the current system with capacity for dengue and malaria, and the inclusion in the system platform of other important vector-borne diseases.
Author Summary
Emerging information technologies, such as data management system software packages, can help disease control programs to better manage and analyze their data, and thus make it easier to carry out continuous surveillance, monitor interventions and evaluate control program performance. This will lead to better informed decisions and actions. We have developed a multi-disease data management system platform with current capacity for dengue and malaria that supports data entry, storage and query. It also allows for production of maps and both standardized and customized reports. The system includes only software components that can be distributed without the user having to pay licensing costs. It was designed so that the user can adapt many aspects of the system to suit local conditions (for example roles and permissions, user interfaces and display labels and which functionality is included under a given disease) without having to involve software developers. In conclusion, we have developed a system capable of aiding disease control programs in their task to reduce the burden of dengue and malaria, including the implementation of integrated vector management programs. The next steps include operational implementations and evaluations of the current system with capacity for dengue and malaria, and the inclusion in the system platform of other important vector-borne diseases.
doi:10.1371/journal.pntd.0001016
PMCID: PMC3066141
PMID: 21468310
BoCaTFBS, a new method that combines noisy data from ChIP-chip experiments with known binding-site patterns, is described and applied to the ENCODE project.
Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles).
doi:10.1186/gb-2006-7-11-r102
PMCID: PMC1794589
PMID: 17078876
FastSPECT II is a recently commissioned 16-camera small-animal SPECT imager built with modular scintillation cameras and list-mode data-acquisition electronics. The instrument is housed in a lead-shielded enclosure and has exchangeable aperture assemblies and adjustable camera positions for selection of magnification, pinhole size, and field of view. The calibration of individual cameras and measurement of an overall system imaging matrix (1 mm³ voxels) are supported via a five-axis motion-control system.
Details of the system integration and results of characterization and performance measurements are presented along with first tomographic images. The dynamic imaging capabilities of the instrument are explored and discussed.
doi:10.1109/TNS.2004.830975
PMCID: PMC2945369
PMID: 20877439
Dynamic; high-resolution; list-mode; small animal; SPECT
Background
Sound policy, resource allocation and day-to-day management decisions in the health sector require timely information from routine health information systems (RHIS). In most low- and middle-income countries, the RHIS is viewed as being inadequate in providing quality data and continuous information that can be used to help improve health system performance. In addition, there is limited evidence on the effectiveness of RHIS strengthening interventions in improving data quality and use. The purpose of this study is to evaluate the usefulness of the newly developed Performance of Routine Information System Management (PRISM) framework, which consists of a conceptual framework and associated data collection and analysis tools to assess, design, strengthen and evaluate RHIS. The specific objectives of the study are: a) to assess the reliability and validity of the PRISM instruments and b) to assess the validity of the PRISM conceptual framework.
Methods
Facility- and worker-level data were collected from 110 health care facilities in twelve districts in Uganda in 2004 and 2007 using records reviews, structured interviews and self-administered questionnaires. The analysis procedures include Cronbach's alpha to assess internal consistency of selected instruments, test-retest analysis to assess the reliability and sensitivity of the instruments, and bivariate and multivariate statistical techniques to assess validity of the PRISM instruments and conceptual framework.
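For readers unfamiliar with the internal-consistency statistic mentioned above, Cronbach's alpha for an instrument with k items is k/(k-1) × (1 − Σ item variances / variance of the total score). The following is a minimal Python sketch of that standard formula. The function name and the example scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: three respondents, three perfectly consistent items.
consistent = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
# Hypothetical responses: four respondents, two unrelated items.
unrelated = [[1.0, 2.0], [2.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
```

Perfectly consistent items give an alpha of 1 and unrelated items give an alpha near 0. Values of 0.7 or greater are the conventional threshold for acceptable reliability.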
Results
Cronbach's alpha analysis suggests high reliability (0.7 or greater) for the indices measuring a promotion of a culture of information, RHIS tasks self-efficacy and motivation. The study results also suggest that a promotion of a culture of information influences RHIS tasks self-efficacy, RHIS tasks competence and motivation, and that self-efficacy and the presence of RHIS staff have a direct influence on the use of RHIS information, a key aspect of RHIS performance.
Conclusions
The study results provide some empirical support for the reliability and validity of the PRISM instruments and the validity of the PRISM conceptual framework, suggesting that the PRISM approach can be effectively used by RHIS policy makers and practitioners to assess the RHIS and evaluate RHIS strengthening interventions. However, additional studies with larger sample sizes are needed to further investigate the value of the PRISM instruments in exploring the linkages between RHIS data quality and use, and health systems performance.
doi:10.1186/1472-6963-10-188
PMCID: PMC2904760
PMID: 20598151
Genetic control of root development in rice is complex and the underlying mechanisms (constitutive and adaptive)
are poorly understood. Lowland and upland varieties of indica and japonica rice with
contrasting root development characteristics have been crossed, mapping populations developed and a number of QTLs in different
chromosomes were identified. As these studies have used different sets of markers and many of the QTLs identified are long, it
is difficult to exploit the varietal difference for improved root traits by marker assisted selection and for identification of
concerned alleles. Intensive data mining of literature resulted in the identification of 861 root development QTLs and associated
microsatellite markers located on different chromosomes. The QTL and marker data generated and the genome sequence of rice were
used for construction of a relational database, Rootbrowse, using MySQL relational database management system and Bio::DB::GFF
schema. The data is viewed using GBrowse visualization tool. It graphically displays a section of the genome and all features
annotated on it including the QTLs. The QTLs can be displayed along with SSR markers, protein coding genes and/or known root
development genes for prediction of probable candidate genes.
Availability
Rootbrowse is freely available at
http://www.ricebrowse.org
PMCID: PMC2646863
PMID: 19255649
root development; rice; QTLs; candidate genes; auxin metabolism; gbrowse
The evaluation of new reagents and instruments in clinical chemistry leads to complex studies with large volumes of data, which are difficult to handle. This paper presents the design and development of a program that supports an evaluator in the
definition of a study, the generation of data structures, communication with the instrument (analyser), online and offline data capture and in the processing of the results. The program is called CAEv, and it runs on a standard PC under MS-DOS. Version 1 of the
program was tested in a multicentre instrument evaluation. The concept and the necessary hardware and software are discussed. In addition, requirements for instrument/host communication are given. The application of the laboratory part of CAEv is described from the user's point of view. The design of the program allows users a high degree of flexibility in defining their own standards with regard to study protocol, and/or experiments, without loss of performance. CAEv's main advantages are a pre-programmed study protocol, easy handling of large volumes of data, an immediate validation of the experimental results and the statistical evaluation of the data.
doi:10.1155/S1463924691000317
PMCID: PMC2547932
PMID: 18924902
The Intensive Care Unit is the area in patient care where the amount of patient data from a variety of sources is particularly large. The problem for clinicians lies in the ability to gather and use these data in the decision making process. A well designed computer based patient data management system, incorporating a variety of data analysis tools, would have a dramatic impact on patient care in an environment such as this. The PDB System has been in continuous use at the Montreal General Hospital's Surgical and Trauma Intensive Care Unit since Jan. 88. Its initial implementation in two beds in our SICU has allowed the complete replacement of the conventional patient paper record. It is used by all ICU staff, including nurses, physicians, and ward clerks for the recording/viewing of all patient vital data, laboratory data, medications, and optionally chart notes. In addition, medical staff has the option to use the entered data to perform a variety of data analysis procedures.
PMCID: PMC2247703
PMID: 1807781
Introduction
A system to support decisions and operations in cases of epidemic emergency has been designed and implemented, in order to improve the decision-making capabilities of Veterinary Services for outbreaks of exotic diseases.
Methods
The system implementation consisted of: 1) drafting contingency plans for OIE List A diseases; 2) implementing an automated information network, linking Local Veterinary Unit and the Regional Epidemiological Centre; 3) implementing a Geographical Information System (GIS), to be automatically connected to the animal identification database and to the ANIMO (Animal Movement) system; 4) supplying the personnel of Veterinary Services with the necessary tools, instruments and materials; 5) personnel training. Integration of activities led to the implementation of a telematic support system for the management of epidemic emergencies, providing the Veterinary Services with the information necessary to the management of exotic disease outbreaks. The system has been implemented from a structural point of view as follows:
Data warehouse design and implementation, fed by ORACLE DATA MART SUITE operational databases,
Implementation within the Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise Web site, of a controlled access telematic system, where:
static pages were implemented in HTML and the dynamic ones in PERL;
GIS was used to design and update maps;
Downloading of documents and forms was made possible as well as the generation of tables and graphs in real time.
Results
In the event of an outbreak it is possible to map relevant data and information (e.g. Protection and Surveillance Zones) and to produce disease trend data, in both tabular and graphical form, together with the indicators for disease management and control. Contingency plans for OIE List A diseases are provided through the Internet for consultation and downloading. All the forms for administrative and epidemiological data collection are provided and can be sent by e-mail to the proper veterinary authority and other stakeholders.
Discussion
The system has been tested both by a simulated foot and mouth disease outbreak and by a real Swine Vesicular Disease outbreak. The existence of written and standardised procedures, the availability of updated and pertinent information for outbreak management, and the support of a telematic system have made it possible to rationalise the actions to be implemented and to shorten intervention time.
doi:10.2196/jmir.1.suppl1.e27
PMCID: PMC1761820
Telematic System; Disease Outbreak Management; Epidemic Emergencies
Structured data, including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others.
Most current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contain the query graph, and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions (i) have high computational complexity and (ii) are difficult to index in a graph database.
Our objective is to bridge graph kernel functions and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and its neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and to support fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure are scalable to large databases, with smaller index size, faster index construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
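The idea of hashing local node features and computing a kernel through hash-table lookups can be sketched as follows. This is a simplified illustration, not the published G-hash algorithm: the feature (a node label plus the sorted labels of its neighbors), the count-based kernel, and all names such as `LocalFeatureIndex` are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def local_features(labels, edges):
    """Count (node label, sorted neighbor labels) keys for one graph.

    labels: dict mapping node id -> label; edges: iterable of (u, v) pairs.
    This feature definition is an assumed simplification.
    """
    neighbor_labels = defaultdict(list)
    for u, v in edges:
        neighbor_labels[u].append(labels[v])
        neighbor_labels[v].append(labels[u])
    return Counter((labels[n], tuple(sorted(neighbor_labels[n]))) for n in labels)


class LocalFeatureIndex:
    """Hash table from local feature key to {graph id: occurrence count}."""

    def __init__(self):
        self.table = defaultdict(dict)
        self.self_kernel = {}  # graph id -> k(g, g)

    def add(self, graph_id, labels, edges):
        features = local_features(labels, edges)
        for key, count in features.items():
            self.table[key][graph_id] = count
        self.self_kernel[graph_id] = sum(c * c for c in features.values())

    def knn(self, labels, edges, k):
        """Return the k nearest graphs as (kernel-induced distance, graph id)."""
        features = local_features(labels, edges)
        cross = defaultdict(int)  # graph id -> k(query, g)
        for key, count in features.items():
            # Only graphs sharing this feature key are touched.
            for graph_id, other in self.table.get(key, {}).items():
                cross[graph_id] += count * other
        query_self = sum(c * c for c in features.values())
        distances = []
        for graph_id, graph_self in self.self_kernel.items():
            squared = query_self + graph_self - 2 * cross.get(graph_id, 0)
            distances.append((math.sqrt(max(squared, 0)), graph_id))
        return sorted(distances)[:k]


# Toy molecular graphs (heavy atoms only), used purely for illustration.
index = LocalFeatureIndex()
index.add("ethanol", {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
index.add("methanol", {0: "C", 1: "O"}, [(0, 1)])
index.add("propane", {0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)])

nearest = index.knn({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)], k=2)
```

Because the query only visits hash buckets for features it actually contains, the cost of a query depends on the number of matching features rather than on a pairwise graph comparison against every database entry.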
doi:10.1145/1516360.1516416
PMCID: PMC2860326
PMID: 20428322
graph similarity query; graph classification; hashing; graph kernels; k-NNs search
Chip-based DNA quantification systems are widespread and are used in many point-of-care applications. However, instruments for such applications may not be maintained or calibrated regularly. Since machine reliability is a key issue for normal operation, this study presents a system model of the real-time Polymerase Chain Reaction (PCR) machine to analyze the instrument design through numerical experiments. Based on model analysis, a systematic approach was developed to lower the variation of DNA quantification and achieve a robust design for a real-time PCR-on-a-chip system. Accelerated life testing was adopted to evaluate the reliability of the chip prototype. According to the life test plan, the proposed real-time PCR-on-a-chip system was simulated to work continuously for over three years with similar reproducibility in DNA quantification. This not only shows the robustness of the lab-on-a-chip system, but also verifies the effectiveness of our systematic method for achieving a robust design.
doi:10.3390/s100100697
PMCID: PMC3270864
PMID: 22315563
DNA quantification reliability; robust design; system identification model; real-time PCR machine; real-time PCR on-a-chip
Recording human neurophysiological data in the teaching laboratory generally requires expensive instrumentation. From our experience in developing inexpensive equipment used in teaching neurophysiology laboratory exercises, we offer a strategy for the development of affordable and safe recording of human neurophysiological parameters. There are many resources available to guide the design and construction of electronic equipment that will record human biopotentials. An important consideration is subject safety, and the electrical characteristics of any equipment must meet strict galvanic isolation standards. Wireless data gathering offers the most complete isolation from 120 VAC mains current. As an example, we present a homemade electrocardiogram recording circuit using only inexpensive and readily available components. We outline the feasibility of constructing equipment that meets the needs of the student laboratory for good data collection, and we consider the obstacles likely to be encountered in these projects. If students actively participate in the equipment design and construction, the process can also be a teaching tool. Students may gain a deeper understanding of human neurobiology when electronic data acquisition and its presentation are made more transparent.
PMCID: PMC3592736
PMID: 23493343
human neurophysiology; electrocardiogram; teaching laboratories; laboratory equipment
Background
Biological processes such as metabolic pathways, gene regulation or protein-protein interactions are often represented as graphs in systems biology. The understanding of such networks, their analysis, and their visualization are today important challenges in the life sciences. While a great variety of visualization tools that try to address most of these challenges already exist, only a few of them succeed in bridging the gap between visualization and network analysis.
Findings
Medusa is a powerful tool for visualization and clustering analysis of large-scale biological networks. It is highly interactive and it supports weighted and unweighted multi-edged directed and undirected graphs. It combines a variety of layouts and clustering methods for comprehensive views and advanced data analysis. Its main purpose is to integrate visualization and analysis of heterogeneous data from different sources into a single network.
Conclusions
Medusa provides a concise visual tool, which is helpful for network analysis and interpretation. Medusa is offered both as a standalone application and as an applet written in Java. It can be found at: https://sites.google.com/site/medusa3visualization.
doi:10.1186/1756-0500-4-384
PMCID: PMC3197509
PMID: 21978489
graph; visualization; biological networks; clustering analysis; data integration