Azathioprine leads to changes in mean corpuscular volume (MCV) and white blood cell (WBC) indices reflecting efficacy or toxicity. Understanding the interactions between bone marrow stem cells and azathioprine could highlight abnormal response patterns as forerunners for hematologic malignancies. This study gives a statistical description of factors influencing the relationship between MCV and WBC in children with inflammatory bowel disease treated with azathioprine. We found that leukopenia preceded macrocytosis. Macrocytosis is therefore not a good predictor of leukopenia. Further studies will be necessary to determine the subgroup of patients at increased risk of malignancies based on bone marrow response.
Hepatitis B virus (HBV) infection is an increasingly important cause of morbidity and mortality in HIV-infected adults. This study aimed to determine the prevalence and incidence of HBV in the UK CHIC Study, a multicentre observational cohort.
Methods and Findings
12 HIV treatment centres were included. Of 37,331 patients, 27,450 had at least one test result (HBsAg, anti-HBs or anti-HBc) available post-1996. 16,043 were white, 8,130 black and 3,277 of other ethnicity. Routes of exposure were homosexual sex (15,223 males), heterosexual sex (3,258 males and 5,384 females), injecting drug use (862) and other (2,723). The main outcome measures were the cumulative prevalence and the incidence of HBV coinfection. HBV-susceptible patients were followed up until HBsAg and/or anti-HBc seroconversion (incident infection), evidence of vaccination or last visit. Poisson regression was used to determine associated factors. 25,973 had at least one HBsAg test result. Participants with HBsAg results were typically MSM (57%) and white (59%), similar to the cohort as a whole. The cumulative prevalence of detectable HBsAg was 6.9% (6.6 to 7.2%). Among the 3,379 initially HBV-susceptible patients, the incidence of HBV infection was 1.7 (1.5 to 1.9)/100 person-years. Factors associated with incident infection were older age and IDU. The main limitation of the study was that 30% of participants did not have any HBsAg results available. However, the baseline characteristics of those with results did not differ from those of the whole cohort. Efforts are ongoing to improve data collection.
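The reported incidence is a crude rate per 100 person-years of susceptible follow-up. As a minimal sketch of that calculation: the event and person-year counts below are hypothetical, chosen only to reproduce the headline rate of 1.7/100 person-years (the abstract does not give the raw counts, and the published confidence interval will have been computed from the actual data, possibly by a different method).

```python
import math

def incidence_per_100py(events, person_years):
    """Crude incidence rate per 100 person-years, with an approximate
    95% confidence interval (normal approximation to the Poisson rate)."""
    rate = 100.0 * events / person_years
    se = 100.0 * math.sqrt(events) / person_years  # SE of the scaled rate
    return rate, rate - 1.96 * se, rate + 1.96 * se

# Hypothetical counts for illustration only; not taken from the study.
rate, lo, hi = incidence_per_100py(events=170, person_years=10_000)
print(f"{rate:.1f} ({lo:.1f} to {hi:.1f}) per 100 person-years")
```

Note that the interval narrows with the number of events, which is why large cohorts such as UK CHIC can report rates to a tight (1.5 to 1.9) range.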
The prevalence of HBV in UK CHIC is in line with estimates from other studies and low by international standards. Incident infection continued to occur even after entry to the cohort, emphasising the need to ensure early vaccination.
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared ‘Investigation-Study-Assay’ framework to support that vision.
InvertNet, one of the three Thematic Collection Networks (TCNs) funded in the first round of the U.S. National Science Foundation’s Advancing Digitization of Biological Collections (ADBC) program, is tasked with providing digital access to ~60 million specimens housed in 22 arthropod (primarily insect) collections at institutions distributed throughout the upper midwestern USA. The traditional workflow for insect collection digitization involves manually keying information from specimen labels into a database and attaching a unique identifier label to each specimen. This remains the dominant paradigm, despite some recent attempts to automate various steps in the process using more advanced technologies. InvertNet aims to develop improved semi-automated, high-throughput workflows for digitizing and providing access to invertebrate collections that balance the need for speed and cost-effectiveness with long-term preservation of specimens and accuracy of data capture. The proposed workflows build on recent methods for digitizing and providing access to high-quality images of multiple specimens (e.g., entire drawers of pinned insects) simultaneously. Limitations of previous approaches are discussed and possible solutions are proposed that incorporate advanced imaging and 3-D reconstruction technologies. InvertNet couples efficient digitization workflows with a highly robust network infrastructure capable of managing massive amounts of image data and related metadata and delivering high-quality images, including interactive 3-D reconstructions in real time via the Internet.
Collection digitization; collection database; image processing
Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Scoring records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering to support downstream analysis. Pivotally, such scoring should also spur improvements in description. Here, we introduce such a measure - the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric: for example, to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics for funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address further applications of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
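The MCI is deliberately simple: the percentage of available fields that are actually filled. A minimal sketch of the computation, with hypothetical field names for illustration (the actual field sets scored in GOLD and MIGS records are defined by those resources):

```python
def metadata_coverage_index(record, available_fields):
    """Metadata Coverage Index: the percentage of available fields
    actually filled in a record.  A field counts as filled when its
    value is present and non-empty."""
    filled = sum(
        1 for field in available_fields
        if record.get(field) not in (None, "", [], {})
    )
    return 100.0 * filled / len(available_fields)

# Hypothetical genome-record fields, for illustration only.
fields = ["organism", "isolation_source", "geo_location", "collection_date"]
record = {"organism": "Escherichia coli",
          "isolation_source": "soil",
          "geo_location": ""}  # empty; "collection_date" is absent
print(metadata_coverage_index(record, fields))  # → 50.0
```

Restricting `available_fields` to a checklist's mandatory fields yields a compliance-oriented score, while passing all fields of a record type measures overall descriptive richness.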
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
This report summarizes the proceedings of the second workshop of the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) reemphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at http://mibbi.org/.
This report summarizes the proceedings of the one-day BioSharing meeting held at the Intelligent Systems for Molecular Biology (ISMB) 2010 conference in Boston, MA, USA. This inaugural BioSharing event was hosted by the Genomic Standards Consortium as part of its M3 & BioSharing special interest group (SIG) workshop. The BioSharing event included invited talks from a range of community leaders and a panel discussion at the end of the day. The panel session led to the formal agreement among community leaders to join together to promote cross-community knowledge exchange and collaborations. A key focus of the newly formed BioSharing community will be linking up resources to promote real-world data sharing (a virtuous cycle of data) and supporting compliance with data policies through the creation of a one-stop portal of information. Further information about the newly established BioSharing effort can be found at http://biosharing.org.
Cartilage thickness from MR images has been identified as a possible biomarker in knee osteoarthritis (OA) research. The ability to acquire MR data at multiple centers by using different vendors' scanners would facilitate patient recruitment and shorten the duration of OA trials. Several vendors manufacture 3T MR scanners, including Siemens, Philips Medical Systems, and GE Healthcare. This study investigates whether quantitative MR assessments of cartilage morphology are comparable between scanners of three different vendors.
Twelve subjects with symptoms of knee OA and one or more risk factors had their symptomatic knee scanned on each of the three vendors' scanners, located at three sites in the UK: Manchester (Philips), York (GE), and Liverpool (Siemens). The NIH OAI study protocol was used for the Siemens scanner, and equivalent protocols were developed for the Philips and GE scanners with the vendors' advice. Cartilage was segmented manually from sagittal 3D images. By using recently described techniques for Anatomically Corresponded Regional Analysis of Cartilage (ACRAC), a statistical model was used to anatomically align all the images and to produce detailed maps of mean differences in cartilage-thickness measures between scanners. Measures of mean cartilage thickness were computed in anatomically equivalent regions for each subject and scanner image.
The ranges of mean cartilage-thickness measures for this cohort were similar for all regions and across all scanners. Philips intrascanner root-mean-square coefficients of variation were low, ranging from 2.6% to 4.6%. No significant differences were found for thickness measures of the weight-bearing femorotibial regions from the Philips and Siemens images, except for the central medial femur compartment (P = 0.04). Compared with the other two scanners, the GE scanner provided consistently lower mean thickness measures in the central femoral regions (mean difference, -0.16 mm) and higher measures in the tibial compartments (mean difference, +0.19 mm).
The OAI knee-imaging protocol, developed on the Siemens platform, can be applied to research and trials using other vendors' 3T scanners, giving comparable morphologic results. Accurate sequence optimization, differences in image postprocessing, and extremity coil type are critical factors for interscanner precision of quantitative analysis of cartilage morphology. It is still recommended that longitudinal observations on individuals be performed on the same scanner, and that intra- and interscanner precision errors be assessed before commencement of the main study.
Summary: We present the first open-source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to adopt community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories.
Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org
The development of the Functional Genomics Investigation Ontology (FuGO) is a collaborative, international effort that will provide a resource for annotating functional genomics investigations, including the study design, protocols and instrumentation used, the data generated and the types of analysis performed on the data. FuGO will contain both terms that are universal to all functional genomics investigations and those that are domain specific. In this way, the ontology will serve as the “semantic glue” to provide a common understanding of data from across these disparate data sources. In addition, FuGO will reference out to existing mature ontologies to avoid the need to duplicate these resources, and will do so in such a way as to enable their ease of use in annotation. This project is in the early stages of development; the paper will describe efforts to initiate the project, the scope and organization of the project, the work accomplished to date, and the challenges encountered, as well as future plans.
The Minimum Information for Biological and Biomedical Investigations (MIBBI) project provides a resource for those exploring the range of extant minimum information checklists and fosters coordinated development of such checklists.
A wide variety of ontologies relevant to the biological and medical domains are available through the OBO Foundry portal, and their number is growing rapidly. Integration of these ontologies, while requiring considerable effort, is extremely desirable. However, heterogeneities in format and style pose serious obstacles to such integration. In particular, inconsistencies in naming conventions can impair the readability and navigability of ontology class hierarchies, and hinder their alignment and integration. While other sources of diversity are tremendously complex and challenging, agreeing a set of common naming conventions is an achievable goal, particularly if those conventions are based on lessons drawn from pooled practical experience and surveys of community opinion.
We summarize a review of existing naming conventions and highlight certain disadvantages with respect to general applicability in the biological domain. We also present the results of a survey carried out to establish which naming conventions are currently employed by OBO Foundry ontologies and to determine what their special requirements regarding the naming of entities might be. Lastly, we propose an initial set of typographic, syntactic and semantic conventions for labelling classes in OBO Foundry ontologies.
Adherence to common naming conventions is more than just a matter of aesthetics. Such conventions provide guidance to ontology creators, help developers avoid flaws and inaccuracies when editing, and especially when interlinking, ontologies. Common naming conventions will also assist consumers of ontologies to more readily understand what meanings were intended by the authors of ontologies used in annotating bodies of data.
As the size and complexity of scientific datasets and the corresponding information stores grow, standards for collecting, describing, formatting, submitting and exchanging information are playing an increasingly active role. Several initiatives occupy strategic positions in the international scenario, both within and across domains. However, the job of harmonising reporting standards is still very much a work in progress; both software interoperability and data integration remain challenging as things stand.
The status quo with respect to standardization initiatives is summarized here, with particular emphasis on the motivation for, and the challenges of, ongoing synergistic activities amongst the academic community focused on the creation of truly interoperable standards.
Groups generating standards should engage with ongoing cross-domain activities to simplify the integration of heterogeneous data sets to the greatest possible extent.
The XML-based Real-Time PCR Data Markup Language (RDML) has been developed by the RDML consortium (http://www.rdml.org) to enable straightforward exchange of qPCR data and related information between qPCR instruments and third-party data analysis software, between colleagues and collaborators, and between experimenters and journals or public repositories. Here we also propose data-related guidelines, as a subset of the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE), to guarantee inclusion of key data information when reporting experimental results.
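Because RDML files are plain XML, any standard XML library can produce or consume them. The sketch below only illustrates that round trip: the element names (`rdml`, `sample`, `target`, `cq`) are hypothetical stand-ins and do not follow the actual RDML schema, which is defined by the consortium at http://www.rdml.org.

```python
import xml.etree.ElementTree as ET

# Build a minimal qPCR result document.  NOTE: element and attribute
# names here are illustrative only, not the real RDML schema.
root = ET.Element("rdml", version="1.0")
sample = ET.SubElement(root, "sample", id="S1")
ET.SubElement(sample, "target", id="GAPDH")
ET.SubElement(sample, "cq").text = "21.4"

xml_bytes = ET.tostring(root, encoding="utf-8")

# A third-party analysis tool would parse the same document back.
parsed = ET.fromstring(xml_bytes)
cq = float(parsed.find("./sample/cq").text)
print(cq)  # → 21.4
```

The value of a shared schema is precisely that the producing instrument and the consuming software agree on these element names and their meaning in advance.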
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
Incorporation of ontologies into annotations has enabled 'semantic integration' of complex data, making explicit the knowledge within a certain field. One of the major bottlenecks in developing bio-ontologies is the lack of a unified methodology. Different methodologies have been proposed for different scenarios, but there is no agreed-upon standard methodology for building ontologies. The involvement of geographically distributed domain experts, the need for domain experts to lead the design process, the application of the ontologies and the life cycles of bio-ontologies are amongst the features not considered by previously proposed methodologies.
Here, we present a methodology for developing ontologies within the biological domain. We describe our scenario, competency questions, results and milestones for each methodological stage. We introduce the use of concept maps during knowledge acquisition phases as a feasible transition between domain expert and knowledge engineer.
The contributions of this paper are the thorough description of the steps we suggest when building an ontology, example use of concept maps, consideration of applicability to the development of lower-level ontologies, and application to decentralised environments. We have found that, within our scenario, concept maps played an important role in the development process.
PRIDE, the ‘PRoteomics IDEntifications database’, is a database of protein and peptide identifications that have been described in the scientific literature. These identifications will typically be from specific species, tissues and sub-cellular locations, perhaps under specific disease conditions. Any post-translational modifications that have been identified on individual peptides can be described. These identifications may be annotated with supporting mass spectra. At the time of writing, PRIDE includes the full set of identifications as submitted by individual laboratories participating in the HUPO Plasma Proteome Project and a profile of the human platelet proteome submitted by the University of Ghent in Belgium. By late 2005, PRIDE is expected to contain the identifications and spectra generated by the HUPO Brain Proteome Project. Proteomics laboratories are encouraged to submit their identifications and spectra to PRIDE to support their manuscript submissions to proteomics journals. Data can be submitted in PRIDE XML format if identifications are included, or in mzData format if the submitter is depositing mass spectra without identifications. PRIDE is a web application, so submission, searching and data retrieval can all be performed using an internet browser. PRIDE can be searched by experiment accession number, protein accession number, literature reference and sample parameters including species, tissue, sub-cellular location and disease state. Data can be retrieved as machine-readable PRIDE or mzData XML (the latter for mass spectra without identifications), or as human-readable HTML.