1.  Standardized Metadata for Human Pathogen/Vector Genomic Sequences 
PLoS ONE  2014;9(6):e99979.
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
doi:10.1371/journal.pone.0099979
PMCID: PMC4061050  PMID: 24936976
3.  CRITICAL ASSESSMENT OF AUTOMATED FLOW CYTOMETRY DATA ANALYSIS TECHNIQUES 
Nature methods  2013;10(3):228-238.
Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks – mammalian cell population identification to determine if automated algorithms can reproduce expert manual gating, and sample classification to determine if analysis pipelines can identify characteristics that correlate with external variables (e.g., clinical outcome). This analysis presents the results of the first of these challenges. Several methods performed well compared to manual gating or external variables using statistical performance measures, suggesting that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
doi:10.1038/nmeth.2365
PMCID: PMC3906045  PMID: 23396282
4.  Influenza Virus Sequence Feature Variant Type Analysis: Evidence of a Role for NS1 in Influenza Virus Host Range Restriction 
Journal of Virology  2012;86(10):5857-5866.
Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.
doi:10.1128/JVI.06901-11
PMCID: PMC3347290  PMID: 22398283
5.  Influenza Research Database: An integrated bioinformatics resource for influenza research and surveillance 
The recent experience with the emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about viral characteristics related to antiviral drug resistance and virulence. The Influenza Research Database (IRD, www.fludb.org) is a free, publicly accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases (NIAID) through the Bioinformatics Resource Centers (BRC) program. The IRD provides a comprehensive, integrated database, and analysis resource for influenza sequence, surveillance, and research data. It also provides user-friendly interfaces for data retrieval and visualization, comparative genomics analysis, and personal log in-protected “workbench” spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. The goal of the IRD is to provide a resource that helps researchers identify root causes of virus pathogenicity and host range restriction, and facilitates the development of vaccines, diagnostics, and therapeutics.
doi:10.1111/j.1750-2659.2011.00331.x
PMCID: PMC3345175  PMID: 22260278
6.  Toward an Ontology-Based Framework for Clinical Research Databases 
Clinical research includes a wide range of study designs from focused observational studies to complex interventional studies with multiple study arms, treatment and assessment events, and specimen procurement procedures. Participant characteristics from case report forms need to be integrated with molecular characteristics from mechanistic experiments on procured specimens. In order to capture and manage this diverse array of data, we have developed the Ontology-Based eXtensible conceptual model (OBX) to serve as a framework for clinical research data in the Immunology Database and Analysis Portal (ImmPort). By designing OBX around the logical structure of the Basic Formal Ontology (BFO) and the Ontology for Biomedical Investigations (OBI), we have found that a relatively simple conceptual model can represent the relatively complex domain of clinical research. In addition, the common framework provided by BFO makes it straightforward to develop data dictionaries based on reference and application ontologies from the OBO Foundry.
doi:10.1016/j.jbi.2010.05.001
PMCID: PMC2953614  PMID: 20460173
Ontology; Clinical Trials; Biomaterial Transformation; Assay; Data Transformation; Conceptual Model
7.  GenePattern flow cytometry suite 
Background
Traditional flow cytometry data analysis is largely based on interactive and time-consuming analysis of a series of two-dimensional representations of up to 20-dimensional data. Recent technological advances have increased the amount of data generated by the technology and outpaced the development of data analysis approaches. While there are advanced tools available, including many R/BioConductor packages, these are only accessible programmatically and therefore out of reach for most experimentalists. GenePattern is a powerful genomic analysis platform with over 200 tools for analysis of gene expression, proteomics, and other data. A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines enabling reproducible research.
Results
In order to bring advanced flow cytometry data analysis tools to experimentalists without programmatic skills, we developed the GenePattern Flow Cytometry Suite. It contains 34 open source GenePattern flow cytometry modules covering methods from basic processing of flow cytometry standard (i.e., FCS) files to advanced algorithms for automated identification of cell populations, normalization and quality assessment. Internally, these modules leverage functionality developed in R/BioConductor. Using the GenePattern web-based interface, they can be connected to build analytical pipelines.
Conclusions
GenePattern Flow Cytometry Suite brings advanced flow cytometry data analysis capabilities to users with minimal computer skills. Functionality previously available only to skilled bioinformaticians is now easily accessible from a web browser.
doi:10.1186/1751-0473-8-14
PMCID: PMC3717030  PMID: 23822732
Flow cytometry; Data analysis; GenePattern; FCS; Data preprocessing; Quality assessment; Normalization; Clustering
8.  Extracting Actionable Findings of Appendicitis from Radiology Reports Using Natural Language Processing  
Radiology reports often contain findings about the condition of a patient which should be acted upon quickly. These actionable findings in a radiology report can be automatically detected to ensure that the referring physician is notified about such findings and to provide feedback to the radiologist that further action has been taken. In this paper we investigate a method for detecting actionable findings of appendicitis in radiology reports. The method identifies both individual assertions regarding the presence of appendicitis and other findings related to appendicitis using syntactic dependency patterns. All relevant individual statements from a report are collectively considered to determine whether the report is consistent with appendicitis. Evaluation on a corpus of 400 radiology reports annotated by two expert radiologists showed that our approach achieves a precision of 91%, a recall of 83%, and an F1-measure of 87%.
PMCID: PMC3845763  PMID: 24303268
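The F1-measure reported in the abstract above is the harmonic mean of precision and recall. A minimal sketch that checks the reported figures, assuming the standard definition of F1; the function name `f1_score` is illustrative and does not come from the paper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (91% and 83%).
f1 = f1_score(0.91, 0.83)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.868", i.e. about 87%
```

The result rounds to 87%, which agrees with the value stated in the abstract.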
9.  A Machine Learning Approach for Identifying Anatomical Locations of Actionable Findings in Radiology Reports 
Recognizing the anatomical location of actionable findings in radiology reports is an important part of the communication of critical test results between caregivers. One of the difficulties of identifying anatomical locations of actionable findings stems from the fact that anatomical locations are not always stated in a simple, easy-to-identify manner. Natural language processing techniques are capable of recognizing the relevant anatomical location by processing a diverse set of lexical and syntactic contexts that correspond to the various ways that radiologists represent spatial relations. We report a precision of 86.2%, recall of 85.9%, and F1-measure of 86.0% for extracting the anatomical site of an actionable finding. Additionally, we report a precision of 73.8%, recall of 69.8%, and F1-measure of 71.8% for extracting an additional anatomical site that grounds underspecified locations. This demonstrates promising results for identifying locations, while error analysis reveals challenges under certain contexts. Future work will focus on incorporating new forms of medical language processing to improve performance and transitioning our method to new types of clinical data.
PMCID: PMC3540484  PMID: 23304352
10.  Virus Pathogen Database and Analysis Resource (ViPR): A Comprehensive Bioinformatics Database and Analysis Resource for the Coronavirus Research Community 
Viruses  2012;4(11):3209-3226.
Several viruses within the Coronaviridae family have been categorized as either emerging or re-emerging human pathogens, with Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) being the most well known. The NIAID-sponsored Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org) supports bioinformatics workflows for a broad range of human virus pathogens and other related viruses, including the entire Coronaviridae family. ViPR provides access to sequence records, gene and protein annotations, immune epitopes, 3D structures, host factor data, and other data types through an intuitive web-based search interface. Records returned from these queries can then be subjected to web-based analyses including: multiple sequence alignment, phylogenetic inference, sequence variation determination, BLAST comparison, and metadata-driven comparative genomics statistical analysis. Additional tools exist to display multiple sequence alignments, view phylogenetic trees, visualize 3D protein structures, transfer existing reference genome annotations to new genomes, and store or share results from any search or analysis within personal private ‘Workbench’ spaces for future access. All of the data and integrated analysis and visualization tools in ViPR are made available without charge as a service to the Coronaviridae research community to facilitate the research and development of diagnostics, prophylactics, vaccines and therapeutics against these human pathogens.
doi:10.3390/v4113209
PMCID: PMC3509690  PMID: 23202522
virus; database; bioinformatics; Coronavirus; SARS; SARS-CoV; Coronaviridae; comparative genomics
11.  Hematopoietic Cell Types: Prototype for a Revised Cell Ontology 
The Cell Ontology (CL) aims for the representation of in vivo and in vitro cell types from all of biology. The CL is a candidate reference ontology of the OBO Foundry and requires extensive revision to bring it up to current standards for biomedical ontologies, both in its structure and its coverage of various subfields of biology. We have now addressed the specific content of one area of the CL, the section of the ontology dealing with hematopoietic cells. This section has been extensively revised to improve its content and eliminate multiple inheritance in the asserted hierarchy, and the groundwork was laid for structuring the hematopoietic cell type terms as cross-products incorporating logical definitions built from relationships to external ontologies, such as the Protein Ontology and the Gene Ontology. The methods and improvements to the CL in this area represent a paradigm for improvement of the entire ontology over time.
doi:10.1016/j.jbi.2010.01.006
PMCID: PMC2892030  PMID: 20123131
ontology; hematopoietic cells; immunology
12.  A gene selection method for GeneChip array data with small sample sizes 
BMC Genomics  2011;12(Suppl 5):S7.
Background
In microarray experiments with small sample sizes, it is a challenge to estimate p-values accurately and decide cutoff p-values for gene selection appropriately. Although permutation-based methods have proved to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete due to the limited number of permutations available in very small sample sizes. Furthermore, estimated permutation-based p-values for true nulls are highly correlated and not uniformly distributed between zero and one, making it difficult to use current false discovery rate (FDR)-controlling methods.
Results
We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, utilizes information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions. The parameters of the model are then estimated using all data in hand. Based on this model, p-values, which are uniformly distributed from true nulls, are calculated. Then, since FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate.
Conclusion
Simulation studies and analysis using real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods. It has wide application to a variety of situations.
doi:10.1186/1471-2164-12-S5-S7
PMCID: PMC3287503  PMID: 22369149
13.  Minimum Information about a Genotyping Experiment (MIGEN) 
Standards in Genomic Sciences  2011;5(2):224-229.
Genotyping experiments are widely used in clinical and basic research laboratories to identify associations between genetic variations and normal/abnormal phenotypes. Genotyping assay techniques vary from single genomic regions that are interrogated using PCR reactions to high throughput assays examining genome-wide sequence and structural variation. The resulting genotype data may include millions of markers of thousands of individuals, requiring various statistical, modeling or other data analysis methodologies to interpret the results. To date, there are no standards for reporting genotyping experiments. Here we present the Minimum Information about a Genotyping Experiment (MIGen) standard, defining the minimum information required for reporting genotyping experiments. The MIGen standard covers experimental design, subject description, genotyping procedure, quality control and data analysis. MIGen is a registered project under MIBBI (Minimum Information for Biological and Biomedical Investigations) and is being developed by an interdisciplinary group of experts in basic biomedical science, clinical science, biostatistics and bioinformatics. To accommodate the wide variety of techniques and methodologies applied in current and future genotyping experiments, MIGen leverages foundational concepts from the Ontology for Biomedical Investigations (OBI) for the description of the various types of planned processes and implements a hierarchical document structure. The adoption of MIGen by the research community will facilitate consistent genotyping data interpretation and independent data validation. MIGen can also serve as a framework for the development of data models for capturing and storing genotyping results and experiment metadata in a structured way, to facilitate the exchange of metadata.
doi:10.4056/sigs.1994602
PMCID: PMC3235517  PMID: 22180825
14.  ViPR: an open bioinformatics database and analysis resource for virology research 
Nucleic Acids Research  2011;40(Database issue):D593-D598.
The Virus Pathogen Database and Analysis Resource (ViPR, www.ViPRbrc.org) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR contains information for human pathogenic viruses belonging to the Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Flaviviridae, Filoviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae and Togaviridae families, with plans to support additional virus families in the future. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. Analytical and visualization tools for metadata-driven statistical sequence analysis, multiple sequence alignment, phylogenetic tree construction, BLAST comparison and sequence variation determination are also provided. Data filtering and analysis workflows can be combined and the results saved in personal ‘Workbenches’ for future use. ViPR tools and data are available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses.
doi:10.1093/nar/gkr859
PMCID: PMC3245011  PMID: 22006842
15.  Elucidation of Seventeen Human Peripheral Blood B cell Subsets and Quantification of the Tetanus Response Using a Density-Based Method for the Automated Identification of Cell Populations in Multidimensional Flow Cytometry Data 
Cytometry. Part B, Clinical cytometry  2010;78(Suppl 1):S69-S82.
Background
Advances in multi-parameter flow cytometry (FCM) now allow for the independent detection of larger numbers of fluorochromes on individual cells, generating data with increasingly higher dimensionality. The increased complexity of these data has made it difficult to identify cell populations from high-dimensional FCM data using traditional manual gating strategies based on single-color or two-color displays.
Methods
To address this challenge, we developed a novel program, FLOCK (FLOw Clustering without K), that uses a density-based clustering approach to algorithmically identify biologically relevant cell populations from multiple samples in an unbiased fashion, thereby eliminating operator-dependent variability.
Results
FLOCK was used to objectively identify seventeen distinct B cell subsets in a human peripheral blood sample and to identify and quantify novel plasmablast subsets responding transiently to tetanus and other vaccinations in peripheral blood. FLOCK has been implemented in the publicly available Immunology Database and Analysis Portal – ImmPort (http://www.immport.org) for open use by the immunology research community.
Conclusions
FLOCK is able to identify cell subsets in experiments that use multi-parameter flow cytometry through an objective, automated computational approach. The use of algorithms like FLOCK for FCM data analysis obviates the need for subjective and labor-intensive manual gating to identify and quantify cell subsets. Novel populations identified by these computational approaches can serve as hypotheses for further experimental study.
doi:10.1002/cyto.b.20554
PMCID: PMC3084630  PMID: 20839340
flow cytometry; density-based analysis; data clustering; tetanus vaccination; B lymphocyte subsets
16.  GO-Bayes: Gene Ontology-based overrepresentation analysis using a Bayesian approach 
Bioinformatics  2010;26(7):905-911.
Motivation: A typical approach for the interpretation of high-throughput experiments, such as gene expression microarrays, is to produce groups of genes based on certain criteria (e.g. genes that are differentially expressed). To gain more mechanistic insights into the underlying biology, overrepresentation analysis (ORA) is often conducted to investigate whether gene sets associated with particular biological functions, for example, as represented by Gene Ontology (GO) annotations, are statistically overrepresented in the identified gene groups. However, the standard ORA, which is based on the hypergeometric test, analyzes each GO term in isolation and does not take into account the dependence structure of the GO-term hierarchy.
Results: We have developed a Bayesian approach (GO-Bayes) to measure overrepresentation of GO terms that incorporates the GO dependence structure by taking into account evidence not only from individual GO terms, but also from their related terms (i.e. parents, children, siblings, etc.). The Bayesian framework borrows information across related GO terms to strengthen the detection of overrepresentation signals. As a result, this method tends to identify sets of closely related GO terms rather than individual isolated GO terms. The advantage of the GO-Bayes approach is demonstrated with a simulation study and an application example.
Contact: song.zhang@utsouthwestern.edu; richard.scheuermann@utsouthwestern.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq059
PMCID: PMC2913669  PMID: 20176581
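The abstract above contrasts GO-Bayes with the standard ORA, which tests each GO term in isolation using the hypergeometric test. A minimal sketch of that baseline test, under the usual one-sided formulation, is shown below. This is not the GO-Bayes method itself, and the function name, parameter names, and example counts are all hypothetical:

```python
from math import comb


def ora_p_value(population: int, annotated: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value for overrepresentation of one GO term.

    population: total number of genes considered
    annotated:  genes in the population annotated with the GO term
    selected:   genes in the identified group (e.g. differentially expressed)
    overlap:    selected genes that carry the annotation
    Returns P(X >= overlap) when `selected` genes are drawn without replacement.
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes, 200 annotated, 100 selected, 8 in the overlap.
p = ora_p_value(population=10_000, annotated=200, selected=100, overlap=8)
print(f"p = {p:.3g}")
```

Because the test is applied to each term separately, it ignores the parent, child, and sibling relations in the GO hierarchy, which is the limitation the abstract says GO-Bayes is designed to address.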
17.  Novel sequence feature variant type analysis of the HLA genetic association in systemic sclerosis 
Human Molecular Genetics  2009;19(4):707-719.
We describe a novel approach to genetic association analyses with proteins sub-divided into biologically relevant smaller sequence features (SFs), and their variant types (VTs). SFVT analyses are particularly informative for the study of highly polymorphic proteins such as the human leukocyte antigen (HLA), given the nature of its genetic variation: the high level of polymorphism, the pattern of amino acid variability, and the fact that most HLA variation occurs at functionally important sites, as well as its known role in organ transplant rejection, autoimmune disease development and response to infection. Further, combinations of variable amino acid sites shared by several HLA alleles (shared epitopes) are most likely better descriptors of the actual causative genetic variants. In a cohort of systemic sclerosis patients/controls, SFVT analysis shows that a combination of SFs implicating specific amino acid residues in peptide binding pockets 4 and 7 of HLA-DRB1 explains much of the molecular determinant of risk.
doi:10.1093/hmg/ddp521
PMCID: PMC2807365  PMID: 19933168
18.  Technical and Policy Approaches to Balancing Patient Privacy and Data Sharing in Clinical and Translational Research 
Clinical researchers need to share data to support scientific validation and information reuse, and to comply with a host of regulations and directives from funders. Various organizations are constructing informatics resources in the form of centralized databases to ensure widespread availability of data derived from sponsored research. The widespread use of such open databases is contingent on the protection of patient privacy.
In this paper, we review several aspects of the privacy-related problems associated with data sharing for clinical research from technical and policy perspectives. We begin with a review of existing policies for secondary data sharing and privacy requirements in the context of data derived from research and clinical settings. In particular, we focus on policies specified by the U.S. National Institutes of Health and the Health Insurance Portability and Accountability Act and touch upon how these policies are related to current, as well as future, use of data stored in public database archives.
Next, we address aspects of data privacy and “identifiability” from a more technical perspective, and review how biomedical databanks can be exploited and seemingly anonymous records can be “re-identified” using various resources without compromising or hacking into secure computer systems. We highlight which data features specified in clinical research data models are potentially vulnerable or exploitable. In the process, we recount a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies that has had a significant impact on the data sharing policies of NIH-sponsored databanks.
Finally, we conclude with a list of recommendations that cover various technical, legal, and policy mechanisms that open clinical databases can adopt to strengthen data privacy protections as they move toward wider deployment and adoption.
doi:10.231/JIM.0b013e3181c9b2ea
PMCID: PMC2836827  PMID: 20051768
Clinical Research; Translational Research; Databases; Privacy
19.  SEQUENCE FEATURE VARIANT TYPE (SFVT) ANALYSIS OF THE HLA GENETIC ASSOCIATION IN JUVENILE IDIOPATHIC ARTHRITIS 
The immune response HLA class II DRB1 gene provides the major genetic contribution to Juvenile Idiopathic Arthritis (JIA), with a hierarchy of predisposing through intermediate to protective effects. With JIA, and the many other HLA associated diseases, it is difficult to identify the combinations of biologically relevant amino acid (AA) residues directly involved in disease due to the high level of HLA polymorphism, the pattern of AA variability, including varying degrees of linkage disequilibrium (LD), and the fact that most HLA variation occurs at functionally important sites. In a subset of JIA patients with the clinical phenotype oligoarticular-persistent (OP), we have applied a recently developed novel approach to genetic association analyses with genes/proteins sub-divided into biologically relevant smaller sequence features (SFs), and their “alleles” which are called variant types (VTs). With SFVT analysis, association tests are performed on variation at biologically relevant SFs based on structural (e.g., beta-strand 1) and functional (e.g., peptide binding site) features of the protein. We have extended the SFVT analysis pipeline to additionally include pairwise comparisons of DRB1 alleles within serogroup classes, our extension of the Salamon Unique Combinations algorithm, and LD patterns of AA variability to evaluate the SFVT results; all of which contributed additional complementary information. With JIA-OP, we identified a set of single AA SFs, and SFs in which they occur, particularly pockets of the peptide binding site, that account for the major disease risk attributable to HLA DRB1. These are (in numeric order): AAs 13 (pockets 4 and 6), 37 and 57 (both pocket 9), 67 (pocket 7), 74 (pocket 4), and 86 (pocket 1), and to a lesser extent 30 (pockets 6 and 7) and 71 (pockets 4, 5, and 7).
PMCID: PMC2958177  PMID: 19908388
20.  Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance 
Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416.
Background
The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics.
Design
The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly-accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user-friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in-protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature.
Results
To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross-protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks.
Conclusions
The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics.
doi:10.1111/j.1750-2659.2011.00331.x
PMCID: PMC3345175  PMID: 22260278
Bioinformatics; epitope; influenza virus; integrated; surveillance
21.  Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop 
Viruses  2010;2(10):2258-2268.
Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
doi:10.3390/v2102258
PMCID: PMC3185566  PMID: 21994619
virus; genome; annotation
22.  Synergies and Distinctions between Computational Disciplines in Biomedical Research: Perspective from the Clinical and Translational Science Award Programs 
Clinical and translational research increasingly requires computation. Projects may involve multiple computationally-oriented groups including information technology (IT) professionals, computer scientists and biomedical informaticians. However, many biomedical researchers are not aware of the distinctions among these complementary groups, leading to confusion, delays and sub-optimal results. Although written from the perspective of clinical and translational science award (CTSA) programs within academic medical centers, the paper addresses issues that extend beyond clinical and translational research. The authors describe the complementary but distinct roles of operational IT, research IT, computer science and biomedical informatics using a clinical data warehouse as a running example. In general, IT professionals focus on technology. The authors distinguish between two types of IT groups within academic medical centers: central or administrative IT (supporting the administrative computing needs of large organizations) and research IT (supporting the computing needs of researchers). Computer scientists focus on general issues of computation such as designing faster computers or more efficient algorithms, rather than specific applications. In contrast, informaticians are concerned with data, information and knowledge. Biomedical informaticians draw on a variety of tools, including but not limited to computers, to solve information problems in health care and biomedicine. The paper concludes with recommendations regarding administrative structures that can help to maximize the benefit of computation to biomedical research within academic health centers.
doi:10.1097/ACM.0b013e3181a8144d
PMCID: PMC2884382  PMID: 19550198
23.  The Human Studies Database Project: Federating Human Studies Design Data Using the Ontology of Clinical Research 
Human studies, encompassing interventional and observational studies, are the most important source of evidence for advancing our understanding of health, disease, and treatment options. To promote discovery, the design and results of these studies should be made machine-readable for large-scale data mining, synthesis, and re-analysis. The Human Studies Database Project aims to define and implement an informatics infrastructure for institutions to share the design of their human studies. We have developed the Ontology of Clinical Research (OCRe) to model study features such as design type, interventions, and outcomes to support scientific query and analysis. We are using OCRe as the reference semantics for federated data sharing of human studies over caGrid, and are piloting this implementation with several Clinical and Translational Science Award (CTSA) institutions.
PMCID: PMC3041546  PMID: 21347149
24.  The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration 
Nature biotechnology  2007;25(11):1251.
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
doi:10.1038/nbt1346
PMCID: PMC2814061  PMID: 17989687
25.  MIFlowCyt: The Minimum Information about a Flow Cytometry Experiment 
Background
A fundamental tenet of scientific research is that published results are open to independent validation and refutation. Minimum data standards aid data providers, users and publishers by providing a specification of what is required to unambiguously interpret experimental findings. Here, we present the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard, stating the minimum information required to report flow cytometry (FCM) experiments.
Methods
We brought together a cross-disciplinary international collaborative group of bioinformaticians, computational statisticians, software developers, instrument manufacturers, and clinical and basic research scientists to develop the standard. The standard was subsequently vetted by the International Society for the Advancement of Cytometry (ISAC) Data Standards Task Force, Standards Committee, membership and Council.
Results
The MIFlowCyt standard includes recommendations about descriptions of the specimens and reagents included in the FCM experiment, the configuration of the instrument used to perform the assays and the data processing approaches used to interpret the primary output data.
MIFlowCyt has been adopted as a standard by ISAC, representing the flow cytometry scientific community including scientists as well as software and hardware manufacturers.
Conclusions
MIFlowCyt’s adoption by the scientific and publishing communities will facilitate third-party understanding and reuse of flow cytometry data.
doi:10.1002/cyto.a.20623
PMCID: PMC2773297  PMID: 18752282
immunology; fluorescence-activated cell sorting; knowledge representation
