
Results 1-10 (10)

1.  Machine learning to predict extubation outcome in premature infants 
Though treatment of the ventilated premature infant has experienced many advances over the past decades, determining the best time point for extubation of these infants remains challenging, and the incidence of extubation failures is largely unchanged. The objective was to provide clinicians with a decision-support tool to determine whether to extubate a mechanically ventilated premature infant by using a set of machine learning algorithms on a dataset assembled from 486 premature infants receiving mechanical ventilation.
Algorithms included artificial neural networks (ANN), support vector machine (SVM), naïve Bayesian classifier (NBC), boosted decision trees (BDT), and multivariable logistic regression (MLR). Results for ANN, MLR, and NBC were satisfactory (area under the curve [AUC]: 0.63–0.76); however, SVM and BDT consistently showed poor performance (AUC ~0.5).
Complex medical data such as the data set used for this study require further preprocessing steps before prediction models can be developed that achieve similar or better performance than clinicians.
PMCID: PMC4255563  PMID: 25485175
2.  Can Machine Learning Methods Predict Extubation Outcome in Premature Infants as well as Clinicians? 
Journal of neonatal biology  2013;2:1000118.
Though treatment of the prematurely born infant breathing with assistance of a mechanical ventilator has much advanced in the past decades, predicting extubation outcome at a given point in time remains challenging. Numerous studies have been conducted to identify predictors for extubation outcome; however, the rate of infants failing extubation attempts has not declined.
To develop a decision-support tool for the prediction of extubation outcome in premature infants using a set of machine learning algorithms.
A dataset assembled from 486 premature infants on mechanical ventilation was used to develop predictive models using machine learning algorithms such as artificial neural networks (ANN), support vector machine (SVM), naïve Bayesian classifier (NBC), boosted decision trees (BDT), and multivariable logistic regression (MLR). Performance of all models was evaluated using area under the curve (AUC).
For some of the models (ANN, MLR and NBC) results were satisfactory (AUC: 0.63–0.76); however, two algorithms (SVM and BDT) showed poor performance with AUCs of ~0.5.
Clinicians' predictions still outperform machine learning due to the complexity of the data and contextual information that may not be captured in clinical data used as input for the development of the machine learning algorithms. Inclusion of preprocessing steps in future studies may improve the performance of prediction models.
PMCID: PMC4238927  PMID: 25419493
Premature infant; mechanical ventilation; extubation; prediction; machine learning
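The AUC used to compare the models in the two entries above can be illustrated with a minimal sketch. It computes AUC as the fraction of positive/negative pairs that a model ranks correctly (ties count as half), which is equivalent to the area under the ROC curve. The function name, the label coding (1 for the positive outcome), and the toy scores are assumptions for illustration only; none of the numbers come from the studies.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class, 0 for the negative class (assumed coding).
    scores: model outputs; a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up toy data, not taken from the studies.
labels = [1, 1, 1, 0, 0, 0]
informative = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # mostly ranks positives higher
constant = [0.5] * 6                          # carries no ranking information

auc_informative = auc(labels, informative)
auc_constant = auc(labels, constant)
```

A model whose scores carry no ranking information, such as the constant scorer here, lands at an AUC of 0.5, which is why an AUC of about 0.5 is read as poor performance.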
3.  The Gel Electrophoresis Markup Language (GelML) from the Proteomics Standards Initiative 
Proteomics  2010;10(17):3073-3081.
The Human Proteome Organisation’s Proteomics Standards Initiative (HUPO-PSI) has developed the GelML data exchange format for representing gel electrophoresis experiments performed in proteomics investigations. The format closely follows the reporting guidelines for gel electrophoresis, which are part of the Minimum Information About a Proteomics Experiment (MIAPE) set of modules. GelML supports the capture of metadata (such as experimental protocols) and data (such as gel images) resulting from gel electrophoresis so that laboratories can be compliant with the MIAPE Gel Electrophoresis guidelines, while allowing such data sets to be exchanged or downloaded from public repositories. The format is sufficiently flexible to capture data from a broad range of experimental processes, and complements other PSI formats for mass spectrometry data and the results of protein and peptide identifications to capture entire gel-based proteome workflows. GelML has resulted from the open standardisation process of PSI consisting of both public consultation and anonymous review of the specifications.
PMCID: PMC3193076  PMID: 20677327
data standard; gel electrophoresis; database; ontology
4.  S3QL: A distributed domain specific language for controlled semantic integration of life sciences data 
BMC Bioinformatics  2011;12:285.
The value and usefulness of data increases when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutiae such as provenance and access control.
We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data.
Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases.
S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms.
PMCID: PMC3155508  PMID: 21756325
S3DB; Linked Data; KOS; RDF; SPARQL; knowledge organization system; policy
5.  Prediction of urinary protein markers in lupus nephritis 
Kidney international  2005;68(6):2588-2592.
Lupus nephritis is divided into six classes and scored according to activity and chronicity indices based on histologic findings. Treatment differs based on the pathologic findings. Renal biopsy is currently the only way to accurately predict class and activity and chronicity indices. We propose to use patterns of abundance of urine proteins to identify class and disease indices.
Urine was collected from 20 consecutive patients immediately prior to biopsy for evaluation of lupus nephritis. The International Society of Nephrology/Renal Pathology Society (ISN/RPS) class of lupus nephritis, activity, and chronicity indices were determined by a renal pathologist. Proteins were separated by two-dimensional gel electrophoresis. Artificial neural networks were trained on normalized spot abundance values.
Biopsy specimens were classified in the database according to ISN/RPS class, activity, and chronicity. Nine samples had characteristics of more than one class present. Receiver operating characteristic (ROC) curves of the trained networks demonstrated areas under the curve ranging from 0.85 to 0.95. The sensitivity and specificity for the ISN/RPS classes were class II 100%, 100%; III 86%, 100%; IV 100%, 92%; and V 92%, 50%. Activity and chronicity indices had r values of 0.77 and 0.87, respectively. A list of spots was obtained that provided diagnostic sensitivity to the analysis.
We have identified a list of protein spots that can be used to develop a clinical assay to predict ISN/RPS class and chronicity for patients with lupus nephritis. An assay based on antibodies against these spots could eliminate the need for renal biopsy, allow frequent evaluation of disease status, and enable specific therapy to begin for patients with lupus nephritis.
PMCID: PMC2667626  PMID: 16316334
lupus nephritis; biomarkers; urine; electrophoresis; two-dimensional gel
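The per-class sensitivity and specificity figures reported in the entry above follow the standard definitions, which can be sketched as below. The function name and the toy labels are assumptions for illustration; they are not the study's data or its code.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels.

    truth, predicted: sequences of 0/1 values, where 1 means
    "belongs to the class of interest".
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of true cases that were detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly excluded
    return sensitivity, specificity


# Made-up toy labels: 4 samples in the class, 6 outside it.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```

In a multi-class setting such as the one described, each class would be scored one-versus-rest, so every class gets its own sensitivity/specificity pair.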
6.  RPPAML/RIMS: A metadata format and an information management system for reverse phase protein arrays 
BMC Bioinformatics  2008;9:555.
Reverse Phase Protein Arrays (RPPA) are convenient assay platforms to investigate the presence of biomarkers in tissue lysates. As with other high-throughput technologies, substantial amounts of analytical data are generated. Over 1000 samples may be printed on a single nitrocellulose slide. Up to 100 different proteins may be assessed using immunoperoxidase or immunofluorescence techniques in order to determine relative amounts of protein expression in the samples of interest.
In this report an RPPA Information Management System (RIMS) is described and made available with open source software. In order to implement the proposed system, we propose a metadata format known as reverse phase protein array markup language (RPPAML). RPPAML would enable researchers to describe, document and disseminate RPPA data. The complexity of the data structure needed to describe the results and the graphic tools necessary to visualize them require a software deployment distributed between a client and a server application. This was achieved without sacrificing interoperability between individual deployments through the use of an open source semantic database, S3DB. This data service backbone is available to multiple client side applications that can also access other server side deployments. The RIMS platform was designed to interoperate with other data analysis and data visualization tools such as Cytoscape.
The proposed RPPAML data format aims to standardize RPPA data. Standardization of data would result in diverse client applications being able to operate on the same set of data. Additionally, having data in a standard format would enable data dissemination and data analysis.
PMCID: PMC2639439  PMID: 19102773
7.  A Semantic Web Management Model for Integrative Biomedical Informatics 
PLoS ONE  2008;3(8):e2946.
Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defies automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data.
Methodology/Principal Findings
The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available with open source at, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management.
The Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.
PMCID: PMC2491554  PMID: 18698353
8.  An open-source representation for 2-DE-centric proteomics and support infrastructure for data storage and analysis 
BMC Bioinformatics  2008;9:4.
In spite of two-dimensional gel electrophoresis (2-DE) being an effective and widely used method to screen the proteome, its data standardization has still not matured to the level of microarray genomics data or mass spectrometry approaches. The trend toward identifying encompassing data standards has been expanding from genomics to transcriptomics, and more recently to proteomics. The relative success of genomic and transcriptomic data standardization has enabled the development of central repositories such as GenBank and Gene Expression Omnibus. An equivalent 2-DE-centric data structure would similarly have to include a balance among raw data, basic feature detection results, sufficiency in the description of the experimental context and methods, and an overall structure that facilitates a diversity of usages, from central repositories to local data representation in LIMS.
Results & Conclusion
Achieving such a balance can only be accomplished through several iterations involving bioinformaticians, bench molecular biologists, and the manufacturers of the equipment and commercial software from which the data is primarily generated. Such an encompassing data structure is described here, developed as the mature successor to the well established and broadly used earlier version. A public repository, AGML Central, is configured with a suite of tools for the conversion from a variety of popular formats, web-based visualization, and interoperation with other tools and repositories, and is particularly mass-spectrometry oriented with I/O for annotation and data analysis.
PMCID: PMC2231339  PMID: 18179696
9.  N-acetyl-L-cysteine ameliorates the inflammatory disease process in experimental autoimmune encephalomyelitis in Lewis rats 
We report that N-acetyl-L-cysteine (NAC) treatment blocked induction of TNF-α, IL-1β, IFN-γ and iNOS in the CNS and attenuated clinical disease in the myelin basic protein induced model of experimental allergic encephalomyelitis (EAE) in Lewis rats. Infiltration of mononuclear cells into the CNS and induction of inflammatory cytokines and iNOS in multiple sclerosis (MS) and EAE have been implicated in subsequent disease progression and pathogenesis. To understand the mechanism of efficacy of NAC against EAE, we examined its effect on the production of cytokines and the infiltration of inflammatory cells into the CNS. NAC treatment attenuated the transmigration of mononuclear cells, thereby lessening the neuroinflammatory disease. Splenocytes from NAC-treated EAE animals showed reduced production of IFN-γ, a Th1 cytokine, and increased production of IL-10, an anti-inflammatory cytokine. Further, splenocytes from NAC-treated EAE animals also showed decreased nitrite production when stimulated in vitro by LPS. These observations indicate that NAC treatment may be of therapeutic value in MS against the inflammatory disease process associated with the infiltration of activated mononuclear cells into the CNS.
PMCID: PMC1097751  PMID: 15869713
EAE; Macrophages; infiltration; N-acetyl-L-cysteine; CNS
10.  An XML standard for the dissemination of annotated 2D gel electrophoresis data complemented with mass spectrometry results 
BMC Bioinformatics  2004;5:9.
Many proteomics initiatives require seamless bioinformatics integration of a range of analytical steps between sample collection and systems modeling, immediately accessible to the participants involved in the process. The path from proteomics profiling by 2D gel electrophoresis to the putative identification of differentially expressed proteins by comparison of mass spectrometry results with reference databases includes many components of sample processing, not just analysis and interpretation, that are regularly revisited and updated. A suitable data structure is needed to support such updates and the dissemination of data. However, no data structure is currently available for storing data for multiple gels generated in a single proteomics experiment in a single XML file. This paper proposes a data structure based on XML standards to fill the gap between the data generated by proteomics experiments and the storage of those data.
In order to address the resulting procedural fluidity we have adopted and implemented a data model centered on the concept of annotated gel (AG) as the format for delivery and management of 2D Gel electrophoresis results. An eXtensible Markup Language (XML) schema is proposed to manage, analyze and disseminate annotated 2D Gel electrophoresis results. The structure of AG objects is formally represented using XML, resulting in the definition of the AGML syntax presented here.
The proposed schema accommodates data on the electrophoresis results as well as the mass-spectrometry analysis of selected gel spots. A web-based software library is being developed to handle data storage, analysis and graphic representation. Computational tools described will be made available at . Our development of AGML provides a simple data structure for storing 2D gel electrophoresis data.
PMCID: PMC341449  PMID: 15005801
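The idea in the entry above, storing several annotated gels from one experiment in a single XML file together with mass-spectrometry identifications for selected spots, can be sketched as follows. This is only an illustration: the element and attribute names (experiment, gel, spot, massSpecIdentification) and the sample values are hypothetical and are not taken from the AGML schema.

```python
import xml.etree.ElementTree as ET


def build_experiment(gels):
    """Build one XML tree holding several annotated gels.

    All element and attribute names are hypothetical placeholders,
    not the actual AGML vocabulary.
    """
    root = ET.Element("experiment")
    for gel in gels:
        gel_el = ET.SubElement(root, "gel", id=gel["id"])
        for spot in gel["spots"]:
            spot_el = ET.SubElement(
                gel_el, "spot",
                id=spot["id"], x=str(spot["x"]), y=str(spot["y"]),
            )
            # Only some spots are picked for mass spectrometry.
            if "protein" in spot:
                ms_el = ET.SubElement(spot_el, "massSpecIdentification")
                ms_el.text = spot["protein"]
    return root


# Made-up example data: two gels from one experiment.
gels = [
    {"id": "gel1", "spots": [
        {"id": "s1", "x": 120, "y": 340, "protein": "hypothetical protein A"},
        {"id": "s2", "x": 410, "y": 95},
    ]},
    {"id": "gel2", "spots": [
        {"id": "s3", "x": 118, "y": 338, "protein": "hypothetical protein A"},
    ]},
]

root = build_experiment(gels)
xml_text = ET.tostring(root, encoding="unicode")

# Round trip: parse the text back and list the spots that carry an identification.
parsed = ET.fromstring(xml_text)
identified = [
    spot.get("id")
    for spot in parsed.iter("spot")
    if spot.find("massSpecIdentification") is not None
]
```

Keeping the identification as a child of the spot element lets a reader of the file move from a gel position to its annotation without joining separate files.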
