Search criteria: author:("hong, Julia")
Results 1-23 (23)
1.  Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge 
Bioinformatics  2013;29(22):2892-2899.
Motivation: More than a decade after microarrays were first used to predict the phenotype of biological samples, real-life applications for disease screening and for identifying the patients who would benefit most from treatment are still emerging. The scientific community's interest in identifying the best approaches for developing such prediction models was reaffirmed in a competition-style international collaboration, the IMPROVER Diagnostic Signature Challenge, whose results we describe herein.
Results: Fifty-four teams used public data to develop prediction models in four disease areas (multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease) and made predictions on new, blinded data that we generated. Teams were scored using three metrics that captured different aspects of prediction quality, and the best performers received awards. This article presents the challenge results and introduces the approaches of the three best overall performers, as well as an R package that implements the approach of the best overall team.
The analyses of model performance data submitted in the challenge, together with additional simulations that we performed, revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection or classifier type) is problem dependent; and (iii) for optimal results, datasets and methods have to be carefully matched. Biomedical factors such as disease severity and confidence in the diagnosis were found to be associated with misclassification rates across the different teams.
Availability: The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at or
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3810846  PMID: 23966112
2.  Characterization of the Vitrocell® 24/48 in vitro aerosol exposure system using mainstream cigarette smoke 
Only a few exposure systems are presently available that enable cigarette smoke exposure of living cells at the air–liquid interface, of which one of the most versatile is the Vitrocell® system (Vitrocell® Systems GmbH). To assess its performance and optimize the exposure conditions, we characterized a Vitrocell® 24/48 system connected to a 30-port carousel smoking machine. The Vitrocell® 24/48 system allows for simultaneous exposure of 48 cell culture inserts using dilution airflow rates of 0–3.0 L/min and exposes six inserts per dilution. These flow rates represent cigarette smoke concentrations of 7–100%.
By characterizing the exposure inside the Vitrocell® 24/48, we verified that (I) the cigarette smoke aerosol is distributed uniformly across all inserts, (II) Vitrocell® crystal quartz microbalances are useful for determining the online deposition of particle mass on the inserts, and (III) the amount of particles deposited per surface area and the amounts of trapped carbonyls and nicotine are concentration dependent. At a fixed dilution airflow of 0.5 L/min, the results showed a coefficient of variation of 12.2% between inserts of the Vitrocell® 24/48 module, excluding variations caused by different runs. Whereas nicotine and carbonyl concentrations were linear over the tested dilution range, particle mass deposition increased nonlinearly. The observed effect on cell viability correlated well with increasing cigarette smoke concentration.
Overall, the results highlight the suitability of the Vitrocell® 24/48 system for assessing the effect of cigarette smoke on cells under air–liquid interface exposure conditions, which closely resemble the conditions occurring in human airways.
PMCID: PMC4236458  PMID: 25411580
Air–liquid interface; Cigarette smoke; Nicotine; Carbonyl; In vitro exposure system; Vitrocell®
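The abstract reports a coefficient of variation (CV) of 12.2% between inserts at 0.5 L/min. As a minimal sketch of how such a between-insert CV can be computed, assuming one deposition measurement per insert and the sample standard deviation (the study's exact procedure is not given here), the values below are hypothetical and are not data from the study:

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical per-insert deposited particle mass (ug/cm^2) for six inserts
# sharing one dilution airflow; illustrative numbers only.
deposition = [1.10, 0.95, 1.02, 1.21, 0.88, 1.05]
print(f"CV between inserts: {coefficient_of_variation(deposition):.1f}%")
```

Using the sample (n - 1) standard deviation is an assumption; with only six inserts per dilution, the population formula would give a noticeably smaller CV.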
4.  Proteomics for systems toxicology 
Current toxicology studies frequently lack measurements at molecular resolution to enable a more mechanism-based and predictive toxicological assessment. Recently, a systems toxicology assessment framework has been proposed, which combines conventional toxicological assessment strategies with system-wide measurement methods and computational analysis approaches from the field of systems biology. Proteomic measurements are an integral component of this integrative strategy because protein alterations closely mirror biological effects, such as biological stress responses or global tissue alterations. Here, we provide an overview of the technical foundations and highlight selected applications of proteomics for systems toxicology studies. With a focus on mass spectrometry-based proteomics, we summarize the experimental methods for quantitative proteomics and describe the computational approaches used to derive biological/mechanistic insights from these datasets. To illustrate how proteomics has been successfully employed to address mechanistic questions in toxicology, we summarize several case studies. Overall, we provide the technical and conceptual foundation for the integration of proteomic measurements in a more comprehensive systems toxicology assessment framework. We conclude that, owing to the critical importance of protein-level measurements and recent technological advances, proteomics will be an integral part of integrative systems toxicology approaches in the future.
PMCID: PMC4212285  PMID: 25379146
Systems toxicology; Quantitative proteomics; Computational analysis
5.  Quantification of biological network perturbations for mechanistic insight and diagnostics using two-layer causal models 
BMC Bioinformatics  2014;15:238.
High-throughput measurement technologies such as microarrays provide complex datasets reflecting mechanisms perturbed in an experiment, typically a treatment vs. control design. Analysis of these information-rich data can be guided by a priori knowledge, such as networks or sets of related proteins or genes. Among these, cause-and-effect network models are becoming increasingly popular, and more than eighty such models, describing processes involved in cell proliferation, cell fate, cell stress, and inflammation, have already been published. A meaningful systems toxicology approach to studying the response of a cell system, or organism, exposed to bioactive substances requires a quantitative measure of dose-response at the network level that goes beyond the differential expression of single genes.
We developed a method that quantifies network response in an interpretable manner. It fully exploits the (signed graph) structure of cause-and-effect network models to integrate and mine transcriptomics measurements. The approach also enables the extraction of network-based signatures for predicting a phenotype of interest. The obtained signatures are coherent with the underlying network perturbation and can lead to more robust predictions across independent studies. The value of the various components of our mathematically coherent approach is substantiated using several in vivo and in vitro transcriptomics datasets. As a proof of principle, our methodology was applied to unravel mechanisms related to the efficacy of a specific anti-inflammatory drug in patients suffering from ulcerative colitis, and a plausible mechanistic explanation of the drug's unequal efficacy is provided. Moreover, by utilizing the underlying mechanisms, an accurate and robust network-based diagnostic was built to predict the response to the treatment.
The presented approach efficiently integrates transcriptomics data and “cause and effect” network models into a mathematically coherent framework spanning quantitative impact assessment, data interpretation and patient stratification for diagnostic purposes.
PMCID: PMC4227138  PMID: 25015298
Systems biology; Causal network model; Transcriptomics data
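To make the idea of a two-layer signed graph concrete, the following is a minimal sketch, not the paper's published formulation: measured transcript fold changes form a fixed lower layer, and each upstream (backbone) node takes the value that minimises the sum over edges of (f(x) - sign * f(y))², which is a harmonic extension on a signed graph. The function names, the assumption that backbone and gene names are disjoint, and the summary statistic `backbone_amplitude` (a plain mean of squared backbone values) are all hypothetical stand-ins; consult the paper for its actual amplitude definition.

```python
from collections import defaultdict


def infer_backbone(backbone_edges, transcript_edges, fold_changes,
                   n_iter=500, tol=1e-10):
    """Infer backbone node values on a signed two-layer graph.

    backbone_edges:   list of (x, y, sign) between backbone nodes
    transcript_edges: list of (x, gene, sign) from backbone node x to a gene
    fold_changes:     dict gene -> measured log2 fold change (held fixed)
    Backbone values minimise sum((f(x) - sign * f(y))**2) over all edges;
    the stationarity condition deg(x) * f(x) = sum(sign * f(y)) is solved
    by Gauss-Seidel sweeps.
    """
    neighbours = defaultdict(list)  # node -> [(neighbour, sign), ...]
    for x, y, s in backbone_edges:
        neighbours[x].append((y, s))
        neighbours[y].append((x, s))
    for x, g, s in transcript_edges:
        neighbours[x].append((g, s))

    backbone = {x for x, _, _ in transcript_edges}
    backbone |= {n for x, y, _ in backbone_edges for n in (x, y)}
    f = {node: 0.0 for node in backbone}
    f.update(fold_changes)  # transcript layer is fixed, never updated below

    for _ in range(n_iter):
        max_change = 0.0
        for x in backbone:
            nbrs = neighbours[x]
            new = sum(s * f[y] for y, s in nbrs) / len(nbrs)
            max_change = max(max_change, abs(new - f[x]))
            f[x] = new
        if max_change < tol:
            break
    return {x: f[x] for x in backbone}


def backbone_amplitude(values):
    """Hypothetical summary: mean squared backbone value."""
    return sum(v * v for v in values.values()) / len(values)


# Toy example: A increases B; A regulates g1 (+) and g2 (-); B regulates g3 (+).
values = infer_backbone([("A", "B", 1)],
                        [("A", "g1", 1), ("A", "g2", -1), ("B", "g3", 1)],
                        {"g1": 1.0, "g2": -1.0, "g3": 0.5})
print(values, backbone_amplitude(values))
```

The sweeps converge as long as every connected group of backbone nodes reaches at least one measured gene, because the restricted system is then positive definite.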
6.  CSEO – the Cigarette Smoke Exposure Ontology 
In recent years, significant progress has been made in developing and using experimental settings for extensive data collection on tobacco smoke exposure and tobacco smoke exposure-associated diseases. Because of the growing volume of such data, domain-specific standard ontologies are needed to facilitate the integration of tobacco exposure data.
The CSEO (version 1.0) is composed of 20091 concepts. In its current form, the ontology captures a wide range of cigarette smoke exposure concepts within the knowledge domain of exposure science with reasonable sensitivity and specificity. Moreover, it showed promising performance when used to answer domain expert questions. The CSEO complies with standard upper-level ontologies and is freely accessible to the scientific community through a dedicated wiki at
The CSEO has the potential to become a widely used standard within the academic and industrial community. Given the emerging need of systems toxicology for controlled vocabularies and the lack of suitable ontologies for this domain, the CSEO prepares the ground for integrative systems-based research in exposure science.
PMCID: PMC4120729  PMID: 25093069
Exposure; Cigarette smoke; Environmental risk; Ontology; Knowledge representation
7.  A vascular biology network model focused on inflammatory processes to investigate atherogenesis and plaque instability 
Numerous inflammation-related pathways have been shown to play important roles in atherogenesis. Rapid and efficient assessment of the relative influence of each of those pathways is a challenge in the era of “omics” data generation. The aim of the present work was to develop a network model of inflammation-related molecular pathways underlying vascular disease to assess the degree of translatability of preclinical molecular data to the human clinical setting.
We constructed and evaluated the Vascular Inflammatory Processes Network (V-IPN), a model representing a collection of vascular processes modulated by inflammatory stimuli that lead to the development of atherosclerosis.
Utilizing the V-IPN as a platform for biological discovery, we have identified key vascular processes and mechanisms captured by gene expression profiling data from four independent datasets from human endothelial cells (ECs) and human and murine intact vessels. Primary ECs in culture from multiple donors revealed a richer mapping of mechanisms identified by the V-IPN than an immortalized EC line did. Furthermore, an evaluation of gene expression datasets from aortas of old ApoE-/- mice (78 weeks) and human coronary arteries with advanced atherosclerotic lesions identified significant commonalities between the two species, as well as several mechanisms specific to human arteries that are consistent with the development of unstable atherosclerotic plaques.
We have generated a new biological network model of atherogenic processes that demonstrates the power of network analysis to advance integrative, systems biology-based knowledge of cross-species translatability, plaque development and potential mechanisms leading to plaque instability.
PMCID: PMC4227037  PMID: 24965703
Vascular systems biology; Plaque destabilization; Vascular biology networks; Computational modeling; Atherosclerosis modeling
8.  Assessment of a novel multi-array normalization method based on spike-in control probes suitable for microRNA datasets with global decreases in expression 
BMC Research Notes  2014;7:302.
High-quality expression data are required to investigate the biological effects of microRNAs (miRNAs). The goal of this study was, first, to assess the quality of miRNA expression data based on microarray technologies and, second, to consolidate it by applying a novel normalization method. Indeed, because of significant differences in platform designs, miRNA raw data cannot be normalized blindly with standard methods developed for gene expression. This fundamental observation motivated the development of a novel multi-array normalization method based on controllable assumptions, which uses the spike-in control probes to adjust the measured intensities across arrays.
Raw expression data were obtained with the Exiqon dual-channel miRCURY LNA™ platform in the “common reference design” and processed as “pseudo-single-channel” data. They were used to apply several quality metrics based on the coefficient of variation and to test the novel spike-in control-based normalization method. Most of the considerations presented here could be applied to raw data obtained with other platforms. To assess the normalization method, it was compared with 13 other available approaches from both data quality and biological outcome perspectives. The results showed that the novel multi-array normalization method reduced the data variability in the most consistent way. Further, the reliability of the obtained differential expression values was confirmed by a quantitative reverse transcription–polymerase chain reaction experiment performed for a subset of miRNAs. The results reported here support the applicability of the novel normalization method, in particular to datasets that display global decreases in miRNA expression, such as the cigarette smoke-exposed mouse lung dataset considered in this study.
Quality metrics for assessing between-array variability confirmed that the novel spike-in control-based normalization method provided high-quality miRNA expression data suitable for reliable downstream analysis. The multi-array miRNA raw data normalization method was implemented in an R software package called ExiMiR and deposited in the Bioconductor repository.
PMCID: PMC4077261  PMID: 24886675
MicroRNA; Microarray; Spike-in controls; Normalization; Differential expression; Data quality metrics
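The key point of spike-in normalization is that arrays are aligned on control probes added in known, fixed amounts rather than on the bulk of the miRNA signal, so a genuine global decrease in miRNA expression is not normalized away. The sketch below is a deliberately simplified illustration of that idea (a per-array shift on log2 intensities so that spike-in medians agree); it is not the ExiMiR algorithm, and the function name and data layout are assumptions.

```python
import statistics


def spike_in_normalize(arrays, spike_ids):
    """Shift each array so that its spike-in median matches the average
    spike-in median across arrays.

    arrays:    list of dicts, probe id -> log2 intensity
    spike_ids: ids of the spike-in control probes present on every array
    """
    medians = [statistics.median(a[p] for p in spike_ids) for a in arrays]
    target = statistics.mean(medians)
    return [{probe: value + (target - m) for probe, value in a.items()}
            for a, m in zip(arrays, medians)]


# Two hypothetical arrays: spike-ins s1/s2 and one endogenous miRNA m1.
arrays = [{"s1": 10.0, "s2": 12.0, "m1": 8.0},
          {"s1": 11.0, "s2": 13.0, "m1": 6.0}]
print(spike_in_normalize(arrays, ["s1", "s2"]))
```

In this toy case m1 stays 3 log2 units lower on the second array after normalization; a median normalization over all probes would instead pull the arrays toward each other and partly mask that decrease.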
9.  Systems Toxicology: From Basic Research to Risk Assessment 
Chemical Research in Toxicology  2014;27(3):314-329.
Systems Toxicology is the integration of classical toxicology with quantitative analysis of large networks of molecular and functional changes occurring across multiple levels of biological organization. Society demands increasingly close scrutiny of the potential health risks associated with exposure to chemicals present in our everyday life, leading to an increasing need for more predictive and accurate risk-assessment approaches. Developing such approaches requires a detailed mechanistic understanding of the ways in which xenobiotic substances perturb biological systems and lead to adverse outcomes. Thus, Systems Toxicology approaches offer modern strategies for gaining such mechanistic knowledge by combining advanced analytical and computational tools. Furthermore, Systems Toxicology is a means for the identification and application of biomarkers for improved safety assessments. In Systems Toxicology, quantitative systems-wide molecular changes in the context of an exposure are measured, and a causal chain of molecular events linking exposures with adverse outcomes (i.e., functional and apical end points) is deciphered. Mathematical models are then built to describe these processes in a quantitative manner. The integrated data analysis leads to the identification of how biological networks are perturbed by the exposure and enables the development of predictive mathematical models of toxicological processes. This perspective integrates current knowledge regarding bioanalytical approaches, computational analysis, and the potential for improved risk assessment.
PMCID: PMC3964730  PMID: 24446777
10.  Discovery of Emphysema Relevant Molecular Networks from an A/J Mouse Inhalation Study Using Reverse Engineering and Forward Simulation (REFS™) 
Chronic obstructive pulmonary disease (COPD) is a respiratory disorder caused by extended exposure of the airways to noxious stimuli, principally cigarette smoke (CS). The mechanisms through which COPD develops are not fully understood, though it is believed that the disease process includes a genetic component, as not all smokers develop COPD. To investigate the mechanisms that lead to the development of COPD/emphysema, we measured whole-genome gene expression and several COPD-relevant biological endpoints in mouse lung tissue after exposure to two CS doses for various lengths of time. A novel and powerful method, Reverse Engineering and Forward Simulation (REFS™), was employed to identify key molecular drivers by integrating the gene expression data and four measured COPD-relevant endpoints (matrix metalloproteinase (MMP) activity, MMP-9 levels, tissue inhibitor of metalloproteinase-1 levels and lung weight). An ensemble of molecular networks was generated using REFS™, and simulations showed that it could successfully recover the measured experimental data for gene expression and COPD-relevant endpoints. The ensemble of networks was then employed to simulate thousands of in silico gene knockdown experiments. In this way, thirty-three key molecular drivers of the four COPD-relevant endpoints were identified, the majority of which were shown to be enriched in inflammation and COPD.
PMCID: PMC3937248  PMID: 24596455
Bayesian network; chronic obstructive pulmonary disease (COPD); reverse engineering and forward simulation (REFS™)
11.  On Crowd-verification of Biological Networks 
Biological networks with a structured syntax are a powerful way of representing biological information generated from high-density data; however, they can become unwieldy to manage as their size and complexity increase. This article presents a crowd-verification approach for the visualization and expansion of biological networks.
Web-based graphical interfaces allow visualization of causal and correlative biological relationships represented using Biological Expression Language (BEL). Crowdsourcing principles enable participants to communally annotate these relationships based on literature evidence. Gamification principles are incorporated to further engage domain experts throughout biology to gather robust peer-reviewed information from which relationships can be identified and verified.
The resulting network models will represent the current status of biological knowledge within the defined boundaries, here processes related to human lung disease. These models are amenable to computational analysis. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community.
PMCID: PMC3798292  PMID: 24151423
community curation; biological network models; reputation system; Biological Expression Language
12.  Systems Approaches Evaluating the Perturbation of Xenobiotic Metabolism in Response to Cigarette Smoke Exposure in Nasal and Bronchial Tissues 
BioMed Research International  2013;2013:512086.
Capturing the effects of exposure in a specific target organ is a major challenge in risk assessment. Exposure to cigarette smoke (CS) causes a field of tissue injury that extends from the lung to the nasal and airway epithelia. Xenobiotic metabolism is a particularly attractive tool for chemical risk assessment because of its responsiveness to toxic compounds, including those present in CS. This study describes an efficient way of translating transcriptomic data into quantitative measures that reflect the responses to xenobiotics captured in a biological network model. We show here that our novel systems approach can quantify the perturbation in the network model of xenobiotic metabolism. We further show that this approach efficiently compares the perturbation upon CS exposure in bronchial and nasal epithelial cells from in vivo samples obtained from smokers. Our observations suggest that the xenobiotic responses in the bronchial and nasal epithelial cells of smokers were similar to those observed in their respective organotypic models exposed to CS. Furthermore, the results suggest that nasal tissue is a reliable surrogate for measuring xenobiotic responses in bronchial tissue.
PMCID: PMC3808713  PMID: 24224167
13.  Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data 
BMC Genomics  2013;14:514.
High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store, in a comparable manner, the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed “contrast data” for short), and to extract gene sets, in order to efficiently support downstream analyses and leverage the data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental).
To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as analysis modules, along with additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate the Confero platform's functionality, we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
Confero provides a unique and flexible platform to support downstream computational analysis facilitating biological interpretation. The system has been designed to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner, thereby accelerating knowledge-driven research. Confero source code is freely available from
PMCID: PMC3750322  PMID: 23895370
Gene expression; Contrast data; Gene set; Gene set enrichment; Omics; Microarray; Next-generation sequencing; Reproducible research system; Knowledge acquisition
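Over-Representation Analysis (ORA), one of the modules named above, is commonly computed as a one-sided hypergeometric test: given a universe of measured genes, how surprising is the overlap between the differentially expressed genes and a gene set? The sketch below shows that standard test on "contrast data" rows; the row field names (`gene`, `logFC`, `p`) and the cut-offs are hypothetical, and this is not Confero's own implementation.

```python
from math import comb


def ora_pvalue(universe_size, set_size, selected_size, overlap):
    """One-sided hypergeometric P(X >= overlap)."""
    total = comb(universe_size, selected_size)
    upper = min(set_size, selected_size)
    tail = sum(comb(set_size, k) * comb(universe_size - set_size, selected_size - k)
               for k in range(overlap, upper + 1))
    return tail / total


def ora(contrast, gene_set, p_cut=0.05, lfc_cut=1.0):
    """Test one gene set against contrast data rows (hypothetical field names)."""
    universe = {row["gene"] for row in contrast}
    selected = {row["gene"] for row in contrast
                if row["p"] < p_cut and abs(row["logFC"]) >= lfc_cut}
    members = gene_set & universe  # only genes actually measured count
    overlap = len(selected & members)
    return overlap, ora_pvalue(len(universe), len(members), len(selected), overlap)


contrast = [
    {"gene": "g1", "logFC": 2.0, "p": 0.01},
    {"gene": "g2", "logFC": -1.5, "p": 0.01},
    {"gene": "g3", "logFC": 3.0, "p": 0.5},
    {"gene": "g4", "logFC": 0.2, "p": 0.01},
]
print(ora(contrast, {"g1", "g2", "g9"}))
```

Restricting the gene set to measured genes matters: counting unmeasured members (here g9) would inflate the set size and distort the p-value.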
14.  Systematic Verification of Upstream Regulators of a Computable Cellular Proliferation Network Model on Non-Diseased Lung Cells Using a Dedicated Dataset 
We recently constructed a computable cell proliferation network (CPN) model focused on lung tissue to unravel complex biological processes and their exposure-related perturbations from molecular profiling data. The CPN consists of edges and nodes representing upstream controllers of gene expression largely generated from transcriptomics datasets using Reverse Causal Reasoning (RCR). Here, we report an approach to biologically verify the correctness of upstream controller nodes using a specifically designed, independent lung cell proliferation dataset. Normal human bronchial epithelial cells were arrested at G1/S with a cell cycle inhibitor. Gene expression changes and cell proliferation were captured at different time points after release from inhibition. Gene set enrichment analysis demonstrated cell cycle response specificity through an overrepresentation of proliferation-related gene sets. Coverage analysis of RCR-derived hypotheses returned statistical significance for cell cycle response specificity across the whole model as well as for the Growth Factor and Cell Cycle sub-network models.
PMCID: PMC3733638  PMID: 23926424
cell proliferation; biological network model; reverse causal reasoning
15.  A Modular Cell-Type Focused Inflammatory Process Network Model for Non-Diseased Pulmonary Tissue 
Exposure to environmental stressors such as cigarette smoke (CS) elicits a variety of biological responses in humans, including the induction of inflammatory responses. These responses are especially pronounced in the lung, where pulmonary cells sit at the interface between the body’s internal and external environments. We combined a literature survey with a computational analysis of multiple transcriptomic data sets to construct a computable causal network model (the Inflammatory Process Network (IPN)) of the main pulmonary inflammatory processes. The IPN model predicted decreased epithelial cell barrier defenses and increased mucus hypersecretion in human bronchial epithelial cells, and an attenuated pro-inflammatory (M1) profile in alveolar macrophages following exposure to CS, consistent with prior results. The IPN provides a comprehensive framework of experimentally supported pathways related to CS-induced pulmonary inflammation. The IPN is freely available to the scientific community as a resource with broad applicability to study the pathogenesis of pulmonary disease.
PMCID: PMC3700945  PMID: 23843693
inflammation; cigarette smoke; network model; gene expression; biological expression language (BEL); reverse causal reasoning (RCR)
16.  Construction of a Computable Network Model for DNA Damage, Autophagy, Cell Death, and Senescence 
Towards the development of a systems biology-based risk assessment approach for environmental toxicants, including tobacco products, in a systems toxicology setting such as the “21st Century Toxicology”, we are building a series of computable biological network models specific to non-diseased pulmonary and cardiovascular cells/tissues which capture the molecular events that can be activated following exposure to environmental toxicants. Here we extend previous work and report on the construction and evaluation of a mechanistic network model focused on DNA damage response and the four main cellular fates induced by stress: autophagy, apoptosis, necroptosis, and senescence. In total, the network consists of 34 sub-models containing 1052 unique nodes and 1538 unique edges, which are supported by 1231 PubMed-referenced literature citations. Causal node-edge relationships are described using the Biological Expression Language (BEL), which allows for the semantic representation of life science relationships in a computable format. The network is provided in XGMML format and can be viewed using freely available network visualization software, such as Cytoscape.
PMCID: PMC3596057  PMID: 23515068
computable; network model; DNA damage; autophagy; apoptosis; necroptosis; senescence; Biological Expression Language (BEL)
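To illustrate how BEL statements relate to the nodes and edges of an XGMML network file, here is a minimal sketch that parses simple subject-relationship-object statements and writes a bare XGMML graph with Python's standard library. It is an assumption-laden toy: the regular expression covers only a handful of causal relationships (`increases`, `decreases`, `directlyIncreases`, `directlyDecreases` and their short forms), full BEL grammar is much richer, and real XGMML exports typically carry many more attributes than the `id`, `label`, `source` and `target` used here.

```python
import re
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"

# Simple causal statements only: "<subject> <relationship> <object>".
BEL_RELATION = re.compile(
    r"^(?P<subj>.+?)\s+"
    r"(?P<rel>increases|decreases|directlyIncreases|directlyDecreases|->|-\||=>|=\|)"
    r"\s+(?P<obj>.+)$"
)


def parse_statement(line):
    match = BEL_RELATION.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported statement: {line!r}")
    return match.group("subj"), match.group("rel"), match.group("obj")


def bel_to_xgmml(statements, label="network"):
    """Build a minimal directed XGMML graph; each unique BEL term becomes a node."""
    triples = [parse_statement(s) for s in statements]
    ET.register_namespace("", XGMML_NS)  # serialise as the default namespace
    graph = ET.Element(f"{{{XGMML_NS}}}graph", {"label": label, "directed": "1"})
    ids = {}
    for subj, _, obj in triples:
        for term in (subj, obj):
            if term not in ids:
                ids[term] = str(len(ids) + 1)
                ET.SubElement(graph, f"{{{XGMML_NS}}}node",
                              {"id": ids[term], "label": term})
    for subj, rel, obj in triples:
        ET.SubElement(graph, f"{{{XGMML_NS}}}edge",
                      {"source": ids[subj], "target": ids[obj], "label": rel})
    return ET.tostring(graph, encoding="unicode")


statements = [
    'p(HGNC:TP53) increases bp(GOBP:"apoptotic process")',
    "p(HGNC:MDM2) -| p(HGNC:TP53)",
]
print(bel_to_xgmml(statements, label="toy DNA damage model"))
```

Node counts in a file built this way correspond to unique BEL terms, which is the sense in which the model above reports unique nodes and unique edges.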
18.  Assessment of network perturbation amplitudes by applying high-throughput data to causal biological networks 
BMC Systems Biology  2012;6:54.
High-throughput measurement technologies produce data sets that have the potential to elucidate the biological impact of disease, drug treatment, and environmental agents on humans. The scientific community faces an ongoing challenge in the analysis of these rich data sources to more accurately characterize biological processes that have been perturbed at the mechanistic level. Here, a new approach is built on previous methodologies in which high-throughput data were interpreted using prior biological knowledge of cause and effect relationships. These relationships are structured into network models that describe specific biological processes, such as inflammatory signaling or cell cycle progression. This enables quantitative assessment of network perturbation in response to a given stimulus.
Four complementary methods were devised to quantify treatment-induced activity changes in processes described by network models. In addition, companion statistics were developed to qualify significance and specificity of the results. This approach is called Network Perturbation Amplitude (NPA) scoring because the amplitudes of treatment-induced perturbations are computed for biological network models. The NPA methods were tested on two transcriptomic data sets: normal human bronchial epithelial (NHBE) cells treated with the pro-inflammatory signaling mediator TNFα, and HCT116 colon cancer cells treated with the CDK cell cycle inhibitor R547. Each data set was scored against network models representing different aspects of inflammatory signaling and cell cycle progression, and these scores were compared with independent measures of pathway activity in NHBE cells to verify the approach. The NPA scoring method successfully quantified the amplitude of TNFα-induced perturbation for each network model when compared against NF-κB nuclear localization and cell number. In addition, the degree and specificity to which CDK-inhibition affected cell cycle and inflammatory signaling were meaningfully determined.
The NPA scoring method leverages high-throughput measurements and a priori literature-derived knowledge in the form of network models to characterize the activity change for a broad collection of biological processes at high resolution. Applications of this framework include comparative assessment of the biological impact caused by environmental factors, toxic substances, or drug treatments.
PMCID: PMC3433335  PMID: 22651900
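The abstract above describes scoring treatment-induced perturbations by combining measured expression changes with the signed cause-and-effect edges of a network model. The Python below is a minimal sketch of that general idea only. It does not implement the four NPA methods, whose formulas are not given here. It scores each upstream node of a hypothetical toy network as the signed mean of its downstream log2 fold changes, then summarizes the whole network with a root-mean-square amplitude. The node names, gene lists, fold-change values, and the RMS aggregation are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical toy causal network (NOT from the paper): each upstream node maps
# the genes it regulates to +1 ("increases") or -1 ("decreases").
NETWORK: dict[str, dict[str, int]] = {
    "NFKB_activity": {"IL8": +1, "CXCL1": +1, "NFKBIA": +1},
    "TNF_signaling": {"IL6": +1, "TNFAIP3": +1, "SOCS3": +1},
    "cell_cycle_arrest": {"CCNB1": -1, "CDC20": -1, "CDKN1A": +1},
}

# Made-up log2 fold changes (treated vs. control), for illustration only.
EXAMPLE_LOG2_FC: dict[str, float] = {
    "IL8": 2.0, "CXCL1": 1.5, "NFKBIA": 1.0,
    "IL6": 1.2, "TNFAIP3": 0.8, "SOCS3": 0.4,
    "CCNB1": 0.1, "CDC20": -0.1, "CDKN1A": 0.0,
}


def node_score(targets: dict[str, int], log2_fc: dict[str, float]) -> float:
    """Signed mean of the log2 fold changes of a node's measured downstream genes.

    Positive when the data agree with the predicted direction of each edge.
    Genes absent from the data are skipped; a node with no measured genes scores 0.
    """
    signed = [sign * log2_fc[gene] for gene, sign in targets.items() if gene in log2_fc]
    return sum(signed) / len(signed) if signed else 0.0


def network_amplitude(network: dict[str, dict[str, int]], log2_fc: dict[str, float]) -> float:
    """Root-mean-square of the node scores: one non-negative amplitude per network."""
    scores = [node_score(targets, log2_fc) for targets in network.values()]
    return sqrt(sum(s * s for s in scores) / len(scores)) if scores else 0.0


if __name__ == "__main__":
    for name, targets in NETWORK.items():
        print(f"{name}: {node_score(targets, EXAMPLE_LOG2_FC):+.2f}")
    print(f"network amplitude: {network_amplitude(NETWORK, EXAMPLE_LOG2_FC):.3f}")
```

The signed mean rewards agreement between the predicted direction of each edge and the observed change. A node whose targets move against prediction therefore scores negative. A real scoring method would also need companion statistics for significance and specificity, as the abstract notes.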
19.  Industrial methodology for process verification in research (IMPROVER): toward systems biology verification 
Bioinformatics  2012;28(9):1193-1201.
Motivation: Analyses and algorithmic predictions based on high-throughput data are essential for the success of systems biology in academic and industrial settings. Organizations, such as companies and academic consortia, conduct large multi-year scientific studies that entail the collection and analysis of thousands of individual experiments, often over many physical sites and with internal and outsourced components. To extract maximum value, the interested parties need to verify the accuracy and reproducibility of data and methods before the initiation of such large multi-year studies. However, systematic and well-established verification procedures do not exist for automated collection and analysis workflows in systems biology, a gap that could lead to inaccurate conclusions.
Results: We present here a review of the current state of systems biology verification and a detailed methodology to address its shortcomings. This methodology, named ‘Industrial Methodology for Process Verification in Research’ or IMPROVER, consists of evaluating a research program by dividing a workflow into smaller building blocks that are individually verified. The verification of each building block can be done internally by members of the research program or externally by ‘crowd-sourcing’ to an interested community.
Implementation: This methodology could become the preferred choice to verify systems biology research workflows that are becoming increasingly complex and sophisticated in industrial and academic settings.
PMCID: PMC3338013  PMID: 22423044
20.  A computable cellular stress network model for non-diseased pulmonary and cardiovascular tissue 
BMC Systems Biology  2011;5:168.
Humans and other organisms are equipped with a set of responses that can prevent damage from exposure to a multitude of endogenous and environmental stressors. If these stress responses are overwhelmed, disease can result, as reflected by the increased development of, e.g., pulmonary and cardiac diseases in humans exposed to chronic levels of environmental stress, including inhaled cigarette smoke (CS). Systems biology data sets (e.g., transcriptomics, phosphoproteomics, metabolomics) could enable comprehensive investigation of the biological impact of these stressors. However, detailed mechanistic networks are needed to determine which specific pathways are activated in response to different stressors and to drive the qualitative and eventually quantitative assessment of these data. A current limiting step in this process is the availability of detailed mechanistic networks that can be used as an analytical substrate.
We have built a detailed network model that captures the biology underlying the physiological cellular response to endogenous and exogenous stressors in non-diseased mammalian pulmonary and cardiovascular cells. The contents of the network model reflect several diverse areas of signaling, including oxidative stress, hypoxia, shear stress, endoplasmic reticulum stress, and xenobiotic stress, that are elicited in response to common pulmonary and cardiovascular stressors. We then tested the ability of the network model to identify the mechanisms that are activated in response to CS, a broad inducer of cellular stress. Using transcriptomic data from the lungs of mice exposed to CS, the network model identified a robust increase in the oxidative stress response, largely mediated by the anti-oxidant NRF2 pathways, consistent with previous reports on the impact of CS exposure in the mammalian lung.
The results presented here describe the construction of a cellular stress network model and its application to the analysis of environmental stress using transcriptomic data. The proof-of-principle analysis described here, coupled with the future development of additional network models covering distinct areas of biology, will help to further clarify the integrated biological responses elicited in pulmonary and cardiovascular cells by complex environmental stressors such as CS.
PMCID: PMC3224482  PMID: 22011616
21.  Construction of a computable cell proliferation network focused on non-diseased lung cells 
BMC Systems Biology  2011;5:105.
Critical to advancing the systems-level evaluation of complex biological processes is the development of comprehensive networks and computational methods to apply to the analysis of systems biology data (transcriptomics, proteomics/phosphoproteomics, metabolomics, etc.). Ideally, these networks will be specifically designed to capture the normal, non-diseased biology of the tissue or cell types under investigation, and can be used with experimentally generated systems biology data to assess the biological impact of perturbations like xenobiotics and other cellular stresses. Lung cell proliferation is a key biological process to capture in such a network model, given the pivotal role that proliferation plays in lung diseases including cancer, chronic obstructive pulmonary disease (COPD), and fibrosis. Unfortunately, no such network has been available prior to this work.
To further a systems-level assessment of the biological impact of perturbations on non-diseased mammalian lung cells, we constructed a lung-focused network for cell proliferation. The network encompasses diverse biological areas that lead to the regulation of normal lung cell proliferation (Cell Cycle, Growth Factors, Cell Interaction, Intra- and Extracellular Signaling, and Epigenetics), and contains a total of 848 nodes (biological entities) and 1597 edges (relationships between biological entities). The network was verified using four published gene expression profiling data sets associated with measured cell proliferation endpoints in lung and lung-related cell types. Predicted changes in the activity of core machinery involved in cell cycle regulation (RB1, CDKN1A, and MYC/MYCN) are statistically supported across multiple data sets, underscoring the general applicability of this approach for a network-wide biological impact assessment using systems biology data.
To the best of our knowledge, this lung-focused Cell Proliferation Network provides the most comprehensive connectivity map in existence of the molecular mechanisms regulating cell proliferation in the lung. The network is based on fully referenced causal relationships obtained from extensive evaluation of the literature. The computable structure of the network enables its application to the qualitative and quantitative evaluation of cell proliferation using systems biology data sets. The network is available for public use.
PMCID: PMC3160372  PMID: 21722388
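The abstract above notes that the network's computable structure, with nodes as biological entities and edges as literature-derived causal relationships, is what allows it to be applied to systems biology data. The Python below is a minimal sketch of what "computable" can mean, assuming edges are stored as signed (subject, relation, object) triples. It builds an adjacency list and infers the net direction of an upstream entity's effect on each downstream entity. The five edges form a simplified, illustrative cell cycle chain. They are not taken from the published 848-node, 1597-edge network, and the first-path sign rule is a deliberate simplification.

```python
from collections import deque

# Illustrative, simplified causal edges (subject, relation, object);
# NOT content from the published Cell Proliferation Network.
EDGES = [
    ("CDKN1A", "decreases", "CDK2"),
    ("CDK2", "decreases", "RB1"),  # CDK2 phosphorylates and inactivates RB1
    ("RB1", "decreases", "E2F1"),
    ("E2F1", "increases", "cell_proliferation"),
    ("MYC", "increases", "cell_proliferation"),
]

SIGN = {"increases": +1, "decreases": -1}


def build_graph(edges):
    """Adjacency list: node -> list of (target, +1/-1). Every node gets an entry."""
    graph = {}
    for subj, rel, obj in edges:
        graph.setdefault(subj, []).append((obj, SIGN[rel]))
        graph.setdefault(obj, [])
    return graph


def downstream_effects(graph, source):
    """Breadth-first search giving the net sign of `source` on each reachable node.

    The sign is the product of edge signs along the first (shortest) path found;
    conflicting signs from alternative paths are ignored in this sketch.
    """
    effects = {}
    seen = {source}
    queue = deque([(source, +1)])
    while queue:
        node, sign = queue.popleft()
        for target, edge_sign in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                effects[target] = sign * edge_sign
                queue.append((target, sign * edge_sign))
    return effects


graph = build_graph(EDGES)
node_count = len(graph)
edge_count = sum(len(targets) for targets in graph.values())

if __name__ == "__main__":
    print(f"{node_count} nodes, {edge_count} edges")
    print(downstream_effects(graph, "CDKN1A"))
```

Here the chain of three "decreases" edges and one "increases" edge yields a net negative effect of CDKN1A on cell proliferation. That is the kind of directional prediction that can then be checked against measured data.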
22.  An ATP-gate controls tubulin binding by the tethered head of kinesin-1 
Science (New York, N.Y.)  2007;316(5821):120-123.
Kinesin-1 is a 2-headed molecular motor that walks along microtubules, with each step gated by ATP binding. Existing models for the gating mechanism propose a role for the microtubule lattice. We show that unpolymerised tubulin binds to kinesin-1, causing tubulin-activated ADP release. With no added nucleotide, each kinesin-1 dimer binds one tubulin heterodimer. In AMPPNP, a nonhydrolysable ATP analogue, each kinesin-1 dimer binds two tubulin heterodimers. The data reveal an ATP-gate that operates independently of the microtubule lattice, by ATP-dependent release of a steric or allosteric block on the tubulin binding site of the tethered kinesin-ADP head.
PMCID: PMC2504013  PMID: 17412962
23.  In vitro systems toxicology approach to investigate the effects of repeated cigarette smoke exposure on human buccal and gingival organotypic epithelial tissue cultures 
Toxicology Mechanisms and Methods  2014;24(7):470-487.
Smoking has been associated with diseases of the lung, pulmonary airways and oral cavity. Cytologic, genomic and transcriptomic changes in oral mucosa correlate with oral pre-neoplasia, cancer and inflammation (e.g. periodontitis). Smoking-related gene expression changes in oral epithelial cells are similar to those in bronchial and nasal epithelial cells. Using a systems toxicology approach, we have previously assessed the impact of cigarette smoke (CS), measured as perturbations of biological processes, in human nasal and bronchial organotypic epithelial culture models. Here, we report our further assessment using in vitro human oral organotypic epithelium models. We exposed the buccal and gingival organotypic epithelial tissue cultures to CS at the air–liquid interface. CS exposure was associated with increased secretion of inflammatory mediators, induction of cytochrome P450 activity and weak overall toxicity in both tissues. Using microarray technology, gene-set analysis and a novel computational modeling approach leveraging causal biological network models, we identified CS impact on xenobiotic metabolism-related pathways accompanied by a more subtle alteration in inflammatory processes. Gene-set analysis further indicated that the CS-induced pathways in the in vitro buccal tissue models resembled those in the in vivo buccal biopsies of smokers from a published dataset. These findings support the translatability of systems responses from in vitro to in vivo and demonstrate the applicability of oral organotypic tissue models for assessing the impact of CS on the various tissues exposed during smoking, as well as for assessing the impact of reduced-risk products.
PMCID: PMC4219813  PMID: 25046638
Air–liquid interface; causal biological network model; oral keratinocytes; organotypic cultures; transcriptomics