Related Articles

1.  De-Convoluting the “Omics” for Organ Transplantation 
Purpose of review
The desire for biomarkers for diagnosis and prognosis of diseases has never been greater. With the availability of genome data and an increased availability of proteome data, the discovery of biomarkers has become increasingly feasible. This article reviews some recent applications of the many evolving “omic” technologies to organ transplantation.
Recent findings
With the advancement of many high throughput “omic” techniques such as genomics, metabolomics, antibiomics, peptidomics and proteomics, efforts have been made to understand potential mechanisms of specific graft injuries and develop novel biomarkers for acute rejection, chronic rejection, and operational tolerance.
The translation of potential biomarkers from the lab bench to the clinical bedside is not an easy task and will require the concerted effort of the immunologists, molecular biologists, transplantation specialists, geneticists, and experts in bioinformatics. Rigorous prospective validation studies will be needed using large sets of independent patient samples. The appropriate and timely exploitation of evolving “omic” technologies will lay the cornerstone for a new age of translational research for organ transplant monitoring.
PMCID: PMC2993238  PMID: 19644370
genomics; proteomics; organ transplant; biomarker; translational medicine
2.  Joint analysis of transcriptional and post-transcriptional brain tumor data: searching for emergent properties of cellular systems 
BMC Bioinformatics  2011;12:86.
Advances in biotechnology offer a fast growing variety of high-throughput data for screening molecular activities of genomic, transcriptional, post-transcriptional and translational observations. However, to date, most computational and algorithmic efforts have been directed at mining data from each of these molecular levels (genomic, transcriptional, etc.) separately. In view of the rapid advances in technology (new generation sequencing, high-throughput proteomics) it is important to address the problem of analyzing these data as a whole, i.e. preserving the emergent properties that appear in the cellular system when all molecular levels are interacting. We analyzed one of the (currently) few datasets that provide both transcriptional and post-transcriptional data of the same samples to investigate the possibility to extract more information, using a joint analysis approach.
We use Factor Analysis coupled with pre-established knowledge as a theoretical base to achieve this goal. Our intention is to identify structures that contain information from both mRNAs and miRNAs, and that can explain the complexity of the data. Despite the small sample available, we can show that this approach permits identification of meaningful structures, in particular two polycistronic miRNA genes related to transcriptional activity and likely to be relevant in the discrimination between gliosarcomas and other brain tumors.
This suggests the need to develop methodologies to simultaneously mine information from different levels of biological organization, rather than linking separate analyses performed in parallel.
PMCID: PMC3078861  PMID: 21450054
3.  Reverse engineering biomolecular systems using −omic data: challenges, progress and opportunities 
Briefings in Bioinformatics  2012;13(4):430-445.
Recent advances in high-throughput biotechnologies have led to rapidly growing research interest in reverse engineering of biomolecular systems (REBMS). ‘Data-driven’ approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution while ‘design-driven’ approaches, i.e. systems modeling, can be used to simulate emergent system properties. Consequently, both data- and design-driven approaches applied to –omic data may lead to novel insights in reverse engineering biological systems that could not previously be expected from low-throughput platforms. However, there exist several challenges in this fast growing field of reverse engineering biomolecular systems: (i) to integrate heterogeneous biochemical data for data mining, (ii) to combine top–down and bottom–up approaches for systems modeling and (iii) to validate system models experimentally. In addition to reviewing progress made by the community and opportunities encountered in addressing these challenges, we explore the emerging field of synthetic biology, which is an exciting approach to validate and analyze theoretical system models directly through experimental synthesis, i.e. analysis-by-synthesis. The ultimate goal is to address the present and future challenges in reverse engineering biomolecular systems (REBMS) using an integrated workflow of data mining, systems modeling and synthetic biology.
PMCID: PMC3404400  PMID: 22833495
reverse engineering biological systems; high-throughput technology; –omic data; synthetic biology; analysis-by-synthesis
4.  Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? 
Briefings in Bioinformatics  2012;14(3):315-326.
In the Life Sciences ‘omics’ data is increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited for the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have a high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example: within a class of cancer patients certain SNP combinations may be important for a subset of patients that have a specific subtype of cancer, but not important for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF as these are implicitly taken into account by the algorithm during the creation of the classification model. This review details some RF properties that, to the best of our knowledge, are rarely or never used and that allow maximizing the biological insights that can be extracted from complex omics data sets using RF.
PMCID: PMC3659301  PMID: 22786785
Random Forest; variable importance; local importance; conditional relationships; variable interaction; proximity
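The variable importance idea mentioned in the abstract above can be illustrated with a small sketch. This is not the method of the review: it is a deliberately simplified forest of one-split trees (decision stumps) in plain Python, with out-of-bag permutation importance, run on synthetic data. All function names, the data generator and the parameter values (`n_trees`, `mtry`, `seed`) are assumptions made for illustration only.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority class of 0/1 labels (ties go to class 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(X, y, features):
    """Return (feature, threshold, left_class, right_class) for the split with
    the lowest weighted Gini impurity, or None if no split is possible."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score = score
                best = (f, t, majority(left), majority(right))
    return best

def predict(stump, row, default):
    if stump is None:
        return default
    f, t, left_class, right_class = stump
    return left_class if row[f] <= t else right_class

def accuracy(stump, X, y, default):
    return sum(predict(stump, r, default) == label for r, label in zip(X, y)) / len(y)

def forest_importance(X, y, n_trees=200, mtry=2, seed=0):
    """Mean drop in out-of-bag accuracy when each feature is permuted."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]      # out-of-bag rows
        if not oob:
            continue
        Xb, yb = [X[i] for i in boot], [y[i] for i in boot]
        stump = fit_stump(Xb, yb, rng.sample(range(p), mtry))  # random feature subset
        default = majority(yb)
        Xo, yo = [X[i] for i in oob], [y[i] for i in oob]
        base = accuracy(stump, Xo, yo, default)
        for f in range(p):
            column = [row[f] for row in Xo]
            rng.shuffle(column)                             # break the feature/label link
            Xperm = [row[:f] + [v] + row[f + 1:] for row, v in zip(Xo, column)]
            importance[f] += (base - accuracy(stump, Xperm, yo, default)) / n_trees
    return importance

def make_data(n_per_class=20, seed=1):
    """Synthetic data: feature 0 separates the classes, features 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([10.0 * label + rng.random(), rng.random(), rng.random()])
            y.append(label)
    return X, y

X, y = make_data()
importance = forest_importance(X, y)
print(importance)
```

In this toy setting the informative feature should receive a clearly larger importance than the two noise features. A real analysis would normally use full trees from an established Random Forest implementation rather than stumps.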
5.  The KUPNetViz: a biological network viewer for multiple -omics datasets in kidney diseases 
BMC Bioinformatics  2013;14:235.
Constant technological advances have allowed scientists in biology to migrate from conventional single-omics to multi-omics experimental approaches, challenging bioinformatics to bridge this multi-tiered information. Ongoing research in renal biology is no exception. The results of large-scale and/or high throughput experiments, presenting a wealth of information on kidney disease are scattered across the web. To tackle this problem, we recently presented the KUPKB, a multi-omics data repository for renal diseases.
In this article, we describe KUPNetViz, a biological graph exploration tool allowing the exploration of KUPKB data through the visualization of biomolecule interactions. KUPNetViz enables the integration of multi-layered experimental data over different species, renal locations and renal diseases to protein-protein interaction networks and allows association with biological functions, biochemical pathways and other functional elements such as miRNAs. KUPNetViz focuses on the simplicity of its usage and the clarity of resulting networks by reducing and/or automating advanced functionalities present in other biological network visualization packages. In addition, it allows the extrapolation of biomolecule interactions across different species, leading to the formulation of new plausible hypotheses, adequate experimental design and the suggestion of novel biological mechanisms. We demonstrate the value of KUPNetViz by two usage examples: the integration of calreticulin as a key player in a larger interaction network in renal graft rejection and the novel observation of the strong association of interleukin-6 with polycystic kidney disease.
The KUPNetViz is an interactive and flexible biological network visualization and exploration tool. It provides renal biologists with biological network snapshots of the complex integrated data of the KUPKB allowing the formulation of new hypotheses in a user friendly manner.
PMCID: PMC3725151  PMID: 23883183
6.  The Proteogenomic Path towards Biomarker Discovery 
Pediatric transplantation  2008;12(7):737-747.
The desire for biomarkers for diagnosis and prognosis of diseases has never been greater. With the availability of genome data and an increased availability of proteome data, the discovery of biomarkers has become increasingly feasible. However, the task is daunting and requires collaborations among researchers working in the fields of transplantation, immunology, genetics, molecular biology, biostatistics, and bioinformatics. With the advancement of high throughput omic techniques such as genomics and proteomics (collectively known as proteogenomics), efforts have been made to develop diagnostic tools from new and to-be discovered biomarkers. Yet biomarker validation, particularly in organ transplantation, remains challenging because of the lack of a true gold standard for diagnostic categories and analytical bottlenecks that face high-throughput data deconvolution. Even though the microarray technique is relatively mature, proteomics is still growing with regard to data normalization and analysis methods. Study design, sample selection, and rigorous data analysis are the critical issues for biomarker discovery using high-throughput proteogenomic technologies that combine the use and strengths of both genomics and proteomics. In this review, we look into the current status and latest developments in the field of biomarker discovery using genomics and proteomics related to organ transplantation, with an emphasis on the evolution of proteomic technologies.
PMCID: PMC2574627  PMID: 18764911
Biomarker discovery; proteogenomics; genomics; proteomics; microarray; transplantation; acute rejection; peptidomics
7.  A Novel Information Retrieval Model for High-Throughput Molecular Medicine Modalities 
Cancer Informatics  2009;8:1-17.
Significant research has been devoted to predicting diagnosis, prognosis, and response to treatment using high-throughput assays. Rapid translation into clinical results hinges upon efficient access to up-to-date and high-quality molecular medicine modalities.
We first explain why this goal is inadequately supported by existing databases and portals and then introduce a novel semantic indexing and information retrieval model for clinical bioinformatics. The formalism provides the means for indexing a variety of relevant objects (e.g. papers, algorithms, signatures, datasets) and includes a model of the research processes that creates and validates these objects in order to support their systematic presentation once retrieved.
We test the applicability of the model by constructing proof-of-concept encodings and visual presentations of evidence and modalities in molecular profiling and prognosis of: (a) diffuse large B-cell lymphoma (DLBCL) and (b) breast cancer.
PMCID: PMC2664697  PMID: 19458790
information retrieval; molecular medicine; semantic model; clinical bioinformatics; predictive computational models
8.  Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput “omics” Data 
Advances in Bioinformatics  2010;2010:423589.
High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data being maintained in heterogeneous and distributed environments, plus the lack of well-defined data standards and standardized nomenclature, impose a major challenge that requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.
PMCID: PMC2847380  PMID: 20369061
9.  Advances in Omics and Bioinformatics Tools for Systems Analyses of Plant Functions 
Plant and Cell Physiology  2011;52(12):2017-2038.
Omics and bioinformatics are essential to understanding the molecular systems that underlie various plant functions. Recent game-changing sequencing technologies have revitalized sequencing approaches in genomics and have produced opportunities for various emerging analytical applications. Driven by technological advances, several new omics layers such as the interactome, epigenome and hormonome have emerged. Furthermore, in several plant species, the development of omics resources has progressed to address particular biological properties of individual species. Integration of knowledge from omics-based research is an emerging issue as researchers seek to identify significance, gain biological insights and promote translational research. From these perspectives, we provide this review of the emerging aspects of plant systems research based on omics and bioinformatics analyses together with their associated resources and technological advances.
PMCID: PMC3233218  PMID: 22156726
Bioinformatics; Data integration; Genome-scale approach; Omics; Systems analysis
10.  Integrative Bioinformatics for Genomics and Proteomics 
Systems integration is becoming the driving force for 21st century biology. Researchers are systematically tackling gene functions and complex regulatory processes by studying organisms at different levels of organization, from genomes and transcriptomes to proteomes and interactomes. To fully realize the value of such high-throughput data requires advanced bioinformatics for integration, mining, comparative analysis, and functional interpretation. We are developing a bioinformatics research infrastructure that links data mining with text mining and network analysis in the systems biology context for biological network discovery. The system features include: (i) integration of over 100 molecular and omics databases, along with gene/protein ID mapping from disparate data sources; (ii) data mining and text mining capabilities for literature-based knowledge extraction; and (iii) interoperability with ontologies to capture functional properties of proteins and complexes. The system further connects with a data analysis pipeline for next-generation sequencing, linking genomics data to functional annotation. The integrative approach will reveal hidden interrelationships among the various components of the biological systems, allowing researchers to ask complex biological questions and gain better understanding of biological and disease processes, thereby facilitating target discovery.
PMCID: PMC3186664
11.  Integrative computational biology for cancer research 
Human Genetics  2011;130(4):465-481.
Over the past two decades, high-throughput (HTP) technologies such as microarrays and mass spectrometry have fundamentally changed clinical cancer research. They have revealed novel molecular markers of cancer subtypes, metastasis, and drug sensitivity and resistance. Some have been translated into the clinic as tools for early disease diagnosis, prognosis, and individualized treatment and response monitoring. Despite these successes, many challenges remain: HTP platforms are often noisy and suffer from false positives and false negatives; optimal analysis and successful validation require complex workflows; and great volumes of data are accumulating at a rapid pace. Here we discuss these challenges, and show how integrative computational biology can help diminish them by creating new software tools, analytical methods, and data standards.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-011-0983-z) contains supplementary material, which is available to authorized users.
PMCID: PMC3179275  PMID: 21691773
12.  CancerMA: a web-based tool for automatic meta-analysis of public cancer microarray data 
The identification of novel candidate markers is a key challenge in the development of cancer therapies. This can be facilitated by putting accessible and automated approaches analysing the current wealth of ‘omic’-scale data in the hands of researchers who are directly addressing biological questions. Data integration techniques and standardized, automated, high-throughput analyses are needed to manage the data available as well as to help narrow down the excessive number of target gene possibilities presented by modern databases and system-level resources. Here we present CancerMA, an online, integrated bioinformatic pipeline for automated identification of novel candidate cancer markers/targets; it operates by means of meta-analysing expression profiles of user-defined sets of biologically significant and related genes across a manually curated database of 80 publicly available cancer microarray datasets covering 13 cancer types. A simple-to-use web interface allows bioinformaticians and non-bioinformaticians alike to initiate new analyses as well as to view and retrieve the meta-analysis results. The functionality of CancerMA is shown by means of two validation datasets.
Database URL:
PMCID: PMC3522872  PMID: 23241162
13.  Omics-Based Molecular Target and Biomarker Identification 
Genomic, proteomic, and other omic-based approaches are now broadly used in biomedical research to facilitate the understanding of disease mechanisms and identification of molecular targets and biomarkers for therapeutic and diagnostic development. While the Omics technologies and bioinformatics tools for analyzing Omics data are rapidly advancing, the functional analysis and interpretation of the data remain challenging due to the inherent nature of the generally long workflows of Omics experiments. We adopt a strategy that emphasizes the use of curated knowledge resources coupled with expert-guided examination and interpretation of Omics data for the selection of potential molecular targets. We describe a downstream workflow and procedures for functional analysis that focus on biological pathways, from which molecular targets can be derived and proposed for experimental validation.
PMCID: PMC3742302  PMID: 21370102
Proteomics; Genomics; Bioinformatics; Biological pathways; Cell signaling; Databases; Molecular targets; Biomarkers
14.  Surface-enhanced laser desorption/ionization time-of-flight proteomic profiling of breast carcinomas identifies clinicopathologically relevant groups of patients similar to previously defined clusters from cDNA expression 
Microarray-based gene expression profiling represents a major breakthrough for understanding the molecular complexity of breast cancer. cDNA expression profiles cannot detect changes in activities that arise from post-translational modifications, however, and therefore do not provide a complete picture of all biologically important changes that occur in tumors. Additional opportunities to identify and/or validate molecular signatures of breast carcinomas are provided by proteomic approaches. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) offers high-throughput protein profiling, leading to extraction of protein array data, calling for effective and appropriate use of bioinformatics and statistical tools.
Whole tissue lysates of 105 breast carcinomas were analyzed on IMAC 30 ProteinChip Arrays (Bio-Rad, Hercules, CA, USA) using the ProteinChip Reader Model PBS IIc (Bio-Rad) and Ciphergen ProteinChip software (Bio-Rad, Hercules, CA, USA). Cluster analysis of protein spectra was performed to identify protein patterns potentially related to established clinicopathological variables and/or tumor markers.
Unsupervised hierarchical clustering of 130 peaks detected in spectra from breast cancer tissue lysates provided six clusters of peaks and five groups of patients differing significantly in tumor type, nuclear grade, presence of hormonal receptors, mucin 1 and cytokeratin 5/6 or cytokeratin 14. These tumor groups resembled closely luminal types A and B, basal and HER2-like carcinomas.
Our results show similar clustering of tumors to those provided by cDNA expression profiles of breast carcinomas. This fact testifies to the validity of the SELDI-TOF MS proteomic approach in this type of study. As SELDI-TOF MS provides different information from cDNA expression profiles, the results suggest the technique's potential to supplement and expand our knowledge of breast cancer, to identify novel biomarkers and to produce clinically useful classifications of breast carcinomas.
PMCID: PMC2481497  PMID: 18510725
15.  Systems approaches to computational modeling of the oral microbiome 
Current microbiome research has generated tremendous amounts of data providing snapshots of molecular activity in a variety of organisms, environments, and cell types. However, turning this knowledge into a whole-system-level understanding of pathways and processes has proven to be a challenging task. In this review we highlight the applicability of bioinformatics and visualization techniques to large collections of data in order to better understand the information they contain about interactions among diet, the oral microbiome and the host mucosal transcriptome. In particular, we focus on systems biology of Porphyromonas gingivalis in the context of high throughput computational methods tightly integrated with translational systems medicine. Those approaches have applications for both basic research, where we can direct specific laboratory experiments in model organisms and cell cultures, and human disease, where we can validate new mechanisms and biomarkers for prevention and treatment of chronic disorders.
PMCID: PMC3706740  PMID: 23847548
oral microbiome; in silico modeling; systems biology; systems medicine; Porphyromonas gingivalis; biomarkers; probiotics; vaccines
16.  Criteria for the use of omics-based predictors in clinical trials: explanation and elaboration 
BMC Medicine  2013;11:220.
High-throughput “omics” technologies that generate molecular profiles for biospecimens have been extensively used in preclinical studies to reveal molecular subtypes and elucidate the biological mechanisms of disease, and in retrospective studies on clinical specimens to develop mathematical models to predict clinical endpoints. Nevertheless, the translation of these technologies into clinical tests that are useful for guiding management decisions for patients has been relatively slow. It can be difficult to determine when the body of evidence for an omics-based test is sufficiently comprehensive and reliable to support claims that it is ready for clinical use, or even that it is ready for definitive evaluation in a clinical trial in which it may be used to direct patient therapy. Reasons for this difficulty include the exploratory and retrospective nature of many of these studies, the complexity of these assays and their application to clinical specimens, and the many potential pitfalls inherent in the development of mathematical predictor models from the very high-dimensional data generated by these omics technologies. Here we present a checklist of criteria to consider when evaluating the body of evidence supporting the clinical use of a predictor to guide patient therapy. Included are issues pertaining to specimen and assay requirements, the soundness of the process for developing predictor models, expectations regarding clinical study design and conduct, and attention to regulatory, ethical, and legal issues. The proposed checklist should serve as a useful guide to investigators preparing proposals for studies involving the use of omics-based tests. The US National Cancer Institute plans to refer to these guidelines for review of proposals for studies involving omics tests, and it is hoped that other sponsors will adopt the checklist as well.
PMCID: PMC3852338  PMID: 24228635
Analytical validation; Biomarker; Diagnostic test; Genomic classifier; Model validation; Molecular profile; Omics; Personalized medicine; Precision Medicine; Treatment selection
17.  Applications of emerging molecular technologies in glioblastoma multiforme 
Expert review of neurotherapeutics  2008;8(10):1497-1506.
Glioblastoma multiforme (GBM) is the most common primary brain tumor in adults. Median survival from the time of diagnosis is less than a year, with less than 5% of patients surviving 5 years. These tumors are thought to arise through two different pathways. Primary GBMs represent de novo tumors, while secondary GBMs represent the malignant progression of lower-grade astrocytomas. Moreover, despite improvements in deciphering the complex biology of these tumors, the overall prognosis has not changed in the past three decades. The hope for improving the outlook for these glial-based malignancies is centered on the successful clinical application of current high-throughput technologies. For example, the complete sequencing of the human genome has brought both genomics and proteomics to the forefront of cancer research as a powerful approach to systematically identify large volumes of data that can be utilized to study the molecular and cellular basis of oncology. The organization of these data into a comprehensive view of tumor growth and progression translates into a unique opportunity to diagnose and treat cancer patients. In this review, we summarize current genomic and proteomic alterations associated with GBM and how these modalities may ultimately impact treatment and survival.
PMCID: PMC2579778  PMID: 18928343
genomics; glioblastoma; molecular signatures; proteomics
18.  A proposed minimum skill set for university graduates to meet the informatics needs and challenges of the "-omics" era 
BMC Genomics  2009;10(Suppl 3):S36.
The development of high throughput experimental technologies has given rise to the "-omics" era where terabyte-scale datasets for systems-level measurements of various cellular and molecular phenomena pose considerable challenges in data processing and extraction of biological meaning. Moreover, it has created an unmet need for the effective integration of these datasets to achieve insights into biological systems. While it has increased the demand for bioinformatics experts who can interface with biologists, it has also raised the requirement for biologists to possess a basic capability in bioinformatics and to communicate seamlessly with these experts. This may be achieved by embedding in their undergraduate and graduate life science education, basic training in bioinformatics geared towards acquiring a minimum skill set in computation and informatics.
Based on previous attempts to define curricula suitable for addressing the bioinformatics capability gap, an initiative was taken during the Workshops on Education in Bioinformatics and Computational Biology (WEBCB) in 2008 and 2009 to identify a minimum skill set for the training of future bioinformaticians and molecular biologists with informatics capabilities. The minimum skill set proposed is cross-disciplinary in nature, involving a combination of knowledge and proficiency from the fields of biology, computer science, mathematics and statistics, and can be tailored to the needs of the "-omics".
The proposed bioinformatics minimum skill set serves as a guideline for biology curriculum design and development in universities at both the undergraduate and graduate levels.
PMCID: PMC2788390  PMID: 19958501
19.  G-DOC: A Systems Medicine Platform for Personalized Oncology 
Neoplasia (New York, N.Y.)  2011;13(9):771-783.
Currently, cancer therapy remains limited by a “one-size-fits-all” approach, whereby treatment decisions are based mainly on the clinical stage of disease, yet fail to reference the individual's underlying biology and its role driving malignancy. Identifying better personalized therapies for cancer treatment is hindered by the lack of high-quality “omics” data of sufficient size to produce meaningful results and the ability to integrate biomedical data from disparate technologies. Resolving these issues will help translation of therapies from research to clinic by helping clinicians develop patient-specific treatments based on the unique signatures of the patient's tumor. Here we describe the Georgetown Database of Cancer (G-DOC), a Web platform that enables basic and clinical research by integrating patient characteristics and clinical outcome data with a variety of high-throughput research data in a unified environment. While several rich data repositories for high-dimensional research data exist in the public domain, most focus on a single data type and do not support integration across multiple technologies. Currently, G-DOC contains data from more than 2500 breast cancer patients and 800 gastrointestinal cancer patients. G-DOC includes a broad collection of bioinformatics and systems biology tools for analysis and visualization of four major “omics” types: DNA, mRNA, microRNA, and metabolites. We believe that G-DOC will help facilitate systems medicine by providing identification of trends and patterns in integrated data sets and hence facilitate the use of better targeted therapies for cancer. A set of representative usage scenarios is provided to highlight the technical capabilities of this resource.
PMCID: PMC3182270  PMID: 21969811
20.  Unravelling the hidden heterogeneities of diffuse large B-cell lymphoma based on coupled two-way clustering 
BMC Genomics  2007;8:332.
It is becoming increasingly clear that our current taxonomy of clinical phenotypes is mixed with molecular heterogeneity. Of vital importance for refined clinical practice and improved intervention strategies is to define the hidden molecularly distinct diseases using modern large-scale genomic approaches. Microarray omics technology has provided a powerful way to dissect hidden genetic heterogeneity of complex diseases. The aim of this study was thus to develop a bioinformatics approach to seek the transcriptional features leading to the hidden subtyping of a complex clinical phenotype. The basic strategy of the proposed method was to iteratively partition in two ways sample and feature space with super-paramagnetic clustering technique and to seek hard and robust gene clusters that lead to a natural partition of disease samples and that have the highest functionally conceptual consensus evaluated with Gene Ontology.
We applied the proposed method to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype. A feature subset of 30 genes (38 probes) derived from analysis of the first dataset consisting of 4026 genes and 42 DLBCL samples identified three categories of patients with very different five-year overall survival rates (70.59%, 44.44% and 14.29% respectively; p = 0.0017). Analysis of the second dataset consisting of 7129 genes and 58 DLBCL samples revealed a feature subset of 13 genes (16 probes) that not only replicated the findings of the important DLBCL genes (e.g. JAW1 and BCL7A), but also identified three clinically similar subtypes (with 5-year overall survival rates of 63.13%, 34.92% and 15.38% respectively; p = 0.0009) to those identified in the first dataset. Finally, we built a multivariate Cox proportional-hazards prediction model for each feature subset and defined JAW1 as one of the most significant predictors (p = 0.005 and 0.014; hazard ratios = 0.02 and 0.03, respectively for two datasets) for both DLBCL cohorts under study.
Our results showed that the proposed algorithm is a promising computational strategy for uncovering hidden genetic heterogeneity from the transcriptional profiles of disease samples, which may lead to improved diagnosis and treatment of cancers.
PMCID: PMC2082044  PMID: 17888167
21.  Integrative analysis and variable selection with multiple high-dimensional data sets 
Biostatistics (Oxford, England)  2011;12(4):763-775.
In high-throughput -omics studies, markers identified from analysis of single data sets often suffer from a lack of reproducibility because of sample limitation. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of data and heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
PMCID: PMC3169668  PMID: 21415015
High-dimensional data; Integrative analysis; 2-norm group bridge
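The abstract above names a 2-norm group bridge penalty but does not give its formula. The sketch below assumes the usual group bridge form, lam * sum_j ||beta_j||_2^gamma with 0 < gamma < 1, where beta_j collects the coefficients of marker j across all studies. The function name, the values of lam and gamma, and the toy coefficients are hypothetical and are not taken from the article.

```python
import numpy as np


def group_bridge_penalty(beta, lam=1.0, gamma=0.5):
    """Assumed 2-norm group bridge penalty.

    beta  : array of shape (n_markers, n_studies); row j holds the
            coefficients of marker j in each study.
    lam   : tuning parameter (hypothetical default).
    gamma : bridge exponent, assumed to lie in (0, 1).
    """
    beta = np.asarray(beta, dtype=float)
    # 2-norm of each marker's coefficient vector across studies.
    group_norms = np.linalg.norm(beta, axis=1)
    # Bridge power applied to each group norm, then summed over markers.
    return lam * np.sum(group_norms ** gamma)


# Toy coefficients for three markers in two studies (made-up numbers).
beta = np.array([
    [3.0, 4.0],  # non-zero in both studies, group norm 5
    [0.0, 0.0],  # absent from both studies, contributes nothing
    [0.0, 1.0],  # non-zero in one study only, group norm 1
])
penalty = group_bridge_penalty(beta, lam=2.0, gamma=0.5)
print(penalty)
```

Because the norm is taken over a marker's coefficients in all studies before the concave power is applied, a marker tends to be kept or dropped in all studies together, while the coefficient values remain free to differ between studies.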
22.  Chapter 1: Biomedical Knowledge Integration 
PLoS Computational Biology  2012;8(12):e1002826.
The modern biomedical research and healthcare delivery domains have seen an unparalleled increase in the rate of innovation and novel technologies over the past several decades. Catalyzed by paradigm-shifting public and private programs focusing upon the formation and delivery of genomic and personalized medicine, the need for high-throughput and integrative approaches to the collection, management, and analysis of heterogeneous data sets has become imperative. This need is particularly pressing in the translational bioinformatics domain, where many fundamental research questions require the integration of large-scale, multi-dimensional clinical phenotype and bio-molecular data sets. Modern biomedical informatics theory and practice have demonstrated the distinct benefits associated with the use of knowledge-based systems in such contexts. A knowledge-based system can be defined as an intelligent agent that employs a computationally tractable knowledge base or repository in order to reason upon data in a targeted domain and reproduce expert performance relative to such reasoning operations. The ultimate goal of the design and use of such agents is to increase the reproducibility, scalability, and accessibility of complex reasoning tasks. Examples of the application of knowledge-based systems in biomedicine span a broad spectrum, from the execution of clinical decision support, to epidemiologic surveillance of public data sets for the purposes of detecting emerging infectious diseases, to the discovery of novel hypotheses in large-scale research data sets.
In this chapter, we will review the basic theoretical frameworks that define core knowledge types and reasoning operations with particular emphasis on the applicability of such conceptual models within the biomedical domain, and then go on to introduce a number of prototypical data integration requirements and patterns relevant to the conduct of translational bioinformatics that can be addressed via the design and use of knowledge-based systems.
PMCID: PMC3531314  PMID: 23300416
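The definition of a knowledge-based system given in the abstract above (a knowledge base plus a reasoning operation applied to data) can be illustrated with a minimal forward-chaining sketch. This is one simple reasoning strategy chosen for illustration, not the chapter's own method; the function name, the rules and the fact labels are all hypothetical.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new fact can be derived.

    facts : iterable of fact labels observed in the data.
    rules : list of (premises, conclusion) pairs, the knowledge base.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when all of its premises are already known.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base, loosely themed on epidemiologic surveillance.
rules = [
    (("fever", "cough"), "suspected_respiratory_infection"),
    (("suspected_respiratory_infection", "cluster_of_cases"),
     "flag_for_surveillance"),
]

derived = forward_chain({"fever", "cough", "cluster_of_cases"}, rules)
print(sorted(derived))
```

Keeping the rules as plain data, separate from the loop that applies them, is what lets the same reasoning code be reused with a different knowledge base.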
23.  Bioinformatics insights into acute lung injury/acute respiratory distress syndrome 
Bioinformatics is the application of omics science, information technology, mathematics and statistics in the field of biomarker detection. Clinical bioinformatics can be applied to the identification and validation of new biomarkers to improve current methods of monitoring disease activity and to identify new therapeutic targets. Acute lung injury (ALI)/acute respiratory distress syndrome (ARDS) affects a large number of patients and carries a poor prognosis. The present review focuses mainly on the progress in understanding disease heterogeneity through the use of evolving biological, genomic, and genetic approaches, and on the role of clinical bioinformatics in the pathogenesis and treatment of ALI/ARDS. The remarkable advances in clinical bioinformatics may offer a new way of understanding disease pathogenesis, diagnosis and treatment.
PMCID: PMC3560991  PMID: 23369517
Acute lung injury; Acute respiratory distress syndrome; Genomics; Proteomics; Metabolomics; Bioinformatics
24.  A High-Dimensional, Deep-Sequencing Study of Lung Adenocarcinoma in Female Never-Smokers 
PLoS ONE  2013;8(2):e55596.
Deep sequencing techniques provide a remarkable opportunity for comprehensive understanding of tumorigenesis at the molecular level. As omics studies become popular, integrative approaches need to be developed to move from a simple cataloguing of mutations and changes in gene expression to dissecting the molecular nature of carcinogenesis at the systemic level and understanding the complex networks that lead to cancer development.
Here, we describe a high-throughput, multi-dimensional sequencing study of primary lung adenocarcinoma tumors and adjacent normal tissues of six Korean female never-smoker patients. Our data encompass results from exome-seq, RNA-seq, small RNA-seq, and MeDIP-seq. We identified and validated novel genetic aberrations, including 47 somatic mutations and 19 fusion transcripts. One of the fusions involves the c-RET gene, which was recently reported to form fusion genes that may function as drivers of carcinogenesis in lung cancer patients. We also characterized gene expression profiles, which we integrated with genomic aberrations and gene regulations into functional networks. The most prominent gene network module that emerged indicates that disturbances in G2/M transition and mitotic progression are causally linked to tumorigenesis in these patients. Also, results from the analysis strongly suggest that several novel microRNA-target interactions represent key regulatory elements of the gene network.
Our study not only provides an overview of the alterations occurring in lung adenocarcinoma at multiple levels from genome to transcriptome and epigenome, but also offers a model for integrative genomics analysis and proposes potential target pathways for the control of lung adenocarcinoma.
PMCID: PMC3566005  PMID: 23405175
25.  Combining differential expression, chromosomal and pathway analyses for the molecular characterization of renal cell carcinoma 
Canadian Urological Association Journal  2007;1(2 Suppl):S21-S27.
Using high-throughput gene-expression profiling technology, we can now gain a better understanding of the complex biology that is taking place in cancer cells. This complexity is largely dictated by the abnormal genetic makeup of the cancer cells, which can have profound effects on cellular activities such as cell growth, cell survival and other regulatory processes. Based on the pattern of gene expression, or molecular signatures of the tumours, we can distinguish or subclassify different types of cancers according to their cell of origin, behaviour, and the way they respond to therapeutic agents and radiation. These approaches will lead to better molecular subclassification of tumours, the basis of personalized medicine. We have, to date, done whole-genome microarray gene-expression profiling on several hundred kidney tumours. We adopt a combined bioinformatic approach, based on an integrative analysis of the gene-expression data. These data are used to identify both cytogenetic abnormalities and molecular pathways that are deregulated in renal cell carcinoma (RCC). For example, we have identified the deregulation of the VHL-hypoxia pathway in clear-cell RCC, as previously known, and the c-Myc pathway in aggressive papillary RCC. Besides the more common clear-cell, papillary and chromophobe RCCs, we are currently characterizing the molecular signatures of rarer forms of renal neoplasia such as carcinoma of the collecting ducts, mixed epithelial and stromal tumours, chromosome Xp11 translocations associated with papillary RCC, renal medullary carcinoma, mucinous tubular and spindle-cell carcinoma, and a group of unclassified tumours. Continued development and improvement in the field of molecular profiling will better characterize cancer and provide more accurate diagnosis, prognosis and prediction of drug response.
PMCID: PMC2422953  PMID: 18542781
