1.  New threats to health data privacy 
BMC Bioinformatics  2011;12(Suppl 12):S7.
Background
Along with the rapid digitalization of health data (e.g. Electronic Health Records), there is increasing concern about maintaining data privacy while garnering the benefits, especially when the data are required to be published for secondary use. Most current research on protecting health data privacy centers on data de-identification and data anonymization, which remove identifiable information from the published health data to prevent an adversary from reasoning about the privacy of the patients. However, published health data is not the only source that adversaries can count on: with the large amount of information that people voluntarily share on the Web, sophisticated attacks that join disparate pieces of information from multiple sources against health data privacy become practical. So far, only limited effort has been devoted to studying these attacks.
Results
We study how patient privacy could be compromised with the help of today’s information technologies. In particular, we show that private healthcare information could be collected by aggregating and associating disparate pieces of information from multiple online data sources, including online social networks, public records and search engine results. We present a real-world case study showing that user identity and privacy are highly vulnerable to attribution, inference and aggregation attacks. Using real data analysis, we also show that people are highly identifiable to adversaries even when the pieces of information about the target are inaccurate.
Conclusion
We claim that so much information has been made available electronically and online that people are very vulnerable without effective privacy protection.
doi:10.1186/1471-2105-12-S12-S7
PMCID: PMC3247088  PMID: 22168526
2.  SLDR: a computational technique to identify novel genetic regulatory relationships 
BMC Bioinformatics  2014;15(Suppl 11):S1.
We developed a new computational technique called Step-Level Differential Response (SLDR) to identify genetic regulatory relationships. Our technique takes advantage of functional genomics data for the same species under different perturbation conditions and is therefore complementary to current popular computational techniques. In particular, it can identify "rare" activation/inhibition relationship events that can be difficult to find in experimental results. In SLDR, we model each candidate target gene as being controlled by N binary-state regulators that lead to ≤2^N observable states ("step-levels") for the target. We applied SLDR to the GEO microarray data set GSE25644, which consists of 158 different mutant S. cerevisiae gene expression profiles. For each target gene t, we first clustered ordered samples into various clusters, each approximating an observable step-level of t, to screen out the "de-centric" target. Then, we ordered each gene x as a candidate regulator and aligned t to x for the purpose of examining the step-level correlations between the low expression set of x (Ro) and the high expression set of x (Rh) from the regulator x to t, by finding max f(t, x): |Ro-Rh| over all candidates x in the genome for each t. We therefore obtained activation and inhibition events from different combinations of Ro and Rh. Furthermore, we developed criteria for filtering out less-confident regulators, estimated the number of regulators for each target t, and evaluated the identified top-ranking regulator-target relationships. Our results can be cross-validated with the Yeast Fitness database. SLDR is also computationally efficient with o(N^2) complexity. In summary, we believe SLDR can be applied to the mining of functional genomics big data for future network biology and network medicine applications.
doi:10.1186/1471-2105-15-S11-S1
PMCID: PMC4251037  PMID: 25350940
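The regulator scan described in the SLDR abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that samples are split at the median of the regulator's expression, and that Ro and Rh are summarized as the mean target expression in the low and high sets. The function names (`step_response`, `best_regulator`) and the toy values are hypothetical, and the clustering and filtering steps from the abstract are omitted.

```python
from statistics import mean

def step_response(target, regulator):
    """Score one candidate regulator x against a target t.

    Samples are ordered by the regulator's expression and split in half
    (an assumed, simplified split rule). R_o and R_h are taken here as the
    mean target expression in the low and high halves, respectively.
    """
    order = sorted(range(len(regulator)), key=lambda i: regulator[i])
    half = len(order) // 2
    r_o = mean(target[i] for i in order[:half])   # target level when x is low
    r_h = mean(target[i] for i in order[half:])   # target level when x is high
    direction = "activation" if r_h > r_o else "inhibition"
    return abs(r_o - r_h), direction

def best_regulator(target_name, expression):
    """Scan all other genes and return the one maximizing |R_o - R_h|."""
    target = expression[target_name]
    scored = {
        gene: step_response(target, values)
        for gene, values in expression.items()
        if gene != target_name
    }
    best = max(scored, key=lambda g: scored[g][0])
    return best, scored[best]

# Toy, made-up expression values across six samples (not from GSE25644).
toy = {
    "t":  [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    "x1": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],   # rises together with t
    "x2": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],   # unrelated to t
}
gene, (score, direction) = best_regulator("t", toy)
print(gene, direction)
```

Scoring every gene against every target this way takes a number of pairwise comparisons that grows quadratically with the number of genes.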
3.  Breast cancer subtyping from plasma proteins 
BMC Medical Genomics  2013;6(Suppl 1):S6.
Background
Early detection of breast cancer in blood is both appealing clinically and challenging technically due to the disease's elusive nature and heterogeneity. Today, even though major breast cancer subtypes have been characterized, i.e., luminal A, luminal B, HER2+, and basal-like, little is known about the heterogeneity of breast cancer in blood, which could help to discover minimally invasive protein biomarkers with which clinical researchers can detect, classify, and monitor different breast cancer subtypes.
Results
In this study, we performed an integrative pathway-assisted clustering analysis of breast cancer subtypes from plasma proteome samples collected from 80 patients diagnosed with breast cancer and 80 healthy women. First, four breast cancer subtypes and an additional unknown subtype (according to existing annotation) were determined based on pathology lab test results in primary tumors of enrolled patients. Next, we developed and applied four distance metrics, i.e., Protein Intensity, Q-Value, Pathway Profile, and Distance Score Function, to measure and characterize these cancer subtypes. Then, we developed a permutation test to evaluate the significant protein level changes in each biological pathway for each breast cancer subtype, using q-value. Lastly, we developed a pathway-protein matrix for each of the four distance methods to estimate the distance between breast cancer subtypes, for which further Pathway Association Network analysis was performed.
Conclusions
We found that 1) the luminal group (luminal A and luminal B) is clustered together, as is the basal group (basal-like and HER2+), and 2) luminal A and luminal B are closer to each other than basal-like and HER2+ are to each other. Our results were consistent with a recent independent breast cancer study from the Cancer Genome Atlas Network using genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our results showed that changes of different breast cancer subtypes at the pathway level are more profound and less variable than those at the molecular level. Similar subtypes share distinct yet similar pathway activation networks, while dissimilar subtypes also differ at the level of pathway activation networks. The results also showed that distance or similarity of cancer subtypes based on pathway analysis might be able to provide further insight into the intrinsic relationship of breast cancer subtypes. We believe the integrative pathway-assisted proteomics analysis described here can become a model for reliable clustering or classification of other cancer subtypes.
doi:10.1186/1755-8794-6-S1-S6
PMCID: PMC3552699  PMID: 23369492
4.  Genome-wide meta-analysis of genetic susceptible genes for Type 2 Diabetes 
BMC Systems Biology  2012;6(Suppl 3):S16.
Background
Many genetic studies, including single gene studies and Genome-wide association studies (GWAS), aim to identify risk alleles for genetic diseases such as Type II Diabetes (T2D). However, in T2D studies, there is a significant amount of the hereditary risk that cannot be simply explained by individual risk genes. There is a need for developing systems biology approaches to integrate comprehensive genetic information and provide new insight on T2D biology.
Methods
We performed a comprehensive integrative analysis of Single Nucleotide Polymorphisms (SNPs) individually curated from T2D GWAS results and mapped them to T2D candidate risk genes. Using protein-protein interaction data, we constructed a T2D-specific molecular interaction network consisting of T2D genetic risk genes and their interacting gene partners. We then studied the relationship between these T2D genes and curated gene sets.
Results
We determined that T2D candidate risk genes are concentrated in certain parts of the genome, specifically in chromosome 20. Using the T2D genetic network, we identified highly-interconnected network "hub" genes. By incorporating T2D GWAS results, T2D pathways, and T2D genes' functional category information, we further ranked T2D risk genes, T2D-related pathways, and T2D-related functional categories. We found the highly-interconnected T2D disease network “hub” genes most highly associated with T2D genetic risks to be PI3KR1, ESR1, and ENPP1. The well-characterized TCF7L2, contrary to our expectation, was not among the highest-ranked T2D genes. Many interacting pathways play a role in T2D genetic risks, including the insulin signalling pathway, type II diabetes pathway, maturity onset diabetes of the young, adipocytokine signalling pathway, and pathways in cancer. We also observed significant crosstalk among T2D gene subnetworks, which include insulin secretion, regulation of insulin secretion, response to peptide hormone stimulus, response to insulin stimulus, peptide secretion, glucose homeostasis, and hormone transport. Overview maps involving T2D genes, gene sets, pathways, and their interactions are all reported.
Conclusions
Large-scale systems biology meta-analyses of GWAS results can improve interpretations of genetic variations and genetic risk factors. T2D genetic risks can be attributable to the summative genetic effects of many genes involved in a broad range of signalling pathways and functional networks. The framework developed for T2D studies may serve as a guide for studying other complex diseases.
doi:10.1186/1752-0509-6-S3-S16
PMCID: PMC3524015  PMID: 23281828
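The "hub" gene idea in the T2D abstract can be sketched as ranking nodes of an interaction network by degree. This is a simplified illustration only: the study additionally incorporated GWAS results, pathways, and functional categories into its ranking, and none of that is modeled here. The function name `rank_hubs` and the placeholder node labels (`G1` to `G5`) are hypothetical and do not correspond to real genes or interactions.

```python
from collections import Counter

def rank_hubs(edges, top=3):
    """Rank nodes of an undirected interaction network by degree.

    Self-loops and duplicate edges (in either orientation) are ignored.
    Ties are broken alphabetically so the output is deterministic.
    """
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

# Placeholder edge list; labels are illustrative, not curated interactions.
toy_edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
    ("G2", "G3"),
    ("G3", "G1"),   # duplicate of ("G1", "G3") in reverse orientation
    ("G5", "G5"),   # self-loop, ignored
]
hubs = rank_hubs(toy_edges)
print(hubs)
```

Degree is only one of several possible centrality measures, and the abstract does not state which one was used to define hubs.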
5.  C2Maps: a network pharmacology database with comprehensive disease-gene-drug connectivity relationships 
BMC Genomics  2012;13(Suppl 6):S17.
Background
Network pharmacology has emerged as a new topic of study in recent years. It aims to study the myriad relationships among proteins, drugs, and disease phenotypes. The concept of molecular connectivity maps has been proposed to establish comprehensive knowledge links between molecules of interest in a given biological context. Molecular connectivity maps between drugs and genes/proteins in specific disease contexts can be particularly valuable, since the functional approach with these maps helps researchers gain global perspectives on both the therapeutic profiles and toxicological profiles of candidate drugs.
Methods
To assess drug pharmacological effect, we assume that "ideal" drugs for a patient can treat or prevent the disease by modulating the gene expression profiles of this patient to levels similar to those in healthy people. Starting from this hypothesis, we build comprehensive disease-gene-drug connectivity relationships with drug-protein directionality (inhibit/activate) information based on a computational connectivity maps (C2Maps) platform. An interactive interface for directionality annotation of drug-protein pairs with literature evidence from PubMed has been added to the new version of C2Maps. We also upload the curated directionality information of drug-protein pairs specific for three complex diseases - breast cancer, colorectal cancer and Alzheimer disease.
Results
For relevant drug-protein pairs with directionality information, we use breast cancer as a case study to demonstrate the functionality of disease-specific searching. Based on the results obtained from searching, we perform pharmacological effect evaluation for two important breast cancer drugs on treating patients diagnosed with different breast cancer subtypes. The evaluation is performed on a well-studied breast cancer gene expression microarray dataset to portray how useful the updated C2Maps is in assessing drug efficacy and toxicity information.
Conclusions
The C2Maps platform is an online bioinformatics resource that provides biologists with directional relationships between drugs and genes/proteins in specific disease contexts based on network mining, literature mining, and drug effect annotating. A new insight to assess overall drug efficacy and toxicity can be provided by using the C2Maps platform to identify disease relevant proteins and drugs. The case study on breast cancer correlates very well with the existing pharmacology of the two breast cancer drugs and highlights the significance of C2Maps database.
doi:10.1186/1471-2164-13-S6-S17
PMCID: PMC3481399  PMID: 23134618
6.  PAGED: a pathway and gene-set enrichment database to enable molecular phenotype discoveries 
BMC Bioinformatics  2012;13(Suppl 15):S2.
Background
Over the past decade, pathway and gene-set enrichment analysis has evolved into the study of high-throughput functional genomics. Owing to poorly annotated and incomplete pathway data, researchers have begun to combine pathway and gene-set enrichment analysis as well as network module-based approaches to identify crucial relationships between different molecular mechanisms.
Methods
To meet the new challenge of molecular phenotype discovery, in this work, we have developed an integrated online database, the Pathway And Gene Enrichment Database (PAGED), to enable comprehensive searches for disease-specific pathways, gene signatures, microRNA targets, and network modules by integrating gene-set-based prior knowledge as molecular patterns from multiple levels: the genome, transcriptome, post-transcriptome, and proteome.
Results
The online database we developed, PAGED (http://bio.informatics.iupui.edu/PAGED), is by far the most comprehensive public compilation of gene sets. In its current release, PAGED contains a total of 25,242 gene sets, 61,413 genes, 20 organisms, and 1,275,560 records from five major categories. Beyond its size, the advantage of PAGED lies in the explorations of relationships between gene sets as gene-set association networks (GSANs). Using colorectal cancer expression data analysis as a case study, we demonstrate how to query this database resource to discover crucial pathways, gene signatures, and gene network modules specific to colorectal cancer functional genomics.
Conclusions
This integrated online database lays a foundation for developing tools beyond third-generation pathway analysis approaches for discovering molecular phenotypes, especially for disease-associated pathway/gene-set enrichment analysis.
doi:10.1186/1471-2105-13-S15-S2
PMCID: PMC3439733  PMID: 23046413
7.  Reordering based integrative expression profiling for microarray classification 
BMC Bioinformatics  2012;13(Suppl 2):S1.
Background
Current network-based microarray analysis uses information on interactions among the genes/gene products concerned, but still considers each gene's expression individually. We propose an organized knowledge-supervised approach - Integrative eXpression Profiling (IXP) - to improve microarray classification accuracy and help discover groups of genes whose signals have been too weak to detect individually by traditional means. To implement IXP, an ant colony optimization reordering (ACOR) algorithm is used to group functionally related genes in an ordered way.
Results
Using Alzheimer's disease (AD) as an example, we demonstrate how to apply the ACOR-based IXP approach to microarray classification. Using a microarray dataset - GSE1297 with 31 samples - as the training set, the blinded classification on another microarray dataset - GSE5281 with 151 samples - shows that our approach can improve accuracy from 74.83% to 82.78%. A recently-published 1372-probe signature for AD can only achieve 61.59% accuracy under the same conditions. The ACOR-based IXP approach also performs better than IXP approaches based on classic network ranking, graph clustering, and random-ordering methods in an overall classification performance comparison.
Conclusions
The ACOR-based IXP approach can serve as a knowledge-supervised feature transformation approach to increase classification accuracy dramatically, by transforming each gene expression profile into an integrated expression profile used as input features for standard classifiers. The IXP approach integrates both gene expression information and organized knowledge - disease gene/protein network topology information, which is represented as both network node weights (local topological properties) and network node orders (global topological characteristics).
doi:10.1186/1471-2105-13-S2-S1
PMCID: PMC3583189  PMID: 22536860
8.  Predicting adverse side effects of drugs 
BMC Genomics  2011;12(Suppl 5):S11.
Background
Studies of toxicity and unintended side effects can lead to improved drug safety and efficacy. One promising form of study comes from molecular systems biology in the form of "systems pharmacology". Systems pharmacology combines data from clinical observation and molecular biology. This approach is new, however, and there are few examples of how it can practically predict adverse drug reactions (ADRs) from an experimental drug with acceptable accuracy.
Results
We have developed a new and practical computational framework to accurately predict ADRs of trial drugs. We combine clinical observation data with drug target data, protein-protein interaction (PPI) networks, and gene ontology (GO) annotations. We use cardiotoxicity, one of the major causes for drug withdrawals, as a case study to demonstrate the power of the framework. Our results show that an in silico model built on this framework can achieve a satisfactory cardiotoxicity ADR prediction performance (median AUC = 0.771, Accuracy = 0.675, Sensitivity = 0.632, and Specificity = 0.789). Our results also demonstrate the significance of incorporating prior knowledge, including gene networks and gene annotations, to improve future ADR assessments.
Conclusions
Biomolecular network and gene annotation information can significantly improve the predictive accuracy of ADR of drugs under development. The use of PPI networks can increase prediction specificity and the use of GO annotations can increase prediction sensitivity. Using cardiotoxicity as an example, we are able to further identify cardiotoxicity-related proteins among drug target expanding PPI networks. The systems pharmacology approach that we developed in this study can be generally applicable to all future developmental drug ADR assessments and predictions.
doi:10.1186/1471-2164-12-S5-S11
PMCID: PMC3287493  PMID: 22369493
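The performance measures quoted in the ADR abstract (AUC, accuracy, sensitivity, specificity) follow standard definitions, which the Python sketch below computes for a binary classifier. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration. They are unrelated to the study's data or model.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = ADR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and model scores for ten hypothetical drugs.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4, 0.35, 0.6, 0.75]
preds = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold
metrics = classification_metrics(labels, preds)
area = auc(labels, scores)
print(metrics, area)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are usually reported together.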
9.  An integrated proteomics analysis of bone tissues in response to mechanical stimulation 
BMC Systems Biology  2011;5(Suppl 3):S7.
Bone cells can sense physical forces and convert mechanical stimulation conditions into biochemical signals that lead to expression of mechanically sensitive genes and proteins. However, it is still poorly understood how genes and proteins in bone cells are orchestrated to respond to mechanical stimulations. In this research, we applied integrated proteomics, statistical, and network biology techniques to study proteome-level changes to bone tissue cells in response to two different conditions, normal loading and fatigue loading. We harvested ulna midshafts and isolated proteins from the control, loaded, and fatigue loaded rats. Using a label-free liquid chromatography tandem mass spectrometry (LC-MS/MS) experimental proteomics technique, we derived a comprehensive list of 1,058 proteins that are differentially expressed among normal loading, fatigue loading, and controls. By carefully developing protein selection filters and statistical models, we were able to identify 42 proteins representing 21 rat genes that were significantly associated with bone cells' response to quantitative changes between normal loading and fatigue loading conditions. We further applied network biology techniques by building a fatigue loading activated protein-protein interaction subnetwork involving 9 of the human-homolog counterparts of the 21 rat genes in a large connected network component. Our study shows that the combination of a decreased anti-apoptotic factor, Raf1, and an increased pro-apoptotic factor, PDCD8, results in a significant increase in the number of apoptotic osteocytes following fatigue loading. We believe controlling osteoblast differentiation/proliferation and osteocyte apoptosis could be promising directions for developing future therapeutic solutions for related bone diseases.
doi:10.1186/1752-0509-5-S3-S7
PMCID: PMC3287575  PMID: 22784626
Bone Stress; Biomarker Discovery; Pathway Analysis; Tandem Mass Spectrometry
10.  HOMER: a human organ-specific molecular electronic repository 
BMC Bioinformatics  2011;12(Suppl 10):S4.
Background
Each organ has a specific function in the body. “Organ-specificity” refers to differential expressions of the same gene across different organs. An organ-specific gene/protein is defined as a gene/protein whose expression is significantly elevated in a specific human organ. An “organ-specific marker” is defined as an organ-specific gene/protein that is also implicated in human diseases related to the organ. Previous studies have shown that identifying specificity for the organ in which a gene or protein is significantly differentially expressed, can lead to discovery of its function. Most currently available resources for organ-specific genes/proteins either allow users to access tissue-specific expression over a limited range of organs, or do not contain disease information such as disease-organ relationship and disease-gene relationship.
Results
We designed an integrated Human Organ-specific Molecular Electronic Repository (HOMER, http://bio.informatics.iupui.edu/homer), defining human organ-specific genes/proteins, based on five criteria: 1) comprehensive organ coverage; 2) gene/protein to disease association; 3) disease-organ association; 4) quantification of organ-specificity; and 5) cross-linking of multiple available data sources.
HOMER is a comprehensive database covering about 22,598 proteins, 52 organs, and 4,290 diseases integrated and filtered from organ-specific proteins/genes and disease databases like dbEST, TiSGeD, HPA, CTD, and Disease Ontology. The database has a Web-based user interface that allows users to find organ-specific genes/proteins by gene, protein, organ or disease, to explore the histogram of an organ-specific gene/protein, and to identify disease-related organ-specific genes by browsing the disease data online.
Moreover, the quality of the database was validated with comparison to other known databases and two case studies: 1) an association analysis of organ-specific genes with disease and 2) a gene set enrichment analysis of organ-specific gene expression data.
Conclusions
HOMER is a new resource for analyzing, identifying, and characterizing organ-specific molecules in association with disease-organ and disease-gene relationships. The statistical method we developed for organ-specific gene identification can be applied to other organisms. The current HOMER database can successfully answer a variety of questions related to organ specificity in human diseases and can help researchers in discovering and characterizing organ-specific genes/proteins with disease relevance.
doi:10.1186/1471-2105-12-S10-S4
PMCID: PMC3236847  PMID: 22165817
11.  Systems-Scale Analysis Reveals Pathways Involved in Cellular Response to Methamphetamine 
PLoS ONE  2011;6(4):e18215.
Background
Methamphetamine (METH), an abused illicit drug, disrupts many cellular processes, including energy metabolism, spermatogenesis, and maintenance of oxidative status. However, many components of the molecular underpinnings of METH toxicity have yet to be established. Network analyses of integrated proteomic, transcriptomic and metabolomic data are particularly well suited for identifying cellular responses to toxins, such as METH, which might otherwise be obscured by the numerous and dynamic changes that are induced.
Methodology/Results
We used network analyses of proteomic and transcriptomic data to evaluate pathways in Drosophila melanogaster that are affected by acute METH toxicity. METH exposure caused changes in the expression of genes involved with energy metabolism, suggesting a Warburg-like effect (aerobic glycolysis), which is normally associated with cancerous cells. Therefore, we tested the hypothesis that carbohydrate metabolism plays an important role in METH toxicity. In agreement with our hypothesis, we observed that increased dietary sugars partially alleviated the toxic effects of METH. Our systems analysis also showed that METH impacted genes and proteins known to be associated with muscular homeostasis/contraction, maintenance of oxidative status, oxidative phosphorylation, spermatogenesis, iron and calcium homeostasis. Our results also provide numerous candidate genes for the METH-induced dysfunction of spermatogenesis, which have not been previously characterized at the molecular level.
Conclusion
Our results support our overall hypothesis that METH causes a toxic syndrome that is characterized by the altered carbohydrate metabolism, dysregulation of calcium and iron homeostasis, increased oxidative stress, and disruption of mitochondrial functions.
doi:10.1371/journal.pone.0018215
PMCID: PMC3080363  PMID: 21533132
12.  Unraveling human complexity and disease with systems biology and personalized medicine 
Personalized medicine  2010;7(3):275-289.
We are all perplexed that current medical practice often appears maladroit in curing our individual illnesses or disease. However, as is often the case, a lack of understanding, tools and technologies is the root cause of such situations. Human individuality is an often-quoted term but, in the context of human biology, it is poorly understood. This is compounded when there is a need to consider the variability of human populations. In the case of the former, it is possible to quantify human complexity as determined by the 35,000 genes of the human genome, the 1–10 million proteins (including antibodies) and the 2000–3000 metabolites of the human metabolome. Human variability is much more difficult to assess, since many of the variables, such as the definition of race, are not even clearly agreed on. In order to accommodate human complexity, variability and its influence on health and disease, it is necessary to undertake a systematic approach. In the past decade, the emergence of analytical platforms and bioinformatics tools has led to the development of systems biology. Such an approach offers enormous potential in defining key pathways and networks involved in optimal human health, as well as disease onset, progression and treatment. The tools and technologies now available in systems biology analyses offer exciting opportunities to exploit the emerging areas of personalized medicine. In this article, we discuss the current status of human complexity, and how systems biology and personalized medicine can impact at the individual and population level.
doi:10.2217/pme.10.16
PMCID: PMC2888109  PMID: 20577569
complexity; human health; ’omics; personalized medicine; systems biology; variability
13.  A new approach to construct pathway connected networks and its application in dose responsive gene expression profiles of rat liver regulated by 2,4DNT 
BMC Genomics  2010;11(Suppl 3):S4.
Background
Military and industrial activities have led to the reported release of 2,4-dinitrotoluene (2,4DNT) into soil, groundwater or surface water. It has been reported that 2,4DNT can induce toxic effects on humans and other organisms. However, the mechanism of 2,4DNT induced toxicity is still unclear. Although a series of methods for gene network construction have been developed, few instances of applying such technology to generate pathway connected networks have been reported.
Results
Microarray analyses were conducted using liver tissue of rats collected 24h after exposure to a single oral gavage with one of five concentrations of 2,4DNT. We observed a strong dose response of differentially expressed genes after 2,4DNT treatment. The most affected pathways included long term depression, breast cancer regulation by stathmin1, WNT signaling, and PI3K signaling pathways. In addition, we propose a new approach to construct pathway connected networks regulated by 2,4DNT. We also observed clear dose response pathway networks regulated by 2,4DNT.
Conclusions
We developed a new method for constructing pathway connected networks. This new method was successfully applied to microarray data from liver tissue of 2,4DNT exposed animals and resulted in the identification of unique dose responsive biomarkers with regard to affected pathways.
doi:10.1186/1471-2164-11-S3-S4
PMCID: PMC2999349  PMID: 21143786
14.  Discovery of pathway biomarkers from coupled proteomics and systems biology methods 
BMC Genomics  2010;11(Suppl 2):S12.
Background
Breast cancer is the second most common type of cancer worldwide after lung cancer. Plasma proteome profiling may have a higher chance of identifying protein changes between plasma samples, such as those from normal and breast cancer cases. Breast cancer cell lines have long been used by researchers as a model system for identifying protein biomarkers. A comparison of the set of proteins which change in plasma with previously published findings from proteomic analysis of human breast cancer cell lines may identify with higher confidence a subset of candidate protein biomarkers.
Results
In this study, we analyzed a liquid chromatography (LC) coupled tandem mass spectrometry (MS/MS) proteomics dataset from plasma samples of 40 healthy women and 40 women diagnosed with breast cancer. Using two-sample t-statistics and a permutation procedure, we identified 254 statistically significant, differentially expressed proteins, among which 208 are over-expressed and 46 are under-expressed in breast cancer plasma. We validated this result against previously published proteomic results of human breast cancer cell lines and signaling pathways to derive 25 candidate protein biomarkers in a panel. Using the pathway analysis, we observed that the 25 “activated” plasma proteins were present in several cancer pathways, including ‘Complement and coagulation cascades’, ‘Regulation of actin cytoskeleton’, and ‘Focal adhesion’, and match well with previously reported studies. Additional gene ontology analysis of the 25 proteins also showed that cellular metabolic process and response to external stimulus (especially proteolysis and acute inflammatory response) were enriched functional annotations of the proteins identified in the breast cancer plasma samples. By cross-validation using two additional proteomics studies, we obtained 86% and 83% similarities in pathway-protein matrix between the first study and the two testing studies, which is much better than the similarity we measured with proteins.
Conclusions
We presented a ‘systems biology’ method to identify, characterize, analyze and validate panel biomarkers in breast cancer proteomics data, which includes 1) t-statistics and a permutation process, 2) network, pathway and function annotation analysis, and 3) cross-validation of multiple studies. Our results showed that the systems biology approach is essential to understanding the molecular mechanisms of panel protein biomarkers.
doi:10.1186/1471-2164-11-S2-S12
PMCID: PMC2975409  PMID: 21047379
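The combination of a two-sample t-statistic with a permutation procedure, as mentioned in the abstract above, can be sketched in Python as follows. The abstract does not specify the exact variant, so the Welch (unequal-variance) form, the number of permutations, the fixed seed, and the function names are all assumptions. The intensity values are made up for illustration.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (assumed Welch form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else math.copysign(float("inf"), diff)
    return diff / se

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled group labels give |t| >= observed."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        t = abs(welch_t(pooled[:len(a)], pooled[len(a):]))
        if t >= observed - 1e-12:   # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Made-up intensities of one protein in two groups of five plasma samples.
cancer = [5.1, 5.4, 5.0, 5.6, 5.3]
healthy = [3.9, 4.2, 4.0, 4.1, 3.8]
t_obs = welch_t(cancer, healthy)
p_val = permutation_p_value(cancer, healthy)
print(t_obs, p_val)
```

In a proteome-wide screen this test would be repeated for every protein, so the resulting p-values would normally be adjusted for multiple testing before calling proteins significant.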
15.  PEPPI: a peptidomic database of human protein isoforms for proteomics experiments 
BMC Bioinformatics  2010;11(Suppl 6):S7.
Background
Protein isoform generation, which may derive from alternative splicing, genetic polymorphism, and posttranslational modification, is an essential source of molecular diversity in eukaryotic cells. Previous studies have shown that protein isoforms play critical roles in disease diagnosis, risk assessment, sub-typing, prognosis, and treatment outcome predictions. Understanding the types, presence, and abundance of different protein isoforms in different cellular and physiological conditions is a major task in functional proteomics, and may pave the way to molecular biomarker discovery of human diseases. In tandem mass spectrometry (MS/MS) based proteomics analysis, peptide peaks with exact matches to protein sequence records in the proteomics database may be identified with mass spectrometry (MS) search software. However, due to limited annotation and poor coverage of protein isoforms in proteomics databases, high throughput protein isoform identifications, particularly those arising from alternative splicing and genetic polymorphism, have not been possible.
Results
Therefore, we present the PEPtidomics Protein Isoform Database (PEPPI, http://bio.informatics.iupui.edu/peppi), a comprehensive database of computationally-synthesized human peptides that can identify protein isoforms derived from either alternatively spliced mRNA transcripts or SNP variations. We collected genome, pre-mRNA alternative splicing and SNP information from Ensembl. We synthesized in silico isoform transcripts that cover all exons and theoretically possible junctions of exons and introns, as well as all their variations derived from known SNPs. With three case studies, we further demonstrated that the database can help researchers discover and characterize new protein isoform biomarkers from experimental proteomics data.
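As a rough illustration of how junction-spanning peptides of this kind can be enumerated, the sketch below pairs every upstream exon with every downstream exon and joins a few residues from each side. It is a simplification under stated assumptions: exons are given as already-translated amino-acid strings (reading-frame phase, introns and SNP variants are ignored), and the function names, the `flank` parameter and the FASTA header scheme are hypothetical rather than PEPPI's own.

```python
from itertools import combinations


def junction_peptides(exons, flank=5):
    """Yield (header, peptide) for every forward exon pair (i < j).

    Each peptide joins the last `flank` residues of the upstream exon
    to the first `flank` residues of the downstream exon.
    """
    for i, j in combinations(range(len(exons)), 2):
        left = exons[i][-flank:]
        right = exons[j][:flank]
        yield f"E{i + 1}_E{j + 1}", left + right


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Toy exon translations, not real protein sequence data.
    toy_exons = ["MKTAYIAK", "QRQISFVK", "SHFSRQLE"]
    print(to_fasta(junction_peptides(toy_exons, flank=3)), end="")
```

Enumerating all forward pairs, including non-adjacent ones, is what lets exon-skipping events be represented alongside the canonical junctions.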
Conclusions
We developed a new tool for the proteomics community to characterize protein isoforms from MS-based proteomics experiments. By cataloguing each peptide configuration in the PEPPI database, users can study genetic variations and alternative splicing events at the proteome level. They can also batch-download peptide sequences in FASTA format to search for MS/MS spectra derived from human samples. The database can help generate novel hypotheses on molecular risk factors and molecular mechanisms of complex diseases, leading to identification of potentially highly specific protein isoform biomarkers.
doi:10.1186/1471-2105-11-S6-S7
PMCID: PMC3026381  PMID: 20946618
16.  Tumor Necrosis Factor-α- and Interleukin-1-Induced Cellular Responses: Coupling Proteomic and Genomic Information 
Journal of proteome research  2007;6(6):2176-2185.
The pro-inflammatory cytokines Tumor Necrosis Factor-alpha (TNFα) and Interleukin-1 (IL-1) mediate the innate immune response. Dysregulation of the innate immune response contributes to the pathogenesis of cancer, arthritis, and congestive heart failure. TNFα- and IL-1-induced changes in gene expression are mediated by similar transcription factors; however, TNFα and IL-1 receptor knock-out mice differ in their sensitivities to a known initiator (lipopolysaccharide, LPS) of the innate immune response. The contrasting responses to LPS indicate that TNFα and IL-1 regulate different processes. A large-scale proteomic analysis of TNFα- and IL-1-induced responses was undertaken to identify processes uniquely regulated by TNFα and IL-1. When combined with genomic studies, our results indicate that TNFα, but not IL-1, mediates cell cycle arrest.
doi:10.1021/pr060665l
PMCID: PMC2877378  PMID: 17503796
Tumor Necrosis Factor-alpha; Interleukin-1; Cell Cycle Arrest; Permeability Transition Pore; Protein Interaction Network; Mass Spectrometry; Microarray
17.  HPD: an online integrated human pathway database enabling systems biology studies 
BMC Bioinformatics  2009;10(Suppl 11):S5.
Background
Pathway-oriented experimental and computational studies have led to a significant accumulation of biological knowledge concerning three major types of biological pathway events: molecular signaling events, gene regulation events, and metabolic reaction events. A pathway consists of a series of molecular pathway events that link molecular entities such as proteins, genes, and metabolites. There are approximately 300 biological pathway resources as of April 2009 according to the Pathguide database; however, these pathway databases generally have poor coverage or poor quality, and are difficult to integrate, due to syntactic-level and semantic-level data incompatibilities.
Results
We developed the Human Pathway Database (HPD) by integrating heterogeneous human pathway data that are either curated at the NCI Pathway Interaction Database (PID), Reactome, BioCarta, KEGG or indexed from the Protein Lounge Web sites. Integration of pathway data at syntactic, semantic, and schematic levels was based on a unified pathway data model and data warehousing-based integration techniques. HPD provides a comprehensive online view that connects human proteins, genes, RNA transcripts, enzymes, signaling events, metabolic reaction events, and gene regulatory events. At the time of this writing HPD includes 999 human pathways and more than 59,341 human molecular entities. The HPD software provides both a user-friendly Web interface for online use and a robust relational database backend for advanced pathway querying. This pathway tool enables users to 1) search for human pathways from different resources by simply entering genes/proteins involved in pathways or words appearing in pathway names, 2) analyze pathway-protein association, 3) study pathway-pathway similarity, and 4) build integrated pathway networks. We demonstrated the usage and characteristics of the new HPD through three breast cancer case studies.
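To make the search and pathway-pathway similarity functions above concrete, here is a minimal sketch that treats each pathway as a set of member genes. The abstract does not say which similarity measure HPD uses; the Jaccard index below is an assumed stand-in, and the function names and the toy pathway memberships are hypothetical.

```python
from itertools import combinations


def pathways_with(pathways, gene):
    """Return the names of all pathways whose member set contains `gene`."""
    return sorted(name for name, members in pathways.items() if gene in members)


def jaccard(a, b):
    """Jaccard index: shared members divided by total distinct members."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(pathways):
    """Map each unordered pathway pair (names in sorted order) to its Jaccard index."""
    return {
        (p, q): jaccard(pathways[p], pathways[q])
        for p, q in combinations(sorted(pathways), 2)
    }


if __name__ == "__main__":
    # Toy memberships with placeholder identifiers, not curated pathway data.
    toy = {
        "A": {"G1", "G2", "G3"},
        "B": {"G2", "G3", "G4"},
        "C": {"G5"},
    }
    print(pathways_with(toy, "G2"))
    print(pairwise_similarity(toy))
```

A set-overlap measure like this depends only on shared membership, so it can compare pathways drawn from different source databases once their gene identifiers have been unified.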
Conclusion
HPD (http://bio.informatics.iupui.edu/HPD) is a new resource for searching, managing, and studying human biological pathways. Users of HPD can search against large collections of human biological pathways, compare related pathways and their molecular entity compositions, and build high-quality, expanded-scope disease pathway models. The current HPD software can help users address a wide range of pathway-related questions in human disease biology studies.
doi:10.1186/1471-2105-10-S11-S5
PMCID: PMC3226194  PMID: 19811689
18.  ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining 
BMC Bioinformatics  2008;9(Suppl 9):S5.
Background
New systems biology studies require researchers to understand how interplay among myriads of biomolecular entities is orchestrated in order to achieve high-level cellular and physiological functions. Many software tools have been developed in the past decade to help researchers visually navigate large networks of biomolecular interactions with built-in template-based query capabilities. To further advance researchers' ability to interrogate global physiological states of cells through multi-scale visual network explorations, new visualization software tools still need to be developed to empower the analysis. A robust visual data analysis platform driven by database management systems to perform bi-directional data processing-to-visualizations with declarative querying capabilities is needed.
Results
We developed ProteoLens as a Java-based visual analytic software tool for creating, annotating and exploring multi-scale biological networks. It supports direct database connectivity to either Oracle or PostgreSQL database tables/views, on which SQL statements using both Data Definition Language (DDL) and Data Manipulation Language (DML) may be specified. The robust query languages embedded directly within the visualization software help users to bring their network data into a visualization context for annotation and exploration. ProteoLens supports graph/network represented data in standard Graph Modeling Language (GML) formats, and this enables interoperation with a wide range of other visual layout tools. The architectural design of ProteoLens enables the de-coupling of complex network data visualization tasks into two distinct phases: 1) creating network data association rules, which are mapping rules between network node IDs or edge IDs and data attributes such as functional annotations, expression levels, scores, synonyms, descriptions, etc.; 2) applying network data association rules to build the network and perform the visual annotation of graph nodes and edges according to associated data values. We demonstrated the advantages of these new capabilities through three biological network visualization case studies: human disease association network, drug-target interaction network and protein-peptide mapping network.
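The two-phase design described above can be sketched in a few lines. This is not ProteoLens code (the tool itself is Java-based): Python's built-in `sqlite3` stands in for the Oracle or PostgreSQL connection, and the table name, column names, function names and GML attribute key are all hypothetical. Phase 1 treats an association rule as a SQL query that returns (node ID, value) rows; phase 2 applies the rules while writing the network out as GML.

```python
import sqlite3


def load_rule(conn, sql):
    """Phase 1: run a query returning (node_id, value) rows and keep it as a mapping."""
    return dict(conn.execute(sql).fetchall())


def to_gml(edges, rules):
    """Phase 2: build the graph and attach each rule's value to matching nodes."""
    names = sorted({node for edge in edges for node in edge})
    index = {name: i for i, name in enumerate(names)}
    lines = ["graph ["]
    for name in names:
        lines += ["  node [", f"    id {index[name]}", f'    label "{name}"']
        for attr, mapping in rules.items():
            if name in mapping:
                # Values are written as quoted strings for simplicity.
                lines.append(f'    {attr} "{mapping[name]}"')
        lines.append("  ]")
    for src, dst in edges:
        lines += ["  edge [", f"    source {index[src]}", f"    target {index[dst]}", "  ]"]
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical annotation table with placeholder rows.
    conn.execute("CREATE TABLE annotation (node_id TEXT, func TEXT)")
    conn.execute("INSERT INTO annotation VALUES ('P1', 'kinase')")
    rule = load_rule(conn, "SELECT node_id, func FROM annotation")
    print(to_gml([("P1", "P2")], {"function": rule}))
```

Keeping rules as plain queries means the same network can be re-annotated with a different attribute by swapping the SQL, without rebuilding the graph structure.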
Conclusion
The architectural design of ProteoLens makes it suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations. ProteoLens is a promising visual analytic platform that will facilitate knowledge discoveries in future network and systems biology studies.
doi:10.1186/1471-2105-9-S9-S5
PMCID: PMC2537576  PMID: 18793469
